<?xml version="1.0" standalone="yes"?>
<Paper uid="J95-4001">
  <Title>The Repair of Speech Act Misunderstandings by Abductive Inference</Title>
  <Section position="3" start_page="437" end_page="438" type="metho">
    <SectionTitle>
4 The amount of reasoning is a function of the size of one's plan hierarchy.
</SectionTitle>
    <Paragraph position="0"> So, if it is believed that questionnaires are used to obtain a driver's license, which is needed to drive a car, which is needed to get to California, then this same utterance could even be interpreted as an incomplete attempt to get to California. Thus, the hearer must also assume that he and the speaker share the same plan hierarchy.</Paragraph>
    <Paragraph position="1"> discourse context) could accomplish the desired goal. Interpretation and repair attempt to apply this process in reverse, working back from an observed utterance to the underlying goal. Such reasoning is clearly nonmonotonic; here we suggest that it can be characterized quite naturally as abduction. The model is expressed as a logical theory in the Prioritized Theorist framework (Poole, Goebel, and Aleliunas 1987; van Arragon 1990).</Paragraph>
  </Section>
  <Section position="4" start_page="438" end_page="467" type="metho">
    <SectionTitle>
2. The structured intentional approach
</SectionTitle>
    <Paragraph position="0"> We now introduce a model of dialogue that extends both intentional and social accounts of discourse. The model unifies theories of speech act production, interpretation, and the repair of misunderstandings. This unification is achieved by treating production as default reasoning, while using abduction to model interpretation and repair. In addition, the model avoids open-ended inference about goals by using expectations derived from social norms to guide interpretation. As a result, the model provides a constrained, yet principled, account of interpretation; it also links social accounts of expectation with other mental states.</Paragraph>
    <Paragraph position="1"> In this section, we will discuss how the model addresses the following concerns: * The need to control the inference from observed actions to expected replies. Extended inference about goals is usually unnecessary and a waste of resources.</Paragraph>
    <Paragraph position="2"> * The need to account for nonmonotonicity in both the interpretation and production of utterances. This nonmonotonicity takes two forms. First, utterances can make only a part of the speaker's goals explicit to the hearer, so hearers must reason abductively to account for them. Second, expectations are defeasible. At any given moment, speakers may differ in their beliefs about the dialogue and hence can only assume that they understand each other. Speakers manage the nonmonotonicity by negotiating with each other to achieve understanding.</Paragraph>
    <Paragraph position="3"> * The need to detect and correct misunderstandings. Speakers rely on their expectations to decide whether they have understood each other. When hearers identify an apparent inconsistency, they can reinterpret an earlier utterance and respond to it anew. However, if they fail to identify a misunderstanding, the communication might mislead them into prematurely believing that their goals have been achieved.</Paragraph>
    <Paragraph position="4"> * The need for an alternative to the notion of mutual belief. Typically, models rely on mutual beliefs without accounting for how speakers achieve them or for why speakers should believe that they have achieved them.</Paragraph>
    <Section position="1" start_page="438" end_page="440" type="sub_section">
      <SectionTitle>
2.1 Using social conventions to guide interpretation and repair
</SectionTitle>
      <Paragraph position="0"> Our account of interpretation avoids the extended inference required by plan-based models by reversing the standard dependency between an agent's expectations and task-related goals. Plan-based approaches (Allen and Perrault 1979; Litman 1986; Carberry 1990; Lambert and Carberry 1991) start by applying context-independent inference rules to identify the agent's task-related plan, possibly favoring alternatives that extend a previously recognized plan. By contrast, our approach begins with an expectation, using it to premise both the analysis of utterance meaning and any inference about an agent's goals. Moreover, our approach treats apparent conflicts with expectations as meaningful; for example, if an utterance is inconsistent with expectations, then the reasoner will try to explain the inconsistency.</Paragraph>
      <Paragraph position="1"> The model focuses on two convention-based sources of expectation. The first is conventions about what attitudes (belief, desire, intention, etc.) each speech act expresses; 5 we call these the linguistic intentions of the speech act. The second is conventions for each speech act about what act should follow; we call these linguistic expectations. Speakers will expect each other to display their understanding of these conventions and how they apply to their conversation. Thus, they can expect each other to be consistent in the attitudes that they express and to respond to each act with its conventional reply, unless they have (and can provide) a valid reason not to. Linguistic intentions are based on Grice's (1957) notion of reflexive intention. For example, an inform(S,H,P) expresses the linguistic intentions know(P) and intend(S, know(H,P)) (i.e., the speaker intends the hearer to believe (1) that P is true and (2) that the speaker intends that the hearer know P). Linguistic expectations capture the notion of adjacency pairs. 6 In defining linguistic intentions, which are shown in Figure 1, we have followed existing speech act taxonomies, especially those given by Bach and Harnish (1979), Allen (1983), and Hinkelman (1990). 7 Thus, when a speaker produces an askref about P she expresses (and thereby intends the hearer to recognize that she expresses) that she does not know the referent of some description in P, intends to find out the referent of that description, and intends the hearer to tell her that referent. If the speaker is sincere, she actually believes the content of what she expresses; if the hearer is trusting, he might come to believe that she believes it.</Paragraph>
      <Paragraph position="2"> Following Schegloff's (1988) analysis of Example 2, we provide a speech act definition for pretell. 8 In order to capture the linguistic intentions of pretelling, we also add a new attitude, knowsBetterRef(S, H, P), that is true if the knowledge of S about P is strictly better than that of H; for example, because S is the expert or S has had more recent experience with P.</Paragraph>
      <Paragraph position="3"> We allow that individuals might not all share the same taxonomy of speech acts and linguistic intentions and that certain social groups or activities might have their own specialized sets of linguistic expectations. 9 Our theory supports this flexibility by having each speaker evaluate the coherence of all utterances within her own view of the discourse. Thus, where we refer to the "displayed interpretation" of an utterance, we mean displayed given the perspective of a particular speaker. 10 5 We assume that these attitudes are a function of the discourse or illocutionary level of speech acts, rather than the surface or locutionary level. This approach has worked well for us, but, as one reviewer remarked, it is an interesting issue as to whether they are also a function of the locutionary level. 6 Note that although linguistic intentions often express that an action is intended (e.g., questions express an intention that the hearer answer), the two conventions are independent. For example, while an invitation to visit at 6pm might create an expectation that dinner will be served, it does not express an intention to serve it.</Paragraph>
      <Paragraph position="4"> 7 In the figure, we have used the symbol intend to name both the intention to achieve a situation in which a property holds and the intention to do an action.</Paragraph>
      <Paragraph position="5"> 8 Schegloff actually argues against representing such sequences as speech acts; however, as in the computational work cited above, we have used the notion of "discourse-level speech act" to represent the functional relationship between the surface form of an utterance, the context, and the attitudes expressed by the speaker.</Paragraph>
      <Paragraph position="6"> 9 Reithinger and Maier (1995) have used n-gram dialogue act probabilities to induce the adjacency pairs from a corpus of dialogues for appointment scheduling.</Paragraph>
      <Paragraph position="7"> 10 Communication can occur despite such differences because speakers with similar linguistic experiences presumably will develop similar expectations about how discourse works. Differences in expectations might very well be one thing that new acquaintances must resolve in order to avoid social conflict.
Act type      Speech act name            Linguistic intentions
informative   assert(S, H, P)            know(S, P)
              assertref(S, H, P)         knowref(S, P)
              assertif(S, H, P)          knowif(S, P)
              inform(S, H, P)            know(P); intend(S, know(H, P))
              informref(S, H, P)         knowref(S, P); intend(S, knowref(H, P))
              informif(S, H, P)          knowif(S, P); intend(S, knowif(H, P))
inquisitive   askref(S, H, P)            not knowref(S, P); intend(S, knowref(S, P)); intend(S, do(H, informref(H, S, P)))
              askif(S, H, P)             not knowif(S, P); intend(S, knowif(S, P)); intend(S, do(H, informif(H, S, P)))
requestive    request(S, H, do(H, P))    intend(S, do(H, P))
              pretell(S, H, P)           knowref(S, P); knowsBetterRef(S, H, P); intend(S, do(S, informref(S, H, P))); intend(S, knowref(H, P))
              testref(S, H, P)           knowref(S, P); intend(S, do(H, assertref(H, S, P)))
              testif(S, H, P)            knowif(S, P); intend(S, do(H, assertif(H, S, P)))
Figure 1 Linguistic intentions.</Paragraph>
      <Paragraph position="8"> The figure shows a list of attitudes that each act expresses; the lists are assumed to be exhaustive with respect to the theory (but not to the various connotations that might be associated with each act). The set of acts itself is not necessarily exhaustive, but sufficient to handle the examples that we consider. While our taxonomy might seem small, most other acts appear to be specializations of those that we selected. Similarly, the model incorporates only a small number of linguistic expectations; these are shown in Figure 2.11</Paragraph>
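      <Paragraph position="9"> To make the conventions concrete, here is a small Python sketch (ours, not part of the original model) that encodes a few rows of Figure 1 and the adjacency pairs named in the text as plain data; the string notation is an assumption made for illustration.
LINGUISTIC_INTENTIONS = {
    # discourse act -> attitudes it conventionally expresses (from Figure 1)
    "inform(S,H,P)":  ["know(P)", "intend(S, know(H,P))"],
    "askref(S,H,P)":  ["not knowref(S,P)", "intend(S, knowref(S,P))",
                       "intend(S, do(H, informref(H,S,P)))"],
    "pretell(S,H,P)": ["knowref(S,P)", "knowsBetterRef(S,H,P)",
                       "intend(S, do(S, informref(S,H,P)))",
                       "intend(S, knowref(H,P))"],
}
LINGUISTIC_EXPECTATIONS = [
    # (first pair part, condition, conventional reply), cf. Section 3.3.3
    ("askref(S,H,P)",  "knowref(H,P)",          "informref(H,S,P)"),
    ("askref(S,H,P)",  "not knowref(H,P)",      "inform(H,S, not knowref(H,P))"),
    ("pretell(S,H,P)", "knowsBetterRef(S,H,P)", "askref(H,S,P)"),
]</Paragraph>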
    </Section>
    <Section position="2" start_page="440" end_page="442" type="sub_section">
      <SectionTitle>
2.2 Characterizing interpretation, production, and repair
</SectionTitle>
      <Paragraph position="0"> Our model unifies the fundamental tasks of interpreting speech acts, producing speech acts, and repairing speech act interpretations within a nonmonotonic framework. In particular, speakers' knowledge about language is represented as a set of default rules.</Paragraph>
      <Paragraph position="1"> The rules describe conventional strategies for producing coherent utterances, thereby displaying understanding, and strategies for identifying misunderstanding. As a result, speakers' decisions about what utterances they might coherently generate next correspond to default inference over this theory, while decisions about possible in- 11 Quantitative results by Jose (1988) and Nagata and Morimoto (1993) provide evidence for these adjacency pairs. In addition, we have used pairs discovered by Conversation Analysis from real dialogues (Schegloff 1988).</Paragraph>
      <Paragraph position="2"> Figure 3 Examples of different types of coherence strategies.</Paragraph>
      <Paragraph position="3"> terpretations of utterances (including recognizing misunderstanding) correspond to abductive inference over the theory.</Paragraph>
      <Paragraph position="4"> Definition 1 Given a theory T and a goal proposition G, we say that one can abduce a set of assumptions A from T if T ∪ A ⊨ G and T ∪ A is consistent.</Paragraph>
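      <Paragraph position="5"> The following Python sketch illustrates Definition 1 in a tiny propositional setting; the theory, candidate assumptions, and constraint set are invented for illustration, and the paper's actual reasoner is the first-order Theorist system described in Section 3.1.
from itertools import combinations

RULES = {  # head: alternative antecedent sets (the theory T)
    "utter": [{"try_act"}],
    "try_act": [{"should_try"}],
    "should_try": [{"adopt_plan"}, {"acceptance"}],
}
CANDIDATES = ["adopt_plan", "acceptance"]  # assumable hypotheses
NOGOODS = [{"adopt_plan", "acceptance"}]   # jointly inconsistent sets

def derivable(goal, assumptions):
    """Backward chaining: does T together with the assumptions entail goal?"""
    if goal in assumptions:
        return True
    return any(all(derivable(g, assumptions) for g in body)
               for body in RULES.get(goal, []))

def consistent(assumptions):
    return not any(bad.issubset(assumptions) for bad in NOGOODS)

def abduce(goal):
    """Smallest consistent assumption sets A such that T plus A entails goal."""
    for k in range(len(CANDIDATES) + 1):
        found = [set(c) for c in combinations(CANDIDATES, k)
                 if consistent(set(c)) and derivable(goal, set(c))]
        if found:
            return found
    return []

print(abduce("utter"))  # -> [{'adopt_plan'}, {'acceptance'}]</Paragraph>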
      <Paragraph position="5"> Abduction has been applied to the solution of local pragmatics problems (Hobbs et al. 1988, 1993) and to story understanding (Charniak and Goldman 1988).</Paragraph>
      <Paragraph position="6"> The model incorporates five strategies, or metaplans, for generating coherent utterances: plan adoption, acceptance, challenge, repair, and closing (the model treats opening as a kind of plan adoption). Figure 3 contains a conversation that includes an example for each of the five types. In plan adoption, speakers simply choose an action that can be expected to achieve a desired illocutionary goal, given social norms and the discourse context. (The goal itself must originate within the speaker's non-linguistic planning mechanism.) The first utterance in the figure is a plan adoption. The second utterance in the figure, if it occurs immediately after an utterance such as the first one, would be an acceptance. With acceptance of an utterance, agents perform actions that have been elicited by a discourse partner. That is, the hearer displays his understanding and acceptance of the appropriateness of a speaker's utterance (independent of whether he actually agrees with it). Challenges display understanding of an utterance, while denying its appropriateness. For example, an agent might challenge the presuppositions of a previous action. The third utterance, if it occurs immediately after an utterance such as the first one, would be a challenge. Repairs display non-acceptance of a previously displayed interpretation (see Section 1.2). The fourth utterance, occurring after an exchange such as (1, 3), would be a third-turn repair by A; the fifth utterance, occurring after (1, 3, 4), would be a fourth-turn repair by B. 12 Closings signal that the participants are ready to terminate the conversation (and that they accept the conversation as a whole). The last utterance in the figure is a closing.</Paragraph>
      <Paragraph position="7"> Misunderstandings are classified according to which participant recognizes that the misunderstanding has occurred and who she thinks has misunderstood. Self-misunderstandings are those in which a hearer finds that a speaker's current utterance is inconsistent with something that that speaker said earlier and decides that his own interpretation of the earlier utterance must be incorrect. Conversely, other-misunderstandings are those in which the hearer attributes a misunderstanding to the speaker. Fourth-turn repairs may occur after a self-misunderstanding is recognized; third-turn repairs may occur after other-misunderstanding.</Paragraph>
      <Paragraph position="8"> The model addresses both classes of misunderstanding (see Section 3.3.3), but is limited to misunderstandings that appear as misrecognized speech acts. 13 Such misunderstandings are especially important to detect, because the discourse role attributed to an utterance creates expectations that influence the interpretation of subsequent ones. These misunderstandings are also difficult to prevent, because they can result from many common sources, including intra-sentential ambiguity and mishearing.</Paragraph>
    </Section>
    <Section position="3" start_page="442" end_page="442" type="sub_section">
      <SectionTitle>
2.3 Building a model of the interpreted discourse
</SectionTitle>
      <Paragraph position="0"> For a hearer to interpret an utterance as a particular metaplan or as a manifestation of misunderstanding, he needs a model of his understanding of the prior discourse.</Paragraph>
      <Paragraph position="1"> The typical way to model interpretations has been to represent the discourse as a partially completed plan corresponding to the actual beliefs (perhaps even mutual beliefs) of the participants (cf. Carberry 1990). This representation incorporates two assumptions that must be relaxed in any model that accounts for the negotiation of meaning: first, that hearers are always credulous about what the speaker says, and second, that neither participant makes mistakes. To relax these assumptions, the hearer's model distinguishes the beliefs that speakers claim or act as if they have during the dialogue from those that the hearer actually believes they have. 14 The model also represents the alternative interpretations that the hearer has considered as a result of repair. 15 We will now consider an axiomatization of the model.</Paragraph>
      <Paragraph position="2"> 3. The architecture of the model Our model characterizes a participant in a dialogue, alternately acting as speaker and hearer. In this section, we will give both the knowledge structures that enable the participant's behavior and the reasoning algorithms that produce it. (Section 4 and Appendix A present machine-to-machine dialogues involving two instantiations of the implemented model.)</Paragraph>
    </Section>
    <Section position="4" start_page="442" end_page="444" type="sub_section">
      <SectionTitle>
3.1 The reasoning framework: Prioritized Theorist
</SectionTitle>
      <Paragraph position="0"> The model has been formulated using the Prioritized Theorist framework (Poole, Goebel, and Aleliunas 1987; Brewka 1989; van Arragon 1990), because it supports both default and abductive reasoning. Theorist typifies what is known as a "proof- 12 Non-understanding, which entails non-acceptance (or deferred acceptance), is signaled by second-turn repair. This type of repair will not be considered here.</Paragraph>
      <Paragraph position="1"> 13 Other misunderstandings are possible; for example, there can be disagreement about what object a speaker is trying to identify with a referring expression (cf. Heeman and Hirst 1995; Hirst et al. 1994). 14 This distinction is similar to the one made by Luperfoy (1992).</Paragraph>
      <Paragraph position="2"> based approach" to abduction because it relies on a theorem prover to collect the assumptions that would be needed to prove a given set of observations and to verify their consistency. Our reasoning algorithm is based on Poole's implementation of Theorist, which we extended to incorporate preferences among defaults as suggested by van Arragon (1990). 16 A Prioritized Theorist reasoner can assume any default d that the programmer has designated as a potential hypothesis, unless it can prove ¬d from some overriding fact or hypothesis. This makes the reasoning nonmonotonic, because the addition of a new fact or overriding default may make less preferable hypotheses underivable.</Paragraph>
      <Paragraph position="3"> The syntax of Theorist is an extension of the predicate calculus. It distinguishes two types of formulae, facts and defaults. In Poole's implementation, facts are given by "FACT w.", where w is a wff. A default can be given either by "DEFAULT (p, d)." or "DEFAULT (p, d) : w.", where p is a priority value, d is an atomic formula with only free variables as arguments, and w is a wff. For example, we can express the default that birds normally fly, as:</Paragraph>
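      <Paragraph position="4"> DEFAULT (1, birdsFly(x)) : bird(x) ⊃ flies(x). (Our rendering in the syntax just given; the predicate names are illustrative.)</Paragraph>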
      <Paragraph position="5"> If F is the set of facts and Δ^p is the set of defaults with priority p, then an expression DEFAULT (p, d) : w asserts that d ∈ Δ^p and (d ⊃ w) ∈ F. The language lacks explicit quantification; as in Prolog, variable names are understood to be universally quantified.</Paragraph>
      <Paragraph position="6"> Facts are taken as true in the domain, whereas defaults correspond to the hypotheses of the domain (i.e., formulae that can be assumed true when the facts alone are insufficient to explain some observation). A priority value is an integer associated with a given default (and all ground instances of it), where a default with priority i is stronger than one with priority j, if i &lt; j. When two defaults conflict, the stronger one (i.e., the one having the lower priority value) takes precedence. For sets of defaults Δ^i and Δ^j such that i &lt; j, no d ∈ Δ^j can be used in an explanation if ¬d ∈ Δ^i and ¬d is consistent with defaults usable from any Δ^h, h &lt; i.</Paragraph>
      <Paragraph position="7"> In the Theorist framework, explanation is a process akin to scientific theory formation: if a closed formula representing an observation is a logical consequence of the facts and a consistent set of default assumptions, then it can be explained. Definition 2 An explanation from the set of facts F and the sets of prioritized defaults Δ^1, ..., Δ^n of a closed formula g is a set F ∪ D^1 ∪ ... ∪ D^n, where each D^i is a set of ground instances of elements of Δ^i, such that: 1. F ∪ D^1 ∪ ... ∪ D^n is consistent; 2. F ∪ D^1 ∪ ... ∪ D^n ⊨ g; 3. For all D^i such that 2 ≤ i ≤ n, there is no F ∪ D′^1 ∪ ... ∪ D′^(i-1) that satisfies the priority constraints and is inconsistent with D^i.</Paragraph>
      <Paragraph position="8"> 16 Poole's Theorist implements a full first-order clausal theorem prover in Prolog. Like Prolog, it applies a resolution-based procedure, reducing goals to their subgoals using rules of the form goal ← subgoal1 ∧ ... ∧ subgoaln. However, unlike Prolog, it incorporates a model-elimination strategy (Loveland 1978; Stickel 1989; Umrigar and Pitchumani 1985) to reason by cases. Priority constraints require that no ground instance of d ∈ Δ^i can be in D^i if its negation is explainable with defaults usable from any Δ^j, j &lt; i.</Paragraph>
      <Paragraph position="9"> Priorities enable one to specify that one default is stronger than another, perhaps because it represents an exception. In our model, defaults will have one of three priority values: strong, weak, or very weak. The strongest value is reserved for attitudes about the prior context, whereas assumptions about expectations are given as weak defaults and assumptions about unexpected actions or interpretations are given as very weak defaults. This allows us to specify a preference for expected analyses when there is an ambiguity.</Paragraph>
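      <Paragraph position="10"> A toy Python sketch of the priority scheme (our illustration, not the Theorist proof procedure): a default is usable unless a default it conflicts with is already established at a stronger, i.e. numerically lower, priority.
FACTS = {"active_pretell"}  # facts always hold; strongest of all
DEFAULTS = [
    # (priority, name, conflicts_with): assume name unless a conflicting
    # conclusion was already established by facts or stronger defaults.
    (1, "prior_context_attitudes", set()),
    (2, "expectedReply",           set()),
    (3, "selfMisunderstanding",    {"expectedReply"}),
]

def usable_defaults():
    established = set(FACTS)
    chosen = []
    for priority, name, conflicts_with in sorted(DEFAULTS):  # strongest first
        if not conflicts_with.intersection(established):
            chosen.append(name)
            established.add(name)
    return chosen

# The weak expectedReply default blocks the very weak misunderstanding one:
print(usable_defaults())  # -> ['prior_context_attitudes', 'expectedReply']</Paragraph>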
    </Section>
    <Section position="5" start_page="444" end_page="446" type="sub_section">
      <SectionTitle>
3.2 The language of our model
</SectionTitle>
      <Paragraph position="0"> The model is based on a sorted first-order language, where every term is either an agent, a turn, a sequence of turns, an action, a description, or a supposition. The language includes an infinite number of variables and function symbols of every sort and arity. We also define several special ones to characterize suppositions, actions, and sequences of turns.</Paragraph>
      <Paragraph position="1"> 3.2.1 Suppositions. Suppositions are terms that name propositions that agents believe or express. Suppositions can be thought of as quoted propositions, but with a limited syntax and semantics. We define the following functional expressions:
* do(s,a) expresses that agent s has performed the action a;
* mistake(s, a1, a2) expresses that agent s has mistaken an act a1 for act a2;
* and(p1, p2) expresses the conjunction of suppositions p1 and p2, where p1 must be simple (i.e., not formed from others using the function symbol and);
* not p expresses the negation of a simple supposition p. 17
We also define several suppositions for expressions of knowledge and intention. Two suppositions are equivalent if and only if they are syntactically identical. To capture the notion that speakers are normally consistent in the suppositions that they choose to express, we need to know how different suppositions relate to each other. More to the point, we need to know when the expressing of two simple suppositions is or is not consistent. A complete account must take into consideration possible entailments among expressed propositions; however, no such account yet exists. As a placeholder for such a theory, there is a compatibility relation for expressed suppositions. Our approach is to make compatibility a default and define axioms to exclude clearly incompatible cases, such as these:
17 A negated supposition would be expressed by an agent who says something negative, e.g., "I do not want to go," which might be represented as inform(s, h, not wantToGo).</Paragraph>
      <Paragraph position="2"> The supposition of the performance of some act that expresses, via a linguistic intention, any supposition that would be incompatible with (another supposition of) the agent's interpretation of the discourse.</Paragraph>
      <Paragraph position="3"> The supposition of an intention to perform some act expressing any supposition that is incompatible with the agent's interpretation of the discourse.</Paragraph>
      <Paragraph position="4"> The supposition of an intention to knowif Q if either Q or not Q is already true in the agent's interpretation of the discourse.</Paragraph>
      <Paragraph position="5"> When suppositions are not simple, we check their compatibility by verifying that each of the conjuncts of each supposition is compatible. (In the system, this is implemented as a special predicate, inconsistentLI.)</Paragraph>
      <Paragraph position="6"> There is a danger in treating compatibility as a default in that one might miss some intuitively incompatible cases and hence some misunderstandings might not be detectable. An alternative would be to base compatibility on the notion of consistency in the underlying logic, if a complete logic has been defined. 18 3.2.2 Speech acts. For simplicity, we represent utterances as surface-level speech acts in the manner first used by Perrault and Allen (1980). 19 Following Cohen and Levesque (1985), we limit the surface language to the acts surface-request, surface-inform, surface-informref, and surface-informif. Example 3 shows the representation of the literal form of Example 2, the fourth-turn repair example. (We abbreviate "m" for "Mother", "r" for "Russ", and "whoIsGoing" for "who's going".)</Paragraph>
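      <Paragraph position="7"> Example 3 surface-request(m, r, informif(r, m, knowref(r, whoIsGoing))) (This is the same term that reappears, wrapped in utter with the initial context ts(0), as the input to the interpretation process in the detailed example of Section 4.)</Paragraph>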
      <Paragraph position="8"> We assume that such forms can be identified by the parser, for example treating all declarative sentences as surface-informs. 20 18 Note that human behavior lies somewhere in between these two extremes; in particular, people do not seem to express all the entailments of what they utter (Walker 1991). 19 Other representation languages, such as one based on case semantics, would also be compatible with the approach and would permit greater flexibility. The cost of the increased flexibility would be increased difficulty in mapping surface descriptions onto speech acts; however, because less effort would be required in sentence processing, the total complexity of the problem need not increase. Using a more finely-grained representation, one could reason about sentence type, particles, and prosody explicitly, instead of requiring the sentence processor to interpret this information (cf. Hinkelman 1990; Beun 1990).</Paragraph>
      <Paragraph position="9"> 20 We also presume that a parser can recognize surface-informref and surface-informif syntactically when the input is a sentence fragment, but it would not hurt our analysis to input them all as surface-inform.</Paragraph>
      <Paragraph position="10"> The theory includes the discourse-level acts inform, informif, informref, assert, assertif, assertref, askref, askif, request, pretell, testref, and warn, which we represent using a similar notation. 21,22 3.2.3 Turn sequences. A turn sequence represents the interpretations of the discourse that a participant has considered up to a particular time. It is structured as a tree, where each level below the root corresponds to a single turn in the sequence, ordered as they occurred in time. Each path from the root to a leaf represents a single interpretation of the dialogue. Nodes that are siblings (i.e., that have the same parent) correspond to different interpretations of the same turn. Nodes at the same level, but having different parents, represent repairs. The currently active interpretation is defined by its most recent turn, which we shall call the focus of the sequence.</Paragraph>
      <Paragraph position="11"> The purpose of this tree structure is to capture the sequential structure of the dialogue and, for each state of the dialogue, what attitudes the participants are accountable for having expressed. 23 Branches in the sequential structure enable the participants to retract attitudes via repair and to reason about the alternatives that they have achieved.</Paragraph>
      <Paragraph position="12"> We will call the turn sequence whose focus is the current turn the "discourse context". In order to consider previous states of the context, such as before a possible misunderstanding occurred, we define a successor relation on turn sequences: Definition 3 A turn sequence TS2 is a successor to turn sequence TS1 if TS2 is identical to TS1 except that TS2 has an additional turn t that is not a turn of TS1 and t is the successor to the focused turn of TS1.</Paragraph>
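      <Paragraph position="13"> A minimal Python sketch of turn sequences and Definition 3 (the class and function names are ours, not the paper's implementation): siblings are alternative interpretations of the same turn, and the path from the root to the focus is the active interpretation.
class Turn:
    def __init__(self, label, parent=None):
        self.label, self.parent, self.children = label, parent, []
        if parent is not None:
            parent.children.append(self)

def successor(focus, label):
    """Definition 3: extend the focused turn with exactly one new turn."""
    return Turn(label, parent=focus)

def interpretation(focus):
    """The active interpretation: the path from the root to the focus."""
    path = []
    while focus is not None:
        path.append(focus.label)
        focus = focus.parent
    return list(reversed(path))

root = Turn("root")
t1 = successor(root, "pretell(m, r, whoIsGoing)")   # initial reading of T1
t2 = successor(t1, "askref(r, m, whoIsGoing)")      # T2 under that reading
t1r = successor(root, "askref(m, r, whoIsGoing)")   # repair: sibling reading of T1
print(interpretation(t2))   # interpretation before the repair
print(interpretation(t1r))  # interpretation after the fourth-turn repair</Paragraph>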
    </Section>
    <Section position="6" start_page="446" end_page="453" type="sub_section">
      <SectionTitle>
3.3 The characterization of a discourse participant
</SectionTitle>
      <Paragraph position="0"> We will now consider the knowledge structures that enable a participant's behavior and the reasoning algorithms that produce it. We divide our specification of a participant into three subtheories: A set B of prior assumptions about the beliefs and goals expressed by the speakers (including assumptions about misunderstanding).</Paragraph>
      <Paragraph position="1"> A set M of potential assumptions about misunderstandings and metaplanning decisions.</Paragraph>
      <Paragraph position="2"> A theory T describing his or her linguistic knowledge, including principles of interaction and facts relating linguistic acts.</Paragraph>
      <Paragraph position="3"> Given these three subtheories, an interpretation of an utterance is a set of ground instances of assumptions that explain the utterance. An utterance would be a coherent  21 In the utterance language, a yes-no question is taken to be a surface-request to informif and a wh-question is taken to be a surface-request to informref. We then translate these request forms into the discourse-level actions askif and askref. An alternative would be to identify them as surface-askif or surface-askref during sentence processing, as Hinkelman (1990) does.</Paragraph>
      <Paragraph position="4"> 22 Speech act names that end with the suffix -ref take a description as an argument; speech act names that end with -if take a supposition. The act inform(s,p) asserts that the proposition is true. The act informif(s, p) asserts the truth value of the proposition named by p (i.e., informif is equivalent to "inform ∨ inform-not"). 23 Tree structures are often used to represent discourse, but usually the hierarchical structure of the discourse, rather than its temporal structure (see Lambert and Carberry 1991, 1992).</Paragraph>
      <Paragraph position="5"> reply to an immediately preceding utterance if it would logically follow, given the selection of some metaplan:</Paragraph>
      <Paragraph position="6"> Definition 4 An interpretation of an utterance u to hearer h by speaker s in discourse context ts is a set M of instances of elements of the set of potential assumptions, such that: 1. T ∪ B ∪ M is consistent; 2. T ∪ B ∪ M ⊨ utter(s, h, u, ts); and 3. the assumptions in M do not conflict with any stronger defaults that might apply.</Paragraph>
      <Paragraph position="7"> Definition 5 It would be coherent for s to utter u in discourse context ts if the utterance can be derived from an agent's linguistic knowledge, assuming some set M^meta of metaplanning decisions, such that: 1. T ∪ B ∪ M^meta is consistent; 2. T ∪ B ∪ M^meta ⊨ utter(s, h, u, ts); and 3. T ∪ B ∪ M^meta satisfies the priority constraints. That is, u is a solution to the following default reasoning problem: T ∪ B ∪ M^meta ⊢ (∃u) utter(s, h, u, ts).</Paragraph>
      <Paragraph position="8"> In the language of the model, the predicate shouldTry is used for discourse actions that are coherent (M^meta) and the predicate try is for actions that are explainable (M). If shouldTry(S1,S2,A,TS) is true, it means that, given discourse context TS (which corresponds to a particular agent's perspective), it would be appropriate for speaker S1 to address speaker S2 with discourse-level speech act A (i.e., according to social conventions, here represented by the linguistic expectations and the metaplans, S1 should do A next).</Paragraph>
      <Paragraph position="12"> By contrast, try(S1,S2,A,TS) would mean that, given a discourse context TS, S1 has performed the discourse-level act A. Discourse-level acts are related to surface-level acts by the following default: 24 DEFAULT (3, pickForm(s1, s2, asurfaceForm, a, ts)) : decomp(asurfaceForm, a) ∧ try(s1, s2, a, ts) ⊃ utter(s1, s2, asurfaceForm, ts).</Paragraph>
      <Paragraph position="13"> This says that the fact that the surface form asurfaceForm can be used to perform discourse act a in some context and the apparent occurrence of a would be a reason for agent s1 to utter asurfaceForm. 24 The model does not discriminate between equally acceptable alternatives. The default pickForm allows us to account for the fact that the same surface form can perform several discourse acts and the same discourse act might be accomplished by one of several different surface forms. In our system, this default is also used as an oracle, allowing us to see how different interpretations affect the participants' understanding of subsequent turns. Because the default has a very weak priority, it can be overridden by user input, without influencing other defaults.</Paragraph>
      <Paragraph position="14"> Figure 4 The relationship between try and shouldTry and their possible explanations. The predicates shouldTry and try are related because the appropriateness of a potential interpretation is taken as (default) evidence that it is, in fact, the correct interpretation: DEFAULT (1, intentionalAct(s1, s2, a, ts)) : shouldTry(s1, s2, a, ts) ⊃ try(s1, s2, a, ts).</Paragraph>
      <Paragraph position="15"> The key difference is that try allows that the best interpretation might be contextually inappropriate (see Figure 4).</Paragraph>
      <Paragraph position="16"> Interpretation corresponds to the following problem in Theorist: EXPLAIN utter(s1, s2, u, ts).</Paragraph>
      <Paragraph position="17"> Generation corresponds to the following problem in Theorist: EXPLAIN shouldTry(s1, s2, aa, ts) ∧ decomp(as, aa).</Paragraph>
      <Paragraph position="18"> In addition, acts of interpretation and generation update the set of beliefs and goals assumed to be expressed during the discourse. 25 3.3.1 The discourse context. The first component of the model, B, represents the beliefs and goals that the participants have expressed during their conversation. We assume that an agent will maintain a record of these expressed attitudes, represented as a turn sequence. To keep track of the current interpretation of the dialogue, we introduce the notion of activation of a supposition with respect to a turn sequence. If during a turn T, a supposition is expressed by an agent through the utterance of some speech act or the display of misunderstanding, then we say it becomes active in the turn sequence that has T as its focus (see Section 3.2.3). Moreover, once active, a supposition will remain active in all succeeding turn sequences, unless it is explicitly refuted. Individual turns are represented by a set of facts of the form expressed(P,T) and expressedNot(P,T), where P is an unnegated supposition that has not been formed from any simpler suppositions using the function and. 26 25 A related concern is how an agent's beliefs might change after an utterance has been understood as an act of a particular type. Although we have nothing new to add here, Perrault (1990) shows how default logic might be used to address this problem.</Paragraph>
      <Paragraph position="19"> 3.3.2 Metaplanning assumptions. The second component of the model, M, represents the potential assumptions about misunderstandings and metaplanning decisions. This is given by the following set of Theorist defaults: 27 intentionalAct, expectedReply, acceptance, adoptPlan, challenge, makeFourthTurnRepair, makeThirdTurnRepair, reconstruction, otherMisunderstanding, selfMisunderstanding, and done. The theorem prover may assume ground instances of any of these predicates if they are consistent with all facts and with any defaults having higher priority. As mentioned in Section 3.1, each of these defaults will have one of three priority values: strong, weak, or very weak. The strongest level is reserved for attitudes about beliefs and suppositions. Assumptions about expectations (i.e., expectedReply, acceptance, makeThirdTurnRepair, and makeFourthTurnRepair) are given as weak defaults. Assumptions about unexpected actions or interpretations (i.e., adoptPlan, challenge, done, selfMisunderstanding, and otherMisunderstanding) are given as very weak defaults, so that axioms can be written to express a preference for expected analyses when there is an ambiguity. We will consider each of these predicates in greater detail in the next section, when we discuss the third component of the model.</Paragraph>
      <Paragraph position="20"> 3.3.3 A speaker's theory of language. The third component of the model is T, a speaker's theory of communicative interaction. This theory includes strategies for expressing beliefs and intentions, for displaying understanding, and for identifying when understanding has broken down. The strategies for displaying understanding suggest performing speech acts that have an identifiable, but defeasible, relationship to other speech acts in the discourse (or to the situation). Misunderstandings are recognized when an utterance is inconsistent or incoherent; strategies for repair suggest reanalyzing previous utterances or making the problem itself public.</Paragraph>
      <Paragraph position="21"> Relations on linguistic knowledge. There are three important linguistic knowledge relations: decomp, lintention, and lexpectation. They are shown as circles in Figure 5; the boxes in the figure are the objects that they relate.</Paragraph>
      <Paragraph position="22"> 26 The intended meaning of expressedNot(P, T) is that during turn T speakers have acted as if the supposition P were false. Although expressed(not(P), T) and expressedNot(P, T) represent the same state of affairs, the latter expression avoids infinite recursion by Theorist.</Paragraph>
      <Paragraph position="23"> 27 The theory also contains defaults to capture the persistence of activation (persists), and the willingness of participants to assume that others have a particular belief or goal (credulousB and credulousI, respectively).</Paragraph>
      <Paragraph position="24"> The decomp relation links surface-level forms to the discourse-level forms that they might accomplish in different contexts. It corresponds to the body relation in STRIPS-based approaches. 28 Two speech acts are ambiguous whenever they can be performed with the same surface-level form. Lintentions relate discourse acts to the linguistic intentions that they conventionally express (see Section 2.1). The lexpectation relation captures the notion of linguistic expectation discussed in Section 2.1, relating each act to the acts that might be expected to follow. Where there is more than one expected act, a condition is used to distinguish them. For example, the axioms representing the linguistic expectations of askref are shown below. 29 FACT lexpectation(do(s1, askref(s1, s2, d)), knowref(s2, d), do(s2, informref(s2, s1, d))).</Paragraph>
      <Paragraph position="25"> "A speaker s1 can expect that making an askref of d to s2 will result in s2 telling s1 the referent of d, if s2 knows it." FACT lexpectation(do(s1, askref(s1, s2, d)), not knowref(s2, d), do(s2, inform(s2, s1, not knowref(s2, d)))).</Paragraph>
      <Paragraph position="26"> "A speaker s1 can expect that making an askref of d to s2 will result in s2 telling s1 that s2 does not know the referent of d, if s2 does not know it." Beliefs and goals. In the model, participants' actual beliefs and goals are distinguished from those that they express through their utterances. For the examples considered here, any model of belief would suffice; for simplicity we chose to include beliefs and goals explicitly in the initial background theory and allow agents to make assumptions about each other's beliefs and goals by default. 30 Expectation. In addition to the notion of linguistic expectations, which exist in any situation, the model incorporates a cognitive, "belief-about-the-future" notion of expectation. These expectations depend on a speaker's knowledge of social norms, her understanding of the discourse so far, and her beliefs about the world at a particular time. They are captured by the following Theorist rules: DEFAULT (2, expectedReply(pdo, pcondition, do(s1, areply), ts)) : active(pdo, ts) ∧ lexpectation(pdo, pcondition, do(s1, areply)) ∧ believe(s1, pcondition) ⊃ expected(s1, areply, ts).</Paragraph>
      <Paragraph position="27"> FACT ¬lintentionsOk(a, ts) ⊃ ¬expectedReply(pdo, pcondition, do(s, a), ts).</Paragraph>
      <Paragraph position="28"> 28 Pollack (1986a) calls this the "is-a-way-to" relation.</Paragraph>
      <Paragraph position="29"> 29 It is actually controversial whether an askref followed by an inform-not-knowref is a valid adjacency pair. If such questions are taken to presuppose that the hearer knows the answer, a response to the contrary could also be considered a challenge of this presupposition (Tsui 1991). 30 It would have been possible to characterize actual belief using an appropriate set of axioms, such as those defining a weak S4 modal logic. However, current formalizations do not seem to account for the context-sensitivity of speakers' beliefs. See McRoy (1993b) for a discussion. The second rule says that one would not expect the action areply if the linguistic intentions associated with it are incompatible with the context ts. 31 Normally, as the discourse progresses, expectations for action that held in previous states of the context eventually cease to hold in the current context, because after the action occurs, it would be incompatible for an agent to say that he intends to achieve something that is already true. The compatibility between each of the linguistic intentions of a proposed action and each of the active suppositions in a context is captured by the predicate lintentionsOk, which is true if and only if none of the incompatibilities described in Section 3.2.1 hold.</Paragraph>
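      <Paragraph position="30"> A small Python sketch of the expectedReply default (our encoding; lintentions_ok is stubbed where the paper checks the compatibilities of Section 3.2.1):
LEXPECTATION = [
    ("pretell(m, r, whoIsGoing)", "knowsBetterRef(m, r, whoIsGoing)",
     "askref(r, m, whoIsGoing)"),
]

def lintentions_ok(act, active_suppositions):
    # Stub: should test each linguistic intention of `act` against every
    # active supposition for the incompatibilities of Section 3.2.1.
    return True

def expected(replier_beliefs, active_acts, active_suppositions):
    """Replies expected next: an active first pair part, a condition
    believed by the replier, and intentions compatible with the context."""
    return [reply for first, cond, reply in LEXPECTATION
            if first in active_acts and cond in replier_beliefs
            and lintentions_ok(reply, active_suppositions)]

print(expected({"knowsBetterRef(m, r, whoIsGoing)"},
               {"pretell(m, r, whoIsGoing)"}, set()))
# -> ['askref(r, m, whoIsGoing)']</Paragraph>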
      <Paragraph position="30"> For convenience, we also define a subjunctive form of expectation to reason about expectations that would arise as a result of future actions (e.g., plan adoption) or that must be considered when evaluating a potential repair. This type of expectation differs from the type defined above in that it depends on the real beliefs of the agent performing the first (rather than the second) part of an adjacency pair and it does not depend on the activity of any suppositions or actions.</Paragraph>
      <Paragraph position="31"> FACT lexpectation(do(s1, a1), p, do(s2, a2)) ∧ believe(s1, p) ≡ wouldExpect(s1, a1, a2).</Paragraph>
      <Paragraph position="32"> Metaplans and misunderstandings. Metaplans encode strategies for selecting an appropriate act. The antecedents of these axioms refer to expectations. In addition, in order to preserve discourse coherence, they require either that the linguistic intentions of suggested actions be compatible with the context or that there be some overt acknowledgement of the discrepancy. (The theory presented here addresses only the former case; the latter one might be handled by adding an extra default with a stronger priority level.) Tables 1-6 give each of these axioms in detail.</Paragraph>
      <Paragraph position="33"> Along with these metaplans, a speaker's linguistic theory includes two diagnostic axioms that characterize speech act misunderstandings: self-misunderstanding and other-misunderstanding. The antecedents of these axioms refer to ambiguities and inconsistencies with expressed linguistic intentions, as well as expectations. For example, Table 5 describes how an observed inconsistency of s1 performing anew might be a symptom of s2's misinterpretation of an earlier act by s1. Such mistakes are possible when the surface form of the earlier act might be used to accomplish either aobserved or aintended. 32</Paragraph>
      <Paragraph position="34"> 31 Although, like expectedReply, active is a default, active will take precedence over expectedReply, because it has been given a higher priority on the assumption that memory for suppositions is stronger than expectation.</Paragraph>
      <Paragraph position="35"> 32 It is possible that the same surface form might accomplish several different discourse acts, in which case it might be desirable to evaluate the likelihood of alternative choices. The work discussed by Reithinger and Maier (1995), for example, found statistical regularities in the misinterpretations that occurred in their corpus of appointment-scheduling dialogues.</Paragraph>
      <Paragraph position="36"> Summary (plan adoption): 1. s1 wants speaker s2 to do action a2; 2. s1 would expect a2 to follow an action a1; and 3. s1 may adopt the plan of performing a1 to trigger a2 (i.e., the linguistic intentions of a1 are compatible with ts).</Paragraph>
      <Paragraph position="37"> DEFAULT (2, acceptance(s1, areply, ts)) : expected(s1, areply, ts) ⊃ shouldTry(s1, s2, areply, ts).</Paragraph>
      <Paragraph position="38"> FACT active(do(s1, a), ts) ⊃ ¬acceptance(s1, a, ts).</Paragraph>
      <Paragraph position="39"> Summary Speaker s1 should do action areply in discourse ts when: 1. s1 expects areply to occur next; and 2. s1 may accept the interpretation corresponding to ts.</Paragraph>
      <Paragraph position="40"> 4. A detailed example To show how our abductive account of repair works, we offer two examples that show repair of self-misunderstanding and other-misunderstanding, respectively. Here we will discuss Example 2 from Russ's perspective, considering in detail Russ's reasoning about each turn and showing an output trace from our implemented system. From Russ's perspective, this example demonstrates the detection of a self-misunderstanding and the production of a fourth-turn repair. In Appendix A we show the system's output for a third-turn repair, interleaving the perspectives of its two participants.</Paragraph>
      <Paragraph position="41"> Summary (fourth-turn repair): 1. s1 has mistaken an instance of act aintended as an instance of act aobserved; 2. A reconstruction of the discourse is possible; 3. s1 would expect to do areply in this reconstruction; and 4. s1 may perform a fourth-turn repair.</Paragraph>
      <Paragraph position="42"> Summary (third-turn repair): s1 should do areply if: 1. s2 has apparently mistaken an instance of act aintended for act aobserved; 2. s1 would expect areply to follow aintended; and 3. s1 may perform a third-turn repair (i.e., it would be reasonable and compatible for s2 to perform areply).</Paragraph>
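      <Paragraph position="43"> A Python sketch of the diagnostic pattern behind these axioms (our encoding, with decomp and the incompatibility test reduced to lookup tables): an inconsistent new act plus an ambiguous earlier surface form yields candidate misheard acts.
DECOMP = {  # surface form -> discourse acts it can perform
    "surface-request(informif(knowref(whoIsGoing)))":
        {"pretell", "askref", "askif"},
}
INCOMPATIBLE = {  # (new act, earlier reading) pairs that clash
    ("inform(not knowref(whoIsGoing))", "pretell"),
}

def misread_candidates(a_new, a_observed, earlier_surface_form):
    """If a_new clashes with a_observed, return the other acts that the
    earlier surface form could have performed (possible a_intended)."""
    if (a_new, a_observed) not in INCOMPATIBLE:
        return set()
    return DECOMP.get(earlier_surface_form, set()) - {a_observed}

# Mother's "I don't know" clashes with the pretelling reading of T1:
print(misread_candidates("inform(not knowref(whoIsGoing))", "pretell",
                         "surface-request(informif(knowref(whoIsGoing)))"))
# -> {'askref', 'askif'}</Paragraph>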
    </Section>
    <Section position="7" start_page="453" end_page="455" type="sub_section">
      <SectionTitle>
4.1 Overview
</SectionTitle>
      <Paragraph position="0"> 1. s1 has performed action aobserved;
2. But, the linguistic intentions of anew are inconsistent with the linguistic intentions of aobserved;
3. aobserved and action aintended can be performed using a similar surface-level speech act; and
4. s2 may have mistaken aintended for aobserved.</Paragraph>
      <Paragraph position="1"> T3 Mother: I don't know.</Paragraph>
      <Paragraph position="2"> T4 Russ: Oh. Probably Mrs. McOwen and probably Mrs. Cadry and some of the teachers.</Paragraph>
      <Paragraph position="3">  In the input we represent this dialogue as the following sequence:</Paragraph>
      <Paragraph position="5"> From Russ's perspective, these utterances had the following discourse-level interpretations at the time each was produced:</Paragraph>
      <Paragraph position="7"> 1. Earlier, speaker s2 performed act aintended;
2. Actions aintended and asimilar can be performed using a similar surface form;
3. If s2 had performed asimilar, then anew would be expected;
4. s1 may have mistaken aintended for asimilar.</Paragraph>
      <Paragraph position="8">  After Russ hears T3, he decides that his interpretation of Mother's first turn as a pretelling is incorrect. This revision then leads him to reinterpret it as an askref and to provide a new response.</Paragraph>
      <Paragraph position="9"> We will now show how Russ's beliefs might progress this way. In particular, we shall address the following questions: * How Russ decides, after first concluding that T1 was a pretelling, that he will respond with an askref.</Paragraph>
      <Paragraph position="10"> * How Russ decides, after hearing Mother's response T3, that his earlier decision was incorrect.</Paragraph>
      <Paragraph position="11"> * How Russ decides to produce an informref in T4.</Paragraph>
      <Paragraph position="12"> Figures 6, 7, 9, and 10 will show the output of the system for each of the four turns of this dialogue, from Russ's perspective.</Paragraph>
    </Section>
    <Section position="8" start_page="455" end_page="458" type="sub_section">
      <SectionTitle>
4.2 Initial assumptions
</SectionTitle>
      <Paragraph position="0"> For this example, we shall assume that Russ believes that he knows who is going to the meeting (but also allows that Mother's knowledge about the meeting would be more accurate than his own). For simplicity, we represent these beliefs as facts. 33  FACT believe(r, knowref(r, whoIsGoing)).</Paragraph>
      <Paragraph position="1"> FACT believe(r, knowsBetterRef(m,r, whoIsGoing)).</Paragraph>
      <Paragraph position="2"> 33 We might have used priorities to express different degrees of belief.
expressed(knowref(m, whoIsGoing), 1)
expressed(knowsBetterRef(m, r, whoIsGoing), 1)
expressed(intend(m, do(m, informref(m, r, whoIsGoing))), 1)
expressed(intend(m, knowref(r, whoIsGoing)), 1)
Agent m adopted plan to achieve: askref(r,m,whoIsGoing)
Figure 6 The output for turn 1 from Russ's perspective.</Paragraph>
      <Paragraph position="3"> We also assume that Russ believes that he knows whether (or not) he knows. FACT believe(r, knowif(r, knowref(r, whoIsGoing))).</Paragraph>
      <Paragraph position="4"> Lastly, we assume that he has linguistic expectations regarding pretell, askref, and askif as in Section 2.1. 34 34 To keep this example of manageable size, we will not assume that he has any expectations regarding testif or testref, although in life he would.</Paragraph>
      <Paragraph position="5"> According to the model, after Russ hears Mother's surface-request, "Do you know who is going to that meeting?", he interprets it by attempting to construct a plausible explanation of it. This requires tentatively choosing a discourse-level act on the basis of the decomposition relation and then attempting to abduce either that it is an intentional display of understanding or that it is a symptom of misunderstanding. Theorist is called to explain the utterance and returns with a list of assumptions that were made to complete the explanation. (The portion of the output from the update describes Russ's interpretation of this explanation; see Figure 6.) In this simulation, T1 was explained as an intentional pretelling. The explanation contains the metaplanning assumption that Mother was pretelling as part of a plan to get Russ to ask a question. The reasoner also attributed to her the linguistic intentions of pretelling. We will now consider the complete explanation in detail. Inference begins with a call to Theorist to explain the input: utter(m, r, surface-request(m, r, informif(r, m, knowref(r, whoIsGoing))), ts(0)) This utterance must be explained by finding a discourse-level speech act that it might accomplish and a metaplan or misunderstanding that would explain this act. This makes use of the following default: DEFAULT (3, pickForm(s1, s2, asurfaceForm, a, ts)) : decomp(asurfaceForm, a) ∧ try(s1, s2, a, ts) ⊃ utter(s1, s2, asurfaceForm, ts).</Paragraph>
      <Paragraph position="6"> To satisfy the first premise, the reasoner would need to find a speech act that is related to the surface form by the decomp relation, for example, either an askif, an askref, or a pretelling:
decomp(surface-request(m, r, informif(r, m, knowref(r, whoIsGoing))), pretell(m, r, whoIsGoing))
decomp(surface-request(m, r, informif(r, m, knowref(r, whoIsGoing))), askref(m, r, whoIsGoing))
decomp(surface-request(m, r, informif(r, m, knowref(r, whoIsGoing))), askif(m, r, knowref(r, whoIsGoing)))
In this case, the possibility that Mother is attempting a pretelling was considered. (The system uses an oracle, represented by the default pickForm, to simulate this choice. 35) It is important to note that this is just one of the possible explanations available to Russ. Nothing in his beliefs rules out abducing explanations from either the askif or the askref interpretation.</Paragraph>
      <Paragraph position="7"> To satisfy the second premise of the rule, the reasoner must explain: try(m, r, pretell(m, r, whoIsGoing), ts(0)) Two kinds of explanation are possible: a hearer might assume that the act fulfills the speaker's intention to coherently extend the discourse as he has understood it or 35 This oracle thus allows the analyst to test different interpretations. he might assume that one of the two types of misunderstanding has occurred. 36 If a discourse has just begun, then any utterance that starts an adjacency pair will be coherent. In this case, Russ finds that the former type of explanation is possible using the metaplan (for plan adoption) to explain shouldTry(m, r, pretell(m, r, whoIsGoing), ts(0)). The relevant defaults are repeated here: DEFAULT (1, intentionalAct(s1, s2, a, ts)) : shouldTry(s1, s2, a, ts) ⊃ try(s1, s2, a, ts).</Paragraph>
      <Paragraph position="8"> DEFAULT (3, adoptPlan(s1, s2, a1, a2, ts)) : hasGoal(s1, do(s2, a2), ts) ∧ wouldExpect(s1, do(s1, a1), do(s2, a2)) ⊃ shouldTry(s1, s2, a1, ts).</Paragraph>
      <Paragraph position="9">  The conditions of the metaplan are satisfiable because there is a plausible goal act that a pretelling would help Mother to achieve and it is consistent for Russ to assume that achieving this act was, in fact, her goal. 37 Also, when we consider possible evidence against Mother adopting this plan, namely whether the linguistic intentions of pretelling were incompatible with those that have been expressed, it would be consistent to assume that Mother is intending this plan.</Paragraph>
    </Section>
    <Section position="9" start_page="458" end_page="458" type="sub_section">
      <SectionTitle>
Russ infers
</SectionTitle>
      <Paragraph position="0"> wouldExpect(r, pretell(m, r, whoIsGoing), askref(r, m, whoIsGoing)) because he has a linguistic expectation to that effect: FACT lexpectation(do(m, pretell(m, r, whoIsGoing)), knowsBetterRef(m, r, whoIsGoing), do(r, askref(r, m, whoIsGoing))).</Paragraph>
    </Section>
    <Section position="10" start_page="458" end_page="460" type="sub_section">
      <SectionTitle>
4.4 Turn 2: Russ decides to respond with an askref
</SectionTitle>
      <Paragraph position="0"> In turn 2, Russ produces a surface-request. This utterance is appropriate, independent of whether or not Russ actually wants to know who is going to the meeting, because it displays acceptance of Mother's pretelling. From Russ's perspective it displays acceptance, because a surface-request is one way to perform an askref, an act that is expected according to Russ's model of the discourse after the first turn. 38 As shown in Figure 7, Theorist finds that if Russ accepts Mother's pretelling, he should perform an askref. An askref would demonstrate acceptance because it is the expected next act. The derivation of this act relies on the rule for intentional action shown earlier in Section 4.3, along with the metaplan for acceptance repeated here:  36 The former possibility admits that an utterance that displays a misconception, such as a mistaken belief about initial knowledge, might still be coherent, unless such knowledge has been introduced into the discourse explicitly. Misconceptions are addressed by second-turn repairs, which are not considered here.</Paragraph>
      <Paragraph position="1"> 37 Because Russ's previous utterance had not been the first part of an adjacency pair, he cannot explain her utterance as acceptance or challenge.</Paragraph>
      <Paragraph position="2"> 38 If, for some reason, Russ did not want to know the information, he might decide not to produce an askref. However, he would then be accountable for justifying his action as well as for displaying his acceptance of Mother's displayed understanding (e.g., by including an explicit rejection of her offer); otherwise she might think that one of them has misunderstood.</Paragraph>
      <Paragraph position="3">  The output for turn 2 from Russ's perspective.</Paragraph>
      <Paragraph position="4"> DEFAULT (2, acceptance(s1, areply, ts) ) : expected(s1, areply, ts ) D shouldTry(sl, $2, areply, ts).</Paragraph>
      <Paragraph position="5"> The askref would be expected (see Section 3.3.3) because: 39 * According to the discourse model, it is true that active(do(m pretell(m, r, whoIsGoing)), ts(1)).</Paragraph>
      <Paragraph position="6">  McRoy and Hirst The Repair of Speech Act Misunderstandings If we assume that Mother produced the first turn as an askif, she might also hear T2 as an intentional askref, but for a reason different than Russ would. Her explanation would include the metaplanning assumption that he was doing so as part of an adopted plan to get her to produce an informref. Although T2 might also be explained by abducing that Russ misunderstood T1 as an attempted pretelling, we see that she considers this explanation to be less likely because otherwise she would have been more inclined to make T3 a third-turn repair (&amp;quot;No, I'm asking you&amp;quot;). 40 Plan adoption (see Table 1) provides Mother a plausible explanation for T2 because:  1. wouldExpect(r, askref(r, m, whoIsGoing), informref(m, r, whoIsGoing)) is explained because Mother has a linguistic expectation that says that an askref normally creates an expectation for the listener to tell the speaker the answer: ~ACT lexpectation(do(r, askref(r, m, whoIsGoing)), knowref(m, whoIsGoing), do(m, informref(m, r, whoIsGoing))).</Paragraph>
      <Paragraph position="7"> 2. Mother's credulousness about Russ's goals explains her belief that he wants her to perform the expected informref.</Paragraph>
      <Paragraph position="8"> 3. The linguistic intentions of askref are compatible with those that have been expressed, so it is consistent to assume that Russ is intending to use it as part of a plan. (They are consistent with the context because T1 expresses only that Mother does not know whether Russ knows and not that she does not herself know.) 4. Thus, by 1-3 and the metaplan for plan adoption, shouldTry(r, m, askref(r, m,  whoIsGoing), ts(0)) is explainable.</Paragraph>
      <Paragraph position="9"> Assuming this interpretation, Mother can then demonstrate acceptance using an inform-notknowref. null Figure 8 How Mother interprets T2.</Paragraph>
    </Section>
    <Section position="11" start_page="460" end_page="462" type="sub_section">
      <SectionTitle>
4.5 Turn 3: Russ decides that his interpretation of Turn 1 was wrong
</SectionTitle>
      <Paragraph position="0"> Mother replies with a surface-inform. This is interpreted as a discourse-level informnot-knowref. This act signals a misunderstanding, because the linguistic intentions associated with it are incompatible with those previously assumed, ruling out an explanation that uses the default for intentional acts. 41 Figure 9 shows that Theorist abduces that T3 is attributable to a misunderstanding on Russ's part, in particular, to his having incorrectly interpreted one of Mother's utterances as a pretelling, rather than as an askref. This explanation succeeded because each of the conditions of the default for self-misunderstanding were explainable. Below we will repeat this rule and then sketch the proof, considering each of the premises in the default.</Paragraph>
      <Paragraph position="1"> 40 In the model, it is always possible to begin an embedded sequence without addressing the question on the floor; however, when the embedded sequence is complete, the top-level one is resumed. It is a limitation of the model that we do not distinguish interruptions from clarifications.</Paragraph>
      <Paragraph position="2"> 41 For Russ to have heard T3 as demonstrating Mother's acceptance of his T2 (i.e., as a display of understanding), the linguistic intentions of inform(m, r, not knowref(m, whoIsGoing)) would need to have been compatible with this interpretation of the discourse. However, not knowref(m, whoIsGoing) is among these intentions, while active(knowref(m, whoIsGoing),ts(2)). As a result, T3 cannot be attributed to any expected act, and must be attributed to a misunderstanding either by Russ or by Mother.</Paragraph>
      <Paragraph position="3">  intend(m, knowref(r, wholsGoing))))) The linguistic intentions of inform-not-knowref are: and(not knowref(m, whoIsGoing), intend(m, knowif(r, not knowref(m, whoIsGoing)))).</Paragraph>
      <Paragraph position="4"> But these intentions are inconsistent, because knowref(m, whoIsGoing) and not knowref(m, whoIsGoing) are incompatible. As a result, inconsistentLI holds for these linguistic intentions.</Paragraph>
      <Paragraph position="5"> Premise 5: This is a plausible mistake because the acts pretell and askref both have the same surface form: surface-request(m, r, informif(r, m, knowref(r, whoIsGoing))) So, ambiguous(pretell(m, r, whoIsGoing), askref(m, r, whoIsGoing)).</Paragraph>
      <Paragraph position="6"> The constraints: There is no other coherent interpretation, so it is consistent to assume that a misunderstanding occurred: selfMisunderstanding(m,r, mistake(r, askref(m, r, whoIsGoing), pretell(m, r, whoIsGoing)), inform(m, r, not knowref(m, whoIsGoing)), ts(2)).</Paragraph>
      <Paragraph position="7"> Thus, try(m, r, inform(m, r, not knowref(m, whoIsGoing)), ts(2)) is explained. As a result of this interpretation, not knowref(m, whoIsGoing) is added to the discourse model as the fact expressedNot(knowref(m, whoIsGoing)). This addition terminates the activation of knowref(m, whoIsGoing) from the first turn. (At the same time, if Russ had revised any of his real beliefs on the basis of the first turn, he might now reconsider those revisions; however, our theory does not account for this.)</Paragraph>
    </Section>
    <Section position="12" start_page="462" end_page="463" type="sub_section">
      <SectionTitle>
4.6 Turn 4: Russ performs a repair
</SectionTitle>
      <Paragraph position="0"> After revising his understanding of Turn 1, Russ performs a surface-informref that displays his acceptance of the revised interpretation. When Theorist is called to find a coherent discourse-level act (i.e., by using the default for intentional acts) it finds that Russ can perform a fourth-turn repair. The metaplan for this repair, repeated below, is similar to that for acceptance, but involves the reconstruction of the discourse model.</Paragraph>
      <Paragraph position="1"> 42 In the discourse model, this was expressed as expressed(do(m, preteU(m, r, whoIsGoing)), 0), from which one can assume persists(do(m, pretell(m, r, wholsGoing)), 2) by default.  Computational Linguistics Volume 21, Number 4 DEFAULT (2, makeFourth TurnRepair( sl, $2, areply, ts ) ) : active(mistake ( s l, a intended, aobserved ) , ts ) A reconstruction (ts, tSreconstructed) A expected(s1, areply, tSreconstructed) 3 shouldTry(sl, 82, areply, ts).</Paragraph>
      <Paragraph position="2"> This metaplan applies because Russ had misunderstood a prior utterance by Mother, a reconstruction of the discourse is possible, and, within the reconstructed discourse, an informref is expected (as a reply to the misunderstood askref). 43 An informref by Russ is expected (see Section 3.3.3) in the reconstructed dialogue</Paragraph>
    </Section>
    <Section position="13" start_page="463" end_page="464" type="sub_section">
      <SectionTitle>
5.1 Accounts based on plan recognition
</SectionTitle>
      <Paragraph position="0"> Plan-based accounts interpret speech acts by chaining from subaction to action, from actions to effects of other actions, and from preconditions to actions to identify a plan (i.e., a set of actions) that includes the observed act. Heuristics are applied to discriminate among alternatives.</Paragraph>
      <Paragraph position="1"> 5.1.1 Allen and Perrault. Allen and Perrault (1979), Perrault and Allen (1980) show how plan recognition can be used to understand indirect speech acts (such as the use of &amp;quot;Can you pass the salt?&amp;quot; as a polite request to pass the salt). To interpret an utterance, the approach applies a set of context-independent inference rules to identify all plausible plans. For example, one rule says that if a speaker wants to know the truth value of some proposition, then she might want the proposition to be made true. The final interpretation is then determined by a set of rating heuristics, such as &amp;quot;Decrease the rating of a path if it contains an action whose effects are already true at the time the action starts.&amp;quot; These rating heuristics are problematic because they conflate linguistic and pragmatic knowledge with knowledge about the search mechanism itself. This approach cannot handle more than a few relationships between utterances and plans and cannot handle any utterances that do not relate to the domain plan in a direct manner.</Paragraph>
      <Paragraph position="2"> Although we have not yet considered the problem of indirect utterances in detail, we anticipate that such explanations might include as a subtask the kind of plan-based inference that has been proposed, but this inference would be limited by the hearer's own goals and expectations. However, many common uses of indirectness can be explained by the existence of a well-accepted social convention that makes them expected.</Paragraph>
      <Paragraph position="3"> 43 From Mother's perspective, if indeed she did make an askif in T1, T4 can be seen as a display of acceptance of it, because a surface-informref is one way to do an informif. Thus, from her perspective, she need never recognize that Russ has misunderstood.</Paragraph>
    </Section>
    <Section position="14" start_page="464" end_page="466" type="sub_section">
      <SectionTitle>
Suppositions Added:
</SectionTitle>
      <Paragraph position="0"> expressed(do(m, askref(m, r, wholsGoing)), alt(1)) expressedNot(knowref(m, whoIsGoing), alt(1)) expressed(intend(m, knowref(m, whoIsGoing)), alt(1)) expressed(intend(m, do(r, informref(r, m, whoIsGoing))), alt(1))  The output for turn 4 from Russ's perspective.</Paragraph>
      <Paragraph position="1"> 5.1.2 Litman. Work by Litman (1986) attempts to overcome some of the limitations of Allen and Perrault's approach by extending the plan hierarchy to include discourse-level metaplans, in addition to domain-level plans. Metaplans include actions, such as introduce, continue, or clarify and are recognized, in part, by identifying cue phrases. Although the metaplans add flexibility by increasing the number of possible paths, they also add to the problem of pruning and ordering the paths, requiring additional heuristics. For example, there are specific rules for choosing among alternative meta-plans on the basis of clue words, implicit expectations, or default preferences. Litman also adds a new general heuristic: stop chaining if an ambiguity cannot be resolved. 5.1.3 Carberry and Lambert. Carberry (1985, 1987, 1990) uses a similar approach. Her model introduces a new set of discourse-level goals such as seek-confirmation that are recognized on the basis of the current properties of the dialogue model and the mutual beliefs of the participants. Once a discourse-level goal is selected, a set of can- null Computational Linguistics Volume 21, Number 4 didate plans is identified, and Allen-style heuristics are applied to choose one of them. Subsequent work by Lambert and Carberry (1991, 1992) introduces an intermediate, problem-solving level of plans that link the discourse-level acts to domain plans. The processing rules, by their specificity, eliminate the need for many of the heuristics. The sacrifice here is a loss of generality; the mechanisms for recognizing goals are specific to Carberry's implementation.</Paragraph>
      <Paragraph position="2"> 5.1.4 Cawsey. Cawsey (1991) proposes a method of extending Perrault and Allen's (1980) inference rule approach to produce repairs. She also suggests including some of the information captured by various rating heuristics as premises in the rules, allowing that these new premises may be assumed by default. For example, the following rule is proposed for capturing pretellings: if request(Sl, S2, informif(S2, Sl, knowref(S2, D))) and know(S2, knowref(Sl, D)) then know(S2, wants(Sl, knowref(S2, D))) To handle misunderstandings, she suggests that such assumptions be retracted if they become inconsistent and then any subsequent utterance whose interpretation depends on a retracted belief be reinterpreted from scratch. This approach is thus much stronger than most accounts of negotiation, such as ours, which allow that a participant might choose to forego a complete repair. Allowing defeasible beliefs is a step in the right direction; however, the approach still misses the point that participants are able to negotiate meanings. Preconditions such as know(S2, not knowref (Sl, D)) influence interpretations only to the extent that they provide support for, or evidence against, a particular (abductive) explanation. In Example 2, even if Mother knew who was going, she could still be asking Russ a question, albeit insincerely. Similarly, even if Russ suspected that Mother did not know who was going, he might still have chosen to treat her utterance as a pretelling, perhaps to confirm his suspicions or to delay answering.</Paragraph>
      <Paragraph position="3"> 5.1.5 Traum and Hinkelman. Hinkelman's (1990) work incorporates some abductive reasoning in her model of utterance interpretation. The model treats different features in the input, such as the mood of a sentence or the presence of a particular lexical item, as manifestations of different speech acts. During interpretation, procedures that test for particular features of the input suggest candidates. The system then removes any candidates whose implicatures are inconsistent with prior beliefs.</Paragraph>
      <Paragraph position="4"> Traum and Hinkelman (1992) extend this work by generalizing the notion of speech act to conversation act. Conversation acts include traditional speech act types as well as what Traum and Hinkelman call grounding acts. Conversation acts, however, are not assumed to be understood without some positive evidence by the receiver, such as an acknowledgment. Grounding acts include initiating, clarifying, or acknowledging an utterance, and taking and releasing a turns. These acts differ from our own meta-plans in that they are organized into a finite state grammar, and do not account for grounding acts that would violate a receiver's expectations. In conversation, grounding acts that violate the grammar are not recognized. Traum and Hinkelman suggest that such violations should be used to trigger a repair, but admit that, except when a repair has been requested explicitly, the model itself says nothing about when a repair should be uttered (p. 593). 44 44 Interpretations that have the right pragmatic force but inconsistent implicatures are ruled out as in  McRoy and Hirst The Repair of Speech Act Misunderstandings Traum and Allen (1994) extend the work to include a notion of social obligation, which serves much the same purpose as expectations in our model.</Paragraph>
    </Section>
    <Section position="15" start_page="466" end_page="466" type="sub_section">
      <SectionTitle>
5.2 Other expectation-driven accounts
</SectionTitle>
      <Paragraph position="0"> Within the speech understanding community, the word &amp;quot;expectation&amp;quot; has been used differently from our use here. Expectation in the speech context refers to what the next word or utterance is likely to be about. 45 For example, after the computer asks the user to perform some action A, it might expect any of the following types of responses:  1. A statement about background knowledge that might be needed.</Paragraph>
      <Paragraph position="1"> 2. A statement about the underlying purpose of A.</Paragraph>
      <Paragraph position="2"> 3. A statement about related task steps (i.e., subgoals of A, tasks that contain A as a step, or tasks that might follow A).</Paragraph>
      <Paragraph position="3"> 4. A statement about the accomplishment of A.</Paragraph>
      <Paragraph position="4">  These expectations are independent of the belief state of an agent and are specified down to the semantic (and sometimes even lexical) level. This information has long been used to discriminate between ambiguous interpretations and correct mistakes made by the speech recognizer (Fink and Biermann 1986; Smith 1992). Typically, an utterance will be interpreted according to the expectation that matches it most closely. By contrast, our approach and that of the plan-based accounts use &amp;quot;expectation&amp;quot; to refer to agents' beliefs about how future utterances might relate to prior ones. These expectations are determined both by an agent's understanding of typical behavior and by his or her mental state. These two notions of expectation are complementary, and any dialogue model that uses speech as input must be able to represent and reason with both.</Paragraph>
    </Section>
    <Section position="16" start_page="466" end_page="467" type="sub_section">
      <SectionTitle>
5.3 Approaches to misconception
</SectionTitle>
      <Paragraph position="0"> Misconceptions are a deficit in an agent's knowledge of the world; they can become a barrier to understanding if they cause an agent to unintentionally evoke a concept or relation. To prevent misconceptions from triggering a misunderstanding, agents can check for evidence of misconception and try to resolve apparent errors. The symptoms of misconception include references to entities that do not map to previously known objects or operations (Webber and Mays 1983) or requests for clarification (Moore 1989). Errors are corrected by replacing or deleting parts of the problematic utterance so that it makes sense. Several correction strategies have been suggested:  Computational Linguistics Volume 21, Number 4 achieving a goal, then an entity that corresponds to a step from one strategy might be replaced by one corresponding to a step from one of the other strategies (see Carberry 1985, 1987; Eller and Carberry 1992; Moore 1989).</Paragraph>
      <Paragraph position="1"> Although these approaches do quite well at preventing certain classes of misunderstandings, they cannot prevent them all. Moreover, these approaches may actually trigger misunderstandings because they always find some substitution, and yet they lack any mechanisms for detecting when one of their own previous repairs was inappropriate. Thus, a conversational participant will still need to be able to address actual misunderstandings.</Paragraph>
    </Section>
    <Section position="17" start_page="467" end_page="467" type="sub_section">
      <SectionTitle>
5.4 Collaboration in the resolution of nonunderstanding
</SectionTitle>
      <Paragraph position="0"> In this paper, we have concentrated on the repair of mis-understanding. Our colleagues Heeman and Edmonds have looked at the repair of non-understanding. The difference between the two situations is that in the former, the agent derives exactly one interpretation of an utterance and hence is initially unaware of any problem; in the latter, the agent derives either more than one interpretation, with no way to choose between them, or no interpretation at all, and so the problem is immediately apparent. Heeman and Edmonds looked in particular at cases in which a referring expression uttered by one conversant was not understood by the other (Heeman and Hirst 1995; Edmonds 1994; Hirst et al. 1994). Clark and his colleagues (Clark and Wilkes-Gibbs 1986; Clark 1993) have shown that in such situations, conversants will collaborate on repairing the problem by, in effect, negotiating a reconstruction or elaboration of the referring expression. Heeman and Edmonds model this with a plan recognition and generation system that can recognize faulty plans and try to repair them. Thus (as in our own model) two copies of the system can converse with each other, negotiating referents of referring expressions that are not understood by trying to recognize the referring plans of the other, repairing them where necessary, and presenting the new referring plan to the other for approval.</Paragraph>
    </Section>
  </Section>
</Paper>