File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/93/j93-4004_metho.xml

Size: 111,327 bytes

Last Modified: 2025-10-06 14:13:23

<?xml version="1.0" standalone="yes"?>
<Paper uid="J93-4004">
  <Title>Planning Text for Advisory Dialogues: Capturing Intentional and Rhetorical Information</Title>
  <Section position="5" start_page="656" end_page="659" type="metho">
    <SectionTitle>
2 The first step of the schema is to identify commonalities of the two entities being contrasted. In McKeown (1985), this step is optional if no commonalities exist. We have changed this definition
</SectionTitle>
    <Paragraph position="0"> slightly to render this step optional if the speaker believes these commonalities are known to the hearer. This is the case here, so the instantiated schema does not contain this step.</Paragraph>
    <Paragraph position="1">  Intentional structure of system's utterance.</Paragraph>
    <Paragraph position="2"> rhetorical predicates that cause sentences to be generated) that are used to achieve that goal. All of the intermediate structure shown in Figure 5 has been lost. 3 Because of this compilation, schemata provide a computationally efficient way to produce multisentential texts for achieving discourse purposes. They are &amp;quot;recipes&amp;quot; of rhetorical actions that encode frequently occurring patterns of discourse structure. Using schemata, the system need not reason directly about how speech acts affect the beliefs of the hearer and speaker, nor about the effects of juxtaposing speech acts. The system is guaranteed that each schema will lead to a coherent text that achieves the specified discourse purpose.</Paragraph>
    <Paragraph position="3"> However, this compilation is also a disadvantage. If the hearer does not understand the utterance produced by a schema, it is very difficult for the system to recover. In the example above, without understanding why the COMPARE &amp; CONTRAST schema and IDENTIFICATION predicate are present in the instantiated structure, the system cannot determine how to proceed if the user does not immediately understand and accept the system's utterance. In the sample text under consideration, the COMPARE &amp; CONTRAST occurs as part of the top-level schema in order to persuade the hearer to perform the recommended replacement action. This is represented in the intentional representation of Figure 5 by intentions 12 and 13 and the dominance relationship dominates(I2, I3).  well as the rhetorical methods and speech acts that are used to achieve the intentions. Figure 5 shows only the intentional structure.</Paragraph>
    <Paragraph position="4">  Johanna D. Moore and C6cile L. Paris Planning Text for Advisory Dialogs If the text produced by the COMPARE &amp; CONTRAST portion of the schema fails to persuade the hearer, the speaker should try other strategies for achieving this intention. For example, the speaker may paraphrase the reasoning that led to recommending this act, cite the advice of experts, or invoke authority. However, because information about intentions has been &amp;quot;compiled out&amp;quot; of the schema representation, the system cannot recover because it cannot determine which goal failed or what other linguistic strategies can be used to achieve the goal.</Paragraph>
    <Paragraph position="5"> Note that the problem stems from the fact that, in general, there is not a one-to-one mapping from intentions to schemata or rhetorical predicates. For example, the COMPARE &amp; CONTRAST schema can be used to persuade the hearer to perform a replacement action as in the example above, but it could also be used for several other purposes. This schema could be used to identify the differences between two objects so that the hearer can discriminate one from the other. It could also be used in a strategy where a new concept is defined by comparing it to a known concept. Thus, simply knowing that a COMPARE &amp; CONTRAST schema (or predicate) appears in a text does not tell the system why that COMPARE &amp; CONTRAST appears. As a result, if it fails to achieve its intended effect, the system cannot determine how to recover. To find alternative strategies, the system must know what goal it was trying to achieve! Rigidity of Schemata. A second problem with schemata is that they are too rigid. In the naturally occurring dialogues we analyzed, we have often observed what appear to be opportunistic effects. One such effect we identified is the tendency of the explainer to define a concept immediately after mentioning that concept in the explanation. One possible reason for this is that the explainer only realizes that the hearer may not know that concept after a sentence including it has been planned or uttered. Recall that this occurred in the teacher-student dialogue in Figure 1 (turn 8). As another example, consider the following explanation given by a doctor when asked about the possible ways to treat migraine (italics indicate our present concern, not spoken emphasis). 4 Some of the possible ways that I approach migraine would be, depending on the frequency of your headaches, we would determine which approach I would recommend. \[... \] So, for example, say that you told me that you had three to four headaches \[... \] in the month, what I would recommend with that frequency is that you should be on something prophylactically. Prophylactically basically means preventing the headache from occurring before it actually starts. If you had infrequent headaches, maybe several times a year, \[... \] then I would recommend more something abortive. That means that when the headache came on, I would treat you at that point. I would rather, to help prevent side effects from you having to take a medicine on a daily basis, just try to abort them, if they were infrequent.</Paragraph>
    <Paragraph position="6"> Because a new term can be introduced in virtually any statement, one could only handle this phenomenon in the schema-based approach by incorporating an optional IDENTIFICATION predicate for every term mentioned in the previous predicate after every entry in every schema. Note that this would be inefficient because if these predicates appear in the schema, the system must consider them (i.e., search for instantiations in 4 This example is taken from transcripts gathered by Claudia Tapia and Johanna Moore at the University of Pittsburgh.</Paragraph>
    <Paragraph position="7">  Computational Linguistics Volume 19, Number 4 the knowledge base and invoke the focusing mechanism) for every additional predicate added to the schema. 5 We believe that a more elegant and efficient approach is to use planning techniques that permit new goals to be posted as they arise and to be worked into an evolving text plan according to rules of discourse as represented in plan operators. The framework that we propose in this paper handles this type of opportunistic planning in a limited way. Suthers (1991) handles a wider array of opportunistic effects using data-driven plan critics to decide when additional information should be included.</Paragraph>
    <Section position="1" start_page="659" end_page="659" type="sub_section">
      <SectionTitle>
3.3 Requirements for a Text Planner
</SectionTitle>
      <Paragraph position="0"> Like others, e.g., Levy (1979) and Appelt (1985), we take the view that speakers have goals to affect the mental states of their hearers and must choose from among the linguistic resources available to them the ones that will satisfy these goal(s). We argued that an approach to text planning that attempts to reason directly about how speech acts can be combined into coherent multisentential texts to achieve a speaker's intentions is likely to be computationally infeasible. However, our discussion of schemata pointed out that we must be careful about what and how much is compiled out of our representations for the sake of efficiency. While we believe the schema-based approach to be correct in spirit, we think that too much information has been lost in schemata composed simply of rhetorical predicates.</Paragraph>
      <Paragraph position="1"> Ideally, what we desire is an approach that: records enough information about the system's intentions and how those intentions were achieved, so that the system can reason about its own previous utterances to determine how to recover when its explanations are not understood, and is capable of producing texts in a computationally efficient manner.</Paragraph>
      <Paragraph position="2"> To achieve these goals, we propose an approach to text planning in which plan operators encode knowledge about how intentions may be achieved via a set of techniques commonly found in natural discourse. Thus we require a plan language that links intentions to the rhetorical means for achieving them.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="659" end_page="667" type="metho">
    <SectionTitle>
4. Linking Intentions and Rhetorical Structure
</SectionTitle>
    <Paragraph position="0"> Two theories of discourse structure that make the connection between rhetorical relations and speaker intentions have been proposed: Hobbs' (1979, 1983, 1985) theory of coherence relations, and Mann and Thompson's (1988) Rhetorical Structure Theory.</Paragraph>
    <Paragraph position="1"> As we will see, Rhetorical Structure Theory can be adapted to a computational model in a fairly natural way, and in fact there is an implemented prototype of the theory (Hovy 1991). However, the straightforward operationalization that is used in this prototype suffers from a fundamental problem for our purposes. After discussing the two theories that link intention to rhetorical structure, we discuss this problem in detail.</Paragraph>
    <Paragraph position="2"> In Section 5, we describe how this problem may be solved and present a text planner that implements this solution.</Paragraph>
    <Paragraph position="3"> 5 It is also unclear how we could represent this opportunistic strategy in the schema-based approach, since it is only when &amp;quot;unpacking&amp;quot; complex concepts such as assign-value-to-generalized-variable that the system recognizes that it will introduce the terms assign, value and generalized-variable. See Moore and Paris (1992) for more discussion about this topic.</Paragraph>
    <Paragraph position="4">  Johanna D. Moore and C6cile L. Paris Planning Text for Advisory Dialogs</Paragraph>
    <Section position="1" start_page="660" end_page="662" type="sub_section">
      <SectionTitle>
4.1 Theoretical Background
</SectionTitle>
      <Paragraph position="0"> Hobbs characterizes coherence in terms of a set of binary coherence relations between a current utterance and the preceding discourse. He identified four reasons why a speaker breaks a discourse into more than one clause and classified the relations accordingly. For example, if a speaker needs to connect new information with what is already known by the hearer, the speaker chooses one of the linkage relations, such as BACKGROUND or EXPLANATION. As another example, when a speaker wishes to move between specific and general statements, he or she must employ one of the expansion relations, such as ELABORATION or GENERALIZATION. According to Hobbs, how the speaker chooses to continue a discourse is equivalent to deciding which relation to employ. From the hearer's perspective, understanding why the speaker continued as he or she did is equivalent to determining what relation was used.</Paragraph>
      <Paragraph position="1"> Hobbs originally proposed coherence relations as a way of solving some of the problems in interpreting discourse, e.g., anaphora resolution (Hobbs 1979). He defines coherence relations in terms of inferences that can be drawn from the propositions asserted in the items being related. For example, Hobbs (1985, p. 25) defines ELABORATION as follows: ELABORATION: S 1 is an ELABORATION of So if the hearer can infer the same proposition P from the assertions of So and St.</Paragraph>
      <Paragraph position="2"> Here $1 represents the current clause or larger segment of discourse, and So an immediately preceding segment. $1 usually adds crucial information, but this is not part of the definition, since Hobbs wishes to include pure repetitions under ELABORATION.</Paragraph>
      <Paragraph position="3"> Hobbs' theory of coherence is attractive because it relates rhetorical relations to the functions that speakers wish to accomplish in a discourse. Thus, Hobbs' theory could potentially tell a text planner what kind of rhetorical relation should be used to achieve a particular goal of the speaker. For example, Hobbs (1979) notes two functions of ELABORATION. One is to overcome misunderstanding or lack of understanding, and another is to &amp;quot;enrich the understanding of the listener by expressing the same thought from a different perspective.&amp;quot; However, note that such specifications of the speaker's intentions are not an explicit part of the formal definition of the relation. For this reason we have chosen an alternative theory of text structure, Rhetorical Structure Theory (RST) (Mann and Thompson 1988), as a basis for our text planning system.</Paragraph>
      <Paragraph position="4"> In contrast to Hobbs' coherence relations, the definition of each rhetorical relation in RST indicates constraints on the two entities being related as well as constraints on their combination, and a specification of the effect that the speaker is attempting to achieve on the hearer's beliefs or inclinations. Thus RST provides an explicit connection between the speaker's intention and the rhetorical means used to achieve those intentions. As an example, consider the RST definition of the MOTIVATION relation shown in Figure 6.</Paragraph>
      <Paragraph position="5"> As shown, an RST relation has two parts: a nucleus (N) and a satellite (S). 6 The MOTIVATION relation associates text expressing the speaker's desire that the hearer perform an action (the nucleus) with material intended to increase the hearer's desire to perform the action (the satellite). For example, in the text below, (1) and (2) are related by MOTIVATION:  (1) Come to the party for the new President. (2) There will be lots of good food.</Paragraph>
      <Paragraph position="6"> 6 This is an oversimplification. In fact, there are a small number of RST relations, e.g., SEQUENCE and 30INT, that are multinuclear and can relate more than two pieces of text.</Paragraph>
      <Paragraph position="7">  Computational Linguistics Volume 19, Number 4 relation name: MOTIVATION constraints on N: Presents an action (unrealized with respect to N) in which the Hearer is the actor.</Paragraph>
      <Paragraph position="8"> constraints on S: none constraints on N + S combination: Comprehending S increases the Hearer's desire to perform the action presented in N.</Paragraph>
      <Paragraph position="9"> effect: The Hearer's desire to perform the action presented in N  Graphical representation of an RST schema.</Paragraph>
      <Paragraph position="10"> The nucleus of the relation is that item in the pair that is most essential to the writer's purpose. In the example above, assuming that the writer's intent is to make the hearer go to the party, clause (1) is nuclear. In general, the nucleus could stand on its own, but the satellite would be considered a non sequitur without its corresponding nucleus. In our example, without the recommendation to come to the party the satellite in (2) is out of place. Moreover, RST states that the satellite portion of a text may be replaced without significantly altering the intended function of the text. The same is not true for the nucleus. For example, replacing (2) above with: (2') All the important people will be there.</Paragraph>
      <Paragraph position="11"> does not greatly change the function of the text as a whole. However, replacing the recommendation in the nucleus, e.g., (1') Don't go to the party.</Paragraph>
      <Paragraph position="12"> significantly alters the purpose of the text.</Paragraph>
      <Paragraph position="13"> RST relations may be combined into schemata that define how a text can be broken down into smaller units. Each schema contains a nucleus and zero or more satellites related to the nucleus by one of the RST relations. RST schemata do not constrain the  Johanna D. Moore and C6cile L. Paris Planning Text for Advisory Dialogs ordering of the nucleus and satellites, and relations may occur any number of times within a schema. Furthermore, the schemata are recursive; text serving as a satellite in one schema may itself consist of a nucleus and any number of satellites. A graphical depiction of one schema defined by Mann and Thompson (1988) appears in Figure 7. This schema consists of a nucleus and two satellites: one providing MOTIVATION for the material in the nucleus, and the other providing ENABLEMERT for the material in the nucleus.</Paragraph>
    </Section>
    <Section position="2" start_page="662" end_page="664" type="sub_section">
      <SectionTitle>
4.2 Using RST in Text Generation
</SectionTitle>
      <Paragraph position="0"> Although originally intended for generation, until recently RST was used primarily as a tool for analyzing texts in order to investigate various linguistic issues. The analyst breaks the text down into parts called text spans and then tries to find an RST relation that connects each pair of spans until all pairs are accounted for. To determine whether or not a relation holds between two spans of text, the analyst checks to see whether the constraints on the nucleus and satellite hold and whether it is plausible that the writer desires the condition specified in the Effect field. All of the text is given. One need only determine whether or not the constraints are satisfied.</Paragraph>
      <Paragraph position="1"> Using RST in a constructive process, such as generating a response to a question, is a very different task. For example, in order to produce text that will succeed in getting the hearer to perform an action, a system must determine what (if any) information in its knowledge base could be used to increase the hearer's desire to perform the action (MOTIVATION), what information could be used to increase the hearer's ability to perform the action (ENABLEMENT), how much of this information to present, in what order, at what level of detail, etc. Moreover, the theory states that the nucleus and satellite portions of a text may occur in any order, relations may occur any number of times, and a nucleus or satellite may be expanded into a text employing any other relation at any point. In order to use RST, a text generation system must have control strategies that dictate how to find such knowledge in the knowledge base, when and what relations should occur, how many times, and in what order.</Paragraph>
      <Paragraph position="2"> Mann (1984) suggested that goal pursuit methods used in artificial intelligence could be applied to RST for text generation. Schemata can be viewed as means for achieving the goals stated as their effects, and the constraints on relations as a kind of precondition to using a particular schema. However, much work must be done to formalize the constraints and effects of the RST relations and schemata in order to use RST in a text generation system.</Paragraph>
      <Paragraph position="3"> One attempt at formalization was made by Hovy (1991), who operationalized a subset of the RST relation definitions for use as plan operators in a text structuring process. Hovy's structurer employs a top-down planning mechanism to order a given set of input elements into a coherent text. To form plan operators from RST relation definitions, Hovy maps the intended effect of the relation to the l~esults field of the corresponding operator. In Hovy's system, the contents of the Results field are viewed as the communicative goal(s) that may be achieved by using the associated relation, and the relation name as the rhetorical strategy that achieves the goal(s).</Paragraph>
      <Paragraph position="4"> The constraints on the nucleus, satellite, and their combination that are specified in the relation definition become subgoals in Hovy's operators. The relation name is also included in the plan operator (and in the evolving plan) so that appropriate connectives can be inserted in the final text.</Paragraph>
      <Paragraph position="5"> Figure 8 shows the RST relation definition for the CIRCUMSTANCE relation, and Figure 9 shows Hovy's characterization of this relation as a plan operator. Note from this figure that-Hovy's operator includes fields called growth points. These post optional subgoals, which will be expanded if information satisfying these goals appears in the  Computational Linguistics Volume 19, Number 4 relation name: CIRCUMSTANCE constraints on N: none constraints on S: presents a situation (not unrealized) constraints on N + S combination: S sets a framework in the subject matter within which Hearer is intended to interpret the situation presented in N.</Paragraph>
      <Paragraph position="6"> effect: The Hearer recognizes that the situation presented in S provides the framework for interpreting N.</Paragraph>
      <Paragraph position="7">  Hovy's RST plan for CIRCUMSTANCE (from Hovy \[1991\] p. 90).</Paragraph>
      <Paragraph position="8"> set of input items to be expressed. As Hovy (1991, p. 94) points out, RST operators with growth points can be viewed as schemata, and this is how they are used in his implementation. He also argues that by viewing growth points as &amp;quot;suggestions&amp;quot; rather than &amp;quot;injunctions,&amp;quot; the RST operators can be used in a more &amp;quot;open-ended&amp;quot; approach to planning. Hovy (1991, p. 98) goes on to suggest a range of criteria that a planner might use to determine when additional material should be included/excluded, but these have not been implemented.</Paragraph>
      <Paragraph position="9"> The operator in Figure 9 says that in order to achieve the state where the speaker believes that the hearer and speaker mutually believe that ?C/IRC is a circumstance of some event ?X, the speaker can state ?X (this will be the result of posting the nucleus subgoal (BMB SPEAKER HEARER (TOPIC ?X))), and then state the circumstantial information (this will be the result of posting the satellite subgoal (BMB SPEAKER HEARER  (TOPIC ?CIRC))). The constraints on the Nucleus + Satellite check that whatever is bound to the variable ?CIRC either stands in a HEADING.R or TIME.R relation to the event bound to ?X. 7 Using this operator, Hovy's structurer can generate portions of 7 This particular operator was used in a naval application where the main events were ship movements  Johanna D. Moore and C6cile L. Paris Planning Text for Advisory Dialogs text such as: Knox is en route to Sasebo. Knox is heading SSW.</Paragraph>
      <Paragraph position="10"> The second sentence (the satellite) stands in a CIRCUMSTANCE relation to the first sentence (the nucleus). Typically, this text would be only a portion of a larger paragraph.</Paragraph>
    </Section>
    <Section position="3" start_page="664" end_page="667" type="sub_section">
      <SectionTitle>
4.3 A Problem with the Straightforward Operationalization of RST
</SectionTitle>
      <Paragraph position="0"> Above we have argued that, in order for a system to be able to participate in dialogues, it must have an explicit representation of the intentional structure of its own utterances.</Paragraph>
      <Paragraph position="1"> Further, we have seen that RST provides a link between intentions and rhetorical relations, and that RST can be adapted in a straightforward manner for use in a text-structuring task by encoding the specification of the intended effect of an RST relation as the goal that the plan operator can be used to achieve, and the constraints on relations as the subgoals that must be satisfied.</Paragraph>
      <Paragraph position="2"> However, in our efforts to use RST to construct a text plan that includes both the intentions of individual segments of the text and an indication of the rhetorical relations between segments, we found such an operationalization to be inadequate in the general case. More specifically, we found that there is an important distinction between two previously identified classes of RST relations that must be taken into account when attempting to use RST for text planning. Mann and Thompson (1988, pp. 256-257) break the RST relations into two classes: presentational and subject matter. Presentational relations are those whose intended effect is to increase some inclination in the hearer, such as the desire to act (MOTIVATION), or the degree of positive regard for (ANTITHESIS), the degree of belief in (EVIDENCE), or the degree of acceptance of (JUSTIFY) the information presented in the nucleus. In contrast, the intended effect of a subject matter relation is that the hearer recognize that the relation in question holds &amp;quot;in the subject matter.&amp;quot; For example, VOLITIONAL-CAUSE relates two text spans if the speaker intends the hearer to recognize that the situation presented in the satellite is a cause for the volitional action presented in the nucleus.</Paragraph>
      <Paragraph position="3"> The Effects of the presentational relations can be adopted in a straightforward manner as intentions for plan operators. However, the Effects of the subject matter relations are not sufficient for representing speakers' intentions. The Effects of subject matter relations capture the speaker's intention to make the hearer understand a piece of subject matter, but do not indicate why the speaker is presenting this information.</Paragraph>
      <Paragraph position="4"> Consider again the CIRCUMSTANCE relation in Figure 8. The specified effect of this relation is that the hearer knows some circumstance of the situation presented in the nucleus. As we have seen from Hovy's work, this relation is typically used to provide information about the time of an event, the location of an object, etc. But in order to participate in a dialogue, the system must know more than the fact that it intended to convey the time of an event or location of an object: it must know why it intended to convey that. We illustrate the problem with an example.</Paragraph>
      <Paragraph position="5"> Consider the hypothetical fragment of task-oriented dialogue shown in Figure 10, in which a system is instructing a user in how to take apart a physical device. The system tells the user to remove the cover. In order to increase the user's ability to perform the act, the system employs an ENABLEMENT relation that tells the user what tool to use. Then, in order to help the user find the appropriate tool, the system tells the user where the tool is located, a CIRCUMSTANCE. Now, suppose that we have a text planner and the circumstances that were important to report were heading and time. The operator could be generalized to a wider range of circumstances.</Paragraph>
      <Paragraph position="6">  both been recorded on the arcs representing the relations.</Paragraph>
      <Paragraph position="7"> Now, let us consider how to respond to the user's &amp;quot;No&amp;quot; in the sample dialogue.</Paragraph>
      <Paragraph position="8"> Clearly the user is indicating an inability to locate the Phillips screwdriver, i.e., either she or he is unable to identify the referent of the term &amp;quot;Phillips screwdriver,&amp;quot; or there is no Phillips screwdriver in the toolbox. For the sake of this example, let us assume that the hearer is unable to identify the referent. In terms of building a system that can participate in such dialogues, there are two problems with the representation shown in Figure 11. First, as the user indicates that he or she cannot identify the Phillips screwdriver, it is clear that the portion of the text that failed is the part that attempts to uniquely identify the Phillips screwdriver, namely the text generated in the CIRCUMSTANCE satellite: It's in the top drawer of the toolbox. However, it is difficult for a system to determine that this is the portion of text that failed, since the only intention represented is that the system wants the hearer to know that the screwdriver being in the toolbox is a circumstance of the Phillips screwdriver. The system has no representation of why this bit of circumstantial information is being conveyed.</Paragraph>
      <Paragraph position="9"> Second, even if the system could identify this as the offending portion of the plan tree, it is very limited in terms of recovery strategies. In this case, the system's only recourse is to try to find different ways of achieving the subgoal (BEL USER (CIRCUMSTANCE-0F PHILLIPS ?CIRC)), i.e., other operators that have this effect but post different subgoals, or by finding different pieces of information that satisfy the constraints on the operator of Figure 9, i.e., alternative bindings for the variable ?CIRC,  Johanna D. Moore and C6cile L. Paris Planning Text for Advisory Dialogs which is currently bound to (IN-LOCATION PHILLIPS DRAWER-I). But it is likely that the location of the Phillips screwdriver is the only piece of circumstantial information relevant to identifying the screwdriver. 8 So presenting other circumstantial information that may be stored in the knowledge base (e.g., when it was purchased, or how long it has been in the toolbox) is almost certainly not relevant and should not be mentioned here. But how can the system tell what is relevant without knowing why circumstantial information is being presented? What is missing from the representation in Figure 11 is a critical piece of information indicating that the speaker is presenting the circumstantial information because she or he intends this information to allow the hearer to identify the tool. (We could denote this intention as (KNOW USER (REF PHILLIPS))).</Paragraph>
      <Paragraph position="10"> One may argue that the system could recover by trying other operators that implement the ENABLEMENT relation. But again, there are two problems. First, there may not be any other such operators whose constraints are satisfied in this situation. That is, perhaps the only way to remove the cover without damaging it is to use a Phillips screwdriver. Second, even if there were other operators for implementing the ENABLEMENT relation, the system is not behaving intelligently. It will never be able to figure out that it can recover by telling the user some attributive information about the object it is trying to identify, e.g., The Phillips screwdriver has a yellow handle and a cone-shaped head. This is because such a text would be produced by an ELABORATION-0BJECT-ATTRIBUTE relation, not a CIRCUMSTANCE relation. Again, this problem arises because the representation lacks the crucial piece of intentional information: (KNOW USER (REF PHILLIPS)), as well as the information that once the Phillips screwdriver has been mentioned, the rhetorical continuations that can be used to help achieve the intention (KNOW USER (REF PHILLIPS)) are CIRCUMSTANCE, ELAB-ORATION-OBJECT-ATTRIBUTE, CONTRAST (one could identify an object by contrasting it with an object known to the user), etc.</Paragraph>
      <Paragraph position="11"> The argument ultimately hinges on the observation that, in the case of the subject matter RST relations, the mapping between intentions and rhetorical relations is not a one-to-one mapping. In fact, the mapping is many-to-many. This is similar to the argument we made earlier with respect to schemata of rhetorical predicates. Because the mapping of intentions to rhetorical predicates is not one-to-one, intentions cannot be recovered from a record of the rhetorical predicates used to produce a text.</Paragraph>
      <Paragraph position="12"> With respect to subject matter relations, we have just shown that an intention such as (KNOW USER (REF PHILLIPS)) may be achieved by a variety of different rhetorical relations. Similarly, a given subject matter relation may be used in service of several different intentions. For example, the CIRCUMSTANCE relation can be used when the speaker wants the hearer to identify the referent of a description (&amp;quot;It's in the toolbox.&amp;quot;) or know a precondition for an action (&amp;quot;You want flight I01. It leaves at 8:00 pm.') In addition, the CIRCUMSTANCE relation is used when the speaker wants the hearer to know how entities are temporally or spatially related. (&amp;quot;He volunteered as a classical music announcer. That was in 1970. He left to go to graduate school. That would've been 1973.&amp;quot;) Contrast this with the presentational RST relations where the mapping between intention and rhetorical relation is a one-to-one mapping. For example, if the speaker's goal is to &amp;quot;increase the hearer's desire to perform the action presented in the nucleus,&amp;quot; then whatever text is used to achieve this goal, it stands in a MOTIVATION relation to the nucleus.</Paragraph>
      <Paragraph position="13"> 8 This is because the knowledge base search originally retrieved this as the information that is most relevant to helping this hearer identify the Phillips screwdriver.  Computational Linguistics Volume 19, Number 4 Because of this dichotomy between the two classes of RST relations, we conclude that any approach to discourse structure that relies solely on rhetorical relations or predicates and does not explicitly encode information about intentions is inadequate for handling dialogues. Hovy's (1991) approach suffers from this problem. 9 Moreover, as Moore and Pollack (1992) argue, a straightforward approach to revising such an operationalization of RST by modifying subject matter operators to indicate associated intentions cannot succeed. Such an approach &amp;quot;presumes a one-to-one mapping between the ways in which information can be related and the ways in which intentions combine into a coherent plan to affect a hearer's mental state.&amp;quot; We have just shown examples indicating that no such mapping exists.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="667" end_page="672" type="metho">
    <SectionTitle>
5. A Text Planner for Advisory Dialogues
</SectionTitle>
    <Paragraph position="0"> In this section we present a text planner that constructs explanations based on the intentions of the speaker at each step and the linguistic means available for realizing these intentions. Given a goal representing the speaker's intention, our planner finds the linguistic resources available for achieving that goal. These linguistic resources can be either speech acts, which are directly satisfiable, or rhetorical strategies indicating how what can be said next relates to what has already been said. While planning, our system records both the intentions behind text spans and the rhetorical relation that holds between each two text spans.</Paragraph>
    <Paragraph position="1"> This approach improves upon Hovy's work in two ways. First, as we have seen, Hovy's operationalization of RST relations into plan operators conflates intentional and rhetorical structure. In contrast, our plan language preserves an explicit representation of both intentional and rhetorical knowledge, and thus is more suitable for participating in dialogues. Second, our text planning system is able to select the content to include in its explanations in addition to structuring that content. Recall that Hovy's system is given a set of input elements to express. Like Appelt (1985, pp. 6-10), we believe that the tasks of choosing what to say and choosing a strategy for saying it cannot be divided. They are intertwined and influence one another in many ways.</Paragraph>
    <Paragraph position="2"> What knowledge is included in a response greatly depends on the speaker's intention and the linguistic strategy chosen to achieve it. For example, the information to be included when describing an object by drawing an analogy with a similar object will be quite different from the information to be included in describing the object by discussing its components. At the same time, which strategy is chosen to satisfy an intention must depend on what knowledge is available. For example, whether the system chooses to draw an analogy will depend on whether an analogous concept familiar to the hearer is available in the system's knowledge sources. In our text planner, decisions are made locally each time alternative strategies for achieving a (sub)goal are considered. The content of the text is not selected a priori.</Paragraph>
    <Section position="1" start_page="667" end_page="668" type="sub_section">
      <SectionTitle>
5.1 The Plan Language
</SectionTitle>
      <Paragraph position="0"> To enable our planner to construct text plans that explicitly capture the intentional and rhetorical structures of the text it produces, we distinguish two types of goals:  * Communicative goals. These represent the speaker's intentions to affect the beliefs or goals of the hearer. They are denoted as states, such as the 9 To be fair, the goal of Hovy's work was to show that RST could be used to order a set of input propositions into a coherent text. He did not set out to provide a system that could participate in a dialogue.  Johanna D. Moore and C6cile L. Paris Planning Text for Advisory Dialogs state in which the hearer believes a certain proposition, has a goal to perform an action, or has the knowledge necessary to perform an action.</Paragraph>
      <Paragraph position="1"> The presence of a communicative goal in a text plan does not cause any text to be generated directly. Achieving goals of this type leads to the posting of linguistic goals.</Paragraph>
      <Paragraph position="2"> Linguistic goals. These correspond to the linguistic means available to speakers for achieving their communicative goals. They lead to the generation of text and are of two types: speech acts and rhetorical goals. In our system, we assume that speech acts, such as INFORM or RECOMMEND, can be straightforwardly mapped into utterances that form part of the final text. 1deg They are considered primitives by the text planner. Rhetorical goals, such as MOTIVATION and CIRCUMSTANCE, cannot be achieved directly and must be refined into one or more subgoals. The strategies for achieving them may post further communicative goals or speech act goals in order to express satellite information in a text. They are intended to establish rhetorical links between text spans, and often cause connective markers to be generated in the final text.</Paragraph>
      <Paragraph position="3"> We distinguish these two types of goals for two reasons. First, the distinction is necessary to handle the many-to-many mapping between intentions and rhetorical strategies for achieving them. Table 1 summarizes the mapping between intentions and rhetorical relations we have identified in generating the texts necessary for our application. These mappings have been encoded in our library of plan operators. In this table, the presentational RST relations appear above the double line. As shown, there is a one-to-one mapping between these relations and speaker intentions. However, note that for the subject-matter relations that appear below the double line, there is not a one-to-one mapping between rhetorical relations and speaker's intentions. In general, there may be many different rhetorical strategies for achieving any given intention. For example, as discussed above and indicated in Table 1, the intention (KNOW ?Ix (REF ?object-descriptor)) can be achieved by telling the hearer circumstantial information about the object (CIRCUMSTANCE), by contrasting the object with an object known to the user (CONTRAST), or by telling the hearer some of the attributes or parts of the object (ELABORATION-0BJECT-ATTRIBUTE or ELABORATION-WHOLE-PART respectively). Moreover, the mapping in Table I makes it clear that a particular rhetorical strategy may serve many intentions. For example, CONTRAST may be used to identify the referent of a description, to define a new term, to identify the differences between entities, to make the hearer believe that a method is the best one for achieving a domain goal, etc. As we will see, operators in our plan language achieve intentions by posting rhetorical subgoals and/or speech acts.</Paragraph>
      <Paragraph position="4"> The second reason for separately and explicitly representing the two types of goals is so that the completed plan structure will contain an explicit representation of the speaker's intentions as well as a record of the speech acts and rhetorical strategies used to achieve them. As we have argued above, it is essential that the intentional structure of a text be recorded so that the system may respond to the user's follow-up questions. In addition, having the rhetorical structure explicitly noted in the text plan allows the text generator to include discourse markers in the final text in a straightforward manner. Such markers enhance the coherence of the text and aid the reader in understanding the text as a whole, by helping him or her understand how  the parts of the text relate to one another (Brewer 1980; Cahour, Falzon, find Robert 1990; Ehrlich and Cahour 1991; Goldman and Dur~n 1988; Levy 1979; Meyer, Brandt, and Bluth 1980). 11 In a system such as ours, sets of connectives are associated with each rhetorical relation, and one can be chosen based on features of the final text being produced (e.g., whether or not a sentence boundary occurs between nucleus and satellite, etc.). The system need not reason directly about the relationships between the effects of speech acts to determine a suitable connective a process we believe to be much more computationally intensive.</Paragraph>
    </Section>
    <Section position="2" start_page="668" end_page="672" type="sub_section">
      <SectionTitle>
5.2 Representing Plan Operators
</SectionTitle>
      <Paragraph position="0"> Our plan language provides operators for achieving the two types of goals presented in the previous section. Each operator consists of: * an effect: a characterization of the goal that this operator can be used to achieve. This may be a communicative goal, such as The speaker intends to 11 Note that rhetorical structure is not the only source of discourse markers. They may be used to mark shifts in attentional structure, discourse segment boundaries, or aspects of the exchange structure in interactive discourse (Grosz and Sidner 1986; Redeker 1990; Schiffrin 1987).</Paragraph>
      <Paragraph position="1">  Johanna D. Moore and C6cile L. Paris Planning Text for Advisory Dialogs achieve the state in which the hearer believes a proposition or a linguistic goal, such as Establish motivation between an act and a goal or Inform the user of a proposition.</Paragraph>
      <Paragraph position="2"> * a constraint list: a list of conditions that should be true in order for the operator to have the intended effect. Constraints may refer to facts in the system's domain knowledge base, information in the user model, information in the dialogue history, or information about the evolving text plan.</Paragraph>
      <Paragraph position="3"> * a nucleus: a subgoal that is most essential to achievement of the operator's effect. Every operator must contain a nucleus.</Paragraph>
      <Paragraph position="4"> * satellites: additional subgoal(s) that may contribute to achieving the effect of the operator. An operator can have zero or more satellites.</Paragraph>
      <Paragraph position="5"> When present, satellites may be required or optional. Unless otherwise indicated, a satellite is assumed to be required.</Paragraph>
      <Paragraph position="6">  in the plan language, we introduce the representational primitives used to express the knowledge contained in plan operators. The 12 predicates listed here were sufficient to represent the communicative goals that were needed to produce responses to the range of questions handled by the PEA system. Development of other application systems in a range of domains using the text planner described here is underway, and experience with these systems will help us determine whether this set of mental states is sufficient or must be extended (Carenini and Moore 1993; Rosenblum and Moore 1993).</Paragraph>
      <Paragraph position="7"> We express information about agents' mental states in the following terms. In the notation below, constants are denoted by symbols written in all uppercase letters, e.g., DO, and typed variables are denoted by lowercase symbols beginning with a &amp;quot;?', e.g., ?agent.</Paragraph>
      <Paragraph position="8"> 1. (KN0W-ABOUT ?agent (CONCEPT ?c)): The agent knows of the existence of the concept ?c. This does not imply that the agent knows any particular properties of the concept, its subconcepts, the instances of this concept, or how this concept is used in problem solving.</Paragraph>
      <Paragraph position="9">  2. (KNOW ?agent (REF ?description)): The agent can identify the real world entity described by ?description.</Paragraph>
      <Paragraph position="10"> 3. (BEL ?agent (?predicate ?el ?e2)): The agent believes that the two-place predicate holds between entities ?el and ?e2.</Paragraph>
      <Paragraph position="11"> 4. (BEL ?agent (SOMEREF (?predicate ?argl ?arg2 ... ?argN-1))): The agent believes that there exists some entity(ies) satisfying the N-ary predicate, i.e., there is some referent of this N-ary predicate.</Paragraph>
      <Paragraph position="12"> 5. (BEL ?agent (REF (?predicate ?argl ?arg2 ...?argN-1) ?referent)): The agent believes that the referent satisfies the N-ary predicate. This notation is used when N &gt; 2. When N = 2, we use the notation shown in 3 above.</Paragraph>
      <Paragraph position="13"> 6. (GOAL ?agent ?goal): The agent has adopted the specified domain goal.  In this expression, ?goal must be a nonprimitive domain action, i.e., a goal that requires further refinement before it can be achieved.</Paragraph>
      <Paragraph position="14">  Computational Linguistics Volume 19, Number 4 7. (GOAL ?agent (D0 ?agent ?action)): ?agent has adopted a goal to perform the specified action. The action must be a primitive in the domain.</Paragraph>
      <Paragraph position="15"> 8. (PERSUADED ?agent (DO ?agent ?act)): The agent is persuaded to perform the action at some unspecified time in the future. This is how we have chosen to represent the RST effect Increase the hearer's desire to perform an action.</Paragraph>
      <Paragraph position="16"> 9. (PERSUADED ?agent (ACHIEVE ?agent ?goal)): The agent is persuaded to adopt the goal.</Paragraph>
      <Paragraph position="17"> 10. (PERSUADED ?agent ?proposition): The agent is persuaded that proposition is true. This is how we have chosen to represent the RST effect Increase the hearer's belief in a proposition.</Paragraph>
      <Paragraph position="18"> 11. (COMPETENT ?agent (DO ?agent ?act)): The agent knows the information necessary to perform the primitive domain action.</Paragraph>
      <Paragraph position="19"> 12. (COMPETENT ?agent (ACHIEVE ?agent ?goal)): The agent knows a method for achieving the nonprimitive domain action ?goal.</Paragraph>
      <Paragraph position="20"> Representing Communicative and Linguistic Goals. Communicative goals represent the speaker's intention to produce a certain effect on the hearer's mental state, e.g., to make the hearer believe some proposition or adopt a goal to perform a certain action. In the plan language, communicative goals are written simply in terms of mental states of the hearer. When a goal of the form (BEL ?hearer ?proposition) appears in a text plan, it should be read as The speaker intends to achieve the state where ?hearer believes ?proposition. When (BEL ?hearer ?proposition) appears in the effect field of an operator, it indicates that the operator is capable of having this effect on the hearer's mental state.</Paragraph>
      <Paragraph position="21"> To achieve a communicative goal, operators post linguistic (rhetorical and speech act) subgoals. Rhetorical goals are of the form (relation-name argl ... argN), where relation-name is one of the relations defined in RST. An expression of this form represents a goal to establish the rhetorical relation between the entities listed as arguments. For example, (MOTIVATION REPLACE-SETQ-WITH-SETF ENHANCE-READABILITY) indicates the speaker's rhetorical goal to establish that the domain goal ENHANCE-READABILITY is motivation for the act REPLACE-SETQ-WITH-SETF.</Paragraph>
      <Paragraph position="22"> Communicative and rhetorical goals are eventually refined into primitive actions that can be executed to cause changes in the hearer's mental state. In our planning formalism, we treat speech acts as primitive. Speech act goals thus appear at the leaf nodes of text plans. There are currently four speech acts in our system: INFORM, ASK, RECOMMEND, and COMMAND.</Paragraph>
      <Paragraph position="23"> Finally, there are two special forms in the plan language: FORALL and SETQ. These may appear in the nucleus or satellite fields of plan operators. FORALL has the form (FORALL ?variable-name ?goal).</Paragraph>
      <Paragraph position="24"> A FORALL clause in a nucleus or satellite field causes an instance of the goal, ?goal, to be posted for each possible binding of the variable named by ?variable-name. ?goal may be a communicative goal, a rhetorical goal, or a speech act. Its specification must contain an occurrence of the variable named by ?variable-name. An example of this special form appears in the nucleus of the plan operator in Figure 14.</Paragraph>
      <Paragraph position="25">  Johanna D. Moore and C6cile L. Paris Planning Text for Advisory Dialogs The SETQ special form is expressed as: (SETQ ?variable-name ?expression).</Paragraph>
      <Paragraph position="26"> SETQ causes the variable named ?variable-name to be bound to the result of evaluating the expression ?expression. This special form is useful for assigning values to variables that will be used in later steps of the nucleus or satellites. An example appears in the operator in Figure 16.</Paragraph>
      <Paragraph position="27"> Representing Operator Constraints. Constraints on the user's knowledge and goals are expressed using the notation given above for representing the hearer's mental states. Any of the expressions described above can appear in the constraint list of an operator. For example, if the expression</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="672" end_page="682" type="metho">
    <SectionTitle>
(KNOW-ABOUT USER (CONCEPT STORAGE-LOCATION))
</SectionTitle>
    <Paragraph position="0"> appears in the constraint field of an operator, it indicates that in order to use the operator (without making assumptions), the hearer should be familiar with the concept STORAGE-L0CATION.</Paragraph>
    <Paragraph position="1"> Constraints on the expert system's knowledge are of the form (7predicate ?argl ... ?arg-N), where ?predicate is an N-ary predicate referring to the expert system's domain knowledge. Since the expert system's knowledge base is made up of several complex data structures, predicates are not tested by a simple unification process. Instead an access function must be written for each predicate in order to test if an instantiated predicate is true, or to find acceptable bindings when a predicate contains variables in some argument positions. The range of predicates over the domain knowledge is quite large, and therefore we will not enumerate them here. We provide English paraphrases of knowledge base constraint predicates wherever they appear in examples.</Paragraph>
    <Paragraph position="2"> Finally, constraints may refer to the dialogue history or the status of the current text plan under construction. There are currently two types of constraints in this category. First, there are constraints that indicate whether the operator can be used in nucleus or satellite position in a text plan. The clause (NUCLEUS) in the constraint field of an operator, indicates that this operator can be used to expand the nucleus branch of a text plan. Likewise, the clause (SATELLITE) indicates that the operator can be used to expand a satellite branch.</Paragraph>
    <Paragraph position="3"> Second, there are constraints on the focus of attention. We are currently using a simple implementation of local focus rules based on the work of Sidner (1979) and McKeown (1985). We have found the need for two such constraints in the operators we have encoded thus far: * (CURRENT-FOCUS ?entity): indicates that the operator can be used if ?entity is currently in focus.</Paragraph>
    <Paragraph position="4"> * (IN-POTENTIAL-F0CUS-LIST ?entity): indicates that the operator can be used if ?entity is on the list of items that have just been mentioned, and therefore could become the next focus.</Paragraph>
    <Paragraph position="5"> 5.2.20perationalizing RST Schemata. Given these notational conventions, let us consider how we operationalize an RST schema in our plan language. In general, several operators are required to represent the knowledge in an RST schema. For example, Figures 12, 14, and one of Figures 15 or 16 operationalize a portion of the RST schema  1. Recommend the act AND optionally, 2. Achieve state where the hearer is persuaded to do the act 3. Achieve state where the hearer is competent to do the act  Graphical representation of an RST schema.</Paragraph>
    <Paragraph position="6"> depicted in Figure 7, repeated in Figure 13 for the reader's convenience. The operator shown in Figure 12 says that one way of achieving the state where the hearer has the goal of performing an action is to recommend the act (the nucleus), and, optionally, to post a subgoal to achieve the state where the hearer is persuaded to perform the recommended act (the first satellite), and, optionally, to post a subgoal to achieve the state where the hearer has the knowledge necessary to perform the act (the second satellite). RECOMMEND is considered a primitive in our system, and can be mapped directly into a specification for the sentence generator. Therefore no operators are needed to achieve it.</Paragraph>
    <Paragraph position="7"> However, the system does require operators for achieving the two satellite subgoals. Figure 14 shows an operator that can be used to achieve the first satellite. Informally, this plan operator states that if an act is a step in achieving some goal(s) of  To achieve the state in which the hearer is persuaded to do an act, IF the act is a step in achieving some goal(s) of the hearer, AND the goal(s) are the most specific along any refinement path AND act is the current focus of attention AND the planner is expanding a satellite branch of the text plan THEN motivate the act in terms of those goal(s).</Paragraph>
    <Paragraph position="8"> Figure 14 Plan operator for persuading user to do an act.</Paragraph>
    <Paragraph position="9"> the hearer, the speaker can persuade the hearer to perform the action by motivating the action in terms of those goals. This plan operator thus indicates that the communicative goal of persuading the hearer to do an act can be achieved by using the rhetorical strategy MOTIVATION. This operator thus links the intention with the rhetorical means used to achieve it.</Paragraph>
    <Paragraph position="10"> Finally, the system needs an operator that can achieve a MOTIVATION subgoal. In our current operator library, there are several such operators. Two of these are shown in Figures 15 and 16. The operator in Figure 15 is very general and can be used to motivate any action in terms of a hearer goal that the act may help to achieve. It posts a subgoal to make the hearer believe that the act is a step in achieving the goal. In the simplest case, this subgoal will be refined directly into a surface speech act that informs the hearer of this fact. The operator in Figure 16 is a more specific operator and can be used only when the act to be motivated is a replacement (e.g., replace SETQ with SETF). In this case, one strategy for motivating the act is to compare the object being replaced and the object that replaces it with respect to a goal of the hearer. The three operators shown in Figures 12, 14, and 15 together form one operadeg tionalization of a portion of the RST schema shown in Figure 13. Referring back to this schema and the definition of the RST relation MOTIVATION shown in Figure 6, the reader will note that these three operators can be used to produce the nucleus and the MOTIVATION satellite portion of the schema. To complete the operationalization of the full schema shown in Figure 13, we must also provide operators to refine the second optional satellite, (COMPETENT ?hearer (DO ?hearer ?act)). For the sake of brevity, we have omitted these operators here.</Paragraph>
    <Paragraph position="11"> There are three important points to note about our operationalization. First, operators like the one shown in Figure 14 provide an explicit link between speaker intentions and the rhetorical means that achieve them. In the case of this particular operator, there is actually a one-to-one mapping between the intention (Achieve state where  Plan operator for motivating a replace act by describing differences between replacer and replacee.</Paragraph>
    <Paragraph position="12"> hearer is persuaded to do an act) and the rhetorical strategy for achieving it (MOTIVATION). However, as Table 1 shows, this is not the case for subject matter relations. By providing operators that explicitly link intentions with rhetorical strategies, our system can handle the many-to-many mapping between intentions and the communicative strategies that achieve them. So, to return to the Phillips screwdriver example, our system would have several plan operators for achieving the intention: (KNOW ?hearer (REF ?description)). One plan operator would indicate that a rhetorical strategy for achieving this goal is to tell the hearer the location of the object (CIRCUMSTANCE), another would indicate that another way to achieve the goal is to tell the hearer the identifying attributes of the object (ELABOKATION-0BJECT-ATTKIBUTE), etc.</Paragraph>
    <Paragraph position="13"> Second, operators like the one shown in Figure 14 contribute to the computational efficiency of the system. These operators encode knowledge about the rhetorical strategies that may be used to satisfy particular intentions. Other operators, e.g., the MOTIVATION operators shown in Figures 15 and 16, encode different methods for realizing these rhetorical strategies in different situations. Thus we have provided a plan language that preserves an explicit representation of the intention behind each portion of the text, while maintaining the efficiency advantages that originally motivated the use of schemata of rhetorical predicates or relations for natural language generation. Third, as illustrated by the two alternative operators for achieving a MOTIVATION subgoal shown in Figures 15 or 16, in our plan language we can represent very general strategies that are applicable across domains, as well as very specific strategies that may be necessary to handle the idiosyncratic language used in a particular domain. While one may argue that the operator in Figure 16 could be replaced by a more general, domain-independent operator, this does not obviate the need for domain-specific communication strategies. Rambow (1990) argues that domain-specific communication knowledge must be used (whether implicitly or explicitly) in all planned communica- null Johanna D. Moore and C6cile L. Paris Planning Text for Advisory Dialogs tion, and advocates that domain communication knowledge be represented explicitly. In our plan language, some types of domain-specific communication strategies can be represented in plan operators. When there are multiple operators capable of achieving a given effect, the constraint mechanism controls which operators are deemed appropriate in a given context. Note, however, that we do not wish to claim that a top-down planning formalism that maps speakers' intentions to rhetorical and speech acts can or should be used to generate all text types. In particular, Kittredge, Korelsky, and Rambow (1991) have shown that RST cannot account for the structure of report texts. Moreover, they argue that report generation does not require reasoning about the speaker's intentions to affect the mental attitudes of the hearer. For report generation, they advocate a data-driven approach in which domain communication knowledge plays a central role.</Paragraph>
    <Section position="1" start_page="676" end_page="677" type="sub_section">
      <SectionTitle>
5.3 Constraints Integrate Multiple Sources of Knowledge
</SectionTitle>
      <Paragraph position="0"> Operators contain applicability constraints that specify the knowledge that must be available and the state of the text-planning process that must exist if the operator is to be used. These constraints integrate multiple sources of knowledge; they may refer to the expert system's knowledge bases, the user model, the dialogue history, or the evolving text plan. To our knowledge, our text planner is unique in its explicit representation of constraints from all of these knowledge sources.</Paragraph>
      <Paragraph position="1"> It is important to recognize that in our formalism, constraints do more than just limit the applicability of an operator. They also specify the type of knowledge to be included in an explanation, so that the process of satisfying constraints causes the planner to find information that will be included in the text. Thus, the selection of information to be presented and the determination of how to present that information are truly integrated in our planning model.</Paragraph>
      <Paragraph position="2"> For example, when attempting to apply the operator shown in Figure 14, the variables ?act and ?hearer will be bound. What is not yet determined is which domain goals should be used to motivate the act. Checking to see if the first three constraints on this operator are satisfied causes the planner to search its knowledge sources to find acceptable bindings for the variable ?goal. The first constraint, (STEP ?act ?goal), says that there must be some domain goal(s) that the act is a step in achieving. Satisfying this constraint requires the planner to search the expert system's domain planning knowledge for such goals. The second constraint, (GOAL ?hearer ?goal), further specifies that any such domain goals must be goals of the hearer (user). To check this constraint, the system must inspect the user model. The third constraint requires that ?goal be the most specific goal along any refinement path satisfying the first two constraints.</Paragraph>
      <Paragraph position="3"> The last two constraints of the operator in Figure 14 refer to the evolving text plan. The fourth constraint, (CURRENT-FOCUS ?act), says that this operator can be used when the act is the current focus of attention. The fifth and final constraint, (SATELLITE), says that this operator can only be used when the planner is working on a satellite branch of the current text plan.</Paragraph>
      <Paragraph position="4"> In our system, constraints are treated differently depending on the knowledge source to which they refer. We consider the expert system's domain knowledge to be complete and correct. Therefore, constraints referring to the expert system's knowledge bases are considered &amp;quot;rigid.&amp;quot; If any one of these constraints fails to be satisfied, the operator is rejected from consideration immediately. In contrast, we do not assume that the system's model of the user is either complete or correct. Thus, constraints referring to the user's knowledge state are treated more loosely. They specify what the user should know in order to understand the text that will be produced by the oper- null Computational Linguistics Volume 19, Number 4 ator. However, when selecting an operator, if the user model contains no information relevant to a particular constraint, the planner may simply assume that the constraint is satisfied. When it does so, the assumption is recorded in the plan structure so that this assumption can be questioned if the explanation is not understood.</Paragraph>
      <Paragraph position="5"> In the current implementation, constraints on the dialogue history and evolving text plan are rigid. However, we are exploring the idea of treating these as preconditions, since, if they are not true in a given situation, they could be made true by generating some additional text. For example, if a constraint such as (IN-POTENTIAL-FOCUS-LIST ?act) is not true, it can be made true by using rhetorical techniques to introduce ?act, if it is a new topic, or to return to ?act, if it has been mentioned before. User model constraints can also be treated as preconditions in cases where the system has the underlying knowledge to support explanations that could make them true.</Paragraph>
      <Paragraph position="6"> We would like to provide our text planner with the ability to choose between making an assumption or planning text to satisfy a user model constraint. Modifying the architecture to support this type of reasoning is relatively straightforward. The more difficult problem is to identify heuristics that guide the text planner in making the most effective choices. When we incorporate the notion of preconditions into our plan operators, we will make a distinction between constraints that the system can try to satisfy and those that must already be satisfied before the operator can be applied.</Paragraph>
      <Paragraph position="7"> This distinction was first made by Litman and Allen (1987) and later by Maybury (1992).</Paragraph>
    </Section>
    <Section position="2" start_page="677" end_page="682" type="sub_section">
      <SectionTitle>
5.4 The Planning Mechanism
</SectionTitle>
      <Paragraph position="0"> An overview of the explanation generation facility and its relation to other components in the system is shown in Figure 17. The text planner produces a plan for an explanation using the operators in its plan library. The planning process begins when a communicative goal is posted. This may come about in one of two ways. First, in the process of performing a domain task, the expert system may need to communicate with the user, e.g., to ask a question or to recommend that the user perform an action.</Paragraph>
      <Paragraph position="1"> To do so, it posts a communicative goal to the text planner. Alternatively, the user may request information from the system. In this case, the query analyzer interprets the user's question and formulates a communicative goal. Note that a communicative goal such as Achieve the state where hearer knows about concept c is really an abstract specification of the response to be produced.</Paragraph>
      <Paragraph position="2"> When a goal is posted, the planner identifies all of the potentially applicable operators by searching its library for all operators whose effect field matches the goal. To make this search more efficient, plan operators are stored in a discrimination network based on their effect field. For each operator found, the planner then checks to see if all of its constraints are satisfied. Those operators whose constraints are satisfied (or, in the case of user model constraints, can be assumed to be satisfied) become candidates for achieving the goal.</Paragraph>
      <Paragraph position="3"> From the candidates, the planner selects an operator based on several factors, including what the user knows (as indicated in the user model), the conversation that has occurred so far (as indicated in the dialogue history), the relative specificity of the candidate operators, and whether or not each operator requires assumptions to be made. The knowledge of preferences is encoded into a set of selection heuristics.</Paragraph>
      <Paragraph position="4"> A discussion of the selection process is beyond the scope of this paper; see Moore (in press) for details. Once a plan operator has been selected, it is recorded in the current plan node as the selected operator, and all other candidate plan operators are recorded in the plan node as untried alternatives. If the operator chosen requires any assumptions to be made, they are also recorded in the plan node. The planner then  expands the selected plan operator by posting its nucleus and required satellites as subgoals to be refined.</Paragraph>
      <Paragraph position="5"> Recall that in Section 4.2 we noted that a text generator must decide on the ordering of the text spans it produces. When instantiating an operator, the planner must decide on the ordering of the subgoals in the nucleus and satellite. There are really two ordering issues: (1) for each satellite, the planner must determine whether that satellite should appear before or after the nucleus, and (2) whenever there are multiple subgoals in a set of satellites, the planner must determine the order of these subgoals.</Paragraph>
      <Paragraph position="6"> In our current framework, the second type of ordering knowledge is compiled into our text-planning strategies. That is, subgoals should be expanded in the order in which they appear in the plan operator. This is the approach taken by most systems employing schemata, e.g., McKeown (1985); Paris (1991b), and is also common in systems that make use of linear planners, e.g., Cawsey (1993); Hovy (1991); Maybury (1992). Again, this leads to computational efficiency, since the planner does not have to reason about ordering among sibling subgoals. However, some flexibility is lost. We plan to investigate this tradeoff in our future work.</Paragraph>
      <Paragraph position="7"> To solve the first ordering problem, we appeal to ordering information provided by RST. Although RST does not strictly constrain ordering, Mann and Thompson (1988) have observed that, for some relations, one ordering is significantly more frequent than  Computational Linguistics Volume 19, Number 4 the other. In RST, these ordering observations are treated as &amp;quot;strong tendencies&amp;quot; rather than constraints. For example, in an EVIDENCE relation, the nuclear claim usually precedes the satellite that provides the evidence for the claim; in contrast, in a BACKGROUND relation, the satellite providing the background information necessary to understand the nuclear proposition usually precedes the nucleus. Our planner expands the nucleus before the satellite except for those relations that have been identified as having a typical order of satellite before nucleus.</Paragraph>
      <Paragraph position="8"> Finally, the planner must decide when to expand optional satellites. In the current implementation, the planner has two modes, terse and verbose. In terse mode, no optional satellites are expanded. In verbose mode, each satellite is checked against the user model, and those that do not duplicate the user's existing knowledge are expanded. For example, the plan operator in Figure 12 has two satellites. The first satellite calls for persuading the hearer to perform the act. The second satellite calls for making the hearer competent to perform the act and would, if expanded, provide any information the hearer needs to know in order to be capable of performing the act. These satellites are both marked &amp;quot;optional,&amp;quot; indicating that it would be sufficient to simply state the recommendation. In verbose mode, the planner will check each of these satellites against the user model. To avoid expanding the first satellite, the user model would have to contain information indicating that the user already has the goal of performing the recommended act. However, note that this would never be the case, since if it were, the system would not be planning text to make the hearer adopt the goal of doing the act. That is, it would not be expanding this operator in the first place. The second satellite provides a more interesting example. In verbose mode, if the user model indicates that the user knows how to perform replacement actions, the second satellite will not be expanded, since the system believes that the user has the knowledge necessary to perform the recommended act.</Paragraph>
      <Paragraph position="9"> The planner maintains an agenda of pending goals to be satisfied. When instantiating a plan operator, it creates a new node for the nucleus subgoal and puts it into a list. The planner then expands each of the required satellites and adds them to this list. If the satellite is one that should precede the nucleus, the new satellite node is appended to the front of the list. Otherwise, it is appended to the back of this list. The planner then considers each of the optional satellites. For each of the ones it decides to expand, the planner creates a new node and adds it to the appropriate end of the list. The list is then appended to the front of the agenda of goals to be expanded. In this way a text plan is built in depth-first order.</Paragraph>
      <Paragraph position="10"> When a speech act is reached, the system constructs a specification that directs the realization component, Penman (Mann and Matthiessen 1985; Penman Natural Language Generation Group 1989), to produce the corresponding English utterance.</Paragraph>
      <Paragraph position="11"> The system builds these specifications based on the type of speech act, its arguments, and the context in which it occurs. 12 As the planner examines each of the arguments of the speech act, new goals may be posted as a side effect. If one of the arguments is a concept that the user does not know (as indicated in the user model), a satellite subgoal to define this new concept is posted. In addition, if the argument is a complex data structure that cannot be realized directly by English lexical items, the planner will have to &amp;quot;unpack&amp;quot; this complex structure, and, in so doing, will discover additional concepts that must be mentioned. Again, if any of these concepts is not known to the user, subgoals to explain them are posted. An example of this phenomenon is given 12 Bateman and Paris (1989, 1991) are investigating the problem of phrasing utterances for different types of users and situations.</Paragraph>
      <Paragraph position="12">  SYSTEM You should replace (SETQ X 1) with (SETF X 1). SETQ can only be used to assign a value to a simple-variable. In contrast, SETF can be used to assign a value to any generalized-variable. A generalized-variable is a storage location that can be named by any accessor function.</Paragraph>
      <Paragraph position="13">  in the next section. In this way, our system can opportunistically define a new term when the need arises. Contrast this approach to the schema-based approach described earlier. Recall that handling this type of phenomenon with schemata would require that the definition of each schema explicitly include an optional Identification predicate after every entry in every schema.</Paragraph>
      <Paragraph position="14"> Planning is complete when all subgoals in the text plan have been refined to speech acts. It is important to note that text planning proceeds in such a way that speech acts are planned in the order in which they will appear in the final text. This provides two advantages. First, the text plan is a record of the system's utterances in the order in which they are generated. Because our text plans also include the intentional structure of the final text, focus information can be derived from a completed text plan, and there is no need to maintain a separate data structure for managing focus information. Second, by doing the planning in this manner, the planner can easily be extended for incremental generation in which planning and realization are interleaved.</Paragraph>
      <Paragraph position="15"> 6. Participating in Explanatory Dialogues: An Example In this section, we provide an example illustrating how our system constructs a text plan for recommending an action. We contrast the text plan produced by our system with the schema representation we showed in Figure 4, and show how the text plan may then be used to handle two follow-up questions. See Moore (in press) for a more detailed discussion of the planning process and additional examples.</Paragraph>
      <Paragraph position="16"> Let us return to the sample dialogue with the PEA system that was shown in Figure 3 and that we include again in Figure 18 for the reader's convenience. In this dialogue the user indicates a desire to enhance the readability and maintainability of his or her program. To enhance maintainability, the expert system determines that the user should replace SETQ with SETF. To recommend this transformation, the expert system posts the communicative goal (GOAL USER (DO USER REPLACE-I)) to the text planner. This goal says that the speaker would like to achieve the state where the hearer has adopted the goal of performing the act REPLACE-1.</Paragraph>
      <Paragraph position="17"> A plan operator capable of satisfying this goal was shown in Figure 12. The nucleus is expanded first, causing (RECOMMEND SYSTEM USER (DO USER REPLACE-I)) to be posted as a subgoal. RECOMMEND is a speech act goal that can be achieved directly, and thus expansion of this branch of the plan is complete. Focus information--the current focus and the potential focus list--is also updated at this point. In this case, the current focus is the act REPLACE-1 and the potential focus list includes the participants in this act, i.e., USER, SETQ, and SETF.</Paragraph>
      <Paragraph position="18"> Next, the planner must expand the satellites. Since both satellites are optional in this case, the planner must decide which, if any, are to be posted as subgoals. For the purposes of this example, assume that the planner is in verbose mode and that the user model indicates that the user has the knowledge necessary to perform replacement  Computational Linguistics Volume 19, Number 4 acts (i.e., he or she knows how to use the text editor). Thus, only the first satellite will be expanded, posting the communicative subgoal to achieve the state where the user is persuaded to perform the replacement, i.e., (PERSUADED USER (DO USER REPLACE-I)).</Paragraph>
      <Paragraph position="19"> A plan operator for achieving this goal using the rhetorical relation MOTIVATION was shown in Figure 14.</Paragraph>
      <Paragraph position="20"> When attempting to satisfy the constraints of the operator in Figure 14, the system first checks the constraint (STEP REPLACE-I ?goal). This constraint states that, in order to use this operator, the system must find a domain goal, ?goal, which REPLACE-I is a step in achieving. To find such goals, the planner searches the expert system's problem-solving knowledge. A detailed discussion of the expert system framework we use is beyond the scope of this paper. However, it is important to note that the type of explanation capability we are describing in this paper places stringent requirements on the way domain knowledge is represented and used in reasoning. Interested readers may find a thorough treatment of this topic in Swartout (1983), Clancey (1983), Neches, Swartout, and Moore (1985), Swartout, Paris, and Moore (1991), and Moore and Paris (1991).</Paragraph>
      <Paragraph position="21"> In this example, the applicable expert system goals, listed in order from most to least specific, are: apply-SETQ-t o-SETF-transf ormation apply-local-transformat ions-whose-rhs-use-is-more-general-than-lhs-use apply-local-t ransI ormat ions-that-enhance-maint ainabilit y apply-transformations-that-enhance-maintainability enhance-maintainability enhance-program Thus, six possible bindings for the variable ?goal result from the search for domain goals that REPLACE-I is a step in achieving.</Paragraph>
      <Paragraph position="22"> The second constraint of the current plan operator, (GOAL ?hearer ?goal), is a constraint on the user model stating that ?goal must be a goal of the hearer. Not all of the bindings found so far will satisfy this constraint. Those that do not will not be rejected immediately, however, as we do not assume that the user model is complete.</Paragraph>
      <Paragraph position="23"> Instead, they will be noted as possible bindings, and each will be marked to indicate that, if this binding is used, an assumption is being made, namely that the binding of ?goal is assumed to be a goal of the user. The selection heuristics can be set to tell the planner to prefer choosing bindings that require no assumptions to be made.</Paragraph>
      <Paragraph position="24"> In this example, since the user is employing the system to enhance a program and has indicated a desire to enhance the readability and maintainability of the program, the system infers the user shares the top-level goal of the system (ENHANCE-PROGRAM), as well as the two more specific goals ENHANCE-READABILITY and ENHANCE-MAINTAIN-ABILITY. Of these two more specific goals, only ENHANCE-MAINTAINABILITY is on the refinement path leading to the act REPLACE-I. Therefore, the two goals that completely satisfy the first two constraints of the operator shown in Figure 14 are ENHANCE-PROGRAM and ENHANCE-MAINTAINABILITY. Finally, the third constraint indicates that only the most specific goal along any refinement path to the act should be chosen. This constraint encodes the explanation principle that, in order to avoid explaining parts of the reasoning chain that the user is familiar with, when one goal is a subgoal of another, the goal that is lowest in the expert system's refinement structure, i.e., most specific, should be chosen. Note that ENHANCE-MAINTAINABILITY is a refinement of ENHANCE-PROGRAM. Therefore, ENHANCE-MAINTAINABILITY is now the preferred candidate binding for the variable ?goal.</Paragraph>
      <Paragraph position="25"> The last two constraints of the operator are also satisfied: REPLACE-1 is the current focus, and the operator is being used to expand a satellite branch of the text plan.</Paragraph>
      <Paragraph position="26">  Johanna D. Moore and C6cile L. Paris Planning Text for Advisory Dialogs The plan operator is thus instantiated with ENHANCE-MAINTAINABILITY as the binding for the variable ?goal. The selected plan operator is recorded as such, and all other candidate operators are recorded as untried alternatives.</Paragraph>
      <Paragraph position="27"> The nucleus of the chosen plan operator is now posted, resulting in the subgoal (MOTIVATION REPLACE-I ENHANCE-MAINTAINABILITY). The plan operator chosen for achieving this goal is the one shown in Figure 16. This operator motivates the replacement by describing differences between the object being replaced and the object replacing it with respect to the user's goal, i.e., by posting the subgoal (BEL USER (SOME-</Paragraph>
    </Section>
  </Section>
  <Section position="9" start_page="682" end_page="682" type="metho">
    <SectionTitle>
REF (DIFFERENCES-WRT-GOAL SETF-FUNCTION SETQ-FUNCTION ENHANCE-MAINTAINAB-
</SectionTitle>
    <Paragraph position="0"> ILITY) )). Although there are many differences between SETQ and SETF, only the differences relevant to the domain goal at hand (ENHANCE-MAINTAINABILITY) should be expressed.</Paragraph>
    <Paragraph position="1"> The relevant differences are determined in the following way. From the expert system's problem-solving knowledge, the planner determines what roles SETQ and SETF play in achieving the goal ENHANCE-MAINTAINABILITY. In this case, the system is enhancing maintainability by applying transformations that replace a specific construct with one that has a more general usage. SETQ has a more specific usage than SETF, and therefore the comparison between SETQ and SETF should be based on the generality of their usage. Thus, the goal:</Paragraph>
  </Section>
  <Section position="10" start_page="682" end_page="682" type="metho">
    <SectionTitle>
(BEL USER (REF (DIFFERENCE SETF-FUNCTION SETQ-FUNCTION) USE)).
</SectionTitle>
    <Paragraph position="0"> To satisfy this goal, the system uses an operator that informs the user that SETQ can be used to assign a value to a simple variable, and contrasts this with the use of SETF. Focus information is again updated at this point. SETF becomes the current focus, and USE, ASSIGN-T0-GV and its arguments, VALUE and GENERALIZED-VARIABLE, become potential foci.</Paragraph>
    <Paragraph position="1"> Finally, the text planner expands the speech act</Paragraph>
  </Section>
  <Section position="11" start_page="682" end_page="682" type="metho">
    <SectionTitle>
(INFORM SYSTEM USER (USE SETF-FUNCTION ASSIGN-T0-GV))
</SectionTitle>
    <Paragraph position="0"> in order to form a specification for the surface generator. In doing so, the system must express the complex concept ASSIGN-T0-GV where</Paragraph>
    <Paragraph position="2"/>
  </Section>
  <Section position="12" start_page="682" end_page="684" type="metho">
    <SectionTitle>
(DESTINATION GENERALIZED-VARIABLE)).
</SectionTitle>
    <Paragraph position="0"> When expressing processes such as ASSIGN, the system expresses the process itself, as well as the participants (e.g., OBJECT) involved in and circumstances (e.g., DESTINATION) surrounding the process. In this case, to express the concept ASSIGN-T0-GV, the system will express the assignment action, the object being assigned (i.e., VALUE), and the destination of the assignment (i.e., GENERALIZED-VARIABLE). To understand the final utterance that will be generated, the listener must know the concepts ASSIGN, VALUE, and GENERALIZED-VARIABLE.</Paragraph>
    <Paragraph position="1"> When building specifications for complex processes such as ASSIGN-T0-GV, the planner checks each of the fillers (e.g., VALUE, GENERALIZED-VARIABLE) of the roles (e.g., OBJECT, DESTINATION) of the concept (e.g., ASSIGN) to determine if the user knows that  Computational Linguistics Volume 19, Number 4 filler. If so, the planner can simply mention that filler concept by name in the generated text. If, on the other hand, the user model does not indicate that the user is familiar with the concept to be mentioned, the planner must either make an assumption that the user knows the concept (if the planner is in terse mode) or post a subgoal to make the hearer know this concept to the front of its current agenda (if the planner is in verbose mode).</Paragraph>
    <Paragraph position="2"> In the current example, recall that the planner is in verbose mode and further suppose that the user model indicates that the user knows the following concepts:  Thus, the user model indicates that the user knows the concepts ASSIGN and VALUE but has no indication that the user knows the concept GENERALIZED-VARIABLE. As a result, the system posts a subgoal to make the user know this concept, i.e., (KNOW-ABOUT USER (CONCEPT GENERALIZED-VARIABLE) ). Since GENERALIZED-VARIABLE is a member of the potential focus list, no special care need be taken to introduce it. This goal can thus be achieved by elaborating on the previous text to define this new term. This is done with a plan operator that describes concepts by stating their class membership and describing their attributes.</Paragraph>
    <Paragraph position="3"> The text plan for response 3 of the sample dialogue is now completed, and it is shown in its entirety in Figure 19. Contrast this text plan with the instantiated schema representation for the same utterance shown in Figure 4. Note that in addition to representing rhetorical relations between portions of the text (e.g., MOTIVATION, CONTRAST, and ELABORATION) that are analogous to the rhetorical predicates contained in the schema, the text plan includes the intentional structure of the text (as shown in Figure 5). Although in this case the text span boundaries coincide with the text segment (as defined by Grosz and Sidner) boundaries, this need not always be the case. In RST, a text span is either a minimal unit or a schema application made up of one or more relations between minimal units. The minimal unit is &amp;quot;essentially a clause, except that clausal subjects and complements and restrictive relative clauses are considered parts of their host clause rather than as separate units&amp;quot; (Mann and Thompson 1988, p. 248). Thus, at least at the lowest level of analysis, text spans can be determined on the basis of syntactic structure alone. For Grosz and Sidner, intentions are the basic determiner of discourse segmentation. A segment must have an identifiable discourse segment purpose (DSP), and embedding relationships between segments are a surface reflection of relationships among their associated DSPs (Grosz and Sidner 1986, pp. 177-178). Therefore, text spans and text segments are based on quite different criteria.</Paragraph>
    <Paragraph position="4"> In a text plan produced by our system, any subtree headed by a communicative goal (i.e., an intention) corresponds to a discourse segment in Grosz and Sidner's theory. A segment may consist of more than one span. In such cases, the spans will be connected by subject matter RST relations. Such a case would arise, for example, if the system needed to inform the hearer of a procedure involving several steps. The subtree for the entire procedure would be associated with an intention, and the spans stating the steps would be related to one another by SEQUENCE relations. To construct</Paragraph>
  </Section>
  <Section position="13" start_page="684" end_page="686" type="metho">
    <SectionTitle>
(INFORM SYSTEM USER N
(CLASS-ASCRIPTION GENERALIZED-VARIABLE
STORAGE-LOCATION
(RESTRICT NAMED-BY ACCESS-FUNC)))
</SectionTitle>
    <Paragraph position="0"> &amp;quot;&amp;quot; A generalized variable is a storage location that can be named by any access func~on.&amp;quot; Figure 19 Completed text plan for recommending replace SETQ with SETE plan operators for our system based on sample texts, we first segment the texts based on intentional structure. We then identify the rhetorical structure and relate it to the intentional structure. In our formalism, rhetorical techniques are viewed as linguistic strategies for achieving communicative intentions.</Paragraph>
    <Paragraph position="1"> In our system, after a text plan like the one shown in Figure 19 is constructed, it is recorded in the system's dialogue history and passed to the grammar interface, which translates the hierarchical text plan into a sequence of sentence specifications suitable for the sentence generator. In the process of this translation, the system decides where to place sentence boundaries and which, if any, connective markers to include. We currently use a simple set of heuristics that make these decisions based on the rhetorical relation between two text spans. To determine sentence boundaries, we have divided the space of RST relations into those that may appear within a clause complex and those that must start a new sentence. To choose connectives, we have associated a small set of possible connectives with each rhetorical relation. Some connectives are suitable for starting new sentences, while others are suitable for constructing complex sentences. If the system finds more than one suitable connective for expressing an RST relation, the first one is chosen.</Paragraph>
    <Paragraph position="2"> Thus, in translating the text plan shown in Figure 19 into response 3, we see that MOTIVATION, CONTRAST, and ELABORATION cause a sentence break. Although not shown  Computational Linguistics Volume 19, Number 4 in this example, other relations, e.g., MEANS, cause a complex sentence to be formed. In addition, the CONTRAST relation is expressed explicitly via the connective &amp;quot;In contrast,&amp;quot; whereas the MOTIVATION and ELABORATION relations do not cause any connectives to be added. The heuristics we have described here are clearly too simplistic, and we are currently working on more sophisticated techniques for linearizing our text plans. However, note that the information recorded in our text plan is already useful in making these decisions, and we believe this information will also facilitate more complex strategies.</Paragraph>
    <Section position="1" start_page="685" end_page="686" type="sub_section">
      <SectionTitle>
6.1 Recovering From Failure: Avoiding Repetition
</SectionTitle>
      <Paragraph position="0"> After the system has produced its recommendation, suppose that the user asks \[USER\] What is a generalized variable? \[4\] Recall that, from our analysis of naturally occurring dialogues such as the one shown in Figure 1, we observed that advice-seekers frequently ask such questions.</Paragraph>
      <Paragraph position="1"> The query analyzer interprets this question and formulates the communicative goal: (KNOW-ABOUT USER (CONCEPT GENERALIZED-VARIABLE)). At this point, the explainer must recognize that this goal was attempted by the last sentence of the previous explanation and was not fully achieved. Failure to do so might lead to simply repeating the description of a generalized variable that the user did not understand.</Paragraph>
      <Paragraph position="2"> Note that this is precisely what would occur if we had generated the previous explanation using the schema shown in Figure 4. From the schema, the system would not be able to recognize that part of the text previously generated was intended to make the user know about generalized variables. A schema to define a term would thus be triggered and it would give the same answer as was previously generated. Even if a schema-based system were to keep track of the information it already mentioned so that it could avoid literally repeating the same content, it would still not be capable of generating a new text, taking into consideration the fact that a goal was previously attempted but failed. The same argument can be made about a system that generates explanations based solely on &amp;quot;RST plans&amp;quot; of the type used in Hovy's (1991) structurer.</Paragraph>
      <Paragraph position="3"> By examining the text plan of the previous explanation recorded in the dialogue history, our system is able to determine whether the current goal (resulting from the follow-up question) is a goal that was previously attempted, as it is in this case.</Paragraph>
      <Paragraph position="4"> This time, when attempting to achieve the goal, the planner must select an alternative strategy. Recovery heuristics are responsible for selecting an alternative strategy when responding to such follow-up questions (Moore and Swartout 1989; Moore in press).</Paragraph>
      <Paragraph position="5"> One of these indicates that when the goal is to make the user know a concept, a good recovery strategy is to give examples.</Paragraph>
      <Paragraph position="6"> In the current case, the user model indicates that there are examples of the concept GENERALIZED-VARIABLE that the user is familiar with, namely CAR-0F-CONS and CDR-OF-CONS. Thus, the strategy of giving examples can be applied to yield the following system response: \[SYSTEM\] For example, the car of a cons is a generalized variable \[5\] named by the access function CAR, and the cdr of a cons is a generalized variable named by the access function CDR.</Paragraph>
      <Paragraph position="7"> Providing an alternative explanation would not be possible without the explicit representation of the intentional structure that underlies the generated text recorded in our text plans. To avoid repetitions, the system must realize what goals it has tried to achieve in the previous discourse and what strategies it used to achieve them. Then, if  Johanna D. Moore and C6cile L. Paris Planning Text for Advisory Dialogs the same goal is posted later in the discourse, the system can realize that its previous strategy was not successful and can employ an alternative strategy.</Paragraph>
      <Paragraph position="8"> A similar situation would arise if the user were to indicate that he or she has not been persuaded to replace SETQ with SETF, e.g., with an utterance such as: 13 \[USER\] I don't see why I should replace SETQ with SETF. \[4'\] This would cause the query analyzer to post a goal to persuade the user to perform this act. Once again, the system must recognize that it has already attempted to persuade the user to perform the act by contrasting the usage of SETQ with the usage of SETF.</Paragraph>
      <Paragraph position="9"> Since this strategy did not succeed, it must be able to persuade the user in a different way. From the information recorded in the text plan in Figure 19, our system can determine that it has already tried to persuade the user to do the replacement. Since MOTIVATION is a presentational RST relation, the system has no other strategies for achieving the persuade goal. However, it does have several other rhetorical strategies for MOTIVATION. One of these is to provide a trace of the expert system's reasoning and could be used to generate the following explanation: \[SYSTEM\] I'm trying to enhance the maintainability of the program \[5 ~\] by applying transformations that enhance maintainability.</Paragraph>
      <Paragraph position="10"> A transformation enhances maintainability if the usage of the construct on its right hand side is more general than usage of the construct on its left hand side. The usage of SETF is more general than the usage of SETQ.</Paragraph>
      <Paragraph position="11"> Again, a schema-based system would not be able to recover correctly from this failure. As we argued earlier, because the schemata do not record the intentions of the schema components, the system cannot determine that it contrasted SETQ with SETF in order to persuade the user to perform the replacement and thus would not know what part of the schema to replan nor what other strategies to try.</Paragraph>
      <Paragraph position="12"> 7. Comparison to Related Work Building on our work, Maybury (1992) devised a system to plan &amp;quot;communicative acts.&amp;quot; Like Appelt (1985, p. 9), Maybury's system makes use of a hierarchy of linguistic actions. At the highest level, Maybury has added rhetorical acts (e.g., describe, explain, convince). The next two levels correspond to the top two levels of Appelt's hierarchy: illocutionary acts (e.g., inform, request), and locutionary acts (assert, ask, command). TM The actions at each level in the hierarchy have been encoded into plan operators that are used in a process of hierarchical decomposition (Sacerdoti 1977) to refine rhetorical acts through illocutionary acts into locutionary acts.</Paragraph>
      <Paragraph position="13"> An example operator from Maybury's system is shown in Figure 20. This is a rhetorical operator that can be used to define an entity by giving its &amp;quot;logical definition&amp;quot; (the entity's genus and differentia.) Note that operators in Maybury's language 13 We do not currently allow users to ask questions phrased in such a manner because we do not have a sophisticated natural language understanding component. Instead, we have implemented a direct manipulation interface that allows users to use the mouse to point at the noun phrases or clauses in the text that were not understood or accepted. To approximate the query above, the user could highlight the system's original recommendation to replace SETQ with SETF, and select &amp;quot;Why?&amp;quot; from the menu that appears. As a result of interpreting this &amp;quot;Why?&amp;quot;, the system posts the communicative goal</Paragraph>
    </Section>
  </Section>
  <Section position="14" start_page="686" end_page="690" type="metho">
    <SectionTitle>
(PERSUADED USER (DO USER REPLACE-I)); see Moore and Swartout (1990) for more details. 14 Appelt has two layers below locutionary acts that are not included in Maybury's system: concept
</SectionTitle>
    <Paragraph position="0"> activation and utterance acts.</Paragraph>
    <Paragraph position="1">  Computational Linguistics Volume 19, Number 4 contain both a &amp;quot;header&amp;quot; and an &amp;quot;effects&amp;quot; field. The header field designates the type of act (e.g., define, inform) associated with the operator, whereas the effects field specifies the effect(s) that this act is expected to have on the hearer's mental state. To cause the planner to generate a text, a &amp;quot;discourse controller&amp;quot; posts a goal to cause an effect in the hearer's mental state, such as KNOW-ABOUT(USER, KC-135). This goal is matched against the effects field of operators in the plan library, and one is chosen. Therefore, at the top level, Maybury's system has a record of the intention causing the text to be produced. But now consider what happens during goal refinement. When an operator is instantiated, the clauses in its decomposition field are posted as subgoals. Note from Figure 20 that in Maybury's operator language expressions in the decomposition field are not subgoals to affect the hearer's mental state. Rather, they are linguistic actions such as define, inform, and assert. Unless these actions are locutionary, they become subgoals. To achieve an action subgoal, the planner matches the subgoal against the header field of operators in the plan library. When an operator is chosen, the propositions in the effects field are recorded, and the actions in the decomposition become further subgoals.</Paragraph>
    <Paragraph position="2"> So, in fact Maybury's system does not have a record of the intentional structure behind the text it is producing. There are two problems. First, except at the top level, planning is done by matching against the header field of operators, not against the effects field. Because there are multiple effects listed for most of the operators, the system cannot know which one is the intended effect! Maybury's system thus cannot distinguish between intended effects and side effects. It is crucial that agents be able to distinguish their intentions from the side effects of their actions in order to recover from plan failures (Davis 1979; Bratman 1987). Therefore, Maybury's system does not have the knowledge necessary for recovering from failures, as our system does.</Paragraph>
    <Paragraph position="3"> A second problem with Maybury's text plans is that they do not capture the relationship between intentions, i.e., that some intentions are in the plan because they in turn serve other intentions that appear higher in the plan tree. Once again, this is because Maybury's system simply records all the effects of each action. It is impossible to tell from the sets of effects at each level of the decomposition how effects are related to one another. Grosz and Sidner (1986) argue that such relations between intentions are a crucial part of intentional structure. Contrast Maybury's plans with those produced by our system. Our text plans explicitly represent the intended effects of actions and the relationships between these intentions. While we believe that it is useful to represent additional effects of operators, it is crucial to distinguish intended effects from side effects. Therefore, we argue that while Maybury's approach does indeed represent the effects of all of the &amp;quot;communicative acts&amp;quot; in his plans, it does not capture intentional structure and therefore cannot be used to recover from communication failures.</Paragraph>
    <Paragraph position="4"> In related work, Cawsey (1993) built a system called EDGE that allows the user to interrupt with clarification questions while a text is being generated. EDGE plans extended tutorial explanations about the structure and input/output behavior of simple electrical circuits. This system is novel because it addresses issues of conversation management, such as turn-taking and topic control. The system separates discourse planning rules from content planning rules for this purpose. EDGE plans an explanation at a high level, following a specified curriculum. This plan is fleshed out as the dialogue progresses, causing sentences to be generated. After each sentence is generated, the system pauses to allow the user to supply feedback by choosing an item from a menu.</Paragraph>
    <Paragraph position="5"> The EDGE discourse planner maintains an agenda indicating the topics that will be covered later in the explanation. Based on this agenda, the system can recognize when the user is asking about a topic that will be covered eventually, and can make  A plan operator for defining (from Maybury \[1992\]).</Paragraph>
    <Paragraph position="6"> comments such as We'll be getting to that in a moment. If this is not the case, or if the user insists, EDGE answers the user's question immediately. Once the interruption has been addressed, EDGE alters its subsequent explanation plan based on what took place during the interruption, and proceeds with its explanation as specified in the overall explanation plan. To resume from interruptions coherently, the discourse planning rules that manage the conversation include markers and meta-comments (e.g., Anyway, I was talking about... ).</Paragraph>
    <Paragraph position="7"> Because of its extended explanation plan and its discourse management rules, EDGE can handle interruptions in ways that are beyond the current capability of our system. However, as Cawsey points out, EDGE is based on a largely syntactic model of dialogue structure, and the system does not explicitly represent why different dialogue actions are selected. The effects that dialogue operators are intended to have on the user's knowledge and goals are not represented. EDGE content plans are much like schemas and therefore suffer from the limitations we discussed in Section 3.2.1. If a user fails to understand a text produced by a content plan, the system's only recourse is to try another strategy to achieve the top-level content goal, e.g., &amp;quot;describe how an entity works.&amp;quot; Moreover, the content plans of EDGE are domain-specific, and are not based on general rhetorical techniques. Because rhetorical structure is not represented, the system cannot choose connective markers or use other rhetorical devices that make text easier to comprehend. For these reasons, EDGE cannot handle the types of phenomenon our system handles.</Paragraph>
    <Paragraph position="8"> It is also important to note that, because we are dealing with expert and advisory applications, our system must be able to manage a dialogue whose structure emerges dynamically as the user asks questions. In advisory interactions, the system presents the user with a recommendation or result and only provides explanations when the user requests them. It is not appropriate for the system to plan extended explanations, testing the user's understanding and elaborating without provocation.</Paragraph>
    <Paragraph position="9"> Therefore, Cawsey's approach, which relies on the fact that the system has an extended explanation plan to follow, cannot be used directly.</Paragraph>
    <Paragraph position="10"> It is clear, however, that our approach and Cawsey's are complementary, and that a complete system would need to incorporate aspects of both. In particular, our system should be augmented to include conversation management operators in order to manage topic shifts, to handle interruptions, and to generate meta-comments about the discourse itself.</Paragraph>
    <Paragraph position="11">  Computational Linguistics Volume 19, Number 4 8. Status and Future Directions The text planner presented in this paper is implemented in Common Lisp and can produce the text plans necessary to participate in the sample dialogue described in this paper and several others; see Moore (in press) and Paris (1991a). We currently have over 150 plan operators that can answer the following types of questions:  -- Why? -- Why conclusion? -- Why are you trying to achieve goal? -- Why are you using method to achieve goal? -- Why are you doing act? -- How do you achieve goal (in the general case)? -- How did you achieve goal (in this case)? -- What is a concept? -- What is the difference between concept1 and concept2? -- Huh?  The text planner is being incorporated into several knowledge-based systems and two intelligent tutoring systems currently under development. Two of these systems are intended to be installed and used in the field. This will give us an opportunity to evaluate the techniques proposed here and extend the system as appropriate. It has also been employed in Reithinger's (1991) system for incremental language generation, and serves as the basis of the presentational planner for WIP, a multimedia system that plans text and graphics to achieve communicative goals (Wahlster et al. 1991). Finally, it is the basis for a text planner capable of generating explanatory texts that integrate examples with their surrounding context (Mittal and Paris 1993). This integration would not be possible without our system's explicit representation of the intentions for generating portions of the text and the rhetorical strategies used to achieve them.</Paragraph>
    <Paragraph position="12"> We have begun to investigate how the discourse history should be indexed and exploited to control the dialogue and affect subsequent responses in more general ways. As reported here, the dialogue history is used primarily to determine how to interpret and answer follow-up questions (e.g., Why?, How come?), and to determine how to respond when the user asks a question that has already been answered or indicates that an explanation was not understood (Huh?). In Carenini and Moore (1993) and Rosenblum and Moore (1993) we discuss additional ways in which prior explanations can affect the generation of the current utterance.</Paragraph>
    <Paragraph position="13"> We currently do not allow the user to return to a previous topic (e.g., once the system has moved on to a new topic, Let's go back to replacing SETQ with SETF... ), or to introduce new goals into the dialogue (e.g., Well, now suppose I wanted to enhance efficiency ... ). In order to allow the user to change topics and introduce new goals at will, the system will need to be able to track the user's shifting goals and attention. Sidner (1985) and Carberry (1987) have proposed approaches for tracking the topic of conversation in task-oriented dialogues. However, their approaches rely on the assumption that the topic of conversation closely follows the structure of the domain task. Litman and  Johanna D. Moore and C6cile L. Paris Planning Text for Advisory Dialogs Allen (1987) identified types of subdialogues in task-oriented interactions, including clarifications and corrections, in which topic shift deviates from task structure, and they devised a plan recognition model for handling such subdialogues. Our system currently handles what Litman and Allen call clarification subdialogues. We believe that our model could be extended to handle other types of subdialogues, and that the text plans recorded in our dialogue history will aid in more general discourse management tasks than the ones we currently address. To perform these tasks, our system must understand how the previous responses stored in its discourse history relate to one another. That is, we must address issues of how to build a representation of the intentional structure of the dialogue that is emerging across conversational turns (Grosz and Sidner 1986) and to track global focus (Grosz 1977). In addition, we will need communicative strategies for managing the dialogue, e.g., strategies for introducing a topic, strategies for returning to a topic, etc.</Paragraph>
  </Section>
class="xml-element"></Paper>