<?xml version="1.0" standalone="yes"?>
<Paper uid="H91-1064">
  <Title>DISCOURSE STRUCTURE IN THE TRAINS PROJECT</Title>
  <Section position="4" start_page="0" end_page="326" type="metho">
    <SectionTitle>
THE DATA
</SectionTitle>
    <Paragraph position="0"> We decided to collect our own data rather than using an existing source, such as the ATIS corpus, for several reasons. First, the dialogs in ATIS are structured to emphasize question answering rather than interactive problem solving. More importantly, the mixed-modality interaction of the ATIS scenario inhibits most natural spoken dialog phenomena. In particular, the long pauses before responses and the table-based system output prevent natural follow-up, such as the acknowledgements, clarifications and confirmations that are common in spoken dialog. Almost 50% of the speech collected in our more natural setting was of these types.</Paragraph>
    <Paragraph position="1"> The TRAINS domain was carefully designed so that a significant part of it is within reach of current (or near future) capabilities of plan reasoning systems. Because of this, we should be able to fully specify and implement the reasoning underlying the &amp;quot;system&amp;quot; in the dialogs. If ATIS were extended to be a travel-planner rather than a database, the domains would be comparable.</Paragraph>
    <Paragraph position="2"> We have collected an initial corpus of natural spoken conversations between two people engaged in complex problem solving in the TRAINS world. One person (the &amp;quot;system&amp;quot;) has most of the information and detail about the domain, but the other has the problem to solve. The two are in different rooms and so have no visual contact, but they both have the same map from which to work. A fragment from one of the dialogs is shown in Figure 1. Each utterance is roughly classified as to its function: whether it is primarily concerned with making progress on solving the problem (plain text), or whether it is primarily concerned with maintaining the conversation itself (in bold). The agents are labelled &lt;H&gt; (for human) and &lt;S&gt; (for system), even though the system here was simulated by a person. Comments on the possible discourse function of the utterances concerned with maintaining the conversation are presented in italics.</Paragraph>
    <Paragraph position="3"> As can be seen, approximately half of the utterances are concerned with maintaining the communication process.</Paragraph>
    <Paragraph position="4"> There are utterances that identify the goals of the next stretch of discourse, and a large number of utterances that pertain to acknowledging the other agent's utterances and to maintaining a smooth flow of control (i.e. identifying whose turn it is to speak). It has been our claim for some time that this level of discourse interaction must be explicitly modelled if we are to build systems that can converse in natural language, and in previous papers we have described a plan-based model that accounted for clarification subdialogs among other things (Litman &amp; Allen, 1990; Litman &amp; Allen, 1987). We are now attempting to develop an extended model that can account for all the discourse-level interactions found in the corpus. The project is pursuing two main thrusts. First, we are developing a database for studying discourse phenomena.</Paragraph>
    <Paragraph position="5"> To do this, we are developing a taxonomy of discourse-level acts with which different people can independently classify each utterance reliably. Using this classification, we are building a database of dialogs with each utterance annotated by its discourse function. The Figure 1 fragment interleaves the transcript with the comments on discourse function (bracketed here):
&lt;H&gt; ok, now uhh, let me, let me check on the uhh
&lt;H&gt; where the.. where the engines are and the.. the boxcars are uhh
[setting the immediate conversation goals for the following dialog fragment]
&lt;H&gt; I'm assuming,
[indicating that &lt;H&gt; is asking for confirmation]
&lt;H&gt; let's see, that uhh
[&lt;H&gt; is holding the turn while he examines the map]
&lt;H&gt; I have two engines to work with, engine E2 which is at city D
&lt;H&gt; and engine E3 which is at city A
&lt;S&gt; aah, yes.</Paragraph>
    <Paragraph position="6"> [the &amp;quot;aah&amp;quot; probably indicates that &lt;S&gt; is thinking about the answer (and acknowledging that the question was understood)]
&lt;H&gt; and uh, I've got two tankers, tanker t1 is at city A,
&lt;H&gt; and tanker t2 is at city B
&lt;S&gt; that's right, hnn, hnn.</Paragraph>
    <Paragraph position="7"> &lt;H&gt; ok
[&lt;H&gt; indicates that he has accepted &lt;S&gt;'s reply]
&lt;S&gt; there're.. there're other tankers as well.</Paragraph>
    <Paragraph position="8"> &lt;H&gt; ok
[&lt;H&gt; acknowledges &lt;S&gt;'s introduction of new information]
&lt;S&gt; there're actually four tankers at city E
[&amp;quot;actually&amp;quot; indicates that &lt;S&gt; believes &lt;H&gt; doesn't know about these tankers]
&lt;H&gt; four tankers at city E, ok
[&lt;H&gt; acknowledges hearing the new information, and then accepts it]
&lt;H&gt; uhh so, tankers t3, t4, t5, and t6 are all at city E.
[&lt;H&gt; confirms his understanding of &lt;S&gt;'s assertions]
&lt;S&gt; that's right
[&lt;S&gt; confirms &lt;H&gt;'s confirmation]
&lt;H&gt; ok. and just uh
[&lt;H&gt; acknowledges the previous exchange and signals a move to a new topic]
&lt;H&gt; I have four boxcars, b6 at city H, b5 at city F,
&lt;H&gt; b7 at city B, and b8 at city I.</Paragraph>
    <Paragraph position="9"> In addition to these discourse annotations, we are analyzing the tapes and extracting prosodic information (primarily pitch contours and speech rate) and adding this information to the database as well. We have started some preliminary studies on prosodic cues to the discourse acts in our taxonomy, but need to analyze additional data before we have significant results. Second, we are developing a system that implements the discourse model together with full natural language processing and plan reasoning in the domain. In this paper, I will mainly describe the problems we are facing and the initial taxonomy developed so far. At the end, I will briefly describe the discourse model in the current implementation.</Paragraph>
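The annotation scheme just described, where each utterance carries a discourse-function label and associated prosodic features, might be sketched as a simple record type. This is an illustrative schema only; the field names, the label strings, and the `Utterance` class are assumptions, not the project's actual database format:

```python
from dataclasses import dataclass, field

@dataclass
class Utterance:
    """One annotated utterance in the dialog database (illustrative schema)."""
    speaker: str   # "H" (human) or "S" (system)
    text: str      # the transcribed utterance
    act: str       # discourse-act label from the taxonomy, e.g. "Ack", "Inf"
    prosody: dict = field(default_factory=dict)  # e.g. pitch contour, speech rate

# a tiny corpus fragment, labels assigned per the taxonomy below
corpus = [
    Utterance("H", "I have two engines to work with, engine E2 which is at city D", "Inf"),
    Utterance("S", "aah, yes.", "Ack", {"rate": "slow"}),
]

# utterances that maintain the conversation rather than advance the plan
maintenance = [u for u in corpus if u.act in {"Ack", "reqClr", "sugCor"}]
```

Such records make it straightforward to compute, for instance, what fraction of a dialog is conversation-maintenance rather than problem-solving speech.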
    <Paragraph position="10"> THE TAXONOMY
Rather than analyze the dialogs in terms of abstract discourse relations, our taxonomy is based entirely on the intentions of the speaker. This allows us to integrate well with previously developed computational speech act models, and provides a slightly different view from the other approaches. It is important to remember that just because a speaker intended an utterance in a certain way, it does not mean that the hearer understands it that way.</Paragraph>
    <Paragraph position="11"> Establishing agreement between the speaker and hearer as to what was intended is the primary reason for acknowledgements, clarifications and corrections. In addition, even if an utterance is understood correctly, this does not commit the hearer to accepting the intended consequences of the act (e.g. believing the speaker's assertion, or performing the requested act). Acceptance thus involves mechanisms beyond acknowledgment.</Paragraph>
    <Paragraph position="12"> As we define the set of speech act types, it is important to realize that nearly every speech act can be used at different levels of the conversation: it can involve the plan in the TRAINS world (the domain level), the problem solving process that the two agents are engaged in (the problem solving level), or the understanding and managing of the conversation itself (the discourse level). We will try to give examples of the acts at each level as they are defined. Because of the focus on the discourse-level acts in this paper, we will often distinguish these as separately named acts.</Paragraph>
    <Paragraph position="13"> The speech acts themselves break into three major classes: the understanding acts, which include acknowledgements and confirmations; the information acts, which involve imparting information and include informs, elaborations, clarifications, corrections and summarizations; and the co-ordination acts, which involve co-ordinating the activities of the two agents and include requests, suggestions, acceptances and so on.</Paragraph>
    <Paragraph position="14"> Throughout, we will refer to the agent performing the speech act as the speaker, and to the other participant simply as the other agent.</Paragraph>
    <Paragraph position="15">  There is not space here to precisely define each act, but I would like to present the entire taxonomy. To do this, some of the acts will simply be illustrated by an example.</Paragraph>
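As a rough sketch of the taxonomy, the three major classes, the three conversational levels, and the act abbreviations used in the section headings below might be encoded as enumerations. The grouping comes from the text; the Python names themselves are illustrative:

```python
from enum import Enum

class ActClass(Enum):
    """Top-level classes of speech acts in the TRAINS taxonomy."""
    UNDERSTANDING = "understanding"    # acknowledgements, confirmations
    INFORMATION = "information"        # informs, clarifications, corrections, ...
    COORDINATION = "co-ordination"     # requests, suggestions, acceptances, ...

class Level(Enum):
    """Conversational level at which an act operates."""
    DOMAIN = "domain"                  # the plan in the TRAINS world
    PROBLEM_SOLVING = "problem solving"
    DISCOURSE = "discourse"            # managing the conversation itself

class Act(Enum):
    """Act labels, using the abbreviations given in the paper's headings."""
    ACK = "Ack"        # acknowledgment
    INF = "Inf"        # inform
    CLR = "Clr"        # clarification
    ELAB = "Elab"      # elaboration
    SUM = "Sum"        # summary
    REQ = "Req"        # request
    WHQ = "WHQ"        # wh-question
    YNQ = "YNQ"        # yes-no question
    REQ_CLR = "reqClr" # clarification request
    SUG_COR = "sugCor" # correction suggestion
    ACC = "Acc"        # accept (printed as "Ace" in the source; presumably "Acc")
    DEN = "Den"        # denial or rejection
    EVAL = "Eval"      # evaluative statement
```

An annotation tool could then restrict coders to exactly these labels, which is what makes inter-annotator reliability measurable.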
  </Section>
  <Section position="5" start_page="326" end_page="326" type="metho">
    <SectionTitle>
THE UNDERSTANDING ACTS
</SectionTitle>
    <Paragraph position="0"> The understanding acts specifically relate to indicating the successful hearing of the other agent's utterances.</Paragraph>
    <Section position="1" start_page="326" end_page="326" type="sub_section">
      <SectionTitle>
Acknowledgment (Ack)
</SectionTitle>
      <Paragraph position="0"> An acknowledgment indicates that the speaker has understood the other agent's previous utterance, but does not necessarily commit the speaker to agreeing with the other agent. In particular, an acknowledgement need not be an acceptance of the other agent's request.
Confirmation
A confirmation act is a special form of acknowledgment that involves restating or paraphrasing information established previously in the conversation. If there is any doubt implied in the utterance, say by using a question intonation, then the utterance is a clarification request rather than a confirmation.</Paragraph>
      <Paragraph position="1"> A further class of understanding acts serves mainly to maintain the speaker's turn. This is a wide-ranging class and includes any utterance whose main purpose is to hold the turn, although such utterances may also serve as an acknowledgement.
&lt;S&gt; where it will then pick up the orange juice
&lt;S&gt; and uhhh ...</Paragraph>
      <Paragraph position="2"> &lt;S&gt; and then take that to city G</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="326" end_page="327" type="metho">
    <SectionTitle>
THE INFORMATION ACTS
</SectionTitle>
    <Paragraph position="0"> Information acts involve making claims about the state of the world. The prototypical speech act in this class in the speech act literature is the inform act. We will break down informs at the discourse level into clarifications, corrections, elaborations and summarizations.</Paragraph>
    <Paragraph position="1"> Inform (Inf)
An inform act in the TRAINS domain is generally either a response to a question, or a situation-setting action that describes background information necessary to understand the problem. Inform is the default assignment for acts in this class if none of the following acts seems appropriate.
&lt;H&gt; Where's e3?
&lt;S&gt; e3 is just coming in to city A.</Paragraph>
    <Section position="1" start_page="326" end_page="327" type="sub_section">
      <SectionTitle>
Clarifications (Clr)
</SectionTitle>
      <Paragraph position="0"> A clarification is an utterance that provides additional information to help the interpretation of the previous utterance. Utterances that provide information which is not necessary to understand the previous utterance are not clarifications, but rather elaborations.
Tag
A tag asks the hearer to confirm the previous utterance. The name Tag is assigned as this is the role that is played by tags in sentences such as &amp;quot;John is coming to the party, isn't he?&amp;quot;. The tag indicates that the utterance is a question rather than an assertion. A tag can be deleted without affecting the dialog (if the previous utterance is treated appropriately as indicated by the tag).
Correction
A correction is a special form of clarification that replaces some earlier information with the new information.</Paragraph>
      <Paragraph position="1"> Corrections often follow utterances that signal some problem, such as No, or oops, and so on. Corrections can also appear mid-way through an utterance when the speaker needs to correct something uttered earlier in the sentence.</Paragraph>
      <Paragraph position="2"> &lt;S&gt; e3 is on its way to tl &lt;S&gt; oops with tanker tl &lt;S&gt; full of orange juice Elaboration (Elab) An elaboration is an inform that further develops a previous topic. The information is not needed in order to understand the previous sentence (in which case it would be a clarification).</Paragraph>
      <Paragraph position="3"> Summary (Sum)
A summary act is an inform that restates what has been asserted or decided upon in the previous utterances, or draws conclusions from what was previously asserted.
&lt;S&gt; uh we can actually have the orange juice made by uhh twelve pm tonight
&lt;S&gt; so there should be plenty.. plenty of time</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="327" end_page="328" type="metho">
    <SectionTitle>
THE CO-ORDINATION ACTS
</SectionTitle>
    <Paragraph position="0"> These acts involve the two agents co-ordinating their activities by making requests and suggestions and reaching agreement after negotiation. As mentioned above, this co-ordination can occur at the three different levels of conversation. As before, the acts at the discourse level will be given special treatment as subclasses of the general cases.</Paragraph>
    <Section position="1" start_page="327" end_page="328" type="sub_section">
      <SectionTitle>
Request (Req)
</SectionTitle>
      <Paragraph position="0"> A request involves one agent attempting to get the other agent to do something by direct means. If a request is not taken up, it must be explicitly denied by the hearer, either by stating that he won't comply or by suggesting a modification to the requested action. The requested action may be either a domain act, as in:
&lt;H&gt; Can you have city I fill B6 with oranges, please
or a problem solving act, as in:
&lt;H&gt; Let me know when E3 has B6 loaded.</Paragraph>
      <Paragraph position="1"> Particular subclasses of requests involving questions are treated individually as they have their own specific syntactic markers in language.</Paragraph>
      <Paragraph position="2"> Wh Question (WHQ)
Wh-questions are true questions where the speaker is actually asking the hearer for information about a specific entity. An example at the domain level is:
&lt;H&gt; How much does it cost to dump it on the ground?
and at the problem solving level is:
&lt;H&gt; What should we do?
Yes-No Question (YNQ)
These are true yes/no questions, where the speaker would be content with a simple yes or no answer. If additional information does seem to be required, then the original question was probably an indirect request or WHQ. An example at the domain level is:
&lt;H&gt; Is e3 at city I?
and at the problem solving level is:
&lt;H&gt; are you uhh trying to compute the time to take E2 with T3 and T4
Requests and questions at the discourse level are typically clarification requests, which are marked in their own category below.</Paragraph>
      <Paragraph position="3"> Clarification Request (reqClr)
A clarification request is a request for information to help interpret some previous utterance(s), i.e. a request for a clarification.
Suggestion
A suggestion is an attempt to get the other agent to do something, but is weaker than a request. Suggestions explicitly leave open an option of negotiation between the agents, often by using the first person plural to include both agents in the suggested action. An example at the domain level is:
&lt;S&gt; Why don't we begin loading oranges in boxcar B6
and at the problem solving level:
&lt;S&gt; Shall we look at the other engine?
and at the discourse level:
&lt;H&gt; Well, lets talk about orange juice
Correction Suggestion (sugCor)
Other suggestions at the discourse level may be correction suggestions. In the following example, the correction suggestion is followed by its acceptance (i.e. a correction).</Paragraph>
      <Paragraph position="4"> &lt;S&gt; second engine E3 is going to uhh city H to pick up the bananas &lt;S&gt; back to A, dro ..................</Paragraph>
      <Paragraph position="5"> &lt;H&gt; ....... H to pick up the oranges &lt;S&gt; sorry, pick up the oranges &lt;S&gt; back to A to drop the oranges off Accept (Ace) Art accept indicates that the hearer has accepted the act in the previous utteranee, be it a request, suggest, inform of whatever. After an agent has done an accept, they are committed to whatever the speech act that was accepted requires. Accepts can also be implicit if the agent  continues on without explicit denial. Examples often overlap with acknowledgments. Here is a suggestion at the domain level that is accepted: &lt;H&gt; and in the mean time, it would be nice if city H could be filling B6 with oranges &lt;S&gt; OK, it looks like we can do that Denial or Rejeetance (Den) A Denial is the opposite of an acceptance. As with accept acts, one can deny requests, suggestions or many other acts. There are not many denials in the current dialogs as the conversants are quite co-operative!. But they do occur occasionally. Here's an acknowledgement of a request followed by a denial.</Paragraph>
      <Paragraph position="6"> &lt;H&gt; have city B prepare for its arrival, it should unload b8 processing orange juice and load t2.</Paragraph>
      <Paragraph position="7"> &lt;S&gt; OK, ah but tanker t2 is currently full of beer.</Paragraph>
    </Section>
    <Section position="2" start_page="328" end_page="328" type="sub_section">
      <SectionTitle>
Evaluative Statement (Eval)
</SectionTitle>
      <Paragraph position="0"> An evaluative statement describes the reaction of the speaker to the current situation. Such statements serve a confirmation or denial role, or express more subtle shades in between.</Paragraph>
      <Paragraph position="1"> Typical Phrases: great!, terrific!, yuk!, how nice!
&lt;S&gt; looks like we can do that
&lt;H&gt; Terrific</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="328" end_page="329" type="metho">
    <SectionTitle>
THE CURRENT SYSTEM
</SectionTitle>
    <Paragraph position="0"> Eventually, we intend to develop a model that defines each of the above discourse acts in terms of the changes that the act makes to the shared and individual beliefs and goals of the two participants in the dialog. The current system, however, is quite simple and was constructed mainly to define the overall architecture of the system. The current discourse model has the following basic capabilities:
* maintaining knowledge of turn taking (i.e. whose responsibility it is to speak next);
* tracking the status of each fragment of the plan as it is suggested and discussed;
* tracking and responding to simple discourse obligations (e.g. answering questions).</Paragraph>
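A minimal sketch of these three capabilities, assuming a two-party turn exchange and a first-in-first-out queue of obligations (both simplifications introduced here; the paper does not specify the system's internal representations at this level of detail):

```python
class DiscourseState:
    """Illustrative sketch of the current discourse model's capabilities:
    turn taking, plan-fragment status, and discourse obligations."""

    def __init__(self, first_speaker="H"):
        self.turn = first_speaker   # whose responsibility it is to speak next
        self.obligations = []       # e.g. questions that are owed an answer
        self.fragments = {}         # plan fragment -> discussion status

    def heard(self, speaker, act, content=None):
        """Update state after an utterance labeled with a discourse act."""
        if act in ("WHQ", "YNQ", "Req"):
            self.obligations.append(content)     # an answer (or denial) is now owed
        if act == "Sug":
            self.fragments[content] = "proposed"  # fragment now under discussion
        # the turn passes to the other agent
        self.turn = "S" if speaker == "H" else "H"

    def next_obligation(self):
        """Pop the oldest unmet obligation, if any."""
        return self.obligations.pop(0) if self.obligations else None
```

For example, after hearing the human's yes/no question "Is e3 at city I?", the state records that the system holds the turn and owes an answer.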
    <Paragraph position="1"> The discourse module uses the domain plan reasoner, which uses planning and plan recognition techniques, to maintain the domain plans. It calls the domain reasoner to verify hypotheses about the discourse function of the utterances, and to update the state of the plan as needed. Plan fragments in the knowledge base are characterized by six modalities that are used to indicate the status of the parts of the plans being discussed. These are organized hierarchically with inheritance so that we can examine the full plan from either the human's or the system's perspective, as shown in Figure 2.</Paragraph>
    <Paragraph position="2"> The modalities include:
* the plan fragment suggested by the human but not yet acknowledged by the system (Human-Proposed-Plan-Private);
* the plan fragment suggested by the system and not yet acknowledged by the human (System-Proposed-Plan-Private);
* the plan fragment suggested by the human and acknowledged but not yet accepted by the system (Human-Proposed-Plan);
* the plan fragment suggested by the system and acknowledged but not yet accepted by the human (System-Proposed-Plan);
* the plan fragment that is shared between the two (i.e. accepted by both) (Shared-Plan); and
* the plan fragment constructed by the system but not yet suggested (System-Private-Plan).</Paragraph>
    <Paragraph position="3"> Each context is associated with a particular form of plan reasoning, as indicated in the figure. In particular, the plan in the System-Private-Plan context is extended by plan construction (essentially classical planning), whereas the plans in all the other contexts are extended by plan recognition relative to the appropriate set of beliefs. Figure 2 also shows how plan fragments may move between the various contexts. A suggestion from the human enters a new plan fragment into the Human-Proposed-Plan-Private context and initiates plan recognition with respect to what the system believes about the human's private beliefs. Once acknowledged, this suggestion becomes &amp;quot;public&amp;quot; (i.e. it is in Human-Proposed-Plan). An acceptance from the system would then move that plan fragment into the Shared-Plan context, again invoking plan recognition.</Paragraph>
    <Paragraph position="4"> Planning by the system results in new actions in the System-Private-Plan context. To make these actions part of the Shared-Plan context, the system must suggest the actions and then depend on the human to acknowledge and accept them. This model, while still crude by philosophical standards, is rich enough to model a wide range of the discourse acts involving clarification, acknowledgment and the suggest/accept speech act cycle ever-present in dialogs in this setting.</Paragraph>
    <Paragraph position="5"> Because of the inheritance through the spaces, when the system is planning in the System-Private-Plan context, it sees a plan consisting of all the shared goals and actions, what it has already suggested, and all the new actions it has introduced into the plan privately but not yet suggested. Consider an example. Assume that the Shared-Plan context contains a plan to move some oranges to a factory at B, but there is no specification of the engine to be used. The system might plan to use engine E3. At this stage, the plan from the System-Private-Plan context involves E3.</Paragraph>
    <Paragraph position="6"> The plan in the System-Proposed context, however, is still the same as the plan in the Shared-Plan context, which still does not identify which engine to use. When the system makes the suggestion, the plan fragment involving E3 is added to the System-Proposed-Plan-Private context. An acknowledgment from the human results in this plan fragment being added to the System-Proposed-Plan context known to both agents. If the human then accepts this, it becomes part of the shared plan. If the human rejects the suggestion, then E3 does not become part of the shared plan (at least, not without further discussion).</Paragraph>
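The suggest/acknowledge/accept cycle traced in this example can be sketched as plan fragments moving between named contexts. The context names follow the modality list above; representing each context as a set of fragments, and the `PlanContexts` class itself, are illustrative assumptions rather than the system's actual implementation:

```python
class PlanContexts:
    """Plan fragments organized by context; fragments move between contexts
    as they are suggested, acknowledged and accepted (sketch)."""

    CONTEXTS = [
        "System-Private-Plan",
        "Human-Proposed-Plan-Private", "System-Proposed-Plan-Private",
        "Human-Proposed-Plan", "System-Proposed-Plan",
        "Shared-Plan",
    ]

    def __init__(self):
        self.plans = {c: set() for c in self.CONTEXTS}

    def suggest(self, agent, fragment):
        # a new suggestion enters the proposer's not-yet-acknowledged context
        self.plans[agent + "-Proposed-Plan-Private"].add(fragment)

    def acknowledge(self, agent, fragment):
        # acknowledgment by the hearer makes the proposal "public"
        self.plans[agent + "-Proposed-Plan-Private"].discard(fragment)
        self.plans[agent + "-Proposed-Plan"].add(fragment)

    def accept(self, agent, fragment):
        # acceptance moves the fragment into the shared plan
        self.plans[agent + "-Proposed-Plan"].discard(fragment)
        self.plans["Shared-Plan"].add(fragment)

# the engine E3 example: suggestion, acknowledgment, then acceptance
ctx = PlanContexts()
ctx.suggest("System", "use engine E3")
ctx.acknowledge("System", "use engine E3")
ctx.accept("System", "use engine E3")
```

A rejection would simply be the absence of the final `accept` call, leaving the fragment out of the Shared-Plan context pending further discussion.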
    <Paragraph position="7"> This work has been done in conjunction with Shin'ya Nakajima and David Traum. It was supported in part by ONR/DARPA contract number N00014-82-K-0193.</Paragraph>
  </Section>
</Paper>