<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-0311">
  <Title>A Compact Architecture for Dialogue Management Based on Scripts and Meta-Outputs</Title>
  <Section position="3" start_page="0" end_page="55" type="metho">
    <SectionTitle>
2 Theoretical Ideas
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="54" type="sub_section">
      <SectionTitle>
2.1 Scripts vs Logical Forms
</SectionTitle>
      <Paragraph position="0"> Let's first look in a little more detail at the question of what the output representation of a spoken language interface to a semi-autonomous robot/agent should be. In practice, there seem to be two main choices: atheoretical representations, or some kind of logic.</Paragraph>
      <Paragraph position="1"> Logic is indeed an excellent way to think about representing static relationships like database queries, but it is much less clear that it is a good way to represent commands. In real life, when people wish to give a command to a computer, they usually do so via its operating system; a complex com- null mand is an expression in a scripting language like CSHELL, Perl, or VBScript. These languages are related to logical formalisms, but cannot be mapped onto them in a simple way. Here are some of the obvious differences: * A scripting language is essentially imperative, rather than relational.</Paragraph>
      <Paragraph position="2"> * The notion of temporal sequence is fundamental to the language. &amp;quot;Do P and then Q&amp;quot; is not the same as &amp;quot;Make the goals P and Q true&amp;quot;; it is explicitly stated that P is to be done first. Simi~ larly, &amp;quot;For each X in the list (A B C), do P(X)&amp;quot; is not the same as &amp;quot;For all X, make P(X) true&amp;quot;; once again, the scripting language defines an order, but no~ the logical language 2.</Paragraph>
      <Paragraph position="3"> * Scripting languages assume that commands do not always succeed. For example, UNIX-based scripting languages like CSHELL provide each script with the three predefined streams stdin, stdout and stderr. Input is read from stdin and written to stdout; error messages, warnings and other comments are sent to stderr.</Paragraph>
      <Paragraph position="4"> Wedo not think that these properties of scripting language are accidental. They have evolved as the result of strong selectional pressure from real users with real-world tasks that need to be carried out, and represent a competitive way to meet said users' needs. We consequently think it is worth taking seriously the idea that a target representation produced by a spoken language interface should share many of these properties.</Paragraph>
    </Section>
    <Section position="2" start_page="54" end_page="55" type="sub_section">
      <SectionTitle>
2.2 Fallible Interpretation: Outputs and
Meta-outputs
</SectionTitle>
      <Paragraph position="0"> We now move on to the question of modelling the interpretation process, that is to say the process that converts the input (speech) signal to the output (executable) representation. As already indicated, we think it is important to realize that interpretation is a process which, like any other process, may succeed more or less well in achieving its intended goals.</Paragraph>
      <Paragraph position="1"> Users may express themselves unclearly or incompletely, or the system may more or less seriously fail to understand exactly what they mean. A good interpretation architecture will keep these considerations in mind.</Paragraph>
      <Paragraph position="2"> Taking our lead from the description of scripting languages sketched above, we adapt the notion of the &amp;quot;error stream&amp;quot; to the interpretation process. In the course of interpreting an utterance, the system 2In cases like these, the theorem prover or logic programming interpreter used to evaluate the logical formula typically assigns a conventional order to the conjuncts; note however that this is part of the procedural semantics of the theorem prover/interpreter, and does not follow from the declarative semantics of the logical formalism.</Paragraph>
      <Paragraph position="3">  translates it into successively &amp;quot;deeper&amp;quot; levels of representation. Each translation step has not only an input (the representation consumed) and an output (the representation produced), but also something we will refer to as a &amp;quot;meta-output': this provides information about how the translation was performed.</Paragraph>
      <Paragraph position="4"> At a high level of abstraction, our architecture will be as follows. Interpretation proceeds as a series of non-deterministic translation steps, each producing a set of possible outputs and associated metaoutputs. The final translation step produces an executable script. The interface attempts to simulate execution of each possible script produced, in order to determine what would happen if that script were selected; simulated execution can itself produce further meta-outputs. Finally, the system uses the meta-output information to decide what to do with the various possible interpretations it has produced.</Paragraph>
      <Paragraph position="5"> Possible actions incl~tde selection and execution of an output script, paraphrasing meta-output information back to the user, or some combination of the two.</Paragraph>
      <Paragraph position="6"> In the following section, we present a more detailed description showing how the output/metaoutput'distinction works in a practical system.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="55" end_page="56" type="metho">
    <SectionTitle>
3 A Prototype Implementation
</SectionTitle>
    <Paragraph position="0"> The ideas sketched out above have been realized as a prototype spoken language dialogue interface to a simulated version of the Personal Satellite Assistant (PSi; (PSA, 2000)). This section gives an overview of the implementation; in the following section, we focus on the specific aspects of dialogue management which are facilitated by the output/meta-output architecture. null</Paragraph>
    <Section position="1" start_page="55" end_page="55" type="sub_section">
      <SectionTitle>
3.1 Levels of Representation
</SectionTitle>
      <Paragraph position="0"> The real PSA is a miniature robot currently being developed at NASA Ames Research Center, which is intended for deployment on the Space Shuttle and/or International Space Station. It will be capable of free navigation in an indoor micro-gravity environment, and will provide mobile sensory capacity as a backup to a network of fixed sensors. The PSA will primarily be controlled by voice commands through a hand-held or head-mounted microphone, with speech and language processing being handled by an offboard processor. Since the speech processing units are not in fact physically connected to the PSA we envisage that they could also be used to control or monitor other environmental functions. In particular, our simulation allows voice access to the current and past values of the fixed sensor readings.</Paragraph>
      <Paragraph position="1"> The initial PSA speech interface demo consists of a simple simulation of the Shuttle. State parameters include the PSA's current position, some environmental variables such as local temperature, pressure and carbon dioxide levels, and the status of the Shuttle's doors (open/closed). A visual display gives direct feedback on some of these parameters.</Paragraph>
      <Paragraph position="2"> The speech and language processing architecture is based on that of the SRI CommandTalk system (Moore et al., 1997; Stent et al., 1999). The system comprises a suite of about 20 agents, connected together using the SRI Open Agent Architecture (OAA; (Martin et al., 1998)). Speech recognition is performed using a version of the Nuance recognizer (Nuance, 2000). Initial language processing is carried out Using the SRI Gemini system (Dowding et al., 1993), using a domain-independent unification.</Paragraph>
      <Paragraph position="3"> grammar and a domain-specific lexicon. The language processing grammar is compiled into a recognition grammar using the methods of (Moore et al., 1997); the n~ resnlt is that only grammatically well-formed utterances Gan be recognized. Output from the initial language-processing step is represented in a version of Quasi Logical Form (van Eijck and Moore, 1992), and passed in that form to the dialogue manager. We refer to these as linguistic level representations.</Paragraph>
      <Paragraph position="4"> The aspects of the system which are of primary interest here concern the dialogue manager (DM) and related modules. Once a linguistic level representation has been produced, the following processing steps occur: * The linguistic level representation is converted into a discourse level representation. This primarily involves regularizing differences in surface form: so, for example, &amp;quot;measure the pressure&amp;quot; and &amp;quot;what is the pressure?&amp;quot; have different representations at the linguistic level, but the same representation at the discourse level.</Paragraph>
      <Paragraph position="5"> * If necessary, the system attempts to resolve instances of ellipsis and anaphoric reference. For example, if the previous command was &amp;quot;measure temperature at flight deck&amp;quot;, then the new command &amp;quot;lower deck&amp;quot; will be resolved to an expression meaning &amp;quot;measure temperature at lower deck&amp;quot;. Similarly, if the previous command was &amp;quot;move to the crew hatch&amp;quot;, then the command &amp;quot;open it&amp;quot; will be resolved to &amp;quot;open the crew hatch&amp;quot;. We call the output of this step a resolved discourse level representation.</Paragraph>
      <Paragraph position="6"> * The resolved discourse level representation is converted into an executable script in a language essentially equivalent to a subset of CSHELL. This involves two sub-steps. First, quantified variables are given scope: for example, &amp;quot;go to the flight deck and lower deck and measure pressure&amp;quot; becomes something approximately equivalent to the script foreach x (flight_deck lower_deck) go_to Sx measure pressure end The point to note here is that :the foreach has scope over both the go_to and the measure actions; an alternate (incorrect) sCoping would be foreach x (flight_deck lower_deck)</Paragraph>
      <Paragraph position="8"> measure pressure The second sub-step is to attempt to optimize the plan. In the current example, this can be done by reordering the list (flight_deck lowerAeck). For instance, if the PSA is already at the lower deck, reversing the list will mean that the robot only makes one trip, instead of two. ,, The final step in the 'interlJretation process is plan evaluation: the syStem tries to work out what will happen if it actually executes the plan. (The relationship between plan evaluation and plan execution is described in more detail in Section 4.1). Among other things, this gives the dialogue manager the possibility of comparing different interpretations of the original command, and picking the one which is most efficient. null</Paragraph>
    </Section>
    <Section position="2" start_page="55" end_page="56" type="sub_section">
      <SectionTitle>
3.2 How Meta-outputs Participate in the
Translation
</SectionTitle>
      <Paragraph position="0"> The above sketch shows how context-dependent interpretation is arranged as a series of non-deterministic translation stepS; in each case, we have described the input and the output for the step in question. We now go back to the concerns of Section 2. First, note that each translation step is in general fallible. We give severalexamples: * One of the most obvious cases arises when the user simply issues an invalid command, such as requesting the PSA to open a door D which is already open. Here, one of the meta~outputs issued by the plan evaluation step will be the term presupposition_failure(already_open(D)); the DM can decide to paraphrase this back to the user as a surface string of the form &amp;quot;D is already open&amp;quot;. Note that plan evaluation does not involve actually executing the final script, which can be important. For instance, if the command is &amp;quot;go to the crew hatch and open it&amp;quot; and the crew hatch is already open, the interface has the option of informing the user that there is a problem without-first carrying out the &amp;quot;go to&amp;quot; action.</Paragraph>
      <Paragraph position="1">  * The resolution step can give rise to similar kinds of meta-output. For example, a command may include a referring expression that has no denotation, or an ambiguous denotation; for example, the user might say &amp;quot;both decks&amp;quot;, presumably being unaware that there are in fact three of them. This time, the meta-output produced</Paragraph>
      <Paragraph position="3"> representing the user'sincorrect belief abou\[ the number of decks. The DM then has the possibility of informing the user of this misconception by realizipg the meta-output term as the surface stung &amp;quot;in. fact there are three of them&amp;quot;. Ambiguous denotation occurs when a description is under-specified. For instance, the user might say &amp;quot;the deck&amp;quot; in a situation where there is no clearly salient deck, either in the discourse situation :or in the simulated world: here, the meta-output will be presupposition_failure ( under specif ied_def inite (deck)) which can be realized as the clarification question &amp;quot;which deck do you mean?&amp;quot; * A slightly more complex case involves plan costs. During plan evaluation, the system simulates execution of the output script while keeping track of execution cost. (Currently, the cost is just an estimate of the time required to execute the script). Execution costs are treated as meta-outputs of the form cost(C) and passed back through the interpreter so that the plan optimization step can make use of them.</Paragraph>
      <Paragraph position="4"> Finally,. we consider what happens when the system receives incorrect input from the speech recognizer. Although the recognizer's language model is constrained so that it can only produce grammatical utterances, it can still misrecognize one grammatical string as another one. Many of these cases fall into one of a small number of syntactic patterns, which function as fairly reliable indicators of bad recognition. A typical example is conjunction involving a pronoun: if the system hears &amp;quot;it and flight deck&amp;quot;, this is most likely a misrecognition of something like &amp;quot;go to flight deck&amp;quot;.</Paragraph>
      <Paragraph position="5"> During the processing phase which translates linguistic level representations into discourse level representations, the system attempts to match each misrecognition pattern against the input linguistic form, and if successful produces a meta-output of the form presupposition_failure ( dubious_if (&lt;Type&gt;)) These mete-outputs are passed down to the DM, which in the absence of sufficiently compelling contrary evidence will normally issue a response of the form &amp;quot;I'm sorry, I think I misheard you&amp;quot;.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="56" end_page="58" type="metho">
    <SectionTitle>
4 A Compact Architecture for Dialogue Management Based on Scripts and Meta-Outputs
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="56" end_page="56" type="sub_section">
      <Paragraph position="0"> None of the individual functionalities outlined above are particularly novel in themselves. What we find new and interesting-is the fact that they can all be expressed in a uniform way in terms of the script output/meta-output architecture. This section presents three examples illustrating how the architecture can be used to simplify the overall organization of the system.</Paragraph>
    </Section>
    <Section position="2" start_page="56" end_page="57" type="sub_section">
      <SectionTitle>
4.1 Integration of plan evaluation, plan execution and dialogue management
</SectionTitle>
      <Paragraph position="0"> execution and dialogue management.</Paragraph>
      <Paragraph position="1"> Recall that the DM simulates evaluation of the plan before running it, in order to obtain relevant metainformation. At plan execution time, plan actions result in changes to the world; at plan evaluation time, they result in simulated changes to the world and/or produce meta-outputs.</Paragraph>
      <Paragraph position="2"> Conceptualizing plans as scripts rather than logical formulas permits an elegant treatment of the execution/evaluation dichotomy. There is one script interpreter, which functions both as a script executive and a script evaluator, and one set of rules which defines the procedural semantics of script actions. Rules are parameterized by execution type which is either &amp;quot;execute&amp;quot; or &amp;quot;evaluate&amp;quot;. In &amp;quot;evaluate&amp;quot; mode, primitive actions modify a state vector which is threaded through the interpreter; in &amp;quot;execute&amp;quot; mode, they result in commands being sent to (real or simulated) effector agents. Conversely, &amp;quot;meta-information&amp;quot; actions, such as presupposition failures, result in output being sent to the meta-output stream in &amp;quot;evaluate&amp;quot; mode, and in a null action in &amp;quot;execute&amp;quot; mode. The upshot is that a simple semantics can be assigned to rules like the following one, which defines the action of attempting to open a door which may already be open:</Paragraph>
      <Paragraph position="4"/>
    </Section>
    <Section position="3" start_page="57" end_page="57" type="sub_section">
      <SectionTitle>
4.2 Using meta-outputs to choose between interpretations
</SectionTitle>
      <Paragraph position="0"> interpretations As described in the preceding section, the resolution step is in general non-deterministic and gives rise to meta-outputs which describe the type of resolution carried out. For example, consider a command involving a definite description, like &amp;quot;open the door&amp;quot;. Depending on the preceding context, resolution will produce a number of possible interpretations; &amp;quot;the door&amp;quot; may be resolved to one or more contextually available doors, or the expression may be left unresolved. In each case, the type of resolution used appears as a meta-output, and is available to the dialogue manager when it decides which interpretation is most felicitous. By default, the DM's strategy is to attempt to~pply antecedents for referring expressions, preferring gh~ most recently occurring sortally appropriate candidate. In some cases, however, it is desirable to allow the default strategy to be overridden: for instance, it may result in a script which produces a presupposition failure during plan evaluation. Treating resolution choices and plan evaluation problems as similar types of objects makes it easy to implement this kind of idea.</Paragraph>
    </Section>
    <Section position="4" start_page="57" end_page="58" type="sub_section">
      <SectionTitle>
4.3 Using meta-outputs to choose between dialogue management moves
</SectionTitle>
      <Paragraph position="0"> dialogue management moves Perhaps the key advantage of our architecture is that collecting together several types of information as a bag of meta-outputs simplifies the top-level structure of the dialogue manager. In our application, the critical choice of dialogue move comes after the dialogue manager has selected the most plausible interpretation. It now has to make two choices. First, it must decide whether or not to paraphrase any of the meta-outputs back to the user; for example, if resolution was unable to fill some argument position or find an antecedent for a pronoun, it may be appropriate to paraphrase the corresponding meta-output as a question, e.g. &amp;quot;where do you mean?&amp;quot;, or &amp;quot;what do you mean by 'it' ?&amp;quot;. Having all the meta-outputs available together means that the DM is able to plan a coherent response: so if there are several meta-outputs which could potentially be worth paraphrasing, it typically realizes only the most important one.</Paragraph>
      <Paragraph position="1"> Second, if interpretation was able to produce a well-formed plan, the DM currently has the three options of executing it, paraphrasing it back to the user as a confirmation question, or doing nothing.</Paragraph>
      <Paragraph position="2"> Once again, this decision often requires global information about what has happened during the interpretation process. For example, knowing that plan evaluation showed that the plan would take significant time to execute makes it more plausible that the user would prefer to receive a confirmation.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="58" end_page="58" type="metho">
    <SectionTitle>
5 Examples
</SectionTitle>
    <Paragraph position="0"> This section presents a Commented sample dialogue with the implemented PSA interface demo chosen to illustrate the points discussed inthe previous sections. We focus in particular On ,the :role that meta-outputs play in interpretation. Text preceded by USER represents spoken utterances from the user.</Paragraph>
    <Paragraph position="1"> Text preceded by PSA represents synthesized spoken responses from the PSA. Text inside square brackets describes the actions carried out bythe PSA.</Paragraph>
    <Paragraph position="2"> 1. The PSA can move to various locations and measure environmental factors. In the first exzhange, reference resolution is used to interpret the phrase &amp;quot;all three decks&amp;quot; as referring to the lower deck, mid deck and flight deck. Plan optimization finds the most efficient way to order the locations, an d paraphrases the resulting plan back to the user.</Paragraph>
    <Paragraph position="3"> USER: Go to all three decks and measure carbon dioxide. ~ ' PSA: I will move to flight,deck, mid deck and then lower deck and I will measure carbon ' dioxide level, okay? USER: Okay. \[PSA moves to flight deck\] PSA: The carbon dioxide level at the flight deck is one percent. \[PSA: moves to mid deck\] PSA.: The carbon dioxide level at the mid deck is one percent. \[PSA moves to lower deck\] PSA: The carbon dioxide level at the lower deck is one percent. ' 2. The next example involves resolution of ellipsis. The relevant context is recovered from the preceding dialogue.</Paragraph>
    <Paragraph position="4"> USER: Do the same for the pilot's seat.</Paragraph>
    <Paragraph position="5"> PSA: I will move to the pilot's seat and I will measure carbon dioxide level, okay? USER: Right. \[PSA moves to pilot's seat\] PSA: The carbon dioxide level at the pilot's seat is one percent.</Paragraph>
    <Paragraph position="6"> 3. Resolution of ellipsis can involve filling in unspecified semantic parameters to a syntactically complete command. It is not meaningful to talk about a sensor reading without reference to the place at which that reading was carried out; in the following example, resolution assumes that the location in question is the one most recently mentioned in the dialogue.</Paragraph>
    <Paragraph position="7"> Note also that since the PSA does not have to move in order to carry out the command, and</Paragraph>
    <Paragraph position="9"> hence can execute it immediately, the DM reasonably decides to skip the confirmation question. This decision is made based on the meta-Output encoding the estimated cost of executing the :script.</Paragraph>
    <Paragraph position="10"> 'i USER: What was the carbon dioxide level at fifteen oh five according to the fixed sensors? null PSA: According to the fixed sensors, at fifteen oh five the carbon dioxide level at the pilot's seat was one percent. -.</Paragraph>
    <Paragraph position="11"> The system encounters a user misconception, which results in a &amp;quot;presupposition failure&amp;quot; meta-output being produced during reference resolutiom It PSonsequently decides to alert the user to this, rather than attempt to carry out the command.</Paragraph>
    <Paragraph position="12"> USER: Close both doors.</Paragraph>
    <Paragraph position="13"> PSA: There are in fact three of them.</Paragraph>
    <Paragraph position="14"> The user utterance includes a pronoun, &amp;quot;it&amp;quot;. The system resolves &amp;quot;it&amp;quot; to. &amp;quot;the crew hatch&amp;quot;, and issues an appropriate paraphrase in order to confirm that this is what the user meant.</Paragraph>
    <Paragraph position="15"> USER: Go to crew hatch and close it.</Paragraph>
    <Paragraph position="16"> PSA: I will go to crew hatch and I will close crew hatch, okay? USER: Yeah. \[PSA moves to crew hatch and closes it\] The user asks the system to close the door, but it is unclear which door they are referring to. An &amp;quot;ambiguous reference&amp;quot; meta-output is produced by reference resolution, resulting in the DM issuing a clarification question.</Paragraph>
    <Paragraph position="17"> USER: Close the door.</Paragraph>
    <Paragraph position="18"> PSA: Which door do you mean? The user response is successfully interpreted by reference resolution as an instance of ellipsis. Trying to close the crew hatch a second time however causes a presupposition failure meta-output to be generated during plan evaluation. The system informs the user of this and takes no action.</Paragraph>
    <Paragraph position="19"> USER: The crew hatch.</Paragraph>
    <Paragraph position="20"> PSA: The crew hatch is already closed.</Paragraph>
    <Paragraph position="21"> The user presents three locations in a suboptimal order. Plan evaluation finds a more efficient ordering.</Paragraph>
    <Paragraph position="22"> USER: Move to storage lockers, commander's seat and flight deck and measure temperature. null PSA: I will move to flight deck, commander's seat and then storage lockers and I will measure temperature, okay? USER: Sure.</Paragraph>
    <Paragraph position="23"> \[PSA moves to flight deck, commander's seat and storage lockers in that order and reports temperature at each location\] 9. The user asks the system to repeat the previous command. Since the PSA is now starting from a different position, its most efficient plan is no longer the same, and it informs the user of this. USER: Do that again.</Paragraph>
    <Paragraph position="24"> 'PSA: I will move to storage lockers, commander's seat and then flight deck and I will measure temperature, okay? USER: Yes. \[PSA moves to storage lockers\] PSA: The temperature at the storage lockers is 19.9 degrees Celsius. \[PSA starts moving to commander's seat\] 10. The user interrupts the PSA in the middle of the plan, and asks it to go back to its last stopping place. Since &amp;quot;stop&amp;quot; may involve a potentially hazardous condition, the interface obeys immediately, without attempting to confirm first. USER: Stop. \[PSA stops\] USER: Go back. \[PSA returns to storage lockers\] null</Paragraph>
  </Section>
class="xml-element"></Paper>