<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2131"> <Title>An Architecture for Dialogue Management, Context Tracking, and Pragmatic Adaptation in Spoken Dialogue Systems</Title>
<Section position="2" start_page="0" end_page="794" type="metho"> <SectionTitle> 1 Component Tasks of Discourse Processing </SectionTitle>
<Paragraph position="0"> We divide discourse-level processing into three component tasks: Dialogue Management, Context Tracking, and Pragmatic Adaptation.</Paragraph>
<Section position="1" start_page="0" end_page="794" type="sub_section"> <SectionTitle> 1.1 Dialogue Management </SectionTitle>
<Paragraph position="0"> The Dialogue Manager is an oversight module whose purpose is to facilitate the interaction between dialogue participants. In a user-initiated system, the Dialogue Manager directs the processing of an input utterance from one component to another through interpretation and back-end system response. In the process, it detects and handles dialogue trouble, invokes the Context Tracker when updates are necessary, generates system output, and so on.</Paragraph>
<Paragraph position="1"> Our conception of the Dialogue Manager as a controller becomes increasingly relevant as the software system moves away from the standard &quot;NL pipeline&quot; in order to deal with dialogue disfluencies. Its oversight perspective affords it (and the architecture) certain capabilities, which are listed in Table 1.</Paragraph>
<Paragraph position="2"> 1 Supports mixed-initiative systems by fielding spontaneous input from either participant and routing it to the appropriate components.</Paragraph>
<Paragraph position="3"> 2 Supports non-linguistic dialogue &quot;events&quot; by accepting them and routing them to the Context Tracker (below).</Paragraph>
<Paragraph position="4"> 3 Increases overall system performance. For example, awareness of system output allows the Dialogue Manager to predict user input, boosting speech recognition accuracy. Similarly, if the back-end introduces a new word into the discourse, the Dialogue Manager can request the speech recognizer to add it to its vocabulary for later recognition.</Paragraph>
<Paragraph position="5"> 4 Supports meta-dialogues between the dialogue system itself and either participant. An example might be a participant's questions about the status of the dialogue system.</Paragraph>
<Paragraph position="6"> 5 Acts as a central point for dialogue troubleshooting, after (Duff et al. 1996). If any component has insufficient input to perform its task, it can alert the Dialogue Manager, which can then reconsult a previously invoked component for different output.</Paragraph>
<Paragraph position="7"> The Dialogue Manager is the primary locus of the dialogue agent's outward personality as a function of interaction style; its simple protocol specifies conditions for interrupting user speech, for permitting interruption by the user, when to initiate repair dialogues, and how often to backchannel.</Paragraph> </Section>
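<Paragraph position="8"> As a rough illustration of this controller role (our own sketch, not an interface defined in the paper; all class and method names below are hypothetical), a Dialogue Manager can be approximated as a loop that fires components in a default order and re-consults an earlier component, or opens a repair dialogue, when one of them reports trouble.</Paragraph>
```python
# Hypothetical sketch of a Dialogue Manager as controller; the component
# interfaces (recognize, parse, resolve, to_backend_command, generate, speak)
# are illustrative names, not an API from the paper.

class DialogueTrouble(Exception):
    """Raised by a component that has insufficient input to perform its task."""

class DialogueManager:
    def __init__(self, recognizer, interpreter, tracker, adapter, generator, synthesizer):
        self.recognizer = recognizer      # speech recognition
        self.interpreter = interpreter    # natural language interpretation
        self.tracker = tracker            # context tracking
        self.adapter = adapter            # pragmatic adaptation
        self.generator = generator        # natural language generation
        self.synthesizer = synthesizer    # speech synthesis

    def handle_user_turn(self, audio):
        """Route one user utterance through the default firing order."""
        for hypothesis in self.recognizer.recognize(audio):   # n-best hypotheses
            try:
                logical_form = self.interpreter.parse(hypothesis)
                break
            except DialogueTrouble:
                continue        # re-consult the recognizer for another hypothesis
        else:
            return self.repair("Could you rephrase that?")

        resolved = self.tracker.resolve(logical_form)          # input-side call
        try:
            return self.adapter.to_backend_command(resolved)
        except DialogueTrouble as trouble:
            return self.repair(str(trouble))

    def repair(self, prompt):
        """Initiate a repair dialogue; the clarification itself enters the context."""
        self.tracker.record_system_act(prompt)
        self.synthesizer.speak(self.generator.generate(prompt))
```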
<Section position="2" start_page="794" end_page="794" type="sub_section"> <SectionTitle> 1.2 Context Tracking </SectionTitle>
<Paragraph position="0"> The Context Tracker maintains a record of the discourse context which it and other components can consult in order to (a) resolve dependent forms that occur in input utterances and (b) generate appropriate context-dependent forms for achieving natural output. Interpretation of definite pronouns, demonstratives (this, those), indexicals (you, now, here, tomorrow), definite NPs (a car...the car), one-anaphora (the earlier one), and ellipsis (how about Seattle) all rely on stored context.</Paragraph>
<Paragraph position="1"> The Context Tracker strives to record only those entities and events that could become eligible for reference. Context thus includes linguistic communicative acts (verbalizations), non-linguistic communicative acts (gesture), and non-communicative events that are deemed salient.</Paragraph>
<Paragraph position="2"> Since determining salience requires a judgement, our implementations rely on heuristic rules to decide which events and objects get entered into the context representation. For example, the disappearance of a simulated vehicle off the edge of a map display might be deemed salient relative to a particular user model, the discourse history, or the task structure.</Paragraph> </Section>
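<Paragraph position="3"> The following sketch (ours, not the paper's; the data structures and the recency-based search are assumptions made for illustration) shows one way a Context Tracker might record salient entities from both utterances and non-linguistic events, and resolve a later pronoun against the most recently mentioned compatible entity.</Paragraph>
```python
# Hypothetical Context Tracker sketch: a recency-ordered store of discourse
# entities plus a simple compatibility test for resolving dependent forms.

from dataclasses import dataclass, field

@dataclass
class DiscourseEntity:
    name: str                                       # e.g. "vehicle-7"
    kind: str                                       # e.g. "vehicle", "city"
    features: dict = field(default_factory=dict)    # e.g. {"gender": "neuter"}

class ContextTracker:
    def __init__(self, salience_rules=None):
        self.history = []                            # most recent entity last
        self.salience_rules = salience_rules or []   # heuristics for non-linguistic events

    def record(self, entity, source="utterance"):
        """Enter an entity into the context if it is deemed salient."""
        if source == "utterance" or any(rule(entity) for rule in self.salience_rules):
            self.history.append(entity)

    def resolve_pronoun(self, constraints):
        """Return the most recent entity compatible with the pronoun's constraints."""
        for entity in reversed(self.history):
            if all(entity.features.get(k) == v for k, v in constraints.items()):
                return entity
        return None     # no antecedent found: report trouble to the Dialogue Manager

# Example: a map event makes a vehicle salient, so a later "it" can refer to it.
tracker = ContextTracker(salience_rules=[lambda e: e.kind == "vehicle"])
tracker.record(DiscourseEntity("vehicle-7", "vehicle", {"gender": "neuter"}), source="event")
antecedent = tracker.resolve_pronoun({"gender": "neuter"})   # resolves to vehicle-7
```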
<Section position="3" start_page="794" end_page="794" type="sub_section"> <SectionTitle> 1.3 Pragmatic Adaptation </SectionTitle>
<Paragraph position="0"> The Pragmatic Adaptation module serves as the boundary between language and action by determining what action to take given an interpreted input utterance or a back-end response.</Paragraph>
<Paragraph position="1"> This module's role is to &quot;make sense&quot; of a communicative act in the current linguistic and non-linguistic context.</Paragraph>
<Paragraph position="2"> The Pragmatic Adapter receives an interpretation of an input utterance with context-dependent forms resolved. It then translates that utterance into a valid back-end command. It checks for violations of the Domain Model, which contains information about the back-end system such as allowable parameter values for command arguments. It also checks for commands that are infelicitous given the current Back-end State (e.g., the referenced vehicle does not exist at the moment). The Pragmatic Adapter combines the results of these simple tests with a set of if-then heuristics to determine whether to send through the command or to intercept the utterance and notify the Dialogue Manager to initiate a repair dialogue with the user.</Paragraph>
<Paragraph position="3"> The Pragmatic Adapter also receives output responses from the back-end and adapts or &quot;translates&quot; them into natural language communications, which are incorporated by the Context Tracker into the dialogue history.</Paragraph> </Section> </Section>
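<Paragraph position="4"> As a concrete reading of this decision procedure (our sketch; the domain model format, the back-end state query, and the heuristics shown are illustrative assumptions, not the paper's implementation), a Pragmatic Adapter can be approximated as a pair of checks followed by an if-then decision to forward the command or request a repair dialogue.</Paragraph>
```python
# Hypothetical Pragmatic Adapter sketch: validate a command against a Domain
# Model and the current Back-end State, then either forward it or ask the
# Dialogue Manager to open a repair dialogue.

DOMAIN_MODEL = {
    # allowable parameter values per command argument (illustrative content)
    "move": {"speed": {"slow", "medium", "fast"}},
}

class DialogueTrouble(Exception):
    """Signals that the Dialogue Manager should open a repair dialogue."""

class PragmaticAdapter:
    def __init__(self, backend_state):
        self.backend_state = backend_state    # e.g. set of currently existing vehicles

    def to_backend_command(self, utterance):
        """utterance: dict like {"act": "move", "object": "vehicle-7", "speed": "fast"}."""
        act = utterance["act"]

        # Check 1: Domain Model violations (unknown act or illegal argument value).
        model = DOMAIN_MODEL.get(act)
        if model is None:
            raise DialogueTrouble(f"The back-end has no command for '{act}'.")
        for arg, allowed in model.items():
            if utterance.get(arg) not in allowed:
                raise DialogueTrouble(f"'{utterance.get(arg)}' is not a valid {arg}.")

        # Check 2: infelicity given the current Back-end State.
        if utterance["object"] not in self.backend_state:
            raise DialogueTrouble(f"{utterance['object']} does not exist at the moment.")

        # Both checks passed: send the command through to the back-end.
        args = {k: v for k, v in utterance.items() if k not in ("act", "object")}
        return (act, utterance["object"], args)

# Example: the referenced vehicle is missing, so a repair dialogue is requested.
adapter = PragmaticAdapter(backend_state={"vehicle-3"})
try:
    adapter.to_backend_command({"act": "move", "object": "vehicle-7", "speed": "fast"})
except DialogueTrouble as why:
    print("repair dialogue:", why)
```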
<Section position="4" start_page="794" end_page="796" type="metho"> <SectionTitle> 2 An Architecture for Spoken Dialogue Systems </SectionTitle>
<Paragraph position="0"> Having introduced our three discourse components, we now present our overall architecture.</Paragraph>
<Paragraph position="1"> It is laid out in Figure 1, and its components are described in Table 2, starting from the user and going clockwise. The discourse components are left in white, while non-discourse components have been shaded gray.</Paragraph>
<Paragraph position="2"> Among the component tasks listed in Table 2 are: converting the back-end response to a logical form representation of a communicative act; tracking the discourse entities of the output utterance and inserting dependent references (if desired); and high-level control, intelligently routing information between all agents and participants (see Section 1.1) according to the Dialogue Manager's own protocol for interaction.</Paragraph>
<Paragraph position="3"> Several items are of note in Figure 1 and Table 2. First, although a default firing order is shown, this order is perturbed any time dialogue trouble arises. For example, a Speech Recognition (SR) error may be detected after Natural Language Interpretation fails to parse the output of SR. Rather than continuing the flow on towards the back-end, the Dialogue Manager can re-consult SR for other hypotheses. Alternatively, the Dialogue Manager can fire Natural Language Generation with an output request for clarification. That request gets incorporated into the context representation by Context Tracking, the dialogue state is &quot;pushed&quot; into a repair dialogue, and a string is ultimately sent to Speech Synthesis for delivery to the user's ear. The next utterance is then interpreted in the context of the repair dialogue.</Paragraph>
<Paragraph position="4"> Note also that Context Tracking and Pragmatic Adaptation are called twice each: on &quot;input&quot; (from the user) and on &quot;output&quot; (from the back-end). The logical Context Tracker may be implemented as one or as two related modules, together tracking both sides of the dialogue so that either user or system can make anaphoric mention of entities introduced earlier.</Paragraph> </Section>
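<Paragraph position="5"> To illustrate the &quot;push&quot; of a repair dialogue (again a hypothetical sketch; the stack representation below is our assumption about one way to realize it, not the paper's mechanism), the dialogue state can be kept on a stack so that the next utterance is interpreted against the repair context and the interrupted context is restored once the trouble is resolved.</Paragraph>
```python
# Hypothetical sketch of dialogue-state stacking for repair dialogues.

class DialogueState:
    def __init__(self, purpose, expected=None):
        self.purpose = purpose            # e.g. "task" or "repair"
        self.expected = expected or []    # utterance types this state anticipates

class DialogueStack:
    def __init__(self):
        self.states = [DialogueState("task")]

    def push_repair(self, clarification_request):
        # A clarification request opens a nested repair dialogue.
        self.states.append(DialogueState("repair", expected=[clarification_request]))

    def pop_repair(self):
        # The repair succeeded; return to the interrupted task dialogue.
        if len(self.states) > 1:
            self.states.pop()

    def current(self):
        return self.states[-1]    # the next utterance is interpreted in this context

stack = DialogueStack()
stack.push_repair("Which vehicle did you mean?")
assert stack.current().purpose == "repair"
stack.pop_repair()                 # the user answered; resume the task dialogue
assert stack.current().purpose == "task"
```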
<Section position="5" start_page="796" end_page="797" type="metho"> <SectionTitle> 3 A Near-Future Scenario of Spoken Dialogue Systems </SectionTitle>
<Section position="1" start_page="796" end_page="797" type="sub_section"> <SectionTitle> 3.1 The Scenario </SectionTitle>
<Paragraph position="0"> We build on images from the popular science fiction series Star Trek as a rich source of dialogue types in complex interrelations. These example dialogues have more primitive cousins under development today. Briefly, our example dialogue types are listed in Table 3.</Paragraph>
<Paragraph position="1"> The &quot;Food Replicator&quot; on Star Trek accepts a structured English command language such as &quot;Tea. Earl Grey. Hot&quot; and produces results in the physical world.</Paragraph>
<Paragraph position="2"> The ship's computer on Star Trek is an advanced application which can understand natural language queries, and replies either via actions or via a multimodal interface.</Paragraph>
<Paragraph position="3"> &quot;Data&quot; on Star Trek converses as a human while providing the information processing of a computer, and is capable of action in the physical world.</Paragraph>
<Paragraph position="4"> Star Trek's &quot;Universal Translator&quot; is capable of automatically interpreting between any two humans.</Paragraph>
<Paragraph position="5"> The ship's computer has the ability to retrieve, play back, and analyze previously-recorded conversations. In this sense, the dialogue becomes empirical data to be analyzed.</Paragraph>
<Paragraph position="6"> Star Trek's &quot;Holodeck&quot; creates simulated humans (or characters) as actors, for the entertainment or training of human viewers.</Paragraph> </Section>
<Section position="2" start_page="797" end_page="797" type="sub_section"> <SectionTitle> 3.2 Application of the Architecture to the Scenario </SectionTitle>
<Paragraph position="0"> We now describe the role our architecture, and specifically our discourse components, play in these near-future examples.</Paragraph>
<Paragraph position="1"> 3.2.1 Dialogue with a Back-End Computer </Paragraph>
<Paragraph position="2"> The first three examples illustrate dialogues in which a human is talking to a computer. One dimension distinguishing the three examples is the agent's intelligent use of context. In a dialogue with an &quot;appliance&quot;, simple, structured, unambiguous command language utterances are interpreted one at a time in isolation from prior dialogue history. The Pragmatic Adaptation facility can follow a simple scheme for mapping each utterance to one of a very few back-end commands. The Context Tracker has no cross-sentence dependent references to contend with, and finally, since the appliance provides no linguistic feedback, the Dialogue Manager fires none of the &quot;output&quot; components (from back-end to human). In a dialogue with a more sophisticated application or with a robot, the Dialogue Manager, Context Tracker, and Pragmatic Adapter need greater functionality, to handle both linguistic and non-linguistic events in both directions.</Paragraph>
<Paragraph position="3"> The fourth example, that of the Universal Translator, is representative of a general dialogue type we label Mediator, in which an agent plays a mediation role between humans. In addition to interpretation, other roles of the mediator might be (Table 4):</Paragraph>
<Paragraph position="4"> 1 A Genie, which is available for meta-dialogues with the system itself, instead of with the dialogue partner (much as a human might ask an interpreter to repeat the partner's last utterance).</Paragraph>
<Paragraph position="5"> 2 A Moderator, which, in multi-party dialogues, enforces an agreed-upon interaction protocol, such as Robert's Rules of Order or a talk-show format (under control of the host).</Paragraph>
<Paragraph position="6"> 3 A Bouncer, which decides who may join the dialogue based on current enrollment (first-come-first-served), clearance level, invitation list, etc., as well as permitting different types of participation, so that some may only listen while others may fully participate.</Paragraph>
<Paragraph position="7"> 4 A Stenographer, which records the dialogue and prepares a &quot;visualization&quot; of the dialogue structure.</Paragraph>
<Paragraph position="8"> Table 4. Roles of a Mediator Agent</Paragraph>
<Paragraph position="9"> Our architecture is applicable to mediated dialogues as well. In fact, it was first developed for bilingual dialogue in a voice-to-voice machine translation application. In this application, the Dialogue Manager is available for meta-dialogues with either user (as in Could you repeat her last utterance?), and the Context Tracker can use a single discourse representation structure to track the unfolding context in both languages.</Paragraph>
<Paragraph position="10"> Our fifth example, a post-hoc analysis of a dialogue, does not require real-time processing. It is, nonetheless, a dialogue which can be analyzed using the components of our architecture, exactly as if it were real-time. The only difference is that no generation will be required, only analysis; thus, the Dialogue Manager need only fire the &quot;input&quot; components on each utterance.</Paragraph>
<Paragraph position="11"> Our last example concerns a simulated human dialogue between two computer characters, for the benefit of human viewers. Such character-character dialogues have been produced by several researchers, including (Kalra et al. 1998). Here, the architecture applies at two levels. First, the architecture can be internal to each agent, to implement that agent's conversational ability. Second, the architecture can be used externally to analyze the agents' dialogue, as discussed in the previous section.</Paragraph> </Section> </Section> </Paper>