<?xml version="1.0" standalone="yes"?> <Paper uid="A00-2027"> <Title>Evaluating Automatic Dialogue Strategy Adaptation for a Spoken Dialogue System</Title> <Section position="3" start_page="0" end_page="203" type="metho"> <SectionTitle> 2 MIMIC: An Adaptive Mixed Initiative </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Dialogue System </SectionTitle> <Paragraph position="0"> MIMIC is a voice-enabled, telephone-based dialogue system that provides movie showtime information. Two main features distinguish MIMIC from other existing spoken dialogue systems. First, MIMIC automatically adapts dialogue strategies based on dialogue characteristics and thus tailors its behavior to the dialogue context. Second, MIMIC's initiative module is decoupled from the dialogue manager's goal and strategy selection processes, thereby enabling the system's adaptation behavior to be easily modified. We outline characteristics of MIMIC relevant to our experiments and analyses. For further details about the system, see (Chu-Carroll, 2000).</Paragraph> </Section> <Section position="2" start_page="0" end_page="202" type="sub_section"> <SectionTitle> 2.1 Adaptation of Dialogue Strategies </SectionTitle> <Paragraph position="0"> Figure 1 shows an actual interaction between MIMIC and a user. The task and dialogue initiative holders annotated for each system turn are the factors that affect MIMIC's dialogue strategy adaptation. Following (Chu-Carroll and Brown, 1998), task initiative tracks the lead in the process toward achieving the dialogue participants' domain goal, while dialogue initiative models the lead in determining the current discourse focus. Utterance (1) S: Hello, this is MIMIC, the movie information system. (2) How can I help you? (3) U: What time is Analyze This playing in Montclair? \[what time is analyze this playing in montclair\] (4) S: What theater would you like? User (5) U: Montclair.</Paragraph> <Paragraph position="1"> \[montclair\] (6) S: Choices for theater are Wellmont Theater and Clearviews Screening Zone. System (7) What theater would you like?
(8) U: Clearviews Screening Zone.</Paragraph> <Paragraph position="2"> \[ure movie screening zone\] (9) S: Analyze This is playing at Clearviews Screening Zone in Montclair at 1:15pm, 4:10pm, System and 6:30pm.</Paragraph> <Paragraph position="3"> (10) Please say the name of the movie or theater or town you would like information about. (11) U: What time is Analyze This playing at the Wellmont Theater? \[what time is analyze this playing at the wellmont theater\] (12) S: Analyze This is playing at Wellmont Theater in Montclair at 3:15pm, 5:45pm, 8:20pm, User and 10:10pm.</Paragraph> <Paragraph position="4"> (13) Can I help you with anything else? In our information query application domain, MIMIC has task (and thus dialogue) initiative when its utterances are intended to provide helpful guidance toward achieving the user's domain goal, while it has dialogue but not task initiative if its utterances only specify the current discourse goal.2 For example, as a result of MIMIC taking over task initiative in (6), helpful guidance, in the form of valid response choices, was provided in its attempt to obtain a theater name after the user failed to answer an earlier question intended to solicit this information. In (4), MIMIC specified the current discourse goal (requesting information about a missing theater) but did not suggest valid response choices since it only had dialogue initiative.</Paragraph> <Paragraph position="5"> MIMIC's ability to automatically adapt dialogue strategies is achieved by employing an initiative module that determines initiative distribution based on participant roles, cues detected during the current user utterance, and dialogue history (Chu-Carroll and Brown, 1998).
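The cue-detection step consumed by the initiative module can be illustrated with a small sketch. The cue names (NoNewInfo, AmbiguousAction) come from the paper; the slot representation and detection heuristics below are invented placeholders, not MIMIC's actual rules.

```python
# Hypothetical sketch of turn-level cue detection of the kind the
# initiative module consumes. Cue names are from the paper; the
# heuristics and slot encoding are illustrative assumptions only.

def detect_cues(utterance_slots, known_slots, num_matches):
    """Return the set of cue names triggered by the current user turn."""
    cues = set()
    # NoNewInfo: the turn repeats information the system already has.
    if utterance_slots and utterance_slots.issubset(known_slots):
        cues.add("NoNewInfo")
    # AmbiguousAction: the filled slots still match several database rows
    # (e.g. a town with more than one theater).
    if num_matches is not None and num_matches not in (0, 1):
        cues.add("AmbiguousAction")
    return cues

# Utterance (5) "Montclair": the town is already known, and Montclair
# has two theaters, so both cues fire, as described in the paper.
cues = detect_cues({"town=montclair"},
                   {"town=montclair", "movie=analyze this"}, 2)
```

This mirrors the account of utterance (5), which triggered both AmbiguousAction and NoNewInfo.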
This initiative framework utilizes the Dempster-Shafer theory (Shafer, 1976; Gordon and Shortliffe, 1984), and represents the current initiative distribution as two basic probability assignments (bpas) that signify the overall amount of evidence supporting each agent having task and dialogue initiatives. The effects that a cue has on changing the current task and dialogue initiative distribution are also represented as bpas, obtained using an iterative training procedure on a corpus of transcribed and annotated human-human dialogues. (Footnote 2: In the dialogues collected in our experiments, which are described in Section 3, there are system turns in which MIMIC had neither task nor dialogue initiative. However, such cases are rare in this domain and will not be discussed in this paper.)</Paragraph> <Paragraph position="6"> At the end of each user turn, the bpas representing the effects of cues detected during that turn are combined with the bpas representing the current initiative distribution to obtain the initiative distribution for the system's next turn.</Paragraph> <Paragraph position="7"> In Figure 1, utterance (3) triggered the cue AmbiguousAction since the town of Montclair has multiple theaters. Although AmbiguousAction contributed to the system having both initiatives in the next turn, the effect of this cue, when combined with the initiative distribution of (3), resulted in MIMIC having dialogue but not task initiative in (4). However, in addition to triggering AmbiguousAction, utterance (5) also triggered NoNewInfo, since it did not contribute any new information.
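The bpa combination used at the end of each user turn follows Dempster's rule. A minimal sketch, assuming a two-element frame (system S vs. user U holding an initiative) and illustrative mass values rather than MIMIC's trained bpas:

```python
# Sketch of Dempster's rule of combination over the two-element frame
# {S, U} (system vs. user holding an initiative). Mass values below
# are illustrative assumptions, not MIMIC's trained parameters.

from itertools import product

THETA = frozenset({"S", "U"})  # total ignorance: either agent may hold it

def combine(m1, m2):
    """Combine two basic probability assignments (bpas) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a.intersection(b)
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y  # mass on the empty set, renormalized away below
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

# Current dialogue-initiative distribution: mild evidence for the user.
current = {frozenset({"U"}): 0.5, THETA: 0.5}
# Effect of a detected cue (say, NoNewInfo): evidence for the system.
cue = {frozenset({"S"}): 0.6, THETA: 0.4}

updated = combine(current, cue)  # evidence now leans toward the system
```

With these illustrative numbers, the mass on {S} after combination (about 0.43) exceeds that on {U} (about 0.29), mirroring how accumulated cues can shift initiative toward the system across turns.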
The effect of these two cues, combined with the initiative distribution in (5), was enough to lead MIMIC to take over task initiative in (6).</Paragraph> </Section> <Section position="3" start_page="202" end_page="203" type="sub_section"> <SectionTitle> 2.2 Modification of Adaptation Behavior </SectionTitle> <Paragraph position="0"> MIMIC employs an initiative module which is decoupled from its goal selection process, while allowing the outcome of both processes to guide its response strategy selection. A major advantage of this design is that changes in the system's strategy selection behavior, and thus its adaptation behavior, can be achieved by modifying parameters in the initiative module, while leaving the goal and strategy selection processes intact. The initiative module determines initiative distribution based on participant roles, cues observed in the current user utterance, and dialogue history. Although all three factors may play a role in modifying MIMIC's adaptation behavior, we discuss how the effects of participant roles and dialogue history are modified to create systems used as bases of comparison in our experiments.</Paragraph> <Paragraph position="1"> Utterance (14) S: Hello, this is MIMIC, the movie information system. (15) Please say the name of the movie or theater or town you would like information about. (16) U: Analyze This.</Paragraph> <Paragraph position="2"> \[analyze this\] (17) S: Choices for task are when and where. System (18) What would you like to know about Analyze This? (19) U: When.</Paragraph> <Paragraph position="3"> \[when is\] (20) S: Choices for theater are Middlebrook Galeria, Loews East Hanover, Montgomery Cinemas, System Atlantic Triplex Cinema, and others.</Paragraph> <Paragraph position="4"> (21) What theater would you like?
(22) U: Montclair.</Paragraph> <Paragraph position="5"> \[montclair\] (23) S: Choices for theater are Wellmont Theater and Clearviews Screening Zone. System (24) What theater would you like? (25) U: Wellmont.</Paragraph> <Paragraph position="6"> \[wellmont\] (26) S: Analyze This is playing at Wellmont Theater in Montclair at 3:15pm, 5:45pm, 8:20pm, System and 10:10pm.</Paragraph> <Paragraph position="7"> (27) Please say the name of the movie or theater or town you would like information about. (28) U: Analyze This in Montclair.</Paragraph> <Paragraph position="8"> \[analyze this in montclair\] (29) S: Choices for task are location, what, where, and when. System (30) What would you like to know about Montclair? (31) U: When.</Paragraph> <Paragraph position="9"> (32) \[when\] (33) S: Choices for theater are Wellmont Theater and Clearviews Screening Zone. System (34) What theater would you like? (35) U: Clearviews Screening Zone.</Paragraph> <Paragraph position="10"> \[screening zone\] (36) S: Analyze This is playing at Clearviews Screening Zone in Montclair at 1:15pm, 4:10pm, System and 6:30pm.</Paragraph> <Paragraph position="11"> (37) Please say the name of the movie or theater or town you would like information about. Participant roles affect the default initiative distribution for MIMIC's initial utterance. For naive users, it may be more appropriate for MIMIC to employ system-initiative strategies. This system, MIMIC-SI, can be achieved by setting the default initiative distribution to provide definite evidence for the system having both task and dialogue initiatives.3 Figure 2 shows a dialogue in which a user interacts with MIMIC-SI to obtain the same information as in Figure 1.
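The Dempster-Shafer property behind the MIMIC-SI setting (definite evidence cannot be overridden by subsequent cues) can be checked with a small sketch of Dempster's rule. The mass values and frame are illustrative, not MIMIC's parameter settings.

```python
# Sketch checking the Dempster-Shafer property MIMIC-SI relies on:
# a bpa that places definite evidence (all mass) on one conclusion
# cannot be overridden by any subsequent cue. Values are illustrative.

from itertools import product

THETA = frozenset({"S", "U"})

def combine(m1, m2):
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a.intersection(b)
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

# MIMIC-SI default: definite evidence that the system holds the initiative.
definite = {frozenset({"S"}): 1.0}
# A cue favoring the user has no effect once combined with it.
cue = {frozenset({"U"}): 0.6, THETA: 0.4}

after_cue = combine(definite, cue)  # all mass stays on {"S"}
```

All conflicting mass is renormalized away, so the system keeps both initiatives on every turn, which is exactly the fixed system-initiative behavior seen in Figure 2.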
MIMIC-SI prompts the user for one piece of information at a time and provides (sometimes unnecessary) assistance, in the form of valid dialogue moves, during each turn.</Paragraph> <Paragraph position="12"> Taking into account dialogue history allows MIMIC to determine the initiative distribution based on the cumulative effect of previously observed cues. This effect can be disabled so that only local dialogue context affects the initiative distribution for the next turn. Based on MIMIC's parameter settings, this adjustment results in MIMIC-MI, a system which always has dialogue initiative but never has task initiative. (Footnote 3: Based on the Dempster-Shafer theory, if the bpas of the default initiative distribution or those of a detected cue provide definite evidence for drawing a certain conclusion, then no subsequent cues have any effect on changing that conclusion.)</Paragraph> <Paragraph position="13"> MIMIC-MI behaves similarly to many existing non-adaptive mixed initiative dialogue systems (e.g., (Bennacef et al., 1996; Papineni et al., 1999)). Figure 3 shows a dialogue involving MIMIC-MI in which the user again attempts to acquire the same information as in the previous two dialogues. Notice that (41)-(49) constitute a failed attempt to solicit a theater name from the user. Instead of providing helpful information as in (6) in Figure 1, MIMIC-MI relied on the user to change her problem-solving strategy in (50) to obtain the information needed to answer the system's question.</Paragraph> </Section> </Section> <Section position="4" start_page="203" end_page="205" type="metho"> <SectionTitle> 3 Experimental Design </SectionTitle> <Paragraph position="0"> Our main goal in evaluating MIMIC is to determine whether users find the mixed initiative and automatic adaptation aspects of its dialogue strategies useful. We compared MIMIC to two control systems: MIMIC-SI and MIMIC-MI, since they employ dialogue management strategies similar to those in many existing systems.
The comparison between MIMIC and MIMIC-SI focused on the contribution of mixed-initiative dialogue management, while the comparison between MIMIC and MIMIC-MI emphasized the contribution of automatic strategy adaptation. (Figure 3, utterances (38)-(53): only the final utterance is recoverable: "Where in Montclair is Analyze This playing?" \[where in montclair is analyze this playing\].) The following three factors were controlled in our experiments:</Paragraph> <Section position="1" start_page="204" end_page="205" type="sub_section"> <SectionTitle> </SectionTitle> <Paragraph position="0"> 1. System version: For each experiment, two systems were used: MIMIC and a control system. In the first experiment MIMIC was compared with MIMIC-SI, and in the second experiment, with MIMIC-MI.</Paragraph> <Paragraph position="1"> 2. Order: For each experiment, all subjects were randomly divided into two groups. One group performed tasks using MIMIC first, and the other group used the control system first.</Paragraph> <Paragraph position="2"> 3. Task difficulty: 3-4 tasks which highlighted differences between the systems. Eight subjects participated in each experiment. Each of the subjects interacted with both systems to perform all tasks. The subjects completed one task per call so that the dialogue history for one task did not affect the next task. Once they had completed all tasks in sequence using one system, they filled out a questionnaire to assess user satisfaction by rating 8-9 statements, similar to those in (Walker et al., 1997), on a scale of 1-5, where 5 indicated highest satisfaction. Approximately two days later, they attempted the same tasks using the other system.
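Two of the measures computed for each task/subject/system triplet, task success rate and ASR word error rate, can be sketched as follows. The slot names and strings are hypothetical examples, and the standard word-level edit distance is assumed for WER; the paper does not spell out its exact computation.

```python
# Illustrative sketch of two logged measures: task success rate
# (fraction of worksheet slots filled correctly) and ASR word error
# rate (word-level edit distance over the reference length). Slot
# names below are hypothetical, not the actual task worksheets.

def task_success_rate(filled, reference):
    correct = sum(1 for slot, value in reference.items()
                  if filled.get(slot) == value)
    return correct / len(reference)

def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1] / len(ref)

# One slot of two filled correctly on a hypothetical worksheet.
rate = task_success_rate(
    {"theater": "Wellmont Theater", "time": "3:15pm"},
    {"theater": "Wellmont Theater", "time": "8:20pm"})

# A perfectly recognized utterance from Figure 1 scores zero WER.
wer = word_error_rate(
    "what time is analyze this playing at the wellmont theater",
    "what time is analyze this playing at the wellmont theater")
```
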
These experiments resulted in 112 dialogues with approximately 2,800 dialogue turns.</Paragraph> <Paragraph position="3"> In addition to user satisfaction ratings, we automatically logged, derived, and manually annotated a number of features (shown in boldface below). For each task/subject/system triplet, we computed the task success rate based on the percentage of slots correctly filled in on the task worksheet, and counted the # of calls needed to complete each task. For each call, the user side of the dialogue was recorded, and the elapsed time of the call was automatically computed. All user utterances were logged as recognized by our automatic speech recognizer (ASR) and manually transcribed from the recordings. We computed the ASR word error rate, ASR rejection rate, and ASR timeout rate, as well as # of user turns and average sentence length for each task/subject/system triplet. Additionally, we recorded the cues that the system automatically detected from each user utterance. All system utterances were also logged, along with the initiative distribution for each system turn and the dialogue acts selected to generate each system response.</Paragraph> </Section> </Section> </Paper>