<?xml version="1.0" standalone="yes"?>
<Paper uid="J97-1006">
  <Title>Smith and Gordon, Human-Computer Dialogue</Title>
  <Section position="5" start_page="144" end_page="149" type="metho">
    <SectionTitle>
Computer Response Selection Algorithm (Figure 2)
</SectionTitle>
    <Paragraph position="0"> Figure 2 (the computer response selection algorithm) maps its inputs, the current computer goal, the current user focus, and the dialogue mode, to a selected task goal. Its closing branches read: ... ELSE select as the next goal an uncommunicated fact relevant to the user focus; ELSE IF Mode = passive THEN select as a goal that the user learn the computer has processed the user's last utterance.</Paragraph>
    <Paragraph position="1"> .</Paragraph>
    <Paragraph position="2"> Declarative: User now has control.' Consequently, the selected goal must be a relevant fact. The previous goal is converted to &amp;quot;user learn that the light is on when the switch is up.&amp;quot; Passive: User has complete control. Computer simply acknowledges processing the last user utterance.</Paragraph>
    <Paragraph position="3"> This response selection process has been implemented as part of the previously mentioned Circuit Fix-It Shop. The two dialogues of Figure 3, obtained from actual usage of the implemented system, illustrate differences between the two modes in  Computational Linguistics Volume 23, Number 1 which the system was experimentally evaluated: directive and declarative. Note the following phenomena in these dialogues.</Paragraph>
    <Paragraph position="4"> .</Paragraph>
    <Paragraph position="5"> .</Paragraph>
    <Paragraph position="6"> In the directive mode dialogue, the subject is performing task goals under the close guidance of the computer. There is language interaction about each task goal. 6 In the declarative mode dialogue, the subject independently carries out several task goals, known to be necessary, without any interaction. By allowing the user to arbitrarily change subdialogues, the computer is able to provide relevant assistance when a potential problem is reported without requiring language interaction for the task goals already completed.</Paragraph>
    <Paragraph position="7"> A variable initiative dialogue system is just the first step toward the more important objective of a mixed-initiative dialogue system. In a mixed-initiative interaction, initiative can vary between the participants throughout the dialogue. Given that our first priority in experimentally evaluating the system was to demonstrate that behavior varied as a function of initiative, it was necessary to fix the level of initiative for the duration of a session. We next review the work of others who have examined issues in mixed-initiative interaction.</Paragraph>
    <Section position="1" start_page="145" end_page="147" type="sub_section">
      <SectionTitle>
2.3 Prior Theoretical Work on Mixed-Initiative
</SectionTitle>
      <Paragraph position="0"> Whittaker and Stenton (1988) propose a definition for dialogue control based on the utterance type of the speaker (question, assertion, command, or prompt) as follows:  * Question: The speaker has control unless the question directly followed a question or command by the other conversant.</Paragraph>
      <Paragraph position="1"> * Assertion: The speaker has control unless the assertion was a response to a question.</Paragraph>
      <Paragraph position="2"> * Command: The speaker has control.</Paragraph>
      <Paragraph position="3"> * Prompt: The listener has control because the speaker is abdicating  control.</Paragraph>
      <Paragraph position="4"> Walker and Whittaker (1990) apply these dialogue control rules to analyze how initiative affects discourse structure. Their analysis distinguishes between advisory dialogues and task-oriented dialogues, but they do not allow for the possibility that the novice in a task-oriented dialogue can gain knowledge over time and want more control of the dialogue. Their key observations include the following: Transfer of control is often a collaborative phenomenon. Since a noncontrolling participant has the option of seizing control at any moment, then the controlling participant must have control because the noncontroller allows it.</Paragraph>
      <Paragraph position="5"> 6 For brevity, dialogue 1 represents one of the simplest directive mode interactions that could occur. For more complex examples, the transcripts of the dialogues collected during the experiment are available by anonymous FTP. Contact the first author for details.</Paragraph>
      <Paragraph position="6">  i. C: This is the Circuit Fix-lt Shop. How may I help you? 2. U: I want to fix a circuit.</Paragraph>
      <Paragraph position="7"> 3. C: What is the ID of the circuit? 4. U: RSl11.</Paragraph>
      <Paragraph position="8"> 5. C: I am familiar with that circuit. The LED is supposed to be displaying alternately flashing one and seven.</Paragraph>
      <Paragraph position="9"> 6. U: The LED is off.</Paragraph>
      <Paragraph position="10"> 7. C: What is the switch at when the LED is off? 8. U: The switch is down.</Paragraph>
      <Paragraph position="11"> 9. C: Put the switch up.</Paragraph>
      <Paragraph position="12"> 10. U: Okay.</Paragraph>
      <Paragraph position="13"> II. C: Put the knob to one zero.</Paragraph>
      <Paragraph position="14"> 12. U: Okay.</Paragraph>
      <Paragraph position="15"> 13. C: What is the LED displaying? 14. U: LED is displaying only a flashing seven.</Paragraph>
      <Paragraph position="16"> 15. C: Is there a wire between connector eight four and connector nine nine? 16. U: No.</Paragraph>
      <Paragraph position="17"> 17. C: Add a wire between connector eight four and connector nine nine. 18. U: Done.</Paragraph>
      <Paragraph position="18"> 19. C: What is the LED displaying? 20. U: Alternately flashing one and seven.</Paragraph>
      <Paragraph position="19"> 21. C: Is the one on the LED displaying for a longer period of time? 22. U: No.</Paragraph>
      <Paragraph position="20"> 23. C: Is the seven on the LED displaying for a longer period of time? 24. U: No.</Paragraph>
      <Paragraph position="21"> 25. C: Put the knob to zero.</Paragraph>
      <Paragraph position="22"> 26. U: Okay.</Paragraph>
      <Paragraph position="23"> 27. C: What is the LED displaying? 28. U: Alternately displaying one and seven.</Paragraph>
      <Paragraph position="24"> 29. C: Is the one on the LED displaying for a longer period of time? 30. U: Yes.</Paragraph>
      <Paragraph position="25"> 31. C: Put the switch down.</Paragraph>
      <Paragraph position="26"> 32. U: The switch is down.</Paragraph>
      <Paragraph position="27"> 38. C: What is the LED displaying? 34. U; Nothing.</Paragraph>
      <Paragraph position="28"> 35. C: The circuit is working correctly. Good-bye. Dialogue 2 (Declarative Mode) 1. C: This is the Circuit Fix-It shop. How may I help you? 2. U: RSlll is not working.</Paragraph>
      <Paragraph position="29"> 3. C: I am familiar with that circuit. The LED is supposed to he displaying alternately flashing one and seven.</Paragraph>
      <Paragraph position="30"> NOTE: Subject put the switch up and set the knob to one zero, observed the LED display, and noted the potential problem without requiring any assistance from the computer.</Paragraph>
      <Paragraph position="31"> 4. U: There is no wire from connector nine nine to connector eight four. 5. C: There is supposed to be a wire between connector nine nine and connector eight four.</Paragraph>
      <Paragraph position="32"> NOTE: Subject then added the missing wire, and manually performed all necessary checks to verify the circuit was functioning properly. 6. U: The circuit is working.</Paragraph>
      <Paragraph position="33"> 7. C: Glad to have been of assistance. Good-bye.</Paragraph>
      <Paragraph position="34">  Figure 3 Sample dialogues: directive and declarative.</Paragraph>
      <Paragraph position="35">  Computational Linguistics Volume 23, Number 1 * The expert retains more control in the task-oriented dialogues, but there are still occasional control changes when the novice has to describe problems that are occurring while completing the task. t Summaries are more frequent in advisory dialogues due to the need for both participants to verify that they do share the mutual beliefs needed to develop the necessary plan.</Paragraph>
      <Paragraph position="36"> In Section 6.5 we investigate the relationship of this notion of dialogue control based on linguistic goals to our task goal notion of control. Kitano and Van Ess-Dykema (1991) extend the plan recognition model of Litman and Allen (1987) to consider mixed-initiative dialogue. Their key insight is the observation that the two participants may have different domain plans that can be activated at any point in the dialogue. Thus, there are speaker-specific plans instead of simply joint plans as in the Litman and Allen model. This separation of plans permits greater flexibility in the plan recognition process. Furthermore, they extend the initiative control rules proposed by Whittaker and Stenton to consider the utterance content by observing that a speaker has control when the speaker makes an utterance relevant to his or her speaker-specific domain plan. Although they do not consider a computational model for participating in mixed-initiative dialogues, their observation that there are speaker-specific plans or goals underlies the model that we propose.</Paragraph>
    </Section>
    <Section position="2" start_page="147" end_page="148" type="sub_section">
      <SectionTitle>
2.4 Theory Evaluation
</SectionTitle>
      <Paragraph position="0"> While WOZ simulation of directive and passive modes is feasible, the requirements for algorithmically determining the relationship between user focus and the computer goal make WOZ simulations of suggestive and declarative modes very difficult, especially given the fast response time necessary for spoken interaction. Before the construction of the Circuit Fix-It Shop, Moody (1988) conducted a Wizard-of-Oz study on the effects of restricted vocabulary on interactive spoken dialogue. Her data were the basis for the formulation of the experimental Circuit Fix-It Shop system. Although she attempted to acquire information concerning user behavior when users were given the initiative, she was unable to provide much information because her subjects did not interact with the system enough to evolve from novices to experts. Her attempts to yield the initiative to users still led to statements that guided users step-by-step through the task. By direct testing of a computer system that implements our proposed model of variable initiative dialogue, we could more rigorously control the system performance and more easily run repeated tests with subjects and allow them to gain task expertise.</Paragraph>
      <Paragraph position="1"> Simultaneously, we could more readily monitor the effects of the change in initiative setting while holding other system features constant.</Paragraph>
      <Paragraph position="2"> In testing our theory of variable initiative dialogue, there were two main types of phenomena we wished to examine: (1) general aspects of task efficiency, such as time to completion and number of utterances spoken; and (2) the nature of the dialogue structure. Results on task efficiency are reported in detail in Smith and Hipp (1994) and are briefly reviewed in Section 6.1. The primary contribution of this paper is to present an analysis of how the dialogue structure varies according to the computer's level of initiative. After reviewing some details about the overall dialogue-processing model and its implementation, in Section 3, and a review of the experimental environment, in Section 4, the remainder of the paper focuses on the results of this analysis, a review of some related analyses, and some concluding remarks about the usefulness of the analysis and the role of experimental natural language dialogue systems in modeling 'human-computer dialogue.</Paragraph>
      <Paragraph position="3">  Smith and Gordon Human-Computer Dialogue 3. Dialogue-Processing Model: An Integrated Approach</Paragraph>
    </Section>
    <Section position="3" start_page="148" end_page="148" type="sub_section">
      <SectionTitle>
3.1 Motivation and Overview
</SectionTitle>
      <Paragraph position="0"> Most prior work on natural language dialogue has either focused on individual sub-problems such as quantification, presuppositions, ellipsis, anaphoric reference, and user modeling, or else focused on dialogue-processing issues in database query applications. Examples of such dialogue systems are described in Allen, Frisch, and Litman (1982), Bobrow et al. (1977), Carberry (1988), Frederking (1988), Hafner and Godden (1985), Hendrix et al. (1978), Hoeppner et al. (1983), Jullien and Marty (1989), Kaplan (1982), Levine (1990), Peckham (1991), Seneff (1992), Waltz (1978), Wilensky et al. (1988), Young et al. (1989), and Young and Proctor (1989). However, there has been little work on integrating the various aspects of dialogue processing into a unified whole (exceptions are Allen et al. \[1995\] and Young et al. \[1989\]). Consequently, we developed a dialogue-processing model for task-oriented dialogues that when implemented in an electronic repair domain exhibits a number of important behaviors including: (1) problem-solving; (2) coherent subdialogue movement; (3) user model usage; (4) expectation usage; and (5) variable initiative behavior. We summarize the key features of the model below.</Paragraph>
      <Paragraph position="1"> * Theorem proving is used as the reasoning mechanism for determining when task goals are completed.</Paragraph>
      <Paragraph position="2"> * Consequently, the purpose for language during the dialogue is to acquire the missing axioms needed for proving task goal completion (i.e., The</Paragraph>
    </Section>
    <Section position="4" start_page="148" end_page="148" type="sub_section">
      <SectionTitle>
Missing Axiom Theory \[Smith 1992\]).
</SectionTitle>
      <Paragraph position="0"> * User model information is maintained as a set of axioms acquired from inferences based on user input. The axioms may then be used by the theorem prover.</Paragraph>
      <Paragraph position="1"> * Finally, integration of theorems, the utterances relevant to these theorems, and the expectations for responses that supply missing axioms yields a constructive method for creating and using a discourse model first proposed by Grosz and Sidner (1986), but for which they did not offer a method of dynamic construction during the course of a dialogue. Furthermore, the model enables the system to engage in variable initiative dialogue as outlined in Section 2. The interested reader is referred to Smith, Hipp, and Biermann (1995) for further details about the overall model.</Paragraph>
    </Section>
    <Section position="5" start_page="148" end_page="149" type="sub_section">
      <SectionTitle>
3.2 System Implementation
</SectionTitle>
      <Paragraph position="0"> We constructed the Circuit Fix-It Shop based on the details of our dialogue-processing model. The system was originally implemented on a Sun 4 workstation with the majority of the code written in Quintus Prolog and the parser in C. The system assists users in the repair of a Radio Shack 160 in One Electronic Proiect Kit. The system can detect errors caused by missing wires as well as a dead battery.</Paragraph>
      <Paragraph position="1"> Speech recognition is performed by a Verbex 6000 running on an IBM PC. To improve speech recognition performance, we restrict the vocabulary to 125 words. A DECtalk DTCO1 text-to-speech converter is used to provide spoken output by the computer.</Paragraph>
      <Paragraph position="2"> An important feature of any spoken natural language dialogue system is the ability to perform robust parsing. Spoken inputs are frequently ungrammatical but must still  Computational Linguistics Volume 23, Number 1 be interpreted correctly. The main source of ungrammatical inputs in our experiments was the misrecognition of the user's input. An error-correcting parser was developed that finds the minimal cost set of insertions, deletions, and substitutions to transform the input into grammatical input (Smith and Hipp 1994). During our formal experiment, the system was able to find the correct meaning for 81.5% of the more than 2,800 input utterances even though only 50% of these inputs were correctly recognized word for word. An overview of the experimental design is presented next.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="149" end_page="149" type="metho">
    <SectionTitle>
4. Experimental Design
</SectionTitle>
    <Paragraph position="0"> The experimental design is discussed in great detail in Smith and Hipp (1994) and Smith (1991). Here we present an overview of the experiment sufficient for understanding the environment in which the data were collected.</Paragraph>
    <Section position="1" start_page="149" end_page="149" type="sub_section">
      <SectionTitle>
4.1 Subject Pool
</SectionTitle>
      <Paragraph position="0"> The eight subjects were Duke University undergraduates who met the following criteria. null * They had demonstrated problem-solving skills by having successfully completed one computer science course and had taken or were taking another.</Paragraph>
      <Paragraph position="1"> * They did not have excessive familiarity with AI and natural language processing. In particular, they had not taken a class in AI and they had not interacted with a natural language system.</Paragraph>
      <Paragraph position="2"> * None were majoring in electrical engineering. Such individuals could probably fix the circuit without any assistance.</Paragraph>
      <Paragraph position="3"> The subject pool consisted of six male and two female subjects. In addition, two pilot subjects, one female and one male, were run using the proposed experimental design before the formal experiment began.</Paragraph>
    </Section>
    <Section position="2" start_page="149" end_page="149" type="sub_section">
      <SectionTitle>
4.2 Session Overview and Problem Selection
</SectionTitle>
      <Paragraph position="0"> Subjects participated in the experiment in three sessions. The first and third sessions occurred a week apart, and the second session normally occurred three or four days after the first session. 7 The first session consisted of: (1) the primary speech training, lasting approximately 60 to 75 minutes; (2) approximately 20 minutes of instruction on using the system; and (3) practice using the system by attempting to solve four &amp;quot;warmup&amp;quot; problems with the system operating in directive mode, the mode where the computer has maximal control. A maximum of two and one-half hours was spent on the first session. The second and third sessions each consisted of: (1) review work with the speech recognizer; (2) a review of the instructions; and (3) usage of the system on up to 10 problems depending on how rapidly the problems were solved. One group of subjects worked with the system in directive mode during the second session and in declarative mode during the third session while the other group worked with the same modes, but in opposite sessions. The time allowed for the second and third sessions was two hours each.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="149" end_page="152" type="metho">
    <SectionTitle>
7 The only exception was the last subject, where the second session occurred two days after the first session, and the third session occurred one week after the second session.
</SectionTitle>
    <Paragraph position="0"> Smith and Gordon Human-Computer Dialogue The particular circuit being repaired is supposed to cause the LED to alternately display a 1 and a 7, and the implemented domain problem-solving component could detect errors caused by missing wires as well as a dead battery. The basic debugging process consists of the following steps:</Paragraph>
    <Paragraph position="2"> Determine if the LED display is correct.</Paragraph>
    <Paragraph position="3"> If it is not correct, perform zero or more diagnostic steps to further isolate the problem. Possible diagnostic steps are voltage measurements or an LED observation under a different physical configuration of the circuit. Check for the absence of one or more wires until a missing wire is identified.</Paragraph>
    <Paragraph position="4"> The wires are attached to metal spring-like connectors, which are identified by numbers on the circuit board. Thus, a wire is identified by the numbers of the two connectors to which it is connected. In order to balance the difficulty of the problems between the second and third sessions, the wires were classified according to the number and type of diagnostic steps required to detect the error. Based on this classification, the assignment of missing wires to problems in each session was made as follows: * Four wires were used in the four warmup problems of the first session.</Paragraph>
    <Paragraph position="5"> * From a set of 10 other wires, 5 were used for the first five problems of session 2 and the other 5 were used for the first five problems of session 3. Each of these problems was balanced for difficulty. For example, problem 1 of both sessions was a power subcircuit problem, while problem 5 of both sessions was an LED subcircuit problem.</Paragraph>
    <Paragraph position="6"> Problems 2 through 4 were similarly balanced.</Paragraph>
    <Paragraph position="7"> * Problems 6 through 8 of sessions 2 and 3 consisted of 2 missing wires for each problem. The 2 missing wires were selected from the 5 missing wires used during the first five problems of the session. Each of problems 6 through 8 differed by one missing wire. These problems were also balanced for difficulty.</Paragraph>
    <Paragraph position="8"> * Problems 9 and 10 of each session consisted of a missing wire that was also used during the warmup problems of session 1. Each of these 4 wires was assigned to a different problem. Consequently, sessions 2 and 3 are balanced for difficulty only through the first eight problems.</Paragraph>
    <Section position="1" start_page="150" end_page="151" type="sub_section">
      <SectionTitle>
4.3 Experimental Setup
</SectionTitle>
      <Paragraph position="0"> Figure 4 provides a rough sketch of the room layout. The subject was seated facing the desk containing the circuit board. Communication with the speech recognizer was performed through a telephone handset. The experimenter was seated in front of the computer console. Thus, the subject's back was to the experimenter. The experimenter had a copy of the raw data form for the session, a copy of the word list, and a guide describing the allowed experimenter interaction with the subject. Data collection mechanisms consisted of the following: . Automatic logging of the words received from the speech recognizer (subject input) and the words sent to the DECtalk (computer output).</Paragraph>
      <Paragraph position="1">  This logging information included the time the words were received or sent. In addition, time information was recorded for when the parser finished its processing of the input and when the computation of the input interpretation was complete.</Paragraph>
      <Paragraph position="2"> The interaction was tape recorded in order to make a transcript that included the actual words used by the subject and the interactions that occurred between the subject and the experimenter.</Paragraph>
      <Paragraph position="3"> The experimenter made notes about the interaction on the raw data form as well as marked occurrences of subject-experimenter interaction according to the category into which the interaction could be classified. In order to assist the experimenter in determining when a misrecognition occurred, the experimenter monitored the file where automatic logging occurred.</Paragraph>
    </Section>
    <Section position="2" start_page="151" end_page="152" type="sub_section">
      <SectionTitle>
4.4 Experimenter Interaction
</SectionTitle>
      <Paragraph position="0"> An important issue in experiments such as this, as has been observed elsewhere (Biermann, Fineman, and Heidlage 1992), is the problem of giving the subject sufficient error messages to enable satisfactory progress. One major source of difficulty in this experiment were misrecognitions by the Verbex speech recognizer. These miscommunications created various problems for the dialogue interaction, ranging from repetitive dialogue to experimenter intervention to occasional failure of the dialogue. Whenever a serious misrecognition caused the computer to interpret the utterance in a way that contradicted what was meant, the experimenter was allowed to (1) tell the subject that a misrecognition had occurred, and (2) tell the subject the interpretation made by the computer, but could say nothing else. For example, when one subject said, &amp;quot;the circuit is  Smith and Gordon Human-Computer Dialogue working,&amp;quot; the speech recognizer returned the words &amp;quot;faster it is working.&amp;quot; This was interpreted as the phrase faster. Consequently, the experimenter told the subject, &amp;quot;Due to misrecognition, your words came out as faster.&amp;quot; It is important to note that when an utterance was misunderstood, the experimenter did not tell the subject what to do, but merely described what happened. In this way, the interaction was restricted to being between the computer and the subject as much as possible, given the quality of commercial, real-time, continuous speech recognition devices at the time of the experiment. Such error messages from the experimenter occurred, on average, once every 15 user-utterances throughout the experiment.</Paragraph>
      <Paragraph position="1"> The other main source of difficulty in using the system was the enforcement of the single utterance, turn-taking protocol of the interaction. This required the user to signal the beginning of an utterance by speaking the sentinel word verbie and end the utterance with the word over. Users would sometimes forget to use the sentinel words or else would not wait for the system's response that would occasionally be delayed up to 30 seconds (normal response time was 5 to 10 seconds). In cases where the interaction protocol was violated, the experimenter would issue a warning statement such as, &amp;quot;Please be patient. The system is taking a long time to respond,&amp;quot; or &amp;quot;Please remember to start utterances with verbie.&amp;quot; These types of experimenter interactions occurred, on average, once every 33 user-utterances.</Paragraph>
    </Section>
    <Section position="3" start_page="152" end_page="152" type="sub_section">
      <SectionTitle>
4.5 The Nature of the Spoken Dialogue
</SectionTitle>
      <Paragraph position="0"> The limitations of real-time continuous speech recognition at the time of the experiments had an impact on the nature of the spoken human-computer interaction that was observed in comparison to what might be expected in a spoken human-human interaction. In particular, the restrictive 125-word vocabulary meant that speech repairs and disfluencies that are prevalent in human-human spoken interaction and an important area of study (Oviatt 1995; Heeman and Allen 1994) could not be processed by the system. Whenever a person misspoke, they could start over by issuing the sentinel word cancel, rather than over at the end of their utterance. To prevent this from happening often, subjects were instructed at the start of their participation to plan their utterance completely before speaking. Consequently, there were only 11 cancels issued in the production of the 2,840 user-utterances. Furthermore, in exit interviews conducted after they had completed participation, none of the subjects indicated any difficulty with, or dislike of, planning utterances in advance.</Paragraph>
      <Paragraph position="1"> To summarize, the results in Section 6 on the structure of spoken natural language dialogue are based for the most part on planned speech, a consequence of the technological limitations of speech recognizers at the time. Nevertheless, we believe it represents the first widely reported and analyzed spoken human-computer co-operative problem-solving dialogue, and that it is representative of such dialogue for the forseeable future.</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="152" end_page="155" type="metho">
    <SectionTitle>
5. Classifying Dialogue Utterances
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="152" end_page="153" type="sub_section">
      <SectionTitle>
5.1 Major Subdialogues in Repair Assistance
</SectionTitle>
      <Paragraph position="0"> For task-oriented dialogues Grosz (1978) has noted that the structure of a dialogue mirrors the structure of the underlying task. Moody (1988) conducted a Wizard-of-Oz study on the effects of restricted vocabulary on interactive spoken dialogue. Her data were the basis for the formulation of the experimental Circuit Fix-It Shop system. For repair  Table 1 shows the classification into the various subdialogues of the utterances from the sample dialogues of Figure 3.</Paragraph>
    </Section>
    <Section position="2" start_page="153" end_page="154" type="sub_section">
      <SectionTitle>
5.2 Subdialogue Transition
</SectionTitle>
      <Paragraph position="0"> Another important aspect of the dialogue structure is the nature of the transitions between subdialogues. The model we present is derived from Moody's (1988) study, mentioned above. In the absence of errors in completing task actions, the natural transition from subdialogue to subdialogue is described by the following regular expression: I+A+(D+R*T+)nF where &amp;quot;+&amp;quot; denotes that one or more utterances will be spoken in the given subdialogue, &amp;quot;,&amp;quot; denotes that zero or more utterances will be spoken in the given subdialogue, and n represents the number of individual repairs in the problem. 8 The letters correspond to the abbreviations given in Section 5.1, and F represents the finished state (i.e., completion of the dialogue). This transition model is also depicted in the finite-state network of Figure 5. For clarity, loop arcs (i.e., transitions from a subdialogue back into itself) are omitted. We see from this model that dialogues normally begin with the Introduction and Assessment phases. Once the errant system behavior is described, the dialogue goes through one or more cycles of Diagnosis, Repair, and Test, until the system behavior is correct.</Paragraph>
      <Paragraph position="1"> 8 In our domain, n represents the number of missing wires in the problem. For example, when there are two missing wires, the first DRT iteration will cause one missing wire to be added, but the Test phase will show that the circuit is still not working. A second DRT iteration is required to detect and add the missing wire that completes the repair.</Paragraph>
      <Paragraph position="2">  Smith and Gordon Human-Computer Dialogue Figure 5 Subdialogue transition as a finite-state network.</Paragraph>
      <Paragraph position="3"> This model was helpful in classifying each utterance into the appropriate subdialogue. As discussed in Section 6.4, however, not all dialogues followed this model, due to user initiative and dialogue miscommunication. Nevertheless, it provides a good first approximation of the nature of subdialogue movement.</Paragraph>
    </Section>
    <Section position="3" start_page="154" end_page="154" type="sub_section">
      <SectionTitle>
5.3 Transcript Coding
</SectionTitle>
      <Paragraph position="0"> The two authors each coded the transcripts independently. Every utterance (those spoken by the computer as well as those spoken by the human subject) was classified into one of these five subdialogue categories, according to two perspectives: the speaker's perspective (i.e., the task subdialogue that the speaker of the utterance believed was relevant to the statement) and the global perspective (i.e., the task subdialogue that is relevant to the utterance, based on omniscient knowledge of the task status). Normally these were the same, but not always. In situations where the user carried out a repair without explicitly notifying the computer, the computer might think the task was still in one phase, when the user had actually moved the task into another phase.</Paragraph>
      <Paragraph position="1"> In the results to be presented, the current subdialogue is based on global, rather than speaker, perspective. Overall, there was a difference between speaker and global perspective in 6.7% of the declarative mode utterances and in 1.7% of the directive mode utterances.</Paragraph>
    </Section>
    <Section position="4" start_page="154" end_page="155" type="sub_section">
      <SectionTitle>
5.4 Coding Reliability
</SectionTitle>
      <Paragraph position="0"> The two authors compared their coding results as the transcripts for each one of the eight subjects were completed, in order to resolve differences and, hopefully, improve agreement as more transcripts were coded. The first author was a principal designer of the system, while the second author had only watched a videotape of the system in operation and read some of the previous papers about the project. Consequently, many of the initial disagreements in coding were due to a lack of familiarity with what transpired during the experiment. For example, in situations where the Repair subdialogue was not explicitly verbalized, it was not clear whether subsequent descriptions of the circuit behavior indicated that the current subdialogue was Test or Assessment.</Paragraph>
      <Paragraph position="1"> Proper coding in these situations required familiarity with what had actually occurred during the experiment, familiarity that only the first author had. For all dialogues, initial interrater agreement on both speaker and global perspective of the current sub-dialogue was 87.2%. That is, for 12.8% of the utterances, there was a disagreement between the coders over either speaker perspective of the current subdialogue, global perspective, or both. The kappa coefficient (Isard and Carletta 1995) for the level of agreement is 0.82. When the coding process was completed, all discrepancies were resolved to the satisfaction of both authors.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>