<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1009">
  <Title>Developing A Flexible Spoken Dialog System Using Simulation</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Cooperative Response Strategies
</SectionTitle>
    <Paragraph position="0"> We have aimed to design a more cooperative spoken dialog system in two respects. First, the information is delivered so that at each turn a dynamic summary of the database items in focus is presented. Secondly, the dialog manager is augmented with a domain-independent algorithm to handle over-constrained queries. The system gives alternative suggestions that are integrated with the dynamic summaries.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Flexible System Responses
</SectionTitle>
      <Paragraph position="0"> Response planning is performed both in the dialog management and the language generator, Genesis.</Paragraph>
      <Paragraph position="1"> To enable flexible responses, and avoid rigid system prompts, the dialog manager accesses the database at every turn with the current set of user-specified constraints in focus. With this data subset returned, a data refinement server (Polifroni et al., 2003) then computes frequency characteristics of relevant keys for the subset. This is incorporated into the system reply frame as shown in Table 2.</Paragraph>
      <Paragraph position="2"> Following this, Genesis provides a summary of the characteristics of the data set, utilizing context information provided by the dialog manager and the frequency statistics. Genesis provides control on how to summarize the data linguistically via explicit rules files. The developer can specify variables a0 , a1 , and a2a4a3a6a5a8a7 which control how lists of items are summarized, separately for different classes of data. If the number of items is under a1 , all options are enumerated. If the top a0 frequency counts cover more than a2a4a3a6a5a8a7 of the data, then these categories will be suggested, (e.g. &amp;quot;Some choices are Italian  For each example frame above, hundreds of simulated variant sentences can be obtained. and Chinese.&amp;quot;). Alternatively, summaries can indicate values that are missing or common across the set, (e.g. &amp;quot;All of them are cheap.&amp;quot;).</Paragraph>
      <Paragraph position="3"> By accessing the database and then examining the data subset at each turn, the system informs the user with a concise description of the choices available at that point in the dialog. This is a more flexible alternative than following a script of prompts where in the end the user may arrive at an empty set. Moreover, we argue that performing the summary in real time yields greater robustness against changes in the database contents.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Dialog Management
</SectionTitle>
      <Paragraph position="0"> The domain-independent dialog manager is configurable via an external dialog control table. A set of generic functions are triggered by logical conditions specified in formal rules, where typically several rules fire in each turn. The dialog manager has been extended to handle scenarios in which the user constraints yield an empty set. The aim is to avoid simply stating that no data items were found, without providing some guidance on how the user could re-formulate his query. Domain-independent routines relax the constraints using a set of pre-defined and configurable criteria. Alternate methods for relaxing constraints are: a1 If a geographical key has been specified, relax the value according to a geography ontology. For instance, if a particular street name has been specified, the relaxation generates a subsuming neighborhood constraint in place of the street name.</Paragraph>
      <Paragraph position="1"> a1 If a geographical key has been specified, remove the geographical constraint and search for the nearest item that satisfies the remaining constraints. The algorithm computes the nearest item according to the central latitude/longitude coordinates of the neighborhood or city.</Paragraph>
      <Paragraph position="2"> a1 Relax the key-value with alternative values that have been set to defaults in an external file.</Paragraph>
      <Paragraph position="3"> For instance, if a Vietnamese restaurant is not available at all, the system relaxes the query to alternative Asian cuisines.</Paragraph>
      <Paragraph position="4"> a1 Choose the one constraint to remove that produces the smallest data subset to speak about.</Paragraph>
      <Paragraph position="5"> If no one constraint is able to produce a non-empty set, successively remove more constraints. The rationale for finding a constraint combination that produces a small data set, is to avoid suggesting very general alternatives: for instance, suggesting and summarizing the &amp;quot;337 cheap restaurants&amp;quot; when &amp;quot;cheap fondue restaurants&amp;quot; were requested.</Paragraph>
      <Paragraph position="6"> The routine will attempt to apply each of these relaxation techniques in turn until a non-zero data set can be attained.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Simulations in Text Mode
</SectionTitle>
      <Paragraph position="0"> The first stage of development involved iteratively running the system in text mode and inspecting log files of the generated interactions for problems. This development cycle was particularly useful for extending the coverage of the NL parser and ensuring the proper operation of the end-to-end system.</Paragraph>
      <Paragraph position="1"> Simulations have helped diagnose initial problems overlooked in the rule-based mechanisms for context tracking; this has served to ensure correct inheritance of attributes given the many permutations of sequences of input sentences that are possible within a single conversation. This is valuable because in such a mixed-initiative system, the user is free to change topics and specify new parameters at any time. For instance, a user may or may not follow up with suggestions for restaurants offered by the system. In fact, the user could continue to modify any of the constraints previously specified in the conversation or query any attributes for an alternate newly spoken restaurant. There are vast numbers of dialog contexts that can result, and simulations have assisted greatly in detecting problems.</Paragraph>
      <Paragraph position="2"> Furthermore, by generating many variations of possible user constraints, simulations have also helped identify initial problems in the summarization rules for system response generation. The text generation component is handcrafted and benefits largely from examples of real queries to ensure their proper operation. These kinds of problems would otherwise normally be encountered only after many user interactions have occurred.</Paragraph>
      <Paragraph position="3"> Table 4 shows a typical simulated dialog. In the interaction shown, the simulator provides one or more constraints at each turn. It also selects alternative values according to the previous chosen key. After the dialog has arrived at a small data set, the simulator randomly asks questions about individual items.</Paragraph>
      <Paragraph position="4"> During one simulation run, we completed 2000 dialogs in text mode. There were a total of 8147 input utterances, resulting in an average of 4.07 input utterances per dialog. Of the input utterances, 5446 were unique. These were generated from 3349 unique semantic frames. There were 4320 unique system replies.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Simulating Over-Constrained Queries
</SectionTitle>
      <Paragraph position="0"> By configuring the simulator to query with multiple constraints in single turns, we were able to synthesize over-constrained queries, in which, initially, empty data sets were retrieved. In earlier development cycles, simulated dialogs were crucial in finding combinations that yielded problematic responses due to over-constrained queries. In one run after refinements, we tested 300 dialogs whereby two or more constraints were combined in each query by the simulator. As a result, 113 queries necessitated the system to invoke the relaxation algorithm in order to present a suggested alternative data set. Under all these cases, the system gave suggestions without fail. Some examples of the replies are shown in Table 5.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.3 Simulation with Speech Recognition
</SectionTitle>
      <Paragraph position="0"> The next step is to implement the synthesize-andrecognize cycle whereby the text-to-speech output is directly piped to the recognition engine. A speech recognizer (Glass, 2003) is built using the utterances produced by the text mode as training data for the language model. This recognizer uses class a1 -gram models. With the training set of 8147 utterances, the perplexity of the bigram and trigram models are respectively 6.6 and 5.1. The speech recognizer, including the restaurant names contains 2325 vocab-SIM: What restaurants do you know? SYS: There are 863 restaurants. Some of the options are American, Pizza, and Italian.</Paragraph>
      <Paragraph position="1"> Most of them are located in Boston and Cambridge.</Paragraph>
      <Paragraph position="2"> SIM: Any restaurants in Back Bay? SYS: There are 57 restaurants in Back Bay.</Paragraph>
      <Paragraph position="3"> Many of them are American, and Nouvelle Cuisine.</Paragraph>
      <Paragraph position="4"> SIM: So how about food in The South End? SYS: There are 42 restaurants in The South End.</Paragraph>
      <Paragraph position="5"> They are mostly American, and Nouvelle Cuisine. Many of them are located on Columbus Avenue, and Tremont Street.</Paragraph>
      <Paragraph position="6"> SIM: I would like a moderately priced restaurant. null SYS: There are 9 medium priced restaurants in The South End. Some of the options are Nouvelle Cuisine, and American. Many of them are on Tremont Street and Appleton Street.</Paragraph>
      <Paragraph position="7"> SIM: a1 oha2 What about any cheap American restaurants? SYS: There are 7 cheap American restaurants in The South End. They are located on Columbus Avenue, Tremont Street, Dartmouth Street, and Washington Street.</Paragraph>
      <Paragraph position="8"> SIM: Columbus Avenue.</Paragraph>
      <Paragraph position="9"> SYS: I have found 3 cheap American restaurants in The South End on Columbus Avenue. Their names are Charlie's Sandwich Shoppe, Bob The Chefs, and Grill Fish.</Paragraph>
      <Paragraph position="10"> SIM: What is the phone number of first one? SYS: The phone number of Charlie's Sandwich Shoppe is 617 536 7669.</Paragraph>
      <Paragraph position="11"> SIM: What is the price range for Grill Fish? SYS: The price range of Grill Fish is cheap. Table 4: Typical example of a simulated dialog. SIM are the simulated inputs and SYS are the system responses. null ulary items. The understanding component would select one sentence of an a0 -best list output from the speech recognizer.</Paragraph>
      <Paragraph position="12"> Presently, when a recognition error causes a hard parse failure in the understanding, the simulator is configured to output the same semantic frame as contained in the previous utterance. This will cause the text generator to output a different variant of the same query. If the parse failures occur multiple times in sequence, the simulated user aborts and terminates the dialog.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4.4 Results from Spoken Dialog System
</SectionTitle>
    <Paragraph position="0"> Our initial run of 36 dialogs yielded 213 sentences.</Paragraph>
    <Paragraph position="1"> To simplify dialogs, the simulator is configured to specify just one constraint at each turn. After the 1. Cheap Restaurants on Rowes Wharf: There are no cheap restaurants on Rowes Wharf. However, there are in total 5 cheap restaurants in the Financial District. They are on Broad Street, Post Office Square, Federal Street, and Bromfield Street.</Paragraph>
    <Paragraph position="2">  of user constraints. Various schemes for relaxation are shown. (1) relaxes on the geographical location, (2) offers a nearest alternative, and (3) removes the cuisine constraint, outputting a single alternate selection. data subset has been narrowed down to six items or less, the simulator queries focus on one of the six items. For the 213 utterances, the recognition word error rate is 11.2%, and the sentence error rate is 32.4%. Because the synthesizer is highly domain specific and was originally trained on another domain, the synthetic waveforms were in fact highly unnatural. However, the relatively good recognition performance can be attributed to segmental units being well matched to the segment-based recognizer, an exact match to the trained a1 -gram model and the lack of spontaneous speech phenomena such as disfluencies. These 36 dialogs were analysed by hand.</Paragraph>
    <Paragraph position="3"> All dialogs successfully arrived at some small data subset at termination, without aborting due to errors. 29 (80.1%) of the dialogs completed without errors, with the correct desired data set achieved.</Paragraph>
    <Paragraph position="4"> Of the errorful dialogs, 3 exhibited problems due to recognition errors and 4 dialogs exhibited errors in the parse and context tracking mechanisms. All the questions regarding querying of individual restaurants were answered correctly.</Paragraph>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Discussion
</SectionTitle>
    <Paragraph position="0"> The above evaluations have been conducted on highly restricted scenarios in order to focus development on any fundamental problems that may exist in the system. In all, large numbers of synthetic dialogs have helped us identify problems that in the past would have been discovered only after data collections, and possibly after many failed dialogs with frustrated real users. The hope is that using simulation runs will improve system performance to a level such that the first collection of real user data will contain a reasonable rate of task success, ultimately providing a more useful training corpus.</Paragraph>
    <Paragraph position="1"> Having eliminated many software problems, a final real user evaluation will be more meaningful.</Paragraph>
  </Section>
  <Section position="8" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6 Related Work
</SectionTitle>
    <Paragraph position="0"> Recently, researchers have begun to address the rapid prototyping of spoken dialog applications.</Paragraph>
    <Paragraph position="1"> While some are concerned with the generation of systems from on-line content (Feng et al., 2003), others have addressed portability issues within the dialog manager (Denecke et al., 2002) and the understanding components (Dzikovska et al., 2003).</Paragraph>
    <Paragraph position="2"> Real user simulations have been employed in other areas of software engineering. Various kinds of human-computer user interfaces can be evaluated for usability, via employing simulated human users (Riedl and St. Amant, 2002; Ritter and Young, 2001). These can range from web pages to cockpits and air traffic control systems. Simulated users have also accounted for perceptual and cognitive models. Previous work in dialog systems has addressed simulation techniques towards the goal of training and evaluation. In (Scheffler and Young, 2000), extensive simulations incorporating user modeling were used to train a system to select dialog strategies in clarification sub-dialogs. These simulations required collecting real-user data to build the user model. Other researchers have used simulations for the evaluation of dialog systems (Hone and Baber, 1995; Araki and Doshita, 1997; Lin and Lee, 2001).</Paragraph>
    <Paragraph position="3"> In (Lopez et al., 2003), recorded utterances with additive noise were used to run a dialog system in simulation-mode. This was used to test alternate confirmation strategies under various recognition accuracies. Their methods did require the recording of scripted user utterances, and hence were limited in the variations of user input.</Paragraph>
    <Paragraph position="4"> Our specific goals have dealt with creating more cooperative and flexible responses in spoken dialog.</Paragraph>
    <Paragraph position="5"> The issues of mismatch between user queries and database contents have been addressed by others in database systems (Gaasterland et al., 1992), while the potential for problems with dead-end dialogs caused by over-constrained queries have also been recognized and tackled in (Qu and Green, 2002).</Paragraph>
  </Section>
class="xml-element"></Paper>