<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1009">
  <Title>Developing A Flexible Spoken Dialog System Using Simulation</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 System Architecture with Simulator
</SectionTitle>
    <Paragraph position="0"> Figure 1 depicts a spoken dialog system architecture operating with simulator components, which create synthetic user inputs. Simulations can be configured to run in text or speech mode. In text mode, text utterances are passed directly to the understanding components as user inputs. The dialog manager creates reply frames that encode the information needed to generate the system reply string. The simulator also uses these frames to select a random user response for the next turn. In speech mode, synthetic waveforms are created and recognized by the speech recognizer, yielding an N-best list for the understanding components.</Paragraph>
    <Paragraph position="1"> Figure 1: Spoken dialog system architecture integrated with user simulation components.</Paragraph>
    <Paragraph position="2"> Examples and experiments in this paper are drawn from a Boston restaurant information system. Obtained from an on-line source, the content offers information for 863 restaurants, located in 106 cities in the Boston metropolitan area (e.g., Newton, Cambridge) and 45 neighborhoods (e.g., Back Bay, South End). Individual restaurant entries are associated with detailed information such as cuisines, phone numbers, opening hours, credit-card acceptance, price range, handicap accessibility, and menu offerings. Additionally, latitude and longitude information for each restaurant location has been obtained.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Instantiation of a System
</SectionTitle>
      <Paragraph position="0"> The concept of driving the instantiation of a dialog system from the data source was described in (Polifroni et al., 2003). The steps envisioned for creating an initial prototype from on-line content are summarized below:
 1. Combing the web for database content
 2. Identifying the relevant set of keys associated with the domain, and mapping them to the information parsed from the content originator
 3. Creating an NL grammar covering possible domain queries
 4. Configuring the discourse and dialog components for an initial set of interactions
 5. Defining templates for system responses
 These steps are sufficient to enable a working prototype to communicate with the proposed simulator in text mode. The next phase will involve iteratively running simulated dialogs and refining the spoken dialog system, followed by examination of successive corpora of simulated dialogs. Later phases will then incorporate the speech recognition and text-to-speech components.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Simulation with User Modeling
</SectionTitle>
      <Paragraph position="0"> The simulator, shown in Figure 1, is composed of several modular components. The core simulator accepts reply frames from the dialog system and produces a meaning representation of the next synthetic user response. A text generation component paraphrases the meaning representation into a text string. In text mode, this string poses as a typed user input, whereas in speech mode, the text is passed to a synthesizer as part of a synthesize/recognize cycle. Configuring a simulation for any domain involves customizing a simple external text file to control the behavior of the domain-independent simulator module, and tailoring text generation rules to output a variety of example user input sentences from the meaning representation.
 A simulated dialog commences with an initial query such as &quot;what restaurants do you provide?&quot;. The synthetic user then makes successive queries that constrain the search to data subsets. It may (1) continue to browse more data subsets, (2) when a small list of data entries is in focus, choose to query attributes pertaining to one or more individual items, or (3) terminate the conversation. The entire system is run continuously through hundreds of dialogs to produce log files of user and system sentences, along with dialog information for subsequent analyses. The simulator also generates generic kinds of statements, such as asking for help, asking for a repeat, and clearing the dialog history.</Paragraph>
      <Paragraph position="1"> The simulator takes the system-generated reply frame as input and outputs a flat semantic frame that encapsulates the meaning representation of the next intended user query. The system reply frame contains the essential entities used in the paraphrase that creates the system prompt. In addition, a sub-frame, shown in Figure 2, retains pre-computed counts of the frequency of occurrence of values for every key pertaining to the data subset within the discourse focus. [Figure caption fragment: ... procedure for the simulator.]</Paragraph>
      <Paragraph position="2"> During the browsing stage, the simulator randomly selects a key (e.g., cuisine) from the given frame, and then randomly selects a value for that key (e.g., &quot;Chinese&quot;). The simulator may choose one or more of these key-value pairs as constraints to narrow the search. For each key, more than one value from the list of possible values may be specified (e.g., querying for &quot;Chinese or Japanese restaurants&quot;). When querying about individual restaurants, the simulator randomly selects one restaurant entry from a small list, and then seeks the value of one key characteristic of that entry, such as a phone number or an address.</Paragraph>
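      <Paragraph> The browsing-stage selection described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the layout of the counts dictionary and the function name pick_constraint are hypothetical, chosen only for this example.

```python
import random

# Hypothetical stand-in for the counts sub-frame of a reply frame:
# for each key, how often each value occurs in the data subset in focus.
counts = {
    "cuisine": {"Chinese": 12, "Japanese": 7, "Italian": 20},
    "neighborhood": {"Back Bay": 15, "South End": 9},
}

def pick_constraint(counts, rng, max_values=1):
    """Pick one key uniformly at random, then one or more of its values."""
    key = rng.choice(sorted(counts))
    values = sorted(counts[key])
    how_many = rng.randint(1, min(max_values, len(values)))
    return key, rng.sample(values, how_many)

rng = random.Random(0)  # seeded only to make the sketch repeatable
key, values = pick_constraint(counts, rng, max_values=2)
print(key, "=", " or ".join(values))
```

 Setting max_values above one lets a single constraint carry several alternative values, which corresponds to a query such as "Chinese or Japanese restaurants".</Paragraph>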
      <Paragraph position="3"> Figure 2 illustrates the decision making performed by the simulator at each turn. At each decision point, the simulator &quot;throws the dice&quot; to determine how to proceed: for example, whether to select an additional key as a constraint within the same turn, and whether to persist in querying about the available attributes of the small list of restaurants or to start over.</Paragraph>
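      <Paragraph> One way to picture the dice-throwing step is the sketch below. It is an assumption-laden illustration rather than the paper's procedure: the probability names and values in PROBS, and the functions throw_dice and plan_turn, are hypothetical.

```python
import random

# Hypothetical per-decision probabilities; names and values are
# illustrative, not taken from the paper's configuration file.
PROBS = {
    "add_constraint": 0.3,  # add another key within the same turn
    "keep_querying": 0.6,   # keep asking about the listed restaurants
}

def throw_dice(rng, probability):
    """Return True with the given probability."""
    return probability > rng.random()

def plan_turn(rng, small_list_in_focus):
    """Decide the shape of the next synthetic user turn."""
    if small_list_in_focus:
        if throw_dice(rng, PROBS["keep_querying"]):
            return "query_attribute"
        return "start_over"
    n_constraints = 1
    if throw_dice(rng, PROBS["add_constraint"]):
        n_constraints += 1
    return "browse with %d constraint(s)" % n_constraints

rng = random.Random(42)  # seeded only to make the sketch repeatable
print(plan_turn(rng, small_list_in_focus=False))
print(plan_turn(rng, small_list_in_focus=True))
```

 Keeping the probabilities in one table mirrors the idea of tuning behavior without touching the decision logic itself.</Paragraph>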
      <Paragraph position="4"> The behavior of the simulator at each decision point can be tuned from an external text file, which allows the following to be specified:
 - Probability of combining several constraints into a single query
 - Probability of querying a different value for a previous key versus selecting from among other keys presented by the reply frame
 - Probability of continued querying of the attributes of restaurants from a list of one or more restaurants
 - Probability of the user changing his goals, hence querying with alternative constraints
 A simple user model is maintained by the simulator to track the key-value pairs that have already been queried in the current dialog. This tracks the dialog history so as to enable the synthetic user to further query about a previously mentioned item.</Paragraph>
      <Paragraph position="5"> It also prevents the dialog from cycling indefinitely through the same combinations of constraints, helping to make the dialog more coherent.</Paragraph>
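      <Paragraph> The role of the user model can be illustrated with a small sketch. The class UserModel and its methods are hypothetical names introduced here; the paper does not describe the data structures actually used.

```python
class UserModel:
    """Tracks key-value pairs already queried in the current dialog."""

    def __init__(self):
        self.history = []          # pairs in the order they were queried
        self.seen_queries = set()  # constraint combinations already used

    def is_new(self, constraints):
        """True if this exact combination of constraints is unused."""
        return frozenset(constraints.items()) not in self.seen_queries

    def record(self, constraints):
        """Remember a combination and the individual pairs it contains."""
        self.seen_queries.add(frozenset(constraints.items()))
        for pair in constraints.items():
            if pair not in self.history:
                self.history.append(pair)

model = UserModel()
query = {"cuisine": "Chinese", "neighborhood": "Chinatown"}
if model.is_new(query):
    model.record(query)
print(model.is_new(query))  # the same combination is now rejected
print(model.history)        # earlier items remain available for follow-ups
```

 Rejecting repeated combinations is what keeps a simulated dialog from cycling, while the ordered history supports follow-up queries about items mentioned earlier.</Paragraph>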
      <Paragraph position="6"> The external configuration file can effectively tune the level of cooperative behavior of the synthetic user. If the synthetic user selects a single key-value pair from the reply frame at each turn, a non-empty and successively smaller data subset is guaranteed to result at each turn. Moreover, selections can be configured to be biased toward the frequencies of instance values. This rests on the hypothesis that locations populated with more restaurants are more likely to be queried; that is, the statistics of the database instances can directly reflect the distribution of user queries. For instance, users are more likely to query about &quot;Chinese restaurants in Chinatown.&quot; Hence, the output dialogs may be more suitable for training language models. Alternatively, the synthetic user may be configured to select random combinations of various keys and values from the current or stored summary frame at a turn. Under these circumstances, the subsequent database retrieval may yield no data for those particular combinations of constraints.</Paragraph>
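      <Paragraph> Frequency-biased selection can be sketched with weighted sampling. The counts below are invented for illustration, and pick_value is a hypothetical helper, not part of the described system.

```python
import random

# Hypothetical counts for one key of the reply frame.
neighborhood_counts = {"Chinatown": 40, "Back Bay": 15, "South End": 5}

def pick_value(counts, rng, biased=True):
    """Pick a value, optionally weighted by how often it occurs."""
    values = sorted(counts)
    if biased:
        weights = [counts[v] for v in values]
        return rng.choices(values, weights=weights, k=1)[0]
    return rng.choice(values)

rng = random.Random(7)  # seeded only to make the sketch repeatable
sample = [pick_value(neighborhood_counts, rng) for _ in range(1000)]
for name in sorted(neighborhood_counts):
    print(name, sample.count(name))
```

 In this sketch every candidate value has a positive count in the subset in focus, so any single pick corresponds to at least one database entry; with biased=True the frequently occurring values dominate the simulated queries.</Paragraph>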
      <Paragraph position="7"> Each semantic frame is input to Genesis, a text generation module (Seneff, 2002), to output a synthetic user utterance. Genesis performs surface-form generation via recursive generation rules and an associated lexicon. A recent addition to Genesis is the ability to randomly generate one of several variant sentences for the same semantic frame. A developer can specify several rules for each linguistic entity, allowing the generator to randomly select one. Due to the hierarchical nature of these templates, numerous output sentences can be produced from a single semantic frame, with only a few variants specified for each rule. Table 3 depicts example semantic frames and corresponding sample sentences from the simulator.
 In total, the full corpus of simulated sentences is generated from approximately 55 hand-written rules in the restaurants domain. These rules differ from those of previous text generation tasks in that they incorporate spontaneous speech phenomena such as filled pauses and fragments. In the initial phase, this small rule set is not systematically mined from any existing corpora, but is handcrafted by the developer. However, it may be possible in the future to incorporate both statistics and observations learned from real data to augment the generation rules.</Paragraph>
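      <Paragraph> The effect of hierarchical rules with random variants can be shown with a toy generator. The rule table and the generate function below are invented for illustration; they do not reproduce Genesis rule syntax or its lexicon.

```python
import random

# Toy rules: each entity maps to several variants; names in braces are
# expanded recursively. These rules are invented for illustration and
# are not Genesis syntax.
RULES = {
    "request": ["{filler}{ask} {cuisine} restaurants",
                "{cuisine} restaurants {please}"],
    "filler": ["", "um, ", "uh, "],
    "ask": ["show me", "are there any", "i want"],
    "please": ["please", "if you can"],
}

def generate(entity, frame, rng):
    """Expand one entity by picking a random variant and recursing."""
    if entity in frame:  # slot filled directly from the semantic frame
        return frame[entity]
    template = rng.choice(RULES[entity])
    out = []
    for i, part in enumerate(template.replace("}", "{").split("{")):
        # odd-numbered parts are entity names, even-numbered are literals
        out.append(generate(part, frame, rng) if i % 2 else part)
    return "".join(out)

rng = random.Random(3)  # seeded only to make the sketch repeatable
frame = {"cuisine": "Chinese"}
for _ in range(3):
    print(generate("request", frame, rng))
```

 With these toy rules, one frame yields 11 distinct sentences (nine from the first request variant, two from the second), which illustrates how a few variants per rule multiply into many surface forms. The filler variants stand in for the filled pauses mentioned above.</Paragraph>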
      <Paragraph position="8">  A concatenative speech synthesizer (Yi et al., 2000) is used to synthesize the simulated user utterances for this domain. The parameters and concatenative units employed in this synthesizer were tailored for a previous domain, and therefore, the naturalness and intelligibility of the output waveforms are expected to be poor. However, the occurrence of some recognition errors may help in assessing their impact on the system.</Paragraph>
    </Section>
  </Section>
</Paper>