<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-1033">
  <Title>Flexible Guidance Generation using User Model in Spoken Dialogue Systems</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> A spoken dialogue system is one of the promising applications of the speech recognition and natural language understanding technologies. A typical task of spoken dialogue systems is database retrieval.</Paragraph>
    <Paragraph position="1"> Some IVR (interactive voice response) systems using the speech recognition technology are being put into practical use as its simplest form. According to the spread of cellular phones, spoken dialogue systems via telephone enable us to obtain information from various places without any other special apparatuses. null However, the speech interface involves two inevitable problems: one is speech recognition errors, and the other is that much information cannot be conveyed at once in speech communications.</Paragraph>
    <Paragraph position="2"> Therefore, the dialogue strategies, which determine when to make guidance and what the system should tell to the user, are the essential factors. To cope with speech recognition errors, several confirmation strategies have been proposed: confirmation management methods based on confidence measures of speech recognition results (Komatani and Kawahara, 2000; Hazen et al., 2000) and implicit confirmation that includes previous recognition results into system's prompts (Sturm et al., 1999). In terms of determining what to say to the user, several studies have been done not only to output answers corresponding to user's questions but also to generate cooperative responses (Sadek, 1999). Furthermore, methods have also been proposed to change the dialogue initiative based on various cues (Litman and Pan, 2000; Chu-Carroll, 2000; Lamel et al., 1999).</Paragraph>
    <Paragraph position="3"> Nevertheless, whether a particular response is co-operative or not depends on individual user's characteristics. For example, when a user says nothing, the appropriate response should be different whether he/she is not accustomed to using the spoken dialogue systems or he/she does not know much about the target domain. Unless we detect the cause of the silence, the system may fall into the same situation repeatedly.</Paragraph>
    <Paragraph position="4"> In order to adapt the system's behavior to individual users, it is necessary to model the user's patterns (Kass and Finin, 1988). Most of conventional studies on user models have focused on the knowledge of users. Others tried to infer and utilize user's goals to generate responses adapted to the user (van Beek, 1987; Paris, 1988). Elzer et al. (2000) proposed a method to generate adaptive suggestions according to users' preferences.</Paragraph>
    <Paragraph position="5"> However, these studies depend on knowledge of the target domain greatly, and therefore the user models need to be deliberated manually to be applied to new domains. Moreover, they assumed that the input is text only, which does not contain errors.</Paragraph>
    <Paragraph position="6"> On the other hand, spoken utterances include various information such as the interval between utterances, the presence of barge-in and so on, which can be utilized to judge the user's character. These features also possess generality in spoken dialogue systems because they are not dependent on domain-specific knowledge.</Paragraph>
    <Paragraph position="7"> We propose more comprehensive user models to generate user-adapted responses in spoken dialogue systems taking account of all available information specific to spoken dialogue. The models change both the dialogue initiative and the generated response. In (Eckert et al., 1997), typical users' behaviors are defined to evaluate spoken dialogue systems by simulation, and stereotypes of users are assumed such as patient, submissive and experienced.</Paragraph>
    <Paragraph position="8"> We introduce user models not for defining users' behaviors beforehand, but for detecting users' patterns in real-time interaction.</Paragraph>
    <Paragraph position="9"> We define three dimensions in the user models: 'skill level to the system', 'knowledge level on the target domain' and 'degree of hastiness'. The former two are related to the strategies in management of the initiative and the response generation. These two enable the system to adaptively generate dialogue management information and domain-specific information, respectively. The last one is used to manage the situation when users are in hurry.</Paragraph>
    <Paragraph position="10"> Namely, it controls generation of the additive contents based on the former two user models. Handling such a situation becomes more crucial in speech communications using cellular phones.</Paragraph>
    <Paragraph position="11"> The user models are trained by decision tree Sys: Please tell me your current bus stop, your destination or the specific bus route.</Paragraph>
    <Paragraph position="12"> User: Shijo-Kawaramachi.</Paragraph>
    <Paragraph position="13"> Sys: Do you take a bus from Shijo-Kawaramachi? User: Yes.</Paragraph>
    <Paragraph position="14"> Sys: Where will you get off the bus? User: Arashiyama.</Paragraph>
    <Paragraph position="15"> Sys: Do you go from Shijo-Kawaramachi to Arashiyama? User: Yes.</Paragraph>
    <Paragraph position="16"> Sys: Bus number 11 bound for Arashiyama has departed Sanjo-Keihanmae, two bus stops away.</Paragraph>
    <Paragraph position="17">  learning algorithm using real data collected from the Kyoto city bus information system. Then, we implement the user models and adaptive dialogue strategies on the system and evaluate them using data collected with 20 novice users.</Paragraph>
  </Section>
class="xml-element"></Paper>