<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-1073">
  <Title>Automatic Optimization of Dialogue Management</Title>
  <Section position="2" start_page="0" end_page="502" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Recent advances in spoken language understanding have made it possible to develop dialogue systems for many applications. The role of the dialogue manager in such systems is to interact in a natural way to help the user complete the tasks that the system is designed to support. Typically, an expert designs a dialogue manager by hand, and has to make many nontrivial design choices that can seriously impact system performance. This paper applies reinforcement learning (RL) to automatically learn design choices that optimize system performance for a chosen performance measure (Levin et al., 2000; Walker et al., 1998).</Paragraph>
    <Paragraph position="1"> Consider the spoken dialogue system named NJFun, which we built to help users find fun places to go in New Jersey. A sample dialogue with NJFun is shown in Figure 1, with system utterances labeled Si and user utterances labeled Ui. When NJFun greets the user, it can provide an open greeting such as utterance S1, expecting that the user will take the initiative. Alternatively, it can provide a directive greeting in which the system takes the initiative, as in &amp;quot;Welcome to NJFun. Please say an activity name or say 'list activities' for a list of activities I know about.&amp;quot; After the user's response U1, NJFun must decide whether it should explicitly confirm its understanding, as in utterances S2 and S3. NJFun can also simply continue on with the dialogue, as when it does not explicitly confirm that the user wants to find out about wineries. In NJFun, as shown in more detail below, decisions about initiative and confirmation strategies alone result in a search space of 2^42 potential global dialogue strategies (a short calculation illustrating this count follows Figure 1). Furthermore, the performance of a dialogue strategy depends on many other factors, such as the user population, the robustness of the automatic speech recognizer (ASR), and task difficulty (Kamm et al., 1998; Danieli and Gerbino, 1995).</Paragraph>
    <Paragraph position="2"> [Figure 1: A sample dialogue with NJFun.
S1: Welcome to NJFun. How may I help you?
U1: I'd like to find um winetasting in Lambertville in the morning. (ASR output: I'd like to find out wineries the in the Lambertville in the morning.)
S2: Did you say you are interested in Lambertville?
U2: Yes.
S3: Did you say you want to go in the morning?
U3: Yes.
S4: I found a winery near Lambertville that is open in the morning. It is \[...\] Please give me feedback by saying 'good', 'so-so', or 'bad'.
U4: Good.]</Paragraph>
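    <Paragraph position="3"> If the dialogue manager faces a binary design choice (for example, open vs. directive initiative, or explicit confirmation vs. none) at each of 42 choice points, the strategy count above follows directly. The short Python sketch below works out the arithmetic; the choice-point framing and names are illustrative assumptions, not NJFun's actual state representation, which Section 3 describes.

choice_points = 42       # binary decision points, as counted in the text
actions_per_point = 2    # e.g., open vs. directive, confirm vs. do not confirm
num_strategies = actions_per_point ** choice_points
print(num_strategies)    # 4398046511104, i.e. about 4.4 trillion global strategies
</Paragraph>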
    <Paragraph position="4"> In the main, previous research has treated the specification of the dialogue management strategy as an iterative design problem: several versions of a system are created, dialogue corpora are collected with human users interacting with different versions of the system, a number of evaluation metrics are collected for each dialogue, and the different versions are statistically compared (Danieli and Gerbino, 1995; Sanderman et al., 1998). Due to the costs of experimentation, only a few global strategies are typically explored in any one experiment.</Paragraph>
    <Paragraph position="5"> However, recent work has suggested that dialogue strategy can be designed using the formalism of Markov decision processes (MDPs) and the algorithms of RL (Biermann and Long, 1996; Levin et al., 2000; Walker et al., 1998; Singh et al., 1999). More specifically, the MDP formalism suggests a method for optimizing dialogue strategies from sample dialogue data. The main advantage of this approach is the potential for computing an optimal dialogue strategy within a much larger search space, using a relatively small number of training dialogues. This paper presents an application of RL to the problem of optimizing dialogue strategy selection in the NJFun system, and experimentally demonstrates the utility of the approach. Section 2 explains how we apply RL to dialogue systems, then Section 3 describes the NJFun system in detail. Section 4 describes how NJFun optimizes its dialogue strategy from experimentally obtained dialogue data. Section 5 reports results from testing the learned strategy, demonstrating that our approach improves task completion rates (our chosen measure for performance optimization). A companion paper provides only an abbreviated description of the system and dialogue manager, but includes additional results not presented here (Singh et al., 2000), such as analysis establishing the veracity of the MDP we learn, and comparisons of our learned strategy to strategies hand-picked by dialogue experts.</Paragraph>
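    <Paragraph position="6"> As a rough illustration of the MDP-based approach sketched above (a minimal sketch under assumed data structures, not the actual NJFun implementation described in Sections 2-4), the following Python fragment estimates transition probabilities and mean rewards from logged dialogues, each represented as a list of (state, action, reward, next_state) steps, and then runs value iteration to extract a greedy dialogue strategy.

from collections import defaultdict

def estimate_mdp(dialogues):
    """Empirical transition probabilities and mean rewards from sample dialogues."""
    counts = defaultdict(lambda: defaultdict(int))  # (state, action) -> next-state counts
    reward_sum = defaultdict(float)                 # (state, action) -> summed reward
    reward_n = defaultdict(int)                     # (state, action) -> number of visits
    for dialogue in dialogues:
        for state, action, reward, next_state in dialogue:
            counts[(state, action)][next_state] += 1
            reward_sum[(state, action)] += reward
            reward_n[(state, action)] += 1
    transitions = {
        sa: {s2: c / sum(succ.values()) for s2, c in succ.items()}
        for sa, succ in counts.items()
    }
    rewards = {sa: reward_sum[sa] / reward_n[sa] for sa in reward_sum}
    return transitions, rewards

def value_iteration(transitions, rewards, gamma=0.95, iters=200):
    """Greedy dialogue strategy for the estimated MDP via value iteration."""
    states = {sa[0] for sa in transitions} | {
        s2 for succ in transitions.values() for s2 in succ
    }
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        for s in states:
            q = [
                rewards[sa] + gamma * sum(p * V[s2] for s2, p in succ.items())
                for sa, succ in transitions.items() if sa[0] == s
            ]
            if q:
                V[s] = max(q)
    policy = {}
    for s in states:
        q_by_action = {
            sa[1]: rewards[sa] + gamma * sum(p * V[s2] for s2, p in succ.items())
            for sa, succ in transitions.items() if sa[0] == s
        }
        if q_by_action:
            policy[s] = max(q_by_action, key=q_by_action.get)
    return policy

# Hypothetical logged data: reward 1 when the task completes, 0 otherwise.
dialogues = [
    [("greet", "open", 0, "confirm"), ("confirm", "explicit", 1, "done")],
    [("greet", "directive", 0, "reprompt"), ("reprompt", "explicit", 0, "done")],
]
transitions, rewards = estimate_mdp(dialogues)
print(value_iteration(transitions, rewards))
# e.g. greet -> open, confirm -> explicit, reprompt -> explicit
</Paragraph>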
  </Section>
</Paper>