File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/04/w04-3006_intro.xml

Size: 9,178 bytes

Last Modified: 2025-10-06 14:02:50

<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-3006">
  <Title>Error Detection and Recovery in Spoken Dialogue Systems</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Background and Motivation
</SectionTitle>
    <Paragraph position="0"> Spoken conversational systems have great potential for providing spoken language access to information sources.</Paragraph>
    <Paragraph position="1"> However, such systems are only useful if they can understand most of the content words in each utterance. Many a user has been aggravated by conversational systems that hypothesize the same incorrect words over and over, often due to ignorance of critical words the user is speaking. The dialogue manager component is often of little or no help, suggesting, &amp;quot;Please try using shorter sentences,&amp;quot; or &amp;quot;I did not understand. Please rephrase your query.&amp;quot; The system must be able to recognize complications such as misrecognitions, repetitive mistakes, and out-of-vocabulary (OOV) words, and to react appropri- null ately. A more successful interaction is achievable if the dialogue manager is able to work with the user to resolve an error in the course of the dialogue.</Paragraph>
    <Paragraph position="2"> A common strategy used in human-human dialogue for handling an OOV word is for one participant to ask the other to repeat or spell the unknown word. This provides the inquisitor with more detailed information about the word, which can facilitate his understanding of the word.</Paragraph>
    <Paragraph position="3"> Several dialogue systems have employed such a strategy in dealing with a confusing or unknown word (Bauer and Junkawitsch, 1999; Schramm et al., 2000). We aim to employ such a speak-and-spell strategy in our system.</Paragraph>
    <Paragraph position="4"> In this work, we focus on the class of place names, specifically cities, states, and airports. Such names are prevalent and problematic in any domain where geography plays a dominant role. For example, a weather information system or a flight reservation system must have the city, state, or country names exactly correct in order to be useful. In real user interactions, it is inevitable that some city will be mentioned that is unknown to the system. Such a name will often be misrecognized as a known city. The dialogue manager must determine whether a poorly scoring hypothesized city name is, in fact, a mis-recognized but known city, or an entirely new word. Such uncertainty must be resolved in a manner that is timeefficient and does not overburden the user.</Paragraph>
    <Paragraph position="5"> For example, in the case of a weather information system, very common geographic names would most likely be known to the recognizer (e.g., New York, Moscow).</Paragraph>
    <Paragraph position="6"> If a user wants to know the weather in his hometown of Menominee, for example, which is unknown to the recognizer, the system is faced with a problem. If the recognizer is not equipped with an unknown word detector, the closest word in the recognizer's vocabulary will be chosen as the best hypothesis. The user will then have to proceed through potentially many clarification turns in which the system repeatedly hypothesizes incorrect city names.</Paragraph>
    <Paragraph position="7"> Since &amp;quot;Menominee&amp;quot; is OOV, the system will never find the correct city name.</Paragraph>
    <Paragraph position="8"> If, however, a large external database is available (e.g., US Census data), that the system could consult given a hypothesized spelling of the unknown city, there is hope that the system will find the user's intended city name.</Paragraph>
    <Paragraph position="9"> Such an approach of access to a much larger vocabulary in spell mode has been successfully applied, for example, in (Schramm et al., 2000). In the most extreme case, the large external database would be the World Wide Web.</Paragraph>
    <Paragraph position="10"> One could imagine the system posing a search query on the Web to determine that the closest major city is Green Bay, Wisconsin, located 60 miles south of Menominee, so it may suffice to provide the weather for Green Bay.</Paragraph>
    <Paragraph position="11"> It should be mentioned that the flexibility of allowing the user to spell a word creates a new task for the system.</Paragraph>
    <Paragraph position="12"> Instead of simply looking up a hypothesized word in a User: I would like to fly to austin tomorrow morning [mis-recognized as &amp;quot;Boston&amp;quot;] System: Sorry. I'm not sure what you said. There are no flights from Boston to Boston. User: austin texas System: Okay, flights from Austin to Boston departing between 5:00 am and noon on Monday March tenth. One moment please. If I have misunderstood you, you should say, Scratch that. There are no nonstop flights. I have flights on American, Continental, and Delta. There are at least 6 flights departing between 8:00 a.m. and 9:00 a.m. Can you be more specific? User: I would like to fly from boston massachusetts to austin texas System: Please enter the departure city name using the telephone keypad ending with the pound key. User: 267866# [user enters &amp;quot;Boston&amp;quot; via keypad] System: Okay from Boston to Austin. What date will you be traveling? USER: I would like to leave tomorrow morning System: Okay, flights from Boston to Austin departing between 5:00 am and noon on Monday March tenth. One moment please. . . .</Paragraph>
    <Paragraph position="13">  based on perceived confusion.</Paragraph>
    <Paragraph position="14"> SYSTEM: Welcome Angie. How may I help you? USER: I would like to &lt;uh&gt; [mis-recognized as: I would like to Atlanta] SYSTEM: Okay from Hartford to Atlanta. What date will you be traveling? USER: October thirty first.</Paragraph>
    <Paragraph position="15"> SYSTEM: Okay, from Hartford to Atlanta on Wednesday October 31. Can you provide the approximate departure time or airline? USER: Late morning.</Paragraph>
    <Paragraph position="16"> SYSTEM: Okay, flights from Hartford to Atlanta departing between 10:00 a.m. and noon on Wednesday October 31. One moment please... I have a nonstop Delta flight leaving at 11:20 a.m. and arriving at 1:45 pm. Would that work? USER: No.</Paragraph>
    <Paragraph position="17"> SYSTEM: There are no nonstop flights. I have a U S Airways flight connecting in Charlotte leaving at 10:45 a.m., a United flight connecting in Washington Dulles leaving at 11:20 a.m., and a U S Airways flight leaving at noon. Please select one of these flights or change any constraint you have already specified. USER: I would like to change the arrival city.</Paragraph>
    <Paragraph position="18"> SYSTEM: Please enter the destination city using the telephone keypad ending with the pound key. USER: 6752636# [user enters &amp;quot;Orlando&amp;quot; via keypad]  SYSTEM: I heard you say you want to arrive in Orlando. Is that correct? If you need help, you can say, for example, what cities do you know in Oregon.</Paragraph>
    <Paragraph position="19"> USER: That is correct.</Paragraph>
    <Paragraph position="20">  request by the user to change the arrival city.</Paragraph>
    <Paragraph position="21"> lexicon, the system must now remain open to the possibility that the hypothesized word (i.e., sequence of letters) is misspelled. In other words, a spelling checker must be utilized. Depending on the robustness of the spelling checker, the system may or may not be able to identify the intended word from a misspelling. With a large database, a potential misspelling could correspond to multiple data items. In such a case, the system would need to initiate a disambiguation subdialogue to resolve the ambiguity.</Paragraph>
    <Paragraph position="22"> There are several pragmatic issues to consider in obtaining spelled data from a user whether via keypad or speech. The problem of disambiguating keypad sequences has been addressed using both dictionary-based (Davis, 1991) as well as probabilistic (MacKenzie et al., 2001) approaches. In both input modes, the user may use abbreviations such as &amp;quot;S T P E T E R S B U R G&amp;quot; for &amp;quot;Saint Petersburg&amp;quot;. Spoken spelling is especially difficult, because the recognition accuracy for spoken letters can be quite low. For instance, the members of the &amp;quot;E-set&amp;quot; (B, C, D, E, G, P, T, V, Z) are well-known for being confusable to a recognizer, as discussed in previous studies (Marx and Schmandt, 1994). This problem is compounded by the fact that humans spell words in creative ways. Some may spell in military style (e.g., &amp;quot;Alpha Bravo Charlie&amp;quot; for &amp;quot;A B C&amp;quot;) or in simile (e.g., &amp;quot;B as in 'Boy&amp;quot;'). Some users may include the word &amp;quot;space&amp;quot; to mark the word boundaries of a multi-word sequence, such as &amp;quot;N E W space Y O R K&amp;quot;. Some may simply enter a letter sequence containing several meaningful chunks, as in &amp;quot;N E W Y O R K J F K N Y C&amp;quot; for Kennedy Airport in New York City. Many of these issues have been addressed in (Schramm et al., 2000).</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML