<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1158">
  <Title>Efficient Confirmation Strategy for Large-scale Text Retrieval Systems with Spoken Dialogue Interface</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Text Retrieval System for
Large-scale Knowledge Base
</SectionTitle>
    <Paragraph position="0"> Our task involves text retrieval for a large-scale knowledge base. As the target domain, we adopted a software support knowledge base provided by the Microsoft Corporation. The knowledge base consists of the following three components: glossary, frequently asked questions (FAQ), and a database of support articles. Figure 1 is an example of the database. The knowledge base is very large-scale, as shown in  text retrieval system for this knowledge base. The system accepts a typed-text input as questions from users and outputs a result of the retrieval. The system interprets input sentences taking a syntactic dependency and synonymous expression into consideration for matching it with the knowledge base. The system can also navigate for the user when he/she makes vague questions based on scenarios (dialog card) that were described manually beforehand. Hundreds of the dialog cards have been prepared to ask questions back to the users. If a user question matches its input part, the system generates a question based on its description.</Paragraph>
    <Paragraph position="1"> We adopted the Dialog Navigator as a back-end system and constructed a text retrieval system with a spoken dialogue interface. We then investigated a confirmation strategy to interpret the user's utterances robustly by taking into account the problems that are characteristic of spoken language, as previously described.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="3" type="metho">
    <SectionTitle>
3 Confirmation Strategy using
Relevance Score and Significance Score
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <Paragraph position="0"> Making confirmations for every portion that has the possibility to be an ASR error is tedious.</Paragraph>
      <Paragraph position="1"> This is because every erroneous portion does not necessarily affect the retrieval results. We therefore take the influence of recognition errors for retrieval into consideration, and control generation of confirmation.</Paragraph>
      <Paragraph position="2"> We make use of N-best results of the ASR for the query and test if a significant difference is caused among N-best sets of retrieved candidates. If there actually is, we then make a confirmation on the portion that makes the difference. This is regarded as a posterior confirmation. On the other hand, if a critical error occurs in the ASR result, such as those in the product name in software support, the following retrieval would make no sense. Therefore, we also introduce a confirmation prior to the retrieval for critical words.</Paragraph>
      <Paragraph position="3"> The system flow including the confirmation is  summarized below.</Paragraph>
      <Paragraph position="4"> 1. Recognize a user's utterance.</Paragraph>
      <Paragraph position="5">  2. Calculate a relevance score for each phrase of ASR results.</Paragraph>
      <Paragraph position="6"> 3. Make a confirmation for critical words with a low relevance score.</Paragraph>
      <Paragraph position="7"> 4. Retrieve the knowledge base using the Dialog Navigator for N-best candidates of the ASR.</Paragraph>
      <Paragraph position="8"> 5. Calculate significance scores and generate a confirmation based on them.</Paragraph>
      <Paragraph position="9"> 6. Show the retrieval results to the user.</Paragraph>
      <Paragraph position="10">  This flow is also shown in Figure 2 and explained in the following subsections in detail.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="2" type="sub_section">
      <SectionTitle>
3.1 Definition of Relevance Score
</SectionTitle>
      <Paragraph position="0"> We use test-set perplexity for each portion of the ASR results as one of the criteria in determining whether the portion is influential or not for the retrieval. The language model to calculate the perplexity was trained only with the target knowledge base. It is different from that used in the ASR.</Paragraph>
      <Paragraph position="1"> The perplexity is defined as an exponential of entropy per word, and it represents the average number of the next words when we observe a word sequence. The perplexity can be denoted as the following equation because we assume an ergodicity on language and use a trigram as a language model.</Paragraph>
      <Paragraph position="2">  ) was trained. If the  perplexity is small, it indicates the sequence appears frequently in the knowledge base. On the contrary, the perplexity for a portion including the ASR errors increases because it is contextually less frequent. The perplexity for out-of-domain phrases similarly increases because they scarcely appear in the knowledge base. It enables us to detect a portion that is not influential for retrieval or those portions that include ASR errors. Here, a phrase, called bunsetsu  in Japanese, is adopted as a portion for which the perplexity is calculated. We use a syntactic parser KNP (Kurohashi and Nagao, 1994) to divide the ASR results into the phrases</Paragraph>
      <Paragraph position="4"> Bunsetsu is a commonly used linguistic unit in Japanese, which consists of one or more content words and zero or more functional words that follow.</Paragraph>
      <Paragraph position="5">  As the parser was designed for written language, the division often fails for portions including ASR errors. The division error, however, does not affect the whole system's performance because the perplexity for the erroneous portions increases, indicating they are irrelevant.  (PP) and relevance score (RS) We then calculate the perplexity for the phrases (bunsetsu) to which the preceding and following words are attached. We then define the relevance score by applying a sigmoid-like transform to the perplexity using the following equation. Thus, the score ranges between 0 and</Paragraph>
      <Paragraph position="7"> Here, a and b are constants and are empirically set to 2.0 and 11.0. An example of calculating the relevance score is shown in Figure 3.</Paragraph>
      <Paragraph position="8"> In this sample, a portion, &amp;quot;Atarashiku katta (= that I bought recently)&amp;quot;, that appeared in the beginning of the utterance does not contribute to any retrieval. A portion at the end of the sentence was incorrectly recognized because it may have been pronounced weakly. The perplexity for these portions increases as a result, and the relevance score correspondingly decreases.</Paragraph>
    </Section>
    <Section position="3" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
3.2 Confirmation for Critical Words
using Relevance Score
</SectionTitle>
      <Paragraph position="0"> Critical words should be confirmed before the retrieval. This is because a retrieval result will be severely damaged if the portions are not correctly recognized. We define a set of words that are potentially critical using tf*idf values, which are often used in information retrieval. They can be derived from the target knowledge base automatically. We regard a word with the maximum tf*idf values in each document as being its representative, and the words that are representative in more documents are regarded as being more important. When the amount of documents represented by the more important words exceeds 10% out of the whole number of documents, we define a set of the words as being critical. As a result, 35 words were selected as potentially critical ones in the knowledge base, such as 'set up', 'printer', and '(Microsoft) Office'. null We use the relevance score to determine whether we should make a confirmation for the critical words. If a critical word is contained in a phrase whose relevance score is lower than threshold th, the system makes a confirmation.</Paragraph>
      <Paragraph position="1"> We set threshold th through the preliminary experiment. The confirmation is done by presenting the recognition results to the user. Users can either confirm or discard or correct the phrase before passing it to the following matching module. null</Paragraph>
    </Section>
    <Section position="4" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
3.3 Weighted Matching using
Relevance Score
</SectionTitle>
      <Paragraph position="0"> A phrase that has a low relevance score is likely to be an ASR error or a portion that does not contribute to retrieval. We therefore use the relevance score RS as a weight for phrases during the matching with the knowledge base. This relieves damage to the retrieval by ASR errors or redundant expressions.</Paragraph>
    </Section>
    <Section position="5" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
3.4 Significance Score using Retrieval
Results
</SectionTitle>
      <Paragraph position="0"> The significance score is defined by using plural retrieval results corresponding to N-best candidates of the ASR. Ambiguous portions during the ASR appear as the differences between the N-best candidates. The score represents the degree to which the portions are influential.</Paragraph>
      <Paragraph position="1"> The significance score is calculated for portions that are different among N-best candidates. We define the significance score SS(n,m) as the difference between the retrieval results of n-th and m-th candidates. The value is defined by the equation,</Paragraph>
      <Paragraph position="3"> Here, res(n) denotes a set of retrieval results for the n-th candidate, and |res(n) |denotes the number of elements in the set. That is, the significance score decreases if the retrieval results have a large common part.</Paragraph>
      <Paragraph position="4"> Figure 4 has an example of calculating the significance score. In this sample, the portions of &amp;quot;suuzi (numerals)&amp;quot; and &amp;quot;suushiki (numeral expressions)&amp;quot; differ between the first and second candidates of the ASR. As the retrieval results for each candidate, 14 and 15 items are obtained, respectively. The number of common items between the two retrieval results is 8. Then, the significance score for the portion is 0.70 by the above equation.</Paragraph>
    </Section>
    <Section position="6" start_page="2" end_page="3" type="sub_section">
      <SectionTitle>
3.5 Confirmation using Significance
Score
</SectionTitle>
      <Paragraph position="0"> The confirmation is also made for the portions detected by the significance score. If the score is higher than a threshold, the system makes the confirmation by presenting the difference to users  . Here, we set the number of N-best candidates of the ASR to 3, and the threshold for the score is set to 0.5.</Paragraph>
      <Paragraph position="1"> In the confirmation phrase, if a user selects from the list, the system displays the corresponding retrieval results. If the score is lower than the threshold, the system does not make the confirmation and presents retrieval results of the first candidate of the ASR. If a user judges all candidates as inappropriate, the system rejects the current candidates and prompts him/her to utter the query again.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>