<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1107">
  <Title>Example-based Speech Intention Understanding and Its Application to In-Car Spoken Dialogue System</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Similarity and its Calculation
</SectionTitle>
    <Paragraph position="0"> This section describes a technique for calculating the degree of similarity between sentences using the information on both dependency and morphology.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Degree of Similarity between
Sentences
</SectionTitle>
      <Paragraph position="0"> To calculate the degree of similarity between two sentences, both morphological and dependency information can be used. Calculation based only on morphemes considers the similarity of surface words alone, so the computed similarity may be large even for sentences that are not structurally similar. On the other hand, calculation based only on dependency relations has the problem that it is difficult to express the lexical meaning of the whole sentence, particularly in the case of spoken language. By using both morphological and dependency information, a more reliable calculation can be expected.</Paragraph>
      <Paragraph position="1"> Formula (1) defines the degree of similarity between utterances, β, as a convex combination of the degree of similarity in dependency, α_d, and that in morphology, α_m:</Paragraph>
      <Paragraph position="2"> β = λ · α_d + (1 - λ) · α_m    (1)</Paragraph>
      <Paragraph position="3"> α_d: the degree of similarity in dependency; α_m: the degree of similarity in morphology; λ: the weight coefficient (0 ≤ λ ≤ 1). Sections 3.2 and 3.3 explain α_d and α_m, respectively.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Dependency Similarity
</SectionTitle>
      <Paragraph position="0"> Generally speaking, a Japanese dependency relation is a modification relation between two bunsetsus (phrase units). For example, the spoken sentence "kono chikaku-ni washoku-no mise aru? (Is there a Japanese restaurant near here?)" consists of the five bunsetsus "kono (here)", "chikaku-ni (near)", "washoku-no (Japanese-style food)", "mise (a restaurant)" and "aru (being)", and contains dependencies such as "mise" modifying "aru". In this instance, the modifying bunsetsu "mise" is called the dependent and the modified bunsetsu "aru" the head, and these two bunsetsus are said to be in a dependency relation. Likewise, "kono", "chikaku-ni" and "washoku-no" modify "chikaku-ni", "aru" and "mise", respectively. In the remainder of this paper, a dependency relation is expressed as an ordered pair of bunsetsus, such as (mise, aru) or (kono, chikaku-ni).</Paragraph>
      <Paragraph position="1"> A dependency relation expresses part of the syntactic and semantic characteristics of a sentence, and can be strongly related to its intentional content. That is, two utterances whose dependency relations are similar to each other are likely to have similar intentions as well.</Paragraph>
      <Paragraph position="2"> Formula (2) defines the degree of similarity in Japanese dependency, α_d, as the degree of correspondence between the dependency relations of the two sentences:</Paragraph>
      <Paragraph position="3"> α_d = 2 · C_d / (D_1 + D_2)    (2)</Paragraph>
      <Paragraph position="4"> D_1, D_2: the numbers of dependency relations in the two sentences; C_d: the number of dependencies in correspondence. Here, two dependency relations are considered to be in correspondence when the basic forms of the independent words in their dependent bunsetsus and in their head bunsetsus agree with each other. For example, the two dependencies (chikaku-ni, aru) and (chikaku-ni, ari-masu-ka) correspond with each other because the independent words of the dependent bunsetsu and the head bunsetsu are "chikaku" and "aru", respectively. Moreover, a word class is given to nouns and proper nouns characteristic of a dialogue task; if the words constituting two dependencies belong to the same class, those dependencies are also considered to be in correspondence.</Paragraph>
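      <Paragraph position="5"> A minimal sketch of this correspondence count (our own illustration; the pair representation and the word-class mapping are assumptions, not the authors' data structures):

```python
def dependency_similarity(deps1, deps2, word_class=None):
    """Degree of similarity in dependency (formula 2): the ratio
    2 * C / (D1 + D2), where C counts corresponding dependency pairs.

    deps1, deps2: sets of (dependent, head) pairs of independent-word
    basic forms. word_class: optional dict mapping a word to its
    task-specific word class; words in the same class are treated as
    corresponding.
    """
    word_class = word_class or {}

    def norm(word):
        # Words sharing a task-specific class compare as equal.
        return word_class.get(word, word)

    normed2 = {(norm(d), norm(h)) for d, h in deps2}
    matched = sum(1 for d, h in deps1 if (norm(d), norm(h)) in normed2)
    return 2.0 * matched / (len(deps1) + len(deps2))
```
</Paragraph>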
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Morpheme Similarity
</SectionTitle>
      <Paragraph position="0"> Formula (3) defines the degree of similarity in morphology, α_m, as the degree of correspondence between the morphemes of the two sentences: α_m = 2 · C_m / (M_1 + M_2) (3), where M_1 and M_2 are the numbers of morphemes in the two sentences and C_m is the number of morphemes in correspondence. In our research, a word class is given to nouns and proper nouns characteristic of a dialogue task, and if two morphemes belong to the same class, these morphemes are also considered to be in correspondence. In order to extract the intention of the sentence that is most similar as a whole, not only independent words and keywords but all morphemes, including nouns and particles, are used in the correspondence calculation.</Paragraph>
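      <Paragraph position="1"> A minimal sketch of the morpheme correspondence count (our own illustration; matching multisets of basic forms is an assumption about the counting scheme):

```python
from collections import Counter

def morpheme_similarity(morphs1, morphs2, word_class=None):
    """Degree of similarity in morphology (formula 3):
    2 * C / (M1 + M2), where C counts corresponding morphemes.

    morphs1, morphs2: lists of morpheme basic forms. Morphemes in the
    same task-specific word class also count as corresponding.
    """
    word_class = word_class or {}
    c1 = Counter(word_class.get(m, m) for m in morphs1)
    c2 = Counter(word_class.get(m, m) for m in morphs2)
    matched = sum((c1 & c2).values())  # multiset intersection
    return 2.0 * matched / (len(morphs1) + len(morphs2))
```
</Paragraph>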
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.4 Calculation Example
</SectionTitle>
      <Paragraph position="0"> Figure 2 shows an example of the calculation of the degree of similarity between an input utterance S_1, "kono chikaku-ni washoku-no mise aru? (Is there a Japanese restaurant near here?)", and an example sentence in the corpus, S_2 "(Is there a European restaurant located nearby?)", with the weight coefficient λ = 0.4. The numbers of dependencies of S_1 and S_2 are 4 and 3, respectively, and the number of dependencies in correspondence is 2, i.e., (chikaku, aru) and (mise, aru). Moreover, since "washoku (Japanese-style food)" and "yoshoku (European-style food)" belong to the same word class, the dependencies (washoku, aru) and (yoshoku, aru) also correspond with each other. Therefore, the degree of similarity in dependency α_d comes to 0.86 by formula (2). Since the numbers of morphemes of S_1 and S_2 are 7 and 8, respectively, and the number of morphemes in correspondence is 6, i.e., "chikaku", "ni", "no", "mise", "aru(i)" and "wa(yo)shoku", α_m comes to 0.80 by formula (3). Combining both morphemes and dependencies as described above, β comes to 0.82 by formula (1).</Paragraph>
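      <Paragraph position="1"> Assuming that formulas (2) and (3) are correspondence ratios of the form 2C/(N_1 + N_2), the figures in this example can be reproduced as follows (counts taken from the text above).

```python
# Counts from the Section 3.4 example.
deps_s1, deps_s2, deps_match = 4, 3, 3        # incl. the word-class pair
morphs_s1, morphs_s2, morphs_match = 7, 8, 6
lam = 0.4                                     # weight coefficient lambda

alpha_d = 2 * deps_match / (deps_s1 + deps_s2)        # ~0.857 -> 0.86
alpha_m = 2 * morphs_match / (morphs_s1 + morphs_s2)  # 0.80
beta = lam * alpha_d + (1 - lam) * alpha_m            # ~0.823 -> 0.82
```
</Paragraph>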
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Utilizing Context Information
</SectionTitle>
    <Paragraph position="0"> In many cases, the intention of a user's utterance depends on the intentions of the user's previous utterances or on those of the person the user is speaking to. An input utterance may therefore be influenced by the content of the speech preceding it. For example, the user usually returns an answer after the system asks a question, and may furthermore ask the system a question after its response. In our technique, the degree of similarity β, explained in Section 3, is therefore weighted based on the intentions of the utterances preceding the user's utterance. That is, we consider the occurrence of an utterance intention I_n at a certain time n to be dependent on the intentions of the last N - 1 utterances, and use the conditional occurrence probability of formula (4):</Paragraph>
    <Paragraph position="1"> P(I_n | I_{n-N+1}, ..., I_{n-1})    (4)</Paragraph>
    <Paragraph position="2"> Here, we write a sequence of utterance intentions as (I_{n-N+1}, ..., I_{n-1}, I_n). Moreover, we call the conditional occurrence probability of formula (4) the intentions N-gram probability.</Paragraph>
    <Paragraph position="5"> The weight assignment based on intention sequences is accomplished by reducing the value of the degree of similarity when the intentions N-gram probability is smaller than a threshold. That is, formula (5) defines the degree of similarity γ with weighting by the intentions N-gram probability:</Paragraph>
    <Paragraph position="6"> γ = β if P(I_n | I_{n-N+1}, ..., I_{n-1}) > th, and γ = ω · β otherwise    (5)</Paragraph>
    <Paragraph position="7"> ω: the weight coefficient (0 ≤ ω ≤ 1); β: the degree of similarity; th: the threshold. A typical example of the effect of using the intentions N-gram is as follows. For an input utterance "chikaku-ni chushajo-wa ari-masu-ka? (Is there a parking lot located nearby?)", the degree of similarity becomes maximal both with an utterance tagged "parking lot question", which asks whether a parking lot is located around the retrieved store, and with an utterance tagged "parking lot search", which searches for a parking lot located nearby. However, if the input utterance occurs after a response indicating that there is no parking lot around the store, the system can recognize from the intentions N-gram probabilities learned from the corpus that its intention is not "parking lot question". As a result, the system arrives at the correct utterance intention, "parking lot search".</Paragraph>
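    <Paragraph position="8"> The weighting of formula (5) can be sketched as follows (a reconstruction consistent with the description above, not the authors' code; the default th and omega values are illustrative):

```python
def weighted_similarity(beta, ngram_prob, th=0.0, omega=0.8):
    """Context-weighted similarity (formula 5): keep beta when the
    intentions N-gram probability exceeds the threshold th, otherwise
    reduce it by the weight coefficient omega (0 <= omega <= 1).

    With th = 0, only intention sequences unseen in the corpus
    (probability 0) are penalized.
    """
    return beta if ngram_prob > th else omega * beta
```

The experiments in Section 5 use th = 0 and ω = 0.8.</Paragraph>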
  </Section>
  <Section position="6" start_page="0" end_page="1" type="metho">
    <SectionTitle>
5 Evaluation
</SectionTitle>
    <Paragraph position="0"> In order to evaluate the effectiveness of our method, we conducted an experiment on utterance intention inference.</Paragraph>
    <Section position="1" start_page="0" end_page="1" type="sub_section">
      <SectionTitle>
5.1 Experimental Data
</SectionTitle>
      <Paragraph position="0"> An in-car speech dialogue corpus constructed at CIAIR (Kawaguchi et al., 2000) was used and analyzed based on Japanese dependency grammar (Matsubara et al., 2002). That is, intention tags were assigned manually to all sentences in the 412 dialogues about restaurant search recorded in the corpus. The intentions 2-gram probability was learned from the sentences of 174 of these dialogues. The standard for assigning the intention tags was established by extending the decision tree proposed as a dialogue tag scheme (JDRI, 2000). Consequently, 78 kinds of intention tags were prepared in all (38 kinds for driver utterances). The intention tag to be given to each utterance can be determined by following the extended decision tree. Some of the intention tags and example sentences are shown in Table 1, and part of the decision tree for driver's utterances in Figure 3.</Paragraph>
      <Paragraph position="1"> A word class database (Murao et al., 2001), constructed based on the corpus, was used for calculating the rates of correspondence in morphemes and dependencies. Moreover, ChaSen (Matsumoto et al., 1999) was used for the morphological analysis.</Paragraph>
    </Section>
    <Section position="2" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
5.2 Experiment
5.2.1 Outline of Experiment
</SectionTitle>
      <Paragraph position="0"> We divided the 1,609 driver's utterances of 238 dialogues, which were not used for learning the intentions 2-gram probability, into 10 equal pieces, and evaluated by cross validation. That is, the inference of the intentions of all 1,609 sentences was performed, and the recall and precision were calculated. (In Figure 3, the description in the condition branches is omitted.)</Paragraph>
      <Paragraph position="1"> Experiments based on the following four methods of calculating the degree of similarity were made, and their results were compared.</Paragraph>
      <Paragraph position="2"> 1. Calculation using only morphemes. 2. Calculation using only dependencies. 3. Calculation using both morphemes and dependencies (changing the value of the weight coefficient λ). 4. Calculation using intentions 2-gram probabilities in addition to the conditions of 3 (changing the value of the weight coefficient ω, with th = 0). The experimental results are shown in Figure 4. Inference by method 1 (i.e., λ = 0) obtained a recall of 63.7% and a precision of 48.2%, and method 2 (i.e., λ = 1.0) obtained 62.6% and 58.6%. On the other hand, in the experiment on method 3, the precision reached its maximum of 61.0% at λ = 0.2, and the recall at λ = 0.3 was 67.2%. This result shows that our technique of using both morphological and dependency information is effective.</Paragraph>
      <Paragraph position="3"> When λ ≥ 0.3, the precision of method 3 became lower than that of method 1. This is because the user speaks while driving a car (Kawaguchi et al., 2000), and therefore there are many comparatively short utterances in the in-car speech corpus. Since there are few dependencies per utterance, many sentences in the corpus tend to take the maximum value in inference using dependency information.</Paragraph>
      <Paragraph position="4"> Table 1: Intention tags and utterance examples. search: Is there a Japanese restaurant near here? / request: Guide me to McDonald's. / parking lot question: Is there a parking lot? / distance question: How far is it from here? / nearness question: Which is near here? / restaurant menu question: Are Chinese noodles on the menu?</Paragraph>
      <Paragraph position="5"> Next, the experimental results of the inference using weight assignment by intentions 2-gram probabilities, with λ = 0.3, are shown in Figure 5. At ω = 0.8, the maximum values of both precision and recall were obtained (the precision being 68.9%). This shows that our technique of learning context information from the spoken dialogue corpus is effective.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="1" end_page="1" type="metho">
    <SectionTitle>
6 In-car Spoken Dialogue System
</SectionTitle>
    <Paragraph position="0"> In order to confirm that our technique for automatically inferring the intentions of the user's utterances is feasible and effective for task-oriented spoken dialogue processing, a prototype system for restaurant retrieval has been developed. This section describes the outline of the system and its evaluation.</Paragraph>
    <Section position="1" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
6.1 Implementation of the System
</SectionTitle>
      <Paragraph position="0"> The configuration of the system is shown in Fig-</Paragraph>
    </Section>
  </Section>
</Paper>