<?xml version="1.0" standalone="yes"?>
<Paper uid="A97-1008">
  <Title>An Evaluation of Strategies for Selective Utterance Verification for Spoken Natural Language Dialog</Title>
  <Section position="8" start_page="46" end_page="47" type="concl">
    <SectionTitle>
ELSE engage in a verification subdialog
</SectionTitle>
    <Paragraph position="0"> Using this decision rule and comparing it to Strategy 1, the over-verification rate drops from 19.2% to 7.6%, while the under-verification rate rises from 2.6% to 4.7% (i.e., the percentage of utterances correctly understood falls from 97.4% to 95.3%). This corresponds to a reduction in over-verifications from once every 5.2 user utterances to once every 13.2 user utterances, while under-verifications (i.e., undetected misunderstandings) rise from once every 38.5 user utterances to once every 21.3 user utterances.</Paragraph>
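The "once every N utterances" figures above are simply the reciprocals of the percentage rates; a minimal sketch of the arithmetic:

```python
# Reciprocal relation between an error rate (in percent) and its
# "once every N user utterances" frequency, using the rates reported
# for Strategies 1 and 3.
def once_every(rate_percent: float) -> float:
    """Convert a percentage rate to a once-every-N-utterances frequency."""
    return 100.0 / rate_percent

print(round(once_every(19.2), 1))  # over-verification, Strategy 1: 5.2
print(round(once_every(7.6), 1))   # over-verification, Strategy 3: 13.2
print(round(once_every(2.6), 1))   # under-verification, Strategy 1: 38.5
print(round(once_every(4.7), 1))   # under-verification, Strategy 3: 21.3
```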
    <Paragraph position="1"> It should be noted that on average, users spoke 20 utterances per dialog. We now examine a context-dependent strategy that takes into account specific domain information.</Paragraph>
    <Section position="1" start_page="46" end_page="47" type="sub_section">
      <SectionTitle>
4.4 Strategy 4: Domain-Dependent
Exceptions
</SectionTitle>
      <Paragraph position="0"> As previously noted, correctly interpreting certain utterances is crucial for efficient continuation of the dialog. In the Circuit Fix-It Shop, the crucial condition was correct determination of the LED display.</Paragraph>
      <Paragraph position="1"> Several utterances in each dialog concerned a discussion of the LED display. Consequently, assertions about the LED display were often part of the main expectation.</Paragraph>
      <Paragraph position="2"> However, due to the myriad of possible LED displays and the frequent misrecognition of key words and phrases in these descriptions, an effective dialog system would want to be careful to ascertain correctness in interpreting these descriptions. Consequently, we modify the verification decision rule as follows:
IF the Parser Confidence Score &gt; the Verification Threshold
THEN DO NOT engage in a verification subdialog
ELSE IF the utterance meaning is an assertion about the LED display
THEN engage in a verification subdialog
ELSE IF the utterance meaning is part of the expectation
THEN DO NOT engage in a verification subdialog
ELSE engage in a verification subdialog
As a result, the decision rule for verifying utterances concerning the LED focuses solely on the local information about parsing cost and does not consider dialog context information about expectation.[5] Such a modification might also be appropriate in other domains for information deemed essential to continuing progress.</Paragraph>
      <Paragraph position="3"> [Footnote 5] In actuality, a small component of the total parsing cost is the expectation cost based on dialog context, but that weighting is negligible compared to the weighting of the parse cost, the predominant factor in computing total cost.</Paragraph>
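The modified decision rule can be sketched as a small predicate. The `Utterance` fields and function names below are illustrative, not from the original system, and the expectation branch is assumed to follow Strategy 3's context-dependent check:

```python
# Sketch of the Strategy 4 decision rule (the data type and names are
# assumptions for illustration; only the branch order comes from the text).
from dataclasses import dataclass

@dataclass
class Utterance:
    parser_confidence: float  # parser confidence score for the interpretation
    about_led_display: bool   # is the meaning an assertion about the LED display?
    in_expectation: bool      # does the meaning match the dialog expectation?

def should_verify(u: Utterance, threshold: float) -> bool:
    """Return True if a verification subdialog should be initiated."""
    if u.parser_confidence > threshold:
        return False          # confident parse: do not verify
    if u.about_led_display:
        return True           # crucial domain information: always verify
    if u.in_expectation:
        return False          # expected in context: do not verify
    return True               # otherwise verify
```

Note that an LED-display assertion is verified before the expectation check is ever consulted, which is exactly why the text says such utterances rely solely on local parsing information.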
      <Paragraph position="4"> For this final decision rule, the over-verification rate is 9.8% while the under-verification rate is 3.7%.</Paragraph>
    </Section>
    <Section position="2" start_page="47" end_page="47" type="sub_section">
      <SectionTitle>
4.5 Strategy Comparison
</SectionTitle>
      <Paragraph position="0"> Table 1 summarizes the results of the four strategies for a fixed Verification Threshold. We conclude that the combination of considering both the local information of the parsing cost and the dialog context information about expectation provides the best strategy. We also note that inclusion of domain-dependent information does not show any notable improvement in the over-verification/under-verification tradeoff as compared with the context-dependent but domain-independent Strategy 3.[6] We believe the results show that for task-oriented domains where there are fairly strong expectations for utterances that relate directly to task goals, such as those described in figure 4, a context-dependent verification strategy is effective at reducing the over-verification rate to a reasonable level while keeping the number of under-verifications near the minimum. Further study is needed to determine the practical usefulness of this strategy in an actual experimental situation, and it is an open question whether such strategies are feasible for less task-specific domains such as advisory dialogs and database query environments.</Paragraph>
    </Section>
    <Section position="3" start_page="47" end_page="47" type="sub_section">
      <SectionTitle>
4.6 Improving Accuracy
</SectionTitle>
      <Paragraph position="0"> Obtaining higher accuracy requires reducing the under-verification rate. For Strategy 1 we explored the impact of raising and lowering the threshold on the over- and under-verification rates. Not surprisingly, there was a tradeoff: as the threshold was raised, more utterances were verified, resulting in fewer under-verifications but more over-verifications.</Paragraph>
      <Paragraph position="1"> Lowering the threshold had the opposite effect. In fact, using just the strategy of lowering the threshold to reduce the over-verification rate to 9.3% causes the under-verification rate to rise to 8.0%. In contrast, the new context-dependent strategy, Strategy 3, achieves an over-verification rate of 7.6% while the under-verification rate is only 4.7%. Clearly, the use of dialog context in the verification subdialog decision rule improves system performance. Nevertheless, a small set of under-verifications remains. Are there any possibilities for further reductions in the under-verifications without a substantial increase in the over-verification rate? [Footnote 6] This, of course, does not preclude the possibility that domain-dependent interaction may be more useful in other domains.</Paragraph>
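The threshold tradeoff described above can be illustrated with a toy sweep. The confidence scores and correctness labels below are invented for the sketch, not data from the experiment; an utterance is verified when its confidence falls at or below the threshold:

```python
# Illustrative sweep of a verification threshold over synthetic
# (confidence, correctly_understood) pairs. Over-verification = verifying
# a correctly understood utterance; under-verification = not verifying a
# misunderstood one.
def rates(data, threshold):
    """Return (over_verification_rate, under_verification_rate) in percent."""
    over = sum(1 for conf, correct in data if conf <= threshold and correct)
    under = sum(1 for conf, correct in data if conf > threshold and not correct)
    n = len(data)
    return 100.0 * over / n, 100.0 * under / n

data = [(0.9, True), (0.8, True), (0.4, True), (0.3, False),
        (0.7, False), (0.2, True), (0.6, True), (0.5, False)]

for t in (0.35, 0.55, 0.75):
    # as the threshold rises, more utterances are verified:
    # over-verifications rise while under-verifications fall
    print(t, rates(data, t))
```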
      <Paragraph position="2"> An analysis of the 133 under-verifications that occur with the new strategy indicates that while some of the under-verifications are due to deficiencies in the grammar, there is a core group of under-verifications where misrecognition of the speaker's words is impossible to overcome. Incorrect recognition of digits, lost content words, and misrecognized content words can cause the system to have high confidence in an incorrect interpretation. One approach that may prove helpful with this problem is the use of speech recognition systems that provide alternate hypotheses for the speech signal along with scoring information. Another possibility is word-by-word verification of the speaker input (see (Baber and Hone, 1993)), but such a strategy is too time-consuming and tedious for general spoken natural language dialog, especially when the user does not have access to a visual display of what the system hypothesizes was spoken. In general, experimental trials to observe subject reaction to verification subdialogs are needed.</Paragraph>
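One simple way to exploit alternate hypotheses with scoring information, as suggested above, is to trigger verification when the top two hypotheses score too closely to call. The scoring scheme and margin below are assumptions of this sketch, not the paper's method:

```python
# Margin-based check over an N-best list of recognizer hypotheses
# (hypothesis text, score), higher scores assumed better. The 0.1 margin
# is an arbitrary illustrative value.
def needs_verification(scored_hypotheses, margin=0.1):
    """Return True when the top two hypotheses are too close to call."""
    ranked = sorted(scored_hypotheses, key=lambda h: h[1], reverse=True)
    if len(ranked) < 2:
        return False  # no competing hypothesis to create ambiguity
    best, runner_up = ranked[0][1], ranked[1][1]
    return (best - runner_up) < margin  # ambiguous: ask the user

print(needs_verification([("seven", 0.62), ("eleven", 0.58)]))            # True
print(needs_verification([("the led is off", 0.9), ("the lead is off", 0.4)]))  # False
```

A close-margin digit confusion like "seven"/"eleven" is exactly the kind of high-confidence misrecognition the analysis identifies as hardest to catch with a single hypothesis.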
      <Paragraph position="3"> In conclusion, while useful, verification subdialogs appear to have limits to their effectiveness. Consequently, strategies for delayed detection and resolution of miscommunication (e.g., (McRoy and Hirst, 1995), (Brennan and Hulteen, 1995), and (Lambert and Carberry, 1992)) become necessary and remain an area of continued investigation. These include both computer-initiated and user-initiated strategies.</Paragraph>
    </Section>
  </Section>
</Paper>