<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2051">
  <Title>Spontaneous Speech Understanding for Robust Multi-Modal Human-Robot Communication</Title>
  <Section position="9" start_page="395" end_page="396" type="evalu">
    <SectionTitle>
7 Evaluation
</SectionTitle>
    <Paragraph position="0"> For the evaluation of the entire robot system BIRON we recruited 14 naive user between 12 and 37 years with the goal to test the intuitiveness and the robustness of all system modules as well as its performance. Therefore, in the rst of two runs the users were asked to familiarize themselves with the robot without any further information of the system. In the second run the users were given more information about technical details of BIRON (such as its limited vocabulary).</Paragraph>
    <Paragraph position="1"> We observed similar effects as described in section 2. In average, one utterance contained 3.23 words indicating that the users are more likely to utter short phrases. They also tend to pause in the middle of an utterance and they often uttered so called meta-comments such as that s ne . In gure 5 some excerptions of the dialogs during the experiment settings are presented.</Paragraph>
    <Paragraph position="2"> Thus, not surprisingly the speech recognition error rate in the rst run was 60% which decreased in the second run to 42%, with an average of 52%.</Paragraph>
    <Paragraph position="3"> High error rate seems to be a general problem in settings with spontaneous speech as other systems also observed this problem (see also (Gorniak and Roy, 2005)). But even in such a restricted experiment setting, speech understanding will have to deal with speech recognition error which can never be avoided.</Paragraph>
    <Paragraph position="4"> In order to address the two questions of (1) how well our approach of automatic speech understanding (ASU) can deal with automatic speech recognition (ASR) errors and (2) how its performance compares to syntactic analysis, we performed two analyses. In order to answer question (1) we compared the results from the semantic analysis based on the real speech recognition re- null sults with an accuracy of 52% with those based on the really uttered words as transcribed manually, thus simulating a recognition rate of 100%. In total, the semantic speech processing received 1642 utterances from the speech recognition system.</Paragraph>
    <Paragraph position="5"> From these utterances 418 utterances were randomly chosen for manual transcription and syntactic analysis. All 1642 utterances were processed and performed on a standard PC with an average processing time of 20ms, which fully ful lls the requirements of real-time applications. As shown in Table 1 39% of the results were rated as complete or partial misunderstandings and 61% as correct utterances with full semantic meaning. Only 4% of the utterances which were correctly recognized were misinterpreted or refused by the speech understanding system. Most errors occurred due to missing words in the lexicon.</Paragraph>
    <Paragraph position="6"> Thus, the performance of the speech understanding system (ASU) decreases to the same degree as that of the speech recognition system (ASR): with a 50% ASR recognition rate the number of non-interpretable utterances is doubled indicating a linear relationship between ASR and ASU.</Paragraph>
    <Paragraph position="7"> For the second question we performed a manual classi cation of the utterances into syntactically correct (and thus parseable by a standard parsing algorithm) and not-correct. Utterances following the English standard grammar (e.g. imperative, descriptive, interrogative) or containing a single word or an NP, as to be expected in answers, were classi ed as correct. Incomplete utterances or utterances with a non-standard structure (as occurred often in the baby-talk style utterances) were rated as not-correct. In detail, 58 utterances were either truncated at the end or beginning due to errors of the attention system, resulting in utterances such as where is , can you nd , or is a cube . These utterances also include instances where users interrupted themselves. In 51 utterances we found words missing in our lexicon database. 314 utterances where syntactically correct, whereas in 28 of these utterances a lexicon entry is missing in the system and therefore would ASR=100% ASR=52% ASU not or part. interpret. 15% 39% ASU fully interpretable 84% 61%  lead to a failure of the parsing mechanism. 104 utterances have been classi ed as syntactically notcorrect. null In contrast, the result from our mechanism performed signi cantly better. Our system was able to interprete 352 utterances and generate a full semantic interpretation, whereas 66 utterances could only be partially interpreted or were marked as not interpretable. 21 interpretations of the utterances were semantically incorrect (labeled from the system wrongly as correct) or were not assigned to the correct speech act, e.g., okay was assigned to no speech act (fragment) instead to con rmation. Missing lexicon entries often lead to partial interpretations (20 times) or sometimes to complete misinterpretations (8 times). But still in many cases the system was able to interprete the utterance correctly (23 times). For example can you go for a walk with me was interpreted as can you go with me only ignoring the unknown for a walk .The utterance can you come closer was interpreted as a partial understanding can you come (ignoring the unknown word closer ). The results are summarized in Table 2.</Paragraph>
    <Paragraph position="8"> As can be seen the semantic error rate with 15% non-interpretable utterances is just half of the syntactic correctness with 31%. This indicates that the semantic analysis can recover about half of the information that would not be recoverable from syntactic analysis.</Paragraph>
    <Paragraph position="9"> ASU Synt. cor.</Paragraph>
    <Paragraph position="10"> not or part. interpret. 15% not-correct 31% fully interpret. 84% correct 68%</Paragraph>
  </Section>
class="xml-element"></Paper>