<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-1075">
  <Title>Multi-lingual Translation of Spontaneously Spoken Language in a Limited Domain</Title>
  <Section position="7" start_page="444" end_page="445" type="evalu">
    <SectionTitle>
6 Evaluation
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="444" end_page="445" type="sub_section">
      <SectionTitle>
6.1 The Evaluation Procedure
</SectionTitle>
      <Paragraph position="0"> In order to assess the overall eflhctiveness of the two translation contponents, we developed a detailed end-to-end evaluation procedure (Gates el; hi. 1996). We evaluate the translation modules on both transcribed and spee.ch recognized input.</Paragraph>
      <Paragraph position="1"> The evMuation of transcribed inl)ut allows us to assess how well our translation modnles wouhl \[unction with &amp;quot;perfect&amp;quot; speech recognition. 'lhsting is performed on a set; of &amp;quot;unseen&amp;quot; dialogues, that were not used for developing the translation modules or training the speech recognizer.</Paragraph>
      <Paragraph position="2"> '\['he translation of an utterance is manually evaluated by assigning it a grade or a set of grades based on the number of sentences in the utteralice. 'file utterances are broken clown into sentences for evaluation in order to give more weight to longer utterances, and so that utterances containing both in and out-of-domain sentences can be .iudged more accurately.</Paragraph>
      <Paragraph position="3"> Each sentence is cla,ssified first as either relevant to the scheduling domain (in-domain) or not rel- null evant to the scheduling domain (out-of-domain).</Paragraph>
      <Paragraph position="4"> Each sentence is then assigned one of four grades for translation quality: (1) Perfect - a fluent translation with all information conveyed; (2) OK all important information translated correctly but some unimportant details missing, or the translation is awkward; (3) Bad - unacceptable translation; (4) Recognition Error - unacceptable translation due to a speech recognition error. These grades are used for both in-domain and out-of-domain sentences. However, if an out-of-domain sentence is automatically detected as such by the parser and is not translated at all, it is given an &amp;quot;OK&amp;quot; grade. The evaluations are performed by one or more independent graders. When more than one grader is used, the results are averaged together.</Paragraph>
    </Section>
    <Section position="2" start_page="445" end_page="445" type="sub_section">
      <SectionTitle>
6.2 Results
</SectionTitle>
      <Paragraph position="0"> Figure 4 shows the evaluation results for 16 unseen Spanish dialogues containing 349 utterances translated into English. Acceptable is the sum of &amp;quot;Perfect&amp;quot; and &amp;quot;OK&amp;quot; sentences. For speech recognized input, we used the first-best hypotheses of the speech recognizer.</Paragraph>
      <Paragraph position="1"> Two trends have been observed from this evaluation as well as other evaluations that we have conducted. First, The GLR translation module performs better than the Phoenix module on transcribed input and produces a higher percentage of &amp;quot;Perfect&amp;quot; translations, thus confirming the GLR approach is more accurate. This also indicates that GLR performance should improve with better speech recognition and improved pre-parsing utterance segmentation. Second, the Phoenix module performs better than GLR on the first-best hypotheses from the speech recognizer, a result of the Phoenix approach being more robust.</Paragraph>
      <Paragraph position="2"> These results indicate that combining the two approaches has the potential to improve the translation performance. Figure 5 shows the results of combining the two translation methods using the simple method described in the previous section.</Paragraph>
      <Paragraph position="3"> The GLR* parse quality judgement is used to determine whether to output the GLR translation or the Phoenix translation. The results were evaluated only for in-domain sentences, since out-of-domain sentences are unlikely to benefit from this strategy. The combination of the two translation approaches resulted in a slight increase in the percentage of acceptable translations on transcribed input (compared to both approaches separately).</Paragraph>
      <Paragraph position="4"> On speech recognized input, although the over-all percentage of acceptable translations does not improve, the percentage of &amp;quot;Perfect&amp;quot; translations was higher. 2 2In a more recent evaluation, this combination method resulted in a 9.5% improvement in acceptable translations of speech recognized in-domain sentences. Although some variation between test sets is to be ex-</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>