<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-1411">
  <Title>Towards a Simple and Accurate Statistical Approach to Learning Translation Relationships among Words</Title>
  <Section position="10" start_page="6" end_page="7" type="evalu">
    <SectionTitle>
5 Experimental Results
</SectionTitle>
    <Paragraph position="0"> Our basic method for finding translation pairs was applied to a set of approximately 200,000 French and English aligned sentence pairs, derived mainly from Microsoft technical manuals, resulting in 46,599 potential translation pairs. The top 42,486 pairs were incorporated in the alignment lexicon of our end-to-end translation system. The procedure for finding translations of captoids was applied to a slight superset of the training data for the basic procedure, and yielded 2561 possible translation pairs. All of these were added to our end-to-end translation system, with the French multiwords being added to the lexicon of the French parser, and the translation pairs being added to the alignment lexicon.</Paragraph>
    <Paragraph position="1"> The improvements in end-to-end performance due to these additions in a French-to-English translation task are described elsewhere (Pinkham and Corston-Oliver, 2001). (As of this writing, however, the alignment procedure does not yet make use of the general translation pairs involving compounds, although it does make use of the captoid translation compounds.) For this report, we have evaluated our techniques for finding translation pairs by soliciting judgements of translation correctness from fluent French-English bilinguals.</Paragraph>
    <Paragraph position="2"> There were too many translation pairs to obtain judgements on each one, so we randomly selected about 10% of the 42,486 general translation pairs that were actually added to the system, and about 25% of the 2561 captoid pairs.</Paragraph>
    <Paragraph position="3"> The accuracy of the most strongly associated translation pairs produced by the basic method at various levels of coverage is displayed in Table 1. We use the terms &amp;quot;coverage&amp;quot; and &amp;quot;accuracy&amp;quot; in essentially the same way as Melamed (1996, 2000). &amp;quot;Type coverage&amp;quot; means the proportion of distinct lexical types in the entire training corpus, including both French and English, for which there is at least one translation given. As with the comparable results reported by Melamed, these are predominantly single lemmas for content words, but we also include occurrences of multiwords as distinct types. &amp;quot;Mean count&amp;quot; is the average number of occurrences of each type at the given level of coverage. &amp;quot;Token coverage&amp;quot; is the proportion of the total number of occurrences of items in the text represented by the types included within the type coverage.</Paragraph>
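The three statistics just defined can be made concrete with a short sketch. The function below is illustrative only (names and data are hypothetical, not from the paper): it takes a flat list of corpus tokens and the set of types for which the lexicon offers at least one translation, and returns type coverage, token coverage, and mean count as defined above.

```python
from collections import Counter

def coverage_stats(corpus_tokens, lexicon_types):
    """Compute (type coverage, token coverage, mean count).

    corpus_tokens : list of lexical items (lemmas or multiwords)
    lexicon_types : set of types with at least one proposed translation

    Illustrative sketch of the definitions in the text, not the
    authors' actual evaluation code.
    """
    counts = Counter(corpus_tokens)
    covered = [t for t in counts if t in lexicon_types]
    # Proportion of distinct types that receive a translation.
    type_coverage = len(covered) / len(counts)
    # Proportion of all token occurrences those types account for.
    covered_tokens = sum(counts[t] for t in covered)
    token_coverage = covered_tokens / len(corpus_tokens)
    # Average number of occurrences per covered type.
    mean_count = covered_tokens / len(covered) if covered else 0.0
    return type_coverage, token_coverage, mean_count
```

Note that a small set of very frequent covered types can yield high token coverage at modest type coverage, which is the pattern the paper reports (99% token coverage at 63% type coverage).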
    <Paragraph position="4"> Since the judges were asked to evaluate the proposed translations out of context, we allowed them to give an answer of &amp;quot;not sure&amp;quot;, as well as &amp;quot;correct&amp;quot; and &amp;quot;incorrect&amp;quot;. Our accuracy scores are therefore given as a range, where the low score combines answers of &amp;quot;not sure&amp;quot; and &amp;quot;incorrect&amp;quot;, and the high score combines answers of &amp;quot;not sure&amp;quot; and &amp;quot;correct&amp;quot;.</Paragraph>
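The accuracy range described above reduces to simple arithmetic over the three judgement categories. The sketch below (a hypothetical helper, not from the paper) shows the assumed computation: the low score counts &quot;not sure&quot; answers as incorrect, the high score counts them as correct.

```python
def accuracy_range(correct, incorrect, not_sure):
    """Return (low, high) accuracy from out-of-context judgements.

    low  : "not sure" grouped with "incorrect"
    high : "not sure" grouped with "correct"

    Assumed reading of the paper's scoring scheme.
    """
    total = correct + incorrect + not_sure
    low = correct / total
    high = (correct + not_sure) / total
    return low, high
```

For example, 80 &quot;correct&quot;, 10 &quot;incorrect&quot;, and 10 &quot;not sure&quot; judgements would be reported as an accuracy range of 80-90%.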
    <Paragraph position="5">  The &amp;quot;total accuracy&amp;quot; column gives results at different levels of coverage over all the translation pairs generated by our basic method. For a more detailed analysis, the remaining columns provide a breakdown for single-word translations, translations involving multiwords given to us by the parser (&amp;quot;multiword accuracy&amp;quot;), and new multiwords hypothesized by our procedure (&amp;quot;compound accuracy&amp;quot;). As the table shows, our performance is quite good on single-word translations, with accuracy of around 80% even at our cut-off of 63% type coverage, which represents 99% of the tokens in the corpus.</Paragraph>
    <Paragraph position="6"> To compare our results more directly with Melamed's published results on single-word translation, we show Table 2, where both coverage and accuracy are given for single-word translations only. According to the standard of correctness Melamed uses that is closest to ours, he reports 92% accuracy at 36% type coverage, 89% accuracy at 46% type coverage, and 87% accuracy at 90% type coverage, on a set of 300,000 aligned sentence pairs from the French-English Hansard corpus of Canadian Parliament proceedings. Our accuracies at the first two of these coverage points are 88-90% and 84-87%, which is slightly lower than Melamed's, but given the different corpus, different judges, and different evaluation conditions, one cannot draw any definite conclusions about which method is more accurate at these coverage levels. Our method, however, does not produce any result approaching 90% type coverage, and accuracy appears to start dropping rapidly below 56% type coverage. Nevertheless, this still represents good accuracy up to 97% token coverage.</Paragraph>
    <Paragraph position="7"> Returning to Table 1, we see that our accuracy on multiwords is much lower than on single words, especially the multiwords hypothesized by our learning procedure. The results are much better, however, when we look at the results for our specialized method for finding translations of captoids, as shown in Table 3. Our accuracy at nearly 20% type coverage is around 84%, which is higher than our accuracy for general translation pairs (76-80%) at the same type coverage level. It is lower than our single-word translation accuracy (90-91%) at this coverage level, but it is striking how close it is, given far less data. At 20% type coverage of single words, there are 389 tokens per word type, while at 20% type coverage of captoids, there are fewer than 9 tokens per captoid type. In fact, further analysis shows that of the 2561 captoid translation pairs, 947 have only a single example of the English captoid in the training data, yet our accuracy on these is around 82%. We note, however, that our captoid learning procedure cuts off at around 20% type coverage, which is only 25% token coverage for these items.</Paragraph>
  </Section>
</Paper>