<?xml version="1.0" standalone="yes"?>
<Paper uid="H94-1027">
  <Title>Translating Collocations for Use in Bilingual Lexicons</Title>
  <Section position="3" start_page="153" end_page="153" type="intro">
    <SectionTitle>
3. Evaluation
</SectionTitle>
    <Paragraph position="0"> We are carrying out three tests of Champollion with two database corpora and three sets of source collocations. The first database corpus (DB1) consists of eight months of aligned Hansards data from 1986; the second (DB2) consists of all of the 1986 and 1987 transcripts of the Canadian Parliament. The first set of source collocations (C1) comprises 300 collocations identified by Xtract on all data from 1986, the second (C2) 300 collocations identified by Xtract on all data from 1987, and the third (C3) 300 collocations identified by Xtract on all data from 1988. We used DB1 with both C1 (experiment 1) and C2 (experiment 2), and are currently using DB2 with C3 (experiment 3).</Paragraph>
    <Paragraph position="1"> Results from the third experiment were not yet available at time of publication.</Paragraph>
    <Paragraph position="2"> We asked three bilingual speakers to evaluate the results of the different experiments; the results are shown in Table 2. The second column gives the percentage of correct translations, the third the percentage of Xtract errors, the fourth the percentage of Champollion's errors, and the last the percentage of Champollion's correct translations when the input is filtered of errors introduced by Xtract. Averages of the three evaluators' scores are shown; individual evaluators' scores were within 1-2% of each other, indicating high agreement between judges. The best results are obtained when the database corpus is also used as the training corpus for Xtract; ignoring Xtract errors, the accuracy is as high as 77%. The second experiment produces lower results because many input collocations did not appear often enough in the database corpus. We hope to show that we can compensate for this by increasing the corpus size in the third experiment.</Paragraph>
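The filtered column described above can be derived by removing Xtract's erroneous input collocations from the denominator before scoring Champollion. A minimal sketch, using hypothetical counts rather than the paper's actual data:

```python
# Hypothetical illustration of the evaluation columns: raw accuracy
# counts every input collocation, while filtered accuracy excludes
# collocations that Xtract identified incorrectly, since Champollion
# cannot translate a collocation that was wrong to begin with.

def evaluation_scores(correct, xtract_errors, champollion_errors):
    total = correct + xtract_errors + champollion_errors
    raw_accuracy = correct / total
    # Drop Xtract's errors from the denominator: judge Champollion
    # only on well-formed input collocations.
    filtered_accuracy = correct / (total - xtract_errors)
    return raw_accuracy, filtered_accuracy

# Example with made-up counts out of 300 source collocations:
raw, filtered = evaluation_scores(correct=210, xtract_errors=27,
                                  champollion_errors=63)
print(raw, filtered)  # filtered accuracy is higher than raw accuracy
```

This makes explicit why the filtered percentage in the last column always meets or exceeds the raw percentage in the second column.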
    <Paragraph position="3"> One class of Champollion's errors arises because it does not translate closed-class words such as prepositions. Since the frequency of prepositions is so high in comparison to open-class words, including them in the translations throws off the correlation measures. Translations that should have included prepositions were judged inaccurate by our evaluators, accounting for approximately 5% of the errors. This is an obvious place to begin improving Champollion's accuracy.</Paragraph>
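One way to keep high-frequency closed-class words from distorting a correlation measure is to filter them out of candidate translations before scoring. A hedged sketch of that idea; the preposition list, tokenization, and example phrase are illustrative assumptions, not Champollion's implementation:

```python
# Illustrative sketch: drop closed-class words (here, a small assumed
# set of French prepositions) from a candidate translation so that
# their very high corpus frequency does not dominate correlation
# scores computed over the remaining open-class words.

CLOSED_CLASS = {"de", "du", "des", "a", "en", "dans", "pour", "sur",
                "avec", "par"}  # hypothetical stoplist, not exhaustive

def open_class_tokens(candidate_translation):
    """Return only the open-class words of a candidate translation."""
    return [word for word in candidate_translation.lower().split()
            if word not in CLOSED_CLASS]

print(open_class_tokens("demande de remboursement"))
# -> ['demande', 'remboursement']
```

A translation scored this way would still need the dropped prepositions restored for output, which is precisely the gap the evaluators penalized.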
  </Section>
</Paper>