<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0301">
  <Title>al: A word alignment system with limited language</Title>
  <Section position="5" start_page="3" end_page="3" type="evalu">
    <SectionTitle>
5 Results and Discussion
</SectionTitle>
    <Paragraph position="0"> Tables 4 and 5 list the results obtained by participating systems in the Romanian-English task. Similarly, results obtained during the English-French task are listed in Tables 6 and 7.</Paragraph>
    <Paragraph position="1"> For Romanian-English with limited resources, the XRCE systems (XRCE.Nolem-56k.RE.2 and XRCE.Trilex.RE.3) lead to the best results. These systems are based on GIZA++, with or without additional resources (lemmatizers and lexicons). For unlimited resources, ProAlign.RE.1 has the best performance.</Paragraph>
    <Paragraph position="2"> For English-French, Ralign.EF.1 has the best performance for limited resources, while ProAlign.EF.1 has again the largest number of top ranked figures for unlimited resources.</Paragraph>
    <Paragraph position="3"> To make a cross-language comparison, we paid particular attention to the evaluation of the Sure alignments, since these were collected in a similar fashion (an agreement had to be reached between two different annotators). The results obtained for the English-French Sure alignments are significantly higher (80.54% best F-measure) than those for the Romanian-English Sure alignments (71.14% best F-measure). Similarly, the best AER for English-French (5.71%) is clearly better than the best AER for Romanian-English (28.86%).</Paragraph>
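As a point of reference, the figures above can be computed with the standard Sure/Probable alignment metrics in the style of Och and Ney, which such shared tasks commonly follow. The sketch below is illustrative, assuming alignments are represented as sets of (source index, target index) link pairs; the toy data is made up, not taken from the evaluation.

```python
# Illustrative sketch of Sure/Probable alignment metrics (Och & Ney style).
# A: system alignment links; S: Sure gold links; P: Probable gold links
# (with S assumed to be a subset of P). All are sets of (src, tgt) pairs.

def alignment_metrics(A, S, P):
    precision = len(A & P) / len(A)          # system links confirmed by Probable gold
    recall = len(A & S) / len(S)             # Sure gold links recovered by the system
    f_measure = 2 * precision * recall / (precision + recall)
    # Alignment Error Rate: lower is better.
    aer = 1 - (len(A & S) + len(A & P)) / (len(A) + len(S))
    return precision, recall, f_measure, aer

# Toy example (hypothetical links, for illustration only).
S = {(0, 0), (1, 1)}
P = S | {(2, 1)}
A = {(0, 0), (1, 1), (2, 2)}
p, r, f, aer = alignment_metrics(A, S, P)
```

Note that a lower AER indicates a better alignment, which is why 5.71% outperforms 28.86% above.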
    <Paragraph position="4"> This difference in performance between the two data sets is not a surprise. As expected, word alignment, like many other NLP tasks (Banko and Brill, 2001), benefits greatly from large amounts of training data. Increased performance is therefore expected when larger training data sets are available.</Paragraph>
    <Paragraph position="5"> The only evaluation set where the Romanian-English data leads to better performance is the Probable alignments set. We believe, however, that these figures are not directly comparable, since the English-French Probable alignments were obtained as a union of the alignments assigned by two different annotators, while for the Romanian-English Probable set two annotators had to reach an agreement (that is, an intersection of their individual alignment assignments).</Paragraph>
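The difference between the two annotation protocols can be made concrete with set operations, again treating each annotator's output as a set of (source, target) link pairs. The annotator data below is hypothetical.

```python
# Two hypothetical annotators' link sets for the same sentence pair.
ann1 = {(0, 0), (1, 1), (2, 1)}
ann2 = {(0, 0), (1, 1), (2, 2)}

# English-French protocol: Probable set is the union of both annotators,
# so it grows with any disagreement.
probable_ef = ann1 | ann2

# Romanian-English protocol: only links both annotators agreed on survive,
# i.e. the intersection, so the set shrinks under disagreement.
probable_re = ann1 & ann2
```

Since the union protocol yields a larger, more permissive gold set than the intersection protocol, scores against the two Probable sets are not directly comparable.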
    <Paragraph position="6"> Interestingly, in an overall evaluation, the limited resources systems seem to lead to better results than those with unlimited resources. Out of 28 different evaluation figures, 20 top ranked figures are provided by systems with limited resources. This suggests that using a large number of additional resources yields little improvement over using parallel texts alone.</Paragraph>
    <Paragraph position="7"> Ranked results for all systems are plotted in Figures 2 and 3. In the graphs, systems are ordered based on their AER scores. System names are preceded by a marker to indicate the system type: L stands for Limited Resources, and U stands for Unlimited Resources.</Paragraph>
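The ordering used in the figures can be sketched as a simple sort on AER (lower is better), with each system name prefixed by its resource-type marker. The system names and scores below are invented for illustration and do not reproduce the reported results.

```python
# Hypothetical (name, resource type, AER %) triples; L = Limited, U = Unlimited.
systems = [
    ("ProAlign.EF.1", "U", 7.2),
    ("Ralign.EF.1", "L", 5.7),
    ("Other.EF.2", "L", 9.9),
]

# Order systems by AER, ascending (lower error is better).
ordered = sorted(systems, key=lambda s: s[2])

# Prefix each name with its resource-type marker, as in the plots.
labels = [f"{kind} {name}" for name, kind, aer in ordered]
```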
  </Section>
</Paper>