<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0304">
<Title>Statistical Translation Alignment with Compositionality Constraints</Title>
<Section position="7" start_page="0" end_page="0" type="evalu">
<SectionTitle> 5 Experimental Results </SectionTitle>
<Paragraph position="0"> The different word-alignment methods described in sections 2 and 3 were run on the test corpora of the WPT-03 shared task on alignment. Results were evaluated in terms of alignment precision (P), recall (R), F-measure, and alignment error rate (AER) (Och and Ney, 2000). As specified in the shared task description, all of these metrics were computed taking null alignments into account (i.e., tokens left unconnected in an alignment were counted as aligned to the virtual word token &quot;0&quot;). The results of our experiments are reproduced in table 2.</Paragraph>
<Paragraph position="1"> We observe that imposing a &quot;contiguous compositionality&quot; constraint (the C and RC methods) yields substantial gains over plain Viterbi alignments (V and RV respectively), especially in terms of precision and AER (a slight decline in recall can be observed between the V and C methods on the ro-en corpus, but it is not clear whether this difference is significant). These gains are all the more interesting given that each pair of alignments (V and C, RV and RC) is obtained from exactly the same data. This highlights both the deficiencies of IBM Model-2 and the importance of compositionality.</Paragraph>
<Paragraph position="2"> Using both the forward and reverse models (CC) yields further gains on all metrics.
This result is interesting because it shows the potential of the compositional alignment method for integrating various sources of information.</Paragraph>
<Paragraph position="3"> With regard to language pairs, it is interesting to note that all alignment methods produce figures that are substantially better in recall and worse in precision on the ro-en data than on en-fr. Overall, ro-en alignments display significantly higher F-measures. This is surprising, considering that the provided en-fr corpus contained 20 times more training material. The phenomenon is likely due to the fact that the en-fr test reference contains many more alignments per word (1.98 per target word) than the ro-en reference does (1.12). All alignment methods described here produce roughly between 1 and 1.25 alignments per target word. This favors recall and F-measure figures on the ro-en test, while precision and AER (which, in practice, correlates strongly with precision) are affected in the opposite direction.</Paragraph>
</Section>
</Paper>
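The metrics reported above (P, R, F-measure, AER) follow the standard definitions of Och and Ney (2000), with unaligned tokens linked to the virtual token 0 as in the WPT-03 convention. A minimal sketch of how such scores are typically computed from sets of alignment links; the function name and the example link sets are illustrative assumptions, not taken from the paper:

```python
def alignment_metrics(hyp, sure, possible=None):
    """Compute precision, recall, F-measure and AER (Och & Ney, 2000).

    hyp, sure, possible: sets of (src_idx, tgt_idx) links. Following the
    WPT-03 convention assumed here, tokens left unaligned should already
    be linked to the virtual token 0 before calling this function.
    All three sets are assumed non-empty.
    """
    if possible is None:
        possible = set(sure)
    possible = possible | sure            # sure links are possible by definition
    a_p = len(hyp & possible)             # hypothesis links confirmed as possible
    a_s = len(hyp & sure)                 # hypothesis links confirmed as sure
    precision = a_p / len(hyp)
    recall = a_s / len(sure)
    f = 2 * precision * recall / (precision + recall)
    aer = 1.0 - (a_s + a_p) / (len(hyp) + len(sure))
    return precision, recall, f, aer

# Illustrative call on a toy 3-token sentence pair (invented data):
# (3, 0) marks target token 3 as null-aligned.
p, r, f, aer = alignment_metrics(
    hyp={(1, 1), (2, 3), (3, 0)},
    sure={(1, 1), (2, 2)},
    possible={(1, 1), (2, 2), (3, 0)},
)
```

Note that with sure-only references (as for the ro-en data in this task), `possible == sure` and AER reduces to 1 minus the F-measure of the links, which is why AER tracks precision so closely when systems over-generate links.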