File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/06/p06-1123_intro.xml

Size: 4,162 bytes

Last Modified: 2025-10-06 14:03:35

<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1123">
  <Title>Empirical Lower Bounds on the Complexity of Translational Equivalence [?]</Title>
  <Section position="3" start_page="0" end_page="977" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Translational equivalence is a mathematical relation that holds between linguistic expressions with the same meaning. The most common explicit representations of this relation are word alignments between sentences that are translations of each other. The complexity of a given word alignment can be measured by the difficulty of decomposing it into its atomic units under certain constraints detailed in Section 2. This paper describes a study of the distribution of alignment complexity in a variety of bitexts. The study considered word alignments both in isolation and in combination with independently generated parse trees for one or both sentences in each pair. Thus, the study [?] Thanks to David Chiang, Liang Huang, the anonymous reviewers, and members of the NYU Proteus Project for helpful feedback. This research was supported by NSF grant #'s 0238406 and 0415933.</Paragraph>
    <Paragraph position="1"> + SW made most of her contribution while at NYU.</Paragraph>
    <Paragraph position="2"> is relevant to finite-state phrase-based models that use no parse trees (Koehn et al., 2003), tree-to-string models that rely on one parse tree (Yamada and Knight, 2001), and tree-to-tree models that rely on two parse trees (Groves et al., 2004, e.g.).</Paragraph>
    <Paragraph position="3"> The word alignments that are the least complex on our measure coincide with those that can be generated by an inversion transduction grammar (ITG). Following Wu (1997), the prevailing opinion in the research community has been that more complex patterns of word alignment in real bitexts are mostly attributable to alignment errors. However, the experiments in Section 3 show that more complex patterns occur surprisingly often even in highly reliable alignments in relatively simple bitexts. As discussed in Section 4, these findings shed new light on why &amp;quot;syntactic&amp;quot; constraints have not yet helped to improve the accuracy of statistical machine translation.</Paragraph>
    <Paragraph position="4"> Our study used two kinds of data, each controlling a different confounding variable. First, we wanted to study alignments that contained as few errors as possible. So unlike some other studies (Zens and Ney, 2003; Zhang et al., 2006), we used manually annotated alignments instead of automatically generated ones. The results of our experiments on these data will remain relevant regardless of improvements in technology for automatic word alignment.</Paragraph>
    <Paragraph position="5"> Second, we wanted to measure how much of the complexity is not attributable to systematic translation divergences, both in the languages as a whole (SVO vs. SOV), and in specific constructions (English not vs. French ne. . . pas). To eliminate this source of complexity of translational equivalence, we used English/English bitexts. We are not aware of any previous studies of word alignments in monolingual bitexts.</Paragraph>
    <Paragraph position="6"> Even manually annotated word alignments vary in their reliability. For example, annotators sometimes link many words in one sentence to many  words in the other, instead of making the effort to tease apart more fine-grained distinctions. A study of such word alignments might say more about the annotation process than about the translational equivalence relation in the data. The inevitable noise in the data motivated us to focus on lower bounds, complementary to Fox (2002), who wrote that her results &amp;quot;should be looked on as more of an upper bound.&amp;quot; (p. 307) As explained in Section 3, we modified all unreliable alignments so that they cannot increase the complexity measure. Thus, we arrived at complexity measurements that were underestimates, but reliably so. It is almost certain that the true complexity of translational equivalence is higher than what we report.</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML