<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1064">
  <Title>Aligning words using matrix factorisation</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Word alignments
</SectionTitle>
    <Paragraph position="0"> We address the following problem: Given a source sentence f = f1 :::fi :::fI and a target sentence e = e1 :::ej :::eJ, we wish to find words fi and ej on either side which are aligned, ie in mutual correspondence. Note that words may be aligned without being directly &amp;quot;dictionary translations&amp;quot;. In order to have proper alignments, we want to enforce the following constraints: Coverage: Every word on either side must be aligned to at least one word on the other side (Possibly taking &amp;quot;null&amp;quot; words into account). Transitive closure: If fi is aligned to ej and e', any fk aligned to e' must also de aligned to ej.</Paragraph>
    <Paragraph position="1"> Under these constraints, there are only 4 types of alignments: 1-1, 1-N, M-1 and M-N (fig. 1).</Paragraph>
    <Paragraph position="2"> Although the first three are particular cases where N=1 and/or M=1, the distinction is relevant, because most word-based translation models (eg IBM models (Brown et al., 1993)) can typically not accommodate general M-N alignments.</Paragraph>
    <Paragraph position="3"> We formalise this using the notion of cepts: a cept is a central pivot through which a subset of e-words is aligned to a subset of f-words. General M-N alignments then correspond to M-1-N alignments from e-words, to a cept, to f-words (fig. 2). Cepts naturally guarantee transitive closure as long as each word is connected to a single cept. In addition, coverage is ensured by imposing that each le droit de permis ne augmente pas the licence fee does not increase</Paragraph>
    <Paragraph position="5"> fig. 1, 2. Black squares represent alignments.</Paragraph>
    <Paragraph position="6"> word is connected to a cept. A unique constraint therefore guarantees proper alignments: Propriety: Each word is associated to exactly one cept, and each cept is associated to at least one word on each side.</Paragraph>
    <Paragraph position="7"> Note that our use of cepts differs slightly from that of (Brown et al., 1993, sec.3), inasmuch cepts may not overlap, according to our definition.</Paragraph>
    <Paragraph position="8"> The motivation for our work is that better word alignments will lead to better translation models. For example, we may extract better chunks for phrase-based translation models. In addition, proper alignments ensure that cept-based phrases will cover the entire source and target sentences.</Paragraph>
  </Section>
class="xml-element"></Paper>