<?xml version="1.0" standalone="yes"?>
<Paper uid="P97-1063">
  <Title>A Word-to-Word Model of Translational Equivalence</Title>
  <Section position="4" start_page="0" end_page="490" type="intro">
    <SectionTitle>
2 Co-occurrence
</SectionTitle>
    <Paragraph position="0"> With the exception of (Fung, 1998b), previous methods for automatically constructing statistical translation models begin by looking at word co-occurrence frequencies in bitexts (Gale &amp; Church, 1991; Kumano &amp; Hirakawa, 1994; Fung, 1998a; Melamed, 1995). A bitext comprises a pair of texts in two languages, where each text is a translation of the other. Word co-occurrence can be defined in various ways. The most common way is to divide each half of the bitext into an equal number of segments and to align the segments so that each pair of segments Si and Ti are translations of each other (Gale &amp; Church, 1991; Melamed, 1996a). Then, two word tokens (u, v) are said to co-occur in the aligned segment pair i if u ∈ Si and v ∈ Ti. The co-occurrence relation can also be based on distance in a bitext space, which is a more general representation of bitext correspondence (Dagan et al., 1993; Resnik &amp; Melamed, 1997), or it can be restricted to word pairs that satisfy some matching predicate, which can be extrinsic to the model (Melamed, 1995; Melamed, 1997).</Paragraph>
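    <Paragraph position="1"> The segment-based definition above can be illustrated with a minimal sketch. It assumes the bitext is already segmented and aligned, so that the i-th source segment and the i-th target segment are mutual translations. The function name cooccurrence_counts and the toy English-French data are hypothetical, introduced only for illustration. Counting every token pair once is one possible convention, not necessarily the counting scheme used in the work cited.

```python
from collections import Counter
from itertools import product


def cooccurrence_counts(source_segments, target_segments):
    """Count co-occurring token pairs (u, v) over aligned segment pairs.

    Assumes source_segments[i] and target_segments[i] are translations
    of each other, i.e. the bitext is already aligned.
    """
    if len(source_segments) != len(target_segments):
        raise ValueError("both halves of the bitext need the same number of segments")
    counts = Counter()
    for s_i, t_i in zip(source_segments, target_segments):
        # Every token u in S_i co-occurs with every token v in T_i.
        counts.update(product(s_i, t_i))
    return counts


# Hypothetical toy bitext: two aligned segment pairs.
bitext_en = [["the", "house"], ["the", "blue", "house"]]
bitext_fr = [["la", "maison"], ["la", "maison", "bleue"]]
counts = cooccurrence_counts(bitext_en, bitext_fr)
```

In this toy bitext the pair (house, maison) co-occurs in both segment pairs, while (blue, bleue) co-occurs in only one. A matching predicate of the kind mentioned above could be applied as a filter on the pairs before they are counted.</Paragraph>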
  </Section>
</Paper>