<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-2129">
  <Title>Automatic Detection of Omissions in Translations</Title>
  <Section position="4" start_page="0" end_page="764" type="intro">
    <SectionTitle>
2 Bitext Maps
</SectionTitle>
    <Paragraph position="0"> Any algorithm for detecting omissions in a translation must use a process of elimination: It must first decide which segments of the original text have corresponding segments in the translation.</Paragraph>
    <Paragraph position="1"> This decision requires a detailed description of the correspondence between units of the original text and units of the translation. To understand such correspondence, think of the original text and the translation as a single bitext (Harris, 1988). A description of the correspondence between the two halves of the bitext is called a bitext map. At least two methods for finding bitext maps have been described in the literature (Church, 1993; Melamed, 1996). Both methods output a sequence of corresponding character positions in the two texts. The novelty of the omission detection method presented in this paper lies in analyzing these correspondence points geometrically. A text and its translation can form the axes of a rectangular bitext space, as in Figure 1. The height and width of the rectangle correspond to the lengths of the two texts, in characters. The lower left corner of the rectangle represents the texts' beginnings. The upper right corner represents the texts' ends. If we know other corresponding character positions between the two texts, we can plot them as points in the bitext space. The bitext map is the real-valued function obtained by interpolating successive points in the bitext space. The bitext map between two texts that are translations of each other (mutual translations) will be injective (one to one).</Paragraph>
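The interpolation described above can be sketched in Python as a piecewise-linear function. This is only an illustrative sketch: the name make_bitext_map, the sample points, and the requirement that the points be sorted and strictly increasing in both coordinates are assumptions made here, not details taken from the paper.

```python
from bisect import bisect_right


def make_bitext_map(points):
    """Build a piecewise-linear bitext map from correspondence points.

    `points` is a list of (x, y) character-position pairs. It is assumed
    (hypothetically) to be sorted, strictly increasing in both coordinates,
    and to run from (0, 0) to (length of text 1, length of text 2).
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]

    def bitext_map(x):
        # Index of the segment containing x, clamped to the valid segments.
        i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
        x0, x1 = xs[i], xs[i + 1]
        y0, y1 = ys[i], ys[i + 1]
        # Linear interpolation between the two surrounding points.
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    return bitext_map


# Hypothetical points for a 1000-character text and an 1100-character translation.
points = [(0, 0), (200, 230), (500, 540), (1000, 1100)]
bitext_map = make_bitext_map(points)
```

With these hypothetical points, bitext_map(100) falls on the first segment and returns 115.0. Because the points increase strictly in both coordinates, the resulting function is injective, in line with the property stated for mutual translations.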
    <Paragraph position="2"> Bitext maps have another property that is crucial for detecting omissions in translations.</Paragraph>
    <Paragraph position="3"> There is a very high correlation between the lengths of mutual translations (ρ = .991) (Gale &amp; Church, 1991). This implies that the slope of segments of the bitext map function fluctuates very little. The slope of any segment of the map will, in probability, be very close to the ratio of the lengths of the two texts. In other words, the slope of map segments has very low variance.</Paragraph>
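The low-variance property can be illustrated with a short Python sketch that computes the slope of each map segment and compares it with the ratio of the two text lengths. The function name segment_slopes and the sample points are hypothetical, and the sketch only measures slopes; it is not the paper's detection procedure.

```python
from statistics import pvariance


def segment_slopes(points):
    """Slope of each segment between successive correspondence points."""
    return [
        (y1 - y0) / (x1 - x0)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]


# Hypothetical points; the last point holds the lengths of the two texts.
points = [(0, 0), (200, 230), (500, 540), (1000, 1100)]

# Ratio of the lengths of the two texts (translation length / original length).
length_ratio = points[-1][1] / points[-1][0]

slopes = segment_slopes(points)
slope_variance = pvariance(slopes)

# Largest departure of any segment slope from the overall length ratio.
max_deviation = max(abs(s - length_ratio) for s in slopes)
```

For these made-up points the three slopes all lie within about 0.07 of the length ratio of 1.1, which is the kind of tight clustering the paragraph above describes.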
  </Section>
</Paper>