<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-2178">
  <Title>&quot;Using Cognates to Align Sentences in Bilingual Corpora,&quot; Fourth International Conference on Theoretical and Methodological Issues in Machine</Title>
  <Section position="6" start_page="7098" end_page="7098" type="evalu">
    <SectionTitle>
6. Results
</SectionTitle>
    <Paragraph position="0"> This algorithm was applied to a fragment of the Canadian Hansards that has been used in a number of other studies: Church (1993) and Simard et al. (1992). The 30 significant pairs with the largest mutual information values are shown in Table 9.</Paragraph>
    <Paragraph position="1"> As can be seen, the results provide a quick-and-dirty estimate of a bilingual lexicon. When a pair is not a direct translation, it is often the translation of a collocate, as illustrated by acheteur ~ Limited and Santé ~ Welfare. (Note that some words in Table 9 are spelled the same way in English and French; this information is not used by the K-vec algorithm.)</Paragraph>
    <Paragraph position="2"> Using dotplot, a scatter-plot technique developed by Church and Helfman (1993), we can visualize the alignment, as illustrated in Figure 1. The source text (Nx bytes) is concatenated to the target text (Ny bytes) to form a single input sequence of Nx+Ny bytes. A dot is placed in position i,j whenever the input token at position i is the same as the input token at position j.</Paragraph>
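The dot-placement rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the dotplot program of Church and Helfman: the function name `dotplot`, the use of whole words as tokens, and the toy word lists are all hypothetical, chosen so that one token is shared between the two texts.

```python
from collections import defaultdict


def dotplot(source_tokens, target_tokens):
    """Return the set of (i, j) dots for the concatenated input sequence."""
    # Concatenate source and target into one sequence, as the text describes.
    sequence = list(source_tokens) + list(target_tokens)

    # Record every position at which each token occurs.
    positions = defaultdict(list)
    for index, token in enumerate(sequence):
        positions[token].append(index)

    # Place a dot at (i, j) whenever the tokens at i and j are identical.
    dots = set()
    for indexes in positions.values():
        for i in indexes:
            for j in indexes:
                dots.add((i, j))
    return dots


# Toy data (hypothetical): "health" occurs in both the source and the target.
source = ["the", "health", "minister"]
target = ["le", "ministre", "de", "la", "health"]
dots = dotplot(source, target)
```

Every position matches itself, so the main diagonal is always filled. The dots of interest for alignment are the off-diagonal ones that pair a source position with a target position, here (1, 7) and its mirror image (7, 1).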
    <Paragraph position="3"> The equality constraint is relaxed in Figure 2. A dot is placed in position i,j whenever the input token at position i is highly associated with the input token at position j, as determined by the mutual information score of their respective K-vecs. In addition, Figure 2 shows a detailed, magnified and rotated view of the diagonal line. The alignment program tracks this line with as much precision as possible.</Paragraph>
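As a rough illustration of the association score used for the relaxed constraint, the sketch below computes a pointwise mutual information between two binary occurrence vectors. It assumes that a K-vec records, for each of K pieces of a text, whether the word occurs in that piece, and that probabilities are estimated as simple proportions of pieces. This section does not give the exact estimate or the significance test, so the names `k_vec` and `mutual_information` and the example piece numbers are hypothetical.

```python
import math


def k_vec(piece_ids, k):
    """Binary vector of length k: 1 where the word occurs in that piece."""
    occupied = set(piece_ids)
    return [1 if piece in occupied else 0 for piece in range(k)]


def mutual_information(vec_x, vec_y):
    """log2 of P(x, y) / (P(x) * P(y)), estimated over the k pieces."""
    k = len(vec_x)
    # Number of pieces in which both words occur.
    both = sum(1 for x, y in zip(vec_x, vec_y) if x == 1 and y == 1)
    count_x = sum(vec_x)
    count_y = sum(vec_y)
    if both == 0:
        # The words never co-occur, so the score is unbounded below.
        return float("-inf")
    return math.log2((both / k) / ((count_x / k) * (count_y / k)))


# Hypothetical example: two words that share two of eight pieces.
french = k_vec([0, 2, 5], 8)
english = k_vec([0, 2, 6], 8)
score = mutual_information(french, english)
```

Under these assumptions, a dot would be placed at i,j when the score for the two tokens exceeds some chosen threshold; the threshold itself is not specified in this section.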
    <Paragraph position="4">  3. The low-frequency words (frequency less than 3) would have been rejected anyway as insignificant.</Paragraph>
    <Section position="1" start_page="7098" end_page="7098" type="sub_section">
      <SectionTitle>
MI French English
3.2 Beauce Beauce
3.2 Comeau Comeau
3.2 1981 1981
</SectionTitle>
      <Paragraph position="0"/>
    </Section>
    <Section position="2" start_page="7098" end_page="7098" type="sub_section">
      <SectionTitle>
2.8 Deans Deans
2.8 Prud Prud
2.8 Prud homme
2.7 acheteur Limited
2.7 Communications Communications
2.7 MacDonald MacDonald
2.6 Mazankowski Mazankowski
2.5 croisière nuclear
2.5 Santé Welfare
2.5 39 39
2.5 Johnston Johnston
2.5 essais nuclear
2.5 Université University
2.5 bois lumber
2.5 Angus Angus
2.4 Angus VIA
2.4 Saskatoon University
2.4 agriculteurs farmers
2.4 inflation inflation
2.4 James James
2.4 Vanier Vanier
2.4 Santé Health
2.3 royale languages
2.3 grief grievance
</SectionTitle>
      <Paragraph position="0"/>
    </Section>
  </Section>
</Paper>