<?xml version="1.0" standalone="yes"?>
<Paper uid="P93-1003">
  <Title>AN ALGORITHM FOR FINDING NOUN PHRASE CORRESPONDENCES IN BILINGUAL CORPORA</Title>
  <Section position="3" start_page="94304" end_page="94304" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> Areas of investigation using bilingual corpora have included the following: * Automatic sentence alignment [Kay and Röscheisen, 1988; Brown et al., 1991a; Gale and Church, 1991b].</Paragraph>
    <Paragraph position="1">  machine translation [Brown et al., 1992]. The work described here makes use of the aligned Canadian Hansards [Gale and Church, 1991b] to obtain noun phrase correspondences between the English and French text.</Paragraph>
    <Paragraph position="2"> The term "correspondence" is used here to signify a mapping between words in two aligned sentences. Consider an English sentence E_i and a French sentence F_i which are assumed to be approximate translations of each other. The subscript i denotes the i-th alignment of sentences in both languages. A word sequence in E_i is defined here as the correspondence of another sequence in F_i if the words of one sequence are considered to represent the words in the other.</Paragraph>
    <Paragraph position="3"> Single word correspondences have been investigated [Gale and Church, 1991a] using a statistic operating on contingency tables. An algorithm for producing collocational correspondences has also been described [Smadja, 1992]. The algorithm involves several steps. English collocations are first extracted from the English side of the corpus. Instances of the English collocation are found and the mutual information is calculated between the instances and various single word candidates in aligned French sentences. The highest ranking candidates are then extended by another word and the procedure is repeated until a corresponding French collocation having the highest mutual information is found.</Paragraph>
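Both statistics mentioned above can be computed from a 2x2 contingency table that counts aligned sentence pairs by whether an English word and a French word occur in them. The sketch below is illustrative only: the helper names (contingency_table, phi_squared, pointwise_mi) are hypothetical, φ² is assumed here as one standard contingency-table statistic since the paragraph does not name the one used, and mutual information is estimated from sentence-level co-occurrence, which is also an assumption.

```python
import math
from typing import List, Tuple

# An aligned sentence pair: (English tokens, French tokens).
AlignedPair = Tuple[List[str], List[str]]


def contingency_table(corpus: List[AlignedPair], e_word: str, f_word: str) -> Tuple[int, int, int, int]:
    """Count aligned sentence pairs by presence of e_word and f_word.

    Returns (a, b, c, d): a = both present, b = only e_word,
    c = only f_word, d = neither.
    """
    a = b = c = d = 0
    for english, french in corpus:
        has_e = e_word in english
        has_f = f_word in french
        if has_e and has_f:
            a += 1
        elif has_e:
            b += 1
        elif has_f:
            c += 1
        else:
            d += 1
    return a, b, c, d


def phi_squared(a: int, b: int, c: int, d: int) -> float:
    """phi^2 association score of a 2x2 table (0.0 if a margin is empty)."""
    denom = (a + b) * (a + c) * (b + d) * (c + d)
    return (a * d - b * c) ** 2 / denom if denom else 0.0


def pointwise_mi(a: int, b: int, c: int, d: int) -> float:
    """log2 P(e, f) / (P(e) P(f)), estimated from the same table."""
    n = a + b + c + d
    if a == 0:
        return float("-inf")  # the two words never co-occur
    return math.log2(a * n / ((a + b) * (a + c)))


# Toy usage on four hypothetical aligned sentence pairs.
corpus = [
    (["the", "house"], ["la", "maison"]),
    (["the", "car"], ["la", "voiture"]),
    (["a", "house"], ["une", "maison"]),
    (["a", "car"], ["une", "voiture"]),
]
for f_word in ["maison", "la"]:
    table = contingency_table(corpus, "house", f_word)
    print(f_word, table, phi_squared(*table), pointwise_mi(*table))
```

On this toy data "house" and "maison" always co-occur, so both scores are high, while "house" and "la" are independent and both scores are zero.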
    <Paragraph position="4"> An alternative approach is described here, which employs simple iterative re-estimation. It is used to find correspondences between simple noun phrases that have been isolated in corresponding sentences of each language using finite-state recognizers. The algorithm is applicable to finding single or multiple word correspondences and can accommodate additional kinds of phrases.</Paragraph>
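To give a feel for iterative re-estimation over noun phrases, the sketch below runs a generic EM loop in the style of IBM Model 1 over the noun phrases of each aligned sentence pair. It is a minimal sketch under stated assumptions, not the algorithm described in this paper: the noun phrases are assumed to be already extracted (the finite-state recognizers are not shown), there is no "null" phrase, and the names reestimate and best_match are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One aligned sentence pair, reduced to the noun phrases found on each side.
# The phrases are assumed to come from some NP recognizer (not shown here).
NPPair = Tuple[List[str], List[str]]
Table = Dict[Tuple[str, str], float]


def reestimate(corpus: List[NPPair], iterations: int = 10) -> Table:
    """EM-style re-estimation of P(french_np | english_np).

    Starts from a uniform distribution; each pass shares the unit count of
    every French NP among the English NPs of the same sentence pair in
    proportion to the current probabilities, then renormalises per English NP.
    """
    french_vocab = {f for _, fs in corpus for f in fs}
    uniform = 1.0 / max(len(french_vocab), 1)
    prob: Table = defaultdict(lambda: uniform)

    for _ in range(iterations):
        counts: Table = defaultdict(float)
        totals: Dict[str, float] = defaultdict(float)
        for english, french in corpus:
            if not english:
                continue  # no English NPs to attribute these French NPs to
            for f in french:
                # E-step: expected share of f attributed to each English NP.
                norm = sum(prob[(e, f)] for e in english)
                for e in english:
                    share = prob[(e, f)] / norm
                    counts[(e, f)] += share
                    totals[e] += share
        # M-step: turn expected counts into conditional probabilities.
        prob = defaultdict(float, {(e, f): c / totals[e] for (e, f), c in counts.items()})
    return dict(prob)


def best_match(prob: Table, english_np: str) -> str:
    """French NP with the highest estimated P(f | english_np)."""
    candidates = {f: p for (e, f), p in prob.items() if e == english_np}
    return max(candidates, key=candidates.get)


# Toy usage: the first pair ties "the house" to "la maison", which lets
# re-estimation resolve the ambiguous two-phrase second pair.
corpus = [
    (["the house"], ["la maison"]),
    (["the house", "the car"], ["la maison", "la voiture"]),
]
probs = reestimate(corpus)
print(best_match(probs, "the house"), "/", best_match(probs, "the car"))
```

Starting from uniform probabilities keeps the sketch free of any prior about which phrases correspond; the unambiguous first pair is what gradually pulls the probability mass in the second pair toward the right pairing.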
    <Paragraph position="5"> In contrast to the other methods that have been mentioned, the algorithm can be extended in a straightforward way to enable correct correspondences to be made in circumstances where numerous low-frequency phrases are involved. This is an important consideration because in large text corpora roughly a third of the word types occur only once.</Paragraph>
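The proportion of word types that occur only once (hapax legomena) can be measured directly on any tokenized corpus. The sketch below is a hypothetical helper, hapax_type_ratio, not part of the described method; the ratio it reports depends strongly on corpus size and on the tokenization, which is assumed here to be simple whitespace splitting.

```python
from collections import Counter
from typing import Iterable


def hapax_type_ratio(tokens: Iterable[str]) -> float:
    """Fraction of distinct word types that occur exactly once in tokens."""
    counts = Counter(tokens)
    if not counts:
        return 0.0  # empty corpus: no types at all
    hapaxes = sum(1 for n in counts.values() if n == 1)
    return hapaxes / len(counts)


# Toy usage on a whitespace-tokenised string (tokenisation is an assumption).
print(hapax_type_ratio("the cat sat on the mat".split()))
```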
    <Paragraph position="6"> Several applications for bilingual correspondence information have been suggested. Correspondences can be used in bilingual concordances and for automatically constructing bilingual lexicons, and probabilistically quantified correspondences may be useful for statistical translation methods.</Paragraph>
  </Section>
</Paper>