<?xml version="1.0" standalone="yes"?>
<Paper uid="P95-1050">
  <Title>Identifying Word Translations in Non-Parallel Texts</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Approach
</SectionTitle>
    <Paragraph position="0"> It is assumed that there is a correlation between the co-occurrences of words which are translations of each other. If, for example, two words A and B co-occur in a text of one language more often than expected by chance, then in a text of another language the words that are translations of A and B should also co-occur more frequently than expected. This assumption is reasonable for parallel texts. However, in this paper it is further assumed that the co-occurrence patterns in original texts are not fundamentally different from those in translated texts.</Paragraph>
    <Paragraph position="1"> Starting from an English vocabulary of six words and the corresponding German translations, Tables 1a and 1b show an English and a German co-occurrence matrix. In these matrices, the entries belonging to pairs of words that co-occur in texts more frequently than expected have been marked with a dot. In general, the word order in the rows and the word order in the columns of a co-occurrence matrix are independent of each other, but for the purpose of this paper they can always be assumed to be equal without loss of generality.</Paragraph>
    <Paragraph position="2"> If the word order of the English matrix is now permuted until the resulting pattern of dots is most similar to that of the German matrix (see Table 1c), then this increases the likelihood that the English and German words are in corresponding order. Word n in the English matrix is then the translation of word n in the German matrix.</Paragraph>
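    <Paragraph position="3"> The permutation step described above can be illustrated with a minimal Python sketch. It is not the procedure used in the paper: the exhaustive search, the cell-by-cell agreement count used as the similarity measure, the function names, the word lists, and the dot pattern are all assumptions made for illustration. A 1 in a matrix stands for a dot, i.e. a pair of words that co-occurs more often than expected.

```python
from itertools import permutations


def reorder(matrix, order):
    """Apply the same word order to the rows and columns of a matrix."""
    return [[matrix[i][j] for j in order] for i in order]


def agreement(a, b):
    """Number of cells in which two equally sized 0/1 matrices agree
    (an assumed similarity measure, chosen for simplicity)."""
    return sum(x == y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def best_permutation(english, german):
    """Try every ordering of the English words and return the one whose
    dot pattern agrees best with the German matrix."""
    n = len(english)
    return max(permutations(range(n)),
               key=lambda order: agreement(reorder(english, order), german))


# Hypothetical six-word vocabularies; the dot pattern is invented.
german_words = ["Baum", "Pflanze", "blau", "Himmel", "Schule", "Lehrer"]
german = [[0] * 6 for _ in range(6)]
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (2, 5)]:
    german[i][j] = german[j][i] = 1

# English matrix: the same pattern with the word order scrambled.
# English word i is the translation of German word scramble[i].
scramble = [3, 0, 5, 2, 4, 1]
english_words = ["sky", "tree", "teacher", "blue", "school", "plant"]
english = reorder(german, scramble)

order = best_permutation(english, german)
# After reordering, English word order[n] sits opposite German word n.
pairs = [(english_words[e], german_words[g]) for g, e in enumerate(order)]
print(pairs)
```

With this toy input the search returns the order (1, 5, 3, 0, 4, 2), which pairs each English word with its German counterpart. Exhaustive search examines n! orderings, so it is only practical for very small vocabularies such as the six words used here.</Paragraph>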
  </Section>
</Paper>