File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/96/j96-4002_evalu.xml

Size: 3,422 bytes

Last Modified: 2025-10-06 14:00:22

<?xml version="1.0" standalone="yes"?>
<Paper uid="J96-4002">
  <Title>An Algorithm to Align Words for Historical Comparison</Title>
  <Section position="6" start_page="486" end_page="488" type="evalu">
    <SectionTitle>
5. Results on Actual Data
</SectionTitle>
    <Paragraph position="0"> Tables 3 to 10 show how the aligner performed on 82 cognate pairs in various languages. (Tables 5-8 are loosely based on the Swadesh word lists of Ringe 1992.)</Paragraph>
    <Paragraph position="2"> [Table residue, Spanish : French cognate pairs from Tables 3 and 4; segment-by-segment alignments not recoverable: nosotros : nous 'we'; quién : qui 'who?'; qué : quoi 'what?'; todos : tous 'all'; una : une 'one' (f.sg.); dos : deux 'two'; tres : trois 'three'; hombre : homme 'man'.]</Paragraph>
    <Paragraph position="3"> These are "difficult" language pairs. On closely similar languages, such as Spanish/Italian or German/Danish, the aligner would have performed much better. Even so, on Spanish and French (chosen because they are historically close but phonologically very different) the aligner performed almost flawlessly (Tables 3 and 4). Its only clear mistake is that it missed the l : r correspondence in arbre : árbol, but so would the linguist without other data.</Paragraph>
    <Paragraph position="4"> With English and German it did almost as well (Tables 5 and 6). The s in this is aligned with the wrong s in dieses because that alignment gave greater phonetic similarity; taking off the inflectional ending would have prevented this mistake. The alignments of mouth with Mund and eye with Auge gave the aligner some trouble; in each case it produced two alternatives, each getting part of the alignment right. English and Latin (Tables 7 and 8) are much harder to pair up, since they are separated by millennia of phonological and morphological change, including Grimm's Law. Nonetheless, the aligner did reasonably well with them, correctly aligning, for example, star with stēlla and round with rotundus. In some cases it was just plain wrong, e.g., aligning tooth with the -tis ending of dentis. In others it was indecisive; although it found the correct alignment of fish with piscis, it could not distinguish it from three alternatives. In all of these cases, eliminating the inflectional endings would have resulted in correct or nearly correct alignments.</Paragraph>
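    <Paragraph> The effect of an inflectional ending on the this : dieses pair can be illustrated with a toy aligner. The sketch below is only an illustration under stated assumptions: it uses a plain dynamic-programming alignment rather than the search procedure evaluated in this paper, the penalty values (0, 1, 3 and a skip cost of 1) are hypothetical, the transcriptions "ðis" and "dizəs" are simplified, and the helper names align and strip_ending are invented for this example.

```python
# Toy segment aligner: a sketch, NOT the algorithm evaluated in the paper.
# All penalty values below are hypothetical choices for illustration.

VOWELS = set("aeiouə")  # assumed vowel inventory for the toy transcriptions
SKIP = 1                # hypothetical cost of aligning a segment with nothing


def cost(a, b):
    """Hypothetical similarity penalty between two segments."""
    if a == b:
        return 0        # identical segments
    if (a in VOWELS) == (b in VOWELS):
        return 1        # vowel with vowel, or consonant with consonant
    return 3            # vowel paired with consonant


def align(x, y):
    """Return (total cost, list of aligned pairs) for one cheapest alignment."""
    n, m = len(x), len(y)
    # best[i][j] is the minimal cost of aligning x[i:] with y[j:].
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        best[i][m] = best[i + 1][m] + SKIP
    for j in range(m - 1, -1, -1):
        best[n][j] = best[n][j + 1] + SKIP
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            best[i][j] = min(
                cost(x[i], y[j]) + best[i + 1][j + 1],  # pair the segments
                SKIP + best[i + 1][j],                  # skip a segment of x
                SKIP + best[i][j + 1],                  # skip a segment of y
            )
    # Walk forward through the table, preferring a pairing when it is optimal.
    pairs = []
    i = j = 0
    while i != n or j != m:
        if i != n and j != m and best[i][j] == cost(x[i], y[j]) + best[i + 1][j + 1]:
            pairs.append((x[i], y[j]))
            i += 1
            j += 1
        elif i != n and best[i][j] == SKIP + best[i + 1][j]:
            pairs.append((x[i], "-"))
            i += 1
        else:
            pairs.append(("-", y[j]))
            j += 1
    return best[0][0], pairs


def strip_ending(word, ending):
    """Remove a known inflectional ending, if present (hypothetical helper)."""
    if ending and word.endswith(ending):
        return word[: len(word) - len(ending)]
    return word


if __name__ == "__main__":
    print(align("ðis", "dizəs"))                      # s is paired with the final s
    print(align("ðis", strip_ending("dizəs", "əs")))  # s is paired with z
```

With these assumed costs, the unstripped pair puts the s of this against the final s of dieses, because an exact match is cheaper than s with z; once the ending is removed, s with z is the only candidate left.</Paragraph>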
    <Paragraph position="5"> [Table residue, further Spanish : French cognate pairs; segment-by-segment alignments not recoverable: boca : bouche 'mouth'; pie : pied 'foot'; corazón : coeur 'heart'; ver : voir 'see'; venir : venir 'come'; decir : dire 'say'; pobre : pauvre 'poor'.]</Paragraph>
    <Paragraph position="6"> Table 9 shows that the algorithm works well with non-Indo-European languages, in this case Fox and Menomini cognates chosen more or less randomly from Bloomfield (1941). Apart from some minor trouble with the suffix of the first item, the aligner had smooth sailing.</Paragraph>
    <Paragraph position="7"> Finally, Table 10 shows how the aligner fared with some word pairs involving Latin, Greek, Sanskrit, and Avestan, again without knowledge of morphology. Because it knows nothing about place of articulation or Grimm's Law, it cannot tell whether the d in daughter corresponds with the th or the g in Greek thugatēr. But on centum : hekaton and centum : satəm the aligner performed perfectly.</Paragraph>
  </Section>
</Paper>