<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3208">
  <Title>Morphology Induction from Limited Noisy Data Using Approximate String Matching</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> For a language with limited resources, a dictionary may be one of the few electronic resources available. To make effective use of the dictionary for translation, however, users must be able to access it using the root form of a morphologically deformed variant found in the text. Stemming and data-driven methods are not suitable when data is sparse. We present algorithms for discovering morphemes from limited, noisy data obtained by scanning a hard-copy dictionary. Our approach is based on a novel application of the longest common substring and string edit distance metrics. Results show that these algorithms can segment words into roots and affixes, and extract the affixes, using only the limited data contained in a dictionary. This in turn allows non-native speakers to perform multilingual tasks in applications where responses must be rapid and their knowledge is limited. In addition, this analysis can feed other NLP tools that require lexicons.</Paragraph>
  </Section>
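  <!--
  Editorial illustration, not part of the paper: the abstract names two string
  metrics, longest common substring and string edit distance. The sketch below
  shows the textbook dynamic-programming form of each, plus a hypothetical
  helper (split_on_shared_stem) that treats the shared substring of two related
  word forms as a candidate root. All function names and the English example
  words are assumptions made for illustration. This is a minimal sketch and may
  differ from the algorithms the authors actually present.

```python
def longest_common_substring(a, b):
    """Return the longest contiguous substring shared by a and b."""
    best_len, best_end = 0, 0
    # prev[j] is the length of the common suffix of a[:i - 1] and b[:j].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def edit_distance(a, b):
    """Levenshtein distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def split_on_shared_stem(word, related):
    """Hypothetical helper: split word into (prefix, root, suffix),
    using the substring it shares with a related form as the root."""
    stem = longest_common_substring(word, related)
    if not stem:
        # No shared material: keep the whole word as the root.
        return "", word, ""
    start = word.find(stem)
    return word[:start], stem, word[start + len(stem):]
```

  Under these assumptions, longest_common_substring("walking", "walked")
  returns "walk", and split_on_shared_stem("unwalked", "walking") returns
  ("un", "walk", "ed"). A low edit distance between two dictionary entries
  could serve as a cue that they are variants of one root even when scanning
  noise has corrupted a character.
  -->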
</Paper>