<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1037">
  <Title>Memory-Based Morphological Analysis</Title>
  <Section position="2" start_page="0" end_page="285" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Morphological analysis is an essential component in language engineering applications ranging from spelling error correction to machine translation. Performing a full morphological analysis of a wordform is usually regarded as a segmentation of the word into morphemes, combined with an analysis of the interaction of these morphemes that determine the syntactic class of the wordform as a whole. The complexity of wordform morphology varies widely among the world's languages, but is regarded as quite high even in the relatively simple cases, such as English. Many wordforms in English and other western languages contain ambiguities in their morphological composition that can be quite intricate. General classes of linguistic knowledge that are usually assumed to play a role in this disambiguation process are knowledge of (i) the morphemes of a language, (ii) the morphotactics, i.e., constraints on how morphemes are allowed to attach, and (iii) spelling changes that can occur due to morpheme attachment.</Paragraph>
    <Paragraph position="1"> State-of-the-art systems for morphological analysis of wordforms are usually based on two-level finite-state transducers (FSTs; Koskenniemi, 1983). Even with the availability of sophisticated development tools, the cost and complexity of hand-crafting two-level rules is high, and the representation of concatenative compound morphology with continuation lexicons is difficult. As in parsing, there is a trade-off between coverage and spurious ambiguity in these systems: the more sophisticated the rules become, the more needless ambiguity they introduce. In this paper we present a learning approach which models morphological analysis (including compounding) of complex wordforms as sequences of classification tasks. Our model, MBMA (Memory-Based Morphological Analysis), is a memory-based learning system (Stanfill and Waltz, 1986; Daelemans et al., 1997). Memory-based learning is a class of inductive, supervised machine learning algorithms that learn by storing examples of a task in memory. Computational effort is invested on a &quot;call-by-need&quot; basis for solving new examples (henceforth called instances) of the same task. When new instances are presented to a memory-based learner, it searches for the best-matching instances in memory, according to a task-dependent similarity metric. When it has found the best matches (the nearest neighbors), it transfers their solution (classification, label) to the new instance. Memory-based learning has been shown to be quite adequate for various natural-language processing tasks such as stress assignment (Daelemans et al., 1994), grapheme-phoneme conversion (Daelemans and Van den Bosch, 1996; Van den Bosch, 1997), and part-of-speech tagging (Daelemans et al., 1996b).</Paragraph>
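The store-then-match behaviour of a memory-based learner described above can be illustrated with a minimal sketch. It is not the MBMA implementation: the class name MemoryBasedClassifier, the helper letter_windows, the plain overlap similarity, and the B/I boundary labels are all assumptions made for illustration, and the toy segmentations are invented.

```python
from collections import Counter


class MemoryBasedClassifier:
    """Toy nearest-neighbor classifier over fixed-length symbolic feature tuples.

    Hypothetical illustration only; not the MBMA system.
    """

    def __init__(self, k=1):
        self.k = k
        self.memory = []  # stored (features, label) pairs

    def train(self, instances):
        # "Learning" consists of storing the examples in memory.
        self.memory.extend(instances)

    @staticmethod
    def overlap(a, b):
        # Assumed similarity metric: count of positions with identical values.
        return sum(1 for x, y in zip(a, b) if x == y)

    def classify(self, features):
        # Effort is spent only at query time ("call-by-need"):
        # rank stored examples by similarity and let the k best vote.
        # Assumes memory is non-empty.
        ranked = sorted(
            self.memory,
            key=lambda ex: self.overlap(ex[0], features),
            reverse=True,
        )
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


def letter_windows(word, width=1, pad="_"):
    """Turn a word into one instance per letter: the letter plus its context."""
    padded = pad * width + word + pad * width
    return [tuple(padded[i : i + 2 * width + 1]) for i in range(len(word))]


# Invented toy data: "B" marks a letter that starts a morpheme, "I" any other.
training_words = {"walked": "BIIIBI", "talks": "BIIIB"}

clf = MemoryBasedClassifier(k=1)
for word, labels in training_words.items():
    clf.train(list(zip(letter_windows(word), labels)))

# Analysing a new wordform becomes a sequence of per-letter classifications.
predicted = "".join(clf.classify(w) for w in letter_windows("talked"))
print(predicted)
```

Each letter of the unseen word is classified independently from its local context, which is one simple way to read "sequences of classification tasks"; a real system would use a wider context window and a weighted similarity metric.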
    <Paragraph position="2"> The paper is structured as follows. First, we give a brief overview of Dutch morphology in Section 2. We then turn to a description of MBMA in Section 3. In Section 4 we present the experimental outcomes of our study with MBMA. Section 5 summarizes our findings, reports briefly on a partial study of English showing that the approach is applicable to other languages, and lists our conclusions.</Paragraph>
  </Section>
</Paper>