<?xml version="1.0" standalone="yes"?>
<Paper uid="E93-1035">
  <Title>On Abstract Finite-State Morphology</Title>
  <Section position="2" start_page="0" end_page="297" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Finite-state approaches to morphology analyze surface forms by means of a finite-state transducer, which in turn mimics an ordered set of rewrite rules. Instead of introducing intermediate forms (as happens when rewrite rules are used, e.g. [Narayanan and Mehdi, 1991] for Arabic morphology), the finite-state transducer works on two tapes (one representing lexical structure, the other surface structure) and switches states if the symbols currently being scanned on the two tapes match the conditions of the state transition. Following the distinction expressed by Kay [1987], two-level morphology is a specialization of finite-state morphology in that intermediate forms are not required even in the grammatical formalism (e.g. [Koskenniemi, 1983; Koskenniemi, 1984]). The only representations required are those for the lexical and surface forms, together with ways of mapping directly between the one and the other. Surface forms express the result of any spelling-change interactions between dictionary/lexicon primitives. A typical architecture of a two-level morphological system [Karttunen, 1983; Kataja and Koskenniemi, 1988] consists of a dictionary/lexicon component containing roots, stems, affixes and their co-occurrence restrictions, and an automaton component which encodes the mappings between dictionary/lexicon forms and surface realizations.
One of the problems faced by two-level approaches was their handling of nonconcatenative morphology. The main difference between Semitic and non-Semitic languages is that Semitic inflectional patterns are not straightforwardly concatenative (where morphemes are simply concatenated with roots, stems and each other) but 'interdigitate' or 'intercalate', i.e. the affix pattern is distributed among the constituents of the root morpheme.
For example, the Arabic root 'd_r_s' ('study') intercalates with the inflectional pattern '_u_i_' (perfect passive) to form the stem 'duris' ('was studied'), which in turn can be inflected to signify number and gender (see footnote 1). This nonconcatenative aspect of Arabic can be problematic for a traditional two-level approach which bypasses intermediate forms.</Paragraph>
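    <Paragraph> The intercalation of 'd_r_s' with '_u_i_' can be illustrated with a minimal Python sketch. The function name intercalate and the slot-aligned encoding, in which root and pattern have the same length and '_' marks a position filled by the other string, are assumptions made for illustration only; they are not taken from any of the systems cited.

```python
def intercalate(root: str, pattern: str) -> str:
    """Merge a root such as 'd_r_s' with a pattern such as '_u_i_'.

    Hypothetical encoding: both strings have the same length, and '_'
    marks a slot that the other string is expected to fill.
    """
    if len(root) != len(pattern):
        raise ValueError("root and pattern must have the same length")
    out = []
    for r, p in zip(root, pattern):
        if r == "_" and p == "_":
            continue  # neither string fills this slot
        if r != "_" and p != "_":
            raise ValueError("root and pattern both fill the same slot")
        out.append(r if r != "_" else p)
    return "".join(out)


if __name__ == "__main__":
    print(intercalate("d_r_s", "_u_i_"))  # duris
```

The sketch only builds a stem from a known root and pattern. Analysis runs in the opposite direction, which is where the difficulty for a two-level approach arises.</Paragraph>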
    <Paragraph position="1"> The problem concerns the way roots, stems (roots for Arabic verbs, stems for Arabic nouns) and inflection patterns are represented and stored. It is obviously not practical to store all the possible inflected forms of each root. (Footnote 1: Modern written Arabic rarely marks the short vowels, which are written as diacritics, in this case the 'u' and 'i' in 'duris', except in beginners' books on Arabic. The written (text) realization therefore has the form 'drs'.)</Paragraph>
    <Paragraph position="2"> Instead, roots are usually separated from inflections. Morphological analysis of a string then consists of identifying the root and following pointers to inflections which may themselves contain pointers to other inflections [Karttunen, 1983]. The nonconcatenative aspect of Arabic means that, when processing a 'word' from beginning to end, different constituents of different inflections are encountered during root and inflection identification. The traditional idea of identifying a root and then following a pointer to types of inflection depending on immediately contiguous constituents of the inflection cannot be adopted. This forced the ALPNET researchers, for example, to adopt a novel way of storing and identifying inflections [Beesley et al., 1989; Beesley and Newton, 1989; Beesley, 1990]. In their system there are two types of lexicon: the root lexicon and the pattern lexicon. The root lexicon stores (three-consonant) roots in the form 'X_Y_Z', and the pattern lexicon stores inflectional patterns in the form '_A_B_', where the underscores '_' are called detours.</Paragraph>
    <Paragraph position="3"> Starting with the pattern lexicon, the analysis routines recursively switch between the two types of lexicon whenever a detour character is found.</Paragraph>
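    <Paragraph> The switching between the two lexicons can be sketched as follows. This is a toy reconstruction from the description above, not the ALPNET implementation: the names ROOT_LEXICON, PATTERN_LEXICON, walk and analyse are hypothetical, the entries other than 'd_r_s' and '_u_i_' are added purely for illustration, and every root is tried against every pattern by brute force, whereas a real system would presumably use a more efficient lexicon lookup.

```python
DETOUR = "_"

# Hypothetical toy lexicons; only 'd_r_s' and '_u_i_' come from the text.
ROOT_LEXICON = {"d_r_s": "study", "k_t_b": "write"}
PATTERN_LEXICON = {"_u_i_": "perfect passive", "_a_a_": "perfect active"}


def walk(surface, s, cur, i, other, j):
    """True if surface[s:] is consumed by reading cur[i:] and, at each
    detour character, switching to the other entry at position j."""
    if i == len(cur):
        # Current entry exhausted: succeed only if everything else is too.
        return s == len(surface) and j == len(other)
    if cur[i] == DETOUR:
        # Detour: resume in the other entry, and come back after the '_'.
        return walk(surface, s, other, j, cur, i + 1)
    if s != len(surface) and cur[i] == surface[s]:
        return walk(surface, s + 1, cur, i + 1, other, j)
    return False


def analyse(surface):
    """Return (root, root gloss, pattern, pattern gloss) for each match,
    always starting the walk in the pattern lexicon."""
    return [
        (root, r_gloss, pattern, p_gloss)
        for pattern, p_gloss in PATTERN_LEXICON.items()
        for root, r_gloss in ROOT_LEXICON.items()
        if walk(surface, 0, pattern, 0, root, 0)
    ]


if __name__ == "__main__":
    print(analyse("duris"))
```

Under these assumptions, analysing 'duris' starts in the pattern entry '_u_i_', detours immediately into the root entry to read 'd', returns to the pattern for 'u', and so on, so that each inflectional pattern remains a single lexical entry.</Paragraph>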
    <Paragraph position="4"> This interesting solution raises the question of what aspect of morphology detouring is meant to reflect or express. If detouring is based simply on implementation and efficiency criteria, it is open to the possible criticism that an alternative, efficient way of handling intercalation which expresses some linguistic generalities whilst being consistent with the two-level approach should be preferred. Also, it is not clear what the implications of detouring are for parallel evaluation. However, one possible advantage is that detouring forces inflectional patterns to be kept together in the dictionary, rather than splitting them up into even smaller fragments, as might be required by a simple two-level approach. For instance, without detouring, patterns of the form '_A_B_' may need to be split up into lexical entries first for the 'A' and then, at a different level, for 'B'. The fact that 'A' and 'B' together represent a certain class of morphological phenomena might be lost.</Paragraph>
  </Section>
</Paper>