<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-2211">
  <Title>A Knowledge Acquisition and Management System for Morphological Dictionaries</Title>
  <Section position="3" start_page="0" end_page="1284" type="metho">
    <SectionTitle>
2. Finite-State Morphology
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="1284" type="sub_section">
      <SectionTitle>
Systems
</SectionTitle>
      <Paragraph position="0"> One of the most widespread approaches to morphological dictionaries is finite-state morphology. Currently, the most attractive architecture is the one shown in Figure 1 (Karttunen et al. (1992)). Compared to the original systems of Koskenniemi (1983) and Karttunen (1983), the major improvements made during the past ten years were in the compilation of the finite-state transition tables and in the switch from analyzers to transducers. Because finite-state technology has been pushed to its limits, the resulting finite-state machines are extremely fast. If we consider current finite-state morphology systems as a potential solution to the problem sketched in section 1, i.e. the acquisition and maintenance of morphological and lexical knowledge, they have a number of shortcomings. First, knowledge acquisition is not supported very well: the formative lexicon and the string alternation rules must be edited with text editors. Hence, it is very difficult for the expert to conceive the lexicon's structure and the string alternation rules. In particular, the system provides little support for visualizing the structure of the specified entities and the interaction between them. Secondly, the knowledge is represented specifically for the purpose of mapping from text words to their analysis and vice versa. Flexibility in this respect is lacking. Thirdly, the mapping problem as defined above is only partially covered: neither multi-word units nor compounding and prefixation can be handled appropriately with the basic systems.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="1284" end_page="1287" type="metho">
    <SectionTitle>
3. Word Manager
</SectionTitle>
    <Paragraph position="0"> The three drawbacks of finite-state morphology mentioned have served as a starting point in the development of Word Manager (WM). The resulting properties of WM are presented in the three following subsections.</Paragraph>
    <Section position="1" start_page="1284" end_page="1285" type="sub_section">
      <SectionTitle>
3.1. Coverage
</SectionTitle>
      <Paragraph position="0"> A strictly finite-state mechanism has a number of problems in covering natural language morphology, as has been recognized earlier (e.g. Kay (1987)).</Paragraph>
      <Paragraph position="1"> In order to treat prefixation and compounding in a way parallel to suffixation, the rules for combining formatives in WM are context-free. Another difference with two-level morphology is the basic distinction between inflection and wordformation in WM. Inflection is treated as the paradigmatic realization of certain features on a lexeme, whereas wordformation is the application of a rule to a lexeme, resulting in a new lexeme. This type of distinction is linguistically motivated and pragmatically elaborated by ten Hacken (1993).</Paragraph>
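The distinction between inflection and wordformation can be illustrated with a minimal Python sketch. It is not the WM formalism: the `Lexeme` class, the toy noun paradigm, and the functions `inflect`, `derive_agent_noun`, and `compound` are assumptions introduced here purely for illustration.

```python
from dataclasses import dataclass


# Hypothetical toy model; none of these names come from WM.
@dataclass(frozen=True)
class Lexeme:
    stem: str
    category: str  # e.g. "N" or "V"


# Inflection: realize feature bundles on ONE lexeme (a paradigm).
NOUN_PARADIGM = {("sg",): "", ("pl",): "s"}


def inflect(lexeme: Lexeme) -> dict:
    """Return the paradigm of a noun lexeme as {features: word form}."""
    if lexeme.category != "N":
        raise ValueError("the toy paradigm covers nouns only")
    return {feats: lexeme.stem + suffix for feats, suffix in NOUN_PARADIGM.items()}


# Wordformation: apply a rule to a lexeme, yielding a NEW lexeme.
def derive_agent_noun(verb: Lexeme) -> Lexeme:
    """Toy suffixation rule: V + 'er' gives N."""
    if verb.category != "V":
        raise ValueError("the rule applies to verbs only")
    return Lexeme(verb.stem + "er", "N")


def compound(left: Lexeme, right: Lexeme) -> Lexeme:
    """Toy compounding rule: N + N gives N, with the right member as head."""
    return Lexeme(left.stem + right.stem, right.category)
```

In this sketch, `inflect` never creates a lexeme, while both wordformation rules return a new `Lexeme` that can itself be inflected or serve as input to further rules.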
      <Paragraph position="2"> The rules for string alternation in WM are similar in function to two-level rules, but two additional ways are offered to restrict their domain of application. First, whereas two-level rules can only see and change strings, WM string alternation rules can also see (but not change) features. Second, they can be defined for individual classes of lexemes or for individual entries, which allows the treatment of exceptions. The entire formalism of this part of WM is described in Domenig &amp; ten Hacken (1992).</Paragraph>
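The two kinds of restriction can be sketched as follows. This is a hedged illustration, not WM syntax: the class `AlternationRule`, its field names, and the lexeme class label "N-y" are hypothetical. The rule rewrites only the end of a stem, reads the feature set without modifying it, and may be limited to a lexeme class or to a single entry.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical rule record; field names are assumptions, not WM notation.
@dataclass(frozen=True)
class AlternationRule:
    pattern: str                                 # stem ending to be rewritten
    replacement: str                             # string that replaces it
    required_features: frozenset = frozenset()   # features the rule inspects
    lexeme_class: Optional[str] = None           # restrict to a class of lexemes
    entry: Optional[str] = None                  # restrict to one entry

    def applies(self, stem, features, lexeme_class, entry):
        """Check string context, features, and domain restrictions."""
        return (
            stem.endswith(self.pattern)
            and self.required_features.issubset(features)
            and self.lexeme_class in (None, lexeme_class)
            and self.entry in (None, entry)
        )

    def apply(self, stem, features, lexeme_class, entry):
        """Return the rewritten stem; the features are only read."""
        if not self.applies(stem, features, lexeme_class, entry):
            return stem
        return stem[: len(stem) - len(self.pattern)] + self.replacement


# A class-restricted rule and an entry-restricted exception.
y_to_ie = AlternationRule("y", "ie", frozenset({"pl"}), lexeme_class="N-y")
child_pl = AlternationRule("", "ren", frozenset({"pl"}), entry="child")
```

Under these assumptions, `y_to_ie` turns "city" into "citie" only for plural forms of lexemes in class "N-y", and `child_pl` affects the single entry "child".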
      <Paragraph position="3"> The subsystem Phrase Manager, described in Pedrazzini (1994), covers all cases where the mapping between text words and lexemes is not one-to-one.</Paragraph>
      <Paragraph position="4"> This includes (graphic) clitics and multi-word units. The clitics mechanism may split up text words and rearrange the parts before further analysis. Multi-word units are recognized and assigned a structure on the basis of the string, and treated as possible analyses alongside the literal ones.</Paragraph>
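A rough sketch of these two mechanisms is given below. The tables `CLITIC_SPLITS` and `MULTI_WORD_UNITS` and both functions are hypothetical and far simpler than the Phrase Manager described in Pedrazzini (1994); in particular, the rearrangement of clitic parts and the assignment of internal structure to multi-word units are omitted.

```python
# Hypothetical toy data; the real rule formats are not shown in this paper.
CLITIC_SPLITS = {"cannot": ["can", "not"]}
MULTI_WORD_UNITS = {("in", "spite", "of"): "in_spite_of"}


def split_clitics(words):
    """Split text words that contain more than one lexeme."""
    result = []
    for word in words:
        result.extend(CLITIC_SPLITS.get(word, [word]))
    return result


def analyses(words):
    """Return the literal analysis plus one analysis per recognized unit."""
    words = split_clitics(words)
    results = [list(words)]  # the literal reading is always kept
    for unit, lexeme in MULTI_WORD_UNITS.items():
        n = len(unit)
        for i in range(len(words) - n + 1):
            if tuple(words[i:i + n]) == unit:
                results.append(words[:i] + [lexeme] + words[i + n:])
    return results
```

The point of the sketch is that a recognized multi-word unit does not replace the literal reading: both are returned as candidate analyses.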
      <Paragraph position="5"> Together, the mechanisms included in WM cover the entire mapping problem.</Paragraph>
    </Section>
    <Section position="2" start_page="1285" end_page="1286" type="sub_section">
      <SectionTitle>
3.2. Compilation
</SectionTitle>
      <Paragraph position="0"> The WM formalism can be compiled in a variety of ways. As opposed to compilation of current finite-state systems, the output is not restricted to an analyzer or transducer.</Paragraph>
      <Paragraph position="1"> The basic compilation converts rules and entries into a network structure recording links between rules, formatives, and lexeme entries. On the basis of this structure, WM can offer dynamic knowledge aggregation options, to be used for browsing, debugging (rule application tracing), etc. Another use of this structure is in the access to the data by client applications, where any type of view on the data is supported.</Paragraph>
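One way to picture such a network is as an undirected graph whose nodes are typed as rules, formatives, or lexeme entries. The sketch below is an assumption about the general shape only: the class `Network`, its methods, and the node labels are invented for illustration and say nothing about the actual WM data structures.

```python
from collections import defaultdict


class Network:
    """Toy link structure; nodes are (kind, name) pairs."""

    def __init__(self):
        self._links = defaultdict(set)

    def link(self, a, b):
        """Record a symmetric link between two nodes."""
        self._links[a].add(b)
        self._links[b].add(a)

    def neighbours(self, node, kind=None):
        """Return linked nodes, optionally filtered by kind (a 'view')."""
        linked = self._links.get(node, set())
        return sorted(n for n in linked if kind is None or n[0] == kind)


net = Network()
net.link(("rule", "N-plural-s"), ("formative", "s"))
net.link(("rule", "N-plural-s"), ("lexeme", "teacher"))
net.link(("rule", "V-to-N-er"), ("lexeme", "teacher"))
```

Browsing then amounts to following links from any node, for example asking which rules are involved in the entry "teacher" or which entries a given rule has been applied to.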
      <Paragraph position="2"> By compiling the wordformation rules into a unification-based grammar, WM can analyze unknown words, and, drawing from the information gathered in the parse tree, generate their lexeme entries (including all inflectional forms of the new entry). Thus, the word-formation rules can be used for semi-automatic construction of the dictionary. In principle, there is no limit to the types of different formats and rules which can be generated out of the basic network structure. It is straightforward, for example, to generate the input for Koskenniemi's two-level system (Koskenniemi (1983)) or Karttunen's lexical transducer system (Karttunen et al. (1992)).</Paragraph>
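The idea of analyzing an unknown word and deriving an entry from its parse tree can be sketched as follows. A plain recursive parser over two hard-coded toy rules stands in here for the unification-based grammar, which the paper does not spell out; the lexicon `KNOWN`, the two rules, and the functions `parse` and `propose_entries` are all hypothetical.

```python
# Hypothetical toy lexicon: known stems and their categories.
KNOWN = {"school": "N", "teach": "V", "book": "N"}


def parse(word):
    """Return all parse trees for word as (category, structure) pairs."""
    trees = []
    if word in KNOWN:
        trees.append((KNOWN[word], word))
    # Toy suffixation rule: V + "er" gives N.
    if word.endswith("er") and len(word) > 2:
        for category, tree in parse(word[:-2]):
            if category == "V":
                trees.append(("N", (tree, "+er")))
    # Toy compounding rule: N + N gives N.
    for i in range(1, len(word)):
        for left_cat, left_tree in parse(word[:i]):
            if left_cat != "N":
                continue
            for right_cat, right_tree in parse(word[i:]):
                if right_cat == "N":
                    trees.append(("N", (left_tree, right_tree)))
    return trees


def propose_entries(word):
    """Build one candidate entry per parse, including its inflected forms."""
    entries = []
    for category, tree in parse(word):
        forms = [word, word + "s"] if category == "N" else [word]
        entries.append(
            {"lexeme": word, "category": category, "structure": tree, "forms": forms}
        )
    return entries
```

The entries returned are proposals only, which matches the semi-automatic character of the process: a lexicographer would still have to confirm or reject each one.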
    </Section>
    <Section position="3" start_page="1286" end_page="1287" type="sub_section">
      <SectionTitle>
3.3. Knowledge Acquisition
</SectionTitle>
      <Paragraph position="0"> In the construction of a WM dictionary, linguistic and lexicographic knowledge are separated. As opposed to coding in finite-state morphology, it is not a single person who has to keep track of all the rules, know all special symbols and write all the entries. Instead, tailor-made interfaces for linguistic and lexicographic experts have been developed.</Paragraph>
      <Paragraph position="1"> The linguist's interface supports the description of the morphological rule system of a language in the WM-formalism. Besides formulating the rules themselves, the linguist has to give at least one example per rule, and to describe all exceptions.</Paragraph>
      <Paragraph position="2"> The compilation step checks syntactic correctness and signals a number of semantic errors or probable errors. The compiled database can be inspected to see if the rules yield the desired results. The interface offers an incremental compilation facility, so that it is possible to get an inspectable version of the database without entirely recompiling it.</Paragraph>
      <Paragraph position="3"> This makes a gradual expansion of the rule system with frequent testing of intermediate results feasible. Thus, the way compilation has been implemented in WM contributes to a higher degree of consistency.</Paragraph>
      <Paragraph position="4"> In the lexicographer's interface, the rule-base coded by the linguist is presupposed and two tasks are supported: * New formatives (stems) can be specified and linked to existing regular inflection rules.</Paragraph>
      <Paragraph position="5"> * Existing formatives can be combined in novel ways, according to existing rules for wordformation and multi-word units.</Paragraph>
      <Paragraph position="6"> Obviously, the interface can show the consequences of any lexicographic decision for the forms generated by the rules concerned. More active support is also provided, so that the system proposes an analysis interactively. For the first task, an inflection rule is proposed on the basis of forms given by the lexicographer. In the second task, the support consists of parsing the input on the basis of the rules in the database. The lexicographer can choose and inspect one of the proposals, confirm it as it is, or go back and choose a different proposal.</Paragraph>
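The support for the first task can be pictured as a search for inflection rules that are consistent with the forms supplied. The sketch below is only an assumption about how such a proposal step might work: the rule table `INFLECTION_RULES`, the rule names, and the function `propose_rules` are hypothetical.

```python
# Hypothetical inflection rules: rule name mapped to {feature: suffix}.
INFLECTION_RULES = {
    "N-plural-s": {"sg": "", "pl": "s"},
    "N-plural-es": {"sg": "", "pl": "es"},
}


def propose_rules(stem, given_forms):
    """Return the rules whose generated forms agree with all given forms."""
    proposals = []
    for name, paradigm in INFLECTION_RULES.items():
        generated = {feat: stem + suffix for feat, suffix in paradigm.items()}
        if all(generated.get(feat) == form for feat, form in given_forms.items()):
            proposals.append(name)
    return proposals
```

With a single uninformative form the sketch returns several candidates, and with an irregular form it returns none, so the lexicographer would still choose among the proposals or fall back on describing an exception.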
      <Paragraph position="7"> The resulting entry is incrementally added to the lexical database.</Paragraph>
      <Paragraph position="8"> The lexicographer's interface helps the lexicographer concentrate on lexicographic decisions, rather than problems of encoding them. Since wordformation accounts for a major portion of a language's vocabulary, and this part of a WM-database is constructed by rule application, a high degree of consistency is guaranteed.</Paragraph>
      <Paragraph position="9"> 4. Present State of the Project WM has been fully implemented as a client/server system, where the server runs on either Sun workstations or Macintosh computers; the client currently runs on Macintosh only. The implementation includes Phrase Manager, the compiler, and the linguist's and lexicographer's interfaces as described above. A comprehensively documented version has been published on an ftp server (ftp@ifi.unizh.ch).</Paragraph>
      <Paragraph position="10"> A full rule database, including morphological and phrasal rules, has been implemented for German, and complete morphological rule databases for Italian (described by Bopp (1993)) and English.</Paragraph>
      <Paragraph position="11"> The development of these databases has shown that the expressive power of the WM-formalism is sufficiently rich for at least some of the languages most frequently used in NLP-systems currently under development.</Paragraph>
    </Section>
  </Section>
</Paper>