<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1034">
  <Title>Balázs Kis</Title>
  <Section position="1" start_page="0" end_page="261" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> This paper introduces a new approach to morpho-syntactic analysis through Humor 99 (High-speed Unification Morphology), a reversible, unification-based morphological analyzer that has already been integrated into a variety of industrial applications. Humor 99 copes effectively with the problems of agglutinative languages (e.g. Hungarian, Turkish, Estonian) and other (highly) inflectional languages (e.g. Polish, Czech, German). The authors conclude the paper by arguing that the approach used in Humor 99 is general enough to be well suited to a wide range of languages, and can serve as a basis for higher-level linguistic operations such as shallow parsing.</Paragraph>
    <Paragraph position="1"> Introduction There are several linguistic phenomena that can be processed by morphological tools in agglutinative and other highly inflectional languages, while processing the same features requires syntactic parsers in other languages such as English. This paper provides a brief description of Humor 99, first presenting the general theoretical background of the system.</Paragraph>
    <Paragraph position="2"> This is followed by examples of the most recent applications (in addition to those listed earlier), where the authors argue that the approach used in Humor 99 is general enough to be well suited to a wide range of languages, and can serve as a basis for higher-level linguistic operations such as shallow or even full parsing.</Paragraph>
    <Paragraph position="3"> 1 Affix arrays rather than affixes Segmentation of a word-form in Humor 99 is based on surface patterns, that is, typical sequences of separate suffix morphemes are analyzed as a whole. For example, the English nominal ending string ers' (NtoV+PL+POSS) is a complex affix handled as an atomic string in Humor 99 1.</Paragraph>
    <Paragraph position="4"> The string ers' is generated from er+s+'s in an earlier development phase by a dedicated utility. The generator is able to make a finite set of affix sequences from an (even recursive) description 2. Running this utility can be considered the learning phase of the algorithm. The resulting suffix combinations are stored in a compressed internal lexicon structure that guarantees very fast searching 3. The entire algorithm shows features similar to the hypothesis according to which most segments of word-forms in agglutinative languages are handled as "Gestalts" by native speakers, instead of parsing them on-line. 4 This idea is not new in the literature: according to Bybee, "a psycholinguistic argument for treating (some) ending sequences as wholes comes from the observation that children acquiring inflectional languages seldom make errors involving the order of morphemes in a word." (Bybee 1985) Another source is Karlsson: "The endings and entries are often listed as wholes, especially in close-knit combinations. 5 Such combinations are often subject to bi-directional dependencies that are hard to capture otherwise" (Karlsson 1986).</Paragraph>
    <Paragraph position="5"> 1 We use mainly English examples in spite of the fact that English morphology is simpler than the morphologies of agglutinative and highly inflectional languages.</Paragraph>
    <Paragraph position="6"> 2 Depth of the recursive process can be given as a parameter. The method is similar to the one of Goldberg &amp; Kálmán (1992) used in the BUG system: the description is theoretically infinite, but there is a finite performance limit when running.</Paragraph>
    <Paragraph position="7"> 3 The idea has something in common with the PC-Kimmo based analyzer of the University of Pennsylvania (Karp et al. 1992). Our compression ratio is around 20%.</Paragraph>
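    <Paragraph position="8"> The generation of affix arrays described in Section 1 (a finite set of surface strings such as ers' produced from a possibly recursive description, with the recursion depth given as a parameter) can be illustrated with a minimal sketch. This is not the Humor 99 utility itself: the description format, the names TOY_DESCRIPTION, join_surface, expand and generate_affix_arrays, and the single orthographic rule are all assumptions made for illustration only.

```python
# Hypothetical toy description (not Humor 99's real format): each key is a
# rule name, each alternative is a list of rule names or literal morphemes.
TOY_DESCRIPTION = {
    "ENDING": [["er", "INFL"]],
    "INFL": [[], ["s"], ["'s"], ["s", "'s"]],
}


def join_surface(morphemes):
    """Concatenate morphemes with one toy orthographic rule: s + 's gives s'."""
    surface = ""
    for morpheme in morphemes:
        if morpheme == "'s" and surface.endswith("s"):
            surface += "'"
        else:
            surface += morpheme
    return surface


def expand(description, symbol, depth):
    """Return every morpheme tuple derivable from symbol within the depth limit."""
    if symbol not in description:
        return {(symbol,)}  # a literal morpheme expands to itself
    if depth == 0:
        return set()  # depth limit reached: cut the recursion off here
    results = set()
    for alternative in description[symbol]:
        partial = {()}
        for part in alternative:
            expansions = expand(description, part, depth - 1)
            partial = {left + right for left in partial for right in expansions}
        results |= partial
    return results


def generate_affix_arrays(description, start, max_depth):
    """Map each generated surface string to the morpheme sequence behind it."""
    sequences = expand(description, start, max_depth)
    return {join_surface(seq): seq for seq in sequences}


arrays = generate_affix_arrays(TOY_DESCRIPTION, "ENDING", 2)
for surface in sorted(arrays):
    print(surface, "+".join(arrays[surface]))
```

    Because the depth parameter bounds the expansion, even a self-referring rule yields a finite set; the resulting surface strings could then be stored and looked up as atomic units, which is the role the text assigns to the compressed lexicon.</Paragraph>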
  </Section>
</Paper>