<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-2206">
  <Title>Incremental Construction of a Lexical Transducer for Korean</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> This paper presents a method for the incremental construction of a lexical transducer (LT) for Korean.</Paragraph>
    <Paragraph position="1"> A lexical transducer, first described by Karttunen, Kaplan, and Zaenen (KKZ) (Karttunen, 1992a), is a specialized finite-state transducer (FST) that maps canonical citation forms of words and morphological categories to inflected surface forms, and vice versa.</Paragraph>
    <Paragraph position="2"> LTs have many advantages for stemming, morphological analysis, and generation. They are (i) bidirectional: the same structure can be used for stemming and generation.</Paragraph>
    <Paragraph position="3"> (ii) efficient: the recognition and generation of word forms does not require the application of any morphological rules at runtime.</Paragraph>
    <Paragraph position="4"> LTs for English and French have been built at Xerox PARC within a framework known as two-level morphology (Koskenniemi, 1983). As described by KKZ (Karttunen, 1992a), this can be done in three steps: (i) We construct a simple finite-state automaton (LA) that defines all valid lexical forms (LFs) of the language. An LF is a concatenation of stems and morphemes in their canonical dictionary representation. (ii) We describe morphological alternations by means of two-level rules (Koskenniemi, 1983; Karttunen, 1993), compile the rules to finite-state transducers, and intersect them to form a single rule transducer (RT). (Footnote 1: This paper was partially supported by the Korean Science and Engineering Foundation.)</Paragraph>
    <Paragraph position="5"> (iii) We merge the LA with the RT by composition, producing the LT that has on its lexical side every valid lexical form of the language and on the surface side the corresponding realization as determined by the morphological alternations of the language.</Paragraph>
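As a rough illustration of the three steps (not the construction used in this paper), the following Python sketch models the lexicon as a finite set of strings, each two-level rule as a predicate over aligned lexical:surface symbol pairs, rule intersection as the conjunction of those predicates, and composition as restricting the lexical side to the lexicon. The toy English data, the two simplified rules, and all names (LEXICON, feasible, e_insertion, y_replacement, build_lt, generate, analyze) are assumptions made for illustration; a real system would compile these to finite-state machines rather than enumerate candidates.

```python
from itertools import product

# Step (i): the lexicon automaton LA, modelled here as a finite set of
# lexical forms ("+" marks a morpheme boundary). Toy English data.
LEXICON = {"cat", "cat+s", "fox", "fox+s", "fly", "fly+s"}

def feasible(lex_sym):
    """Surface symbols a lexical symbol may pair with ("0" = empty)."""
    if lex_sym == "+":
        return ["0", "e"]      # boundary is deleted or realised as e
    if lex_sym == "y":
        return ["y", "i"]      # y may surface as i
    return [lex_sym]           # every other symbol maps to itself

# Step (ii): each rule is a predicate over the whole aligned pair sequence.
def e_insertion(pairs):
    """+:e exactly when the boundary follows x or y and precedes s."""
    for i, (lex, surf) in enumerate(pairs):
        if lex != "+":
            continue
        after = [l for l, _ in pairs[i + 1:i + 2]]
        in_context = i > 0 and pairs[i - 1][0] in "xy" and after == ["s"]
        if (surf == "e") != in_context:
            return False
    return True

def y_replacement(pairs):
    """y:i exactly when y is followed by the boundary and s."""
    for i, (lex, surf) in enumerate(pairs):
        if lex != "y":
            continue
        in_context = [l for l, _ in pairs[i + 1:i + 3]] == ["+", "s"]
        if (surf == "i") != in_context:
            return False
    return True

# Step (iii): "compose" lexicon and rules. Only lexical forms from the
# lexicon are considered, and an alignment survives only if every rule
# accepts it (the intersection of the rules).
def build_lt(lexicon, rules):
    lt = set()
    for lexical in lexicon:
        for surf_syms in product(*(feasible(c) for c in lexical)):
            pairs = list(zip(lexical, surf_syms))
            if all(rule(pairs) for rule in rules):
                surface = "".join(s for s in surf_syms if s != "0")
                lt.add((lexical, surface))
    return lt

LT = build_lt(LEXICON, [e_insertion, y_replacement])

def generate(lexical):
    """Lexical form to surface forms."""
    return sorted(s for l, s in LT if l == lexical)

def analyze(surface):
    """Surface form to lexical forms: the same relation, read backwards."""
    return sorted(l for l, s in LT if s == surface)

print(generate("fox+s"), analyze("flies"))
```

Because the result is a single relation between lexical and surface strings, the same structure serves both generation and analysis, and no rule is applied at lookup time.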
    <Paragraph position="6"> KKZ argued that for French, it was best to divide step (ii) into two stages. A three-level description was required to give a linguistically satisfactory account of the plural formation of compound nouns. KKZ opted for two cascading two-level rule systems that are compiled separately, then intersected laterally and finally composed to a single RT.</Paragraph>
    <Paragraph position="7"> The task of building a morphological analyzer for a language such as Korean or Japanese is a much greater challenge than it is for English and French. A Korean verb may have more than fifty thousand inflected forms. The Korean writing system (Hangul) does not consistently distinguish between single and compound nouns. Because Hangul uses syllabic characters, changes in syllable structure are directly reflected in the orthography.</Paragraph>
    <Paragraph position="8"> Because of the complexity of the morphological alternations in Korean, it is very difficult, although not impossible in principle, to describe them in a single two-level rule system or in a system that is limited to just three levels like the KKZ system for French. The most natural description of the Korean alternations is a cascade of rules of greater depth.</Paragraph>
  </Section>
</Paper>