<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-2206">
  <Title>Incremental Construction of a Lexical Transducer for Korean</Title>
  <Section position="4" start_page="0" end_page="1264" type="metho">
    <SectionTitle>
2 Morphological Alternations in Korean
</SectionTitle>
    <Paragraph position="0"> Hangul is a phonemic, syllable-based script in which morphological alternations that change the syllable structure of the word are reflected in the orthography (Korean Ministry of Education, 1988; Kim, 1990). This paper uses the so-called Yale system for representing Hangul in a Romanized form, except that we use wue and oa instead of the we and wa of the Yale system, because we and wa do not show that they are diphthongs, composed of wu and e and of o and a respectively. (Footnote 2: An "e-jel" (word), which is a spacing unit of Hangul, can consist of a verb stem, several endings and postpositions. The AI Lab of the Dept. of Computer Science, Pusan National Univ. has more than 50,000 "e-jel" generated from "mek-ta" (eat).)</Paragraph>
    <Paragraph position="1"> Examples (1) and (2) involve three morphological alternations: (i) the realization of a stem-final p in irregular predicates as a vowel in front of vowel-initial suffixes; (ii) left-to-right vowel harmony based on the partitioning of vowels into 'light' ([+light]: a, o, oa), 'dark' and 'neutral' ([-light]); (iii) the realization of a morpheme boundary as a syllable boundary or as nothing.</Paragraph>
    <Paragraph position="2"> A syllable boundary is introduced before the last consonant of irregular-p verbs/adjectives when a vowel-initial suffix follows, and the -p itself is realized as o if the preceding vowel is [+light], otherwise wu, by vowel harmony. Only some of the predicates ending in -p are irregular. In verbs that end in a vowel, such as cwu 'to give', the vowel may merge with a suffix-initial vowel to form a diphthong or it may retain its syllabic status in a two-vowel sequence.</Paragraph>
    <Paragraph position="3"> We use "+" in the lexical representation to mark morpheme boundaries, "-" to mark syllable boundaries, "0" to represent deletion (surface side) and epenthesis (lexical side), and two diacritic markers, {pVerb} for an irregular -p verb and {rVerb} for a regular verb, to represent classes of verbal stems.</Paragraph>
    <Paragraph position="4"> (1) (a) (b)</Paragraph>
    <Paragraph position="6"> Because cwup is an irregular-p verb, the following phoneme a/e is a vowel, and the preceding syllable wu is [-light], p in (1)(a) is realized as wu. The a/e is realized as e because the preceding surface vowel wu is [-light]. At the same time, wu and e are contracted into a diphthong wue, which is described as the deletion of "+" in (a) of (1). These two changes are linked in that one must not be allowed to happen without the other. Otherwise cwu-wu-e-se and cwu-wue-se would both be generated, but only cwu-wue-se is grammatical. On the other hand, in the case of the regular verb cwu, both cwu-e-se and the contracted variant cwue-se are acceptable.</Paragraph>
    <Paragraph position="7"> These rules can be described easily by two-level morphology as follows.</Paragraph>
    <Paragraph position="8"> (3) (i) A syllable boundary ("-") is introduced before a stem-final p in irregular -p verbs/adjectives when a vowel-initial suffix follows.</Paragraph>
    <Paragraph position="9"> (ii) A stem-final p in irregular -p verbs/adjectives is realized as o if the preceding vowel is [+light], otherwise wu.</Paragraph>
    <Paragraph position="10"> (iii) a/e is realized as a if the preceding vowel is [+light], otherwise e.</Paragraph>
    <Paragraph position="11"> (iv) (a) The morpheme boundary following irregular-p verbs/adjectives is deleted before</Paragraph>
    <Paragraph position="12"> a vowel-initial suffix and realized as a syllable boundary elsewhere.</Paragraph>
    <Paragraph position="13"> (b) The morpheme boundary in regular verbs/adjectives can be deleted or realized as a syllable boundary depending on context.</Paragraph>
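    <Paragraph> As a minimal sketch of how rules (ii) to (iv) interact, the Python fragment below applies them directly to romanized strings. It is not the two-level formalization itself: the function names, the way a stem is passed in as a body plus its last vowel, and the treatment of (iv)(b) as a free choice without regard to context are all assumptions made for illustration.
```python
# Illustrative sketch only; names and data layout are assumptions, not the paper's.
LIGHT_VOWELS = {"a", "o", "oa"}  # the [+light] vowels listed in section 2


def harmonize(preceding_vowel, light_form, dark_form):
    """Choose the light or the dark variant according to the preceding vowel."""
    return light_form if preceding_vowel in LIGHT_VOWELS else dark_form


def p_irregular_surface(stem_body, stem_vowel, rest):
    """Surface form of an irregular -p stem followed by the a/e ending.

    stem_body  -- the stem without its final p, e.g. "cwu" for cwup
    stem_vowel -- the last vowel of the stem, e.g. "wu"
    rest       -- whatever follows the a/e vowel, e.g. "-se"
    """
    p_vowel = harmonize(stem_vowel, "o", "wu")    # rule (ii)
    ending_vowel = harmonize(p_vowel, "a", "e")   # rule (iii), keyed to the surface vowel
    # Rules (i) and (iv)(a): a syllable boundary precedes the former p and the
    # morpheme boundary is deleted, so the two vowels end up in one syllable.
    return stem_body + "-" + p_vowel + ending_vowel + rest


def regular_surfaces(stem, stem_vowel, rest):
    """Rule (iv)(b), simplified: a regular stem allows both variants."""
    ending_vowel = harmonize(stem_vowel, "a", "e")
    return {stem + "-" + ending_vowel + rest, stem + ending_vowel + rest}


print(p_irregular_surface("cwu", "wu", "-se"))
print(sorted(regular_surfaces("cwu", "wu", "-se")))
```
    For cwup the sketch yields only the contracted form, while the regular stem cwu yields both the uncontracted and the contracted variant, which is the linkage between the two changes described above.</Paragraph>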
    <Paragraph position="14"> With the help of the Xerox two-level rule compiler ('twolc') (Karttunen, 1992b) the rules can be compiled to finite-state transducers and intersected to a single transducer. Describing such phenomena as parallel rules may be complicated because each rule may be a formulation of effects caused by several phonological rules. For example, in formalizing (ii) as a two-level rule we must take into account both the irregular conjugation of -p verbs/adjectives and vowel harmony. This is not a desirable state of affairs. We will come back to this point later.</Paragraph>
    <Paragraph position="15"> The first step in the construction of a lexical transducer is to create a simple finite-state automaton for all valid lexical forms of Korean. The lexical automaton (LA) is composed with the first set of rule transducers (RT). The resulting transducer has on its "upper" side the valid lexical forms, and on the "lower" side intermediate representations derived by the first set of rules. This intermediate transducer is composed with the second set of rule transducers and the process is iterated several times. At each stage in the process, the lexical side remains unchanged and the intermediate representations are changed by the new set of rules. The final result is a transducer that associates the valid lexical forms with their proper surface realizations. Conceptually this is similar to what happens in a traditional phonological derivation. However, note that rules apply to the lexicon as a whole rather than to individual words, and the result of each application is a new transducer. Because the intermediate levels disappear in the composition, the resulting lexical transducer is equally well suited for morphological analysis as it is for generation. The compilation and intersection of rule transducers was done with the twolc compiler; the construction of the LA and the compositions were carried out with the Xerox interactive finite-state calculus ('ifsm').</Paragraph>
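    <Paragraph> The cascade can be pictured with a toy model in Python, in which a transducer is a finite set of (upper, lower) string pairs and each rule set is a function from one string to a set of strings. This is only a sketch under those assumptions: it uses neither twolc nor ifsm, and the symbol A for the harmonizing vowel, the two toy rule sets and all function names are hypothetical.
```python
# Toy model: a "transducer" is a set of (upper, lower) pairs. All names and
# both rule sets below are assumptions made for illustration.

def identity_relation(lexicon):
    """The lexical automaton viewed as a relation mapping each form to itself."""
    return {(form, form) for form in lexicon}


def compose(relation, rule):
    """Apply one rule set to the lower side; the upper side stays unchanged."""
    return {(upper, new_lower)
            for upper, lower in relation
            for new_lower in rule(lower)}


def cascade(lexicon, rule_sets):
    """Compose the lexicon with each rule set in turn."""
    relation = identity_relation(lexicon)
    for rule in rule_sets:
        relation = compose(relation, rule)
    return relation


def generate(relation, lexical_form):
    return {lower for upper, lower in relation if upper == lexical_form}


def analyze(relation, surface_form):
    return {upper for upper, lower in relation if lower == surface_form}


def realize_archiphoneme(form):
    """Obligatory toy rule: the symbol A is realized as e."""
    return {form.replace("A", "e")}


def optional_boundary_deletion(form):
    """Optional toy rule: "+" becomes "-" or disappears."""
    return {form.replace("+", "-"), form.replace("+", "")}


lexical_transducer = cascade({"cwu+A-se"},
                             [realize_archiphoneme, optional_boundary_deletion])
print(sorted(generate(lexical_transducer, "cwu+A-se")))
print(analyze(lexical_transducer, "cwue-se"))
```
    The intermediate string produced by the first rule set never appears in the final relation, and the same relation serves for both generation and analysis, which mirrors the property claimed for the composed transducer.</Paragraph>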
    <Section position="1" start_page="1262" end_page="1264" type="sub_section">
      <SectionTitle>
3.1 Construction of Lexical Automaton (LA)
</SectionTitle>
      <Paragraph position="0"> The ifsm-utility enabled us to assemble the LA incrementally. The first step was to divide the total list of morphemes into sublexicons on the basis of their morphological type and to make a text file for each sublexicon. We added diacritic markers to the edges of certain types of morphemes in order to be able to enforce morphotactic constraints on valid morpheme sequences.</Paragraph>
      <Paragraph position="1"> Each sublexicon was compiled separately to a finite-state automaton. The sublexicons were used to construct the LA with the help of the regular expression facility in the ifsm-toolkit. For example, having compiled a simple automaton from the list of simple nouns, we could expand it to an infinite lexicon of compound nouns with the regular expression "noun.auto" [# "noun.auto"]*. This regular expression reads the noun automaton from a file, concatenates it with itself any number of times and marks the internal word boundaries with #.</Paragraph>
      <Paragraph position="2"> The first version of the LA was made in this way by combining sublexicons with regular operations (concatenation, union, iteration).</Paragraph>
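      <Paragraph> The compound-noun expression has a direct analogue in ordinary regular expressions. The sketch below uses Python's re module in place of the ifsm regular expression facility; the three nouns are placeholder entries standing in for the noun sublexicon, and the function name is an assumption.
```python
import re

# Hypothetical stand-in for the noun sublexicon; the real list is not given here.
SIMPLE_NOUNS = ["san", "kil", "mul"]

# One alternative per noun, playing the role of the compiled "noun.auto".
noun = "(?:" + "|".join(re.escape(n) for n in SIMPLE_NOUNS) + ")"

# noun [# noun]* : a noun followed by any number of "#"-separated nouns.
compound_noun = re.compile(noun + "(?:#" + noun + ")*")


def is_compound_noun(form):
    """True if the whole string is a simple noun or a #-joined compound."""
    return compound_noun.fullmatch(form) is not None


print(is_compound_noun("san"))
print(is_compound_noun("san#kil#mul"))
print(is_compound_noun("san#"))
```
      A finite noun list thus yields an unbounded set of compounds, and a string with a dangling boundary marker is rejected because every # must be followed by another noun.</Paragraph>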
      <Paragraph position="3"> In order to enforce morphotactic constraints on the concatenation of some classes of suffixes, we wrote a set of two-level rules that require or prohibit the occurrence of particular diacritics at certain suffix boundaries. Lexical forms that do not satisfy the morphotactic constraints get eliminated in the composition with the well-formedness rules. The diacritics themselves are realized as zero so that they are not present in the lower side of the resulting transducer. The final form of the lexical automaton is obtained by extracting the lower side from that transducer as a simple automaton.</Paragraph>
      <Paragraph position="4"> We believe that this incremental method of lexicon construction is better suited to morphologically complex languages than the lexicon format commonly used in two-level morphology. In standard two-level lexicons, individual entries contain information about which sublexicon they may concatenate with. The entire lexical structure is compiled in one step to a large letter tree (Karttunen, 1993; Antworth, 1990). Our method is more tractable in two ways. Firstly, the lexicon can be developed and refined stepwise. Secondly, the morphotactic rules of the language are described explicitly as the regular expressions that construct the LA in conjunction with the well-formedness constraints that eliminate certain types of concatenations. In two-level lexicons of the standard variety, the morphotactic structure of the language is not described explicitly at all. Rather, it is expressed in a very opaque and indirect way, in the sequences of links between entries and sublexicons.</Paragraph>
      <Paragraph position="5"> Sproat argued that the two-level treatment of morphotactics leads to a somewhat inelegant model of long-distance dependencies and suggested the unification scheme, due to Bear, as a solution (Sproat, 1992). But the unification scheme introduces additional runtime overhead. The above approach can easily and explicitly describe the fact that "-able" attaches to verbs formed with the prefix "en-", and it does not require additional runtime overhead.</Paragraph>
      <Paragraph position="6"> We give a few examples of the difficulties in the description of Korean morphotactics. There are two different types of endings: (i) non-final (verbal) endings for tense, modality, subject honorific or aspect, and (ii) final (verbal) endings as complementizer, nominalizer and adjectivizer. The non-final endings are placed in front of final endings and must be followed by a suffix of the second type.</Paragraph>
      <Paragraph position="7"> (4) shows the ordering restrictions of non-final endings. The parentheses indicate optionality.</Paragraph>
      <Paragraph position="9"> (4) compiles to a lexicon covering 20 different compound non-final ending sequences including null. This representation is clearly more informative than a simple listing of the members of the class. The prohibition of "Past+Perf+Will+Retro" in (4) cannot be described by an adjacency table.</Paragraph>
      <Paragraph position="10"> In (4) we do not need any morphotactic diacritics on the left, because all non-final endings can combine with any verb and adjective stems and the combination of non-final and final endings is controlled by the diacritics of the latter group.</Paragraph>
      <Paragraph position="11"> (5) shows three entries in the sublexicon of final endings. The elements in square brackets are morphotactic diacritics. (Square brackets indicate grouping, the vertical bar marks a disjunction.) The diacritics are deleted by well-formedness rules when the final endings are combined with other morphemes. The diacritics on the left of nun and nun-ka show that they cannot combine with adjectives.</Paragraph>
      <Paragraph position="12"> (5) [Verb | Adj | Hon | Past | Will | Perf] ... ;</Paragraph>
      <Paragraph position="14"> marking; ";": the end of declaration, the meaning is the same as "|") The diacritic markers {D...}, {Q...} and {C...} have two roles, as the feature of the morphemes and as the right-hand context. They remain in the final LA because they are the feature of each morpheme.</Paragraph>
      <Paragraph position="15"> By concatenating the subnetworks of compound non-final endings and final endings, we get a sublexicon of ending sequences as shown in (6). The [Verb | Adj] diacritics indicate that non-final endings can combine with any verb stems and adjective stems.</Paragraph>
      <Paragraph position="16"> (6) ([Verb | Adj] "compound non-final ending.auto" +) "final ending.auto" This concatenation produces an initial lexicon of 97498 (2*20*2378 + 2378) different sequences, where 20 is the number of compound non-final endings and 2378 is the number of sequences of final endings with their morphotactic diacritics. This set is reduced to 7888 by a set of well-formedness rules that eliminate unwanted sequences and delete the morphotactic diacritics. The composition of the initial lexicon with the well-formedness rules produces a transducer from which the lower side is extracted as a simple automaton and used in the construction of the final LA. Allowing nouns to freely compound with nouns creates a problem because it gives rise to many unacceptable or unlikely compounds. For example, the form in (7) has five alternate analyses, (a) to (e), all but the first of which are starred. Our solution was to constrain compounding with a well-formedness rule that excludes compounds with monosyllabic nouns (Kwon, 1990). The complexity of the morphological alternations in Korean is so high that we need an easy way to give constraints incrementally. Our approach is a consistent and explicit way of describing morphotactic rules including long-distance</Paragraph>
      <Paragraph position="17"> dependencies.</Paragraph>
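      <Paragraph> Returning to the size of the initial ending lexicon built by (6), the quoted figure can be checked with a few lines of Python. The reading of the factor 2 as the two stem diacritics [Verb | Adj], and of the extra 2378 as the case in which the optional parenthesized part of (6) is absent, is an assumption; the variable names are illustrative.
```python
# Figures quoted in the text; the variable names are illustrative.
STEM_DIACRITICS = 2        # assumed to be the two diacritics [Verb | Adj]
NON_FINAL_SEQUENCES = 20   # compound non-final ending sequences, including null
FINAL_SEQUENCES = 2378     # final ending sequences with their diacritics

# Sequences that include the optional prefix of (6).
with_prefix = STEM_DIACRITICS * NON_FINAL_SEQUENCES * FINAL_SEQUENCES
# Sequences in which the optional prefix is absent (assumed reading).
without_prefix = FINAL_SEQUENCES
initial_size = with_prefix + without_prefix

print(initial_size)                    # 97498
print(round(7888 / initial_size, 3))   # share kept by the well-formedness rules
```
      The second printed value shows how small a fraction of the freely concatenated sequences survives the well-formedness rules.</Paragraph>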
    </Section>
    <Section position="2" start_page="1264" end_page="1264" type="sub_section">
      <SectionTitle>
3.2 Composition of Lexical Automaton with Rule Transducers
</SectionTitle>
      <Paragraph position="0"> After constructing the Korean LA, we derive from it a lexical transducer by composing the LA with rule transducers (RTs) in several stages. At each stage the previous result is composed with an RT derived by intersection from several two-level rules. The rule sets include (i) morpheme generation rules, (ii) rules for irregular verbs/adjectives, (iii) deletion rules, (iv) vowel harmony rules and (v) contraction rules. Morpheme generation rules give a surface realization to morphological</Paragraph>
      <Paragraph position="1"> tags, such as Past, Hon(orific), etc. Rules for irregular verbs deal with final consonants and syllabification. Deletion rules eliminate one of two adjacent vowels on morpheme boundaries. Vowel harmony rules realize the harmonizing archiphonemes WU as o or wu and a/e as a or e, depending on the quality of the preceding vowel. Contraction rules involve the merging of adjacent vowels to a diphthong or a single vowel as a result of the loss of the intervening syllable boundary.</Paragraph>
      <Paragraph position="2"> Although it is possible in principle to write just one two-level rule system that describes all the alternations in parallel, it is very difficult in practice to create a rule system with that degree of complexity.</Paragraph>
      <Paragraph position="3"> The complexity arises from the fact that the formulation of every rule in a two-level system depends on every rule that has some effect on the context of the rule that we are trying to express. For example, if there is a rule that forces X to be deleted in front of a Y and another rule that introduces Z between X and Y, great care must be exercised by the rule writer to make sure that both rules are specified in a way that leaves room for the other rule to have its effect but does not depend on it, if the deletion of X and the insertion of Z are two independent alternations.</Paragraph>
      <Paragraph position="4"> The partitioning of rules into sets and the interleaving of intersection and composition greatly simplify the task of creating and updating the rule system. Rules that apply in different environments and do not affect each other can be compiled and intersected easily, whereas rules that involve alternations in overlapping contexts are most easily handled by placing them in different levels in the cascade. In effect, the rules are partially ordered. Sproat also noticed that rule interactions which may be easy to state in terms of ordered rules are often much more difficult to state in one two-level rule system (Sproat, 1992).</Paragraph>
      <Paragraph position="5"> For Korean, the partitioning of the rules for morphological alternations into the five sets described above appears to be the optimal choice. Each of the rules in the Hangul standard orthography published in March of 1988 (Korean Ministry of Education, 1988) is described in the corresponding two-level rule separately in our implementation. The order of rules takes the role of rule interactions. In this cascade, the alternations described in section 2 as example (3) are split between three levels: (8) (i) Rules for irregular predicates: A syllable boundary is introduced before the</Paragraph>
      <Paragraph position="6"> stem-final p in irregular-p verbs/adjectives when a vowel-initial suffix follows. The following morpheme boundary is deleted and p is realized as the harmonizing archiphoneme WU. The morpheme boundary "+" can optionally be deleted between wu and e.</Paragraph>
      <Paragraph position="7"> The effect of these rules with respect to the irregular -p verb cwup 'to pick up' is shown in (9).</Paragraph>
      <Paragraph position="8"> (9) (a) c wu 0 p {pVerb} + a/e - se
          (b) c wu - WU 0 0 a/e - se
          (c) c wu - wu 0 0 e - se</Paragraph>
      <Paragraph position="9"> The intermediate level, (b), is eliminated in the cascade; thus the final lexical transducer maps (a) directly to (c).</Paragraph>
    </Section>
  </Section>
</Paper>