<?xml version="1.0" standalone="yes"?> <Paper uid="C94-1006"> <Title>Two Methods for Learning ALT-J/E Translation Rules from Examples and a Semantic Hierarchy Hussein Almuallim Info. and Computer Science Dept. King Fahd University of Petroleum and Minerals Dhahran 31261, Saudi Arabia</Title> <Section position="3" start_page="0" end_page="59" type="metho"> <SectionTitle> 2 ALT-J/E: A Brief Overview </SectionTitle> <Paragraph position="0"> ALT-J/E, the Automatic Language Translator: Japanese to English, is one of the most advanced and well-recognized systems for translating Japanese to English. It is the largest such system in terms of the amount of knowledge it comprises. In this work, we are concerned with the following components of the ALT-J/E system: 1. The Semantic Hierarchy, 2. The Semantic Dictionary, and 3. The Translation Rules.</Paragraph> <Paragraph position="1"> We briefly describe each of these components below. For more details about the ALT-J/E system, we refer the reader to [Ikehara et al. 1989, Ikehara et al. 1990, Ikehara et al. 1991]. As shown in Figure 1, the Semantic Hierarchy is a sort of concept thesaurus represented as a tree structure in which each node is called a semantic category, or a category for simplicity. Edges in this structure represent &quot;is-a&quot; relations among the categories. For example, &quot;Agents&quot; and &quot;People&quot; (see Figure 1) are both categories. The edge between these two categories indicates that any instance of &quot;People&quot; is also an instance of &quot;Agents&quot;. The current version of ALT-J/E's Semantic Hierarchy is 12 levels deep and has about 3000 nodes. The Semantic Dictionary maps each Japanese noun to its appropriate semantic categories.
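The is-a structure just described can be sketched in a few lines of Python. The parent links, category names, and dictionary entries below are illustrative stand-ins, not ALT-J/E's actual 3000-node hierarchy:

```python
# Toy semantic hierarchy: parent links encode the "is-a" edges of the tree.
# All names here are assumptions for illustration.
PARENT = {
    "People": "Agents",
    "Agents": "Noun",
    "Birds": "Animals",
    "Animals": "Noun",
    "Meat": "Food",
    "Food": "Noun",
}

# The Semantic Dictionary maps each noun to its semantic categories.
DICTIONARY = {
    "niwatori": {"Meat", "Birds"},  # "chicken" / "hen"
}

def ancestors(category):
    """Return the category itself plus all of its ancestors, root-ward."""
    result = []
    while category is not None:
        result.append(category)
        category = PARENT.get(category)
    return result

def is_instance_of(noun, category):
    """A noun is an instance of a category when the category is one of,
    or an ancestor of, the noun's dictionary categories."""
    return any(category in ancestors(c) for c in DICTIONARY[noun])
```

The ancestor walk is what makes the is-a edges transitive: a noun listed under a leaf category is automatically an instance of every category above it.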
For example, the Semantic Dictionary states that the noun 鶏 (niwatori), which means &quot;chicken&quot; or &quot;hen&quot; in English, is an instance of the categories &quot;Meat&quot; and &quot;Birds&quot;.</Paragraph> <Paragraph position="2"> The Translation Rules in ALT-J/E associate Japanese patterns with English patterns. Currently, ALT-J/E uses roughly 10,000 of these rules.[1] As Figure 2 shows, each translation rule has a Japanese pattern as its left-hand side and an English pattern as its right-hand side. For example, the first rule in this figure basically says that if the Japanese verb in a sentence is 焼く (yaku), its subject is an instance of &quot;People&quot;, and its object is an instance of &quot;Bread&quot; or &quot;Cake&quot;, then the following English pattern is to be used: Subject &quot;bake&quot; Object.</Paragraph> <Paragraph position="3"> Note that in this case the Japanese verb 焼く (yaku) is translated into the English verb &quot;bake&quot;. This same Japanese verb can also be translated into the English verbs &quot;roast&quot;, &quot;broil&quot;, &quot;cremate&quot; or &quot;burn&quot;, depending on the context. These cases are handled by the four other rules given in Figure 2.</Paragraph> <Paragraph position="4"> Translation rules are meant only to handle basic sentences that contain just a single Japanese verb. Such sentences are called &quot;simple sentences.&quot;[2] To translate a complex sentence, ALT-J/E does various kinds of pre- and post-processing. Roughly speaking, the given complex sentence is first broken into a collection of simple sentences in the pre-processing phase. Then, the English translations of these are combined together in the post-processing phase to give the final translation of the complex sentence.</Paragraph> <Paragraph position="5"> To translate a simple sentence, ALT-J/E looks for the most appropriate translation rule to use.
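A rough sketch of how such rules might be stored and filtered, under a toy encoding (the rule set and category names are assumptions; ALT-J/E's actual matching procedure is more elaborate):

```python
# Hypothetical encoding of two of the "yaku" rules from Figure 2: each rule
# pairs a Japanese pattern (verb plus required categories) with an English verb.
RULES = [
    {"j_verb": "yaku", "subject": {"People"}, "object": {"Bread", "Cake"}, "e_verb": "bake"},
    {"j_verb": "yaku", "subject": {"People"}, "object": {"Meat"},          "e_verb": "roast"},
]

def matching_rules(j_verb, subject_cats, object_cats):
    """Candidates share the verb; here a rule applies when the sentence's
    categories overlap those required by the Japanese pattern."""
    return [r["e_verb"] for r in RULES
            if r["j_verb"] == j_verb
            and r["subject"] & subject_cats
            and r["object"] & object_cats]
```

In this sketch the set intersections play the role of the category tests on the left-hand side of a rule.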
Based on the verb of the sentence, the system considers as candidates all those translation rules that have this verb on their left-hand side. The English pattern of the rule whose Japanese pattern matches the sentence best is then used to generate the desired English translation.</Paragraph> <Paragraph position="6"> As shown in Figure 2, the Japanese patterns are expressed using the variables N1, N2, ..., etc., which represent various components of a Japanese sentence, such as the subject, the object, etc.[3] The &quot;degree of matching&quot; between a Japanese pattern and a sentence is based on how well the values of these variables for the given sentence match those categories required by the Japanese pattern. The Semantic Dictionary is consulted to obtain these categories. [1] In fact, ALT-J/E has three different kinds of translation rules: (i) the semantic pattern transfer rules (roughly 10,000 rules), (ii) the idiomatic expression transfer rules (about 5,000 rules), and (iii) the general transfer rules. We use the term &quot;Translation Rules&quot; here to refer to the semantic pattern transfer rules. These form the majority of the rules, and they are the most frequently used by ALT-J/E.</Paragraph> <Paragraph position="7"> [2] The term &quot;simple sentence&quot; is a direct translation of 単文 (tanbun) in Japanese.</Paragraph> <Paragraph position="8"> [3] To be precise, Japanese sentences are usually parsed into a set of components (the ga-case, wo-case, etc.) that are quite different from those used in English.
Using &quot;subject&quot; and &quot;object&quot; here is only meant to ease the discussion for English readers.</Paragraph> <Paragraph position="9"> (Figure content: the category tree at depths 4 through 8, with nodes such as People, Human, Old/Young, and Male/Female.) Figure 1: The upper levels of the Semantic Hierarchy in ALT-J/E.</Paragraph> <Paragraph position="11"> Translation rules in the ALT-J/E system have so far been composed manually by human experts. However, due to the high cost per rule, and because of the huge number of translation rules needed for ALT-J/E to carry out a reasonable translation job, the manual approach has been concluded by the developers of ALT-J/E to be impractical. In particular, the following problems have been reported: * Building and maintaining the translation rules require a great deal of expertise. To qualify for this task, skillful experts are required not only to master both Japanese and English, but also to be fully familiar with ALT-J/E's large Semantic Hierarchy and to understand the overall process of the system. Such qualifications are costly and involve extensive training.</Paragraph> <Paragraph position="12"> * In spite of the vast amount of resources spent on building the current rules of ALT-J/E by human experts, faults are still detected from time to time, making the maintenance of the system a continuous requirement.</Paragraph> <Paragraph position="13"> * The translation rules are not quite concrete and vary depending on the expert. Rules constructed by one expert are not easy for another expert to understand and modify. This makes the maintenance
process more difficult and makes it hard to substitute an expert by another. * An important objective is to build specialized versions of ALT-J/E to be used in specific application domains. The manual approach is obviously unrealistic, since it involves more training of the human experts with respect to the target application domain, and because this process has to be repeated for every new domain.</Paragraph> <Paragraph position="14"> * One of the problems facing the designers of ALT-J/E is the refinement of the Semantic Hierarchy. Whenever this structure is altered, the translation rules must also be revised to reflect the change. Such revision is extremely troublesome and error-prone if it is done manually.</Paragraph> </Section> <Section position="4" start_page="59" end_page="59" type="metho"> <SectionTitle> 4 A Machine Learning Approach </SectionTitle> <Paragraph position="0"> The problems we have just listed regarding the manual construction of ALT-J/E's translation rules are largely solved if the process can be automated. An attractive approach to this problem is to resort to inductive machine learning techniques to extract the desired translation rules from examples of Japanese sentences and their English translations. At the current stage, however, learning translation rules fully automatically from examples alone seems to be too challenging. A more realistic goal is to minimize rather than to totally eliminate the intervention of human experts in the rule acquisition process. Thus, our current objective is to concentrate on automating the most difficult and time-consuming parts of the manual procedure.</Paragraph> <Paragraph position="1"> The goal of the present work is to learn what we call &quot;partial translation rules&quot;.
A partial translation rule consists of the left-hand side along with the English verb of the right-hand side of a translation rule. In other words, the only difference between a translation rule and a partial translation rule is that the latter has only an English verb rather than a full English pattern as its right-hand side.</Paragraph> <Paragraph position="2"> Constructing a partial translation rule is the most difficult part of constructing a translation rule. Indeed, turning a partial rule into a complete one is a relatively easy task that can be done by a human operator with moderate knowledge of English and Japanese.</Paragraph> </Section> <Section position="5" start_page="59" end_page="61" type="metho"> <SectionTitle> 5 Learning Task and Methods </SectionTitle> <Paragraph position="0"> In this work, we investigate two different inductive learning algorithms. Before talking about these algorithms, we will first make the learning task more precise, and shed some light on the difficulties that distinguish it from other previously studied learning tasks.</Paragraph> <Section position="1" start_page="59" end_page="60" type="sub_section"> <SectionTitle> 5.1 The Learning Task </SectionTitle> <Paragraph position="0"> The job of a learning algorithm in our setting is to construct partial translation rules. For a given Japanese verb J-verb and a possible English translation E-verb_i of that verb, the algorithm has to find the appropriate condition(s) that should hold in the context in order to map J-verb to E-verb_i. As an example, consider the Japanese verb 使う (tsukau). This verb corresponds to the English verbs &quot;use&quot;, &quot;spend&quot; and &quot;employ&quot;. The choice among these English verbs depends mostly on the object of the sentence. For example, if the object is an instance of &quot;Asset&quot; or &quot;Time&quot;, then &quot;spend&quot; is appropriate.
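A hedged sketch of the kind of condition to be learned for 使う (tsukau); the "employ" test and the "use" fallback are illustrative assumptions, not rules from the paper:

```python
# Sketch of a learned condition for mapping tsukau to an English verb.
# The "employ" branch and the "use" default are assumptions for illustration.
def translate_tsukau(object_categories):
    """Pick an English verb from the semantic categories of the object."""
    if object_categories & {"Asset", "Time"}:   # e.g. money, hours
        return "spend"
    if object_categories & {"People"}:          # e.g. workers
        return "employ"
    return "use"                                # default reading
```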
Thus, a rough rule for mapping 使う (tsukau) to &quot;spend&quot; may look like</Paragraph> <Paragraph position="2"> We seek to learn this kind of rules from examples of Japanese sentences and their English translations, such as the following pair:</Paragraph> <Paragraph position="4"> After parsing (which is carried out by ALT-J/E's parser), the above example gives the following pair:</Paragraph> <Paragraph position="6"> By looking up the Semantic Dictionary of ALT-J/E, the possible semantic categories for ojyo are &quot;Noble Person&quot;, &quot;Daughter&quot; and &quot;Female&quot;, and those for kane are &quot;Asset&quot;, &quot;Metal&quot;, &quot;Pay&quot; and &quot;Medal&quot;. Thus, this example is finally given to the learning algorithm in the following form:</Paragraph> <Paragraph position="8"> where N → S indicates that the sentence component N is an instance of each category s ∈ S. The general format of the training examples is as follows:</Paragraph> <Paragraph position="10"> (subject, object, etc.), and each a_i, b_i, and c_i is a semantic category.</Paragraph> <Paragraph position="11"> From the viewpoint of machine learning research, the above learning task is interesting and challenging from two perspectives: * Huge amount of background knowledge: To be appropriate for our learning task, the learning algorithm must effectively utilize ALT-J/E's large Semantic Hierarchy. This requirement of being capable of exploiting such a huge amount of background knowledge disqualifies most of the known inductive learning algorithms from directly being used in our domain.</Paragraph> <Paragraph position="12"> * Ambiguity of the training examples: Unlike most known learning domains, the training examples in our setting (as given in Eq.
(1)) are ambiguous in the sense that each of the variables (SUBJECT, OBJECT, etc.) is assigned multiple values rather than a single value. Focusing on the relevant values (that is, the values that contributed to the choice of the English verb) is an extra challenge to the learner in our domain. To deal with the above learning problem, we investigated two approaches. One is based on a theoretical algorithm introduced by Haussler for learning internal disjunctive concepts, and the other on the well-known ID3 algorithm of Quinlan.</Paragraph> </Section> <Section position="2" start_page="60" end_page="60" type="sub_section"> <SectionTitle> 5.2 Haussler's algorithm for learning internal disjunctive expressions </SectionTitle> <Paragraph position="0"> In our first approach, we represent the conditions of the learned partial translation rules as internal disjunctive expressions, and employ an algorithm given by Haussler for learning concepts expressed in this syntax. Haussler's algorithm enjoys many advantages. First, it has been analytically proven to be quite efficient both in terms of time and the number of examples needed for learning. Second, the algorithm is capable of explicitly utilizing the background knowledge represented by the Semantic Hierarchy. Moreover, the language used by human experts to construct ALT-J/E's rules is quite similar to internal disjunctive expressions, suggesting the appropriateness of this algorithm's bias. Haussler's algorithm, on the other hand, suffers from the important shortcoming (within our setting) that it is not capable of learning from ambiguous examples.
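For concreteness, a minimal sketch of the hypothesis syntax only (not Haussler's learning algorithm itself); the component and category names are illustrative assumptions:

```python
# An internal disjunctive expression is a conjunction in which each sentence
# component is allowed any one of a set (a disjunction) of categories.
# Names below are illustrative; this sketches only the hypothesis syntax.
def satisfies(expression, example):
    """expression: {component: set of allowed categories}
    example: {component: a single category} -- ambiguity already removed."""
    return all(example.get(component) in allowed
               for component, allowed in expression.items())

# a condition of this form could map tsukau to "spend"
rule_condition = {"OBJECT": {"Asset", "Time"}}
```

Note that `example` assigns each component a single category: this is exactly the disambiguation requirement discussed in the text.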
In order to be able to use the algorithm for our task, the ambiguity has to be explicitly removed from all the training examples.</Paragraph> <Paragraph position="1"> Of course, this approach is not desirable because it requires some intervention by a human expert, and because there are no guarantees that disambiguation is done in a perfect manner.</Paragraph> </Section> <Section position="3" start_page="60" end_page="61" type="sub_section"> <SectionTitle> 5.3 Quinlan's ID3 </SectionTitle> <Paragraph position="0"> Our second approach is based on the ID3 algorithm introduced by Quinlan in [Quinlan 1986]. As it is, ID3 is not able to utilize the background knowledge of our domain, nor is it capable of dealing with ambiguous training examples of the form given by Eq. (1). It is clearly inappropriate to treat N1, N2, ... as multivalued variables, which is the most common way of using ID3. This is because of the huge number of values these variables can take, and also because we need to exploit the background knowledge represented by the Semantic Hierarchy.</Paragraph> <Paragraph position="1"> To be able to use ID3 in our domain, we transform the training examples into a new representation that can be handled by ID3. The transformation we propose is done in such a way that the relevant information from the Semantic Hierarchy is included in the newly represented examples, and, at the same time, these newly represented examples still reflect the ambiguity present in the original examples.</Paragraph> <Paragraph position="2"> Our transformation method is described as follows: Let A be the set of all the categories that appeared in the training examples, and their ancestors.
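The transformation described in this subsection (collect the set A, then encode each ambiguous example as a vector of instance-of bits) can be sketched as follows; the tiny hierarchy and the single training example are assumptions for illustration:

```python
# Sketch of the proposed transformation: build A (categories seen in the
# training data plus their ancestors), then encode each ambiguous example
# as bits answering "is N_i an instance of c?" for every c in A.
# The hierarchy and example below are illustrative assumptions.
PARENT = {"Asset": "Property", "Metal": "Matter",
          "Property": "Noun", "Matter": "Noun"}

def ancestors_and_self(c):
    """The category itself plus all of its ancestors."""
    out = []
    while c is not None:
        out.append(c)
        c = PARENT.get(c)
    return out

# ambiguous training examples: each component maps to a SET of categories
examples = [({"OBJECT": {"Asset", "Metal"}}, "spend")]

# the set A: every category appearing in the data, plus its ancestors
A = sorted({a for ex, _ in examples
              for cats in ex.values()
              for c in cats
              for a in ancestors_and_self(c)})

def features(example, components=("OBJECT",)):
    """One bit per (component, category in A): true iff some category
    assigned to the component is c itself or a descendant of c."""
    return [any(c in ancestors_and_self(s) for s in example.get(comp, set()))
            for comp in components
            for c in A]
```

Because a bit is set for every ancestor of every assigned category, the resulting vectors carry the hierarchy's background knowledge while still reflecting the ambiguity of the original example.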
For every c ∈ A, we define a binary feature as a test of the form: Is N_i an instance of c? For a training example ([N1 → S1, ..., Ni → Si, ..., Nn → Sn], E-Verb), we let the outcome of the above test be true if and only if there exists some s ∈ Si such that c is an ancestor of s in the Semantic Hierarchy, or c is s itself. Using these features, we convert each of the training examples into a new pair (V, E-Verb), where V is a vector of bits, each representing the outcome of the corresponding feature for the given training example.</Paragraph> <Paragraph position="3"> Given the above definition of the binary features, the new pairs (V, E-Verb) include all the necessary background knowledge obtained from the Semantic Hierarchy, and also reflect the ambiguity of the original training examples. In other words, the above transformation can be seen as &quot;compiling&quot; the information of the original ambiguous training examples, along with the necessary parts of the Semantic Hierarchy, into a format that is ready to be processed by ID3 (or, in fact, by many other feature-based learning algorithms).</Paragraph> <Paragraph position="4"> Note that if we create a feature for every semantic category c and every sentence component Ni, then the total number of features will become infeasibly large (many thousands). However, what we need is only to consider those categories that appeared in the training data, and their ancestors (the set A above).</Paragraph> <Paragraph position="5"> In our experiments, this results in a reasonable number of features (one to two hundred). This is because the number of examples is limited, and also because of the rather &quot;tilted&quot; distribution of what categories can naturally appear as a certain component of a sentence for a given verb. (E.g.
the object of the verb 飲む (nomu), which roughly means to &quot;drink&quot;, cannot be just anything!) The most important advantage of the above approach is that it can be applied to ambiguous training examples as they are, without the need to remove the ambiguity explicitly as we did with Haussler's algorithm. Another advantage of using ID3 is that we do not need to break our learning task into binary-class learning problems, since ID3 is capable of learning multi-class concepts.</Paragraph> </Section> </Section> </Paper>