<?xml version="1.0" standalone="yes"?> <Paper uid="C94-2134"> <Title>Hypothesis Selection in Grammar Acquisition</Title> <Section position="4" start_page="0" end_page="837" type="metho"> <SectionTitle> 2 Grammar Hypothesizing </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="837" type="sub_section"> <SectionTitle> 2.1 Grammar Formalism </SectionTitle> <Paragraph position="0"> The grammar formalism we use is a conventional unification-based grammar. Each grammar rule is written as a combination of a context-free rule and feature unification functions. This formalism is not specific to any linguistic theory, but we introduced a number of concepts widely accepted in linguistic theories, such as grammatical functions, subcategorization frames, and X-bar theory.</Paragraph> <Paragraph position="1"> The parsing system we introduced to apply our grammar formalism is a system called SAX [Matsumoto, 1986]. SAX uses the concepts of active and inactive edges from Chart Parsing and analyses an input sentence with a bottom-up, parallel algorithm.</Paragraph> <Paragraph position="2"> As the grammar hypothesizing algorithm is supposed to refer to partial parsing results of unsuccessfully parsed sentences, we slightly modified SAX so that it outputs inactive edges as partial parsing results.</Paragraph> </Section> <Section position="2" start_page="837" end_page="837" type="sub_section"> <SectionTitle> 2.2 Basic Algorithm </SectionTitle> <Paragraph position="0"> When SAX fails to parse a sentence, no inactive edge of category S spanning the whole sentence exists in the parsing result. Grammar hypothesizing is a process to introduce this inactive edge by augmenting the current grammar.
The basic part of the hypothesis generation algorithm is written as follows: [Algorithm] An inactive edge [ie(A) : x0, xn] can be introduced from x0 to xn, with label A, by each of the hypotheses generated by the following two steps.</Paragraph> <Paragraph position="1"> [Step 1] For each sequence of inactive edges, [ie(B1) : x0, x1], ..., [ie(Bn) : xn-1, xn], spanning from x0 to xn, generate a new rule.</Paragraph> <Paragraph position="2"> A => B1, ..., Bn [Step 2] For each existing rule A -> A1, ..., An, find an incomplete sequence of inactive edges,</Paragraph> <Paragraph position="4"> call this algorithm for [ie(Ai) : xi-1, xi].</Paragraph> <Paragraph position="5"> Feature Structures: A rule generated in [Step 1] could be a lexical entry when this top-down algorithm reaches the bottom. As we adopted a unification-based grammar formalism, we extended the algorithm so that it can hypothesize a feature structure of a lexical entry by observing surrounding successful categories. As the algorithm works even for a complex feature like a subcategorization frame, it can be used to acquire a subcategorization dictionary. While some previous works on subcategorization frame acquisition assumed very little prior knowledge concerning the classification of subcategorization frames [Brent, 1991; Manning, 1993], our approach assumes the existence of grammar rules specifying subcategorization frame assignment, which enables more accurate learning of subcategorization frames.</Paragraph> <Paragraph position="6"> Multiple Defects: In [Step 2] of the algorithm, it is supposed that each unsuccessfully parsed sentence has exactly one cause of failure, but a sentence in actual texts often contains two or more causes of failure (for example, two unknown words).
To solve this problem, we extended the algorithm so that it searches for a multiple hypothesis, which is a set of rewriting rules and lexical entries.</Paragraph> </Section> </Section> <Section position="5" start_page="837" end_page="838" type="metho"> <SectionTitle> 3 Hypothesis Selection </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="837" end_page="837" type="sub_section"> <SectionTitle> 3.1 Basic Grammatical Constraints </SectionTitle> <Paragraph position="0"> From a linguistic point of view, the hypotheses generated by the algorithm given above might contain many unnatural hypotheses because the algorithm itself does not have any linguistic knowledge to judge the appropriateness of hypotheses. To remove unnatural hypotheses, we have introduced the following criteria [Kiyono and Tsujii, 1993].</Paragraph> <Paragraph position="1"> * The maximum number of adjacent unsuccessful categories is set to 2 in order not to decrease the efficiency of the algorithm.</Paragraph> <Paragraph position="2"> * The maximum number of daughter nodes is set to 3.
* Supposing that the existing grammar contains all the category conversion rules, a unary rule, which has only one daughter node, is not generated.</Paragraph> <Paragraph position="3"> * Using generalizations embodied in the existing grammar, a hypothesis containing a sequence of subnodes which are collected into a larger category by existing grammar rules is not generated.</Paragraph> <Paragraph position="4"> * Distinguishing non-lexical categories from lexical categories, a hypothesis whose mother category is a lexical category is not generated.</Paragraph> <Paragraph position="5"> * Assuming that the existing grammar has a complete set of functional words, a lexical hypothesis is restricted to the open lexical categories, such as noun, verb, adjective, and adverb.</Paragraph> </Section> <Section position="2" start_page="837" end_page="837" type="sub_section"> <SectionTitle> 3.2 Constraint based on Local Boundaries </SectionTitle> <Paragraph position="0"> A new constraint on the violation of the boundary condition given to phrases was introduced to avoid any collection of adjacent successful categories in rule hypothesizing. The boundary condition is given by putting parentheses at both ends of a phrase, such as a noun phrase, a verb phrase, and a prepositional phrase. This constraint filters out a hypothesis which crosses either end, not both ends, of a phrase.
For example, when parentheses are put like &quot;[The default blocking factor] is [20 blocks]&quot;, a hypothesis 'VP => VP, NP, VERB_BE' covering &quot;blocking factor is&quot; is discarded because it violates the boundary condition of the noun phrase &quot;The default blocking factor&quot;. This constraint requires the human task of putting parentheses before the hypothesis generator is invoked.</Paragraph> <Paragraph position="1"> In comparison with writing a constituent structure of the whole sentence, this work is much easier because we have only to give parentheses to definite phrases.</Paragraph> <Paragraph position="2"> Moreover, instead of giving parentheses by hand, we can even obtain various tagged corpora.</Paragraph> <Paragraph position="3"> As this constraint is also applicable to other constituents of the input sentence, it might improve the efficiency of the top-down hypothesizing algorithm.</Paragraph> </Section> <Section position="3" start_page="837" end_page="838" type="sub_section"> <SectionTitle> 3.3 Constraint based on X-bar Theory </SectionTitle> <Paragraph position="0"> Most of the criteria in 3.1 are based on linguistic category classification, but none of them commits itself to dealing with the relationship among the mother node and the daughter nodes. For example, supposing the existing grammar does not contain a rule for participial adjuncts in noun phrases, the hypothesizing program generates a new rewriting rule 'NP => VP, NP' from the phrase &quot;blocking factor&quot; in the sentence &quot;The default blocking factor is 20 blocks&quot;. However, the program also generates other alternative hypotheses from the same phrase, such as 'PP => VP, NP', 'INFINITIVE => VP, NP', and 'THAT_CLAUSE => VP, NP', each of which derives a post-positional adjunct for &quot;default&quot; by believing &quot;default&quot; is a head noun of the noun phrase.
Linguistically, such combinations of mother nodes and daughter nodes are not allowed.</Paragraph> <Paragraph position="1"> As a general principle for explaining phrase structures, X-bar theory is widely accepted. According to X-bar theory, a grammar rule is (or can be converted to) either of the following forms, where each prime (') expresses the projection level of a head X. The projection level increases as grammar rules are applied, and X'' is called a maximal projection of that category. U and W are adjuncts of X' and should be maximal projections of some categories.</Paragraph> <Paragraph position="3"> If the existing grammar is written in X-bar theory, this constraint is drastically effective in reducing the number of hypotheses.</Paragraph> </Section> <Section position="4" start_page="838" end_page="838" type="sub_section"> <SectionTitle> 3.4 Plausibility of Hypotheses </SectionTitle> <Paragraph position="0"> Among the hypotheses which passed through all the constraints, each one has a different plausibility as grammatical knowledge. Assuming that the existing grammar is reasonably comprehensive, lexical or idiosyncratic knowledge should be more plausible than general rewriting rules. In order to emphasize this tendency, each hypothesis is given the following</Paragraph> <Paragraph position="1"> plausibility value.</Paragraph> <Paragraph position="3"> This value is related to the proportion of the size, or the product of the width and the height, of the subtree</Paragraph> <Paragraph position="4"> composed by the hypothesis in the whole structure of the sentence.
The value ranges from 0 to 1 and gets bigger if the hypothesis covers a smaller part of the sentence. The width of the hypothesis, W(Hypo_i), is defined as the word count of the subtree, and the height, H(Hypo_i), as the shortest path from lexical nodes to the top node of the subtree.</Paragraph> </Section> </Section> <Section position="6" start_page="838" end_page="840" type="metho"> <SectionTitle> 4 Experiments 4.1 Corpus </SectionTitle> <Paragraph position="0"> In order to check the effects of the hypothesis selection techniques, we carried out some experiments with the ... Grammar A contains 118 rewriting rules that cover basic expressions of English. Grammar B is a subset of Grammar A and contains only 25 rewriting rules. The contents of Grammar A and Grammar B are shown in Table 1. The dictionary we use is the EDR English Dictionary containing 200,000 entries. The entries of this dictionary are not written in the form of a feature structure but have the encoded information of the syntactic patterns, which we interpret as a feature structure. As the EDR Dictionary was developed as a master dictionary for various applications, it took in the information concerning all the appearances of each word without screening by frequencies. This characteristic of the EDR Dictionary increases the ambiguity of parsing.
In fact, each word within the sample sentences from the UNIX manual has 1.49 parts of speech in the EDR Dictionary, while the same value is IAI according to the COLLINS COBUILD Dictionary.</Paragraph> <Section position="1" start_page="838" end_page="840" type="sub_section"> <SectionTitle> 4.3 Generated Hypotheses </SectionTitle> <Paragraph position="0"> General Outcome: The experiments of generating hypotheses were carried out with Grammar A under three different conditions: (a) using the basic grammatical constraints only, (b) adding the constraint with local phrasal boundaries given as parentheses, and (c) adding the constraint with X-bar theory. To carry out experiments (b) and (c), within the target sentences, parentheses were given to noun phrases, infinitive clauses, that-clauses, and subordinate clauses. A part of the result of experiment (a) is shown in Table 2, each column of which displays the number of hypotheses generated. The columns 'Single' and 'Multiple' show the numbers of single and multiple hypotheses respectively.</Paragraph> <Paragraph position="1"> The results of the three experiments are summarized in Table 3. The parser failed to analyse 61 out of 100 sentences and the grammar hypothesizing program was invoked for those sentences. While no hypotheses were generated from 20 or 30% of unsuccessfully parsed sentences, because the current hypothesizing algorithm does not allow vertical duplication of incompleteness and also because the parameters of the basic grammatical constraints do not allow the existence of more than two adjacent incomplete nodes, the results on the</Paragraph> <Paragraph position="2"> numbers of actual hypotheses made show that the stronger the constraint we pose, the fewer hypotheses are generated.
The average number of hypotheses per sentence, calculated by dividing the total hypothesis count of 1,301 in (a), 708 in (b), and 231 in (c) by the number of actual sentences from which hypotheses were generated, 50 in (a), 44 in (b), and 41 in (c), was reduced from 26.0 to 5.6.</Paragraph> <Paragraph position="3"> In some cases, all the hypotheses are removed by newly introduced constraints: 6 sentences by the local boundary constraint and 3 more sentences by the constraint of X-bar theory. Investigation of the initial set of hypotheses generated from such sentences revealed that no plausible hypothesis was included in it. Therefore, these sentences are not critical to the hypothesis selection method we introduced.</Paragraph> <Paragraph position="4"> In the final set of hypotheses, 30 plausible hypotheses ... [Table 2 columns: Sentence; Single (Lex, Rule); Multiple (Lex, Mixed, Rule); Total] The default blocking factor is 20 blocks. ~1800021 The output device in use is not capable of backspacing.</Paragraph> <Paragraph position="5"> Remove initial definitions for all predefined symbols.</Paragraph> <Paragraph position="6"> The escaped NEWLINE is not included in the macro value.</Paragraph> <Paragraph position="7"> Components of an expression are separated by white space.</Paragraph> <Paragraph position="8"> The name of this directory is listed in the folder variable.</Paragraph> <Paragraph position="9"> The name of the editor is listed in the EDITOR variable.</Paragraph> <Paragraph position="10"> The weighting function explained in 3.4 was not used for selecting hypotheses, but its validity was proved by counting the rank of each plausible hypothesis in the set of generated hypotheses.
The row 'Rank of Plausible Hypotheses' in Table 3 indicates that plausible hypotheses rank much higher than the middle of the order.</Paragraph> <Paragraph position="11"> Examples: Hereafter, in order to show how hypotheses were selected by each constraint, we explain the results for some typical examples.</Paragraph> <Paragraph position="12"> Ex.1) &quot;The default blocking factor is 20 blocks.&quot; As Grammar A does not contain a rule for participial adjuncts, the parser fails to analyse the noun phrase &quot;the default blocking factor&quot; and the grammar hypothesizing program is invoked. While this program generates 21 hypotheses in experiment (a), it filters out the following 12 hypotheses in experiment (b).</Paragraph> <Paragraph position="13"> While checking local boundary violation, the program removes those grammatically unnatural combinations of categories, though it does not use any linguistic knowledge.</Paragraph> <Paragraph position="14"> Ex.2) &quot;The output device in use is not capable of backspacing.&quot; This sentence is also parsed unsuccessfully because the current version of the EDR Dictionary does not have the information that &quot;capable&quot; subcategorizes a prepositional phrase. Among the initial set of 30 hypotheses, the following 8 hypotheses pass through ... theory. The first hypothesis in the list is the plausible hypothesis obtained in search of the real cause of the feature disagreement between &quot;capable&quot; and &quot;of backspacing&quot;. This lexical hypothesis for &quot;capable&quot; contains a modified version of its subcategorization frame so that it subcategorizes a prepositional phrase.</Paragraph> </Section> <Section position="2" start_page="840" end_page="840" type="sub_section"> <SectionTitle> 4.4 Hypotheses from Smaller Knowledge </SectionTitle> <Paragraph position="0"> Another experiment was performed with Grammar B under the basic grammatical constraints in order to compare the effects of the maturity of existing grammatical knowledge.
The numbers of hypotheses generated from the two grammar sets are shown in Table 4.</Paragraph> <Paragraph position="1"> The coverage of Grammar B is so limited that 97 out of 100 sentences were parsed unsuccessfully and passed to the Hypothesis Generator. However, as the immaturity of Grammar B also affects the number of generated hypotheses, the number of plausible hypotheses among the 550 hypotheses (10.6 hypotheses per sentence) generated from 97 sentences was only 10. This result indicates that cyclic acquisition of grammatical knowledge is valid. Even the sentences from which no hypotheses are generated with a small grammar would be taken into consideration in a later acquisition cycle with a larger grammar.</Paragraph> </Section> </Section> </Paper>
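<!--
The rule-generation step ([Step 1] of Section 2.2), the daughter-count and unary-rule criteria of Section 3.1, and the local boundary check of Section 3.2 can be illustrated with a small sketch. This is a minimal Python sketch under stated assumptions, not the authors' SAX-based implementation: the names edge_sequences, new_rules, and violates_boundary, the tuple representation of inactive edges, and the word-position encoding of the bracketed phrases are all hypothetical, and the category labels follow the 'blocking factor is' example of Section 3.2.

```python
from typing import List, Tuple

Edge = Tuple[str, int, int]  # hypothetical inactive edge: (category, start, end)
Span = Tuple[int, int]       # word positions (start, end), end exclusive


def edge_sequences(edges: List[Edge], start: int, end: int,
                   max_daughters: int = 3) -> List[List[Edge]]:
    """[Step 1]: list every sequence of adjacent inactive edges that
    spans start..end using at most max_daughters edges."""
    found: List[List[Edge]] = []

    def extend(pos: int, path: List[Edge]) -> None:
        if pos == end:
            found.append(list(path))
            return
        if len(path) == max_daughters:
            return
        for edge in edges:
            _, s, e = edge
            # Follow only edges that start here, have positive width,
            # and do not run past the end of the target span.
            if s == pos and e > s and end >= e:
                extend(e, path + [edge])

    extend(start, [])
    return found


def new_rules(mother: str, edges: List[Edge],
              start: int, end: int) -> List[Tuple[str, List[str]]]:
    """Turn each edge sequence into a rule hypothesis mother => B1,...,Bn.
    Unary rules are skipped, following the criteria of Section 3.1."""
    return [(mother, [cat for cat, _, _ in seq])
            for seq in edge_sequences(edges, start, end)
            if len(seq) >= 2]


def violates_boundary(span: Span, phrases: List[Span]) -> bool:
    """Section 3.2: True if span crosses exactly one end of some phrase."""
    s, e = span
    for ps, pe in phrases:
        overlaps = pe > s and e > ps
        inside = s >= ps and pe >= e      # span lies within the phrase
        contains = ps >= s and e >= pe    # span crosses both ends
        if overlaps and not inside and not contains:
            return True
    return False


# '[The default blocking factor] is [20 blocks]' with word positions 0..7.
PHRASES = [(0, 4), (5, 7)]
EDGES = [("VP", 2, 3), ("NP", 3, 4), ("VERB_BE", 4, 5), ("NP", 5, 7)]

if __name__ == "__main__":
    print(new_rules("VP", EDGES, 2, 5))        # candidate over 'blocking factor is'
    print(violates_boundary((2, 5), PHRASES))  # crosses the end of the first NP
    print(violates_boundary((2, 4), PHRASES))  # 'blocking factor' stays inside it
```

Under these assumptions, the candidate covering 'blocking factor is' is produced by the rule-generation step and then rejected, because it crosses only the right end of the first bracketed noun phrase, while a candidate covering 'blocking factor' stays inside that phrase and is kept.
-->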