<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-2119">
<Title>Generalizing Automatically Generated Selectional Patterns</Title>
<Section position="3" start_page="0" end_page="742" type="metho">
<SectionTitle> 2 Acquiring Semantic Patterns </SectionTitle>
<Paragraph position="0"> Based on a series of experiments over the past two years [5,6] we have developed the following procedure for acquiring semantic patterns from a text corpus:
1. Parse the training corpus using a broad-coverage grammar, and regularize the parses to produce something akin to an LFG f-structure, with explicitly labeled syntactic relations such as SUBJECT and OBJECT (footnote 1).
2. Extract from the regularized parse a series of triples of the form head syntactic-relation head-of-argument/modifier. We will use the notation $\langle w_i\ r\ w_j \rangle$ for such a triple, and $\langle r\ w_j \rangle$ for a relation-argument pair.
3. Compute the frequency $F$ of each head and each triple in the corpus. If a sentence produces $N$ parses, a triple generated from a single parse has weight $1/N$ in the total.</Paragraph>
<Paragraph position="1"> We will use the notation $F(\langle w_i\ r\ w_j \rangle)$ for the frequency of a triple, and $F_{\mathrm{head}}(w_i)$ for the frequency with which $w_i$ appears as a head in a parse tree (footnote 2). For example, the sentence &quot;Mary likes young linguists from Limerick.&quot;</Paragraph>
<Paragraph position="2"> would produce the regularized syntactic structure (s like (subject
(np Mary))</Paragraph>
<Paragraph position="4"> from which the following four triples are generated: like subject Mary; like object linguist; linguist a-pos young; linguist from Limerick. Given the frequency information $F$, we can then estimate the probability that a particular head $w_i$ appears with a particular argument or modifier.</Paragraph>
<Paragraph position="6"> This probability information would then be used in scoring alternative parse trees. For the evaluation below, however, we will use the frequency data $F$ directly. [Footnote 1: With somewhat more regularization than is done in LFG; in particular, passive structures are converted to corresponding active forms.] [Footnote 2: Note that $F_{\mathrm{head}}(w_i)$ is different from $F$($w_i$ appears as a head in a triple), since a single head in a parse tree may produce several such triples, one for each argument or modifier of that head.] Step 2 (the triples extraction) includes a number of special cases:</Paragraph>
<Paragraph position="7"> (a) if a verb has a separable particle (e.g., &quot;out&quot; in &quot;carry out&quot;), this is attached to the head (to create the head carry-out) and not treated as a separate relation. Different particles often correspond to very different senses of a verb, so this avoids conflating the subject and object distributions of these different senses.</Paragraph>
<Paragraph position="8"> (b) if the verb is &quot;be&quot;, we generate a relation be-complement between the subject and the predicate complement.</Paragraph>
<Paragraph position="9"> (c) triples in which either the head or the argument is a
pronoun are discarded
(d) triples in which the argument is a subordinate clause are discarded (this includes subordinate conjunctions and verbs taking clausal arguments)
(e) triples indicating negation (with an argument of &quot;not&quot; or &quot;never&quot;) are ignored</Paragraph>
</Section>
<Section position="4" start_page="742" end_page="743" type="metho">
<SectionTitle> 3 Generalizing Semantic Patterns </SectionTitle>
<Paragraph position="0"> The procedure described above produces a set of frequencies and probability estimates based on specific words. The &quot;traditional&quot; approach to generalizing this information has been to assign the words to a set of semantic classes, and then to collect the frequency information on combinations of semantic classes [12, 1]. Since at least some of these classes will be domain specific, there has been interest in automating the acquisition of these classes as well. This can be done by clustering together words which appear in the same context. Starting from the file of triples, this involves:
1. collecting for each word the frequency with which it occurs in each possible context; for example, for a noun we would collect the frequency with which it occurs as the subject and the object of each verb
2. defining a similarity measure between words, which reflects the number of common contexts in which they appear
3. forming clusters based on this similarity measure
Such a procedure was performed by Sekine et al. at UMIST [11]; these clusters were then manually reviewed and the resulting clusters were used to generalize selectional patterns. A similar approach to word cluster formation was described by Hirschman et al. in 1975 [7].</Paragraph>
<Paragraph position="1"> More recently, Pereira et al.
[9] have described a word clustering method using &quot;soft clusters&quot;, in which a word can belong to several clusters, with different cluster membership probabilities. Cluster creation has the advantage that the clusters are amenable to manual review and correction. On the other hand, our experience indicates that successful cluster generation depends on rather delicate adjustment of the clustering criteria. We have therefore elected to try an approach which directly uses a form of similarity measure to smooth (generalize) the probabilities.</Paragraph>
<Paragraph position="3"> Co-occurrence smoothing is a method which has been recently proposed for smoothing n-gram models [3] (footnote 3). The core of this method involves the computation of a co-occurrence matrix (a matrix of confusion probabilities) $P_C(w_j \mid w_i)$, which indicates the probability of word $w_j$ occurring in contexts in which word $w_i$ occurs, averaged over these contexts.</Paragraph>
<Paragraph position="5"> where the sum is over the set of all possible contexts $s$.</Paragraph>
<Paragraph position="6"> In applying this technique to the triples we have collected, we have initially chosen to generalize (smooth over) the first element of the triple.
Thus, in triples of the form word1 relation word2 we focus on word1, treating relation and word2 as the context:</Paragraph>
<Paragraph position="8"> Informally, we can say that a large value of $P_C(w_i \mid w_i')$ indicates that $w_i$ is selectionally (semantically) acceptable in the syntactic contexts where word $w_i'$ appears.</Paragraph>
<Paragraph position="9"> For example, looking at the verb &quot;convict&quot;, we see that the largest values of $P_C(\mathrm{convict}, x)$ are for $x$ = &quot;acquit&quot; and $x$ = &quot;indict&quot;, indicating that &quot;convict&quot; is selectionally acceptable in contexts where the words &quot;acquit&quot; or &quot;indict&quot; appear (see Figure 4 for a larger example).</Paragraph>
<Paragraph position="10"> How do we use this information to generalize the triples obtained from the corpus? Suppose we are interested in determining the acceptability of the pattern convict-object-owner, even though this triple does not appear in our training corpus. Since &quot;convict&quot; can appear in contexts in which &quot;acquit&quot; or &quot;indict&quot; appear, and the patterns acquit-object-owner and indict-object-owner appear in the corpus, we can conclude that the pattern convict-object-owner is acceptable too. More formally, we compute a smoothed triples frequency $F_S$ from the observed frequency $F$ by averaging over all words $w_i'$, incorporating frequency information for $w_i'$ to the extent that its contexts are also suitable contexts for $w_i$:
$$F_S(\langle w_i\ r\ w_j \rangle) = \sum_{w_i'} P_C(w_i \mid w_i') \cdot F(\langle w_i'\ r\ w_j \rangle)$$
[Footnote 3: We wish to thank Richard Schwartz of BBN for referring us to this method and article.]</Paragraph>
<Paragraph position="11"> In order to avoid the generation of confusion table entries from a single shared context (which quite often is the result of an incorrect parse), we apply a filter in generating $P_C$: for $i \neq j$, we generate a non-zero $P_C(w_j \mid w_i)$ only if $w_i$ and $w_j$ appear in at least two common contexts, and there is some common context in which both words occur at least twice. Furthermore, if the value computed by the formula for $P_C$ is less than some threshold $\tau_C$, the value is taken to be zero; we have used $\tau_C = 0.001$ in the experiments reported below. (These filters are not applied for the case $i = j$; the diagonal elements of the confusion matrix are always computed exactly.) Because these filters may yield an un-normalized confusion matrix (i.e., $\sum_j P_C(w_j \mid w_i) &lt; 1$), we renormalize the matrix so that $\sum_j P_C(w_j \mid w_i) = 1$.</Paragraph>
<Paragraph position="12"> A similar approach to pattern generalization, using a similarity measure derived from co-occurrence data, has been recently described by Dagan et al. [2]. Their approach differs from the one described here in two significant regards: their co-occurrence data is based on linear distance within the sentence, rather than on syntactic relations, and they use a different similarity measure, based on mutual information. The relative merits of the two similarity measures may need to be resolved empirically; however, we believe that there is a virtue to our non-symmetric measure, because substitutability in selectional contexts is not a symmetric relation (footnote 4).</Paragraph>
</Section>
</Paper>
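The co-occurrence smoothing of Section 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `confusion_matrix` and `smoothed_frequency` are hypothetical. Because the defining equation for $P_C$ did not survive in the text above, the sketch assumes $P_C(w_i \mid w_i') = \sum_{\langle r\ w_j \rangle} P(w_i \mid \langle r\ w_j \rangle) \cdot P(\langle r\ w_j \rangle \mid w_i')$, and it estimates the second factor from the total triple frequency of $w_i'$ as a stand-in for $F_{\mathrm{head}}(w_i')$. The filter, threshold, and renormalization follow the description in the text.

```python
from collections import defaultdict

def confusion_matrix(triples, tau=0.001):
    """Return pc[w_prime][w], an estimate of P_C(w | w_prime).

    triples maps (head, relation, argument) -> observed frequency F.
    """
    by_context = defaultdict(dict)  # (relation, argument) -> {head: F}
    by_head = defaultdict(dict)     # head -> {(relation, argument): F}
    for (head, rel, arg), freq in triples.items():
        by_context[(rel, arg)][head] = freq
        by_head[head][(rel, arg)] = freq

    pc = {}
    for w_prime, contexts in by_head.items():
        # ASSUMPTION: total triple frequency stands in for F_head(w_prime).
        total = sum(contexts.values())
        row = defaultdict(float)
        for ctx, f_prime in contexts.items():
            ctx_total = sum(by_context[ctx].values())
            for w, f_w in by_context[ctx].items():
                # P(w | context) * P(context | w_prime)
                row[w] += (f_w / ctx_total) * (f_prime / total)

        kept = {}
        for w, p in row.items():
            if w != w_prime:  # filters are not applied to the diagonal
                shared = [c for c in contexts if c in by_head[w]]
                both_twice = any(
                    contexts[c] >= 2 and by_head[w][c] >= 2 for c in shared
                )
                if len(shared) < 2 or not both_twice or p < tau:
                    continue
            kept[w] = p

        # Renormalize so the entries for each conditioning word sum to 1.
        norm = sum(kept.values())
        pc[w_prime] = {w: p / norm for w, p in kept.items()}
    return pc

def smoothed_frequency(triples, pc, head, rel, arg):
    """F_S(<head rel arg>) = sum over w' of P_C(head | w') * F(<w' rel arg>)."""
    return sum(
        pc[w_prime].get(head, 0.0) * freq
        for (w_prime, r, a), freq in triples.items()
        if r == rel and a == arg
    )

if __name__ == "__main__":
    # Toy frequencies, invented purely for illustration.
    toy = {
        ("acquit", "object", "owner"): 2,
        ("acquit", "object", "defendant"): 2,
        ("acquit", "subject", "jury"): 2,
        ("convict", "object", "defendant"): 2,
        ("convict", "subject", "jury"): 3,
        ("indict", "object", "owner"): 1,
    }
    pc = confusion_matrix(toy)
    print(smoothed_frequency(toy, pc, "convict", "object", "owner"))
```

In the toy data the triple convict-object-owner is never observed, yet it receives a non-zero smoothed frequency because &quot;convict&quot; shares two contexts with &quot;acquit&quot;. The word &quot;indict&quot; shares only one context with &quot;acquit&quot;, so the single-shared-context filter drops it.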