<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-2141">
  <Title>COMPUTING FIRST AND FOLLOW FUNCTIONS FOR FEATURE-THEORETIC GRAMMARS</Title>
  <Section position="3" start_page="875" end_page="875" type="metho">
    <SectionTitle>
2 COMPUTING FIRST AND
FOLLOW
</SectionTitle>
    <Paragraph position="0"> We propose an algorithm for the computation of FIRST values which handles feature-theoretic grammars without having to extract a CF backbone from them; the approach is easily adapted to compute FOLLOW values too.</Paragraph>
    <Paragraph position="1"> An improvement to the algorithm is presented towards the end of the paper. Before describing the algorithm, we give a well known procedure for computing FIRST for CF grammars (taken from (Aho et al., 1986), p. 189, where ε is the empty string): "To compute FIRST(X) for all grammar symbols X, apply the following rules until no more terminals or ε can be added to any FIRST set.</Paragraph>
    <Paragraph position="2"> 1. If X is terminal, then FIRST(X) is {X}.</Paragraph>
    <Paragraph position="3"> 2. If X → ε is a production, then add ε to FIRST(X).</Paragraph>
    <Paragraph position="4"> 3. If X is nonterminal and X → Y1Y2...Yk is a production, then place a in FIRST(X) if for some i, a is in FIRST(Yi), and ε is in all of FIRST(Y1), ..., FIRST(Yi-1). If ε is in FIRST(Yj) for all j = 1, ..., k, then add ε to FIRST(X).</Paragraph>
    <Paragraph position="6"> Now, we can compute FIRST for any string X1X2...Xn as follows. Add to FIRST(X1X2...Xn) all of the non-ε symbols of FIRST(X1). Also add the non-ε symbols of FIRST(X2) if ε is in FIRST(X1), the non-ε symbols of FIRST(X3) if ε is in both FIRST(X1) and FIRST(X2), and so on. Finally, add ε to FIRST(X1X2...Xn) if, for all i, FIRST(Xi) contains ε." This algorithm will form the basis of our proposal.</Paragraph>
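    <Paragraph> The quoted procedure can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the names compute_first, first_of_string and EPS are hypothetical, productions are given as (lhs, rhs) tuples with an empty rhs standing for X → ε, and every symbol used on a right-hand side is either a listed terminal or the mother of some production.
<![CDATA[
```python
EPS = "ε"  # assumed marker for the empty string


def first_of_string(symbols, first):
    """FIRST of a string X1...Xn, given FIRST sets for the single symbols."""
    result = set()
    for sym in symbols:
        result |= first[sym] - {EPS}      # non-ε symbols of FIRST(Xi)
        if EPS not in first[sym]:         # Xi cannot vanish, so stop here
            return result
    result.add(EPS)                       # every Xi (or none at all) derives ε
    return result


def compute_first(productions, terminals):
    """Apply rules 1-3 until no FIRST set grows any further."""
    first = {t: {t} for t in terminals}   # rule 1
    for lhs, _ in productions:
        first.setdefault(lhs, set())
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            before = len(first[lhs])
            first[lhs] |= first_of_string(rhs, first)   # rules 2 and 3
            changed |= len(first[lhs]) != before
    return first
```
]]>
    Treating the right-hand side of a production as a string lets one helper cover rules 2 and 3 as well as the final string computation; a production with an empty right-hand side simply contributes ε.</Paragraph>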
  </Section>
  <Section position="4" start_page="875" end_page="876" type="metho">
    <SectionTitle>
3 COMPILING FEATURE-THEORETIC GRAMMARS
3.1 EQUIVALENCE CLASSES
</SectionTitle>
    <Paragraph position="0"> The main reason why the above algorithm cannot be used with feature-theoretic grammars is that in general the number of possible nonterminals allowed by the grammar is infinite. One of the simplest ways of showing this is where a grammar accumulates the orthographic representation of its terminals as one of its feature values. It is not difficult to see how one can have an infinite number of NPs in such a grammar: NP[orth: the dog], NP[orth: the fat dog], NP[orth: the big fat dog], etc.</Paragraph>
    <Paragraph position="1"> This means that FIRST(NP[orth: the dog]) would have a different value to FIRST(NP[orth: the fat dog]) even though they share the same leftmost terminal. That is, the feature structure for the substring "det adj noun" will be different to that for "det noun" even though they have the same starting symbol.</Paragraph>
    <Paragraph position="2"> This point is important since similar situations arise with the subcategorization frame of verbs and the semantic value of categories in contemporary theories of grammar (Pollard and Sag, 1992). Without modification, the algorithm above would not terminate.</Paragraph>
    <Paragraph position="3"> The solution to this problem is to define a finite number of equivalence classes into which the infinite number of nonterminals may be sorted. These classes may be established in a number of ways; the one we have adopted is that presented by (Harrison and Ellison, 1992) which builds on the work of (Shieber, 1985): it introduces the notion of a negative restrictor to define equivalence classes. In this solution a predefined portion of a category (a specific set of paths) is discarded when determining whether a category belongs to an equivalence class or not. For instance, in the above example we could define the negative restrictor to be {orth}. Applying this negative restrictor to each of the three NPs above would discard the information in the 'orth' feature to give us three equivalent nonterminals. It is clear that the restrictor must be such that it discards features which in one way or another give rise to an infinite number of nonterminals. Unfortunately, termination is not guaranteed for all restrictors, and furthermore, the best restrictor cannot be chosen automatically since it depends on the amount of grammatical information that is to be preserved. Thus, selection of an appropriate restrictor will depend on the particular grammar or system used.</Paragraph>
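    <Paragraph> As an illustration of how a negative restrictor collapses the three NPs into one class, consider the following Python sketch. It is not the implementation used here: categories are assumed to be plain nested dictionaries without reentrancy, a restrictor is assumed to be a set of paths written as tuples of feature names, and the helper names restrict and freeze are hypothetical.
<![CDATA[
```python
def restrict(category, restrictor):
    """Return a copy of `category` with every path listed in `restrictor` removed."""
    def prune(node, prefix):
        if not isinstance(node, dict):
            return node                    # atomic value: nothing to prune
        return {
            feat: prune(val, prefix + (feat,))
            for feat, val in node.items()
            if prefix + (feat,) not in restrictor
        }
    return prune(category, ())


def freeze(node):
    """Hashable form of a restricted category, usable as an equivalence-class key."""
    if isinstance(node, dict):
        return tuple(sorted((feat, freeze(val)) for feat, val in node.items()))
    return node


nps = [
    {"cat": "NP", "orth": "the dog"},
    {"cat": "NP", "orth": "the fat dog"},
    {"cat": "NP", "orth": "the big fat dog"},
]
# With the negative restrictor {orth}, all three NPs share a single key.
classes = {freeze(restrict(np, {("orth",)})) for np in nps}
```
]]>
    Because the restrictor is a set of paths rather than of top-level features, the same helper can discard an embedded value such as agr.per while keeping the rest of agr.</Paragraph>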
    <Section position="1" start_page="876" end_page="876" type="sub_section">
      <SectionTitle>
3.2 VALUE SHARING
</SectionTitle>
      <Paragraph position="0"> Another problem with the algorithm above is that reentrancies between a category and its FIRST and FOLLOW values are not preserved in the solution to these functions; this is because the algorithm assumes atomic symbols and these cannot encode explicitly shared information between categories. For example, consider the following naive grammar: S → NP[agr: X] VP[agr: X]; VP[agr: X] → Vint[agr: X]; NP[agr: X] → Det N[agr: X]. We would like the solution of FOLLOW(N) to include the binding of the 'agr' feature such that the value of FOLLOW resembled: FOLLOW(N[agr: X]) = VP[agr: X].</Paragraph>
      <Paragraph position="1"> The algorithm above, even with a restrictor, would not preserve such a binding since the addition of a new category to FOLLOW(N) is done independently of the bindings between the new category and N.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="876" end_page="878" type="metho">
    <SectionTitle>
4 THE BASIC ALGORITHM
</SectionTitle>
    <Paragraph position="0"> We propose an algorithm which, rather than construct a set of categories as the value of FIRST and FOLLOW, constructs a set of pairs each of which represents a category and its FIRST or FOLLOW category, with all the correct bindings explicitly encoded. For instance, for the above example, the pair (VP[agr: X], Vint[agr: X]) would be in the set representing the value of the function FIRST. In the next section the algorithm for computing FIRST is described; computation of FOLLOW proceeds in a similar fashion.</Paragraph>
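    <Paragraph> To see why a pair can carry a binding that two separate atomic symbols cannot, the following Python sketch may help. It is a toy model under our own assumptions: the class Var and the function make_first_pair are hypothetical names, and a reentrancy is modelled simply as two features pointing at the same variable object.
<![CDATA[
```python
class Var:
    """A feature-value variable; two features that share one Var are reentrant."""

    def __init__(self, name):
        self.name = name
        self.value = None          # None means "still unbound"

    def bind(self, value):
        # Binding succeeds only if the variable is unbound or already has this value.
        if self.value not in (None, value):
            raise ValueError(f"{self.name} is already bound to {self.value!r}")
        self.value = value


def make_first_pair():
    """Build the pair (VP[agr: X], Vint[agr: X]) with X shared by both members."""
    x = Var("X")
    mother = {"cat": "VP", "agr": x}
    first_cat = {"cat": "Vint", "agr": x}
    return mother, first_cat
```
]]>
    Binding agr on either member of the pair is immediately visible on the other, which is exactly the information lost when VP and Vint are stored as unrelated atomic symbols.</Paragraph>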
    <Section position="1" start_page="876" end_page="877" type="sub_section">
      <SectionTitle>
4.1 SOLVING FIRST
</SectionTitle>
      <Paragraph position="0"> When modifying the algorithm of Section 2 we note that each occurrence of a category in the grammar is potentially distinct from every other category. In addition, for each category we need to remember all the reentrancies between it and the daughters within the rule in which it occurs. Finally, we assume that any category in a rule which can unify with a lexical category is marked in some way, say by using the feature-value pair 'ter: +', and that non-terminal categories must unify with the mother of some rule in the grammar; the latter condition is necessary because the algorithm only computes the solution of FIRST for lexical categories or for categories that occur as mothers.</Paragraph>
      <Paragraph position="1"> In computing FIRST we iterate over all the rules in the grammar, treating the mother of each rule as the category for which we are trying to find a FIRST value. Throughout each iteration, unification of a daughter with the lhs of an element of FIRST results in a modified rule and a modified pair in which bindings between the mother category and the rhs of the pair are established. The modified mother and rhs are then used to construct the pair which is added to FIRST. For instance, given rule X → Y and pair (L, R), we unify Y and L to give X' → Y' and (L', R'); from these the pair (X', R') is constructed and added to FIRST. The algorithm assumes an operation +&lt;</Paragraph>
      <Paragraph position="2"> which constructs a set S' = S +&lt; p in the following way: if pair p subsumes an element a of S then S' = S - a + p; if p is subsumed by an element of S, then S' = S; else S' = S + p. It should be noted that the pairs constituting the value of FIRST can themselves be compared using the subsumption relation in which reentrant values are subsumed by nonreentrant ones, and combined using the unification operation. Thus in the principal step of the algorithm, a new pair is constructed as described above, a restrictor is applied to it, and the resulting, restricted pair is +&lt;-added to FIRST. The algorithm is as follows:  1. Initialise First = {}.</Paragraph>
      <Paragraph position="3"> 2. Run through all the daughters in the grammar. If X is pre-terminal, then First = First +&lt; (X, X)!Φ (where (X, X)!Φ means apply the negative restrictor Φ to the pair (X, X)).</Paragraph>
      <Paragraph position="4"> 3. For each rule in the grammar with mother X, apply steps 4 and 5 until no more changes are made to First.</Paragraph>
      <Paragraph position="5"> 4. If the rule is X → ε, then First = First +&lt; (X, ε)!Φ.</Paragraph>
      <Paragraph position="6"> 5. If the rule is X → Y1..Yi..Yk, then First = First +&lt; (X', a)!Φ if (Y'i, a) has successfully unified with an element of First, and (Y'1, ε1)...(Y'i-1, εi-1) have all successfully and simultaneously unified with members of First. Also, First = First +&lt; (X', ε)!Φ if (Y'1, ε1)...(Y'k, εk) have all successfully and simultaneously unified with elements of First.</Paragraph>
      <Paragraph position="7"> 6. Now, for any string of categories X1..Xi..Xn, First = First +&lt; (X'1...X'n, a)!Φ if (X'1, a) has successfully unified with an element of First, and a ≠ ε. Also, for i = 2..n, First = First +&lt; (X'1...X'n, a)!Φ</Paragraph>
      <Paragraph position="9"> if (X'i, a) has successfully unified with an element of First, a ≠ ε, and (X'1, ε1)...(X'i-1, εi-1) have all successfully and simultaneously unified with members of First. Finally, First = First +&lt; (X'1...X'n, ε)!Φ if (X'1, ε1)...(X'n, εn) have all successfully and simultaneously unified with members of First. (This step may be computed on demand).</Paragraph>
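      <Paragraph> The subsumption-based addition used throughout the steps above can be sketched in Python as follows. This is a simplified illustration, not the implementation described here: categories are assumed to be flat dictionaries of atomic values, reentrancy is ignored, and the names subsumes, pair_subsumes and add_pair are hypothetical. The sketch tests first whether the new pair is already subsumed, so that re-adding an existing pair is reported as no change, which the fixed-point loop of step 3 relies on.
<![CDATA[
```python
def subsumes(general, specific):
    """True if every feature-value pair of `general` also holds in `specific`."""
    return all(specific.get(feat) == val for feat, val in general.items())


def pair_subsumes(p, q):
    """Pair p subsumes pair q if it does so on both the lhs and the rhs."""
    return subsumes(p[0], q[0]) and subsumes(p[1], q[1])


def add_pair(pairs, p):
    """Sketch of S' = S +< p; returns the new list and whether anything changed."""
    if any(pair_subsumes(q, p) for q in pairs):
        return pairs, False                     # p is already covered by S
    kept = [q for q in pairs if not pair_subsumes(p, q)]   # drop what p generalises
    return kept + [p], True
```
]]>
      Returning a change flag alongside the set makes it straightforward to repeat steps 4 and 5 until a complete pass over the rules leaves First untouched.</Paragraph>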
      <Paragraph position="10"> One observation on this algorithm is in order. The last action of steps 5 and 6 adds ε as a possible value of FIRST for a mother category or a string of categories; such a value results when all daughters or categories have ε as their FIRST value. Since most grammatical descriptions assign a category to ε (e.g. to bind onto it information necessary for correct gap threading), the pairs (X', ε) or (X'1...X'n, ε) should have bindings between their two elements; this creates the problem of deciding which of the εs in the FIRST pairs to use, since it is possible in principle that each of these will have a different value for ε. In our implementation, the pair added to First in these situations consists of the mother category or the string of categories and the most general category for ε as defined by the grammar, thus effectively ignoring any bindings that ε may have within the constructed pair. A more accurate solution would have been to compute multiple pairs with ε, construct their least upper bound, and then add this to First. However, in our implementation this solution has not proven necessary.</Paragraph>
    </Section>
    <Section position="2" start_page="877" end_page="878" type="sub_section">
      <SectionTitle>
4.2 EXAMPLE
</SectionTitle>
      <Paragraph position="0"> Assuming the grammar in Fig. 1 and the negative restrictor Φ = {slash}, the following is a simplified run through the algorithm: ... grammar will unify with the lhs of (NP, ε) and hence S will have Vtra as part of its FIRST value. First = {..., (VP[agr: X], Vtra[agr: X]), (NP, Det), (NP, ε), (S, Det), (S, Vtra)} * The next iteration adds nothing and the first stage of the algorithm terminates.</Paragraph>
      <Paragraph position="1"> The second stage (step 6) is done on demand, for example to compute state transitions for a parsing table, in order to avoid the expense of computing FIRST for all possible substrings of categories. For instance, to compute FIRST for the string [NP NP VP] the algorithm works as follows:</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="878" end_page="878" type="metho">
    <SectionTitle>
5 IMPROVING THE SEARCH
THROUGH First
</SectionTitle>
    <Paragraph position="0"> If the algorithm is run as presented, each iteration through the grammar rules becomes slower and slower. The reason is that, in step 5, when searching First to create a new pair (X', a), every pair in First is considered and unification of its lhs with the relevant daughter of X attempted. Since each iteration normally adds pairs to First, each iteration involves a search through a larger and larger set; furthermore, this search involves unification, and in the case of a successful match, the subsequent construction and addition to First also requires subsumption checks. All of these operations combine to make each additional element in First have a strong effect on the performance of the algorithm. We therefore need to minimize the number of pairs searched.</Paragraph>
    <Paragraph position="1"> Considering the dependencies that exist between pairs in First, one notices that once a pair has been considered in relation with all the rules in the grammar, the effect of that pair has been completely determined. That is, after a pair is added to First, it need only be considered up to and including the rule from which it was derived, after which time it may be excluded from further searches. For example, take the previous grammar, and in particular the value of First after the first iteration through the algorithm. The pair (NP, Det), added because of the rule NP[agr: X, slash:</Paragraph>
    <Paragraph position="3"> considered only once by every rule in the grammar; after that, this pair cannot be involved in the construction of new values.</Paragraph>
    <Paragraph position="4"> A simple data structure which keeps track of those pairs that need to be searched at any one time was added to the algorithm; the data structure took the form of a list of pointers to active pairs in First, where an active pair is one which has not been considered by the rule from which it was constructed. For example, the pair (NP, Det) would be active for a complete iteration from the moment the corresponding rule introduced it until that rule is visited again during the second iteration. The effect of this policy is to allow each pair in First to be tested against each rule exactly once and then be excluded from subsequent searches; this greatly reduces the number of pairs considered for each iteration.</Paragraph>
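    <Paragraph> The active-pair bookkeeping can be sketched in Python as below. The sketch makes strong simplifying assumptions that are ours alone: categories are atomic strings, the grammar has no ε-rules (so only the first daughter of a rule matters), unification is replaced by equality, and the name first_pairs_active is hypothetical. Each active entry records the index of the rule that introduced it and is retired once that rule has been visited again; the seed pairs for pre-terminals are treated as if the last rule had introduced them, so that they meet every rule once.
<![CDATA[
```python
def first_pairs_active(rules, preterminals):
    """rules: list of (mother, daughters) with non-empty daughters (ε-free grammar)."""
    first = set()
    active = []                      # entries: (pair, index of the introducing rule)
    for p in preterminals:
        first.add((p, p))
        active.append(((p, p), len(rules) - 1))
    considered = 0                   # number of (pair, rule) tests performed
    r = 0
    while active and rules:
        mother, daughters = rules[r]
        old = list(active)           # pairs that were active when rule r is visited
        for (lhs, a), _ in old:
            considered += 1
            if lhs == daughters[0] and (mother, a) not in first:
                first.add((mother, a))
                active.append(((mother, a), r))
        # Pairs introduced by rule r on an earlier visit have now met every rule.
        active = [e for e in active if not (e[1] == r and e in old)]
        r = (r + 1) % len(rules)
    return first, considered
```
]]>
    In this simplified setting every pair is tested against every rule exactly once, so the number of tests equals the number of pairs times the number of rules, instead of growing with the size of First on each pass.</Paragraph>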
    <Paragraph position="5"> Using the Typed Feature Structure system (the LKB) of (Briscoe et al., 1993), we wrote two grammars and tested the algorithm on them. Table 1 shows the average number of pairs considered for each iteration compared ... As we can see, after the first iteration the number of pairs that needs to be considered is less (much less for the final iteration) than the total number of pairs in First. Similar improvements in performance were obtained for the computation of FOLLOW.</Paragraph>
  </Section>
</Paper>