<?xml version="1.0" standalone="yes"?> <Paper uid="C00-1007"> <Title>Exploiting a Probabilistic Hierarchical Model for Generation</Title> <Section position="4" start_page="0" end_page="43" type="metho"> <SectionTitle> 2 Modeling Syntax </SectionTitle> <Paragraph position="0"> In order to model syntax, we use an existing wide-coverage grammar of English, the XTAG grammar developed at the University of Pennsylvania (XTAG-Group, 1999). XTAG is a tree-adjoining grammar (TAG) (Joshi, 1987a). In a TAG, the elementary structures are phrase-structure trees which are composed using two operations, substitution (which appends one tree at the frontier of another) and adjunction (which inserts one tree into the middle of another). In graphical representation, nodes at which substitution can take place are marked with down-arrows. In linguistic uses of TAG, we associate one lexical item (its anchor) with each tree, and one or (typically) more trees with each lexical item; as a result we obtain a lexicalized TAG or LTAG. Since each lexical item is associated with a whole tree (rather than just a phrase-structure rule, for example), we can specify both the predicate-argument structure of the lexeme (by including nodes at which its arguments must substitute) and morphosyntactic constraints such as subject-verb agreement within the structure associated with the lexeme. This property is referred to as TAG's extended domain of locality. Note that in an LTAG, there is no distinction between lexicon and grammar. We depart from XTAG in our treatment of trees for adjuncts (such as adverbs), and instead follow McDonald and Pustejovsky (1985).</Paragraph>
[Figure caption fragment: dotted lines show possible adjunctions that were not made.]
<Paragraph position="1"> While in XTAG the elementary tree for an adjunct contains phrase structure that attaches the adjunct to a node with a specified label (say, VP) in another tree, from a specified direction (say, from the left), in our system the trees for adjuncts simply express their active valency, but not how they connect to the lexical item they modify. This information is kept in the adjunction table which is associated with the grammar; an excerpt is shown in Figure 2. Trees that can adjoin to other trees (and have entries in the adjunction table) are called gamma-trees; the other trees (which can only be substituted into other trees) are alpha-trees.</Paragraph> <Paragraph position="2"> Note that we can refer to a tree by a combination of its name, called its supertag, and its anchor. For example, α1 is the supertag of an alpha-tree anchored by a noun that projects up to NP, while γ2 is the supertag of a gamma-tree anchored by a noun that only projects to N (we assume adjectives are adjoined at N) and, as the adjunction table shows, can right-adjoin to an N. Thus estimate paired with α1 is a particular tree in our LTAG grammar. Another supertag that a noun can be associated with represents the predicative use of a noun.1 Not all nouns are associated with all nominal supertags: the expletive there is only an α1. (Footnote 1: In the predicative case, the copula construction can be analyzed either with be as the head, as is more usual, or with doctor as the head, as is done in XTAG, because the be really behaves like an auxiliary, not like a full verb.)</Paragraph> <Paragraph position="3"> When we derive a sentence using an LTAG, we combine elementary trees from the grammar using adjunction and substitution. For example, to derive the sentence There was no cost estimate for the second phase from the grammar in Figure 1, we substitute the tree for there into the tree for estimate. We then adjoin in the trees for the auxiliary was, the determiner no, and the modifying noun cost. Note that these adjunctions occur at different nodes: at VP, NP, and N, respectively. We then adjoin in the preposition for, into which we substitute phase, into which we adjoin the and second. Note that all adjunctions are by gamma-trees, and all substitutions by alpha-trees.</Paragraph>
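To make the derivation bookkeeping concrete, the following minimal sketch in Python combines elementary trees by substitution and adjunction and records the result as a derivation tree. The class and function names, and all supertag labels other than α1 (alpha1) for there and γ2 (gamma2) for cost, are invented for illustration and do not reflect the actual XTAG or FERGUS data structures; adjunction addresses are omitted, as in Figure 3.

# A minimal sketch (not the actual XTAG/FERGUS machinery) of lexicalized
# elementary trees and the derivation-tree record built up as trees combine.
# Supertag labels other than alpha1/gamma2 are invented; substitution and
# adjunction addresses are omitted, as in Figure 3.

class ElementaryTree:
    def __init__(self, anchor, supertag, kind):
        # kind is "alpha" (initial tree, combines by substitution)
        # or "gamma" (modifier tree, combines by adjunction)
        self.anchor = anchor
        self.supertag = supertag
        self.kind = kind
        self.children = []  # trees substituted or adjoined into this one

def substitute(host, tree):
    assert tree.kind == "alpha", "only alpha-trees substitute"
    host.children.append(tree)

def adjoin(host, tree):
    assert tree.kind == "gamma", "only gamma-trees adjoin"
    host.children.append(tree)

# Derivation of "There was no cost estimate for the second phase".
estimate = ElementaryTree("estimate", "alpha_pred", "alpha")  # label invented
there    = ElementaryTree("there", "alpha1", "alpha")
was      = ElementaryTree("was", "gamma_aux", "gamma")
no       = ElementaryTree("no", "gamma_det", "gamma")
cost     = ElementaryTree("cost", "gamma2", "gamma")
for_     = ElementaryTree("for", "gamma_prep", "gamma")
phase    = ElementaryTree("phase", "alpha1", "alpha")
the      = ElementaryTree("the", "gamma_det", "gamma")
second   = ElementaryTree("second", "gamma_adj", "gamma")

substitute(estimate, there)             # there fills an argument slot
for modifier in (was, no, cost, for_):  # adjunctions (addresses omitted here)
    adjoin(estimate, modifier)
substitute(for_, phase)
adjoin(phase, the)
adjoin(phase, second)

def show(tree, depth=0):
    print("  " * depth + tree.anchor + " (" + tree.supertag + ")")
    for child in tree.children:
        show(child, depth + 1)

show(estimate)  # prints a dependency-like derivation tree rooted at "estimate"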
<Paragraph position="4"> If we want to represent this derivation graphically, we can do so in a derivation tree, which we obtain as follows: whenever we adjoin or substitute a tree t1 into a tree t2, we add a new daughter labeled t1 to the node labeled t2. As explained above, the name of each tree used is the lexeme along with the supertag. (We omit the address at which substitution or adjunction takes place.) The derivation tree for our derivation is shown in Figure 3. As can be seen, this structure is a dependency tree and resembles a representation of lexical argument structure.</Paragraph> <Paragraph position="5"> Joshi (1987b) claims that TAG's properties make it particularly suited as a syntactic representation for generation. Specifically, its extended domain of locality is useful in generation for localizing syntactic properties (including word order as well as agreement and other morphological processes), and lexicalization is useful for providing an interface from semantics (the derivation tree represents the sentence's predicate-argument structure). Indeed, LTAG has been used extensively in generation.</Paragraph> <Paragraph position="6"> [Figure 3: the derivation tree for the example sentence, rooted at estimate, with daughters there, was, no, cost, and for, each paired with its supertag.] </Paragraph> </Section> <Section position="5" start_page="43" end_page="44" type="metho"> <SectionTitle> 3 System Overview </SectionTitle> <Paragraph position="0"> FERGUS is composed of three modules: the Tree Chooser, the Unraveler, and the Linear Precedence (LP) Chooser. The input to the system is a dependency tree as shown in Figure 4. Note that the nodes are labeled only with lexemes, not with supertags.2 The Tree Chooser then uses a stochastic tree model to choose TAG trees for the nodes in the input structure. This step can be seen as analogous to "supertagging" (Bangalore and Joshi, 1999), except that now supertags (i.e., names of trees) must be found for words in a tree rather than for words in a linear sequence. The Unraveler then uses the XTAG grammar to produce a lattice of all possible linearizations that are compatible with the supertagged tree and the XTAG grammar. The LP Chooser then chooses the most likely traversal of this lattice, given a language model. We discuss the three components in more detail. (Footnote 2: In the system that we used in the experiments described in Section 4, all words (including function words) need to be present in the input representation, fully inflected. This is of course unrealistic for applications. In this paper, we only aim to show that the use of a Tree Model improves the performance of a stochastic generator. See Section 6 for further discussion.)</Paragraph>
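The following toy illustration in Python shows the kind of data handed between the modules: the input dependency tree carries only lexemes, and the Tree Chooser adds one supertag per node. The dictionary format, the helper names, and the stand-in chooser are invented for this sketch and are not the system's actual interfaces.

# A toy illustration of the data handed between the modules.  The dictionary
# format, helper names, and the stand-in chooser below are invented; only the
# idea (input nodes carry lexemes only, the Tree Chooser adds a supertag to
# each node) comes from the text above.

# Input to FERGUS (cf. Figure 4): a dependency tree labeled only with lexemes.
input_tree = {
    "lexeme": "estimate",
    "deps": [
        {"lexeme": "there", "deps": []},
        {"lexeme": "was", "deps": []},
        {"lexeme": "no", "deps": []},
        {"lexeme": "cost", "deps": []},
        {"lexeme": "for", "deps": [
            {"lexeme": "phase", "deps": [
                {"lexeme": "the", "deps": []},
                {"lexeme": "second", "deps": []},
            ]},
        ]},
    ],
}

def add_supertags(node, chooser):
    """What the Tree Chooser contributes: one supertag (tree name) per node."""
    node["supertag"] = chooser(node["lexeme"])
    for dep in node["deps"]:
        add_supertags(dep, chooser)

# A fake chooser standing in for the stochastic tree model of Section 3.
add_supertags(input_tree,
              chooser=lambda lex: "alpha" if lex in ("estimate", "there", "phase") else "gamma")
print(input_tree["lexeme"], input_tree["supertag"])  # -> estimate alpha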
<Paragraph position="1"> The Tree Chooser draws on a tree model, which is a representation of the XTAG derivations for 1,000,000 words of the Wall Street Journal.3 The Tree Chooser makes the simplifying assumption that the choice of a tree for a node depends only on its daughter nodes, thus allowing for a top-down dynamic programming algorithm. Specifically, a node η in the input structure is assigned a supertag s so that the probability of finding the treelet composed of η with supertag s and all of its daughters (as found in the input structure) is maximized, and such that s is compatible with η's mother and her supertag sm. Here, "compatible" means that the tree represented by s can be adjoined or substituted into the tree represented by sm, according to the XTAG grammar. For our example sentence, the input to the system is the tree shown in Figure 4, and the output from the Tree Chooser is the tree shown in Figure 3. Note that while a derivation tree in TAG fully specifies a derivation and thus a surface sentence, the output from the Tree Chooser does not. (Footnote 3: The tree model was constructed from the Penn Tree Bank using some heuristics, since the Penn Tree Bank does not contain full head-dependent information; as a result of the use of heuristics, the Tree Model is not fully correct.)</Paragraph> <Paragraph position="2"> There are two reasons. Firstly, as explained at the end of Section 2, for us the trees corresponding to adjuncts are underspecified with respect to the adjunction site and/or the adjunction direction (from left or from right) in the tree of the mother node, or they may be unordered with respect to other adjuncts (for example, the famous adjective ordering problem). Secondly, supertags may have been chosen incorrectly or not at all.</Paragraph> <Paragraph position="3"> The Unraveler takes as input the semi-specified derivation tree (Figure 3) and produces a word lattice. Each node in the derivation tree consists of a lexical item and a supertag. The linear order of the daughters with respect to the head position of a supertag is specified in the XTAG grammar. This information is consulted to order the daughter nodes with respect to the head at each level of the derivation tree. In cases where a daughter node can be attached at more than one place in the head supertag (as is the case in our example for was and for), a disjunction of all these positions is assigned to the daughter node. A bottom-up algorithm then constructs a lattice that encodes the strings represented by each level of the derivation tree. The lattice at the root of the derivation tree is the result of the Unraveler. The resulting lattice for the example sentence is shown in Figure 6.</Paragraph> <Paragraph position="4"> The lattice output from the Unraveler encodes all possible word sequences permitted by the derivation structure. We rank these word sequences in the order of their likelihood by composing the lattice with a finite-state machine representing a trigram language model. This model has been constructed from 1,000,000 words of the Wall Street Journal corpus. We pick the best path through the lattice resulting from the composition using the Viterbi algorithm, and this top-ranking word sequence is the output of the LP Chooser.</Paragraph>
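As a concrete illustration of the LP Chooser's final step, here is a toy Python sketch that finds the highest-scoring path through a small word lattice. The lattice contents, the scoring table, and the use of a bigram score in place of the trigram model and finite-state composition described above are all simplifications made up for this example.

# A toy sketch of the LP Chooser's last step: ranking the linearizations
# encoded in a word lattice and returning the best one.  The lattice, the
# stand-in bigram scorer, and all numbers are invented for illustration.

from collections import defaultdict
from functools import lru_cache

# Arcs of an acyclic word lattice: (from_state, word, to_state).
# The two paths differ only in the relative order of "no" and "cost".
ARCS = [
    (0, "there", 1), (1, "was", 2),
    (2, "no", 3), (3, "cost", 4),        # "... was no cost estimate"
    (2, "cost", 30), (30, "no", 4),      # "... was cost no estimate"
    (4, "estimate", 5),
]
START, END = 0, 5

def logprob(prev, word):
    # Stand-in for a real language model: prefers "no cost" over "cost no".
    table = {("was", "no"): -1.0, ("no", "cost"): -1.0,
             ("was", "cost"): -3.0, ("cost", "no"): -3.0}
    return table.get((prev, word), -2.0)

out = defaultdict(list)
for src, word, dst in ARCS:
    out[src].append((word, dst))

@lru_cache(maxsize=None)
def best_from(state, prev):
    """Best-scoring continuation (score, words) from state to the end,
    given the previously emitted word."""
    if state == END:
        return 0.0, ()
    options = []
    for word, nxt in out[state]:
        score, rest = best_from(nxt, word)
        options.append((logprob(prev, word) + score, (word,) + rest))
    return max(options)

score, words = best_from(START, "<s>")
print(" ".join(words))  # -> "there was no cost estimate"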
</Section> <Section position="6" start_page="44" end_page="46" type="metho"> <SectionTitle> 4 Experiments and Results </SectionTitle> <Paragraph position="0"> In order to show that the use of a tree model and a grammar does indeed help performance, we performed three experiments: * For the baseline experiment, we impose a random tree structure for each sentence of the corpus and build a Tree Model whose parameters consist of whether a lexeme ld precedes or follows her mother lexeme lm. We call this the Baseline Left-Right (LR) Model. This model generates There was estimate for phase the second no cost . for our example input.</Paragraph> <Paragraph position="1"> * In the second experiment, we derive the parameters for the LR model from an annotated corpus, in particular, the XTAG derivation tree corpus. This model generates There no estimate for the second phase was cost . for our example input.</Paragraph> <Paragraph position="2"> * In the third experiment, as described in Section 3, we employ the supertag-based tree model whose parameters consist of whether a lexeme ld with supertag sd is a dependent of lm with supertag sm. Furthermore, we use the supertag information provided by the XTAG grammar to order the dependents. This model generates There was no cost estimate for the second phase . for our example input, which is indeed the sentence found in the WSJ.</Paragraph> <Paragraph position="3"> As in the case of machine translation, evaluation in generation is a complex issue. We use two metrics suggested in the MT literature (Alshawi et al., 1998) based on string edit distance between the output of the generation system and the reference corpus string from the WSJ. These metrics, simple accuracy and generation accuracy, allow us to evaluate without human intervention, automatically and objectively.4 Simple accuracy is computed from the number of insertion (I), deletion (D) and substitution (S) errors between the target language strings in the test corpus and the strings produced by the generation model. The metric is summarized in Equation (1), where R is the number of tokens in the target string. This metric is similar to the string distance metric used for measuring speech recognition accuracy.</Paragraph> <Paragraph position="4"> SimpleAccuracy = 1 - (I + D + S) / R    (1) </Paragraph> <Paragraph position="5"> Unlike speech recognition, the task of generation involves reordering of tokens. The simple accuracy metric, however, penalizes a misplaced token twice, as a deletion from its expected position and an insertion at a different position. We use a second metric, Generation Accuracy, shown in Equation (2), which treats the deletion of a token at one location in the string and the insertion of the same token at another location in the string as one single movement error (M). This is in addition to the remaining insertions (I') and deletions (D').</Paragraph> <Paragraph position="6"> GenerationAccuracy = 1 - (M + I' + D' + S) / R    (2) </Paragraph>
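The two metrics can be sketched as follows in Python. The edit-distance computation is standard; pairing a deletion with an insertion of the same token into a single movement is one straightforward reading of the definition above, not necessarily the exact procedure behind the reported numbers.

# A small sketch of the metrics in Equations (1) and (2).  The way deletions
# and insertions are paired into "movements" is an assumption made for this
# illustration.

from collections import Counter

def edit_ops(reference, hypothesis):
    """Return (insertions, deletions, substitution count) of a minimal-cost alignment."""
    n, m = len(reference), len(hypothesis)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = dp[i - 1][j - 1] + (reference[i - 1] != hypothesis[j - 1])
            dp[i][j] = min(diag, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    # Backtrace to recover the individual operations.
    ins, dels, subs = [], [], 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (reference[i - 1] != hypothesis[j - 1]):
            subs += reference[i - 1] != hypothesis[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            dels.append(reference[i - 1])
            i -= 1
        else:
            ins.append(hypothesis[j - 1])
            j -= 1
    return ins, dels, subs

def simple_accuracy(reference, hypothesis):
    ins, dels, subs = edit_ops(reference, hypothesis)
    return 1 - (len(ins) + len(dels) + subs) / len(reference)      # Equation (1)

def generation_accuracy(reference, hypothesis):
    ins, dels, subs = edit_ops(reference, hypothesis)
    # A deletion and an insertion of the same token count as one movement (M).
    moves = sum((Counter(ins) & Counter(dels)).values())
    i_rest, d_rest = len(ins) - moves, len(dels) - moves
    return 1 - (moves + i_rest + d_rest + subs) / len(reference)   # Equation (2)

ref = "there was no cost estimate for the second phase".split()
hyp = "there was estimate for phase the second no cost".split()
print(simple_accuracy(ref, hyp), generation_accuracy(ref, hyp))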
<Paragraph position="7"> The simple accuracy, generation accuracy, and the average time for generation of each test sentence for the three experiments are tabulated in Table 1. The test set consisted of 100 randomly chosen WSJ sentences with an average length of 16 words. As can be seen, the supertag-based model improves over the LR model derived from annotated data, and both models improve over the baseline LR model.</Paragraph> <Paragraph position="8"> Supertags incorporate richer information such as the argument/adjunct distinction and the number and types of arguments. We expect to improve the performance of the supertag-based model by taking these features into account.</Paragraph> <Paragraph position="9"> In ongoing work, we have developed tree-based metrics in addition to the string-based metrics presented here, in order to evaluate stochastic generation models. We have also attempted to correlate these quantitative metrics with human qualitative judgements. A detailed discussion of these experiments and results is presented in (Bangalore et al., 2000).</Paragraph> </Section> <Section position="7" start_page="46" end_page="47" type="metho"> <SectionTitle> 5 Comparison with Langkilde &amp; Knight </SectionTitle> <Paragraph position="0"> Langkilde and Knight (1998a) use a hand-crafted grammar that maps semantic representations to sequences of words with linearization constraints. A complex semantic structure is translated to a lattice, and a bigram language model then chooses among the possible surface strings encoded in the lattice.</Paragraph> <Paragraph position="1"> The system of Langkilde &amp; Knight, Nitrogen, is similar to FERGUS in that generation is divided into two phases, the first of which results in a lattice from which a surface string is chosen during the second phase using a language model (in our case a trigram model, in Nitrogen's case a bigram model). However, the first phases are quite different. In FERGUS, we start with a lexical predicate-argument structure, while in Nitrogen a more semantic input is used. FERGUS could easily be augmented with a preprocessor that maps a semantic representation to our syntactic input; this is not the focus of our research. However, there are two more important differences. First, the hand-crafted grammar in Nitrogen maps directly from semantics to a linear representation, skipping the arborescent representation usually favored for the representation of syntax. There is no stochastic tree model, since there are no trees. In FERGUS, initial choices are made stochastically based on the tree representation in the Tree Chooser. This allows us to capture stochastically certain long-distance effects which n-grams cannot, such as the separation of parts of a collocation (such as perform an operation) through interposing adjuncts (John performed a long, somewhat tedious, and quite frustrating operation on his border collie). Second, the hand-crafted grammar used in FERGUS was crafted independently from the need for generation and is a purely declarative representation of English syntax. As such, we can use it to handle morphological effects such as agreement, which cannot in general be done by an n-gram model and which are, at the same time, descriptively straightforward and handled by all non-stochastic generation modules.</Paragraph> </Section> </Paper>