<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-1035">
  <Title>Aspects of Pattern-matching in Data-Oriented Parsing</Title>
  <Section position="4" start_page="0" end_page="237" type="metho">
    <SectionTitle>
2 Data Oriented Parsing
</SectionTitle>
    <Paragraph position="0"> Data Oriented Parsing, originally conceived by Remko Scha (Scha, 1990), has been successfully applied to syntactic natural language parsing by ll,ens Bod (1995), (1999). The aim of Data Oriented Parsing (henceforth DOP) is to develop a per\[ormanee model of natural language, that models language use rather than some type of competence. It adapts the psycholinguistic insight that language users analyze sentences using previously registered constructions and that not only rewrite rules, but cornt)lete substructures of any given depth cast be linguistically relevant milts tbr parsing.</Paragraph>
    <Section position="1" start_page="0" end_page="236" type="sub_section">
      <SectionTitle>
2.1 Architecture
</SectionTitle>
      <Paragraph position="0"> The core of a DOP-system is its TREEBANK: an annotated corlms is used to induce all substruct, ures of arbitrary depth, together with their respective probabilities, which is a expressed by  its fl:equency in the TREEBANK relative to l;he numl)er of substructures with the Sanle rootnode. null Figure 1 shows the coral)|nation ol)eral;ion that is needed to tbrm the correct l)arse tree for the sentence Peter&amp;quot; killed a raccoon. Given a treet)ank of substructures, the systcln tries to match the leftmost open nod(; of a substructure |;hat is consistent with the parse tree, with the top-node of another sul)structur(;, consistent with the parse tree.</Paragraph>
      <Paragraph position="1"> Usually, ditferent conlt)inations of sul)structllrO.s are possible, as is i~l(ti(:ated in Figure 1: in the examl)le at the left-hand side the tree-structure (:an t)e built l)y (:o11111ining all S-structure wil;h a st)coiffed NP a.lld a flllly spe(:ifled vp-structure. The right example shows another possible Colnl)ination, where a parse tree is 1)uilt t)y conll)ining the \]ninimal sut)s|;rltcl;ures. Nol;e that t\]\]cse are (:(msisl;(mt wit\]l ol'dinary rewrite-rules, such as s -+ NP VP.</Paragraph>
      <Paragraph position="2"> One t)artit:ul;~r 1)~trse tree may t;hus (:()\]lsist ()f several (lill.(u'(ml; deriva, tio'n.s..To lind l;hc 1)rot) al)ility (If ;I, (terivation, we lnultit)ly tim t)rot)a1)ilities of the substructures thai; were used to l.()rm the derivation. To lind the t)robal)ility of a parse, we must; in tlrilmit)le sum the t)rol)at)ilities of all its deriw~tions.</Paragraph>
      <Paragraph position="3"> It is COlnl/utationally hardly tra(:tat)h; to COilsider all deriw~tiolls t.()r each pars('. Since VITF, RBI ol)timization only su('(:ceds in finding the most 1)robal)h'~ (teriw~tion as opposed to the most 1)robal)le l)arse, the MONTE CARLO algorithm is introduced as a proper al)proximation I;hat randomly generates a large nlmfl)er of deriw~tions. The most prol)al/le l)arse is (:onsi(tered to be the parse that is most often observed in this derivation forest.</Paragraph>
    </Section>
    <Section position="2" start_page="236" end_page="236" type="sub_section">
      <SectionTitle>
2.2 Experimental Results of DOP
</SectionTitle>
      <Paragraph position="0"> The basic 1)op-model, POP1, was testc,(t (111 a manually edited version (if the ATIS-corlnlS (Marcus, Sant(lrini, and Marcinkiewicz, 199a).</Paragraph>
      <Paragraph position="1"> The syst;eln was trained on 603 Selltelmes (t)arl; ofstmech tag sequelmes) and (;wfluated on a test set (if 75 SCld;ences. Parse accuracy was used as an evahlation metric, expressing t;11(; percentage of sentences in the test set for which the tlarse l)rOl)osed by the system is COlnpletely identical to the one in l;lle original eort)us, l)ifl'ereat exl)erilnents were conducted in which max|11111111 sul)structure size was varied. With DoPllillfited to a sul)sl;ructure-size (If 1 (equiw~lenl;</Paragraph>
    </Section>
    <Section position="3" start_page="236" end_page="237" type="sub_section">
      <SectionTitle>
2.3 Short Assessment of DOP
</SectionTitle>
      <Paragraph position="0"> DOI'I in its ot)tinlal fornl achieves a very high parse accuarcy. The comt)utational costs of the syste111, however, are equally high. Bed (19951 reported an average t/arse tilne of 3.5 hours 11(;1 .</Paragraph>
      <Paragraph position="1"> Sellte.n(:e. Even though (:urrent 1)arse tilne is rcl)ortc.d to l)e 11,or(; reasollal)le, tile oi)timal D()P algoril:lml in whi(:h n(/('onstr;dlts are made on tll('~ size (1t' sut)structures, nlay not yet 1)e tract;able for life-siz( ~. COl'l)()ra.</Paragraph>
      <Paragraph position="2"> In a context-free grammar framework (consistent with \])()P limited to a sutlstru(:tm:e-size (If 1), there is only (me way a t/arse tree can t)e t'ornmd (t'(/1: exalnl/le, the right hand side of Figure \]), nleaning that there is Olfly one del:ivatioll for a given 1)arse tree. This allows efficient VITEll.BI style Ol)tillfization.</Paragraph>
      <Paragraph position="3"> To elmo(le (:ontext-sellsitivity in the systeln, DOP is tbr(:ed to introduce multiple deriw~tiolls, so that repeatedly the same l)arse tree needs to 1)e g(;lmrated, l)rillging at/(/ut a lot of COlll\])llta,tional overhead.</Paragraph>
      <Paragraph position="4"> Even though the use of larger syntactic coiltexts is highly relewmt fl'om a psycholinguisI,ic t)oint-ofview, there is 11o explicit l)reference l)eing lnade t'(/1' larger substructures in the DOP nlodel. While the MONTE CARLO optimizatiolx scheme nlaxinlizes the prot)ability of the (teriw&gt; tions and seelns to 1)refer derivations nlade up of larger substructures, it; may 1)e ild;eresting to</Paragraph>
      <Paragraph position="6"> see if we can make this assumption explicit.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="237" end_page="237" type="metho">
    <SectionTitle>
3 Pattern-matching
</SectionTitle>
    <Paragraph position="0"> When we look at natural language parsing fl:om a memory-based point of view, one might say that a sentence is analyzed by looking u t) the most similar structure for the different analyses of that sentence in meinory. The parsing system described in this paper tries to mimic this 1)ehavior by interpreting the pop-model as a memory-t)ased model, in which analyses are being matched with syntactic patterns recorded in memory. Similarity t)etween the proposed analysis and tile patterns in memory is com-Imted according to: * the number of patterns needed to construct a tree (to be minimized) * the size of the patterns that are used to construct a tree (to be maximized) Tile nearest neighbor tbr a given analysis can be defined as the derivation that shares the largest amount of common nodes.</Paragraph>
  </Section>
  <Section position="6" start_page="237" end_page="238" type="metho">
    <SectionTitle>
4 The Experimental Setup
</SectionTitle>
    <Paragraph position="0"> 10-tbld cross-validation was used to appropriately evaluate the algorithms, as tile dataset (see Section 4.1) is rather small. Like DoPl the system is trained and tested on part-of-speech tag sequences. In a first phase, a simple bottom-up chart parser, trained on the training partitions, was used to generate parse forests tbr the 1)art-of speech tag sequences of the test partition. Next, the parse tbrests were sent to the 3 algorithms (hencetbrth the disambiguators) to order these parse forests, the first parse of the ordered parse forest being the one proposed by the disanfl)iguator.</Paragraph>
    <Paragraph position="1"> In this paper, 3 disambiguators are described:  The evaluation metric used is pars(; accuracy, but also tile typical parser evaluation metric F-measure (precision/recall) is given ms a means of reference to other systems.</Paragraph>
    <Section position="1" start_page="237" end_page="238" type="sub_section">
      <SectionTitle>
4.1 The Corpus
</SectionTitle>
      <Paragraph position="0"> The ext)eriments were conducted oil all edited version of tile ATIS-II-corpus (Marcus, Santorini, and Marcinkiewicz, 1993), which consists of 578 sentences. Quite a lot of errors and inconsistencies were found, but not corrected, since we want our (probabilistic) system to be  lille to deal with this kind of noise. Semanti(:ally oriented tlags like -TMP all(1 -Dill,, lllOSI; often used in conjmml;ion with l'p, have been renlove(t~ since l;here is no way of rel;rieving this kind of semanti(: intbrmation from t;11(; t)art;-o5 sl)ee(:h tags of the ATIS-(:ortms. Synta(:ti(: flags like -sILL on the other hand, \]lave 1)een maintaine(t. Internal relations (denoted by llllllleric tlags) were removed and tbr 1)ractical reasons, scntenee-lellgth was limited 1;o 15 words max.</Paragraph>
      <Paragraph position="1"> The edited (:orl)us retained 562 sentences.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="238" end_page="238" type="metho">
    <SectionTitle>
5 Parsing
</SectionTitle>
    <Paragraph position="0"> As a first phase, a 1)ottom-ut) (:hart parser i)al&amp;quot;sed t;he test sol;. This t)roved to t)e quite l)rol)lemati(:, since overall, 1()6 out of 562 senten(:es (190/(0) could not 1)e t)arsed, (111(', to the sl,arsencss of the gramnmr, meanil,g I;ha(; l;he at)l)ropriate rewrite rule needed to (:onstru('l; the (:orre(:t t)~lrse tree tbr a senten(:c, in the test set, wasn't featured in the, in(tu(:ed grammar. NPannol;at;ion seem(~(t 1;o 1)(; t;lle lml, in (:aus(~ \]'or 11nl)arsal)ility. An NP like restriction code AP/57 is repres(ml;ed 1)y the, rewrite rule: NP -~ NN NN sym sym sym C\]) CD Highly st)ccitt(: and tint stru(:tur(;s like these are s(:ar(:e an(t are usually ll()t induced from the training set whell nee(h;d to parse the test set.</Paragraph>
    <Paragraph position="1"> On-going re, sear(:h tries 1;o iml)h;ln(ml; gl&amp;quot;ammal;i(:a.1 SlnOothing ;ts :t soluti(m to |;his 1)rol)hml, but one might also (:onsid('a: genera.ling parse fol&amp;quot;eSi;S with an in(tep(mdent ~,;l&amp;quot;allllll;Ll', ilMu(:e(l fronl the entire (:orlms (training setq-t('~si;s(',l;) or a difl'erent corlms. 111 t)()th cases, however, we would need to apply 1)robal)ilisti(&amp;quot; smoothing to be al)le to assign t)rot)at)ilities to llllkllown s(;,l;llclures/rules. Neither grammatical, nor t)rot)abilistic smoothing was imt)lemented in the (;elltext of the exl)eriments, (les(:ril)ed in this 1)at)er. The sl/ars(mess of the grammar 1)roves t;o l)e a serious 1)otl;hme(:k fi)r pars(', a(:(:ura(:y, limiting our (lisamlliguators t;o a maximuln tlarsc act:uracy of 81%.</Paragraph>
  </Section>
  <Section position="8" start_page="238" end_page="239" type="metho">
    <SectionTitle>
6 PCFG-experiments
</SectionTitle>
    <Paragraph position="0"> a PCFG constru(:ts parse trees by using simple rewrite-rules. The prot)al)ility of ~ parse tree (;~7tll })e (:omlml;ed l)y mull;it)lying the t)robat)ilities (1t&amp;quot; the. rewrite-rules that w(~.re used to (:onst;fuel; the t)ars(:. Note that a l'CFd is i(h;nti(:al tO DOP\] whell we limit I;he maximum sul)Stl'UCtures size to \], only Mlowing deriwd;ions of the type found at the right-hand side of Figure 1.</Paragraph>
    <Section position="1" start_page="238" end_page="238" type="sub_section">
      <SectionTitle>
6.1 Experimental Results
</SectionTitle>
      <Paragraph position="0"> The first line of Tat)le I shows the, rc, sull;s for the l'CF(~-(',xl)eriments: 66.4% parse accuracy is an adequate result for this baseline model. We also look at l)arsc accuracy for parsable sentences (an estimal;e of the parse accuracy we 1night get if we had a more suited parse forest generator) and w(; notice that we are able to a(:hieve a 81.8% parse ae(:ur~my. This is already quite high, trot on exmnining the parsed data, serious and fluManmntal limitations to the POPO-mo(lcl can be el)served</Paragraph>
    </Section>
    <Section position="2" start_page="238" end_page="239" type="sub_section">
      <SectionTitle>
6.2 Error Analysis
</SectionTitle>
      <Paragraph position="0"> Figm'c 2, disl)lays the mosl; common tyl)c of mistake mad(; l)y 1)CFG~S. :\]'lit; (;orr0,cl; t)arse l;ree ('ouht r(;i)res(mt an mlalysis for 1;11(; senten(:e: I &amp;quot;.;ant o, fli.qht from \]h'us.scl.s to 2bronto.</Paragraph>
      <Paragraph position="1"> This examt)le shows thai; ~t PCFG h~ls a I;(~,ndency to prctbr tlatter strueture, s over emt)edde, d stru(:t;ures. This is a trivial effect of 1;11(; mathcmat;it'll tbrmula used to conqml;e the t)rol)at)il il;y of a I)arse-tr(;(;: emt/cdded structure require more r(;writ(' rules, adding more fat:tots to the multii)li(:ation , whi(:h will alm(/st ilw, vit~d)ly r(;suit in :t lower l)rol)al)ilit;y.</Paragraph>
      <Paragraph position="2"> 11; is all 1111J'()ri;llllal;e 1)r()I)(;rl;y of I'CFG~s t;hal; the mmfl)er of no(l(;s in the 1)atse tree is invers(~ly 1)rot)ortiomd;e to il;s t)rol)al)ility. ()n(; might t)e inclin(xl to n(n'malizc a parse tree's pr()bat)ility relative t(/the mnnt)er of nodes in the tree, but a more linguistically solmd alternative is at hand: the enhancenmnt of context sensii;ivity through the use of larger synl;tt(:ti(: (:ont(;xt; within t)arse tre(:s (:;/,11 make our disaml)iguat;or lnore rolmst.</Paragraph>
      <Paragraph position="3">  The 1)att(;rn-Matching Prol)al)ilistie Gramnmr is a memory-based interpretation of a \])OI'model, in which a s(mtence is analyzed t)y matching the largest, possible chunks of synt;acti(&amp;quot; strut:lure Oll the sentence. To COml)ile t/~rse trees into pat, terns, all substructm'es ill the l;raining set are eneo(ted 1)y assigning l;hem specific indexes, NP(o)345 e.g. denotil~g a fully specified NP-sl;ruel;urc. This apt)roa(:h was insl)ired 1)y Goodman (199(i), in which Goodman  unsuccessflflly uses a system of indexed parse trees to transform DOP into aSl equivalent PCFG.</Paragraph>
      <Paragraph position="4"> The system of indexing (which is detailed in De Pauw (2000)) used in tim experiments described in this paper, is however specifically geared towards encoding contextual intbnnation in parse trees.</Paragraph>
      <Paragraph position="5"> Gives, an indexed training set, indexes can then be matched on a test set parse tree in a bottom-up fashion. In the tbllowing example, boxed nodes indicate nodes that have been retrieved from memory.</Paragraph>
      <Paragraph position="6">  In this example we can see that an NP, consisting of a flflly specified embedded NP and l'P, has l)een completely retrieved from men&gt; ory, meaning that the NP in its entirety can be observed in the training set. However, no vp was tbund that consists of a VBP and that particular NP. Disambiguating with PMPG coilsequently involves pruning all nodes retrieved frolu illeillory:  Finally, the probability for this pruned parse tree is computed in a pCFO-type manner, not adding the retrieved nodes to the product:</Paragraph>
      <Paragraph position="8"/>
    </Section>
    <Section position="3" start_page="239" end_page="239" type="sub_section">
      <SectionTitle>
7.1 Experimental Results
</SectionTitle>
      <Paragraph position="0"> The results tbr the PMPG-exI)erinmnts can be ibund on the second line of Table 1. On some partitions, PMPG pcrtbrmed insignificantly better than PCFG, but Table 1 shows that tile results for the context sensitive scheme are much worse. 58.2% overall parse accuracy and 71.7% parse accuracy on parsable sentences indicates that PMPG is *sot a valid approximation of DOP'S context-sensitivity.</Paragraph>
    </Section>
    <Section position="4" start_page="239" end_page="239" type="sub_section">
      <SectionTitle>
7.2 Error Analysis
</SectionTitle>
      <Paragraph position="0"> The dramatic drop in parsing accuracy calls tbr an error analysis of the parsed data. Figure 3 is a prototypical mistake PMPG has made. The correct analysis could represent a parse tree for a sentence like: What flights can I get firm Brussels to 2brvnto. The PMPG analysis would never have been considered a likely candidate by a common PCFG. This particular sentence in fact was ef tbrtlessly disambignated by the PCFG . Yet the fact that large chunks of tree-structure are retrieved Dora memory, make it the preferred parse for the PMPG. We notice tbr instance that a large part of the sentence can be matched on an SBAR structure, which has no relevance whatsoever.</Paragraph>
      <Paragraph position="1"> Clearly, PMPG overestimates substructure size as a feature for disambiguation. It's interesting however to see that it is a working implementation of context sensitivity, eagerly matching patterns from memory. At the same time, it has lost track of common-sense PCFG tactics, it is in the combination of the two that one may find a decent disambiguator and accurate implementation of context-sensitivity.</Paragraph>
    </Section>
  </Section>
  <Section position="9" start_page="239" end_page="240" type="metho">
    <SectionTitle>
8 A Combined System (PMPG@PCFG)
</SectionTitle>
    <Paragraph position="0"> Table 1 showed that 81.8(/o of the time, a PCFG finds the correct parse (Ibr t)arsable sentences), meaning that the correct parse is at the first place in the ordered parse tbrest. 99% of the time, the correct parse can be tbund among the 10 most probable parses in the ordered pars(; forest. This opens up a myriad of possibilities tbr optin, ization. One might for instance use a best-first strategy to generate only the 10 best parses, significantly reducing parse and disambiguation time. An optimized disanNiguator might theretbre include a preparatory phase in wtfich a common-sense PCFG retains the most probable parses, so that a nlore sophisticated tbllow-up scheme ,teed not bother with senseless analyses.</Paragraph>
    <Paragraph position="1"> In our experiments, we combined the common-sense logic of a PCFG and used its output as the PMPG'8 input. This is a well-established technique usually refi~rred to as systent combination (see van Halteren, Zavrel, and Daelemans (1998) for an application of this  We art'. also presented with th(', possibility to assign a weight to each algorithm's decision.</Paragraph>
    <Paragraph position="2"> The probability of a parse can the })e described with the following formula:</Paragraph>
    <Paragraph position="4"> The weight of ea(:h algorithm's (lc(:ision, as well as the mnnt)er of 1HOSt t)robM)h; parses that m:e extrat)olated for the 1)attern-m~tt:hing algorithnq are parameters to 1)e optimized. Futm:e work will include evaluation on a validation set to retrieve the ol)timal va, hles for these 1)arame, tcrs.</Paragraph>
  </Section>
class="xml-element"></Paper>