<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2175">
  <Title>Comparing two trainable grammatical relations finders</Title>
  <Section position="3" start_page="1146" end_page="1146" type="metho">
    <SectionTitle>
2 Differences Between the Two
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="1146" end_page="1146" type="sub_section">
      <SectionTitle>
Systems
</SectionTitle>
      <Paragraph position="0"> Ferro et al. (1999) and Buchholz et al. (1999) both describe learning systems to find GRs.</Paragraph>
      <Paragraph position="1"> The former (TR) uses transformation-based error-driven learning (Brill and Resnik, 1994) and the latter (MB) uses memory-based learning (Daelemans et al., 1999).</Paragraph>
      <Paragraph position="2"> In addition, there are other differences. The TR system includes several types of information not used in the MB system (some because memory-based systems have a harder time handling set-valued attributes): possible syntactic (Comlex) and semantic (Wordnet) classes of a chunk headword, the stem(s) and named-entity category (e.g., person, location), if any, of a chunk headword, lexemes in a chunk besides the headword, pp-attachment estimate and certain verb chunk properties (e.g., passive, infinitive).</Paragraph>
      <Paragraph position="3"> Some lexemes (e.g., coordinating conjunctions and punctuation) are usually outside of any chunk. The TR system will store these in an attribute of the nearest chunk to the left and to the right of such a lexeme. The MB system represents such lexemes as if they are one-word chunks. The MB system cannot use the TR system's method of storage, because memory-based systems have difficulties with set-valued attributes (the value is 0 or more lexemes).</Paragraph>
      <Paragraph position="4"> The MB system (and not the TR system) also examines the number of commas and verb chunks crossed by a potential GR.</Paragraph>
      <Paragraph position="5"> The space of possible GRs searched by the two systems is slightly different. The TR system searches for GRs of length three chunks or less.</Paragraph>
      <Paragraph position="6"> The MB system searches for GRs which cross at most either zero (target to the source's left) or one (to the right) verb chunks.</Paragraph>
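As an illustration only, the MB search-space constraint just described can be sketched as follows, with chunks reduced to bare type strings and "crossed" read as verb chunks strictly between source and target (both are invented simplifications; the real system operates on much richer chunk data):

```python
def candidate_targets(chunks, src):
    """Indices of verb chunks that the MB-style search would pair with chunks[src]."""
    found = []
    crossed = 0
    for i in range(src - 1, -1, -1):        # scan leftward from the source
        if chunks[i] == "verb":
            if crossed == 0:                # only the first verb chunk to the left
                found.append(i)
            crossed += 1
    crossed = 0
    for i in range(src + 1, len(chunks)):   # scan rightward from the source
        if chunks[i] == "verb":
            if crossed in (0, 1):           # first or second verb chunk to the right
                found.append(i)
            crossed += 1
    return found
```

Under this reading, the candidates coincide with the three relative categories discussed in Section 3.3 (first verb to the left, first and second verb to the right).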
      <Paragraph position="7"> Also slightly different are the chunks examined relative to a potential GR. Both systems will examine the target and source chunks, plus the source's immediate neighboring chunks.</Paragraph>
      <Paragraph position="8"> The MB system also examines the source's second neighbor to the left. The TR system instead also examines the target's immediate neighbors and all the chunks between the source and target. The TR system has more data partitioning than the MB system. With the TR system, possible GRs that have a different source chunk type (e.g., noun versus verb), a different relationship type (e.g., subject versus object), or a different direction or length (in chunks) are always considered separately and will be affected by different rules. The MB system will note such differences, but may decide to ignore some or all of them.</Paragraph>
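To illustrate the TR-style partitioning idea, here is a minimal sketch (the record layout and field names are invented for illustration):

```python
from collections import defaultdict

# Hypothetical candidate-GR records; fields and values are invented.
candidates = [
    {"source_type": "noun", "relation": "subject", "direction": "left",  "length": 1},
    {"source_type": "noun", "relation": "subject", "direction": "left",  "length": 1},
    {"source_type": "verb", "relation": "object",  "direction": "right", "length": 2},
]

# Each distinct combination of source chunk type, relation type, direction
# and length is trained on separately and ends up with its own rules.
partitions = defaultdict(list)
for c in candidates:
    key = (c["source_type"], c["relation"], c["direction"], c["length"])
    partitions[key].append(c)
```

The MB system, by contrast, would keep all of these records in one instance base and let the learner decide how much weight each of these attributes deserves.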
    </Section>
  </Section>
  <Section position="4" start_page="1146" end_page="1149" type="metho">
    <SectionTitle>
3 Comparing the Two Systems
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="1146" end_page="1147" type="sub_section">
      <SectionTitle>
3.1 Experiment Set-Up
</SectionTitle>
      <Paragraph position="0"> One cannot directly compare the two systems from the descriptions given in Ferro et al.</Paragraph>
      <Paragraph position="1"> (1999) and Buchholz et al. (1999), as the results in those descriptions were based on different data sets and on different assumptions of what is known and what needs to be found.</Paragraph>
      <Paragraph position="2"> Here we test how well the systems perform using the same small annotated training set, the 3299 words of elementary school reading comprehension test bodies used in Ferro et al.</Paragraph>
      <Paragraph position="3"> (1999).2 We are mainly interested in comparing the parts of the systems that take in syntax (noun, verb, etc.) chunks (also known as groups) and find the GRs between those chunks. So for the experiment, we used the general TiMBL system (Daelemans et al., 1999) to just reconstruct the part of the MB system that takes in chunks and finds GRs. The input to both this reconstructed part and the TR system is data that has been manually annotated for syntax chunks and GRs, along with automatic lexeme and sentence segmentation and part-of-speech tagging. In addition, the TR system has manual named-entity annotation, and automatic estimations for verb properties and preposition and subordinate conjunction attachments (Ferro et al., 1999). Because the MB system was originally designed to handle GRs attached to verbs (and not noun to noun GRs, etc.), we ran the reconstructed part to only find GRs to verbs, and ignored other types of GRs when comparing the reconstructed part with the TR</Paragraph>
      <Paragraph position="4"> system. The test set is the 1151 word test set used in Ferro et al. (1999). Only GRs to verbs were examined, so the effective training set GR count fell from 1963 to 1298 and the test set GR count from 748 to 500.</Paragraph>
      <Paragraph position="5"> 2Note that if we had been trying to compare the two systems on a large annotated training set, the MB system would do better by default just because the TR system would take too long to process a large training set.</Paragraph>
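As a rough illustration of the memory-based approach used in this experiment, a bare-bones nearest-neighbour classifier in the spirit of TiMBL's IB1 algorithm might look like this (the feature vectors are invented; the real systems use chunk-level features and weighted metrics):

```python
def overlap_distance(a, b):
    """Count mismatching feature positions (the simple overlap metric)."""
    return sum(1 for x, y in zip(a, b) if x != y)

class MemoryBasedLearner:
    def __init__(self):
        self.memory = []                 # every training instance is kept

    def train(self, features, label):
        self.memory.append((tuple(features), label))

    def classify(self, features):
        # Label a test case by its nearest stored instance.
        best = min(self.memory,
                   key=lambda inst: overlap_distance(inst[0], features))
        return best[1]
```

The key contrast with the TR system is visible here: nothing is generalized at training time; all work happens at classification time against the stored instances.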
    </Section>
    <Section position="2" start_page="1147" end_page="1147" type="sub_section">
      <SectionTitle>
3.2 Initial Results
</SectionTitle>
      <Paragraph position="0"> In looking at the test set results, it is useful to divide up the GRs into the following sub-types:  1. Simple arguments: subject, object, indirect object, copula subject and object, expletive subject (e.g., &amp;quot;It&amp;quot; in &amp;quot;It rained today.&amp;quot;). 2. Modifiers: time, location and other modifiers. 3. Not so simple arguments: arguments that syntactically resemble modifiers. These are location objects, and also subjects, objects and indirect objects that are attached via a preposition.</Paragraph>
      <Paragraph position="1"> Neither system produces a spurious response for type 3 GRs, but neither system recalls many of the test keys either. The reconstructed MB system recalls 6 of the 27 test key instances (22%), the TR system recalls 7 (26%). A possible explanation for these low performances is the lack of training data. Only 58 (3%) of the 1963 training set GRs are of type 3.  Recall is the number (and percentage) of the keys that are recalled. Precision is the number of correctly recalled keys divided by the number of GRs the system claims to exist. F-score is the harmonic mean of the recall (r) and precision (p) percentages. It equals 2pr/(p+r). Here, the differences in r, p and F-score are all statistically significant.3 The MB system performs better as measured by the F-score. But a trade-off is involved. The MB system has both a higher recall and a lower precision.</Paragraph>
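The scoring definitions above can be captured in a small helper (a minimal sketch; the 63/67 figures used in the check below echo the extended MB run reported later in this section):

```python
def f_score(recall, precision):
    """Harmonic mean of the recall and precision percentages: 2pr/(p+r)."""
    if recall + precision == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, 63% recall and 67% precision give an F-score of about 65%, matching the extended MB result quoted below.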
      <Paragraph position="2"> The bulk (370 or 74%) of the 500 GR key instances in the test set are of type 1 and most 3When comparing differences in this paper, the statistical significance of the higher score being better than the lower score is tested with a one-sided test. Differences deemed statistically significant are significant at the 5% level. Differences deemed non-statistically significant are not significant at the 10% level.</Paragraph>
      <Paragraph position="3"> of these are either subjects or objects. With type 1 GRs, the results are ... With these GRs, the TR system performs considerably better both in terms of recall and precision. The differences in all three scores are statistically significant.</Paragraph>
      <Paragraph position="4"> Because 74% of the GR test key instances are of type 1, where the TR system performs better, this system performs better when looking at the results for all the test GRs combined. Again, all three score differences are statistically significant: ... Later, we tried some extensions of the reconstructed MB system to try to improve its overall result. We could improve the overall result by a combination of using the IB1 search algorithm (instead of IGTREE) in TiMBL, restricting the potential GRs to those that crossed no verb chunks, adding estimates on preposition and complement attachments (as was done in TR) and adding information on verb chunks about being passive, an infinitive or an unconjugated present participle. The overall F-score rose to 65% (63% recall, 67% precision). This is an improvement, but the TR system is still better. The differences between these scores and the other MB and TR combined scores are statistically significant.</Paragraph>
    </Section>
    <Section position="3" start_page="1147" end_page="1149" type="sub_section">
      <SectionTitle>
3.3 Exploring the Result Differences
</SectionTitle>
      <Paragraph position="0"> 3.3.1 Type 2 GRs: modifiers The reconstructed MB system performs better at type 2 GRs. How can we account for this result difference? Letting the TR system find longer GRs (beyond 3 chunks in length) does not help much. It only finds one more type 2 GR in the test set (adds 1% to recall and 1% or less to precision). Rerunning the TR system rule learning with an information organization closer to the MB system produces the same 47% F-score as the MB system (recall is lower, but precision is higher). Specifically, we got this result when the rerun did not use the pp-attachment estimates, verb chunk properties (e.g., passive, infinitive), named-entity labels or headword stems. Also, the TR system now examines the chunks examined by the original MB system: target, source and source's neighbors.</Paragraph>
      <Paragraph position="1"> In addition, instead of 6 absolute length categories (target is 3 chunks to the left, 2 chunks, 1 chunk, and similarly for the right), the GRs considered now just fall into and are partitioned into 3 relative categories: target is the first verb chunk to the left, similarly to the right, and target is the second verb chunk to the right. The MB system can distinguish between these same relative categories.</Paragraph>
      <Paragraph position="2"> Redoing this TR system rerun without chunk headword syntactic or semantic classes produces a 46% F-score. If, in addition, the pp-attachment, verb chunk property, named-entity label and headword stem information are added back in, the F-score actually drops to 43%. The differences between these 47%, 46% and 43% results are not statistically significant.</Paragraph>
      <Paragraph position="3"> An example is comparing I fly on Tuesday. and I fly home from here on Tuesday. In both sentences, on Tuesday is a time modifier of fly and on crosses no verbs to reach fly (on attaches to the first verb to its left). But in the first sentence on is next to fly, while in the second sentence there are three chunks separating on and fly.</Paragraph>
      <Paragraph position="4"> ... are statistically significant and all the F-scores in this set are statistically significantly different from the TR system runs with the 78% and 79% F-scores.</Paragraph>
      <Paragraph position="5"> From the statistically significant score differences, it seems that partitioning data by potential GR source chunk type helps (increase from 64% to 69%), as does the rest of the partitioning performed and making some slight changes in what is examined (increase to 75%), using transformation-based learning instead of memory-based learning (increase to 78%) and using verb chunk property information (increase to 80%).</Paragraph>
      <Paragraph position="6"> In the original MB system run, the source chunk type and the potential GR length and direction were already determined by the memory-based learner to be the most important attributes examined. So why would partitioning the data and runs by the values of these attributes be of extra help? A possible answer is that for different values, the relative order of importance of the other attributes (as determined by the memory-based learner) changes.</Paragraph>
      <Paragraph position="7"> For example, when the source chunk type is a noun, the second most important attribute is the source chunk's headword when the target is one to the right, but is the source chunk's right neighbor's headword when the target is one to the left. Partitioning the data and runs lets these different relative orders be used. Having one combined data set and run means that only one relative order is used. Note that while this partitioning may not be the standard way of using memory-based learning, it is consistent with the central idea in memory-based learning of storing all the training instances and trying to find the &amp;quot;nearest&amp;quot; training instance to a test case.</Paragraph>
      <Paragraph position="8"> Another question is why using transformation-based (rule) learning seems to be slightly better than memory-based learning for these type 1 GRs. Memory-based learning keeps all of the training instances and does not try to find generalizations such as rules (Daelemans et al., 1999, Ch. 4). However, with type 1 GRs, a few simple generalizations can account for many of the instances. In the manner of Stevenson (1998), we wrote a set of six simple rules that when run on the test set type 1 GRs produces an F-score of 77%.</Paragraph>
      <Paragraph position="9"> This is better than what our reconstructed MB system originally achieved and is close to the TR system's original results (close enough not to be statistically significantly different). An example of these six rules: IF (1) the center chunk is a verb chunk and (2) is not considered as possibly passive and (3) its headword is not some form of to be and (4) the right neighbor is a noun or verb chunk, THEN consider that chunk to the right as being an object of the center chunk.</Paragraph>
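The quoted example rule can be rendered as a predicate (a hedged sketch; the chunk fields chunk_type, passive and headword are assumed names, not the paper's actual data structures):

```python
# Forms of "to be", used to exclude copular verbs from the object rule.
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}

def object_rule(center, right_neighbor):
    """True if the right neighbor should be proposed as the center chunk's object."""
    return (center["chunk_type"] == "verb"          # (1) center is a verb chunk
            and not center["passive"]               # (2) not possibly passive
            and center["headword"] not in BE_FORMS  # (3) headword is not "to be"
            and right_neighbor["chunk_type"] in ("noun", "verb"))  # (4)
```

Five more rules of this shape would complete the six-rule baseline described above.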
    </Section>
  </Section>
</Paper>