<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-1018">
  <Title>The Use of Instrumentation in Grammar Engineering</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Software Instrumentation
</SectionTitle>
    <Paragraph position="0"> Systematic software testing requires a match between the test subject (module or complete system) and a test suite (collection of test items, i.e., sample input). This match is usually computed as the percentage of code items exercised by the test suite.</Paragraph>
    <Paragraph position="1"> Depending on the definition of a code item, various measures are employed, for example (cf. (Hetzel, 1988) and (EAGLES, 1996, Appendix B) for overviews): statement coverage: percentage of single statements exercised; branch coverage: percentage of arcs exercised in the control flow graph, subsumes statement coverage; path coverage: percentage of paths exercised from start to end in the control flow graph, subsumes branch coverage, impractical due to the large (often infinite) number of paths; condition coverage: percentage of (simple or aggregate) conditions evaluated to both true and false (on different test items). Testsuites are constructed to maximize the targeted measure. A test run yields information about the code items not exercised, allowing the improvement of the testsuite.</Paragraph>
    <Paragraph position="2"> The measures are automatically obtained by instrumentation: The test subject is extended by code which records the code items exercised during processing. After processing the testsuite, the records are used to compute the measures.</Paragraph>
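    <Paragraph> The instrumentation idea can be illustrated with a minimal Python sketch. It is not taken from the cited overviews: the test subject sign, the branch labels and the helper names are hypothetical, and the recording code is added by hand rather than by a tool.

```python
# Hypothetical test subject, instrumented by hand: every branch records
# its own label in HITS when it is executed.
HITS = set()
ALL_BRANCHES = {"neg", "zero", "pos"}


def sign(x):
    if x > 0:
        HITS.add("pos")
        return 1
    if x == 0:
        HITS.add("zero")
        return 0
    HITS.add("neg")
    return -1


def branch_coverage(test_suite):
    """Run every test item and return the fraction of branches exercised."""
    HITS.clear()
    for item in test_suite:
        sign(item)
    return len(HITS) / len(ALL_BRANCHES)


def unexercised():
    """Branches that the last test run never reached."""
    return ALL_BRANCHES - HITS
```

    A test suite such as [3, 7] exercises only one of the three branches; unexercised() then names the two branches for which test items are still missing.</Paragraph>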
  </Section>
  <Section position="4" start_page="0" end_page="118" type="metho">
    <SectionTitle>
3 Grammar Instrumentation
</SectionTitle>
    <Paragraph position="0"> Measures from SE cannot simply be transferred to unification grammars, because the structure of (imperative) programs is different from that of (declarative) grammars. Nevertheless, the structure of a grammar (formalism) allows us to define measures very similar to those employed in SE.</Paragraph>
    <Paragraph position="1"> constraint coverage is the quotient $T_{con} = \frac{\#\text{ constraints exercised}}{\#\text{ constraints in grammar}}$ where a constraint may be either a phrase-structure or an equational constraint, depending on the formalism.</Paragraph>
    <Paragraph position="2"> disjunction coverage is the quotient $T_{dis} = \frac{\#\text{ disjunctions covered}}{\#\text{ disjunctions in grammar}}$</Paragraph>
    <Paragraph position="4"> where a disjunction is considered covered when all its alternative disjuncts have been separately exercised. It encompasses constraint coverage.</Paragraph>
    <Paragraph position="5"> Optional constituents and equations have to be treated as a disjunction of the constraint and an empty constraint (cf. Fig.2 for an example).</Paragraph>
    <Paragraph position="6"> interaction coverage is the quotient $T_{int} = \frac{\#\text{ disjunct combinations exercised}}{\#\text{ legal disjunct combinations}}$</Paragraph>
    <Paragraph position="8"> where a disjunct combination is a complete set</Paragraph>
    <Paragraph position="9"> of choices in the disjunctions which yields a well-formed grammatical structure.</Paragraph>
    <Paragraph position="10"> As with path coverage, the set of legal disjunct combinations typically is infinite due to recursion. A solution from SE is to restrict the use of recursive rules to a fixed number of cases, for example not using the rule at all, and using it only once.</Paragraph>
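    <Paragraph> The first two measures can be sketched in Python as follows. This is only an illustration under stated assumptions: the toy grammar, the disjunct names (ann1, ann2) and the ":empty" naming convention for empty alternatives are hypothetical and do not come from the grammar discussed here.

```python
# Toy grammar (hypothetical): each disjunction maps to its alternative
# disjuncts. An optional constituent is a disjunction of the constraint
# and an empty alternative, marked here by the suffix ":empty".
GRAMMAR = {
    "VP.NP": {"NP", "NP:empty"},
    "VP.PP": {"PP", "PP:empty"},
    "PP.annotation": {"ann1", "ann2"},
}


def constraints(grammar):
    """All non-empty disjuncts, i.e. the actual grammar constraints."""
    return {d for alts in grammar.values() for d in alts
            if not d.endswith(":empty")}


def constraint_coverage(grammar, exercised):
    """Constraints exercised divided by constraints in the grammar."""
    cons = constraints(grammar)
    return len(cons.intersection(exercised)) / len(cons)


def disjunction_coverage(grammar, exercised):
    """A disjunction is covered only if all its disjuncts were exercised."""
    covered = [name for name, alts in grammar.items()
               if alts.issubset(exercised)]
    return len(covered) / len(grammar)
```

    With the exercised set {"NP", "PP", "ann1"}, constraint coverage is 0.75 while disjunction coverage is 0, because no disjunction has had all of its alternatives exercised.</Paragraph>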
    <Paragraph position="11"> The goal of instrumentation is to obtain information about which test cases exercise which grammar constraints. One way to record this information is to extend the parsing algorithm. Another way is to use the grammar formalism itself to identify the disjuncts. Depending on the expressivity of the formalism used, the following possibilities exist: atomic features: Assuming a unique numbering of disjuncts, an annotation of the form DISJUNCT-nn = + can be used for marking. To determine whether a certain disjunct was used in constructing a solution, one only needs to check whether the associated feature occurs (at some level of embedding) in the solution.</Paragraph>
    <Paragraph position="12"> set-valued features: If set-valued features are available, one can use a set-valued feature DISJUNCTS to collect atomic symbols representing one disjunct each: DISJUNCT-nn ∈ DISJUNCTS, which might ease the collection of exercised disjuncts.</Paragraph>
    <Paragraph position="13"> multiset of symbols: To recover the number of times a disjunct is used, one needs to leave the unification paradigm, because it is very difficult to count with unification grammars. We have used a special feature of our grammar development</Paragraph>
    <Paragraph position="15"/>
    <Paragraph position="17"> symbolic marks, which is formally equivalent to a multiset of symbols associated with the complete solution (structural embedding plays no role; see (Frank et al., 1998) for applications).</Paragraph>
    <Paragraph position="18"> In this way, we can collect from the root node of each solution the set of all disjuncts exercised, together with a usage count.</Paragraph>
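    <Paragraph> How such marks might be read off a solution can be sketched in Python. The sketch assumes, purely for illustration, that a solution is available as a nested dictionary of features; the feature structure SOLUTION and both function names are hypothetical, and no actual grammar development environment is modelled.

```python
from collections import Counter

# Hypothetical solution: a feature structure as a nested dictionary,
# with atomic marks of the form DISJUNCT-nn = + at several levels.
SOLUTION = {
    "PRED": "see",
    "DISJUNCT-03": "+",
    "OBJ": {
        "PRED": "dog",
        "DISJUNCT-17": "+",
        "ADJUNCT": {"DISJUNCT-21": "+"},
    },
}


def exercised_disjuncts(fstructure):
    """Collect DISJUNCT-nn marks at any level of embedding."""
    found = set()
    for feature, value in fstructure.items():
        if feature.startswith("DISJUNCT-") and value == "+":
            found.add(feature)
        elif isinstance(value, dict):
            found.update(exercised_disjuncts(value))
    return found


def usage_counts(marks):
    """Multiset view: how often each symbolic mark occurs in one solution."""
    return Counter(marks)
```

    The set returned by exercised_disjuncts corresponds to the atomic-feature variant; usage_counts corresponds to the multiset variant, where a mark that occurs twice is counted twice.</Paragraph>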
    <Paragraph position="19"> Consider the LFG grammar rule in Fig.1.¹ Constraint coverage would require test items such that every category in the VP is exercised; a sequence of V NP PP would suffice for this measure. Disjunction coverage also requires taking the empty disjuncts into account: NP and PP are optional, so that four items are needed to achieve full disjunction coverage on the phrase structure part of the rule. Due to the disjunction in the PP annotation, two more test</Paragraph>
    <Paragraph position="20"> items are required to achieve full disjunction coverage on the complete rule. Fig.2 shows the rule from Fig.1 with instrumentation.</Paragraph>
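    <Paragraph> One way to arrive at four items for the phrase structure part is to try every keep-or-drop choice for the two optional categories once. The following Python sketch enumerates these choices; it is an illustration of that reading only, and the function name is hypothetical.

```python
from itertools import product


def phrase_structure_variants(head, optional):
    """All category sequences obtained by keeping or dropping each optional category."""
    variants = []
    for choice in product([True, False], repeat=len(optional)):
        kept = [cat for cat, keep in zip(optional, choice) if keep]
        variants.append(" ".join([head] + kept))
    return variants
```

    For the head V with optional NP and PP this yields the sequences V NP PP, V NP, V PP and V.</Paragraph>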
  </Section>
  <Section position="5" start_page="118" end_page="121" type="metho">
    <SectionTitle>
4 Grammar and Testsuite
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="118" end_page="119" type="sub_section">
      <SectionTitle>
Improvement
</SectionTitle>
      <Paragraph position="0"> Traditionally, a testsuite is used to improve (or maintain) a grammar's quality (in terms of coverage and overgeneration).</Paragraph>
      <Paragraph position="1"> Using instrumentation, one may extend this usage by looking for sources of overgeneration (cf. Sec.4.3), and may also improve the quality of the testsuite, in terms of coverage (cf. Sec.4.1) and economy (cf. Sec.4.2).</Paragraph>
      <Paragraph position="2"> 1 Although the sample rules are in the format of LFG, nothing of the methodology relies on the choice of linguistic or computational paradigm. The notation: ?/*/+ represent optionality/iteration including/excluding zero occurrences on categories, e represents the empty string. Annotations to a category specify equality (=) or set membership (∈) of feature values, or non-existence of features (¬); they are terminated by a semicolon (;). Disjunctions are given in braces ({... | ...}). ↑ (↓) are metavariables representing the feature structure corresponding to the mother (daughter) of the rule. o* (for optimality) represents the sentence's multi-set valued symbolic projection. Comments are enclosed in quotation marks ("..."). Cf. (Kaplan and Bresnan, 1982) for an introduction to LFG notation.</Paragraph>
      <Paragraph position="3"> Complementing other work on testsuite construction (cf. Sec.4.4), I will assume that a grammar is already available, and that a testsuite has to be constructed or extended. While one may argue that grammar and testsuite should be developed in parallel, such that the coding of a new grammar disjunct is accompanied by the addition of suitable test cases, and vice versa, this is seldom the case. Apart from the existence of grammars which lack a testsuite, there is the more principled obstacle of the evolution of the grammar, leading to states where previously necessary rules silently lose their usefulness, because their function is taken over by some other rules, structured differently. This is detectable by instrumentation, as discussed in Sec.4.1.</Paragraph>
      <Paragraph position="4"> On the other hand, once there is a testsuite, it has to be used economically, avoiding redundant tests.</Paragraph>
      <Paragraph position="5"> Sec.4.2 shows that there are different levels of redundancy in a testsuite, dependent on the specific grammar used. Reduction of this redundancy can speed up the test activity, and give a clearer picture of the grammar's performance.</Paragraph>
    </Section>
    <Section position="2" start_page="119" end_page="119" type="sub_section">
      <SectionTitle>
4.1 Testsuite Completeness
</SectionTitle>
      <Paragraph position="0"> If the disjunction coverage of a testsuite is 1 for some grammar, the testsuite is complete w.r.t. this grammar. Such a testsuite can reliably be used to monitor changes in the grammar: Any reduction in the grammar's coverage will show up in the failure of some test case (for negative test cases, cf. Sec.4.3).</Paragraph>
      <Paragraph position="1"> If the testsuite is not complete, instrumentation can identify disjuncts which are not exercised. These might be either (i) appropriate, but untested, disjuncts calling for the addition of a test case, or (ii) inappropriate disjuncts, for which a grammatical test case exercising them cannot be constructed.</Paragraph>
      <Paragraph position="2"> Checking completeness of our local testsuite of 1787 items, we found that only 1456 out of 3730 grammar disjuncts in our German grammar were tested, yielding $T_{dis} = 0.39$ (the TSNLP testsuite containing 1093 items tests only 1081 disjuncts, yielding $T_{dis} = 0.28$).² Fig.3 shows an example of a gap in our testsuite (there are no examples of circumpositions), while Fig.4 shows an inappropriate disjunct thus discovered (the category ADVadj has been eliminated in the lexicon, but not in all rules). Another error class is illustrated by Fig.5, which shows a disjunct that can never be used due to an LFG coherence violation; the grammar is inconsistent here.³</Paragraph>
      <Paragraph position="4"/>
    </Section>
    <Section position="3" start_page="119" end_page="119" type="sub_section">
      <SectionTitle>
4.2 Testsuite Economy
</SectionTitle>
      <Paragraph position="0"> Besides being complete, a testsuite must be economical, i.e., contain as few items as possible. Instrumentation can identify redundant test cases, where redundancy can be defined in three ways: similarity: There is a set of other test cases which jointly exercise all disjuncts which the test case under consideration exercises.</Paragraph>
      <Paragraph position="1"> equivalence: There is a single test case which exercises exactly the same combination(s) of disjuncts. strict equivalence: There is a single test case which is equivalent to and, additionally, exercises the disjuncts exactly as often as, the test case under consideration.</Paragraph>
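      <Paragraph> The three notions of redundancy can be written down compactly in Python. The sketch makes a simplifying assumption that is not part of the definitions above: each test case is represented by a single multiset (a Counter) of the disjuncts its analysis exercises, so test cases with several analyses are not modelled. The example data and function names are hypothetical.

```python
from collections import Counter

# Hypothetical usage records: two noun phrases that differ only in the
# number of attributive adjectives.
ONE_ADJ = Counter({"NP": 1, "ADJ": 1})
TWO_ADJ = Counter({"NP": 1, "ADJ": 2})


def is_similar(case, others):
    """Similarity: the other cases jointly exercise every disjunct of `case`."""
    jointly = set()
    for other in others:
        jointly.update(other)
    return set(case).issubset(jointly)


def is_equivalent(case, other):
    """Equivalence: the same disjuncts, regardless of how often each is used."""
    return set(case) == set(other)


def is_strictly_equivalent(case, other):
    """Strict equivalence: the same disjuncts, each used equally often."""
    return case == other
```

      ONE_ADJ and TWO_ADJ are equivalent but not strictly equivalent, since the adjective disjunct is used once in the first and twice in the second.</Paragraph>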
      <Paragraph position="2"> Fig.6 shows equivalent test cases found in our testsuite: Example 1 illustrates the distinction between equivalence and strict equivalence; the test cases contain different numbers of attributive adjectives. Example 2 shows that our grammar does not make any distinction between adverbial usage and secondary (subject or object) predication.</Paragraph>
      <Paragraph position="3"> The reduction we achieved in size and processing time is shown in Table 1, which contains measurements for a test run containing only the parseable test cases, one without equivalent test cases (for every set of equivalent test cases, one was arbitrarily selected), and one without similar test cases. The last was constructed using a simple heuristic: Starting with the sentence exercising the most disjuncts, working towards sentences relying on fewer disjuncts, a sentence was selected only if it exercised a disjunct which no previously selected sentence exercised. Assuming that a disjunct working correctly once will work correctly more than once, we did not consider strict equivalence. [...] suite, but receive no analysis since the grammatical function FREEDAT is not defined as such in the configuration section.</Paragraph>
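      <Paragraph> The heuristic for removing similar test cases can be sketched in Python as follows. The mapping from test items to sets of exercised disjuncts and the function name are hypothetical; ties between items with equally many disjuncts are resolved by input order, which is an assumption of this sketch.

```python
def select_non_similar(cases):
    """Greedy reduction; `cases` maps a test item to the set of disjuncts it exercises."""
    # Visit the items exercising the most disjuncts first (ties keep input order).
    ordered = sorted(cases, key=lambda item: len(cases[item]), reverse=True)
    selected, seen = [], set()
    for item in ordered:
        if cases[item] - seen:  # exercises some disjunct not seen so far
            selected.append(item)
            seen.update(cases[item])
    return selected
```

      For the items s1 = {1, 2, 3}, s2 = {1, 2}, s3 = {3, 4} and s4 = {4}, the selection is s1 and s3: s2 adds nothing to s1, and s4 adds nothing once s3 has been selected.</Paragraph>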
      <Paragraph position="4"> We envisage the following use of this redundancy detection: There clearly are linguistic reasons to distinguish all test cases in example 2, so they cannot simply be deleted from the testsuite. Rather, their equivalence indicates that the grammar is not yet perfect (or never will be, if it remains purely syntactic). Such equivalences could be interpreted as a reminder which linguistic distinctions need to be incorporated into the grammar. Thus, this level of redundancy may drive your grammar development agenda. The level of equivalence can be taken as a limited interaction test: These test cases represent one complete selection of grammar disjuncts, and (given the grammar) there is nothing we can gain by checking a test case if an equivalent one was tested. Thus, this level of redundancy may be used for ensuring the quality of grammar changes prior to their incorporation into the production version of the grammar. The level of similarity contains far fewer test cases, and does not test any (systematic) interaction between disjuncts. Thus, it may be used during development as a quick rule-of-thumb procedure detecting serious errors only.</Paragraph>
      <Paragraph position="5"> [Table 1: reduction of the testsuite in size and processing time]</Paragraph>
    </Section>
    <Section position="4" start_page="119" end_page="119" type="sub_section">
      <SectionTitle>
4.3 Sources of Overgeneration
</SectionTitle>
      <Paragraph position="0"> To control overgeneration, appropriately marked ungrammatical sentences are important in every testsuite. Instrumentation as proposed here only looks at successful parses, but can still be applied in this context: If an ungrammatical test case receives an analysis, instrumentation informs us about the disjuncts used in the incorrect analysis. One of these disjuncts must be incorrect, or the sentence would not have received a solution. We exploit this information by accumulation across the entire test suite, looking for disjuncts that appear in unusually high proportion in parseable ungrammatical test cases.</Paragraph>
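      <Paragraph> The accumulation step might look as follows in Python. What counts as an unusually high proportion is not specified here, so the threshold min_share is a hypothetical parameter of this sketch, as are the function name and the input format (pairs of a grammaticality flag and the disjuncts of one parseable test case).

```python
from collections import Counter


def suspicious_disjuncts(parsed, min_share=0.5):
    """Rank disjuncts by their share of ungrammatical items.

    `parsed` is a list of (is_grammatical, disjuncts) pairs for the
    parseable test cases. A disjunct is reported if at least `min_share`
    of the parseable items exercising it are marked ungrammatical.
    """
    total, bad = Counter(), Counter()
    for is_grammatical, disjuncts in parsed:
        for d in set(disjuncts):
            total[d] += 1
            if not is_grammatical:
                bad[d] += 1
    shares = {d: bad[d] / total[d] for d in total}
    return sorted((d for d in shares if shares[d] >= min_share),
                  key=lambda d: shares[d], reverse=True)
```

      Disjuncts that occur mostly in grammatical items fall below the threshold, so only those concentrated in wrongly accepted sentences are returned for inspection.</Paragraph>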
      <Paragraph position="1"> In this manner, six grammar disjuncts are singled out by the parseable ungrammatical test cases in the TSNLP testsuite. The most prominent disjunct</Paragraph>
      <Paragraph position="2"> appears in 26 sentences (listed in Fig.7), of which the top left group is indeed grammatical and the rest fall into two classes: A partial VP with object NP, interpreted as an imperative sentence (bottom left), and a weird interaction with the tokenizer incorrectly handling capitalization (right group).</Paragraph>
      <Paragraph position="3"> Far from being conclusive, the similarity of these sentences derived from a suspicious grammar disjunct, and the clear relation of the sentences to only two exactly specifiable grammar errors make it plausible that this approach is very promising in detecting the sources of overgeneration.</Paragraph>
    </Section>
    <Section position="5" start_page="119" end_page="121" type="sub_section">
      <SectionTitle>
4.4 Other Approaches to Testsuite
Construction
</SectionTitle>
      <Paragraph position="0"> The delicacy of testsuite construction is acknowledged in (EAGLES, 1996, p.37). Although there are a number of efforts to construct reusable testsuites, none has to my knowledge explored how existing grammars can be exploited.</Paragraph>
      <Paragraph position="1"> Starting with (Flickinger et al., 1987), testsuites have been drawn up from a linguistic viewpoint, "informed by [the] study of linguistics and [reflecting] the grammatical issues that linguists have concerned themselves with" (Flickinger et al., 1987, p.4). Although the question is not explicitly addressed in (Balkan, 1994), all the testsuites reviewed there also seem to follow the same methodology. The TSNLP project (Lehmann and Oepen, 1996) and its successor DiET (Netter et al., 1998), which built large multilingual testsuites, likewise fall into this category. The use of corpora (with various levels of annotation) has been studied, but the recommendations are that much manual work is required to turn corpus examples into test cases (e.g., (Balkan and Fouvry, 1995)). The reason given is that corpus sentences neither contain linguistic phenomena in isolation, nor do they contain systematic variation. Corpora thus are used only as an inspiration.</Paragraph>
      <Paragraph position="2"> (Oepen and Flickinger, 1998) stress the interdependence between application and testsuite, but don't comment on the relation between grammar and testsuite.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="121" end_page="123" type="metho">
    <SectionTitle>
5 Genre Adaptation
</SectionTitle>
    <Paragraph position="0"> A different application of instrumentation is the tailoring of a general grammar to specific genres. All-purpose grammars are plagued by lexical and structural ambiguity that leads to overly long runtimes.</Paragraph>
    <Paragraph position="1"> If this ambiguity could be limited, parsing efficiency would improve. Instrumenting a general grammar allows one to automatically derive specialized subgrammars based on sample corpora. This setup has several advantages: The larger the overlap between genres, the larger the portion of grammar development work that can be recycled. The all-purpose grammar is linguistically more interesting, because it requires an integrated concept, as opposed to several separate genre-specific grammars.</Paragraph>
    <Paragraph position="2"> I will discuss two ways of improving the efficiency of parsing a sublanguage, given an all-purpose unification grammar. The first consists in deleting unused disjuncts, while the second uses a staged parsing process. The experiments are only sketched, to indicate the applicability of the instrumentation technique, and not to directly compete with other proposals on grammar specialization. For example, the work reported in (Rayner and Samuelsson, 1994; Samuelsson, 1994) differs from the one presented below in several aspects: They induce a grammar from a treebank, while I propose to annotate the grammar based on all solutions it produces. No criteria for tree decomposition and category specialization are needed here, and the standard parsing algorithm can be used. On the other hand, the efficiency gains are not as big as those reported by (Rayner and Samuelsson, 1994).</Paragraph>
    <Section position="1" start_page="121" end_page="122" type="sub_section">
      <SectionTitle>
5.1 Restricting the Grammar
</SectionTitle>
      <Paragraph position="0"> Given a large sample of a genre, instrumentation allows you to determine the likely constructions of that genre. Eliminating unused disjuncts allows faster [...] HC-DE, but no grammar development based on the other corpora. The NEWS-SC corpus is part of the corpus of verb-final sentences used by (Beil et al., 1999).</Paragraph>
      <Paragraph position="1"> A training set of 1000 sentences from each corpus was parsed with an instrumented base grammar. From the parsing results, the exercised grammar disjuncts were extracted and used to construct a corpus-specific reduced grammar. The reduced grammars were then used to parse a test set of another 1000 sentences from each corpus. Table 3 shows the performance improvement on the corpora: It gives the size of the grammars in terms of the number of rules (with regular expression right-hand sides and feature annotation), the number of arcs (corresponding to unary or binary rules with disjunctive feature annotation), and the number of disjuncts (unary or binary rules with unique feature annotation). The number of mismatches counts the sentences for which the solution(s) obtained differed from those obtained with the base grammar, while the number of additions counts the sentences which did not receive a parse with the base grammar due to resource limitations (runtime or memory), but received one with the reduced grammar. The other columns give timings to process the total corpus, and the longest and average processing time per sentence; time is in seconds. The last column gives the average number of solutions per sentence.</Paragraph>
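      <Paragraph> The construction of a reduced grammar amounts to a set operation once the exercised disjuncts are known. The Python sketch below assumes, for illustration only, that a grammar can be represented as a set of disjunct identifiers and that each training solution is given as the collection of disjuncts it exercises; the function names are hypothetical.

```python
def reduced_grammar(base_disjuncts, training_solutions):
    """Keep only those base-grammar disjuncts that some training solution exercised."""
    exercised = set()
    for disjuncts in training_solutions:
        exercised.update(disjuncts)
    return set(base_disjuncts).intersection(exercised)


def size_reduction(base_disjuncts, reduced):
    """Share of the base grammar's disjuncts that was eliminated."""
    return 1 - len(reduced) / len(base_disjuncts)
```

      With a base grammar of five disjuncts of which the training solutions exercise three, the reduced grammar keeps those three and eliminates two fifths of the disjuncts.</Paragraph>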
      <Paragraph position="2"> Due to the sampling of a genre, the grammars obtained can only be approximate. To determine the relation of the sample size to the quality of the grammar obtained, the coverage of random fragment grammars was measured in the following way: Randomly select a number of sentences from the total corpus, construct (in the same way as described above for the reduced grammar) a fragment grammar, and determine its coverage on the test set from the respective corpus. The graphs in Fig.8 show how the coverage and runtime relate to the number of sentences on which the fragment grammars are based. The leftmost data point (x value 0) describes the performance of the reduced grammar on the training set, while the rightmost data point describes its performance on the test set. The data points in between represent fragment grammars based on as many sentences as given by the x axis value. The results reported here represent the minimal performance gain due to the fact that the construction of reduced and fragment grammars is not based on the correct solutions for the training sentences, but rather on all solutions produced by the base grammar. The construction of a large-scale treebank with manually verified solutions is under way but has not yet progressed far enough to serve as input for this experiment. Even with this systematic, but curable error, the reduction reduces overall processing by a factor of four. The number of solutions is constant because only unused disjuncts are eliminated; this will change if the treebank solutions are used to construct the reduced grammar.</Paragraph>
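      <Paragraph> The fragment grammar measurement can be sketched in Python under strong simplifications: a sentence is represented only by the set of disjuncts its analysis needs, and a sentence counts as covered if all of these are in the fragment grammar. These assumptions, the function names and the fixed random seed are hypothetical and stand in for actual parsing.

```python
import random


def fragment_grammar(corpus, size, rng):
    """Union of the disjuncts exercised by `size` randomly chosen sentences."""
    grammar = set()
    for disjuncts in rng.sample(corpus, size):
        grammar.update(disjuncts)
    return grammar


def coverage(grammar, test_set):
    """Share of test sentences whose disjuncts are all in the grammar."""
    covered = sum(1 for disjuncts in test_set if disjuncts.issubset(grammar))
    return covered / len(test_set)


def coverage_curve(corpus, test_set, sizes, seed=0):
    """Coverage of one random fragment grammar per sample size."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    return [(n, coverage(fragment_grammar(corpus, n, rng), test_set))
            for n in sizes]
```

      Here corpus must be a list of sets so that random sampling is possible; plotting the returned pairs gives a coverage curve over the sample size.</Paragraph>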
    </Section>
    <Section position="2" start_page="122" end_page="122" type="sub_section">
      <SectionTitle>
5.2 Staged Parsing
</SectionTitle>
      <Paragraph position="0"> Even eliminating only unlikely disjuncts necessarily reduces the coverage of the grammar. A sequence of parsing stages allows one to profit from a small and fast grammar as well as from a large and slow one.</Paragraph>
      <Paragraph position="1"> Staged parsing applies different grammars one after the other to the input, until one yields a solution, which terminates the process. In our case, a grammar of stage n + 1 includes the grammar of stage n, but this need not be the case in general.</Paragraph>
      <Paragraph position="2"> To reduce the variability for an experiment, I assume three stages: The first includes frequently used disjuncts, the second infrequent disjuncts, and the third unused disjuncts. This ensures the full coverage of the base grammar, but allows one to focus on frequent constructions in the first parsing stage. The procedure is similar to the one before: From the solutions of a training set, a staged grammar is constructed. Currently, experiments are performed to determine a useful definition of 'frequently used'. Independent of the actual performance gains finally obtained, the application of instrumentation allows a systematic exploration of the possible configurations.</Paragraph>
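      <Paragraph> A three-stage setup of this kind can be sketched in Python. Since a useful definition of 'frequently used' is still open, the threshold min_count is a hypothetical parameter; the set-based grammar representation and the parser stand-in toy_parse are likewise assumptions of this sketch, not part of the experiment.

```python
from collections import Counter


def build_stages(all_disjuncts, training_solutions, min_count=2):
    """Three cumulative stages: frequent, plus infrequent, plus unused disjuncts."""
    counts = Counter(d for solution in training_solutions for d in solution)
    frequent = {d for d in all_disjuncts if counts[d] >= min_count}
    used = {d for d in all_disjuncts if counts[d] > 0}
    return [frequent, used, set(all_disjuncts)]


def toy_parse(grammar, needed):
    """Hypothetical parser stand-in: succeeds if the grammar has all needed disjuncts."""
    return sorted(needed) if set(needed).issubset(grammar) else None


def staged_parse(stages, parse, sentence):
    """Apply the stage grammars in order; the first solution ends the process."""
    for number, grammar in enumerate(stages, start=1):
        solution = parse(grammar, sentence)
        if solution is not None:
            return number, solution
    return None, None
```

      Because each stage includes the previous one, a sentence that fails in the last stage cannot be parsed by the base grammar either; staged_parse signals this case by returning (None, None).</Paragraph>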
    </Section>
    <Section position="3" start_page="122" end_page="123" type="sub_section">
      <SectionTitle>
5.3 Other approaches to grammar adaptation
</SectionTitle>
      <Paragraph position="0"> (Rayner and Samuelsson, 1994; Rayner and Carter, 1996; Samuelsson, 1994) present a grammar specialization technique for unification grammars. From a treebank of the sublanguage, they induce a specialized grammar using fewer macro rules which correspond to the application of several original rules.</Paragraph>
      <Paragraph position="1"> They report an average speed-up of 55 for only the parsing phase (taking lexical lookup into account, the speed-up factor was only 6-10). Due to the derivation of the grammar from a corpus sample, they observed a decrease in recall of 7.3% and an increase of precision of 1.6%. The differences to the approach described here are clear: Starting from the grammar, rather than from a treebank, we annotate the rules, rather than inducing them from scratch.</Paragraph>
      <Paragraph position="2"> We do not need criteria for tree decomposition and category specialization, and we can use the standard parsing algorithm. On the other hand, the efficiency gains are not as big as those reported by (Rayner and Carter, 1996) (but note that we cannot measure parsing times alone, so we need to compare to their speed-up factor of 10). And we did not (yet) start from a treebank, but from the raw set of solutions.</Paragraph>
    </Section>
  </Section>
</Paper>