<?xml version="1.0" standalone="yes"?> <Paper uid="C96-1031"> <Title>GramCheck: A Grammar and Style Checker</Title> <Section position="4" start_page="175" end_page="178" type="metho"> <SectionTitle> 3 Error detection, diagnosis and correction techniques </SectionTitle> <Paragraph position="0"> The overall strategy for detection, diagnosis and correction of grammar and style errors within GramCheck relies on three axes: * For detection, a combined feature relaxation and error anticipation approach is adopted.</Paragraph> <Paragraph position="1"> In order to implement the former, extensive use of external CSs is performed in the analysis grammar, whereas for the latter, explicit rules, adequately defined either in the core grammar or in satellite subgrammars, are implemented (footnote 2).</Paragraph> <Paragraph position="2"> Footnote 2: GramCheck checks texts belonging to the standard language and to the administrative sublanguage. The analysis module has been conceived as composed by a core grammar and two satellite subgrammars for overlapping cases that are mutually exclusive. Thus, the activation of one subgrammar implies the [...] * Diagnosis is performed by the CSs themselves with the aid of a heuristic technique for those errors where tests should be performed on several elements, and a pattern-related technique which provides a means to extend feature values with a gradation of correct, and possible but incorrect, information. The typical case for the former is agreement; thus, for signs involving this type of information, both an initial heuristic value is assigned and arithmetical operations are performed on (in)equality tests. As for the latter, head-argument relations where bound prepositions are involved are treated this way. For all grammar errors there is no notion of weak vs. 
strong diagnosis, all being considered strong errors needing automatic correction.</Paragraph> <Paragraph position="3"> * Correction is performed by means of tree transduction of Linguistic Structures (LSs) containing errors to a 'language' (actually a 'language use') defined as correct Spanish.</Paragraph> <Paragraph position="4"> These synthesized structures are displayed to the user. The overall design is then similar to a transfer-based MT system, where the usual cycle is analysis-transfer-synthesis, the main differences being the addition of the above-mentioned grammar correction devices and the fact that not all sentences, but only incorrect ones, will be pushed through the complete cycle.</Paragraph> <Paragraph position="5"> CSs allow the relaxation of certain features in the grammar rules whose unification will be decided upon, in a non-trivial way, within these CSs. Thus, rules do not perform feature value checking, so CSs play a crucial role, performing extended variable unification and taking appropriate decisions. Depending on the error type, CSs carry out different operations on features, scores and lists. These operations basically concern the detection and the evaluation of the error, providing a diagnostic on the error and correct value(s) for the features involved. The use of CSs favours a one-step di[...] In inflectional languages, like Spanish, this issue is essential given that in certain contexts it is not possible to give a single correction when performing analysis only at sentence level (i.e. without anaphoric relations). For these cases, the system should be provided with a heuristics for the correction in order to detect and diagnose the 
place(s), i.e., to take a decision about the unit(s) to be corrected. For GramCheck, this heuristics relies on a parametrization of two assumptions: 1. the constituent which holds the feature values that in a given error situation control the rest of the feature values in the other constituents, 2. the evaluation of the number of constituents which share and do not share the same values.</Paragraph> <Paragraph position="6"> Our diagnosis procedure assumes that the gender and number features in the head of a phrase control those in the dependent constituent(s), although, as will be shown later, this is not necessarily true. To carry out this diagnosis procedure, the CS will contrast those features and leave some clues of this evaluation in phrasal projections so that they are available for further operations should they be necessary. These clues are shaped as scores in the approach adopted for agreement errors and, in this sense, our heuristics is close to the metric operations performed by other grammar checkers based on NLP techniques (Veronis, 1988), (Bolioli et al., 1992), (Vosse, 1992), (Genthial et al., 1994).</Paragraph> <Paragraph position="7"> The core of this heuristics is that, depending on a set of linguistic principles based on lexico-morphological properties, the values for gender and number in certain lexical units will be promoted over the values in other units, thus assigning them a higher score.</Paragraph> <Paragraph position="8"> There are several conditions which have to be taken into account in order to perform the diagnosis procedure. 
For instance, nouns with inherent gender should control the gender of the rest of the elements in a given NP. However, if the noun does not have inherent gender (it is a noun that shows sex inflection), then the gender value should be controlled by those elements that, sharing the same value, are the majority. Hence, a sequence like el_masc casa_fem (the house) must be corrected into la_fem casa_fem because this noun has inherent feminine gender in Spanish. On the other hand, an NP like la_fem chic-o_masc guap-a_fem (lit. 'the boy beautiful') should be corrected as la_fem chic-a_fem guap-a_fem ('the girl beautiful'), thus changing the gender value of the head noun in the direction suggested by the other dependent elements. This means that although the system could take the gender value of the head as the value which commands the whole phrase, the number of elements that share the same feature values can influence the final decision, if those values contrast with those of the head and if the head takes its agreement properties from morphology (i.e. is susceptible to keystroke errors). 
Finally, for cases where equal scores are obtained, as happens with a non-inherent masculine noun and a feminine determiner, both possible corrections should be performed, since there is not enough information to decide the correct value (unless this can be obtained from other agreeing elements in the sentence, for instance an attribute to this NP).</Paragraph> <Paragraph position="9"> Basically, the final operation to be performed with the scores is to determine that the higher the score of an element, the more costly its substitution.</Paragraph> <Paragraph position="10"> Thus, scores are clues for the correction of those elements having the lowest scores.</Paragraph> <Paragraph position="11"> The initialization steps needed to perform the heuristic technique are related to the assignment of values and scores to lexical projections depending on their inherentness. The values for gender and number of the head of the projection serve as a parameter for the computation of values and scores for the possible modifiers which could appear close to it. Note that agreement in Spanish is based on a binary value system. Thus, the computation of values for the modifier of a given head simply relies on the instantiation of opposite values to those of the head. In the case of underspecification of the head for gender, for instance, the presupposition is that this value is the same as that of the modifier, if the latter is not underspecified. Otherwise, both elements remain underspecified. Besides, the weight given to controlling elements (50) ensures that there is no way for modifiers to exceed this score. Note as well that the weight given to non-inherent values, such as number (10), ensures that there are no promoted elements in this calculation. 
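As a rough illustration of the scoring heuristic described so far, the following Python sketch accumulates one score per gender value and keeps the value(s) with the highest total. It is not GramCheck's CS code: the function name `diagnose_gender` and the tuple layout are hypothetical, and applying the weight 10 to every non-inherent gender value is an assumption made for the example. Only the weights 50 and 10 and the three Spanish examples come from the text.

```python
from collections import defaultdict

CONTROLLING = 50   # weight stated in the text for controlling elements
NON_INHERENT = 10  # weight stated for non-inherent values; using it for
                   # non-inherent gender is an assumption of this sketch


def diagnose_gender(elements):
    """Return the gender value(s) that should win in a phrase.

    `elements` is a list of (form, gender, inherent) tuples; this layout is
    hypothetical. More than one returned value means the scores tie, so
    both corrections should be offered.
    """
    scores = defaultdict(int)
    for _form, gender, inherent in elements:
        scores[gender] += CONTROLLING if inherent else NON_INHERENT
    best = max(scores.values())
    return sorted(g for g, s in scores.items() if s == best)


# el_masc casa_fem: 'casa' has inherent feminine gender, so fem wins.
print(diagnose_gender([("el", "masc", False), ("casa", "fem", True)]))
# la_fem chic-o_masc guap-a_fem: the non-inherent head is outvoted.
print(diagnose_gender([("la", "fem", False), ("chico", "masc", False),
                       ("guapa", "fem", False)]))
# Non-inherent masculine noun with a feminine determiner: a tie.
print(diagnose_gender([("la", "fem", False), ("chico", "masc", False)]))
```

Under these assumed weights, it would take five agreeing non-inherent elements just to tie with a single controlling noun, which is one way to read the remark that modifiers cannot exceed the controlling score in ordinary noun phrases.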
The following schematic CS illustrates the assignment of scores:</Paragraph> <Paragraph position="13"> The following steps to be performed by CSs are related to the addition of all those scores associated to a given value in the successive rules building the nominal projection and the percolation of [...] The final evaluation performed by CSs is done when categories showing agreement pass beyond their maximal projection, and only if no other inter-syntagmatic agreement must be taken into account (as is the case with subject-attribute agreement, for instance). Postponing the final evaluation in this way ensures that the CS will take into account all the previous parameters to give an appropriate diagnosis about the complete XP containing the agreement violation. This evaluation is based on the comparison of scores by means of the 'greater than' predicate in order to determine (a) the correct value for the feature(s) checked corresponding to the highest score(s) (Right_Gender, Right_Number in the example below), to be used by the transfer module, and (b) the error diagnosis (gender, number and gender_number below), to be used by the error handling module that will display appropriate error information to the user: if all elements agree, scores for one of the arguments will always be 0, while if this argument has a value different from 0, this information is considered as evidence that an error has occurred, the subsequent comparison determining the value for the winning score:</Paragraph> <Section position="1" start_page="178" end_page="178" type="sub_section"> <SectionTitle> 3.2 A pattern-related technique to perform structural error detection/diagnosis </SectionTitle> <Paragraph position="0"> Turning back to the 
general definitions on error types given at the beginning of this document, structural violations can be seen as special cases of feature mismatching produced by addition, substitution and omission of elements which result in a wrong dependency relation: Wrong head-argument relations (i) Substitution of a bound preposition by another one (PP -> PP) Los alumnos relacionan la tarea [*a/con] su conocimiento.</Paragraph> <Paragraph position="1"> (ii) Omission of a bound preposition resulting in a change of the subcategorized argument (PP -> NP/S) Se acordó [*Ø/de] que tenía una reunión por la mañana.</Paragraph> <Paragraph position="2"> (iii) Addition of a preposition resulting in a change of the subcategorized argument (NP/S -> PP) Las empresas demandan [*de] métodos.</Paragraph> <Paragraph position="3"> In the HPSG-like grammar used, bound prepositions are considered NPs attached to the subcat list (i.e. the subcategorization feature) of a predicative unit. These NPs have the feature pform instantiated to the value of the preposition, if any. If the argument does not have a bound preposition, the value for pform is none. Thus, the approach adopted within GramCheck is that these error cases have a correct representation of the dependency structure where the only offending information is stored as a feature in the governed element.</Paragraph> <Paragraph position="4"> The linguistic principle behind the pattern-related technique is based on the fact that native writers substitute a preposition by another one when certain associations between patterns, showing either the same lexico-semantic and/or syntactic properties, are performed. Thus, this kind of error is not 
so accidental as it could be imagined.</Paragraph> <Paragraph position="5"> For instance, Spanish speakers/writers usually associate the argument structure of the comparative adjective inferior (lower), which subcategorizes the preposition a (to), with the Spanish comparative syntactic pattern (menos ... que, less ... than) whose second term is introduced by the conjunction que, producing phrases such as *inferior que instead of inferior a. With the verb relacionar (to relate), something similar occurs: this verb subcategorizes for the preposition con; however, due to the fact that there exist the prepositional multi-word units en relación a and en relación con, speakers tend to think that the same prepositional alternation can be performed with the verb (*relacionar a vs. relacionar con).</Paragraph> <Paragraph position="6"> Following this idea, configurational rules are regarded, for grammar checking, as descriptions of patterns, each of them having associated a wrong pattern linked to the correct pattern. Both patterns are in a complementary distribution. This way, structural errors can be foreseen and controlled, and the system is provided with a mechanism which establishes the way rule constraints must be relaxed.</Paragraph> <Paragraph position="7"> To cope with this error, a CS operating on lists checks whether the preposition in the constituent attached to the predicative sign belongs to the head of the list or to the tail. 
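The list check just described could be sketched as below. This is a hedged illustration rather than the actual CS: the dictionary `PATTERNS`, the function `check_pform` and the status labels are hypothetical names. The prepositions and the `none` value for pform are taken from the examples in this section, with the head of each list holding the correct value and the tail the anticipated wrong ones.

```python
# Hypothetical pattern lists: head = correct pform, tail = anticipated errors.
PATTERNS = {
    "relacionar": ["con", "a"],    # *relacionar a -> relacionar con
    "inferior": ["a", "que"],      # *inferior que -> inferior a
    "acordarse": ["de", "none"],   # omitted bound preposition
    "demandar": ["none", "de"],    # added preposition
}


def check_pform(predicate, found):
    """Classify the pform found on the governed element.

    Returns (status, right_pform); the status labels are illustrative.
    """
    head, *tail = PATTERNS[predicate]
    if found == head:
        return ("ok", head)
    if found in tail:
        # Anticipated wrong pattern: report it and supply the head value.
        return ("anticipated_error", head)
    return ("unknown", None)


print(check_pform("relacionar", "a"))
print(check_pform("acordarse", "none"))
print(check_pform("demandar", "none"))
```

Keeping the correct value at the head means one membership test both detects the anticipated error and yields the value to instantiate.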
If the preposition is a member of the tail, the same actions shown for agreement errors are performed: instantiation of the correct value and determination of the error type.</Paragraph> </Section> </Section> <Section position="5" start_page="178" end_page="179" type="metho"> <SectionTitle> 4 Error coverage </SectionTitle> <Paragraph position="0"> The current version of the GramCheck demonstrator is able to deal with the following types of errors: * Intra- and inter-syntagmatic agreement errors (gender and/or number in active sentences, with both predicative and copulative verbs, and in passive sentences).</Paragraph> <Paragraph position="1"> * Direct objects: omission of the preposition a with an animate entity and addition of such a preposition with a non-animate entity.</Paragraph> <Paragraph position="2"> * Addition, omission and substitution of a bound preposition, covering what is called dequeísmo (the addition of a false bound preposition de with clausal arguments) and queísmo (the omission of the bound preposition de with clausal arguments).</Paragraph> <Paragraph position="3"> * Errors on portmanteau words (use of de el, a el instead of del, al).</Paragraph> <Paragraph position="4"> Regarding style issues, three different types of weaknesses are detected: structural weaknesses, lexical weaknesses and abusive use of passive, gerunds and manner adverbs. While structural weaknesses are detected in the phrase structure rules using CSs (noun + &quot;a&quot; + infinitive), by means of an error anticipation strategy, lexical weaknesses are detected at the lexical level, with no special mechanisms other than simple CSs. 
Lexical errors currently detected are related to the use of Latin words which it is better to avoid, foreign words with Spanish derivation, cognitive errors, foreign words for which a Spanish word is recommended and verbosity.</Paragraph> </Section> <Section position="6" start_page="179" end_page="179" type="metho"> <SectionTitle> 5 Further developments </SectionTitle> <Paragraph position="0"> Results obtained with the current demonstrator are very promising. The performance of the system using CSs is similar to that shown without them, hence their use in conjunction with the detection techniques proposed, rather than a burden, may be seen as a means to add robustness to NLP systems. In fact, CSs may provide more natural solutions to grammar implementation issues, like PP-attachment control.</Paragraph> <Paragraph position="1"> Several directions for further developments have already been defined. These include the integration of these grammar checking techniques into the final release of the LS-GRAM Spanish grammar, which will have a more realistic coverage in terms both of linguistic phenomena and lexicon.</Paragraph> <Paragraph position="2"> Besides, on this new version of the grammar, hybrid techniques will be used, taking advantage of the preprocessing facilities included in ALEP. In particular, while for errors like those presented in this paper the approach adopted is linguistically motivated, for certain punctuation errors (or simply in order to reduce lexical ambiguity) other relatively simple means can be defined that include certain extended pattern matching on regular expressions or the passing of linguistic information gathered in a preprocessing phase to the unification-based parser. It is also foreseen to include a treatment for cognitive spelling errors, usually not dealt with by conventional spelling checkers.</Paragraph> </Section> </Paper>