<?xml version="1.0" standalone="yes"?> <Paper uid="C00-1034"> <Title>Theory Refinement and Natural Language Learning Hervé Déjean* Seminar für Sprachwissenschaft</Title> <Section position="3" start_page="0" end_page="229" type="metho"> <SectionTitle> 2 Theory Refinement </SectionTitle> <Paragraph position="0"> We present here a brief introduction to theory refinement. For a more detailed presentation, we refer the reader to (Abecker and Schmid, 1996), (Brunk, 1996) or (Ourston and Mooney, 1990). (Mooney, 1993) defines it as: Theory refinement systems developed in Machine Learning automatically modify a Knowledge Base to render it consistent with a set of classified training examples.</Paragraph> <Paragraph position="1"> This technique thus consists of improving a given Knowledge Base (here a grammar) on the basis of examples (here a treebank). Some systems impose the constraint that the initial knowledge base be modified as little as possible. Applied in conjunction with existing learning techniques (Explanation-Based Learning, Inductive Logic Programming), TR seems to achieve better results than these techniques used alone (Mooney, 1997). Theory refinement is mainly used (and has its origin) in Knowledge Based Systems (KBS) (Craw and Sleeman, 1990). It consists of two main steps: 1. Build a more or less correct grammar on the basis of background knowledge.</Paragraph> <Paragraph position="2"> 2. Refine this grammar using training examples: (a) identify the revision points, (b) correct them. The first step consists of acquiring an initial grammar (or, more generally, a knowledge base). In this work, the initial grammar is automatically induced from a tagged and bracketed corpus. The second step (the refinement) compares the predictions of the initial grammar with the training corpus in order, firstly, to identify the revision points, i.e. points that are not correctly described by the grammar, and secondly, to correct these revision points. The error identification and refinement operations are explained in Section 5.3.</Paragraph> <Paragraph position="3"> The main difference between a TR system and other symbolic learning systems is that a TR system must be able to revise existing rules given to the system as background knowledge. (A system such as TBL (Brill, 1993) cannot be considered as TR since it only acquires new rules.) In the case of other techniques, new rules are learned in order to improve the general efficiency of the system (selection of the &quot;best rule&quot; according to a preference function) and not in order to correct a specific rule.</Paragraph> </Section>
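This two-step scheme can be summarised in a short sketch. It is only an illustration: the grammar object, its parse method, the treebank format and the correct operator are placeholders, not ALLiS code.

```python
# Minimal sketch of the two-step theory-refinement scheme described above.
# 'initial_grammar', 'parse' and 'correct' are illustrative placeholders.

def theory_refinement(initial_grammar, treebank, correct):
    """Step 1 is assumed done: initial_grammar was built from background knowledge.
    Step 2: identify the revision points, then correct them."""
    grammar = initial_grammar
    # (a) Identify the revision points: examples the grammar does not describe correctly.
    revision_points = [(sentence, gold)
                       for sentence, gold in treebank
                       if grammar.parse(sentence) != gold]
    # (b) Correct them, revising the existing rules as little as possible.
    for point in revision_points:
        grammar = correct(grammar, point)
    return grammar
```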
<Section position="4" start_page="229" end_page="229" type="metho"> <SectionTitle> 3 Theory Refinement, Default Values and Natural Language Learning </SectionTitle> <Paragraph position="0"> This section explains how default values combined with theory refinement can provide a good machine learning framework for NLP.</Paragraph> <Section position="1" start_page="229" end_page="229" type="sub_section"> <SectionTitle> 3.1 The Use of Default Values </SectionTitle> <Paragraph position="0"> The use of default values is not new in NLP (Brill, 1993), (Vergne and Giguet, 1998). We can observe that often (but not necessarily) in a language, an element belongs to a predominant class (Vergne and Giguet, 1998). Some systems such as stochastic models use this property implicitly. Some others use it explicitly. For instance, the general principle of Transformation-Based Learning (Brill, 1993) is to assign to each element its most frequent category, and then to learn transformation rules which correct this initial categorisation. A second example is the parser described in (Vergne and Giguet, 1998).</Paragraph> <Paragraph position="1"> They first assign to each grammatical word a default category (default tag), and then might modify it thanks to local contexts and grammatical relation assignment (in order to deal with constraints due to long distance relations which cannot be expressed by local contexts).</Paragraph> <Paragraph position="2"> The main work is done by the lexicon and by default values (even if further operations are obviously necessary). These approaches are thus different from the disambiguation often used in tagging. The default rules are not numerous (one per tag) and easy to generate automatically, but they nevertheless produce a satisfactory starting level.</Paragraph> </Section> <Section position="2" start_page="229" end_page="229" type="sub_section"> <SectionTitle> 3.2 The Combination of Default Values with TR </SectionTitle> <Paragraph position="0"> The idea on which ALLiS relies is the following: a first &quot;naive grammar&quot; is built up using default values, and then TR is used in order to provide a &quot;more realistic grammar&quot;. This initial grammar assigns to each element its default category (the algorithm is explained in Section 5.2). The rules learned are categorisation rules: assign a category to an element (a tag or a word). Since an element is automatically assigned its default category, the system does not have to learn the categorisation rules for this category, and just learns categorisation rules which correspond to cases in which the element does not belong to its default category. This minimises the number of rules that have to be learned. Suppose the element e can belong to several categories (a frequent case). The first rule learned is the &quot;default&quot; rule: assign(e, dc), where dc is the default category of e. Then ALLiS just learns rules for cases where e does not belong to its default category. The numerous rules concerning the default category are replaced by the simple default rule.</Paragraph> </Section> </Section>
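The division of labour between the default rule assign(e, dc) and the exception rules learned by refinement can be sketched as follows; the tag set, category names and context encoding are invented for illustration only.

```python
# Sketch of the default-rule idea of Section 3.2: every element receives its
# default category, and refinement only learns rules for the exceptional cases.
# Tags, category names and the context key are illustrative, not ALLiS's own format.

DEFAULT_CATEGORY = {"NN": "NU", "JJ": "LEFT_ADJUNCT", "VBG": "OUT"}   # assign(e, dc)
EXCEPTION_RULES = {("VBG", "before_noun"): "LEFT_ADJUNCT"}            # learned by TR

def categorise(element, context):
    # An exception rule learned during refinement overrides the default rule.
    if (element, context) in EXCEPTION_RULES:
        return EXCEPTION_RULES[(element, context)]
    return DEFAULT_CATEGORY.get(element, "OUT")

print(categorise("VBG", "before_noun"))   # LEFT_ADJUNCT (exception rule)
print(categorise("VBG", "elsewhere"))     # OUT (default rule)
```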
<Section position="5" start_page="229" end_page="230" type="metho"> <SectionTitle> 4 ALLiS </SectionTitle> <Paragraph position="0"> The goal of ALLiS [1] is to automatically build a regular expression grammar from a bracketed and tagged corpus [2]. In this training data, only the structures we want to learn are marked at their boundaries by square brackets [3]. The following sentence shows an example of the training corpus for the NP structure (only base-NPs occur inside brackets).</Paragraph> <Paragraph position="2"> ALLiS uses an internal formalism in order to represent the grammar rules. In order to parse a text, a module converts this formalism into a regular expression grammar which can be used by a parser relying on such a representation (two modules exist: one for the CASS parser (Abney, 1996) and one for XFST (Karttunen et al., 1997)).</Paragraph> <Paragraph position="3"> Following the principle of theory refinement, the learning task is composed of two steps. The first step is the generation of the initial grammar. This grammar is generated from examples and background knowledge (Section 5). This initial grammar provides an incomplete and/or incorrect analysis of the data. The second step is the refinement of this grammar (Section 5.3). During this step, the validity of the grammar rules is checked and the rules are improved (refined) if necessary. This improvement corresponds to finding contexts in which elements which are considered to be members of the structure do not in fact belong to this structure (and reciprocally).</Paragraph> <Paragraph position="4"> [1] http://www.sfb441.uni-tuebingen.de/~dejean/chunker.html. [2] The WSJ corpus (Marcus et al., 1993). [3] (Muñoz et al., 1999) showed that this representation tends to provide better results than the representation used in (Ramshaw and Marcus, 1995), where each word is tagged with a tag I (inside), O (outside), or B (breaker).</Paragraph> <Paragraph position="6"> We give here a simple example to illustrate the learning process. The first step (initial grammar generation) categorises the tag JJ (adjective) as belonging by default to the NP structure if it occurs before a noun. The second step (refinement) finds out that some adjectives do not obey this rule [4]. The refinement is triggered in order to modify the default rule so that these exceptions can be correctly processed.</Paragraph> <Paragraph position="7"> Thus, the learning algorithm simply consists of categorising the elements of the corpus (tags and words) into specific categories, and this categorisation allows the extraction of the structures we want to learn. These categories are explained in the next section.</Paragraph> </Section>
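Since the learning task reduces to categorising tags and words, reading the structures off a categorised sentence can be sketched as below. The category names and the simplification that a structure is a maximal run of non-O elements are illustrative; linkers and the breaker property (Section 5) are ignored here.

```python
# Illustrative sketch: reading base-NP brackets off a per-token categorisation.
# Simplification: a structure is any maximal run of tokens not categorised O.

def extract_structures(tokens, categories):
    structures, current = [], []
    for token, category in zip(tokens, categories):
        if category == "O":
            if current:
                structures.append(current)
            current = []
        else:                     # NU, left or right adjunct: stays inside the structure
            current.append(token)
    if current:
        structures.append(current)
    return structures

tokens     = ["pay", "the", "coming", "week", "in", "December"]
categories = ["O",   "A",   "A",      "NU",   "O",  "NU"]
print(extract_structures(tokens, categories))   # [['the', 'coming', 'week'], ['December']]
```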
<Section position="6" start_page="230" end_page="233" type="metho"> <SectionTitle> 5 The Learning System </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="230" end_page="230" type="sub_section"> <SectionTitle> 5.1 The Background Knowledge </SectionTitle> <Paragraph position="0"> In order to ease the learning, the system uses background knowledge. This knowledge provides a formal and general description of the structures that ALLiS can learn. We suppose that the structures are composed of a nucleus with optional left and right adjuncts. We here give informal definitions; the formal/distributional ones are given in Section 5.2.</Paragraph> <Paragraph position="1"> The nucleus is the head of the structure. We authorise the presence of several nuclei in the same structure.</Paragraph> <Paragraph position="2"> All the other elements in the structure (except the linker) are considered as adjuncts. They are in a dependence relation with the head of the structure. The adjuncts are characterised by their position (left/right) relative to the nucleus.</Paragraph> <Paragraph position="3"> A linker is a special element which builds an endocentric structure with two elements. It usually corresponds to coordination [5].</Paragraph> <Paragraph position="4"> An element (nucleus or adjunct) might possess the break property. This notion is introduced (as in (Ramshaw and Marcus, 1995), (Muñoz et al., 1999)) in order to deal with sequences where adjacent nuclei compose several structures (Section 5.2.4).</Paragraph> <Paragraph position="5"> This pattern can be seen as a variant of the X-bar template (Head, Spec and Comp), which was already used in a learning system (Berwick, 1985) (although Comp is not useful for the non-recursive structures).</Paragraph> <Paragraph position="6"> The possible different categories of an element are summarised in Figure 5.1.</Paragraph> <Paragraph position="7"> The following sentence shows an example of categorisation (elements which do not appear in the structure (NP) are tagged O). For the sake of legibility, we do not introduce linkers in this expression, but each symbol X (NU, A) can be defined by the rule X → X | X l X, where l is the list of linkers. The symbols B+ and B- indicate whether the element has the breaker property or not. Since the corpus does not contain information about these distributional categories, ALLiS has to figure them out. This categorisation relies on the distributional behaviour of the elements, and can be achieved automatically.</Paragraph> </Section>
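The template of Section 5.1 can be written as a regular expression over the category symbols, in the spirit of the regular-expression grammars that ALLiS outputs. The symbol names, the regex encoding and the omission of the breaker property are illustrative simplifications, not the system's actual output format.

```python
# Sketch of the structure template: optional left adjuncts, one or more nuclei
# (possibly joined by linkers, following X -> X | X l X), optional right adjuncts.
import re

NUCLEI    = r"NU(?: L NU)*"                              # nuclei, possibly coordinated by a linker L
STRUCTURE = rf"(?:AL )*(?:{NUCLEI} )*{NUCLEI}(?: AR)*"   # AL/AR: left/right adjuncts

for sequence in ["NU", "AL AL NU AR", "AL NU L NU", "AL AR"]:
    print(sequence, "->", bool(re.fullmatch(STRUCTURE, sequence)))
# NU -> True, AL AL NU AR -> True, AL NU L NU -> True, AL AR -> False (no nucleus)
```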
<Section position="2" start_page="230" end_page="232" type="sub_section"> <SectionTitle> 5.2 The Initial Categorisation </SectionTitle> <Paragraph position="0"> The general idea for categorising elements is to use specific contexts which point out some of the distributional properties of the category. The categorisation is a sequential process. First the nuclei have to be found out. For each tag of the corpus, we apply the function fnu (described below). This function selects a list of elements which are categorised as nuclei. The function fb is applied to this list in order to figure out the nuclei which are breakers. Then the adjuncts are found out, and the function fb is also applied to them to figure out breakers.</Paragraph> <Paragraph position="1"> The context used to find out the nuclei relies on the following simple observation: a structure requires at least one nucleus. Thus, the elements that occur alone in a structure are assimilated to nuclei, since a structure requires a nucleus. For example, the tags PRP (pronouns) and NNP (proper nouns) may alone compose a structure (respectively, 99% and 48% of these tags appear alone in an NP), but the tag JJ appears alone only 0.009%. We deduce that PRP and NNP belong to the nuclei and not JJ. But this criterion does not allow the identification of all the nuclei. Some often appear with adjuncts: an English noun (NN) often occurs (at least, in the training corpus) with a determiner or an adjective and thus appears alone only 13%. The single use of this characteristic provides a continuum of values where the automatic set-up of a threshold between adjuncts and nuclei is problematic and depends on the structure. To solve this problem, the idea is to decompose the categorisation of nuclei into two steps. First we identify characteristic adjuncts.</Paragraph> <Paragraph position="2"> The adjuncts cannot appear alone since they depend on a nucleus. The function fchar is built so that it provides a very small value for adjuncts. If the value of an element is lower than a given threshold (θchar = 0.05), then it is categorised as a characteristic adjunct.</Paragraph> <Paragraph position="3"> fchar(x) = #C([ x ]) / #C(x), where #C(P) corresponds to the number of occurrences of the pattern P in the corpus C. For example, the number of occurrences in the training corpus of the pattern [ JJ ] is 99, and the number of occurrences of the pattern JJ (including the pattern [ JJ ]) is 11097. So fchar(JJ) = 0.009, a value low enough to consider JJ as a characteristic adjunct. The list provided by fchar for the English NP is: Achar = {DT, PRP$, POS, JJ, JJR, JJS, ``, ''}. These elements can correspond to left or right adjuncts. Not all the adjuncts are identified, but this list allows the identification of the nuclei, as explained in the next paragraph. The second step consists of introducing these elements into a new pattern used by the function fnu. This pattern matches elements surrounded by these characteristic adjuncts. It thus matches nuclei which often appear with adjuncts. Since a sequence of adjuncts (like an adjunct alone) cannot by itself compose a complete structure, X only matches elements which correspond to a nucleus.</Paragraph> <Paragraph position="4"> fnu(X) = #C([ Achar* X Achar* ]) / #C(X). The function fnu is a good discrimination function between nuclei and adjuncts and provides very low values for adjuncts and very high values for nuclei (Table 1).</Paragraph> <Paragraph position="5"> Once the nuclei are identified, we can easily find out the adjuncts. They correspond to all the other elements which appear in the context. There are two kinds of adjuncts: the left and the right adjuncts. The contexts used are [ _ NU for the left adjuncts and NU _ ] for the right adjuncts. If an element appears at the position of the underscore, it is categorised as an adjunct. Once the left adjuncts are found out, they can be used for the categorisation of the right adjuncts: they then appear in the context as optional elements (this is helpful to capture circumpositions).</Paragraph> <Paragraph position="10"> Since the Adjective Phrase occurring inside an NP is not marked in the UPenn treebank, we introduce the class of adjunct of adjunct. The contexts used to find out the adjuncts of the left adjunct are: [ _ A1 NU ] for the left adjuncts of A1, and [ a1* A1 _ NU ] for the right adjuncts of A1. The contexts are similar for adjuncts of right adjuncts.</Paragraph> <Paragraph position="11"> By definition, a linker connects two elements, and appears between them. The contexts used to find out the linkers place the candidate element between two such elements. Elements which occur in these contexts but which have already been categorised as nucleus or adjunct are deleted from the list.</Paragraph> <Paragraph position="13"> A sequence of several nuclei (and the adjuncts which depend on them) can belong to a unique structure or compose several adjacent structures. An element is a breaker if its presence introduces a break into a sequence of adjacent nuclei. For example, the presence of the tag DT in the sequence NN DT JJ NN introduces a break before the tag DT, although the sequence NN JJ NN (without DT) can compose a single structure in the training corpus.</Paragraph> <Paragraph position="14"> ... [the/DT coming/VBG week/NN] ...</Paragraph> <Paragraph position="16"> The tag DT introduces a break on its left, but some tags can introduce a break on their right, or on their left and right. For instance, the tag WDT (NU by default) introduces a break on its left and on its right. In other words, this tag can belong neither to the same structure as the preceding adjacent nucleus nor to the same structure as the following adjacent nucleus.</Paragraph> <Paragraph position="18"> The break functions fb, computed over the bracketed corpus and over the corpus without brackets, are used to compute the break property for nuclei, but also for adjuncts (in this case, the pattern X is completed by adding the element NU to the left or to the right of X (the potential adjunct) according to the kind of adjunct (left or right adjunct)). Table 2 shows some values for some tags. An element can be a left breaker (DT), a right breaker (no example for the English NP at the tag level), or both (PRP). The break property is generally well-marked and the threshold is easy to set up. In the refinement step (Section 5.3), the breaker property can be extended to words. Thus, the word yesterday is considered as a right breaker, although its tag (NN) is not.</Paragraph> </Section>
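The statistic fchar can be computed directly from the bracketed training data. The toy corpus below and its encoding (brackets as separate tokens, word/TAG pairs) are purely illustrative; only the definition fchar(x) = #C([ x ]) / #C(x) and the threshold θchar = 0.05 come from the text.

```python
# Sketch of the f_char statistic of Section 5.2 over a toy bracketed corpus:
# f_char(x) = #C([ x ]) / #C(x), the proportion of occurrences of tag x that
# form a structure on their own.
from collections import Counter

corpus = ("in/IN [ the/DT sale/NN ] of/IN [ assets/NNS ] by/IN "
          "[ troubled/JJ banks/NNS ] for/IN [ it/PRP ]").split()

total, alone = Counter(), Counter()
for i, token in enumerate(corpus):
    if token in "[]":
        continue
    tag = token.rsplit("/", 1)[1]
    total[tag] += 1
    # a tag occurs alone in a structure when it is flanked directly by the brackets
    if 0 < i < len(corpus) - 1 and corpus[i - 1] == "[" and corpus[i + 1] == "]":
        alone[tag] += 1

f_char = {tag: alone[tag] / total[tag] for tag in total}
print(f_char)   # PRP and NNS often stand alone (nucleus-like); DT and JJ never do
                # here (characteristic adjuncts, below the threshold theta_char = 0.05)
```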
<Section position="3" start_page="232" end_page="233" type="sub_section"> <SectionTitle> 5.3 The Refinement Step 5.3.1 The notion of reliability </SectionTitle> <Paragraph position="0"> The preceding functions identify the category of an element when it occurs in the structure. But an element can occur in the structure as well as out of the structure. For example, the tag VBG is only considered as an adjunct when it occurs in the structure. Nevertheless, it mainly occurs out of the structure (84% of its occurrences). If an element mainly [9] occurs out of the structure, it is considered as non-reliable and its default category is OUT. For each element occurring inside the structure, its reliability is tested. The initial grammar corresponds to the grammar which only contains the reliable elements. Its precision and its recall are around 86%.</Paragraph> <Paragraph position="1"> How is the reliability of an element determined? This notion of reliable element is contextual and depends on the category of the element.</Paragraph> <Paragraph position="2"> For the nuclei, the context is empty. We just compute the ratio between the number of occurrences inside the structure and the number of occurrences outside of the structure.</Paragraph> <Paragraph position="3"> For the adjuncts, the context includes an adjacent nucleus (on the right for left adjuncts or on the left for right adjuncts). For instance, the tag JJ is categorised as left adjunct for the English NP. It appears 9617 times before a nucleus and 9189 times in the structure. It is thus considered as reliable, and its default category is left adjunct. In the case where the tag JJ occurs without a nucleus on its right (a predicative use), it is not considered as an adjunct and this kind of occurrence is not used to determine the reliability of the element. On the contrary, the tag VBG appears 468 times before a nucleus but, in this context, it occurs only 138 times in the structure. This is not enough (29%) to consider the element as a reliable left adjunct, and thus its default category is OUT. For the adjunct of adjunct, the context includes an adjunct and a nucleus.</Paragraph>
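The reliability test reduces to a ratio against the 50% threshold of footnote 9. The sketch below replays it on the JJ and VBG counts quoted above; only these counts and the threshold come from the text, the function itself is an illustration.

```python
# Sketch of the reliability test of Section 5.3.1: an element (in its test context)
# is reliable if it occurs inside the structure in at least half of its occurrences.

THRESHOLD = 0.5   # footnote 9

def is_reliable(occurrences_in_structure, occurrences_in_context):
    return occurrences_in_structure / occurrences_in_context >= THRESHOLD

print(is_reliable(9189, 9617))   # True  -> JJ before a nucleus: default category left adjunct
print(is_reliable(138, 468))     # False -> VBG before a nucleus: default category OUT
```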
<Paragraph position="4"> 5.3.2 Detection of errors. Once the initial grammar is built up, its errors have to be corrected. The detection of an error corresponds to the detection of a miscategorised tag. A first kind of error made by the initial grammar is to wrongly analyse the structures composed with non-reliable elements (false negative examples). Each time a non-reliable element occurs in the structure, this corresponds to an error. For instance, the initial grammar cannot correctly recognise the following sequence as an NP, the default category of the tag VBG being OUT (outside the structure): ... [the/DT coming/VBG week/NN] ...</Paragraph> <Paragraph position="5"> [9] The threshold used is 50%.</Paragraph> <Paragraph position="6"> The second kind of error corresponds to sequences wrongly recognised as structures (false positive examples). This kind of error is generated by reliable elements which exceptionally do not occur in the structure. In the following example, order/NN occurs outside of the structure, although the default category of the tag NN is NU (nucleus), and thus the initial grammar recognises an NP.</Paragraph> <Paragraph position="7"> ... in/IN order/NN to/TO pay/VB ...</Paragraph> </Section> <Section position="4" start_page="233" end_page="233" type="sub_section"> <SectionTitle> 5.3.3 Correction </SectionTitle> <Paragraph position="0"> For both kinds of errors, the same technique is used to correct them. For this purpose, ALLiS disposes of two operators: contextualisation and lexicalisation.</Paragraph> <Paragraph position="1"> The contextualisation consists of finding out contexts in order to fix the errors. The idea is to add constraints for recategorising non-reliable elements as reliable [10]. The presence of some specific elements can completely change the behaviour of a tag. Table 3 shows the list of contexts in which the tag VBN is not categorised as OUT but as left adjunct: in these contexts the element VBN becomes reliable.</Paragraph> <Paragraph position="2"> For each tag occurring in the structure, all the possible contexts [11] are generated. For the non-reliable tags (first kind of error), we evaluate their reliability contextually, and we delete the contexts in which the tag is still non-reliable (the list of contexts can be empty, and in this case the error cannot be fixed). For the reliable tags (second kind of error), we keep the contexts in which the tag is categorised OUT.</Paragraph> <Paragraph position="3"> The lexicalisation consists of introducing lexical information: the word level. Some words can have a specific behaviour which does not appear at the Part-Of-Speech (POS) level. For instance, the word yesterday is a left breaker, a behaviour which cannot be figured out at the POS level (Table 4). The introduction of the lexicalisation improves the result by 2% (Section 6).</Paragraph> <Paragraph position="4"> The lexicalisation and the contextualisation can be combined when each separately is not powerful enough to fix the error. For example, the word about tagged RB (default category of RB: OUT) followed by the tag CD is recategorised as left adjunct and left breaker (Table 4). Table 4 gives, for each such word and context, the default category and the new category.</Paragraph> <Paragraph position="5"> [10] The same technique is used in (Sima'an, 1997). [11] The contexts depend on the category of the tag, but are just composed of one element.</Paragraph> </Section> </Section> </Paper>