<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-2149">
  <Title>Learning Linear Precedence Rules</Title>
  <Section position="5" start_page="883" end_page="883" type="metho">
    <SectionTitle>
3 The Task of LP Rules
Acquisition Viewed As Learning
from Examples
</SectionTitle>
    <Paragraph position="0"> A program which learns from examples usually reasons from very specific, low-level instances (positive or both positive and negative) to more general, high-level rules that adequately describe those instances. On a common understanding (Lea and Simon, 1974), learning from examples is a cooperative search in two spaces: the instance space, i.e. the space of all possible training instances, and the rule (=hypotheses) space, i.e. the space of all possible general rules. Besides these two spaces, two additional processes are needed to mediate between them: interpretation and instance selection. The interpretation process is needed, in moving from the instance space to the rule space, to interpret the raw instances, which may be far removed in form from the form of the rules, so that instances can guide the search in the rule space.</Paragraph>
    <Paragraph position="1"> Analogously, the instance selection rules serve to transform the high-level hypotheses (rules) into a representation useful for guiding the search in the instance space.</Paragraph>
    <Paragraph position="2"> A general description of our task is as follows: Given a specific ID grammar with no LP rules, find those LP rules. 1 In this task we also need to reason from very specific instances of LP rules (language phrases like small children, *children small) to more general LP rules (adjective &lt; noun); therefore it can be interpreted in terms of the two-space model described above.</Paragraph>
    <Paragraph position="3"> Our instance space will consist of all strings generable by the given ID grammar (the size of this instance space for any non-toy grammar will be very large). The LP rules space will be an unordered set whose elements are pairs of nodes connected by one of the relations &lt;, &gt; or &lt;&gt;, e.g. LP set = \[\[A &lt; B\], \[B &lt; E\], \[E &gt; C\], ... \]. (The size of the LP rules space will depend upon the size of the specific ID grammar whose LP rules are to be learned.) We also need to define the interpretation and instance-selection processes. In the learning system to be described, both purposes are (basically) served by a meta-interpreter for ID/LP grammars, which can use the concrete grammar, given at the outset, for both analysis and generation. In the interpretation phase, the meta-interpreter will parse a natural language expression, outputting a representation appropriate to the LP rules space, whereas in the instance-selection phase the 1Though indeed this is the usual way of looking at the task, sometimes we may need to start with some LP rules already known; the program we shall describe supports both regimes.</Paragraph>
    <Paragraph position="4"> meta-interpreter, given an LP space representation as input, will generate a language expression to be classified as positive (i.e. not violating word order rules) or negative (i.e. violating those rules) by a teacher.</Paragraph>
  </Section>
  <Section position="6" start_page="883" end_page="884" type="metho">
    <SectionTitle>
4 The Version Space Method
</SectionTitle>
    <Paragraph position="0"> There are a variety of methods in the AI literature for learning from examples. For handling our task, we have chosen the so-called &amp;quot;version space&amp;quot; method (also known as the &amp;quot;candidate elimination algorithm&amp;quot;), cf. (Mitchell, 1982). So we need to have a look at this method.</Paragraph>
    <Paragraph position="1"> The basic idea is that in all representation languages for the rule space, there is a partial ordering of expressions according to their generality.</Paragraph>
    <Paragraph position="2"> This fact allows a compact representation of the set of plausible rules (=hypotheses) in the rule space, since the set of points in a partially ordered set can be represented by its most general and its most specific elements. The set of most general rules is called the G-set, and the set of most specific rules the S-set.</Paragraph>
    <Paragraph position="3"> Figure 1 illustrates the LP rules space of a determiner of some grammatical number (singular or plural) and an adjective, expressed in predicate logic.</Paragraph>
    <Paragraph position="4"> Viewed top-down, the hierarchy is in descending order of generality (arrows point from specific to general). The topmost LP rule is most general and covers all the other rules, since det(Num), where Num is a variable, covers both det(sg) and det(pl), and &lt;&gt; covers both &lt; and &gt;. Each of the rules at level 2 is neither more general nor more specific than the others, but is more general than the most specific rules at the bottom.</Paragraph>
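The generality ordering just described can be made concrete with a small Python sketch. The encoding of an LP rule as a (number, order) pair and the helper names are our own assumptions, not the paper's code:

```python
# A rule such as det(Num) <> adj is encoded as the pair ("Num", "<>").
VAR = "Num"  # the unbound number variable atop the sg/pl hierarchy

def num_covers(general, specific):
    # det(Num) covers det(sg) and det(pl); ground values cover only themselves
    return general == VAR or general == specific

def order_covers(general, specific):
    # "<>" covers "<" and ">"; "<" and ">" cover only themselves
    return general == "<>" or general == specific

def rule_covers(general, specific):
    (gn, go), (sn, so) = general, specific
    return num_covers(gn, sn) and order_covers(go, so)

# The topmost rule det(Num) <> adj covers every most specific rule
# at the bottom of Figure 1.
assert all(rule_covers((VAR, "<>"), (n, o))
           for n in ("sg", "pl") for o in ("<", ">"))
# Level-2 rules such as det(sg) <> adj and det(Num) < adj cover
# each other in neither direction.
assert not rule_covers(("sg", "<>"), (VAR, "<"))
assert not rule_covers((VAR, "<"), ("sg", "<>"))
```

This pair-of-hierarchies encoding is all the version space method below needs: generality is checked slot by slot.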
    <Paragraph position="5"> The learning method assumes a set of positive and negative examples, and its aim is to induce a rule which covers all the positive examples and none of the counterexamples. The basic algorithm is as follows:  (1) The G-set is instantiated to the most general rule, and the S-set to the first positive example (i.e. a positive is needed to start the learning process). (2) The next training instance is accepted. If it is positive, the rules which do not cover the example are removed from the G-set, and the elements of the S-set are generalized as little as possible, so that they cover the new instance. If the next instance is negative, then the rules that cover the counterexample are removed from the S-set, and the elements of the G-set are specialized as little as possible so that the counterexample is no longer covered by any of the elements of the G-set.</Paragraph>
    <Paragraph position="6"> (3) The learning process terminates when the G-set and the S-set are both singleton sets which are identical. If they are different singleton sets, the training instances were inconsistent. Otherwise a new training instance is accepted.</Paragraph>
    <Paragraph position="8"> Now, let us see how this works with the LP rules version space in Figure 1, assuming further the following classified examples ((+) means positive, and (-) negative instance): (+) det(sg) &lt; adj (-) det(sg) &gt; adj (+) det(pl) &lt; adj  The algorithm will instantiate the G-set to the most general rule in the version space, and the S-set to the first positive, obtaining: G-set: \[\[det(Num) &lt;&gt; adj\]\] S-set: \[\[det(sg) &lt; adj\]\] Then the next example will be accepted, which is negative. The current S-set does not cover it, so it remains the same; the G-set is specialized as little as possible to exclude the negative, which yields: G-set: \[\[det(Num) &lt; adj\]\] S-set: \[\[det(sg) &lt; adj\]\] The last example is positive. The G-set remains the same since it covers the positive. The S-set however does not, so it has to be minimally generalized to cover it, obtaining: G-set: \[\[det(Num) &lt; adj\]\] S-set: \[\[det(Num) &lt; adj\]\] These are singleton sets which are identical, and the resultant (consistent) generalization is therefore: \[det(Num) &lt; adj\]. That is, a determiner of any grammatical number must precede an adjective.</Paragraph>
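The candidate-elimination run above can be reproduced with a minimal Python sketch. The encoding and all function names are our own assumptions, not the paper's implementation; a rule ("Num", "&lt;") stands for det(Num) &lt; adj:

```python
VAR = "Num"  # the unbound number variable; covers both "sg" and "pl"

def covers(rule, inst):
    (rn, ro), (xn, xo) = rule, inst
    return (rn == VAR or rn == xn) and (ro == "<>" or ro == xo)

def minimally_generalize(rule, inst):
    # climb each mismatching slot one step up its generalization hierarchy
    (rn, ro), (xn, xo) = rule, inst
    return (rn if rn == xn else VAR, ro if ro == xo else "<>")

def minimally_specialize(rule, neg):
    # all one-step specializations of `rule` that exclude the negative `neg`
    (rn, ro), (xn, xo) = rule, neg
    out = []
    if rn == VAR:
        out += [(n, ro) for n in ("sg", "pl") if n != xn]
    if ro == "<>":
        out += [(rn, o) for o in ("<", ">") if o != xo]
    return out

def learn(examples):
    sign, first = examples[0]
    assert sign == "+"                 # a positive starts the process
    G, S = [(VAR, "<>")], [first]
    for sign, inst in examples[1:]:
        if sign == "+":
            G = [g for g in G if covers(g, inst)]
            S = [minimally_generalize(s, inst) for s in S]
        else:
            S = [s for s in S if not covers(s, inst)]
            # keep only specializations still covering some element of S
            G = [g2 for g in G for g2 in minimally_specialize(g, inst)
                 if any(covers(g2, s) for s in S)]
    return G, S

# (+) det(sg) < adj   (-) det(sg) > adj   (+) det(pl) < adj
G, S = learn([("+", ("sg", "<")), ("-", ("sg", ">")), ("+", ("pl", "<"))])
assert G == S == [(VAR, "<")]          # converges to det(Num) < adj
```

With only the two positives the boundary sets stay apart (G at det(Num) &lt;&gt; adj, S at det(Num) &lt; adj), which is why the negative example is needed for convergence.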
  </Section>
  <Section position="7" start_page="884" end_page="884" type="metho">
    <SectionTitle>
5 Overview of the Learner
</SectionTitle>
    <Paragraph position="0"> Our learning program has two basic modules: the version space learner, which performs the elementary learning step (as described in the previous section), and a meta-interpreter for ID/LP grammars, which serves the processes of interpretation and instance selection (as described in section 3).</Paragraph>
    <Paragraph position="2"> The learning proceeds in a dialog form with the teacher: for the learning of each individual LP rule, the system produces natural language phrases to be classified by the teacher until it can converge to a single concept (rule). The whole process ends when all LP rules are learned.</Paragraph>
    <Paragraph position="3"> At the outset, the program is supplied with the specific ID grammar whose LP rules are to be acquired, and the user-provided bias of the system.</Paragraph>
    <Paragraph position="4"> The latter implies an explicit statement on the part of the user of what features and values are relevant to the task, by inputting the corresponding generalization hierarchies (the precedence generalization hierarchy is taken for granted).</Paragraph>
    <Paragraph position="5"> In the particular implementation, the acceptable ID grammar format is essentially that of a logic grammar (Pereira and Warren, 1980), (Dahl and Abramson, 1990). We only use a double arrow (to avoid mixing it up with the often built-in Definite Clause Grammar notation); besides, empty productions and sisters having the very same name are not allowed, since they interfere with LP rule statements, cf. e.g. (Sag, 1987), (Saint-Dizier, 1988).</Paragraph>
  </Section>
  <Section position="8" start_page="884" end_page="887" type="metho">
    <SectionTitle>
6 The Implementation
</SectionTitle>
    <Paragraph position="0"> Below we discuss the basic aspects of the implementation, illustrating it with the ID grammar with no LP restrictions given in Figure 2.</Paragraph>
    <Paragraph position="1"> The grammar will generate simple declarative and interrogative sentences like The Jonses read this thick book, The Jonses read these thick books, Do the Jonses smile, etc., as well as all their (ungrammatical) permutations Read this thick book the Jonses, The Jonses read thick this book do, etc.</Paragraph>
    <Paragraph position="2"> The program knows at the outset that the values &amp;quot;sg&amp;quot; and &amp;quot;pl&amp;quot; are both more specific than the variable &amp;quot;Num&amp;quot;, matching any number (this is the bias of the system).</Paragraph>
    <Paragraph position="3"> Step 1. The program determines the siblings  (1) s ==&gt; name, vp.</Paragraph>
    <Paragraph position="4"> (2) sq ==&gt; aux, name, vp.</Paragraph>
    <Paragraph position="5"> (3) vp ==&gt; vtr, np.</Paragraph>
    <Paragraph position="6"> (4) vp ==&gt; vintr.</Paragraph>
    <Paragraph position="7"> (5) np ==&gt; det(Num), adj, n(Num).</Paragraph>
    <Paragraph position="8"> (6) name ==&gt; \[the-jonses\].</Paragraph>
    <Paragraph position="9"> (7) n(sg) ==&gt; \[book\].</Paragraph>
    <Paragraph position="10"> (8) n(pl) ==&gt; \[books\].</Paragraph>
    <Paragraph position="11"> (9) det(sg) ==&gt; \[this\].</Paragraph>
    <Paragraph position="12"> (10) det(pl) ==&gt; \[these\].</Paragraph>
    <Paragraph position="13"> (11) det(_) ==&gt; \[the\].</Paragraph>
    <Paragraph position="14"> (12) adj ==&gt; \[thick\].</Paragraph>
    <Paragraph position="15"> (13) vtr ==&gt; \[read\].</Paragraph>
    <Paragraph position="16"> (14) vintr ==&gt; \[smile\].</Paragraph>
    <Paragraph position="17"> (15) aux ==&gt; \[do\].</Paragraph>
    <Paragraph position="18"> Figure 2: A simple i1) grammar with no LP constraints (=the right-hand sides of ID rules) that will later have to be linearized, by collecting them in a partially ordered list. Singleton right-hand sides (rule (4) above and all dictionary rules) are therefore left out, and so are cuts, and &amp;quot;escapes to Pro- null log&amp;quot; in curly brackets, since they are not used to represent tree nodes, but are rather coustraints on such nodes. Also, if some right-hand side is a set which (properly) includes another right-hand side (as in rule (2) and rule (1) abowe), the latter is not added to the sibling list, since we do not want to learn twice the linearization of some two nodes (&amp;quot;name&amp;quot; and &amp;quot;vp&amp;quot; in our case). The sibling list then, after the hierarchical sorting frollt lower-level to higher-level nodes, becomes: \[\[aux,name,vp\] ,\[vtr,np\],\[det (Nurn),adj ,n(Num)\]\] Now, despite the fact that the set of LP rules we need to learn is itself unordered, the order in which the program learns each individual LP rule rnay be very essential to the acquisition process. Thus, starting Dom the first, element of the above sibling list, viz. \[aux, name, vp\], we will be in trouble when attempting to locate the misorderings in any negative example. Considering just a single negative instance, say The Jonses read thick this book do: What is(are) the misplacenmnt(s) and where do they occur? In the higher-level tree nodes \[aux, name, vp\] or in the lower-level nodes \[vtr, np\] or in the still lower \[det(Num),adj,n(Num)\] ? Our program solves this problem by exploiting the fact, peculiar to our application, that the nodes in a grammar are hierarchically structured, therefore we may try to linearize a set of nodes A and B higher up in a tree o~lly after all lower-level nodcs dominated by both A and ft have already been ordered. 
Knowing these lower-level LP rules, our meta-interpreter would never generate instances like The Jonses read thick this book do, but only some repositionings of the nodes \[aux, name, vp\], their internal ordering being guaranteed to be correct. The sibling list then, after hierarchical sorting from lower-level to higher-level nodes, becomes: \[\[det(Num),adj,n(Num)\],\[vtr,np\],\[aux,name,vp\]\] and the first element of this list is first passed to the learning engine.</Paragraph>
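Step 1 can be illustrated with a rough Python sketch. The dictionary encoding of the Figure 2 ID rules and all helper names are our own assumptions, not the paper's representation:

```python
# Multi-node right-hand sides of the Figure 2 ID rules (rules 1-3 and 5);
# vp ==> vintr and the dictionary rules have singleton RHSs and are skipped.
ID_RULES = {
    "s":  ["name", "vp"],
    "sq": ["aux", "name", "vp"],
    "vp": ["vtr", "np"],
    "np": ["det(Num)", "adj", "n(Num)"],
}

def level(node, rules):
    # height of the subtree a node dominates; lexical nodes count as 0
    rhs = rules.get(node)
    return 1 + max(level(n, rules) for n in rhs) if rhs else 0

def sibling_lists(rules):
    cands = list(rules.values())
    # drop an RHS properly included in another: [name, vp] is inside rule (2)
    kept = [r for r in cands if not any(set(r) < set(o) for o in cands)]
    # hierarchical sort: lists of lower-level nodes are linearized first
    return sorted(kept, key=lambda rhs: max(level(n, rules) for n in rhs))

assert sibling_lists(ID_RULES) == [
    ["det(Num)", "adj", "n(Num)"], ["vtr", "np"], ["aux", "name", "vp"]]
```

The sort key mirrors the argument in the text: `[det(Num),adj,n(Num)]` dominates only lexical nodes, `[vtr,np]` dominates it, and `[aux,name,vp]` dominates both, so ordering by subtree height yields exactly the lower-level-first sibling list.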
    <Paragraph position="19"> Step 2. The program now needs to produce a first positive example, as required by the version space method. Taking as input the first element of the sibling list, the ID/LP meta-interpreter generates a phrase conforming to this description and asks the teacher to re-order it correctly (if needed). In our case, the first positive example would be this thick book. The phrase will be re-parsed in order to determine the linearization of constituents.</Paragraph>
    <Paragraph position="20"> A word about the ID/LP parser/generator. Its analysis role is needed in processing the first positive example, and the generation role in the production of language examples for all intermediate stages of the learning process, which are then evaluated by the teacher. The predicate observes two types of LP constraints: the globally valid LP rules that have been acquired by the system so far, 2 and the &amp;quot;transitory&amp;quot; LP constraints, serving to produce an ordering, as required by an intermediate stage of the learning process.</Paragraph>
    <Paragraph position="21"> Disposing of the ordering of constituents in the positive example, the transitive closure of these partial orderings is computed (in our case, from \[\[det(Num) &lt; adj\],\[adj &lt; n(Num)\]\] we get \[\[det(Num) &lt; adj\], \[adj &lt; n(Num)\], \[det(Num) &lt; n(Num)\]\]). This result is then cast into a representation that supports our learning process. 3 2 Or are priorly known, in the case when the system starts with some LP rules declared by the user.</Paragraph>
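The transitive-closure step admits a minimal sketch (the helper name and pair encoding are ours; a pair (A, B) stands for A &lt; B):

```python
def transitive_closure(pairs):
    """Saturate a set of precedence pairs: A < B and B < C imply A < C."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

# adjacent orderings observed in the positive example "this thick book"
observed = [("det(Num)", "adj"), ("adj", "n(Num)")]
assert transitive_closure(observed) == {
    ("det(Num)", "adj"), ("adj", "n(Num)"), ("det(Num)", "n(Num)")}
```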
    <Paragraph position="22">  and the individual LP rules, resulting from finding a consistent generalization, are asserted in the ID/LP grammar database to be respected by any 4 further generation process.</Paragraph>
    <Paragraph position="23"> Figure 3 gives a learning cycle starting from the sibling list element \[det(Num),adj,n(Num)\]. The first column gives the dialog with the teacher, the second the program's internal representation of the LP rules space, and the third those rules expressed in their more familiar, and final, form that can be utilized directly by the ID grammar. After processing the first positive (first row), the system generalizes by varying a parameter (number or precedence), verbalizes the generalization, the generated phrase is classified by the teacher, then another generalization is made, depending on the classification; it is verbalized, evaluated, and so on. The process results in the three LP rules: det(Num) &lt; adj; adj &lt; n(Num); and det(Num) &lt; n(Num).</Paragraph>
    <Paragraph position="24"> A remark on notation: # delimits individual LP rules, allowing their recovery in terms of Prolog structures. The underbars, _, are merely placeholders for bound variables (in our case, those bound to &amp;quot;Num&amp;quot;). Clearly, mutually dependent feature values need to be considered (i.e. varied by the program) only once, and so they occur just once in the expressions.</Paragraph>
    <Paragraph position="25"> Several additional points regarding the learning process need to be made.</Paragraph>
    <Paragraph position="26"> 4 Assertions are actually made after checking for consistency with LPs already present in the database. Though no contradictions may arise with acquired rules, they may come from LPs declared by the user in the case when the system is started with some such LPs.</Paragraph>
    <Paragraph position="27"> The first is that after converging to a single LP rule, it is tested whether this rule covers all most specific instances. For doing this, the stated generalization hierarchies are taken into account along with the fact that in an ID/LP format a rule of the type A &gt; B logically implies the negation of its &amp;quot;inverse rule&amp;quot; A &lt; B. Thus, the rule det(Num) &lt; adj covers all potential most specific instances, since the rule itself and its inverse rule det(Num) &gt; adj cover them, which is clearly seen on the generalization hierarchy in Figure 1.</Paragraph>
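This rule-plus-inverse coverage test can be sketched as follows (our own formulation; the pair encoding and function names are assumptions, with ("Num", "&lt;") standing for det(Num) &lt; adj):

```python
def covers(rule, inst):
    (rn, ro), (xn, xo) = rule, inst
    return (rn == "Num" or rn == xn) and (ro == "<>" or ro == xo)

def fully_covered(rule, most_specific):
    """True if the rule and its implied inverse account for every leaf."""
    rn, ro = rule
    inverse = (rn, {"<": ">", ">": "<"}[ro])   # A < B rules out A > B
    return all(covers(rule, i) or covers(inverse, i) for i in most_specific)

# the four most specific instances at the bottom of Figure 1
leaves = [(n, o) for n in ("sg", "pl") for o in ("<", ">")]
assert fully_covered(("Num", "<"), leaves)     # no second pass needed
```

A rule like det(sg) &lt; adj would leave the det(pl) leaves unaccounted for, which is the situation where the uncovered instances are fed back for a second pass.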
    <Paragraph position="28"> \[\[&amp;quot; SO\]fie lllOSt sl)eci\[i(: illsta, l|('es rettlaill /lllcovere(.l~ theu they are fed again to the version space algo~ ril.hlzl for a second pass.</Paragraph>
    <Paragraph position="29"> The second point is that when it is impossible for some structure to be verbalized due to contradictory LP statements (as in the second row), the system itself evaluates this example as negative and proceeds further.</Paragraph>
    <Paragraph position="30"> We also need to emphasize that the program selectively, rather than randomly, varies the potentially relevant parameters (number and precedence, in this particular case), attempting to converge the generalization process as quickly as possible. This is done in order to minimize the number of training instances that need to be generated, and hence to minimize the number of evaluations that the teacher needs to make. In other words, being generalization-driven, the generator never produces training instances which are superfluous to the generalization process. This, in particular, allows the program to avoid outputting all strings generable by the grammar whose LP rules are being acquired (notice, for instance, in the first column of Figure 3 that no language expression involving the dictionary rule (11) det(_) ==&gt; \[the\] from Figure 2 is displayed to the user).</Paragraph>
    <Paragraph position="31">  In this respect our approach is in sharp contrast to a learning process whose training examples are given en bloc, and hence the teacher would, of necessity, make a great many assessments that the learner would never use.</Paragraph>
    <Paragraph position="32"> Step 4. The learning terminates successfully when all LP rules are found (i.e. all elements of the sibling list are processed) and fails when no consistent generalization may be found for some data. The latter fact needs to be interpreted in the sense that these data are not correctly describable within the ID/LP format.</Paragraph>
  </Section>
</Paper>