<?xml version="1.0" standalone="yes"?> <Paper uid="P80-1014"> <Title>Computational Analogues of Constraints on Grammars: A Model of Syntactic Acquisition</Title> <Section position="2" start_page="0" end_page="50" type="metho"> <SectionTitle> 2. Constraints Establish the Program's Success 2.1 Current Status of the Acquisition Program </SectionTitle> <Paragraph position="0"> To date, the accomplishments of the research are two-fold.</Paragraph> <Paragraph position="1"> First, from an engineering standpoint, the program succeeds admirably; starting with no grammar rules and just two base schema rules, the currently implemented version (dubbed LPARSIFAL) acquires from positive example sentences many of the grammar rules in a &quot;core grammar&quot; of English originally hand-written by Marcus. The currently acquired rules are sufficient to parse simple declaratives, much of the English auxiliary system including auxiliary verb inversion, simple passives, simple wh-questions (e.g., Who did John kiss?), imperatives, and negative adverbial preposing. Carrying acquisition one step further, by starting with a relatively restricted set of context-free base rule schemas - the X-bar system of Jackendoff \[7\] - the program can also easily induce the proper phrase structure rules for the language at hand.</Paragraph> <Paragraph position="2"> Acquired base rules include those for noun phrases, verb phrases, prepositional phrases, and a substantial part of the English auxiliary verb system.</Paragraph> <Paragraph position="3"> The decision to limit the program to restricted sorts of evidence for its acquisition of new rules - that is, positive data of only limited complexity - arises out of a commitment to develop the weakest possible acquisition procedure that can still successfully acquire syntactic rules. 
This commitment in turn follows from the position (cogently stated by Pinker) that &quot;any plausible theory of language learning will have to meet an unusually rich set of empirical conditions. The theory ... will have to be 1. But children might (and seem to) receive negative evidence for what is a semantically well-formed sentence. See Brown and Hanlon \[3\]. 2. There is another reason for rejecting negative examples as inductive evidence: from formal results first established by Gold \[5\], it is known that by pairing positive and negative example strings with the appropriate labels &quot;grammatical&quot; and &quot;ungrammatical&quot; one can learn &quot;almost any&quot; language. Thus, enriching the input to admit negative evidence broadens the class of &quot;possibly learnable languages&quot; enormously. (Explicit instruction and negative examples are often closely yoked. Compare the necessity for a benign teacher in Winston's blocks world learning program \[6\].) Of course, many rules lie beyond the current program's reach.</Paragraph> <Paragraph position="4"> PARSIFAL employed dual mechanisms to distinguish Noun Phrase and wh-movements; at present, LPARSIFAL has only a single device to handle all constituent movements. Lacking a distinguished facility to keep track of wh-movements, LPARSIFAL cannot acquire the rules where these movements might interact with Noun Phrase movements. Current experiments with the system include adding the wh facility back into the domain of acquisition. Also, the present model cannot capture all &quot;knowledge of language&quot; in the sense intended by generative grammarians. For example, since the weakest form of the acquisition procedure does not employ backup, the program cannot re-analyze &quot;garden path&quot; sentences and so deduce that they are grammatically well-formed. 3 In part, this deficit arises because it is not perfectly clear to what extent knowledge of parsing encompasses all 
our knowledge about language. 4</Paragraph> <Section position="1" start_page="49" end_page="50" type="sub_section"> <SectionTitle> 2.2 Constraints and the Acquisition Program </SectionTitle> <Paragraph position="0"> However, beyond the simple demonstration of what can and cannot be acquired, there is a second, more important accomplishment of the research. This is the demonstration that constraint is an essential element of the acquisition program's success. To ease the computational burden of acquiring grammar rules, it was necessary to place certain constraints on the operation of the model, tightly restricting both the class of hypothesizable phrase structure rules and the class of possible grammar rules.</Paragraph> <Paragraph position="1"> The constraints on grammar rules fall into two rough groups: constraints on rule application and constraints on rule form.</Paragraph> <Paragraph position="2"> The constraints on rule application can be formulated as specific locality principles that govern the operation of the parser and the acquisition procedure. Recall that in Marcus' PARSIFAL, grammar rules consist of simple production rules of the form If &lt;pattern&gt; then &lt;action&gt;, where a pattern is a set of feature predicates that must be true of the current environment of the parse in order for an action to be taken. Actions are the basic tree-building operations that construct the desired output, a (modified) annotated surface structure tree (in the sense of Fiengo \[8\] or Chomsky \[9\]).</Paragraph> <Paragraph position="3"> Adopting the operating principles of the original PARSIFAL, grammar rules can trigger only by successfully matching features of the (finite) local environment of the parse, an environment that includes a small, three-cell look-ahead buffer holding already-built constituents whose grammatical function is as yet 3. 
A related issue is that the current procedure does not acquire the PARSIFAL &quot;diagnostic&quot; grammar rules that exploit look-ahead. Typically, diagnostic rules use the specific features of lexical items far ahead in the look-ahead buffer to decide between alternative courses of action. However, by extending the acquisition procedure - allowing it to re-analyze apparently &quot;bad&quot; sentences in a careful mode and adding the stipulation that more &quot;specific&quot; rules should take priority over more &quot;general&quot; rules (an often-made assumption for production systems) - one can begin to accommodate the acquisition of diagnostic rules, and in fact provide a kind of developmental theory for such rules. Work testing this idea is underway. 4. In most models, the string-to-structural description mapping implied by the directionality of parsing is not &quot;neutral&quot; with respect to speakers and listeners.</Paragraph> <Paragraph position="4"> undecided (e.g., a noun phrase that is not yet known to be the subject of a sentence) or single words. It is Marcus' claim that the addition of the look-ahead buffer enables PARSIFAL to always correctly decide what to do next - at least for English.</Paragraph> <Paragraph position="5"> The parser uses the buffer to make discriminations that would otherwise appear to require backtracking. Marcus dubbed this &quot;no backtracking&quot; stipulation the Determinism Hypothesis. The Determinism Hypothesis crucially entails that all structure the parser builds is correct - that already-executed grammar rules have performed correctly. This fact provides the key to easy acquisition: if parsing runs into trouble, the difficulty can be pinpointed at the current locus of parsing, and not in any already-built structure (previously executed grammar rules). In brief, any errors are assumed to be locally and immediately detectable. 
This constraint on error detectability appears to be a computational analogue of the restrictions on a transformational system advanced by Wexler and his colleagues.</Paragraph> <Paragraph position="6"> (See Culicover and Wexler \[10\].) In their independent but related formal mathematical modelling, they have proved that a finite error detectability restriction suffices to ensure the learnability of a transformational grammar, a fact that might be taken as independent support for the basic design of LPARSIFAL.</Paragraph> <Paragraph position="7"> Turning now to constraints on rule form, it is easy to see that any such constraints will aid acquisition directly, by cutting down the space of rules that can be hypothesized. To introduce the constraints, we simply restrict the set of possible rule &lt;patterns&gt; and &lt;actions&gt;. The trigger patterns for PARSIFAL rules consist of just the items in the look-ahead buffer and a local (two node) portion of the parse tree under construction - five &quot;cells&quot; in all. Thus, patterns for acquired rules can be assumed to incorporate just five cells as well. As for actions, a major effort of this research was to demonstrate that just three or so basic operations are sufficient to construct the annotated surface structure parse tree, thus eliminating many of the grammar rule actions in the original PARSIFAL. Together, the restrictions on rule patterns and actions ensure that the set of rules available for hypothesis by the acquisition program is finite.</Paragraph> <Paragraph position="8"> The restrictions just described constrain the space of available grammar rules. However, in the case of phrase structure rules additional strictures are necessary to reduce the acquisitional burden. LPARSIFAL depends heavily on the X-bar theory of phrase structure rules \[7\] to furnish the necessary constraints. 
In the X-bar theory, all phrase structure rules for human grammars are assumed to be expansions of just a few schemas of a rather specific form, for example, XP->... X ... . Here, the &quot;X&quot; stands for an obligatory phrase structure category (such as a Noun, Verb, or Preposition); the ellipses represent slots for possible, but optional, &quot;XP&quot; elements or specified grammatical formatives. Actual phrase structure rules are fleshed out by setting the &quot;X&quot; to some known category and settling upon some way to fill out the ellipses. For example, by setting X=N(oun) and allowing some other &quot;XP&quot; to the left of the Noun (call it the category &quot;Determiner&quot;) we would get one version of a Noun Phrase rule, NP->Determiner N. In this case, the problem for the learner must include figuring out what items are permitted to go in the slots on either side of the &quot;N&quot;. Note that the XP schema tightly constrains the set of possible phrase structure rules; for instance, no rule of the form XP->X X would be admissible, immediately excluding such forms as Noun Phrase->Noun Noun. It is this rich source of constraint that makes the induction of the proper phrase structure from positive examples feasible; section 4 below illustrates how this induction method works in practice.</Paragraph> <Paragraph position="9"> Finally, it should be pointed out that the category names like &quot;N&quot; and &quot;V&quot; are just arbitrary labels for the &quot;X&quot; categories; the standard approach of X-bar theorists is to assume that the names stand for bundles of distinctive features that do the actual work of classifying tokens into one category bin or another. 
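As a rough illustration of the schema constraint discussed above, and not a reconstruction of LPARSIFAL's own code, the following Python sketch checks whether a candidate phrase structure rule is an admissible expansion of XP->... X ... . The function name `fits_xbar_schema`, the string representation of categories, and the `HEAD_OF` and `PHRASAL` tables are all assumptions made for this example.

```python
# Hypothetical illustration only: names and data layout are assumptions,
# not part of the original program.

# Assumed mapping from each phrasal category to its obligatory head "X".
HEAD_OF = {"NP": "N", "VP": "V", "PP": "P"}

# Categories treated as "XP" elements; "Determiner" is included because the
# text treats it as an optional "XP" to the left of the Noun.
PHRASAL = set(HEAD_OF) | {"Determiner"}


def fits_xbar_schema(lhs, rhs, formatives=()):
    """Return True if lhs -> rhs is an expansion of XP -> ... X ...

    Exactly one element of rhs must be the head of lhs; every other element
    must be a phrasal ("XP") category or a specified formative.
    """
    head = HEAD_OF.get(lhs)
    if head is None:
        return False
    if rhs.count(head) != 1:
        # Rejects duplicate heads (XP -> X X) and headless expansions.
        return False
    for item in rhs:
        if item == head:
            continue
        if item not in PHRASAL and item not in formatives:
            return False
    return True


admissible = fits_xbar_schema("NP", ["Determiner", "N"])   # NP -> Determiner N
excluded = fits_xbar_schema("NP", ["N", "N"])              # NP -> N N
```

Under these assumptions `admissible` is true and `excluded` is false, mirroring the two cases named in the text.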
An important area for future research will be to formulate precise models of how the feature system evolves in interaction with lexical and syntactic acquisition.</Paragraph> <Paragraph position="10"> The research completed so far assumes that the acquisition procedure is initially provided with just the X-bar schema described above along with an ability to categorize lexical items as nouns, verbs, or other. In addition, the program has an initial schema for a well-formed predicate argument structure, namely, a predicate (verb) along with its &quot;object&quot; arguments. Other phrase structure categories such as Prepositional Phrase are inferred by noticing lexical items of unknown categorization and then insisting upon the constraint that only &quot;XP&quot; items or specified formatives appear before and after the main &quot;X&quot; entry. To take an over-simplified example, given the Noun Phrase the book behind the window, the presence of the non-Noun, non-Verb behind and the Noun Phrase the window immediately after the noun book would force creation of a new &quot;X&quot; category, since possible alternatives such as NP->NP \[the book\] NP \[behind...\] are prohibited by the X-bar ban on directly adjacent, duplicate &quot;X&quot; items.</Paragraph> <Paragraph position="11"> The X-bar acquisition component of the acquisition procedure is still experimental, and so open to change. However, even crude use of the X-bar restrictions has been fruitful. For one thing, it enables the acquisition procedure to start without any preconceptions about canonical word order for the language at hand. This would seem essential if one is interested in the acquisition of phrase structure rules for languages whose canonical Subject-Verb-Object ordering is different from that of English. 
In addition, since so much of the acquisition of the category names is tied up with the elaboration of a distinctive feature system for lexical items, adoption of the X-bar theory appears to provide a driving wedge into the difficult problems of lexical acquisition and lexical ambiguity. To take but one example, the X-bar theory provides a framework for studying how items of one phrase structure category, e.g., verbs, can be converted into items of another category, e.g., nouns. This line of research is also currently under investigation.</Paragraph> </Section> </Section> <Section position="3" start_page="50" end_page="51" type="metho"> <SectionTitle> 3. The Acquisition Algorithm is Simple </SectionTitle> <Paragraph position="0"> As mentioned, LPARSIFAL proceeds by trying its hand at parsing a series of positive example sentences. Parsing normally operates by executing a series of tree-building and token-shifting grammar rule actions. These actions are triggered by matches of rule patterns against features of tokens in a small three-cell constituent look-ahead buffer and the local part of the annotated surface structure tree currently under construction: the lowest, right-most edge of the parse tree.</Paragraph> <Paragraph position="1"> Grammar rule execution is also controlled by reference to base phrase structure rules. To implement this control, each of the parser's grammar rules is linked to one or more of the components of the phrase structure rules. Then, grammar rules are defined to be eligible for triggering, or active, only if they are associated with that part of the phrase structure which is the current locus of the parser's attentions; otherwise, a grammar rule does not even have the opportunity to trigger against the buffer, and is inactive. This is best illustrated by an example. Suppose there were but a single phrase structure rule for English, Sentence->NounPhrase VerbPhrase. 
Flow of control during a parse would travel left-to-right in accordance with the S--NP--VP order of this rule, and could activate and deactivate bundles of grammar rules along the way. For example, if the parser had evidence to enter the S->NP VP phrase structure rule, pointers would first be set to its &quot;S&quot; and &quot;NP&quot; portions. Then, all the grammar rules associated with &quot;S&quot; and &quot;NP&quot; would have a chance to run and possibly build a Noun Phrase constituent. The parser would eventually advance in order to construct a Verb Phrase, deactivating the Noun Phrase building grammar rules and activating any grammar rules associated with the Verb Phrase. 5 Together with (1) the items in the buffer and (2) the leading edge of the parse tree under construction, the currently pointed-at portion of the phrase structure forms a triple that is called the current machine state of the parser.</Paragraph> <Paragraph position="2"> If in the midst of a parse no currently known grammar rules can trigger, acquisition is initiated: LPARSIFAL attempts to construct a single new executable grammar rule. New rule assembly is straightforward. LPARSIFAL simply selects a new pattern and action, utilizing the current machine state triple of the parser at the point of failure as the new pattern and one of four primitive (atomic) operations as the new action. 
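The bookkeeping described so far can be pictured with a small data-structure sketch. It is only an illustration under assumed names: `MachineState`, `Rule`, `new_rule`, and `active_rules` are hypothetical and do not come from the original implementation, and features are reduced to plain strings.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MachineState:
    """The triple described in the text (field names are assumptions)."""
    buffer: Tuple[str, ...]      # features of the look-ahead buffer cells
    tree_edge: Tuple[str, ...]   # local leading edge of the parse tree
    ps_pointer: str              # active phrase structure component


@dataclass(frozen=True)
class Rule:
    pattern: MachineState        # environment in which the rule may fire
    action: str                  # name of one primitive operation


def new_rule(state_at_failure, action):
    """Assemble a candidate rule: the state at the point of failure
    becomes the pattern, paired with one primitive action."""
    return Rule(pattern=state_at_failure, action=action)


def active_rules(rules, ps_pointer):
    """Keep only rules linked to the current phrase structure component;
    all other rules never get a chance to match the buffer."""
    return [r for r in rules if r.pattern.ps_pointer == ps_pointer]
```

In this sketch a rule acquired while the pointer is at "NP" stays inactive once the parser has advanced to "VP", which is the activation behavior described for the S->NP VP example.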
The primitive operations are: attach the item in the left-most buffer cell to the node currently under construction; switch (exchange) the items in the first and second buffer cells; insert one of a finite number of lexical items into the first buffer cell; and insert a trace (an anaphoric-like NP) into the first buffer cell.</Paragraph> <Paragraph position="3"> The actions have turned out to be sufficient and mutually exclusive, so that there is little if any combinatorial problem of choosing among many alternative new grammar rule candidates.</Paragraph> <Paragraph position="4"> As a further constraint on the program's abilities, the acquisition procedure itself cannot be recursively invoked; that is, if in its attempt to build a single new executable grammar rule the program finds that it must acquire still other new rules, the current attempt at acquisition is immediately abandoned. This restriction has the apparently desirable effect of ensuring that the program uses just local context to debug its new rules and ignores overly complicated example sentences that are beyond its reach.</Paragraph> <Paragraph position="5"> 5. This scheme was first suggested by Marcus \[1, page 60\]. The actual procedure uses the X-bar schemas instead of explicitly labelled nodes like &quot;VP&quot; or &quot;S&quot;.</Paragraph> <Section position="1" start_page="51" end_page="51" type="sub_section"> <SectionTitle> 3.1 Mark Acquisition Procedure as Invoked. 3.2 Attempt to construct new grammar rule 3.2.2 Try attach </SectionTitle> <Paragraph position="0"> Success: (Save new rule) Go to Step 3.3. Failure: (Try next action) On to Step 3.2.3. 3.2.3 Try to switch first and second buffer cell items. Success: (Save new rule) Go to Step 3.3.</Paragraph> <Paragraph position="1"> Failure: 
(Restore buffer and try next action)</Paragraph> </Section> <Section position="2" start_page="51" end_page="51" type="sub_section"> <SectionTitle> 4.1 Phrase Structure for Verb Phrases </SectionTitle> <Paragraph position="0"> To see exactly how the X-bar constraints can simplify the phrase structure induction task, suppose that the learner has already acquired the phrase structure rule for sentences, i.e., something like Sentence->Noun Phrase Verb Phrase, and now requires information to determine the proper expansion of a Verb Phrase, Verb Phrase->???.</Paragraph> <Paragraph position="1"> The X-bar theory cuts through the maze of possible expansions for the right-hand side of this rule. Assuming that Noun Phrases are the only other known category type, the X-bar theory then tells us that these are the only possible configurations for a Verb Phrase rule: If the learner can classify basic word tokens as either nouns or verbs, then by simply matching an example sentence such as John kissed Mary against the possible phrase structure expansions, the correct Verb Phrase rule can be quickly deduced:</Paragraph> <Paragraph position="3"> J. kissed M. J. kissed M. J. kissed M.</Paragraph> <Paragraph position="4"> (N) (V) (N) Only one possible Verb Phrase rule expansion can successfully be matched against the sample string, Verb Phrase->Verb(V) Noun Phrase(NP) - exactly the right result for English. Although this is but a simple example, it illustrates how the phrase structure rules can be acquired on the basis of a process akin to &quot;parameter setting&quot;; given a highly constrained initial state, the desired final state can be obtained upon exposure to very simple triggering data.</Paragraph> <Paragraph position="5"> Suppose that at a certain point LPARSIFAL has all the grammar rules and phrase structure rules sufficient to build a parse tree for John did kiss Mary. The program must now parse Did John kiss Mary? 
No currently known rule can fire, for all the rules in the phrase structure component activated at the beginning of a sentence will have a triggering pattern roughly like \[=Noun Phrase?\]\[=Verb?\], but the input buffer will hold the pattern \[Did: auxverb, verb\]\[John: Noun Phrase\], and so thwart all attempts at triggering a grammar rule. A new rule must be written. Acting according to its acquisition procedure, the program first tries to attach the first item in the buffer, did, to the current active node, S(entence), as the Subject Noun Phrase. The attach fails because of category restrictions from the X-bar theory; as a known verb, did cannot be attached as a Noun Phrase. But switch works, because when the first and second buffer positions are interchanged, the buffer now looks like \[John\]\[did\]. Since the ability to parse declaratives such as John did kiss... was assumed, an NP-attaching rule will now match.</Paragraph> <Paragraph position="6"> Recording its success, the program saves the switch rule along with the current buffer pattern as a trigger for remembering the context of auxiliary inversion. The rest of the sentence can now be parsed as if it were a declarative (the fact that a switch was performed is also permanently recorded at the appropriate place in the parse tree, so that a distinction between declarative and inverted sentence forms can be maintained for later &quot;semantic&quot; use.)</Paragraph> </Section> </Section> <Section position="4" start_page="51" end_page="52" type="metho"> <SectionTitle> 5. Summary </SectionTitle> <Paragraph position="0"> A simple procedure for the acquisition of syntactic knowledge has been presented, making crucial use of linguistically- and computationally-motivated constraints. Computationally, the system exploits the local and incremental approach of the Marcus parser to ensure that the search space for hypothesizable new rules is finite and small. 
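To make the size of this search concrete, the auxiliary-inversion example of section 4 can be sketched in a few lines of Python. This is a simplified illustration, not the program's actual procedure: buffer cells are reduced to (word, category) pairs, the hypothetical helper `can_attach` stands in for both the X-bar category restrictions and the existing rule set, and only the attach and switch actions are tried (the two insert actions are omitted).

```python
# Hypothetical sketch; the representation and helper names are assumptions.

def can_attach(cell, wanted_category):
    """Stand-in for the category check: does this cell fit the open slot?"""
    word, category = cell
    return category == wanted_category


def acquire_action(buffer, wanted_category):
    """Try attach, then switch; return the name of the action that
    succeeds, or None if neither does.

    buffer is a list of (word, category) cells; wanted_category is the
    category the active node expects next (e.g. "NP" for a subject).
    """
    # Try attach: the first buffer cell must fit the active node's slot.
    if can_attach(buffer[0], wanted_category):
        return "attach"
    # Try switch: exchange cells one and two on a copy, so the original
    # buffer is effectively restored if the switch does not help.
    if len(buffer) >= 2:
        switched = [buffer[1], buffer[0]] + list(buffer[2:])
        if can_attach(switched[0], wanted_category):
            return "switch"
    return None


# Did John kiss ...: attach fails on the verb "did", switch succeeds.
inverted = [("did", "verb"), ("John", "NP"), ("kiss", "verb")]
result = acquire_action(inverted, "NP")
```

Here `result` is "switch", matching the rule the text says is saved for the inverted sentence; with the declarative order (John first) the same function returns "attach".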
In addition, rule ordering information need not be explicitly acquired. That is, the system need not learn that, say, Rule A must obligatorily precede Rule B. Extrinsic ordering of this sort appears difficult (if not impossible) to attain under conditions of positive-only evidence.</Paragraph> <Paragraph position="1"> Third, the system acquires its complement of rules via the step-wise hypothesis of new rules. This ability to incrementally refine a set of grammar rules rests upon the incremental properties of the Marcus parser, which in turn might reflect the characteristics of the English language itself.</Paragraph> <Paragraph position="2"> The constraints on the parser and acquisition procedure also parallel many recent proposals in the linguistic literature, lending considerable support to LPARSIFAL's design. Both the power and range of rule actions match those of constrained transformational systems; in this regard, one should compare the (independently) formalized transformational system of Lasnik and Kupin \[11\], which agrees almost point-for-point with the restrictions on LPARSIFAL. Turning to other proposals, two of LPARSIFAL's rule actions, attach and switch, correspond to Emonds' \[12\] categories of structure-preserving and local (minor-movement) rules. A third, insert trace, is analogous to the move alpha rule of Chomsky \[13\]. Rule application is correspondingly restricted. The Culicover and Wexler Binary Principle (an independently discovered constraint akin to Chomsky's Subjacency Condition; see \[10\]) can be identified with the restriction of rule pattern-matching to a local radius about the current point of parse tree construction (eliminating rules that directly require unbounded complexity for refinement). 
The remaining Culicover and Wexler sufficiency conditions for learnability, including their Freezing and Raising Principles, are subsumed by LPARSIFAL's assumption of strict local operation and no backtracking (eliminating rules that permit the unbounded cascading of errors, and hence unbounded complexity for refinement).</Paragraph> <Paragraph position="3"> These striking parallels should not be taken - at least not immediately - as a functional, &quot;processing&quot; explanation for the constraints on grammars uncovered by modern linguistics. An explanation of this sort would take computational issues as the basis for an &quot;evaluation metric&quot; of grammars, and then proceed to tell us why constraints are the way they are and not some other way. But this explanatory result does not necessarily follow from the identity of description between traditional transformational and LPARSIFAL accounts. Rather, LPARSIFAL might simply be translating the transformational constraints into a different medium - a computational one.</Paragraph> <Paragraph position="4"> Even more intriguing would be the finding that the constraints desirable from the standpoint of efficient parsing turn out to be exactly the constraints that ensure efficient acquisition. The current work with LPARSIFAL at least hints that this might be the case. However, at present the trade-off between the various kinds of &quot;computational issues&quot; as they enter into the evaluation metric is unknown ground; we simply do not yet know exactly what &quot;counts&quot; in the computational evaluation of grammars. ACKNOWLEDGEMENTS This article describes research done at the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. 
Support for the Laboratory's artificial intelligence research is provided in part by the Advanced Research Projects Agency of the Department of Defense under Office of Naval Research contract N00014-75-C-0643.</Paragraph> <Paragraph position="5"> The author is also deeply indebted to Mitch Marcus. Only by starting with a highly restricted parser could one even begin to consider the problem of acquiring the knowledge that such a parser embodies. The effort aimed at restricting the operation of PARSIFAL flows as much from his thoughts in this direction as from the research into acquisition alone.</Paragraph> </Section> </Paper>