<?xml version="1.0" standalone="yes"?> <Paper uid="P80-1028"> <Title>On Parsing Strategies and Closure</Title> <Section position="2" start_page="0" end_page="107" type="metho"> <SectionTitle> 1. The FS Hypothesis </SectionTitle> <Paragraph position="0"> We assume a severe processing limitation on available short term memory (STM), as commonly suggested in the psycholinguistic literature ([Frazier79], [Frazier and Fodor79], [Cowper76], [Kimball73, 75]). Technically, a machine with limited memory is a finite state machine (FSM), which has very good complexity bounds compared to a Turing machine (TM).</Paragraph> <Paragraph position="1"> How does this assumption interact with competence? It is plausible for there to be a rule of competence (call it Ccomplex) which cannot be processed with limited memory.</Paragraph> <Paragraph position="2"> What does this say about the psychological reality of Ccomplex? What does this imply about the FS hypothesis? When discussing certain performance issues (e.g. center-embedding),4 it will be most useful to view the processor as an FSM; on the other hand, competence phenomena (e.g. subjacency) suggest a more abstract point of view. It will be assumed that there is ultimately a single processing machine with its multiple characterizations (the ideal and the real components). The processor does not literally apply ideal rules of competence for lack of ideal TM resources, but rather, it resorts to more realistic approximations. Exactly where the idealizations call for inordinate resources, we should expect to find empirical discrepancies between competence and performance.</Paragraph> <Paragraph position="3"> An FS processor is unable to parse complex sentences even though they may be grammatical. We claim these complex sentences are unacceptable. Which constructions are in principle beyond the capabilities of a finite state machine? Chomsky and Bar-Hillel independently showed that (arbitrarily deep) center-embedded structures require unbounded memory [Chomsky59a,b], [Bar-Hillel61], [Langendoen75]. As predicted, arbitrarily center-embedded sentences are unacceptable, even at relatively shallow depths.</Paragraph> <Paragraph position="4"> (2) #[The man [who the boy [who the students recognized] pointed out] is a friend of mine.] (3) #[The rat [the cat [the dog chased] bit] ate the cheese.] A memory limitation provides a very attractive account of the center-embedding phenomena (in the limit).5 1. I would like to thank Peter Szolovits, Mitch Marcus, Bill Martin, Bob Berwick, Joan Bresnan, Jon Allen, Ramesh Patil, Bill Swartout, Jay Keyser, Ken Wexler, Howard Lasnik, Dave McDonald, Per-Kristian Halvorsen, and countless others for many useful comments. 2. Throughout this work, the complexity notion will be used in its computational sense, as a measure of time and space resources required by an optimal processor. The term will not be used in the linguistic sense (the size of the grammar itself). In general, one can trade one off for the other, which leads to considerable confusion. The size of a program (linguistic complexity) is typically inversely related to the power of the interpreter (computational complexity).</Paragraph> <Paragraph position="5"> 3. A hash mark (#) is used to indicate that a sentence is unacceptable; an asterisk (*) is used in the traditional fashion to denote ungrammaticality. Grammaticality is associated with competence (post-theoretic), whereas acceptability is a matter of performance (empirical).</Paragraph>
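<Paragraph> The center-embedding argument can be made concrete with a minimal sketch (ours, not from the paper; the clause-bracketed inputs and the memory measure are illustrative assumptions). A clause must be held in memory only while lexical material of its own still lies ahead, which is the configuration of footnote 4; on that measure, (3)-style center-embedding needs memory proportional to depth, while right-branching needs only a constant amount:

def parse_clauses(s):
    """Build a nested clause tree from a bracketed word string."""
    root = []
    stack = [root]
    for tok in s.split():
        if tok == '[':
            clause = []
            stack[-1].append(clause)
            stack.append(clause)
        elif tok == ']':
            stack.pop()
        else:
            stack[-1].append(tok)
    return root[0]

def required_memory(clause):
    """Peak number of clauses that MUST stay open at once: a clause can
    be closed during an embedded clause iff nothing of its own follows."""
    peak = 1
    for i, child in enumerate(clause):
        if isinstance(child, list):
            trailing = len(clause[i + 1:]) != 0   # own material after the child?
            peak = max(peak, required_memory(child) + (1 if trailing else 0))
    return peak

center = parse_clauses('[ the-rat [ the-cat [ the-dog chased ] bit ] ate-the-cheese ]')
right = parse_clauses('[ this-is-the-dog [ that-chased-the-cat [ that-ran-after-the-rat ] ] ]')
print(required_memory(center))   # 3: grows with embedding depth
print(required_memory(right))    # 1: constant, however long the tail

No fixed memory bound survives the first pattern in the limit, which is exactly the point of (4) below.
</Paragraph>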
<Paragraph position="6"> (4) &quot;This fact [that deeply center-embedded sentences are unacceptable], and this alone, follows from the assumption of finiteness of memory (which no one, surely, has ever questioned).&quot; [Chomsky61, p. 127] What other phenomena follow from a memory limitation? Center-embedding is the most striking example, but it is not unique. There have been many refutations of FS competence models; each one illustrates the point: computationally complex structures are unacceptable. Lasnik's noncoreference rule [Lasnik76] is another source of evidence. The rule observes that two noun phrases in a particular structural configuration are noncoreferential. 4. A center-embedded sentence contains an embedded clause surrounded by lexical material from the higher clause: [S x [S ...] y], where both x and y contain lexical material.</Paragraph> <Paragraph position="7"> 5. A complexity argument of this sort does not distinguish between a depth of three and a depth of four. It would require considerable psychological experimentation to discover the precise limitations.</Paragraph> <Paragraph position="8"> (5) The Noncoreference Rule: Given two noun phrases NP1, NP2 in a sentence, if NP1 precedes and commands NP2 and NP2 is not a pronoun, then NP1 and NP2 are noncoreferential. It appears to be impossible to apply Lasnik's rule with only finite memory. The rule becomes harder and harder to enforce as more and more names are mentioned. As the memory requirements grow, the performance model is less and less likely to establish the noncoreferential link. In (6), the co-indexed noun phrases cannot be coreferential. As the depth increases, the noncoreferential judgments become less and less sharp, even though (6)-(8) are all equally ungrammatical. (6) *#Did you hear that Johni told the teacher Johni threw the first punch.</Paragraph> <Paragraph position="9"> (7) *??Did you hear that Johni told the teacher that Bill said Johni threw the first punch.</Paragraph> <Paragraph position="10"> (8) *?Did you hear that Johni told the teacher that Bill said that Sam thought Johni threw the first punch.</Paragraph> <Paragraph position="11"> Ideal rules of competence do not (and should not) specify real processing limitations (e.g. limited memory); these are matters of performance. (6)-(8) do not refute Lasnik's rule in any way; they merely point out that its performance realization has some important empirical differences from Lasnik's idealization. Notice that movement phenomena can cross unbounded distances without degrading acceptability. Compare this with the center-embedding examples previously discussed. We claim that center-embedding demands unbounded resources whereas movement has a bounded cost (in the worst case).6 It is possible for a machine to process unbounded movement with very limited resources.7 This shows that movement phenomena (unlike center-embedding) can be implemented in a performance model without approximation. (9) There seems likely to seem likely ... to be a problem. (10) What did Bob say that Bill said that ... John liked? It is a positive result when performance and competence happen to converge, as in the movement case. Convergence enables performance to apply competence rules without approximation. However, there is no logical necessity that performance and competence will ultimately converge in every area. The FS hypothesis, if correct, would necessitate compromising many competence idealizations.</Paragraph>
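<Paragraph> A minimal sketch (our illustration, not the paper's mechanism) of why unbounded movement is compatible with strictly bounded memory: a single hold cell stores the fronted wh-phrase until its gap is reached, so the cost is one cell no matter how many clauses intervene, as in (10). The token format and the GAP marker are assumptions of the sketch.

def parse_wh(tokens):
    hold = None                       # one cell, independent of sentence length
    for tok in tokens:
        if tok.startswith('wh:'):     # a fronted filler, e.g. 'wh:what'
            assert hold is None, 'this sketch allows one filler at a time'
            hold = tok[3:]
        elif tok == 'GAP':            # the position the filler moved from
            print('gap filled by', repr(hold))
            hold = None
    assert hold is None, 'an unfilled filler would be rejected'

# (10) What did Bob say that Bill said that ... John liked __?
parse_wh('wh:what did Bob say that Bill said that John liked GAP'.split())
</Paragraph>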
6. The claim is that movement will never consume more than a bounded cost: the cost is independent of the length of the sentence. Some movement sentences may be easier than others (subject vs. object relatives). See [Church80] for more discussion.</Paragraph> <Paragraph position="13"> 7. In fact, the human processor may not be optimal. The functional argument observes that an optimal processor could process unbounded movement with bounded resources. This should encourage further investigation, but it alone is not sufficient evidence that the human processor has optimal properties.</Paragraph> </Section> <Section position="3" start_page="107" end_page="108" type="metho"> <SectionTitle> 2. The Proposed Model: YAP </SectionTitle> <Paragraph position="0"> Most psycholinguists believe there is a natural mapping from the complex competence model onto the finite performance world.</Paragraph> <Paragraph position="1"> This hypothesis is intuitively attractive, even though there is no logical reason that it need be the case.8 Unfortunately, the psycholinguistic literature does not precisely describe the mapping. We have implemented a parser (YAP) which behaves like a complex competence model on acceptable9 cases, but fails to parse more difficult unacceptable sentences. This performance model looks very similar to the more complex competence machine on acceptable sentences even though it &quot;happens&quot; to run in severely limited memory. Since it is a minimal augmentation of existing psychological and linguistic work, it will hopefully preserve their accomplishments, and in addition, achieve computational advantages.</Paragraph> <Paragraph position="2"> The basic design of YAP is similar to Marcus' Parsifal [Marcus79], with the additional limitation on memory. His parser, like most stack machine parsers, will occasionally fill the stack with structures it no longer needs, consuming unbounded memory. To achieve the finite memory limitation, it must be guaranteed that this never happens on acceptable structures.</Paragraph> <Paragraph position="3"> That is, there must be a procedure (like a garbage collector) for cleaning out the stack so that acceptable sentences can be parsed without causing a stack overflow. Everything on the stack should be there for a reason; in Marcus' machine it is possible to have something on the stack which cannot be referenced again. Equipped with its garbage collector, YAP runs on a bounded stack even though it is approximating a much more complicated machine (e.g. a PDA).10 The claim is that YAP can parse acceptable sentences with limited memory, although there may be certain unacceptable sentences that will cause YAP to overflow its stack.</Paragraph> <Paragraph position="4"> 3. Marcus' Determinism Hypothesis The memory constraint becomes particularly interesting when it is combined with a control constraint such as Marcus' Determinism Hypothesis [Marcus79]. The Determinism Hypothesis claims that once the processor is committed to a particular path, it is extremely difficult to select an alternative. For example, most readers will misinterpret the underlined portions of (11)-(13) and then have considerable difficulty continuing. For this reason, these unacceptable sentences are often called Garden Paths (GPs).
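<Paragraph> A hedged illustration (ours, not Marcus' actual rule system) of what determinism means for (11): within any bounded lookahead window, nothing distinguishes the main-verb reading of raced from the reduced-relative reading, so a deterministic parser must commit, and the usual commitment leaves no analysis for the final verb. The three-token window and the category guesser are assumptions of the sketch.

LOOKAHEAD = 3

def commit_raced(words, i):
    '''Choose an analysis for a verb like raced using bounded lookahead.'''
    window = words[i + 1 : i + 1 + LOOKAHEAD]   # e.g. ['past', 'the', 'barn']
    # Nothing in the window rules out either reading, so the parser
    # commits to the commonly preferred main-verb analysis.
    return 'main-verb'

words = 'the horse raced past the barn fell'.split()
choice = commit_raced(words, words.index('raced'))
if choice == 'main-verb':
    # Every verb slot is now spoken for; 'fell' cannot be attached, and
    # by determinism the earlier commitment cannot be undone.
    print('garden path:', words[-1], 'is left without an attachment')
</Paragraph>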
The memory limitation alone fails to predict the unacceptability of (11)-(13) since GPs don't center-embed very deeply. Determinism offers an additional constraint on memory allocation which provides an account for the data. 8. Chomsky and Lasnik (personal communication) have each suggested that the competence model might generate a non-computable set. If this were indeed the case, it would seem unlikely that there could be a mapping onto the finite performance world.</Paragraph> <Paragraph position="5"> 9. Acceptability is a formal term; see footnote 3.</Paragraph> <Paragraph position="6"> 10. A push-down automaton (PDA) is a formalization of stack machines.</Paragraph> (11) #The horse raced past the barn fell.</Paragraph> (12) #John lifted a hundred pound bags.</Paragraph> (13) #I told the boy the dog bit Sue would help him.</Paragraph> At first we believed the memory constraint alone would subsume Marcus' hypothesis as well as providing an explanation of the center-embedding phenomena. Since all FSMs have a deterministic realization,11 it was originally supposed that the memory limitation guaranteed that the parser is deterministic (or equivalent to one that is). Although the argument is theoretically sound, it is mistaken.12 The deterministic realization may have many more states than the corresponding non-deterministic FSM. These extra states would enable the machine to parse GPs by delaying the critical decision.13 In spirit, Marcus' Determinism Hypothesis excludes encoding non-determinism by exploding the state space in this way. This amounts to an exponential reduction in the size of the state space, which is an interesting claim, not subsumed by FS (which only requires the state space to be finite).</Paragraph> By assumption, the garbage collection procedure must act &quot;deterministically&quot;; it cannot back up or undo previous decisions. Consequently, the machine will not only reject deeply center-embedded sentences but it will also reject sentences such as (14) where the heuristic garbage collector makes a mistake (takes a garden path).</Paragraph> (14) #Harold heard [that John told the teacher [that Bill said that Sam thought that Mike threw the first punch] yesterday].</Paragraph> YAP is essentially a stack machine parser like Marcus' Parsifal with the additional bound on stack depth. There will be a garbage collector to remove finished phrases from the stack so the space can be recycled. The garbage collector will have to decide when a phrase is finished (closed).</Paragraph> </Section> <Section position="4" start_page="108" end_page="108" type="metho"> <SectionTitle> 4. Closure Specifications </SectionTitle> <Paragraph position="0"> Assume that the stack depth should be correlated to the depth of center-embedding. It is up to the garbage collector to close phrases and remove them from the stack, so only center-embedded phrases will be left on the stack. The garbage collector could err in either of two directions; it could be overly ruthless, cleaning out a node (phrase) which will later turn out to be useful, or it could be overly conservative, allowing its limited memory to be congested with unnecessary information. In either case, the parser will run into trouble, finding the sentence unacceptable.
11. A non-deterministic FSM with n states is equivalent to a deterministic FSM with at most 2^n states.</Paragraph> <Paragraph position="2"> 12. I am indebted to Ken Wexler for pointing this out.</Paragraph> <Paragraph position="3"> 13. The exploded states encode disjunctive alternatives. Intuitively, GPs suggest that it isn't possible to delay the critical decision: the machine has to decide which way to proceed.</Paragraph> <Paragraph position="4"> We have defined the two types of errors below.</Paragraph> <Paragraph position="5"> (15) Premature Closure: The garbage collector prematurely removes phrases that turn out to be necessary.</Paragraph> <Paragraph position="6"> (16) Ineffective Closure: The garbage collector does not remove enough phrases, eventually overflowing the limited memory.</Paragraph> <Paragraph position="7"> There are two garbage collection (closure) procedures mentioned in the psycholinguistic literature: Kimball's early closure [Kimball73, 75] and Frazier's late closure [Frazier79]. We will argue that Kimball's procedure is too ruthless, closing phrases too soon, whereas Frazier's procedure is too conservative, wasting memory. Admittedly it is easier to criticize than to offer constructive solutions. We will develop some tests for evaluating solutions, and then propose our own somewhat ad hoc compromise which should perform better than either of the two extremes, early closure and late closure, but it will hardly be the final word. The closure puzzle is extremely difficult, but also crucial to understanding the seemingly idiosyncratic parsing behavior that people exhibit.</Paragraph> </Section> <Section position="5" start_page="108" end_page="109" type="metho"> <SectionTitle> 5. Kimball's Early Closure </SectionTitle> <Paragraph position="0"> The bracketed interpretations of (17)-(19) are unacceptable even though they are grammatical. Presumably, the root matrix14 was &quot;closed off&quot; before the final phrase, so that the alternative attachment was never considered.</Paragraph> <Paragraph position="1"> (17) #Joe figured [that Susan wanted to take the train to New York] out.</Paragraph> <Paragraph position="2"> (18) #I met [the boy whom Sam took to the park]'s friend.</Paragraph> <Paragraph position="3"> (19) #The girli applied for the jobs [that was attractive]i. Closure blocks high attachments in sentences like (17)-(19) by removing the root node from memory long before the last phrase is parsed. For example, it would close the root clause just before that in (21) and who in (22) because the nodes [comp that] and [comp who] are not immediate constituents of the root. And hence, it shouldn't be possible to attach anything directly to the root after that and who.15 (20) Kimball's Early Closure: A phrase is closed as soon as possible, i.e., unless the next node parsed is an immediate constituent of that phrase. [Kimball73] (21) [S Tom said [S' that Bill had taken the cleaning out ...</Paragraph> <Paragraph position="4"> (22) [S Joe looked the friend [S' who had smashed his new car ... up 14. A matrix is roughly equivalent to a phrase or a clause. A matrix is a frame with slots for a mother and several daughters. The root matrix is the highest clause.</Paragraph>
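<Paragraph> A minimal sketch (ours, restricted to clause nodes only) of rule (20) in action. The complementizer test is a stand-in assumption for detecting that the next node cannot be an immediate constituent of the clauses already open; on Kimball's rule those clauses close at once, so nothing can attach to them later, which is exactly what goes wrong with yesterday in footnote 15.

COMPLEMENTIZERS = {'that', 'who'}

def kimball_parse(words):
    open_clauses = [[]]                    # start with the root matrix
    for w in words:
        if w in COMPLEMENTIZERS:
            for clause in open_clauses:    # early closure: every open
                print('closed early:', ' '.join(clause))
            open_clauses = [[]]            # ...clause becomes unavailable
        open_clauses[-1].append(w)
    return open_clauses

kimball_parse('Tom said that Bill had taken the cleaning out yesterday'.split())
# The root [Tom said] is closed as soon as 'that' arrives, so the high
# reading of 'yesterday' (Tom said [...] yesterday) is wrongly excluded.
</Paragraph>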
<Paragraph position="5"> 15. Kimball's closure is premature in these examples since it is possible to interpret yesterday attaching high, as in: Tom said [that Bill had taken the cleaning out] yesterday.</Paragraph> <Paragraph position="6"> This model inherently assumes that memory is costly and presumably fairly limited. Otherwise, there wouldn't be a motivation for closing off phrases.</Paragraph> <Paragraph position="7"> Although Kimball's strategy strongly supports our own position, it isn't completely correct. The general idea that phrases become unavailable is probably right, but the precise formulation makes an incorrect prediction. If the upper matrix is really closed off, then it shouldn't be possible to attach anything to it. Yet (23)-(24) form a minimal pair where the final constituent attaches low in one case, as Kimball would predict, but high in the other, thus providing a counter-example to Kimball's strategy.</Paragraph> <Paragraph position="8"> (23) I called [the guy who smashed my brand new car up]. (low attachment) (24) I called [the guy who smashed my brand new car] a rotten driver. (high attachment) Kimball would probably not interpret his closure strategy as literally as we have. Unfortunately, computer models are brutally literal. Although there is considerable content to Kimball's proposal (closing before memory overflows), the precise formulation has some flaws. We will reformulate the basic notion along with some ideas proposed by Frazier.</Paragraph> </Section> <Section position="6" start_page="109" end_page="109" type="metho"> <SectionTitle> 6. Frazier's Late Closure </SectionTitle> <Paragraph position="0"> Suppose that the upper matrix is not closed off, as Kimball suggested, but rather, temporarily out of view. Imagine that only the lowest matrix is available at any given moment, and that the higher matrices are stacked up. The decision then becomes whether to attach to the current matrix or to close it off, making the next higher matrix available. The strategy attaches as low as possible; it will attach high only if all the lower attachments are impossible. Kimball's strategy, on the other hand, prevents higher attachments by closing off the higher matrices as soon as possible. In (23), according to Frazier's late closure, up can attach16 to the lower matrix, so it does; whereas in (24), a rotten driver cannot attach low, so the lower matrix is closed off, allowing the next higher attachment. Frazier calls this strategy late closure because lower nodes (matrices) are closed as late as possible, after all the lower attachments have been tried. She contrasts her approach with Kimball's early closure, where the higher matrices are closed very early, before the lower matrices are done.17 (25) Late Closure: When possible, attach incoming material into the clause or phrase currently being parsed.</Paragraph> <Paragraph position="1"> Unfortunately, it seems that Frazier's late closure is too conservative, allowing nodes to remain open too long, congesting valuable stack space. Without any form of early closure, right branching structures such as (26) and (27) are a real problem; the machine will eventually fill up with unfinished matrices, unable to close anything because it hasn't reached the bottom right-most clause. Perhaps Kimball's suggestion is premature, but Frazier's is ineffective.</Paragraph>
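<Paragraph> A minimal sketch (ours) of the late closure decision in (25): try the matrix currently being parsed first, and close it only when attachment there is impossible. The can_attach test is a stand-in assumption; YAP actually consults functional structure and the phrase structure rules here (see footnote 16).

def late_closure_attach(stack, phrase, can_attach):
    '''stack[-1] is the matrix currently being parsed (the lowest one).'''
    while stack:
        if can_attach(stack[-1], phrase):
            stack[-1].append(phrase)    # attach as low as possible
            return stack
        stack.pop()                     # close late: only when forced
    raise ValueError('no matrix can take ' + repr(phrase))

# (23): 'up' can attach to the relative clause, so it attaches low.
# (24): 'a rotten driver' cannot, so the relative closes and the next
# higher matrix receives it. On right-branching (26)-(27), however,
# nothing ever forces a close, so the stack keeps growing.
</Paragraph>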
<Paragraph position="2"> Our compromise will augment Frazier's strategy to enable higher clauses to close earlier under marked conditions (which cover the right branching case).</Paragraph> <Paragraph position="3"> (26) This is the dog that chased the cat that ran after the rat that ate the cheese that you left in the trap that Mary bought at the store that ...</Paragraph> <Paragraph position="4"> (27) I consider every candidate likely to be considered capable of being considered somewhat less than honest toward the people who ...</Paragraph> <Paragraph position="5"> Our argument is like all complexity arguments; it considers the limiting behavior as the number of clauses increases. Certainly there are numerous other factors which decide borderline cases (3-deep center-embedded clauses, for example), some of which Frazier and Fodor have discussed. We have specifically avoided borderline cases because judgments are so difficult and variable; the limiting behavior is much sharper. In these limiting cases, though, there can be no doubt that memory limitations are relevant to parsing strategies. In particular, the alternatives cannot explain why there are no acceptable sentences with 20-deep center-embedded clauses. The only reason is that memory is limited; see [Chomsky59a,b], [Bar-Hillel61] and [Langendoen75] for the mathematical argument.</Paragraph> </Section> <Section position="7" start_page="109" end_page="110" type="metho"> <SectionTitle> 7. A Compromise </SectionTitle> <Paragraph position="0"> After criticizing early closure for being too early and late closure for being too late, we promised that we would provide yet another &quot;improvement&quot;. Our suggestion is similar to late closure, except that we allow one case of early closure (the A-over-A early closure principle) to clear out stack space in the right recursive case.18 The A-over-A early closure principle is similar to Kimball's early closure principle except that it waits for two nodes, not just one. For example, in (28), our principle would close [1 that Bill said S2] just before the that in S3, whereas Kimball's scheme would close it just before the that in S2.</Paragraph> <Paragraph position="1"> 16. Deciding whether a node can or cannot attach is a difficult question which must be addressed. YAP uses the functional structure [Bresnan (to appear)] and the phrase structure rules. For now we will have to appeal to the reader's intuitions.</Paragraph> <Paragraph position="2"> 17. Frazier's strategy will attach to the lower matrix even when the final particle is required by the higher clause, as in: ?I looked the guy who smashed my car up, or ?Put the block which is on the box on the table. 18. Early closure is similar to a compiler optimization called tail recursion, which converts right recursive expressions into iterative ones, thus optimizing stack usage. Compilers would perform the optimization only when the structure is known to be right recursive; the A-over-A closure principle is somewhat heuristic since the structure may turn out to be center-embedded.</Paragraph> <Paragraph position="3"> (28) John said [1 that Bill said [2 that Sam said [3 that Jack ...</Paragraph> <Paragraph position="4"> (29) The A-over-A early closure principle: Given two phrases in the same category (noun phrase, verb phrase, clause, etc.), the higher closes when both are eligible for Kimball closure. That is, (1) both nodes are in the same category, (2) the next node parsed is not an immediate constituent of either phrase, and (3) the mother and all obligatory daughters have been attached to both nodes.</Paragraph>
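<Paragraph> A hedged sketch (ours, not YAP's actual code) of principle (29). Node is a stand-in record, and the immediate-constituent table is an assumed simplification; the point is only the control structure: when two same-category nodes on the stack both satisfy conditions (1)-(3), the higher one is closed, so unbounded right recursion like (26)-(28) runs in bounded stack space while single embeddings are left alone.

from dataclasses import dataclass, field

@dataclass
class Node:
    category: str                        # 'S', 'NP', 'VP', ...
    mother_attached: bool = True
    obligatory_daughters_attached: bool = True
    words: list = field(default_factory=list)

def kimball_eligible(node, next_category, constituents_of):
    return (next_category not in constituents_of[node.category]
            and node.mother_attached
            and node.obligatory_daughters_attached)

def a_over_a_close(stack, next_category, constituents_of):
    '''Close the higher of two same-category, Kimball-eligible nodes.'''
    for i, higher in enumerate(stack):
        for lower in stack[i + 1:]:
            if (higher.category == lower.category
                    and kimball_eligible(higher, next_category, constituents_of)
                    and kimball_eligible(lower, next_category, constituents_of)):
                return stack.pop(i)      # the higher node leaves the stack
    return None                          # otherwise behave like late closure

In (28), when the that of S3 arrives, S1 and S2 are same-category and both eligible, so S1 closes: one clause later than Kimball's early closure would close it, but early enough that the stack never holds more than two clauses of the right recursion.
</Paragraph>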
<Paragraph position="5"> This principle, which is more aggressive than late closure, enables the parser to process unbounded right recursion within a bounded stack by constantly closing off. However, it is not nearly as ruthless as Kimball's early closure, because it waits for two nodes, not just one, which will hopefully alleviate the problems that Frazier observed with Kimball's strategy.</Paragraph> <Paragraph position="6"> There are some questions about the borderline cases where judgments are extremely variable. Although the A-over-A closure principle makes very sharp distinctions, the borderline cases are often questionable. See [Cowper76] for an amazing collection of subtle judgments that confound every proposal yet made. However, we think that the A-over-A notion is a step in the right direction: it has the desired limiting behavior, although the borderline cases are not yet understood. We are still experimenting with the YAP system, looking for a more complete solution to the closure puzzle.</Paragraph> <Paragraph position="7"> In conclusion, we have argued that a memory limitation is critical to reducing performance model complexity. Although it is difficult to discover the exact memory allocation procedure, it seems that the closure phenomenon offers an interesting set of evidence. There are basically two extreme closure models in the literature: Kimball's early closure and Frazier's late closure. We have argued for a compromise position: Kimball's position is too restrictive (rejects too many sentences) and Frazier's position is too expensive (requires too much memory for right branching).</Paragraph> <Paragraph position="8"> We have proposed our own compromise, the A-over-A closure principle, which shares many advantages of both previous proposals without some of the attendant disadvantages. Our principle is not without its own problems; it seems that there is considerable work to be done.</Paragraph> <Paragraph position="9"> By incorporating this compromise, YAP is able to cover a wider range of phenomena than Parsifal while adhering to a finite state memory constraint. YAP provides empirical evidence that it is possible to build a FS performance device which approximates a more complicated competence model in the easy acceptable cases, but fails on certain unacceptable constructions such as closure violations and deeply center-embedded sentences. In short, a finite state memory limitation simplifies the parsing task.</Paragraph> </Section> </Paper>