<?xml version="1.0" standalone="yes"?> <Paper uid="C92-2095"> <Title>Isolating Cross-linguistic Parsing Complexity with a Principles-and-Parameters Parser: A Case Study of Japanese and English *</Title> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle>2 Principle-based parsing</SectionTitle> <Paragraph position="0"> In a principle-based parser, construction- and language-specific rules are replaced with broader principles that remain invariant aside from parametric variation (see below). The parser works by a (partially interleaved) generate-and-test technique that uses a canonical LR(1) covering grammar (derived from X-bar theory plus the theory of movement) to first build an initial set of tree structures; these structures are then run through a series of predicates whose conjunction defines the remainder of the constraints the (sentence, phrase structure, LF) triple must satisfy. (Footnote 1: To the best of our knowledge, this system is the first and broadest-coverage of its type to be able to parse Japanese and English by setting just a few parameter switches. Dorr (1987), under the supervision of the second author, developed a conceptually similar scheme to handle English, Spanish, and German. However, Dorr's system did not have the same broad coverage of English; did not handle Japanese; used hand rather than automatic compiling; and was approximately 15 times slower. Gunji's (1987) Japanese unification grammar comes closest to the principle-based model, but requires hand-modification from a set of core principles and does not really accommodate the important Japanese phenomenon of scrambling; see below. Other such systems work only on much smaller parts of English, e.g., Sharp (1985); Wehrli (1987); Crocker (1989); Cortes (1988); Johnson (1989); or are not in fact parsers, but proof-checkers, e.g., Stabler (1991, forthcoming).)</Paragraph> <Paragraph position="1"> [ACTES DE COLING-92, NANTES, 23-28 AOUT 1992. 631. PROC. OF COLING-92, NANTES, AUG. 23-28, 1992] This is done using familiar machinery from Prolog to output LFs that satisfy all the declarative constraints of the linguistic theory. In practice, a straightforward generate-and-test mechanism is grossly inefficient, since the principles that apply at the level of surface structure (S-structure) are but a fraction of those that apply in the overall system.</Paragraph> <Paragraph position="2"> The usual problems of lexical and structural ambiguity plus the underconstrained nature of the initial X-bar system mean that the number of possible S-structures to hypothesize may be huge. To obtain an efficient parser we use a full multiple-entry table with backtracking (as in Tomita, 1986), extending it to a canonical LR(1) parser. The LR machine uses an automatically built S-structure grammar that folds together enough of the constraints from other principles, parameters, and lexical subcategory information offline to produce a 25-fold improvement over the online phrase structure recovery procedure originally proposed by Fong and Berwick (1989). Optimizations include extra conditions in action clauses to permit interleaving of other principles (like movement) with structure-building (the 'interleaving' noted by principles marked 'I' in the snapshot in figure 2 below); control structure flexibility in principle ordering; precomputation of the LR transition function; elimination of infinite recursion of empty elements by an additional stack mechanism; and so forth. We exploit the explicit modularity of the principle-based system in a way that is impossible in an ordinary rule-based system: we can build a grammar for phrase structure that is small enough to make full, canonical LR(1) parsing usable, unlike large CFGs.
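The generate-and-test regime described above can be sketched roughly as follows. This is an illustrative Python sketch, not the authors' code (the actual system is implemented in Prolog); the principle predicates, the candidate encoding, and all names here are our own stand-ins.

```python
# Toy sketch of a generate-and-test pipeline: an overgenerating structure
# source is filtered through a conjunction of principle predicates, and a
# candidate is rejected as soon as any principle fails (fail-early).

def case_filter(tree):
    # toy stand-in for the Case filter: every NP node must bear case
    return all(node.get("case") for node in tree if node["cat"] == "NP")

def theta_criterion(tree):
    # toy stand-in: every argument NP must be theta-marked
    return all(node.get("theta") for node in tree if node["cat"] == "NP")

PRINCIPLES = [case_filter, theta_criterion]

def parse(candidates):
    """Keep only the candidate S-structures surviving every principle."""
    survivors = []
    for tree in candidates:
        if all(p(tree) for p in PRINCIPLES):  # all() stops at first failure
            survivors.append(tree)
    return survivors

# two toy candidate "S-structures", encoded as flat lists of node dicts
good = [{"cat": "NP", "case": "nom", "theta": "agent"}, {"cat": "V"}]
bad = [{"cat": "NP", "case": None, "theta": "agent"}, {"cat": "V"}]  # caseless NP

print(len(parse([good, bad])))  # → 1: only `good` survives
```

The point of the LR(1) covering grammar in the text is precisely to shrink the `candidates` stream before this filtering ever runs, since naive enumeration is hopeless.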
The earlier error detection of full LR(1) parsing over LALR methods means that we fail as early as possible, to avoid expensive tree constructions that can never participate in final solutions.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle>3 The Japanese parser</SectionTitle> <Paragraph position="0"> We begin with a very simple parameterization of Japanese that will nonetheless be able to cover all the Lasnik and Saito wh-questions, scrambling, and so forth; see the table on the next page that follows the example sentences. The important point is that very little additional must be said in order to parse a wide variety of distinctive Japanese sentences; the principles as shown on the righthand side of the computer snapshot do not change. Consider first the example wh-movement sentences found in the linguistics paper On the Nature of Proper Government by Lasnik & Saito (1984). (Footnote 2: To provide a rough measure of the machine size: the phrase structure grammar of S-structure for both English and Japanese, the augmented CFG, consists of about 74 productions derived from a schema of 30-34 rules. The resulting characteristic finite state machine (CFSM) consists of 123 states with 550 transitions between the various states. The action table consists of a total of 984 individual (nonerror) entries.)</Paragraph> <Paragraph position="1"> (Footnote 4: parsing time shown is 0.37 secs/word on a Symbolics 3650 (64K LIPS).) These sentences (listed below) display many familiar typological Japanese-English differences, and cover a rather sophisticated set of differences between English and Japanese: for instance, why (6) is fine in Japanese but not in English; free omission of NPs; "scrambling" of subjects and objects; Verb-final (more generally, Head-final) constituent structure; and no overt movement of wh-phrases.
We also consider a different set of Japanese sentences (also listed below) designed to illustrate a range of the same phenomena, taken from Hosokawa (1990). We stress that these sentences are designed to illustrate a range of sentence distinctions in Japanese, as well as our investigative method, rather than serve as any complete list of syntactic differences between the two languages (since they are obviously not).</Paragraph> <Paragraph position="3"> (2) Watashi-wa Taro-ga nani-o katta ka shitte iru 'I know what John bought' (6) Kimi-wa dare-ni Taro-ga naze kubi-ni natta tte itta no 'To whom did you say that John was fired why' (32) *Meari-wa Taro-ga nani-o katta ka do ka shiranai 'Mary does not know whether or not John bought what' (37a) Taro-wa naze kubi-ni natta no 'Why was John fired' (37b) Biru-wa Taro-ga naze kubi-ni natta tte itta no 'Why did Bill say that John was fired' (39a) Taro-ga nani-o te-ni ireta koto-o sonnani okotteru no 'What are you so angry about the fact that Taro obtained' (39b) *Taro-ga naze sore-o te-ni ireta koto-o sonnani okotteru no 'Why are you so angry about the fact that Taro obtained it' (41a) Hanako-ga Taro-ga nani-o te-ni ireta tte itta koto-o sonnani okotteru no 'What are you so angry about the fact that Hanako said that Taro obtained' (41b) *Hanako-ga Taro-ga naze sore-o te-ni ireta tte itta koto-o sonnani okotteru no 'Why are you so angry about the fact that Hanako said that Taro obtained it' (60) Kimi-wa nani-o doko-de katta no 'Where did you buy what' (63) Kimi-wa nani-o sagashiteru no 'Why are you looking for what' Complement/noncomplement asymmetry, scrambling, and unexpected parses. To see how the parser handles one Japanese example (see the actual computer output in figure 1 or figure 2), consider (39a) (and the corresponding illicit (39b)), where a complement wh but not a noncomplement wh can be extracted from a complex NP: (a) Taro-ga nani-o te-ni ireta koto-o sonnani okotteru no; (b) *Taro-ga naze
sore-o te-ni ireta koto-o 'What/*Why are you so angry about the fact that Taro obtained'. This example illustrates several Japanese typological differences with English. The subject of the matrix clause (= you) has been omitted. Nani ('what') and te ('hand') have been scrambled; the direct object (marked -o) now appears in front of the indirect object te. (Footnote 4, continued: = 1.52 sec, n = 100. Parsing time on a Sun Sparcstation 2 is approximately an order of magnitude faster.) (Footnote 5: E.g., the double-o constraint, case-overwriting, and passive and causative constructions all remain to be fully implemented.)</Paragraph> <Paragraph position="4"> Phrase structure is Head-final. Our relaxation of the Case Adjacency parameter and the rule that allows adjunction of NP to VP, plus transmission of Case to the scrambled NP, will let this analysis through. The LF for this sentence should be something along the lines of: for what x, pro is so angry about [the fact that Taro obtained x]. In this example pro denotes the understood subject of okotteru ('be angry'). The LFs actually returned by the parser are shown in the snapshot in figure 1. (Footnote 6: We will not have room to describe in detail the derivation of these LFs. But it should be noted that the derivation sequence is quite complex. Note, for example, that nani ('what') undergoes movement at two levels of phrase structure in order to get to the specifier position of the matrix Complementizer: [CP nani [IP Taro [NP [CP pro [VP t' [VP t ireta]]] koto] ...]] Furthermore, the LF trace t' violates the so-called empty category principle unless it is deleted (as indicated by [] in the snapshot), under the present theory.) The parametric differences that we need to accommodate all these differences between English and Japanese are quite few:
The lack of wh-movement at S-structure in Japanese, and its presence in English, interacts with these constraints to bar such examples. specFinal :- \+ specInitial.</Paragraph> <Paragraph position="5"> headFinal. headInitial :- \+ headFinal. agr(weak).</Paragraph> <Paragraph position="6"> boundingNode(i2). boundingNode(np).</Paragraph> <Paragraph position="7"> :- noCaseAdjacency. :- noWhInSyntax. proDrop. As one can see from the figure, the system does correctly recover the right LF, as the last one in the snapshot. However, it also (surprisingly) discovers three additional LFs, illustrating the power of the system to uncover alternative interpretations that a proper theory of context would have the job of ruling out. Ignoring indices, they all have the same form: for what x, Taro is so angry about [the fact that pro obtained x]. Here the embedded subject Taro has been interchanged with the matrix subject pro. It turns out that the sentence happens to be ambiguous with respect to the two basic interpretations. For completeness, here are the three variants that correspond to the first three LFs reported by the parser. S. Miyagawa (p.c.) informs us that the last two, given proper context, are in fact possible. These include: (1) pro is coreferent with koto ('fact'), i.e., for what x, Taro is so angry about [the fact that the fact obtained x]; (2) pro is coreferent with Taro: for what x, Taro is so angry about [the fact that Taro obtained x]; and (3) pro is free in the sentence: for what x, Taro is so angry about [the fact that (someone else) obtained x].
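Read as a switchboard, the parameter block above amounts to something like the following. This is a hypothetical Python rendering for illustration only; the real system states these settings as Prolog facts, and the helper `allows_scrambling` is our invention, not part of the parser.

```python
# Illustrative rendering of the parameter table as a Python dict.
JAPANESE = {
    "specInitial": False,          # specFinal
    "headInitial": False,          # headFinal
    "agreement": "weak",           # agr(weak)
    "boundingNodes": {"i2", "np"}, # boundingNode(i2). boundingNode(np).
    "caseAdjacency": False,        # relaxed: licenses scrambling
    "whInSyntax": False,           # no overt wh-movement
    "proDrop": True,               # free omission of NPs
}

# A hypothetical English setting, differing only in the switches the text
# names: head/spec position, case adjacency, overt wh-movement, pro-drop.
ENGLISH = dict(JAPANESE, specInitial=True, headInitial=True,
               caseAdjacency=True, whInSyntax=True, proDrop=False)

def allows_scrambling(params):
    # a scrambled object separated from its Case assigner is only
    # licensed when the Case Adjacency requirement is switched off
    return not params["caseAdjacency"]

print(allows_scrambling(JAPANESE), allows_scrambling(ENGLISH))  # → True False
```

The design point is the one the text stresses: nothing else changes between the two languages; the invariant principles consult this table.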
</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle>4 Parsing Japanese: the computational effects of scrambling, pro-drop, and phrase structure</SectionTitle> <Paragraph position="0"> Next we turn to the investigation of the computational differences between the two languages that we have explored, and show how to use the system in an exploratory mode to discover complexity differences between English and Japanese. In the discussion that follows, we shall need to draw on comparisons between the complexity of different parses. While this is a delicate matter, there are two obvious metrics to use in comparing this parser's complexity. The first is the total number of principle operations used to analyze a sentence: the number of S-structures, chain formations, indexings, the case filter and other constraint applications, etc. We can treat these individually and as a whole to give an account of the entire "search space" the parser moves through to discover analyses. However, this is often not a good measure of the total time spent in analysis. (Footnote 7: This was pointed out by D. Pesetsky, and confirmed by M. Saito. However, presumably the use of wa rather than ga, and intonational pauses, could be exploited as surface cues to rule out such ambiguity more generally, in this example and others like it. See Fong and Berwick (1989) for a discussion of how to integrate surface cues into the principle-based system.) (Footnote 8: This interpretation can be eliminated by imposing selectional restrictions on the possible "agents" of okotteru; let us say that they must be animate.) (Footnote 9: Having a parsing system that can recover all such linguistic alternatives is of interest in its own right, both to verify and correct the linguistic theory, and to ensure that no possibilities are overlooked by human interpreters.)
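The first metric, total principle applications, can be gathered by simple instrumentation of each principle module. The following is a toy sketch under our own assumptions (the module names, the candidate count, and the wrapper are all invented for illustration; the real system counts its own Prolog predicate applications):

```python
# Count every application of every (stand-in) principle module, so that
# per-module counts and a grand total over the search space are available.
from functools import wraps

class OpCounter:
    def __init__(self):
        self.counts = {}

    def instrument(self, fn):
        @wraps(fn)  # preserve fn.__name__ for the counts dict key
        def wrapped(*args, **kwargs):
            self.counts[fn.__name__] = self.counts.get(fn.__name__, 0) + 1
            return fn(*args, **kwargs)
        return wrapped

    @property
    def total(self):
        return sum(self.counts.values())

counter = OpCounter()

@counter.instrument
def case_filter(tree):          # stand-in for a real principle module
    return True

@counter.instrument
def free_indexation(tree):      # stand-in for another module
    return True

for candidate in range(5):      # pretend 5 candidate structures flow through
    case_filter(candidate) and free_indexation(candidate)

print(counter.total)  # → 10 principle applications over this toy search space
```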
The second measure we use is more particular and precisely tailored to the specific backtracking-LR design we have built to recover structural descriptions: we can count the total number of LR finite-state control steps taken in recovering the S-structure(s) for a given sentence; indeed, this accounts for the bulk of parsing time in those cases, as in Japanese and many English sentences, where multiple quasi-S-structures are returned. Taken together, these two measures provide both a coarse and a more fine-grained way of seeing what is hard or easy to compute.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle>5 Complexity of Japanese parsing</SectionTitle> <Paragraph position="0"> Given this initial set of analyses, let us now examine the complexity of Japanese sentence processing as compared to English. To do this, we initially examined sentences that we thought would highlight the ease of Japanese relative to English, namely, the "classic" English center-embedded vs. Japanese left-branching constructs from Kuno (1973), e.g., The cheese the rat the cat John keeps killed vs. Taro-ga katte-iru neko-ga koroshita nezumi-ga. On the conventional Chomsky-Miller account, the English construction is very difficult to parse, while the left-branching Japanese form is completely understandable. Interestingly, as shown in figure 2, the number of operations required to complete this parse correctly is enormous, as one can see from the righthand column numbers that show the structures that are passed into and out of each principle module.</Paragraph> <Paragraph position="1"> It at first appears that left-branching structures are definitely not simpler than the corresponding center-embedded examples. Why should this be? On a modern analysis such as the one adopted here, recall that restrictive relative clauses, e.g.
the rat the cat killed, are open sentences, and so contain an operator-variable structure coindexed with the rat, roughly: (1) [NP [NP the rat]_i [CP Op_i ... the cat killed t_i]] (Footnote 10: Note that these two are metrics that are stable across compile-cycles and different platforms. This would not be true, of course, for simple parse times, the obvious alternative.)</Paragraph> <Paragraph position="2"> where the empty operator (Op) is base-generated in an A-position and subsequently fronted by Move-alpha (Chomsky, 1986:86).</Paragraph> <Paragraph position="3"> Thus, the Japanese structures are center-embedded after all: the parser places a potentially arbitrary string of empty Operators at the front of the sentence. Perhaps, then, the formal accounts of why this sentence should be easy are incorrect; it is formally difficult but easy on other grounds. Of course, alternatively, the theory or parsing model could be incorrect, or perhaps it is scrambling, or pro-drop, or the Head-final character of the language that makes such sentences difficult. In the rest of this paper we focus on three attempts to discover the source of the complexity.</Paragraph> <Paragraph position="4"> To investigate these questions, we embarked on a series of optimization efforts that focused on the Spec positions of CP and the Head-final character of the language, with the goal of making the Japanese as easy as, or easier than, the corresponding English sentences, or determining why we could not make it easier. In all, we conducted three empirical tests: (1) using dummy nonterminals to "lift" information from the verb to the VP node, to test the Head-first/final hypothesis; (2) placing Spec of CP on the right rather than the left, to test the center-embedding hypothesis; and (3) building a "restricted" pseudo-Japanese that eliminated scrambling and free pro-drop, while not lifting the information up and to the left, leaving the Head-final character intact.
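The hidden center-embedding point can be made concrete: on the analysis in (1), every restrictive relative clause opens with an empty operator coindexed with its head, so stacked prenominal relatives stack operators one per level, just as English center-embedding stacks clauses. A toy sketch under our own encoding (the dict representation and function names are ours, not the parser's structures):

```python
# Build [NP [NP head]_i [CP Op_i ... body t_i]] as a nested dict, and count
# how many empty operators a stacked relative-clause structure contains.

def relativize(head, clause_body, index):
    return {"NP": head,
            "CP": {"Op": index, "body": clause_body, "trace": index}}

def count_operators(node):
    if not isinstance(node, dict):
        return 0
    return ("Op" in node) + sum(count_operators(v) for v in node.values())

# roughly: "the rat (that) the cat (that) John keeps killed"
inner = relativize("cat", "John keeps", 2)
outer = relativize("rat", inner, 1)
print(count_operators(outer))  # → 2: one empty operator per relativization
```

Each added level of relativization adds exactly one operator, which is the string of empty Operators the parser must posit at the front of the Japanese sentence.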
We will next cover each computer experiment in turn. Figure 3 gives a bar-graph summary of the three experimental results in the form of times improvement (reduction) in LR state creation.</Paragraph> <Paragraph position="5"> Optimization 1: Head-final information. Our first optimization centers on the Head-final phrase structure of Japanese. With Heads at the end, valuable information (subcategorization, etc.) may be unavailable at the time the parser is to make a particular decision. However, for our LR machine, there is a well-known programming language optimization: introduce dummy nonterminals on the left of a real nonterminal, e.g., VP -> X V NP, which, when reduced, call semantic action routines that can check the input stream for a particular property (say, the presence of a noun arbitrarily far to the right). (Figure 2 caption: On the left-hand fringe of the tree, note the string of empty operators, as well as, in the right-hand column, the large number of parser operations required to build this single correct LF as compared to English (in the text). Still, a single parse is correctly returned.)</Paragraph> <Paragraph position="6"> Specifically, if verb information occurs on the right we can offline "lift" that information up to the VP node, where it can then influence the LR state transitions that are made when examining material to the left of the head. For example, for each V subcategory, the LR machine will contain in effect a new LR state; the system will add a command to look as far into the input as needed to determine whether to branch to this new state or another V subcategory state. This is precisely the mechanism we used to determine whether to insert an empty category or not in a Head-final language. For instance, in Japanese relative clauses this is of importance because the parser may get valuable information from the verb to determine whether a preceding NP belongs to that relative clause or not.
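The dummy-nonterminal trick can be sketched as a grammar transformation plus a lookahead action. All names below (the `X_VP` marker, the `CAT:word` token format) are our invention for illustration; in the real system this information is folded into the LR tables offline, not checked at run time.

```python
# Transform VP -> V NP into VP -> X_VP V NP with X_VP -> epsilon; reducing
# the (empty) X_VP marker is the hook that fires a lookahead action before
# any right-hand material has been consumed.

def lift_head_info(rule):
    """(lhs, rhs) -> [(lhs, [marker] + rhs), (marker, [])]"""
    lhs, rhs = rule
    marker = "X_" + lhs
    return [(lhs, [marker] + rhs), (marker, [])]  # epsilon rule carries action

def lookahead_action(tokens, pos):
    # on reducing X_VP, scan arbitrarily far right for the head verb's
    # entry (e.g. its subcategory) before committing to an LR state,
    # such as whether to posit an empty category
    return next((t for t in tokens[pos:] if t.startswith("V:")), None)

grammar = lift_head_info(("VP", ["V", "NP"]))
print(grammar[0])  # → ('VP', ['X_VP', 'V', 'NP'])
print(lookahead_action(["NP:nezumi", "NP:neko", "V:koroshita"], 0))
```

The cost of this move is exactly the table blow-up reported next: every V subcategory effectively clones LR states.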
The action and transition tables of the resulting Japanese machine, which we will call "optimized," will be far larger than the base case counterpart (more precisely: the action table is 3 times larger, or about 380K to 980K, while the transition table is about twice as large, 72K to 142K).</Paragraph> <Paragraph position="7"> The advantages accrued by this optimization are substantial, 2-10 times better; see the table below. (This also holds across other sentences; see the bar graph summary at the end of the paper.) The unoptimized number of LR state transitions grows astonishingly rapidly. For example, the transitions needed to parse ce4 are exactly as shown: over 20 million of them. The same basic trend also holds, though not as strongly, when we look at these and other sentences in terms of the total number of principle operations required; while we do not have space to review all of these here, as an example, sentence (15b) takes 4126 operations in the base case and 455 when optimized in this fashion, while ce3 takes 1280 operations and 667 when optimized, respectively.</Paragraph> <Paragraph position="8"> (Footnote 11: We should point out that in all cases, about two-thirds of these transitions occur before the LR machine reaches a point in the search space where the solutions are "clustered" enough that the remaining solutions do not take so much effort.)</Paragraph> <Paragraph position="9"> Optimization 2: Spec of CP on the right. A second obvious strategy is to remove the center-embedding itself. Here there is a grammatical move we can make. Evidently, in Japanese the only elements that appear in Spec of CP are put there by LF movement. Thus, these elements can never be visible in this position on the surface. If this is so, then there is really nothing to prevent us from placing just the Spec of CP on the right, rather than the left.
This is an example of the "testbed" property of the system; this change takes two lines of code. Given this change, the resulting structures will have their Operators on the right, rather than the left, and will not be center-embedded.</Paragraph> <Paragraph position="10"> In addition, in this test the parser will not take advantage of right-hand information, thus eliminating this as a possible source of speedup.</Paragraph> <Paragraph position="11"> Parsing complexity is reduced by this move, by a factor of just about one-half, if one considers either LR state transitions or principle operations; not as good as the first optimization; see below for some representative results. Also, with the most deeply center-embedded sentence the total number of principle operations actually is worse than in the base case. Evidently we have not located the source of the parser's problems in center-embedding alone.</Paragraph> <Paragraph position="12"> Complexity for Spec on the right. While it appears that Head-final information helps the most, we must also remember that part of the complexity of Japanese is the result of free scrambling and pro-drop. To factor apart these effects, we ran a series of computer experiments on a quasi-Japanese grammar, J*, that was just like Japanese except that scrambling and pro-drop were barred. The changes were again simple to make: one change was automatic, just turning off a parameter value, while the second involved 3 lines of hand-coding in the X-bar schemas to force the system to look for a lexical NP in DO (and IO) positions. Further, we did not optimize for right-hand information (so that the Head-final character was left intact). Of course, we now can no longer parse sentences with scrambled direct objects.</Paragraph> <Paragraph position="13"> The table below shows the results. This was the best optimization of all.
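A back-of-the-envelope way to see why barring scrambling and pro-drop should shrink the search space so sharply: with pro-drop, every argument position may independently be overt or an empty pronominal, and with scrambling, the overt arguments may arrive in any order. This is purely illustrative arithmetic of our own, not a measurement from the system.

```python
# Rough upper bound on argument-structure hypotheses for a clause with
# n argument positions, under the two parameter switches.
from itertools import permutations

def argument_hypotheses(n_positions, scrambling, pro_drop):
    orders = len(list(permutations(range(n_positions)))) if scrambling else 1
    fillings = 2 ** n_positions if pro_drop else 1   # each slot: overt or pro
    return orders * fillings

full = argument_hypotheses(3, scrambling=True, pro_drop=True)    # Japanese-like
jstar = argument_hypotheses(3, scrambling=False, pro_drop=False) # J*-like
print(full, jstar)  # → 48 1
```

Even on this crude count, the J*-style restriction collapses dozens of hypotheses per clause to one, which is consistent with the direction of the results reported next.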
Without scrambling, and hence no movement at all compared to English, the Head-final quasi-Japanese was for the most part parsed 5-10 times more efficiently than English, and at worst (for the triply-embedded sentence) with three times fewer LR transitions and only about 30% more principle operations than English. Thus, this was even more efficient than the righthand-information-optimized Japanese parser. (The first column gives the number of LR transitions and the second gives the total number of principle operations for this "no scramble/drop" version, while the last two columns give the same information.) As before, with a short sentence there is little difference between optimization methods, but over a range of sentences and with longer sentences, the no-scramble/no-pro-drop optimization works better than any other.</Paragraph> <Paragraph position="14"> Evidently, given the framework of assumptions we have made, the Head-final character of Japanese does not hurt the most; rather, it is scrambling and pro-drop that does, since if we remove these latter two effects we get the biggest improvement in parsing efficiency. We can confirm this by looking at the LR transitions for the other sentences (1b)-(18b) across methods, summarizing our tests. We can summarize the three experiments across sentences in figure 3.</Paragraph> <Paragraph position="15"> Summary of complexity across tests: Sentence | Unopt. | Opt. | Spec-Final | No Scramble/Drop</Paragraph> </Section> </Paper>