File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/90/c90-2009_metho.xml
Size: 21,202 bytes
Last Modified: 2025-10-06 14:12:24
<?xml version="1.0" standalone="yes"?> <Paper uid="C90-2009"> <Title>A Logic-Based Government-Binding Parser for Mandarin Chinese</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 3. A Government-Binding Based Logic Grammar Formalism </SectionTitle> <Paragraph position="0"> The formal definition of Government-Binding based Logic Grammars (GBLGs) is given incrementally below.</Paragraph> <Paragraph position="1"> Definition 1. A Government-Binding based Logic Grammar is a 6-tuple GBLG = (T, Σ, B, S, C, R) where: (1) T is the set of lexical terminals. Each lexical terminal is denoted by an atomic formula with its lexical category as the predicate symbol.</Paragraph> <Paragraph position="2"> (2) Σ is the set of non-terminals. Σ = Σ_P ∪ Σ_V ∪ Σ_M ∪ Σ_G where: (a) Σ_P is the set of phrasal non-terminals. Each phrasal non-terminal is represented by an atomic formula with its phrasal category as the predicate symbol.</Paragraph> <Paragraph position="3"> (b) Σ_V is the set of virtual non-terminals. Each virtual non-terminal is specified by an atomic formula. (c) Σ_M is the set of movement non-terminals. A movement non-terminal takes one of the following two forms: A <<< B or B >>> A, where A ∈ T ∪ Σ_P ∪ Σ_V and B ∈ Σ_V. Σ_LM and Σ_RM denote the set of non-terminals A <<< B and the set of non-terminals B >>> A, respectively. (d) Σ_G is the set of goals. Each goal is denoted by a literal.</Paragraph> <Paragraph position="4"> (3) B ⊂ Σ_P is the set of bounding non-terminals. A bounding non-terminal is a phrasal non-terminal with a bounding node as its predicate symbol.</Paragraph> <Paragraph position="5"> (4) S ∈ Σ_P is the start non-terminal.</Paragraph> <Paragraph position="6"> (5) C is the set of logic connectives 'and' and 'or', denoted by ',' and ';' respectively. A grammar element is defined recursively in terms of the logic connectives as follows: (a) A lexical terminal L ∈ T is a grammar element. 
(b) A phrasal non-terminal P ∈ Σ_P is a grammar element. (c) A virtual non-terminal V ∈ Σ_V is a grammar element. (d) A movement non-terminal M ∈ Σ_M is a grammar element.</Paragraph> <Paragraph position="7"> (e) A goal G ∈ Σ_G is a grammar element.</Paragraph> <Paragraph position="8"> (f) If A and B are grammar elements, then (A,B) and (A;B) are grammar elements.</Paragraph> <Paragraph position="9"> The first five types are called basic grammar elements, and the last one is a compound grammar element. Let G_1 and G_2 be the set of basic grammar elements and the set of compound grammar elements, respectively.</Paragraph> <Paragraph position="10"> (6) R is the set of production rules. A production rule is of the following form:</Paragraph> <Paragraph position="12"> Clearly, each production rule can be translated into a sequence of production rules that use only the logical operator 'and'.</Paragraph> <Paragraph position="13"> An example written in this formalism is shown below. It captures English relative clauses such as &quot;The man who he met is a teacher.&quot; (r1) s --> np, vp.</Paragraph> <Paragraph position="14"> (r2) np --> pronoun.</Paragraph> <Paragraph position="15"> (r3) np --> det, noun.</Paragraph> <Paragraph position="16"> (r4) np --> det, noun, rel.</Paragraph> <Paragraph position="17"> (r5) vp --> tv, np.</Paragraph> <Paragraph position="18"> (r6) vp --> tv, trace.</Paragraph> <Paragraph position="19"> (r7) vp --> iv.</Paragraph> <Paragraph position="20"> (r8) rel --> rel_pronoun <<< trace, s.</Paragraph> <Paragraph position="22"> Rule (r8) states that a constituent in the phrase structure s is extraposed to the rel_pronoun position. Rule (r6) specifies which constituent may be moved and from which position.</Paragraph> <Paragraph position="23"> Definition 2. 
For X ∈ Σ_P and Y ∈ Σ_V, with TR a transitive relation, X TR Y if (1) X is the rule head of a production rule, and Y is a grammar element in its rule body, or (2) X is the rule head of a production rule, I ∈ Σ_P is a grammar element in its rule body, and I TR Y, or (3) there exist I_1, I_2, ..., I_n ∈ Σ_P such that X TR I_1</Paragraph> <Paragraph position="25"> The transitive relation TR is also a dominance relation, because it relates a phrasal non-terminal to a virtual non-terminal that it dominates.</Paragraph> <Paragraph position="26"> Definition 3. A production rule X_0 --> X_1, X_2, ..., X_m (where X_i ∈ G_1 for 1 ≤ i ≤ m) is significant if it satisfies the following extra restrictions: (1) for any grammar element X_i = (A <<< B) ∈ Σ_LM, there must exist some X_j, i < j ≤ m, such that (X_j, B) ∈ TR. (2) for any grammar element X_i = (B >>> A) ∈ Σ_RM, there must exist some X_j, 1 ≤ j < i, such that (X_j, B) ∈ TR. A logic grammar GBLG is significant if each production rule in R is significant. The sample grammar above is significant for the following reasons: (1) Rules (r1) - (r7) are trivially significant. (2) The rule rel --> rel_pronoun <<< trace, s is significant because there exists a transitive relation TR_1 such that s TR_1 vp TR_1 trace.</Paragraph> <Paragraph position="27"> Proposition 1. The c-command condition is embedded implicitly in GBLGs if these grammars are significant. Proof. For a significant production rule X_0 --> X_1, X_2, ..., X_m, if X_i = (A <<< B) ∈ Σ_LM then there must exist some X_j (i < j ≤ m) such that X_j dominates the virtual non-terminal B in some other production rule. The phrasal non-terminal X_0 is the first branching node that dominates A and X_j, and thus also dominates B. Therefore, A c-commands B. 
The case X_i = (B >>> A) ∈ Σ_RM behaves similarly.</Paragraph> <Paragraph position="28"> This property can be used to check the correctness of grammars automatically before parsing.</Paragraph> <Paragraph position="29"> Definition 4. The transitive relation TR_subjacency is a subset of TR that satisfies the following restriction: for X ∈ Σ_P and Y ∈ Σ_V, X TR_subjacency Y if X TR I_1 TR I_2 TR ... TR I_n TR Y, and there is at most one I_j such that I_j ∈ B. Proposition 2. A significant logic grammar is a restrictive context sensitive grammar. This is because the truth value of a movement non-terminal depends on the appearance of a virtual non-terminal preceding or following it.</Paragraph> <Paragraph position="30"> /Chen 1988/ proposes a bottom-up parsing system for GBLGs. Figure 2 shows the execution of our sample grammar for the sentence &quot;The man who he met is a teacher&quot;. The label on each arc indicates the step number during parsing. The empty constituent trace is generated in phrase vp, then passed to phrase s, and finally cut in phrase rel. Compared with other logic programming approaches /Matsumoto 1983, McCord 1987, Pereira 1981, Stabler 1987/, especially RLGs /Stabler 1987/, GBLGs have the following features: (1) uniform treatment of leftward and rightward movement, (2) an arbitrary number of movement non-terminals in the rule body, and (3) automatic detection of grammar errors before parsing. The first two features are useful for expressing highly flexible languages like Chinese.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 4. A Chinese Parser </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Topic-comment Structures </SectionTitle> <Paragraph position="0"> Topic-comment structure is one of the characteristic features of Mandarin Chinese. 
There are several interesting linguistic phenomena concerning these structures: (1) The topic may be moved from an argument position in the comment - subject, direct object, or indirect object. (2) Many categories may appear in the topic position, e.g. n&quot;, s', v&quot;, or p&quot;.</Paragraph> <Paragraph position="1"> (3) There may be multiple topics in a sentence.</Paragraph> <Paragraph position="2"> (4) The comment need not contain a constituent that is anaphorically related to the element in the topic.</Paragraph> <Paragraph position="3"> Based on the above observations, a topic may be represented as:</Paragraph> <Paragraph position="5"> The second argument of the predicate topic specifies the phrasal category of the topic, i.e., n2bar in this example. It is important for the parser when deciding whether the constituent may co-index with a trace.</Paragraph> <Paragraph position="6"> Next, the production rules for generating sentences are shown as follows: s1bar(s1bar(Topic1,Topic2,S)) --></Paragraph> <Paragraph position="8"> Of these three production rules, the first two define the &quot;topic-comment&quot; pattern, and the last one is a rule without a topic.</Paragraph> <Paragraph position="9"> Finally, the phrasal non-terminal s is introduced.</Paragraph> <Paragraph position="11"> The first s rule is the normal case, i.e., no movement. Semantic denotes the semantic feature of the head noun. It must be unifiable with the semantic feature provided by the matrix verb under type tree matching /McCord 1987/. The same logical variable Case appears in the phrasal non-terminals n2bar and v2bar. This means that the case of the subject is assigned externally by the matrix verb according to θ-theory. The second s rule captures one of the movement transformations - relativization, topicalization, ba-transformation, or bei-transformation. An overt noun phrase 
is moved via the former operation, thus a virtual non-terminal trace with arguments (n2bar,Semantic,Index,Case) is left at the empty site. It specifies only that an n2bar can appear here; the kind of movement involved is not of concern. The semantic feature and case are constrained by the matrix verb. The third s rule deals with bei-transformation. For example, (The thief_i is arrested t_i by the police.) The thief is not a logical subject of v2bar. The real subject is the object of bei (被), i.e., the police. Thus, a different group <S,I,C> of variables is used. The n2bar acts as the object of the verb or the subject of the embedded sentence. The fourth s rule captures double movements for an n2bar. For</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> example, </SectionTitle> <Paragraph position="0"> (The thief_i arrested t_i by the police escaped again.) A left-moved constituent (the thief) is further moved rightward. In this rule, two virtual non-terminals appear on both sides of the movement operator '<<<'. The fifth s rule describes sentences without a subject. An atom nosubj instead of subj specifies such a situation.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Noun Phrase </SectionTitle> <Paragraph position="0"> A noun phrase can be a pronoun, a simple noun, or a noun plus other elements that act as pre-modifiers of that noun.</Paragraph> <Paragraph position="1"> Those elements are (1) classifier phrases, (2) associative phrases, and (3) modifying phrases. Only associative phrases, relative clauses, and appositive clauses are discussed in the following. An associative phrase denotes two noun phrases linked by the special Chinese word de ('的'). 
For example,</Paragraph> <Paragraph position="3"> represents this construction. The definition of the associative phrase is:</Paragraph> <Paragraph position="5"> Both relative clauses and appositive clauses are nominalizations of the form nominalization + head noun, and are defined as follows:</Paragraph> <Paragraph position="7"> However, they differ in how they restrict the reference of the head noun. The head noun that a relative clause modifies refers to some unspecified participant in the nominalization part. For example, (the farmer_i who t_i grows fruits), and (the fruits_i that they grow t_i).</Paragraph> <Paragraph position="8"> The head noun (the fruits) refers to an empty constituent (either subject or object) in the relative clause. This type of construction can be considered a rightward movement. For an appositive clause and head noun pair, the head noun does not refer to any entity in the modifying clause, i.e., the appositive clause. For example, (the matter concerning our renting a house).</Paragraph> <Paragraph position="9"> The nominalization (our renting a house) serves as a complement to the head noun (the matter). This type of construction cannot be regarded as a movement transformation. Two rules are specified for them:</Paragraph> <Paragraph position="11"> app(App), n2bar(N2bar,S,I,C,Classifier). The only difference between these two rules is that a trace has to be found in the relative clause. Note that the cases of the empty constituent and the overt constituent may differ in the relative clause + head noun construction. For the sake of space, the n1bar is omitted in this paper.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3 Verb Phrase </SectionTitle> <Paragraph position="0"> Unlike a noun phrase, a verb phrase may have both pre-modifiers and post-modifiers. 
The preverbal specifiers are ba-phrases, bei-phrases, adverbial phrases, degree phrases, prepositional phrases, quantifier phrases, aspect, and modals.</Paragraph> <Paragraph position="1"> The postverbal modifiers are sentential constructions, adverbial phrases, quantifier phrases, classifier phrases, prepositional phrases, and aspect. Only Serial Verb Constructions (SVCs) are discussed in detail. The rule v2bar(v2bar(Va1bar,Vb1bar),S,I,[C1,C2],subj) --> v1bar(Va1bar,S,I,C1,subj), v1bar(Vb1bar,S,I,C2,subj) means that two separate events are juxtaposed, e.g. (I [v' bought a ticket] and [v' went in].) It is one kind of SVC. The two events have the identical subject, but their cases may be different. The other groups of SVCs are: (1) One verb phrase or clause serving as the direct object of another verb, e.g.</Paragraph> <Paragraph position="2"> (I want to go to school.) (I want him to go to school.) (2) Pivotal constructions, e.g.</Paragraph> <Paragraph position="3"> (I entrust him to take care of an affair.) (3) Descriptive clauses, e.g.</Paragraph> <Paragraph position="4"> (She cooked a dish that I very much enjoyed eating.) Only the first two are considered. The verbs with the first use are classified into t2 and t3, and the verbs with the second use, i.e., the pivotal construction, are classified into t8. It is not easy to define descriptive clauses with a rule or a new category, e.g. POSSESSIVE /Yang 1987/. This is because the descriptive clause is optional. Without this clause, the original sentence is still acceptable. Furthermore, many verbs may be used with descriptive clauses.</Paragraph> <Paragraph position="5"> The lowest level v1bar (v') deals with the subcategorization frames of the specified verb. According to the frames and the ECP, a virtual non-terminal trace is placed wherever it is needed. 
For example, v1bar(v1bar(T1,N2bar),Semantic,Index,Case,HasSubj) --> The lexical category t1 denotes a transitive verb. Here, the trace may be generated by any movement transformation. The third rule is for SVCs. Note that v2bar should have a subject and share it (Index) with the matrix verb. Thus, the semantic features of the two are the same. However, the cases may be different: one is assigned by the matrix verb, and the other by the embedded verb. The rules for other lexical categories are omitted in this paper. The details can be found in /Lin 1989/.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.4 Ba-construction </SectionTitle> <Paragraph position="0"> Ba-construction is usually generated by ba-transformation, which is one of the movement transformations. The direct object is placed immediately after '把' (ba) and before the verb, as in: subject '把' (ba) direct-object verb.</Paragraph> <Paragraph position="1"> For example, (I sold all three books.) However, there is another pattern for ba-construction: subject '把' (ba) object 1 verb object 2.</Paragraph> <Paragraph position="2"> It is not constructed by a movement transformation because a noun phrase, i.e., object 2, appears after the verb. For example, (I ate three of the apples.) It shows a part-whole relation between object 1 and object 2. A well-performing parsing system must handle both patterns. It is also easy to represent this construction with our formalism.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.5 Bei-construction </SectionTitle> <Paragraph position="0"> Bei-construction is a familiar Chinese pattern like the following: noun phrase 1 '被' (bei) noun phrase 2 verb. For example, (The bird was let go (by me).) 
Bei-construction also has a disposal form, shown below, similar to ba-construction: (That door was kicked (by me) and a hole was left.) The rules in Section 4.1 (topic-comment structure) capture the above phenomena.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.6 Pronoun Resolution </SectionTitle> <Paragraph position="0"> Binding Theory can be rephrased as the following procedures. Assume β is an anaphor, a pronominal, or an R-expression, depending on which principle is used. Each element β may have two sets: a set of possible pairs and a set of impossible pairs. These two sets are denoted by possible-pair and impossible-pair respectively, and are defined as follows: possible-pair(β) = { α | α can co-index with β }, impossible-pair(β) = { α | α cannot co-index with β }. (Principle A) For an acceptable sentence, try to find some α such that α is in β's Governing Category and c-commands β. Each α that is outside this range should not have a co-index relationship with β. This principle defines both sets for β. For example, (* Mr. Lee_i said [s that you saw yourself_i].)</Paragraph> <Paragraph position="2"> (impossible-pair(self)={Mr. Lee}).</Paragraph> <Paragraph position="3"> Both 'you' and 'Mr. Lee' c-command 'self'. The former is in the governing category of the reflexive 'self', but the latter is outside. So the index assignment is not acceptable.</Paragraph> <Paragraph position="4"> (Principle B) Those αs that are in the range of the Governing Category and c-command β should not co-index with β.</Paragraph> <Paragraph position="5"> This principle only says which αs cannot be in the candidate set. However, we cannot determine whether those αs that are in its range and do not c-command β co-index with β or not. If such an α co-indexes with β, it must satisfy other criteria, e.g. 
other binding principles, the same semantic feature, and so on. Thus, this principle defines only the impossible-pair set. For example,</Paragraph> <Paragraph position="7"> The phrase 'Mr. Lee' c-commands 'him', thus they cannot be co-indexed under Principle B. Consider another example: (* [s He_i saw Mr. Lee_i].) The R-expression does not c-command the pronominal.</Paragraph> <Paragraph position="8"> According to Principle B, we have no way to determine their binding relationship. But if Principle C is applied, it tells us that the index assignment is wrong. (Principle C) For any α where α c-commands β, α ought not to have a co-index relationship with β. This principle says nothing about those αs that do not c-command β. A set impossible-pair is defined from this principle. For example, (* He_i said [s that you saw Mr. Lee_i].) (impossible-pair(Mr. Lee)={he, you}).</Paragraph> <Paragraph position="9"> The pronominal 'he' c-commands 'Mr. Lee', so they should have different indices.</Paragraph> <Paragraph position="10"> Based on these three principles, a post-processing routine embedded in the parser determines the co-index relationships between constituents from the parse tree. The algorithm is simple: traverse the parse tree and generate the relations possible-pair and impossible-pair. If a relation cannot yet be decided, a relation unknown is assigned temporarily. When a new relation possible-pair or impossible-pair is obtained, use it to check all the unknown relations, and retract the unknowns accordingly. Finally, assign suitable indices to the anaphors and pronominals based on the relations possible-pair and impossible-pair.</Paragraph> </Section> </Section> </Paper>
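The bookkeeping described in Section 4.6 can be illustrated with a small Python sketch. It is not the parser's actual routine, which the paper does not list, and it rests on several assumptions: the class Node and the functions c_commands, governing_category and binding_relations are hypothetical names; the governing category is approximated by the nearest dominating s node; and an unknown pair is retracted by copying the verdict already reached for the mirrored pair, since co-indexing is symmetric.

```python
class Node:
    """Parse-tree node; `kind` is set only on noun-phrase leaves."""
    def __init__(self, label, *children, kind=None, word=None):
        self.label, self.kind, self.word = label, kind, word
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

def ancestors(node):
    while node.parent is not None:
        node = node.parent
        yield node

def dominates(upper, lower):
    return any(anc is upper for anc in ancestors(lower))

def c_commands(a, b):
    """a c-commands b if the first branching node above a dominates b."""
    if a is b or dominates(a, b):
        return False
    for anc in ancestors(a):
        if len(anc.children) > 1:
            return dominates(anc, b)
    return False

def governing_category(node):
    # Assumption: the nearest dominating "s" node stands in for the
    # governing category.
    return next((anc for anc in ancestors(node) if anc.label == "s"), None)

def walk(node):
    yield node
    for child in node.children:
        yield from walk(child)

def binding_relations(root):
    """Map (alpha, beta) word pairs to possible / impossible / unknown."""
    nps = [n for n in walk(root) if n.kind is not None]
    rel = {}
    for beta in nps:
        gc = governing_category(beta)
        for alpha in nps:
            if alpha is beta:
                continue
            cc = c_commands(alpha, beta)
            local = cc and gc is not None and dominates(gc, alpha)
            if beta.kind == "anaphor":          # Principle A
                verdict = "possible" if local else "impossible"
            elif beta.kind == "pronominal":     # Principle B
                verdict = "impossible" if local else "unknown"
            else:                               # Principle C (R-expression)
                verdict = "impossible" if cc else "unknown"
            rel[(alpha.word, beta.word)] = verdict
    # Retract unknowns whose mirrored pair has already been decided
    # (assumed retraction rule, see the note above).
    for (a, b), verdict in list(rel.items()):
        if verdict == "unknown" and rel[(b, a)] != "unknown":
            rel[(a, b)] = rel[(b, a)]
    return rel

# "* He_i saw Mr. Lee_i": Principle B alone leaves the pair unknown,
# Principle C then rules it out.
he = Node("np", kind="pronominal", word="he")
lee = Node("np", kind="r-expression", word="Mr. Lee")
sentence = Node("s", he, Node("vp", Node("v", word="saw"), lee))
rel = binding_relations(sentence)
print(rel[("Mr. Lee", "he")])   # impossible
```

Keying the relation on words keeps the sketch short but assumes each word occurs once in the sentence; a fuller version would key on node identity and add the final index-assignment pass.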