<?xml version="1.0" standalone="yes"?> <Paper uid="J97-4002"> <Title>An Empirical Approach to VP Ellipsis</Title> <Section position="5" start_page="527" end_page="527" type="metho"> <SectionTitle> 3. VPE-RES System </SectionTitle> <Paragraph position="0"> The VPE-RES system has the following subparts: (1) Syntactic Filter, (2) Preference Factors, and (3) Post-Filter. The candidates for VPE antecedents are all full VPs appearing within a three-sentence window: the current sentence and the two preceding sentences. 4 The Syntactic Filter eliminates all VPs that contain the VPE in an improper fashion. A preference ordering is imposed upon the remaining candidate antecedents, based on recency, clausal relations, parallelism, and quotation structure. After the candidates have been weighted according to these Preference Factors, the highest-rated candidate is selected, and its form is modified by a Post-Filter.</Paragraph> </Section> <Section position="6" start_page="527" end_page="534" type="metho"> <SectionTitle> 4 The limitation to three sentences is arbitrary. However, no examples were found in the Treebank in </SectionTitle> <Paragraph position="0"> which the antecedent was more distant.</Paragraph> <Section position="1" start_page="528" end_page="530" type="sub_section"> <SectionTitle> 3.1 Syntactic Filter </SectionTitle> <Paragraph position="0"> The Syntactic Filter rules out antecedents that improperly contain the VPE occurrence. 5 While the precise definition of improper containment is an active area of theoretical research, 6 we rule out antecedents that contain the VPE in a sentential complement. An example of this is given in Figure 1, the parse tree for the sentence in (4). 7 (4) She said she would not.</Paragraph> <Paragraph position="1"> Here, the VPE occurrence, would, cannot select as its antecedent the containing VP headed by said. 
This is ruled out by the Syntactic Filter, because the VPE is contained in SBar, a sentential complement to said.</Paragraph> <Paragraph position="2"> Pronoun resolution systems often incorporate a syntactic filter--a mechanism to remove certain antecedents based on syntactic structure. The basic syntactic constraint for pronouns is that they cannot take a &quot;local&quot; antecedent, as described, for example, in Principle B of the binding theory (Chomsky, 1981). 8 The Syntactic Filter for VPE also rules out &quot;local&quot; antecedents in a sense: it rules out antecedents in certain containment configurations.</Paragraph> <Paragraph position="3"> The implementation of the Syntactic Filter is complicated by two factors. First, there are certain cases in which a containing antecedent is possible, where the VPE is contained in an NP argument of the containing VP, as in Figure 2, the parse tree for the following example: (5) She was getting too old to take the pleasure from it that she used to. 5 This constraint is discussed in Hardt (1992) as a way of ruling out antecedents for VPE. 6 See, for example, Sag (1976) and May (1985) for discussion, and Lappin and McCord (1990) and Jacobson (1992) for alternative views.</Paragraph> <Paragraph position="4"> 7 Parse trees display the exact category labels and structure represented in the Penn Treebank parses. We have added a label, VPE, for VPE occurrences. See Appendix A for a list of Penn Treebank tags; for more information, see Marcus, Santorini, and Marcinkiewicz (1993).</Paragraph> <Paragraph position="5"> 8 While the precise formulation of Principle B remains controversial, it is generally agreed to rule out, for example, the binding of a pronoun in object position by an NP in subject position. Such constraints on pronoun resolution have been incorporated into several computational approaches to pronoun resolution, such as Brennan, Friedman, and Pollard (1987), Lappin and McCord (1990), and Lappin and Leass (1994).</Paragraph> <Paragraph position="6"> Figure 2: Parse tree for She was getting too old to take the pleasure from it that she used to.</Paragraph> <Paragraph position="7"> Here, the (circled) VP headed by take is the antecedent for the VPE, despite the containment relation.</Paragraph> <Paragraph position="8"> The second complication results from a basic limitation in Treebank parses: there is no distinction between arguments and adjuncts. A VP must be ruled out if the VPE is within a nonquantificational argument; when a VPE occurs in an adjunct position, the &quot;containing&quot; VP is a permissible antecedent. The following sentence, whose parse tree is in Figure 3, is an example of this: (6) get to the corner of Adams and Clark just as fast as you can. In this case, the (circled) VP headed by get is the antecedent for the VPE, despite the appearance of containment. Since the VPE is contained in an adjunct (an adverbial phrase), there is in fact a nonmaximal VP headed by get that does not contain the VPE: this is the VP get to the corner of Adams and Clark. However, because of the approach taken in annotating the Penn Treebank, this nonmaximal VP is not displayed as a VP. To capture the above data, the Syntactic Filter rules out VPs that contain the VPE in a sentential complement; any other antecedent-containment relation is permitted. Figure 3: Parse tree for get to the corner of Adams and Clark just as fast as you can.</Paragraph> <Paragraph position="9"> This correctly rules out the containing antecedent in (4), and permits it in (5) and (6). 9</Paragraph> </Section> <Section position="2" start_page="530" end_page="533" type="sub_section"> <SectionTitle> 3.2 Preference Factors </SectionTitle> <Paragraph position="0"> Remaining candidates are ordered according to the following four Preference Factors: 1. Recency 2. Clausal Relations 3. Parallelism 4. 
Quotation 9 An anonymous CL reviewer suggests that the filter may be overly restrictive, because of examples like the following: A: It's an important issue, and I'm very concerned about it. B: Well, frankly, I don't care that you do. (Italicized expressions receive pitch accents.) Here, the antecedent for the VPE is care; this would not be permitted by the filter. The reviewer suggests that examples like this should not be categorically excluded, although they are perhaps less than fully acceptable. If this is true, it raises interesting theoretical issues about the acceptability of antecedent-containment configurations. However, the reviewer notes that &quot;such examples are no doubt rare and perhaps the proposed containment filter does enough work in correctly excluding ill-formed instances of ellipsis to justify the categorical exclusion of these cases.&quot; Based on our empirical research up to this point, we concur with this. No examples of this sort have been observed among the 644 VPE examples in the Penn Treebank, and the Syntactic Filter as currently formulated contributes significantly to the overall performance of the system (see Section 4 for figures on this).</Paragraph> <Paragraph position="1"> Computational Linguistics Volume 23, Number 4 Each candidate is initialized with a weight of 1. This weight is modified by any applicable Preference Factors.</Paragraph> <Paragraph position="2"> 3.2.1 Recency. The simplest and most important factor is recency: if no other Preference Factors obtain, the most recent (syntactically possible) antecedent is always chosen. The weights are modified as follows: the first VP weight is set to be the recency factor, 1.15. Moving rightward, toward the VPE, the weight of each subsequent VP is multiplied by the recency factor. Thus, if there are three VPs preceding the VPE, we have (1.15 1.32 1.52). If a VP contains another VP, the two VPs are set at the same level. 
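The recency weighting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the VPE-RES implementation: the function name `recency_weights` is hypothetical, and the sketch ignores both the rule that a VP nested inside another VP shares its level and the treatment of VPs that follow the VPE.

```python
RECENCY_FACTOR = 1.15  # value given in the text


def recency_weights(num_preceding_vps, factor=RECENCY_FACTOR):
    """Return weights for the VPs preceding the VPE, ordered left to right.

    Every candidate starts at weight 1. The first (most distant) VP is
    set to the recency factor, and each later VP, moving toward the VPE,
    multiplies the previous weight by the factor again.
    """
    weights = []
    weight = 1.0
    for _ in range(num_preceding_vps):
        weight *= factor
        weights.append(weight)
    return weights


# Three VPs preceding the VPE, as in the example in the text.
print([round(w, 2) for w in recency_weights(3)])  # [1.15, 1.32, 1.52]
```

Because the weights grow geometrically toward the VPE, the most recent candidate wins whenever no other Preference Factor applies.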
Finally, VPs following the VPE are penalized in a symmetrical fashion. 10 3.2.2 Clausal Relations. There is a preference for a VP that is in a clausal relation to the VPE. 11 Consider the following example: (7) tells you what the characters are thinking and feeling [ADVP far more precisely than intertitles, or even words, [VPE would]].</Paragraph> <Paragraph position="3"> The VP headed by tells is modified by the adverbial phrase (labeled ADVP) containing the VPE. This VP is the correct antecedent. A VP in such a relation is given a very high weight by the Preference Factor Clause-Rel, which in practice makes it an obligatory antecedent. If Clause-Rel is deactivated, the system incorrectly selects feeling as the antecedent, because it is the most recent VP.</Paragraph> <Paragraph position="4"> The modification relation can also be a comparative relation, as illustrated by the following example, whose parse tree is given in Figure 4: (8) All felt freer to discuss things than students had previously.</Paragraph> <Paragraph position="5"> Here, the correct antecedent is the (circled) VP headed by felt. This VP is modified by the comparative clause containing the VPE, and thus is correctly selected by the system. With Clause-Rel deactivated, the system incorrectly selects the more recent VP discuss things.</Paragraph> <Paragraph position="6"> Note that such VPs are parsed as containing the VPE, but they are not removed by the Syntactic Filter. Thus, the effect of this constraint is best observed in conjunction with the Syntactic Filter. In the testing of the system, we examined each system component separately, as described below. However, we also examined Clause-Rel in combination with the Syntactic Filter, because of their close connection. We did this by defining a Composite system component, consisting of Syntactic Filter, Clause-Rel, and Post-Filter.</Paragraph> <Paragraph position="7"> 3.2.3 Parallelism. 
There is a preference for similar parallel elements, that is, the elements surrounding the ellipsis site, and the elements that correspond to them surrounding the antecedent. Notions of parallelism figure prominently in many theoretical studies of ellipsis. 12 However, the proposal that similarity of parallel elements can be used to guide ellipsis resolution is, to our knowledge, a new one. 13 Our current results involving parallelism provide support for this claim. 14 We are continuing to experiment with more sophisticated ways of measuring the similarity of parallel elements. In the case of VPE, the subject and auxiliary are parallel elements. Currently, the system only examines the form of the auxiliary. In Hardt (1992) a preference for VPE with coreferential subjects is suggested. This information is not available in the Penn Treebank, and we do not use any forms of subject matching in the current version of the system.</Paragraph> <Paragraph position="8"> 10 This reflects the fact that VPE, like pronominal anaphora, permits the antecedent to follow, rather than precede, the VPE occurrence. 11 This constraint is discussed in Hardt (1992). 12 The term parallel elements is from Dalrymple, Shieber, and Pereira (1991), where parallelism is emphasized in the interpretation of ellipsis. Parallelism is also important in many other treatments of ellipsis, such as Prüst, Scha, and van den Berg (1991), Asher (1993), and Fiengo and May (1994). Figure 4: Parse tree for All felt freer to discuss things than students had previously.</Paragraph> <Paragraph position="9"> Aux-Match (Form of Auxiliary). There is a preference for a similar base form of auxiliary in antecedent and VPE. The categories for auxiliary forms we use are: do, be, have, can, would, should, to. We prefer an antecedent that shares the same category of auxiliary form as the VPE. 
The weights of all potential antecedents that do not match the VPE auxiliary category are multiplied by our Standard Penalty Value, which is .667. This preference is illustrated by the following example: (9) Someone with a master's degree in classical arts who works in a deli would [VP be ideal], Litigation Sciences [VP advises]. So [VPE would] someone recently divorced or widowed.</Paragraph> <Paragraph position="10"> 13 The importance of similar parallel elements in discourse relations is emphasized in Hobbs (1979), and it is applied to VPE resolution in Hobbs and Kehler (1997), in a rather different context from that of this paper. 14 As discussed in Section 4, the Parallelism Preference Factor makes an important contribution to the system performance.</Paragraph> <Paragraph position="11"> Here, the correct antecedent is be ideal. It is selected because it has a would auxiliary, which is the same category as the VPE. Without this constraint, the system incorrectly selects the VP advises.</Paragraph> <Paragraph position="12"> Another example is the following: (10) In the past, customers had to [VP go to IBM when they [VP outgrew the Vax]]. Now they don't have [VPE to].</Paragraph> <Paragraph position="13"> Here, the correct antecedent is the matrix VP headed by go. It has a to auxiliary, as does the VPE. Without this constraint, the VP outgrew the Vax is incorrectly selected by the system.</Paragraph> <Paragraph position="14"> Parallel-Match (Be-do conflict). There is an additional penalty for a VP antecedent with a be-form auxiliary, if the VPE is a do-form. 15 This is implemented by multiplying the weight of the VP by our Standard Penalty Value of .667. Consider the following example: (11) You [VP know what the law of averages [VP is]], don't you VPE? Here, neither potential antecedent matches the auxiliary category of the VPE, and therefore both are penalized by the general auxiliary match constraint. 
However, the nearer antecedent, is, is a be-form, and is thus subject to an additional penalty. This allows the matrix antecedent, know what the law of averages is, to be correctly selected. 3.2.4 Quotation. If the VPE occurs within quoted material, there is a preference for an antecedent that also occurs within quoted material. 16 This is illustrated by the following example: (12) &quot;We [VP have good times].&quot; This happy bulletin [VP convulsed Mr. Gorboduc]. &quot;You [VPE do]?&quot;, he asked between wheezes of laughter. Here, the correct antecedent is have good times. The VP convulsed Mr. Gorboduc is penalized by the Standard Penalty Value, because it is not within quotations, while the VPE is within quotations. Without the application of the quote preference, the system incorrectly selects convulsed Mr. Gorboduc.</Paragraph> </Section> <Section position="3" start_page="533" end_page="534" type="sub_section"> <SectionTitle> 3.3 Post-Filter </SectionTitle> <Paragraph position="0"> Once the highest-rated antecedent has been identified, it may be necessary to modify it by removing an argument or adjunct that is incorrectly included. If the selected VP contains the VPE in an argument or adjunct, that argument or adjunct must be eliminated. For example, (13) Different as our minds are, yours has [VP nourished mine [PP as no other social influence [VPE has]]].</Paragraph> <Paragraph position="1"> The antecedent VP selected is nourished mine as no other social influence ever has. The PP containing the VPE must be eliminated, leaving the correct antecedent nourished mine. The Post-Filter is extremely important in achieving success by the Exact-Match criterion, and it results in a great deal of improvement over the baseline approach (see results in Section 4).</Paragraph> </Section> <Section position="4" start_page="534" end_page="534" type="sub_section"> <SectionTitle> Hardt VP Ellipsis </SectionTitle> <Paragraph position="0"> 15 This constraint is suggested in Hardt (1992). 16 A preference of this sort is discussed in Malt (1984).</Paragraph> </Section> </Section> <Section position="7" start_page="534" end_page="537" type="metho"> <SectionTitle> 4. Empirical Evaluation </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="534" end_page="535" type="sub_section"> <SectionTitle> 4.1 Success Criteria </SectionTitle> <Paragraph position="0"> To test the performance of the system, we first obtained a coded file, which indicates a human coder's preferred antecedent for each example. Then we compared the output of the system with the coder's selections.</Paragraph> <Paragraph position="1"> As mentioned in Section 1, we define three criteria for success:</Paragraph> <Paragraph position="3"> Head Overlap: either the head verb of the system choice is contained in the coder choice, or the head verb of the coder choice is contained in the system choice.</Paragraph> <Paragraph position="4"> Head Match: the system choice and coder choice have the same head verb.</Paragraph> <Paragraph position="5"> Exact Match: the system choice and coder choice match word-for-word. To illustrate these criteria, we give three examples, one for each success criterion. 
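The three criteria can be sketched as simple predicates over the system and coder selections. This is an illustrative sketch, not the evaluation code used in the study: antecedents are represented as word lists, containment is approximated by word membership, and the head verb is passed in explicitly because identifying it requires the parse. All function names are hypothetical.

```python
def exact_match(system_words, coder_words):
    """The two selections match word for word."""
    return system_words == coder_words


def head_match(system_head, coder_head):
    """The two selections have the same head verb."""
    return system_head == coder_head


def head_overlap(system_words, system_head, coder_words, coder_head):
    """Either head verb appears among the words of the other selection."""
    return system_head in coder_words or coder_head in system_words


# Example (14) from the text: system and coder selections with their heads.
system = "plead guilty in that inquiry".split()
coder = "agreed to plead guilty in that inquiry".split()
print(head_overlap(system, "plead", coder, "agreed"))  # True
print(head_match("plead", "agreed"))                   # False
print(exact_match(system, coder))                      # False
```

Passing the heads in separately keeps the sketch independent of any particular parser or head-finding rule.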
Note that the success criteria are increasingly strict--if an example satisfies Exact Match, it will also satisfy the other two criteria, and if an example satisfies Head Match, it will also satisfy Head Overlap.</Paragraph> <Paragraph position="6"> Example: Head Overlap (14) In July, Par and a 60% owned unit agreed to plead guilty in that inquiry, as did another former Par official.</Paragraph> <Paragraph position="7"> System output: plead guilty in that inquiry. Coder selection: agreed to plead guilty in that inquiry. According to Head Overlap, the system choice is correct, since its head verb, plead, is contained in the coder selection. This would not be considered correct according to Head Match, since the head of the coder selection is agreed.</Paragraph> <Paragraph position="8"> Example: Head Match (15) The question is, if group conflicts still exist, as undeniably they do, System output: exist. Coder selection: still exist. Here, both the system output and the coder selection have the head verb exist, but there is not an exact, word-for-word match.</Paragraph> <Paragraph position="9"> Example: Exact Match (16) It is difficult if not impossible for anyone who has not pored over the thousands of pages of court pleadings and transcripts to have a worthwhile opinion on the underlying merits of the controversy. Certainly I do not.</Paragraph> </Section> <Section position="2" start_page="535" end_page="535" type="sub_section"> <SectionTitle> 4.2 Test Results </SectionTitle> <Paragraph position="0"> After identifying 644 examples of VPE in the Treebank, we reserved 96 randomly selected examples from the Wall Street Journal corpus for a blind test. In Table 4, we give results for the blind test and for the entire Penn Treebank, and we report separate figures on the Brown Corpus and Wall Street Journal Corpus. 17 As a baseline, we also report results (Table 5) on a simple recency-based approach: the most recent VP is always chosen. 
No Preference Factors or filters are applied.</Paragraph> <Paragraph position="1"> The difference between the VPE-RES performance and the baseline is statistically significant by all three criteria, based on a chi-square analysis, p &lt; .001.</Paragraph> </Section> <Section position="3" start_page="535" end_page="536" type="sub_section"> <SectionTitle> 4.3 Evaluating System Subparts </SectionTitle> <Paragraph position="0"> In Tables 6, 7, and 8, we present results on each major subpart of the program. For this evaluation, we used the Exact Match criterion. We evaluated subparts in three ways: first, we began with the baseline (recency) approach, and activated a single additional component, to see how the system performance changed based on that component.</Paragraph> <Paragraph position="1"> Second, we began with the complete system, and deactivated a single component.</Paragraph> <Paragraph position="2"> Finally, we evaluated system components in an incremental fashion, beginning with Post-Filter, then activating Syntactic Filter with Post-Filter still activated, etc. The Composite Factor is a combination of Post-Filter, Syntactic Filter, and Clause-Rel.</Paragraph> <Paragraph position="3"> 17 Since the blind test examples are all taken from the Wall Street Journal corpus, it is most appropriate to compare the blind test results directly to the results on the Wall Street Journal Corpus. Not surprisingly, the blind test results are slightly lower than the results on the complete Wall Street Journal Corpus, since this contains the examples that functioned as training data.</Paragraph> </Section> <Section position="4" start_page="536" end_page="537" type="sub_section"> <SectionTitle> 4.4 System Components </SectionTitle> <Paragraph position="0"> The most important system component is the Composite Factor, which is a combination of the Syntactic Filter, the Post-Filter, and Clause-Rel. 
The contribution of Clause-Rel is not evident individually; if it is the only factor activated together with Recency-Only, performance in the complete corpus actually declines from 29.2% to 28.1%. However, this is because Clause-Rel requires the Syntactic Filter to make a contribution. This can be observed from the fact that Composite performs better than its individual components. Also, when Clause-Rel is the deactivated factor, performance declines from 75.9% to 72.8%. The Parallelism Preference Factors, Aux-Match and Parallel-Match, also make an important contribution: when they are activated in the incremental analysis, there are 22 additional correct selections in the complete corpus, an improvement of 3.4% (22 of 644 examples).</Paragraph> </Section> <Section position="5" start_page="537" end_page="537" type="sub_section"> <SectionTitle> 4.5 Errors and Evaluation Criteria </SectionTitle> <Paragraph position="0"> Many of the errors occurring under the Exact Match criterion involve alternatives that are virtually identical in meaning, as in the following example: (17) Stephen Vincent Benet's John Brown's Body [VP comes immediately to mind] [PP in this connection], as does John Steinbeck's The Grapes Of Wrath and Carl Sandburg's The People, Yes.</Paragraph> <Paragraph position="1"> Here, VPE-RES selected comes immediately to mind, since the PP in this connection is parsed as a sister to the VP. One coder selected comes immediately to mind in this connection, while the other coder made the same selection as VPE-RES. It is difficult to see any difference in meaning between the two choices.</Paragraph> <Paragraph position="2"> Because of examples like this, we believe Head Overlap or Head Match are preferable criteria for success. 
Even with the Head Match criterion, there are errors that involve very subtle differences, such as the following example: (18) We were there at a moment when the situation in Laos threatened to ignite another war among the world's giants. Even if it did not, how would this little world of gentle people cope with its new reality of grenades and submachine guns? The coder selected ignite another war among the world's giants, while VPE-RES selected threatened to ignite another war among the world's giants.</Paragraph> <Paragraph position="3"> Some errors result from problems with the Syntactic Filter. The following example illustrates a case of antecedent containment that is not recognized by the filter as currently formulated.</Paragraph> <Paragraph position="4"> (19) All the generals who held important commands in World War 2, did not write books. It only seems as if they did.</Paragraph> <Paragraph position="5"> The VPE-RES system incorrectly selects seems as the antecedent, because it does not recognize that the VP headed by seems improperly contains the VPE.</Paragraph> </Section> </Section> <Section position="8" start_page="537" end_page="538" type="metho"> <SectionTitle> 5. Related Work </SectionTitle> <Paragraph position="0"> There is no comparable work we are aware of dealing with VPE resolution; to our knowledge, this is the first empirical study of a VPE resolution algorithm. There is, however, a large body of empirically oriented work on pronoun resolution. A prominent recent example is Lappin and Leass (1994), in which a pronoun resolution system is evaluated on 360 examples taken from computer manuals, with a success rate of 86%. This work involves a post hoc evaluation of the system output, and it appears that evaluation is based on Head Match, although this is not discussed explicitly. The VPE-RES system achieves an 84.4% success rate according to Head Match in the Blind Test data from the Wall Street Journal corpus. 
This compares favorably with Lappin and Leass's result, especially considering that computer manual text is a good deal more restricted than newspaper text. It is also likely that the VPE-RES success rate would be higher using a post hoc evaluation scheme.</Paragraph> <Section position="1" start_page="538" end_page="538" type="sub_section"> <SectionTitle> Hardt VP Ellipsis </SectionTitle> <Paragraph position="0"> Previous work on pronoun resolution (Hobbs 1978, Walker 1989) reports higher success rates. However, these involved hand-tested algorithms on rather small data sets. Lappin and Leass (1994) implemented and tested Hobbs's algorithm, and reported results about 4% lower than those of their own system.</Paragraph> </Section> </Section> </Paper>