<?xml version="1.0" standalone="yes"?> <Paper uid="C96-2116"> <Title>A GENERALIZED RECONSTRUCTION ALGORITHM FOR ELLIPSIS RESOLUTION</Title> <Section position="3" start_page="0" end_page="687" type="metho"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Ellipsis structures pose an important problem for NLP systems designed to provide text understanding or to handle dialogue. They contain information which is not overtly expressed, but which must be recovered through the identification of an antecedent. However, unlike pronominal anaphora, which is resolved by matching a pronoun with an antecedent noun phrase, the interpretation of an ellipsis fragment (or sequence of fragments) generally involves mapping it (them) into a sentential structure by association with an antecedent clause. It is possible to distinguish two main approaches to ellipsis resolution. The first seeks to associate an elided construction directly with a semantic representation, while the second mediates semantic interpretation through the reconstruction of the syntactic structure of the antecedent. The algorithm we propose implements the second view of ellipsis, by characterizing ellipsis resolution as the specification of a relation of (possibly partial) correspondence between the lexically unrealized head of an elided clause and its arguments and adjuncts as one term of the relation, and the realized head of the antecedent clause and its arguments and adjuncts as the second term.</Paragraph> <Paragraph position="1"> The algorithm is a generalized procedure for syntactic reconstruction which provides a unified way of handling a significant variety of ellipsis constructions. It modifies and extends the reconstruction strategy for handling VP ellipsis suggested in Lappin and McCord (1990). 
The algorithm covers VP ellipsis (illustrated in 1), pseudo-gapping (in 2), bare ellipsis involving sequences of bare arguments, adjuncts or both (in 3), and gapping (in 4).</Paragraph> <Paragraph position="2"> 1. John completed his paper before he expected to.</Paragraph> <Paragraph position="3"> 2. John sent the flowers to Lucy before he did the chocolates.</Paragraph> <Paragraph position="4"> 3. Bill wrote reviews for the journal last year, and articles this year.</Paragraph> <Paragraph position="5"> 4. Sam teaches in London, and Lucy in Boston.</Paragraph> <Paragraph position="6"> The algorithm will be a useful component for source analysis in machine translation, text understanding systems, and discourse interpretation systems.</Paragraph> </Section> <Section position="4" start_page="687" end_page="691" type="metho"> <SectionTitle> 2 The Reconstruction Algorithm </SectionTitle> <Paragraph position="0"> Let an ellipsis fragment be a phrase which (i) occurs outside of a lexically realized sentence, and (ii) is interpreted as an argument or an adjunct of the head verb of a non-elided sentence. Let s = &lt;b_1, ..., b_k&gt; (1 ≤ k) be a sequence of ellipsis fragments such that, for each b_i ∈ s, b_{i+1} immediately follows b_i. Take s to be maximal in that there is no ellipsis fragment, b_0 or b_{k+1}, not contained in s, which immediately precedes or immediately follows an element of s.</Paragraph> <Paragraph position="1"> A. Identify an antecedent sentence S for s.</Paragraph> <Paragraph position="2"> B. Take the head verb of S, A, as the new interpreted head of the sentence to be constructed from s (we will refer to the new head as A').</Paragraph> <Paragraph position="3"> C. Consider in sequence each argument slot Slot_i in the SUBCAT list of A.</Paragraph> <Paragraph position="4"> 1. If there is a phrase C' in s which is of the appropriate type for filling Slot_i, then fill Slot_i in the SUBCAT list of A' with C' and remove C' from s. Else, 2. 
If Slot_i is filled in A by a phrase C, then fill Slot_i in A' with C, and list C as a new argument of A'. Else, 3. If Slot_i is empty in the frame of A, it remains empty in the frame of A'.</Paragraph> <Paragraph position="5"> 4. Construct a list, Arg-List, of the phrases which fill the SUBCAT list slots of A'.</Paragraph> <Paragraph position="6"> D. Construct a list of adjunct phrases for A' as follows.</Paragraph> <Paragraph position="7"> 1. Construct the list L of adjunct phrases in s.</Paragraph> <Paragraph position="8"> a. If L ≠ nil, then for each element AdjP' of L, fill an adjunct slot for A' with AdjP'.</Paragraph> <Paragraph position="9"> 2. Consider each adjunct slot of A filled by a phrase AdjP.</Paragraph> <Paragraph position="10"> a. If there is a phrase AdjP' filling an adjunct slot of the same type in A', then leave AdjP' in this slot and remove AdjP' from s. Else, b. Fill an adjunct slot for A' with AdjP, and list AdjP as a new adjunct of A'.</Paragraph> <Paragraph position="11"> 3. Construct a list, Adj-List, of all the phrases which fill adjunct slots of A'. E. Generate a new syntactic structure as follows.</Paragraph> <Paragraph position="12"> 1. Concatenate Arg-List and Adj-List to create a combined list, Ph-List, of the phrasal arguments and adjuncts of A'.</Paragraph> <Paragraph position="13"> 2. Reorder the elements of Ph-List to produce a new list, Ord-Ph-List, in which the sequence of argument and adjunct phrases corresponds to the order of argument and adjunct phrases of A.</Paragraph> <Paragraph position="14"> 3. Construct a new clause headed by A'. 4. 
Substitute Ord-Ph-List for the list of argument and adjunct phrases of A' in the new structure.</Paragraph> <Paragraph position="15"> 3 Coverage and Implementation of the</Paragraph> <Section position="1" start_page="687" end_page="688" type="sub_section"> <SectionTitle> Algorithm </SectionTitle> <Paragraph position="0"> At this point, the algorithm has been partially implemented in Prolog to apply to the output of McCord's English Slot Grammar (ESG) parser (which also runs in Prolog) in order to generate reconstructed trees for VP ellipsis and pseudo-gapping constructions (see McCord et al. (1992) for a description of ESG and NLP systems which run on top of it). Examples of the algorithm's output for these cases are given in 5 and 6.</Paragraph> <Paragraph position="1"> VP Ellipsis 5. John completed the paper before he expected to.</Paragraph> <Paragraph position="3"/> </Section> <Section position="2" start_page="688" end_page="691" type="sub_section"> <SectionTitle> Pseudo-Gapping </SectionTitle> <Paragraph position="0"> 6. John sent the flowers to Mary before he did the chocolates.</Paragraph> <Paragraph position="2"> The algorithm is currently being re-implemented in Prolog to apply to the output of a modified HPSG (Pollard and Sag (1994)) grammar designed to handle ellipsis.</Paragraph> <Paragraph position="3"> We are developing the grammar within the framework of Erbach's (1995) ProFIT system for augmenting Prolog with typed feature structures. The feature structures which the grammar currently generates for simple bare argument and bare adjunct ellipsis cases are illustrated by the AVMs in 7 and 8, respectively (cases of bare adverb ellipsis are discussed in Chao (1988) and Kempson and Gabbay (1993)).</Paragraph> <Paragraph position="4"> 7. John gives Mary flowers, and chocolates too.</Paragraph> <Paragraph position="5"> phon![john, gives, mary, flowers, and, chocolates, too]&amp;</Paragraph> <Paragraph position="7"> 8. 
John sings, and beautifully too.</Paragraph> <Paragraph position="8"> phon![john, sings, and, beautifully, too]&amp;</Paragraph> <Paragraph position="10"> The bare NP chocolates is the head of the elided clause in the second conjunct of 7. The generalized ellipsis reconstruction algorithm will identify gives as the head V of the antecedent clause in the first conjunct, and then will fill one of the positions in its SUBCAT list with the local features of chocolates. If it fills the direct object (third complement) position of this list with the bare NP, then it will fill the subject and indirect object positions with the local features of John and Mary, generating the reconstructed feature structure corresponding to 9.</Paragraph> <Paragraph position="11"> 9. [IP [NP John] [VP [V gives] [NP Mary] [NP flowers]]] and [IP [NP John] [VP [VP [V gives] [NP Mary]</Paragraph> <Paragraph position="13"> By contrast, the bare adverb beautifully is an adjunct daughter of a VP headed by an empty verb in 8. This is because, in our grammar, an adverb is an adjunct which modifies a VP. The algorithm will identify sings as the head V of the antecedent clause and substitute it for the empty V in 8. This will yield a feature structure corresponding to 10.</Paragraph> <Paragraph position="14"> 10. [IP [NP John] [VP [V plays]]] and [IP [NP John] [VP [VP [VP [V plays]] [AdvP beautifully]] too]] We employ a rule which permits an unbounded number of adverbs to be generated in successively higher VPs through left recursion on the daughter VP node. The relevant PS rule is of the form VP → VP, ADV. We require this rule in order to allow for the fact that there is no apparent upper bound on the number of adverbs in a VP. The examples in 11 indicate that it is possible to obtain an unbounded number of bare adverbial adjuncts in an ellipsis site. 11a. John sang, but not in New York.</Paragraph> <Paragraph position="15"> b. 
John sang, but not in New York at the concert.</Paragraph> <Paragraph position="16"> c. John sang, but not in New York at the concert for three hours.</Paragraph> <Paragraph position="17"> d. John sang, but not in New York at the concert for three hours on Tuesday.</Paragraph> <Paragraph position="18"> e. John sang, but not in New York at the concert for three hours on Tuesday to impress his music teacher.</Paragraph> <Paragraph position="19"> reconstruction account of bare ellipsis which adjoins an NP in the antecedent clause to an NP fragment by LF movement. The result is a conjoined NP which, taken as a generalized quantifier, applies to the antecedent clause, interpreted as a predicate formed by lambda abstraction. So, for example, adjunction of flowers in the antecedent clause of 7 to the NP fragment chocolates in the ellipsis site produces the LF structure 12a, which is interpreted as 12b. 12a. [IP' [IP John gives Mary t_1] [NP [NP flowers]_1 [NP and [NP chocolates]_2]_2]] b. (flowers and chocolates)(λx[john gives mary x]) Given that Reinhart's analysis relies on LF adjunction of an NP in the antecedent to an NP in the ellipsis site in order to create a generalized quantifier corresponding to a coordinate NP, it is not clear how it can apply to bare ellipsis cases like 3, in which a sequence of arguments and adjuncts appears in the ellipsis site. Moreover, the analysis cannot deal with bare ellipsis cases like 8, where a bare adjunct fragment does not correspond to any constituent in the antecedent clause. Therefore, this account does not cover the full range of bare ellipsis cases. As we have seen, the proposed generalized reconstruction algorithm does handle bare ellipsis structures like 8. In cases like 3 the algorithm will substitute the head V of the antecedent for the empty verb of the elided clause, and the bare PP adverb will modify the VP headed by this verb. 
The algorithm will fill some of the complement positions in the SUBCAT list of the reconstructed V with the NP arguments in the ellipsis site, and it will fill the remaining positions with arguments inherited from the antecedent head V. This procedure will yield at least one appropriate reconstruction for the elided clause.</Paragraph> <Paragraph position="20"> Dalrymple et al. (1991) and Shieber et al.</Paragraph> <Paragraph position="21"> (1995) present a generalized semantic account which employs higher-order unification of property and relation variables to resolve ellipsis. Their general strategy is to specify the interpretation of the antecedent clause as an equation between a propositional variable S and a predicate-argument structure. The arguments of the predicate correspond to the fragments in the ellipsis site, and ellipsis resolution consists in finding an appropriate value for the predicate variable which can apply to both the sequence of arguments in the interpretation of the antecedent clause, and the sequence of arguments in the ellipsis site. Given the equations in 13a-c, higher-order unification correctly generates 13d as the interpretation of 3.</Paragraph> <Paragraph position="23"> d. (book reviews)(λx[(last year)(λy[bill wrote x for the journal (during) y])]) and (articles)(λx[(this year)(λy[bill wrote x for the journal (during) y])]) While the higher-order unification analysis can deal with bare ellipsis cases like 3 (as well as VP ellipsis and pseudo-gapping), it is not clear how to apply it to bare ellipsis examples like 8, where the adjunct in the ellipsis site lacks a corresponding element in the antecedent clause. Lappin (1996) suggests positing a free manner adverbial function variable in the lexical semantic representation of verbs like sing. This will permit the specification of the equations in 14a-c for 8. 
Higher-order unification solves these equations to yield 14d, the desired interpretation of 8.</Paragraph> <Paragraph position="25"> In fact, this solution does not generalize to cases like 11, which indicate that there is no upper bound on the number of antecedentless bare adjuncts which can appear in a bare ellipsis sequence. As it is not possible to posit an unbounded number of free adjunct function variables in the semantic representation of a verb (VP), it seems that the higher-order unification analysis cannot deal with these cases.</Paragraph> <Paragraph position="26"> The generalized reconstruction algorithm presented here does not require the presence of constituents in the antecedent corresponding to adjunct elements of the fragment sequence. When a bare adjunct phrase AdjP does not correspond to a phrase in the antecedent clause, AdjP is simply added to the list of adjuncts of the new head verb of the reconstructed clause. Therefore, the algorithm produces the correct reconstructed forms for the elided clauses in 11.</Paragraph> <Paragraph position="27"> Another problem is posed by the fact that, as higher-order unification applies to semantic interpretations of antecedents, it will not have access to syntactic structure. But at least some cases of ellipsis resolution seem to require reference to this structure. Consider the contrast between 15a and 15b.</Paragraph> <Paragraph position="28"> 15a. The students sent invitations to the professors yesterday, and to each other today.</Paragraph> <Paragraph position="29"> b. ??The students said that John sent invitations to the professors yesterday, and to each other today.</Paragraph> <Paragraph position="30"> The elided conjunct in 15b is ill-formed because the reciprocal NP each other in the bare argument is interpreted as illicitly bound from outside of its local syntactic domain. 
By contrast, the generalized reconstruction algorithm generates the full syntactic structure of the elided clause, and so it provides the representation required to specify the contrast between 15a and 15b.</Paragraph> </Section> </Section> </Paper>