<?xml version="1.0" standalone="yes"?> <Paper uid="J03-2002"> <Title>c(c) 2003 Association for Computational Linguistics Implementing the Binding and Accommodation Theory for Anaphora Resolution and Presupposition Projection</Title> <Section position="7" start_page="204" end_page="208" type="evalu"> <SectionTitle> 5. Implementation and Performance </SectionTitle> <Paragraph position="0"> The resolution algorithm is implemented as part of a natural language understanding system. I will describe the general architecture underlying this system and the implementation of the algorithm and the acceptability constraints and present performance results obtained from applying the algorithm to a corpus of route instructions.</Paragraph> <Section position="1" start_page="204" end_page="205" type="sub_section"> <SectionTitle> 5.1 Architecture </SectionTitle> <Paragraph position="0"> Open Agent Architecture (OAA) (Cheyer and Martin 2001) is used as prototyping environment to implement the presupposition resolution component as part of a natural language understanding system. OAA is a collection of software agents that communicate with each other via a facilitator, a piece of middleware that distributes requests to appropriate agents and returns the responses to the requester. OAA makes it convenient to combine different components that are required in natural language process- null Bos Implementing Binding and Accommodation Theory ing, such as speech recognition or parsing, the presupposition resolution component, and theorem provers, because OAA agents can be implemented in different programming languages and run simultaneously on different machines (Bos and Oka 2002). The resolution component is realized as an OAA agent implemented in PROLOG.</Paragraph> </Section> <Section position="2" start_page="205" end_page="206" type="sub_section"> <SectionTitle> 5.2 Acceptability Constraints </SectionTitle> <Paragraph position="0"> To implement inference, a theorem prover as well as a model builder is used, both encapsulated as OAA agents. The theorem-proving agent is used to find a counterproof for the DRS translated into first-order logic. The model-building agent is used to check whether the same DRS is satisfiable. So, although we are faced with the limitations for reasoning with first-order logic (validity is undecidable in first-order logic, and model generation is restricted to finite models), these limitations are reduced to a minimum.</Paragraph> <Paragraph position="1"> For each inference problem, the two inference agents attack the problem in parallel, and as soon as one of them finds an answer (a model or a counterproof), their task is completed.</Paragraph> <Paragraph position="2"> The three acceptability constraints that do not require first-order inference (proper binding, bound variables, and sortal compatibility) are not implemented as separate agents but instead are part of the resolution agent. Proper binding is checked via a neo-Davidsonian semantics to describe events in terms of their thematic relations (Parsons 1990). Binding is violated when a (di)transitive verb has a reflexive pronoun as object and the discourse referents for the agent and patient denote different objects, or when a (di)transitive verb has a nonreflexive object and the discourse referents for agent and patient denote the same object. 
The check for free variables is rather straightforward, given the definitions in Section 3.</Paragraph> <Paragraph position="3"> Sortal violations are detected using a conceptual ontology. This ontology is based on WordNet (Fellbaum 1998) and has been substantially adapted and extended to deal with anaphora resolution in BAT. As usual, it reflects background knowledge in the form of inheritance (is-a) and disjointness. The three (disjoint) top concepts in this ontology are GROUP (a collection of things), SITUATION (a condition in which certain propositions hold or do not hold), and THING (an individual object that is talked about). The last is further divided into ABSTRACTION (a thing without mass) and ENTITY (a thing with mass). The concept ENTITY has two subconcepts: OBJECT (a nonliving entity) and ORGANISM (a living entity). OBJECTS are divided into ARTIFACTS (human-made things), NATURAL-OBJECTS (things that are found in nature), and SUBSTANCES (things that are indivisible). The subconcepts of ORGANISM are HUMAN, ANIMAL, and PLANT.</Paragraph> <Paragraph position="4"> In the case of English pronouns, there is a need to distinguish between third-person singular male, female, and neuter pronouns, as well as, of course, plural pronouns.</Paragraph> <Paragraph position="5"> The plural pronouns are the easiest to deal with: they introduce a discourse referent with the condition GROUP; hence they cannot bind to situations or things. Three mutually disjoint concepts are used for singular pronouns: MALE (for he), FEMALE (for she), and UNISEX (for concepts that disallow binding of he and she). The neuter pronoun it comes with the feature NONHUMAN, so we allow it to refer to any nonhuman entity (this is obviously not entirely accurate, as in certain situations, it can be used to refer to persons). To prevent reference from singular pronouns to plural entities, we further define GROUP to be disjoint from MALE, FEMALE, and NONHUMAN.</Paragraph> <Paragraph position="6"> The sortal violation checker is implemented in PROLOG, where the inheritance information is stored in the PROLOG database by clauses of the following form: sort(ENTITY(X)) :- sort(ORGANISM(X)).</Paragraph> <Paragraph position="7"> sort(ENTITY(X)) :- sort(OBJECT(X)).</Paragraph> <Paragraph position="8"> sort(ORGANISM(X)) :- sort(HUMAN(X)).</Paragraph> <Paragraph position="9"> sort(ORGANISM(X)) :- sort(ANIMAL(X)).</Paragraph> <Paragraph position="10"> sort(ORGANISM(X)) :- sort(PLANT(X)).</Paragraph> <Paragraph position="11"> sort(MALE(X)) :- sort(MAN(X)).</Paragraph> <Paragraph position="12"> sort(FEMALE(X)) :- sort(WOMAN(X)).</Paragraph> <Paragraph position="13"> Disjointness relations are implemented by clauses of the following form: inconsistent :- sort(ORGANISM(X)), sort(OBJECT(X)).</Paragraph> <Paragraph position="14"> inconsistent :- sort(HUMAN(X)), sort(ANIMAL(X)).</Paragraph> <Paragraph position="15"> inconsistent :- sort(MALE(X)), sort(FEMALE(X)).</Paragraph> <Paragraph position="16"> For each sortal compatibility check, the discourse referents are skolemized, and the basic conditions of the resolved DRSs are asserted to the database. The following clause links these basic conditions to sorts: sort(S) :- basic(S).</Paragraph> <Paragraph position="17"> The PROLOG inference engine then attempts to prove a sortal incompatibility by trying to find an instance of a discourse referent that has two conflicting properties, within the transitive closure of the is-a relation, here implemented via the predicate sort.
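As an illustration of how such a filter might be driven, the following is a minimal sketch; the driver predicate and the list representation of basic conditions are assumptions, not taken from the article, and condition names are written in lowercase here so that the clauses are syntactically valid PROLOG (the article sets them in small capitals):

    :- use_module(library(lists)).   % member/2
    :- dynamic basic/1.

    % Fragment of the inheritance and disjointness clauses in lowercase form.
    sort(S) :- basic(S).                     % link asserted basic conditions to sorts
    sort(male(X))   :- sort(man(X)).
    sort(female(X)) :- sort(woman(X)).
    inconsistent :- sort(male(X)), sort(female(X)).

    % Hypothetical driver: assert the skolemized basic conditions of a resolved
    % DRS and accept the resolution if inconsistent cannot be derived from them.
    sortally_compatible(BasicConds) :-
        retractall(basic(_)),
        forall(member(C, BasicConds), assertz(basic(C))),
        \+ inconsistent.

With these clauses, sortally_compatible([man(a), woman(a)]) fails, because sort(male(a)) and sort(female(a)) are both derivable and hence inconsistent can be proven, whereas sortally_compatible([man(a), woman(b)]) succeeds.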
Using negation as failure, sorts are compatible if \+ inconsistent can be proven. Consider the following example illustrating sortal incompatibility: (29) Suppose the result of binding is a DRS in which the two basic conditions MAN(X) and WOMAN(X) are applied to the same variable. Asserting these conditions to the database as basic(MAN(a)) and basic(WOMAN(a)), it is possible to conclude sort(MAN(a)) as well as sort(WOMAN(a)). From this, we are able to conclude sort(MALE(a)) and sort(FEMALE(a)), and we can prove that inconsistent holds.</Paragraph> <Paragraph position="18"> Summarizing, the sortal compatibility check is used as a filter for the more general consistency check, for which fully fledged first-order theorem proving is used. If it is impossible to prove inconsistent, it is assumed that the antecedent discourse referent is compatible with its binder. As I will show in the next section, this filter reduces the search space in resolution enormously.</Paragraph> </Section> <Section position="3" start_page="206" end_page="208" type="sub_section"> <SectionTitle> 5.3 Performance </SectionTitle> <Paragraph position="0"> The resolution algorithm was tested on a corpus of route instructions collected in a scenario in which somebody explains to a mobile robot how to reach a certain destination. The corpus, collected in the IBL project (Lauria et al. 2001), comprises 283 utterances in 72 different route instructions, spoken by 24 different native English speakers. A typical sequence is the following: (30) Instructor: Go to the university! Robot: How do I get to the university? Instructor: Go straight ahead until you reach the post office. Just past the post office turn left over the bridge. Keep walking, there will be a building on the right and a building on your left. Keep walking until you come to a train station on the left hand side and the university is opposite the train station.</Paragraph> <Paragraph position="1"> The corpus was processed on an utterance-by-utterance basis, starting with a new DRS for each new route instruction. Only the first (consistent) solution returned by the resolution algorithm was considered for subsequent processing of the route instruction. A total of 898 referential expressions appeared in the 283 utterances of the corpus. As Table 3 shows, pronouns and proper names are relatively rare in these route instructions, but on average there are 1.5 definite noun phrases per utterance. The average number of accommodation sites for a presupposition trigger in this corpus was 7.5. (This relatively high number can perhaps be attributed to the way DRSs are nested into each other in representing route instructions and the way utterance grounding is realized in the DRS. Discussion of these issues, however, falls outside the scope of this article.) The average number of potential antecedents (i.e., accessible discourse referents) for binding a presupposition trigger was 16.7. These statistics illustrate the immense search space in presupposition resolution.</Paragraph> <Paragraph position="2"> The implemented resolution algorithm performs with an average CPU time (measured on a Sun Blade 100 workstation with 1 GB memory and a 500 MHz processor) of 1.21 seconds to transform an unresolved a-DRS into a proper DRS (disregarding the consistency checking; see below).
Table 4 shows the average CPU times for DRS resolution relative to the number of processed utterances and so illustrates the dependence of processing time on the size of the DRS capturing the previous discourse.</Paragraph> <Paragraph position="3"> To find out which of the acceptability constraints contributed most to narrowing down the search space, the number of attempts and the success/failure rate were computed for sortal compatibility, proper binding, and bound variables after a presuppositional DRS had been resolved or accommodated. Most of the credit for reducing the size of the search space goes to checking for sortal violations, which were detected 8,111 times in 8,303 attempts (97%). Only 99 a-DRSs (1.04% of 9,466 cases) were found to contain free variables. Similarly rare were cases of binding violation (73 occurrences in 9,156 considered cases). Still, it pays off to verify these constraints on partially resolved a-DRSs. For instance, the average CPU time for resolving a DRS that violated the bound variable constraint during resolution was 2.5 seconds (n = 35) when this constraint was checked partially, but 15.0 seconds when it was checked on fully resolved representations.</Paragraph> <Paragraph position="4"> Finally, let us consider the findings regarding the use of first-order inference engines to implement consistency checking of DRSs. This is a very hard task: Dialogues such as that in example (30) generate up to several hundred thousand clauses. Moreover, off-the-shelf provers are not designed for linguistic problems. Instead, they are mostly tuned to mathematical problems.</Paragraph> <Paragraph position="5"> Several theorem provers and model builders were put to the test, including Hans de Nivelle's (1998) BLIKSEM, which is optimized for the "guarded" fragment of first-order logic, Bill McCune's OTTER and MACE (McCune and Padmanabhan 1996; McCune 1998), and the theorem prover SPASS (Weidenbach et al. 1999). For this particular task the model builder MACE and the theorem prover SPASS clearly outperformed the other inference engines; they were able to find an answer within 30 seconds for 66% of the 283 inference problems assigned to them (the majority of the DRSs being consistent) in CPU times varying from 2.5 to 29.9 seconds (average 13.0 seconds).</Paragraph> <Paragraph position="6"> These results are perhaps too limited to justify the inclusion of first-order theorem proving in today's natural language understanding components. Nevertheless, I believe that first-order theorem provers will play an important future role in computational semantics for three reasons. First of all, automated theorem proving is a promising, emerging field. Moreover, most of the first-order inference engines, albeit general purpose, are designed to cope with nonlinguistic problems, and cooperation between computational linguists and researchers in the area of automated deduction might improve the performance of these inference engines on linguistic inference problems. Second, the current approach is nonincremental. After a new utterance is combined with the previous DRS, the complete newly constructed DRS is translated to first-order logic and checked for consistency, without appealing to previous inference results at all. It is likely that inference-based natural language understanding would benefit from an incremental approach, particularly with regard to model building.
Third, there is room for improvement in the formulation of the inference problem itself. Future work should address the use of sorted logics, experiment with other modal formulations, and consider the use of discourse structure to limit the size of the DRSs to be checked for consistency.</Paragraph> </Section> </Section> </Paper>