File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/06/p06-1052_intro.xml
Size: 5,281 bytes
Last Modified: 2025-10-06 14:03:35
<?xml version="1.0" standalone="yes"?> <Paper uid="P06-1052"> <Title>Sydney, July 2006. (c) 2006 Association for Computational Linguistics An Improved Redundancy Elimination Algorithm for Underspecified Representations</Title> <Section position="3" start_page="0" end_page="409" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Underspecification is nowadays the standard approach to dealing with scope ambiguities in computational semantics (van Deemter and Peters, 1996; Copestake et al., 2004; Egg et al., 2001; Blackburn and Bos, 2005). The basic idea behind it is to not enumerate all possible semantic representations for each syntactic analysis, but to derive a single compact underspecified representation (USR). This simplifies semantics construction, and current algorithms support the efficient enumeration of the individual semantic representations from an USR (Koller and Thater, 2005b).</Paragraph> <Paragraph position="1"> A major promise of underspecification is that it makes it possible, in principle, to rule out entire subsets of readings that we are not interested in wholesale, without even enumerating them. For instance, real-world sentences with scope ambiguities often have many readings that are semantically equivalent. Subsequent modules (e.g. for doing inference) will typically only be interested in one reading from each equivalence class, and all others could be deleted. This situation is illustrated by the following two (out of many) sentences from the Rondane treebank, which is distributed with the English Resource Grammar (ERG; Flickinger (2002)), a large-scale HPSG grammar of English.</Paragraph> <Paragraph position="2"> (1) For travellers going to Finnmark there is a bus service from Oslo to Alta through Sweden. (Rondane 1262) (2) We quickly put up the tents in the lee of a small hillside and cook for the first time in the open. 
(Rondane 892) For the annotated syntactic analysis of (1), the ERG derives an USR with eight scope-bearing operators, which results in a total of 3960 readings. These readings are all semantically equivalent to each other. On the other hand, the USR for (2) has 480 readings, which fall into two classes of mutually equivalent readings, characterised by the relative scope of &quot;the lee of&quot; and &quot;a small hillside.&quot; In this paper, we present an algorithm for the redundancy elimination problem: Given an USR, compute an USR which has fewer readings, but still describes at least one representative of each equivalence class - without enumerating any readings. This algorithm makes it possible to compute the one or two representatives of the semantic equivalence classes in the examples, so subsequent modules don't have to deal with all the other equivalent readings. It also closes the gap between the large number of readings predicted by the grammar and the intuitively perceived much lower degree of ambiguity of these sentences. Finally, it can be helpful for a grammar designer because it is much more feasible to check whether two readings are linguistically reasonable than 480. Our algorithm is applicable to arbitrary USRs (not just those computed by the ERG). While its effect is particularly significant on the ERG, which uniformly treats all kinds of noun phrases, including proper names and pronouns, as generalised quantifiers, it will generally help deal with spurious ambiguities (such as scope ambiguities between indefinites), which have been a ubiquitous problem in most theories of scope since Montague Grammar.</Paragraph> <Paragraph position="3"> We model equivalence in terms of rewrite rules that permute quantifiers without changing the semantics of the readings. 
The particular USRs we work with are underspecified chart representations, which can be computed from dominance graphs (or USRs in some other underspecification formalisms) efficiently (Koller and Thater, 2005b).</Paragraph> <Paragraph position="4"> We evaluate the performance of the algorithm on the Rondane treebank and show that it reduces the median number of readings from 56 to 4, by up to a factor of 666.240 for individual USRs, while running in negligible time.</Paragraph> <Paragraph position="5"> To our knowledge, our algorithm and its less powerful predecessor (Koller and Thater, 2006) are the first redundancy elimination algorithms in the literature that operate on the level of USRs.</Paragraph> <Paragraph position="6"> There has been previous research on enumerating only some representatives of each equivalence class (Vestre, 1991; Chaves, 2003), but these approaches don't maintain underspecification: After running their algorithms, they are left with a set of readings rather than an underspecified representation, i.e. we could no longer run other algorithms on an USR.</Paragraph> <Paragraph position="7"> The paper is structured as follows. We will first define dominance graphs and review the necessary background theory in Section 2. We will then introduce our notion of equivalence in Section 3, and present the redundancy elimination algorithm in Section 4. In Section 5, we describe the evaluation of the algorithm on the Rondane corpus. Finally, Section 6 concludes and points to further work.</Paragraph> </Section></Paper>