File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/04/w04-2601_abstr.xml
Size: 7,259 bytes
Last Modified: 2025-10-06 13:44:05
<?xml version="1.0" standalone="yes"?> <Paper uid="W04-2601"> <Title>OntoSem Methods for Processing Semantic Ellipsis</Title> <Section position="1" start_page="0" end_page="3366" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> This paper describes various types of semantic ellipsis and underspecification in natural language, and the ways in which the meaning of semantically elided elements is reconstructed in the Ontological Semantics (OntoSem) text processing environment. The description covers phenomena whose treatment in OntoSem has reached various levels of advancement: fully implemented, partially implemented, and described algorithmically outside of implementation. We present these research results at this point - prior to full implementation and extensive evaluation - for two reasons: first, new descriptive material is being reported; second, some subclasses of the phenomena in question will require a truly long-term effort whose results are best reported in installments.</Paragraph> <Paragraph position="1"> Introduction Syntactic ellipsis - the non-expression of syntactically obligatory elements - has been widely studied in computational linguistics (not to mention other branches of linguistics), largely because accounting for missing syntactic elements is crucial to achieving a full parse, and parsing is required for many approaches to NLP.</Paragraph> <Paragraph position="2"> Much less attention has been devoted to what we will call semantic ellipsis, or the non-expression of elements that, while not syntactically obligatory, are required for a full semantic interpretation of a text.</Paragraph> <Paragraph position="3"> Naturally, semantic ellipsis is important only in truly knowledge-rich approaches to NLP, which few current non-toy systems pursue.</Paragraph> <Paragraph position="4"> Examples of NLP efforts to resolve syntactic ellipsis include Hobbs and Kehler 1997; Kehler and Shieber 1997; and Lappin 1992, among many others.</Paragraph> <Paragraph position="5"> Some of the types of semantic underspecification treated here are described in the literature (e.g., Pustejovsky 1995) in theoretical terms, not as heuristic algorithms. This is due, in large part, to a lack of knowledge sources for semantic reasoning in those contributions.</Paragraph>
<Paragraph position="6"> All definitions of ellipsis derive from a stated or implied notion of completeness. Taking, again, the example of syntactic ellipsis, completeness means that obligatory verbal arguments must be overt, auxiliary verbs must have complements, etc. - all of which is defined in lexico-grammatical terms. But even if a text is devoid of syntactic gaps, much remains below the surface, easily interpretable by people but not directly observable. Typical examples of semantically underspecified elements are pronouns and indexicals (e.g., here, now, yesterday), whose real-world anchors must be specified in a fully developed semantic representation (e.g., yesterday has a concrete meaning only if one knows when today is). Pronouns and indexicals, though often difficult to resolve, have one advantage over the cases discussed here: the trigger for further semantic specification is the word itself, and the inventory of such words is well known.</Paragraph> <Paragraph position="7"> By contrast, the semantically underspecified cases in the following examples are more subtle: (1) After boosting employment the past few years, Aluminum Co. of America won't be doing any hiring this fall beyond replacing those who leave. (2) Mitchell said he planned to work late tonight to complete the legislation.</Paragraph> <Paragraph position="8"> (3) Civilians invited into the prison by the administration to help keep the peace were unable to stanch the bloodshed.</Paragraph> <Paragraph position="9"> The categories of semantic ellipsis illustrated by these examples can be described as follows. 
(1) shows reference resolution that relies on reconstructing a semantically elided category: to understand who those refers to, one must understand that the implicit object of hire is 'employees' and that the elided head of the NP with those as its determiner also refers to employees (albeit a different real-world set of employees). (2) illustrates semantic event ellipsis in configurations containing a modal/aspectual verb + OBJECT: the meaning of complete the legislation is actually complete writing the legislation. (3) illustrates lexical patterns with predictable event ellipsis: e.g., invite &lt;person&gt; to &lt;location&gt; means 'invite someone to come/go to the location.' These examples, which illustrate the types of semantic ellipsis discussed below, require special treatment in our ontological semantic (OntoSem) text processing system, since its goal is to automatically produce fully specified semantic representations of unrestricted text that can then be used in a wide variety of applications.</Paragraph> <Paragraph position="10"> A Snapshot of the OntoSem Environment OntoSem is a text-processing environment that takes as input unrestricted raw text and carries out preprocessing, morphological analysis, syntactic analysis, and semantic analysis, with the results of semantic analysis represented as formal text-meaning representations (TMRs) that can then be used as the basis for many applications. Text analysis relies on: * The OntoSem language-independent ontology, which is written using a metalanguage of description and currently contains around 5,500 concepts, each of which is described by an average of 16 properties. * An OntoSem lexicon for each language processed, which contains syntactic and semantic zones (linked using variables) as well as calls to &quot;meaning procedures&quot; (i.e., programs that carry out procedural semantics; see McShane et al. forthcoming) when applicable. 
The semantic zone most frequently refers to ontological concepts, either directly or with property-based modifications, but can also describe word meaning extra-ontologically, for example, in terms of modality, aspect, time, etc. The current English lexicon contains approximately 12K senses, including all closed-class items and the most frequent verbs (as determined by corpus analysis).</Paragraph> <Paragraph position="11"> * An onomasticon, or lexicon of proper names, which contains approximately 350,000 entries and is growing daily through automated extraction techniques. * A fact repository, which contains real-world facts represented as numbered &quot;remembered instances&quot; of ontological concepts (e.g., SPEECH-ACT-3366 is the 3366th instantiation of the concept SPEECH-ACT in the world model constructed during the processing of some given text(s)).</Paragraph> <Paragraph position="12"> * The OntoSem text analyzers, which cover preprocessing, syntactic analysis, semantic analysis, and creation of TMRs.</Paragraph> <Paragraph position="13"> * The TMR language, which is the metalanguage for representing text meaning.</Paragraph> <Paragraph position="14"> A very simple example of a TMR, reflecting the meaning of the sentence The US won the war, is as follows:</Paragraph> </Section> </Paper>
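<!--
A minimal Python sketch, assuming a simple per-concept counter, of the naming scheme used by the fact repository described above, where SPEECH-ACT-3366 denotes the 3366th instance of SPEECH-ACT in a given world model. The class FactRepository, its remember method, the property name agent, the instance HUMAN-1, and the concept WAR are hypothetical illustrations and are not taken from the OntoSem implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FactRepository:
    """Hypothetical store of numbered 'remembered instances' of concepts.

    Assumption: each concept keeps its own counter, so the n-th instance
    of SPEECH-ACT stored in this world model is named SPEECH-ACT-n.
    """

    counters: defaultdict = field(default_factory=lambda: defaultdict(int))
    facts: dict = field(default_factory=dict)

    def remember(self, concept: str, **properties: str) -> str:
        """Store a new instance of `concept` and return its instance name."""
        self.counters[concept] += 1  # advance this concept's counter only
        name = f"{concept}-{self.counters[concept]}"
        # Record which concept the instance belongs to, plus any given properties.
        self.facts[name] = {"instance-of": concept, **properties}
        return name


# Usage with hypothetical concept, property, and instance names.
repo = FactRepository()
first = repo.remember("SPEECH-ACT", agent="HUMAN-1")
second = repo.remember("SPEECH-ACT")
war = repo.remember("WAR")
print(first, second, war)  # SPEECH-ACT-1 SPEECH-ACT-2 WAR-1
```

Keeping one counter per concept, rather than a single global counter, matches the example in the text, where the number in SPEECH-ACT-3366 counts only instances of SPEECH-ACT.
-->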