<?xml version="1.0" standalone="yes"?> <Paper uid="P97-1018"> <Title>Integrating Symbolic and Statistical Representations: The Lexicon Pragmatics Interface</Title> <Section position="3" start_page="0" end_page="138" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> When words have multiple senses, these may have very different frequencies. For example, the first two senses of the noun diet given in WordNet are: 1. (a prescribed selection of foods) => fare - (the food and drink that are regularly consumed) 2. => legislature, legislative assembly, general assembly, law-makers. Most English speakers will share the intuition that the first sense is much more common than the second, and that this is (partly) a property of the word and not its denotation, since near-synonyms occur with much greater frequency. Frequency differences are also found between senses of derived forms (including morphological derivation, zero-derivation and compounding). For example, canoe is less frequent as a verb than as a noun, and the induced action use (e.g., they canoed the kids across the lake) is much less frequent than the intransitive form (with location PP) (they canoed across the lake). A derived form may become established with one meaning, but this does not preclude other uses in sufficiently marked contexts (e.g., Bauer's (1983) example of garbage man with an interpretation analogous to snowman).</Paragraph> <Paragraph position="1"> Because of the difficulty of resolving lexical ambiguity, it is usual in NLP applications to exclude 'rare' senses from the lexicon, and to explicitly list frequent forms, rather than to derive them. But this increases errors due to unexpected vocabulary, especially for highly productive derivational processes. For this and other reasons it is preferable to assume some generative devices in the lexicon (Pustejovsky, 1995). 
Briscoe and Copestake (1996) argue that a differential estimation of the productivity of derivation processes allows an approximation of the probabilities of previously unseen derived uses. If more probable senses are preferred by the system, the proliferation of senses that results from unconstrained use of lexical rules or other generative devices is effectively controlled. An interacting issue is the granularity of meaning of derived forms. If the lexicon produces a small number of very underspecified senses for a wordform, the ambiguity problem is apparently reduced, but pragmatics may have insufficient information with which to resolve meanings, or may find impossible interpretations.</Paragraph> <Paragraph position="2"> We argue here that by utilising probabilities, a language-specific component can offer hints to a pragmatic module in order to prioritise and control the application of real-world reasoning to disambiguation. The objective is an architecture utilising a general-purpose lexicon with domain-dependent probabilities. The particular issues we consider here are the integration of the statistical and symbolic components, and the division of labour between semantics and pragmatics in determining meaning.</Paragraph> <Paragraph position="3"> We concentrate on (right-headed) compound nouns, since these raise especially difficult problems for NLP system architecture (Sparck Jones, 1983).</Paragraph> <Paragraph position="4"> 2 The grammar of compound nouns Within linguistics, attempts to classify nominal compounds using a small fixed set of meaning relations (e.g., Levi (1978)) are usually thought to have failed, because there appear to be exceptions to any classification. Compounds are attested with meanings which can only be determined contextually. Downing (1977) discusses apple juice seat, uttered in a context in which it identifies a place-setting with a glass of apple juice. 
Even for compounds with established meanings, context can force an alternative interpretation (Bauer, 1983).</Paragraph> <Paragraph position="5"> These problems led to analyses in which the relationship between the parts of a compound is undetermined by the grammar, e.g., Dowty (1979), Bauer (1983). Schematically this is equivalent to the following rule, where R is undetermined (to simplify exposition, we ignore the quantifier for y):</Paragraph> <Paragraph position="7"> Similar approaches have been adopted in NLP with further processing using domain restrictions to resolve the interpretation (e.g., Hobbs et al. (1993)).</Paragraph> <Paragraph position="8"> However, this is also unsatisfactory, because rule (1) overgenerates and ignores systematic properties of various classes of compounds. Overgeneration is apparent when we consider translation of German compounds, since many do not correspond straightforwardly to English compounds (e.g., Figure 1). Since these exceptions are English-specific they cannot be explained via pragmatics. Furthermore they are not simply due to lexical idiosyncrasies: for instance, Arzttermin/*doctor appointment is representative of many compounds with human-denoting first elements, which require a possessive in English.</Paragraph> <Paragraph position="9"> So we get blacksmith's hammer and not *blacksmith hammer to mean 'hammer of a type conventionally associated with a blacksmith' (also driver's cab, widow's allowance, etc.). This is not the usual possessive: compare (((his blacksmith)'s) hammer) with (his (blacksmith's hammer)). Adjective placement is also restricted: three English blacksmith's hammers/*three blacksmith's English hammers. We treat these as a subtype of noun-noun compound with the possessive analysed as a case marker.</Paragraph> <Paragraph position="10"> In another subcategory of compounds, the head provides the predicate (e.g., dog catcher, bottle crusher). 
Again, there are restrictions: it is not usually possible to form a compound with an agentive predicate taking an argument that normally requires a preposition (contrast water seeker with *water looker). Stress assignment also demonstrates inadequacies in (1): compounds which have the interpretation 'Y made of X' (e.g., nylon rope, oak table) generally have main stress on the righthand noun, in contrast to most other compounds (Liberman and Sproat, 1992). Stress sometimes disambiguates meaning: e.g., with righthand stress cotton bag has the interpretation bag made of cotton while with leftmost stress an alternative reading, bag for cotton, is available. Furthermore, ordering of elements is restricted: e.g., cotton garment bag/*garment cotton bag.</Paragraph> <Paragraph position="11"> The rule in (1) is therefore theoretically inadequate, because it predicts that all noun-noun compounds are acceptable. Furthermore, it gives no hint of likely interpretations, leaving an immense burden to pragmatics.</Paragraph> <Paragraph position="12"> We therefore take a position which is intermediate between the two extremes outlined above. We assume that the grammar/lexicon delimits the range of compounds and indicates conventional interpretations, but that some compounds may only be resolved by pragmatics and that non-conventional contextual interpretations are always available. We define a number of schemata which encode conventional meanings. These cover the majority of compounds, but for the remainder the interpretation is left unspecified, to be resolved by pragmatics.</Paragraph> <Paragraph position="14"> Space limitations preclude detailed discussion but Figures 2 and 3 show a partial default inheritance hierarchy of schemata (cf. Jones (1995)). 2 Multiple schemata may apply to a single compound: for example, cotton bag is an instantiation of the made-of schema, the non-derived-purpose-patient schema and also the general-nn schema. 
Each applicable schema corresponds to a different sense: so cotton bag is ambiguous rather than vague.</Paragraph> <Paragraph position="15"> The interpretation of the hierarchy is that the use of a more general schema implies that the meanings given by specific subschemata are excluded, and thus we have the following interpretations for cotton bag: 1. λx[cotton(y) ∧ bag(x) ∧ made-of(y, x)] 2. λx[cotton(y) ∧ bag(x) ∧ TELIC(bag)(y, x)] = λx[cotton(y) ∧ bag(x) ∧ contain(y, x)] (Footnote 2: We formalise this with typed default feature structures (Lascarides et al., 1996). Schemata can be regarded formally as lexical/grammar rules (lexical rules and grammar rules being very similar in our framework) but inefficiency due to multiple interpretations is avoided in the implementation by using a form of packing.)</Paragraph> <Paragraph position="16"> 3. λx[R(y, x) ∧ ¬(made-of(y, x) ∨ contain(y, x) ∨ ...)] The predicate made-of is to be interpreted as material constituency (e.g., Link (1983)). We follow Johnston and Busa (1996) in using Pustejovsky's (1995) concept of telic role to encode the purpose of an artifact. These schemata give minimal indications of compound semantics: it may be desirable to provide more information (Johnston et al., 1995), but we will not discuss that here.</Paragraph> <Paragraph position="17"> Established compounds may have idiosyncratic interpretations or inherit from one or more schemata (though compounds with multiple established senses due to ambiguity in the relationship between constituents rather than lexical ambiguity are fairly unusual). But established compounds may also have unestablished interpretations, although, as discussed in §3, these will have minimal probabilities. In contrast, an unusual compound, such as apple-juice seat, may only be compatible with general-nn, and would be assigned the most underspecified interpretation. As we will see in §4, this means pragmatics
As we will see in SS4, this means pragmatics</Paragraph> <Paragraph position="19"> must find a contextual interpretation. Thus, for any compound there may be some context in which it can be interpreted, but in the absence of a marked context, only compounds which instantiate one of the subschemata are acceptable.</Paragraph> </Section> class="xml-element"></Paper>