<?xml version="1.0" standalone="yes"?> <Paper uid="C92-2082"> <Title>Automatic Acquisition of Hyponyms from Large Text Corpora</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Currently there is much interest in the automatic acquisition of lexical syntax and semantics, with the goal of building up large lexicons for natural language processing. Projects that center around extracting lexical information from Machine Readable Dictionaries (MRDs) have shown much success but are inherently limited, since the set of entries within a dictionary is fixed. In order to find terms and expressions that are not defined in MRDs we must turn to other textual resources. For this purpose, we view a text corpus not only as a source of information, but also as a source of information about the language it is written in.</Paragraph> <Paragraph position="1"> When interpreting unrestricted, domain-independent text, it is difficult to determine in advance what kind of information will be encountered and how it will be expressed. Instead of interpreting everything in the text in great detail, we can search for specific lexical relations that are expressed in well-known ways. Surprisingly useful information can be found with only a very simple understanding of a text. Consider the following sentence (see footnote 1):</Paragraph> <Paragraph position="2"> (S1) The bow lute, such as the Bambara ndang, is plucked and has an individual curved neck for each string.</Paragraph> <Paragraph position="3"> Most fluent readers of English who have never before encountered the term &quot;Bambara ndang&quot; will nevertheless infer from this sentence that a &quot;Bambara ndang&quot; is a kind of &quot;bow lute&quot;. This is true even if the reader has only a fuzzy conception of what a bow lute is. 
Note that the author of the sentence is not deliberately defining the term, as would a dictionary or a children's book containing a didactic sentence like &quot;A Bambara ndang is a kind of bow lute.&quot; However, the semantics of the lexico-syntactic construction indicated by the pattern: (1a) NP0 such as {NP1, NP2, ..., (and | or)} NPn are such that they imply (1b) for all NPi, 1 ≤ i ≤ n, hyponym(NPi, NP0). Thus from sentence (S1) we conclude hyponym(&quot;Bambara ndang&quot;, &quot;bow lute&quot;). We use the term hyponym similarly to the sense used in (Miller et al. 1990): a concept represented by a lexical item L0 is said to be a hyponym of the concept represented by a lexical item L1 if native speakers of English accept sentences constructed from the frame &quot;An L0 is a (kind of) L1&quot;. Here L1 is the hypernym of L0 and the relationship is reflexive and transitive, but not symmetric.</Paragraph> <Paragraph position="4"> This example shows a way to discover a hyponymic lexical relationship between two or more noun phrases in a naturally-occurring text. This approach is similar in spirit to the pattern-based interpretation techniques being used in MRD processing. For example, (Alshawi 1987), in interpreting LDOCE definitions, uses a hierarchy of patterns which consist mainly of part-of-speech indicators and wildcard characters. (Footnote 1: All examples in this paper are real text, taken from Grolier's American Academic Encyclopedia (Grolier 1990).) (Actes de COLING-92, Nantes, 23-28 août 1992, p. 539; Proc. of COLING-92, Nantes, Aug. 23-28, 1992.)</Paragraph> <Paragraph position="5"> (Markowitz et al. 1986), (Jensen &amp; Binot 1987), and (Nakamura &amp; Nagao 1988) also use pattern recognition to extract semantic relations such as taxonomy from various dictionaries. 
(Ahlswede &amp; Evens 1988) compares an approach based on parsing Webster's 7th definitions with one based on pattern recognition, and finds that, for identifying simple semantic relations, pattern recognition is far more accurate and efficient than parsing. The general feeling is that the structure and function of MRDs make their interpretation amenable to pattern-recognition techniques.</Paragraph> <Paragraph position="6"> Thus one could say that by interpreting sentence (S1) according to (1a-b) we are applying pattern-based relation recognition to general texts. Since one of the goals of building a lexical hierarchy automatically is to aid in the construction of a natural language processing program, this approach to acquisition is preferable to one that needs a complex parser and knowledge base. The tradeoff is that the information acquired is coarse-grained.</Paragraph> <Paragraph position="7"> There are many ways that the structure of a language can indicate the meanings of lexical items, but the difficulty lies in finding constructions that frequently and reliably indicate the relation of interest. It might seem that because free text is so varied in form and content (as compared with the somewhat regular structure of the dictionary) it may not be possible to find such constructions. However, we have identified a set of lexico-syntactic patterns, including the one shown in (1a) above, that indicate the hyponymy relation and that satisfy the following desiderata: (i) They occur frequently and in many text genres.</Paragraph> <Paragraph position="8"> (ii) They (almost) always indicate the relation of interest. 
(iii) They can be recognized with little or no pre-encoded knowledge.</Paragraph> <Paragraph position="9"> Item (i) indicates that the pattern will result in the discovery of many instances of the relation, item (ii) that the information extracted will not be erroneous, and item (iii) that making use of the pattern does not require the tools that it is intended to help build.</Paragraph> <Paragraph position="10"> Finding instances of the hyponymy relation is useful for several purposes: Lexicon Augmentation. Hyponymy relations can be used to augment and verify existing lexicons, including ones built from MRDs. Section 3 of this paper describes an example, comparing results extracted from a text corpus with information stored in the noun hierarchy of WordNet (Miller et al. 1990), a hand-built lexical thesaurus.</Paragraph> <Paragraph position="11"> Noun Phrase Semantics. Another purpose to which these relations can be applied is the identification of the general meaning of an unfamiliar noun phrase. For example, discovering the predicate hyponym(&quot;broken bone&quot;, &quot;injury&quot;) indicates that the term &quot;broken bone&quot; can be understood at some level as an &quot;injury&quot; without having to determine the correct senses of the component words and how they combine. Note also that a term like &quot;broken bone&quot; is not likely to appear in a dictionary or lexicon, although it is a common locution.</Paragraph> <Paragraph position="12"> Semantic Relatedness Information. There has recently been work in the detection of semantically related nouns via, for example, shared argument structures (Hindle 1990), and shared dictionary definition context (Wilks et al. 1990). These approaches attempt to infer relationships among lexical terms by looking at very large text samples and determining which ones are related in a statistically significant way. 
The technique introduced in this paper can be seen as having a similar goal but an entirely different approach, since only one sample need be found in order to determine a salient relationship (and that sample may be infrequently occurring or nonexistent).</Paragraph> <Paragraph position="13"> Thinking of the relations discovered as closely related semantically instead of as hyponymic is most felicitous when the noun phrases involved are modified and atypical. Consider, for example, the predicate hyponym(&quot;detonating explosive&quot;, &quot;blasting agent&quot;). This relation may not be a canonical ISA relation, but the fact that it was found in a text implies that the terms' meanings are close. Connecting terms whose expressions are quite disparate but whose meanings are similar should be useful for improved synonym expansion in information retrieval and for finding chains of semantically related phrases, as used in the approach to recognition of topic boundaries of (Morris &amp; Hirst 1991). We observe that terms that occur in a list are often related semantically, whether they occur in a hyponymy relation or not.</Paragraph> <Paragraph position="14"> In the next section we outline a way to discover these lexico-syntactic patterns as well as illustrate those we have found. Section 3 shows the results of searching texts for a restricted version of one of the patterns and compares the results against a hand-built thesaurus.</Paragraph> <Paragraph position="15"> Section 4 is a discussion of the merits of this work and describes future directions.</Paragraph> </Section> </Paper>