<?xml version="1.0" standalone="yes"?> <Paper uid="W96-0306"> <Title>Acquisition of Semantic Lexicons: Using Word Sense Disambiguation to Improve Precision</Title> <Section position="2" start_page="0" end_page="42" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> This paper addresses the problem of large-scale acquisition of computational-semantic lexicons from machine-readable resources. We describe semantic filters designed to reduce the number of incorrect assignments (i.e., improve precision) made by a purely syntactic technique. We demonstrate that it is possible to use these filters to build broad-coverage lexicons with minimal effort, at a depth of knowledge that lies at the syntax-semantics interface. We report the results of disambiguating the verbs in the semantic filters by adding WordNet sense annotations. We then show the results of our classification on unknown words and we evaluate these results.</Paragraph> <Paragraph position="1"> As machine-readable resources (i.e., online dictionaries, thesauri, and other knowledge sources) become readily available to NLP researchers, automated acquisition has become increasingly attractive. Several researchers have noted that the average time needed to construct a lexical entry by hand can be as much as 30 minutes (see, e.g., (Neff and McCord, 1990; Copestake et al., 1995; Walker and Amsler, 1986)). Given that most large-scale NLP applications require lexicons of 20,000-60,000 words, automation of the acquisition process has become a necessity.</Paragraph> <Paragraph position="2"> Previous research in automatic acquisition focuses primarily on the use of statistical techniques, such as bilingual alignment (Church and Hanks, 1990; Klavans and Tzoukermann, 1996; Wu and Xia, 1995), or extraction of syntactic constructions from online dictionaries and corpora (Brent, 1993; Dorr, Garman, and Weinberg, 1995). 
Others who have taken a more knowledge-based (interlingual) approach (Lonsdale, Mitamura, and Nyberg, 1996) do not provide a means for systematically deriving the relation between surface syntactic structures and their underlying semantic representations. Those who have taken more argument structures into account, e.g., (Copestake et al., 1995), do not take full advantage of the systematic relation between syntax and semantics during lexical acquisition.</Paragraph> <Paragraph position="3"> We adopt the central thesis of Levin (1993), i.e., that the semantic class of a verb and its syntactic behavior are predictably related. We base our work on a correlation between semantic classes and patterns of grammar codes in the Longman's Dictionary of Contemporary English (LDOCE) (Procter, 1978). We extend this work by coupling the syntax-semantics relation with a pre-defined association between WordNet (Miller, 1985) word senses and Levin's verbs in order to group the full set of LDOCE verbs into semantic classes.</Paragraph> <Paragraph position="4"> While the LDOCE has been used previously in automatic extraction tasks (Alshawi, 1989; Farwell, Guthrie, and Wilks, 1993; Boguraev and Briscoe, 1989; Wilks et al., 1989; Wilks et al., 1990), these tasks are primarily concerned with the extraction of other types of information, including syntactic phrase structure and broad argument restrictions, or with the derivation of semantic structures from definition analyses. The work of Sanfilippo and Poznanski (1992) is more closely related to our approach in that it attempts to recover a syntactic-semantic relation from machine-readable dictionaries. 
However, they claim that the semantic classification of verbs based on standard machine-readable dictionaries (e.g., the LDOCE) is &quot;a hopeless pursuit [since] standard dictionaries are simply not equipped to offer this kind of information with consistency and exhaustiveness.&quot; Others have also argued that the task of simplifying lexical entries on the basis of broad semantic class membership is complex and, perhaps, infeasible (see, e.g., Boguraev and Briscoe (1989)).</Paragraph> <Paragraph position="5"> However, a number of researchers (Fillmore, 1968; Grimshaw, 1990; Gruber, 1965; Guthrie et al., 1991; Hearst, 1991; Jackendoff, 1983; Jackendoff, 1990; Levin, 1993; Pinker, 1989; Yarowsky, 1992) have demonstrated conclusively that there is a clear relationship between syntactic context and word senses; it is our aim to exploit this relationship for the acquisition of semantic lexicons. We first describe the LDOCE verb classification resulting from a purely syntactic approach to deriving semantic classes. We then describe a semantic filter designed to reduce the number of incorrect assignments made by the syntactic technique; we show how this filter can be enhanced with a method that accounts for multiple word senses. Finally, we show the results of our classification of unknown verbs, and we evaluate these results. Our results clearly indicate that the resolution of polysemy is a key component of developing an effective semantic filter.</Paragraph> </Section> </Paper>