<?xml version="1.0" standalone="yes"?> <Paper uid="W96-0104"> <Title>Learning similarity-based word sense disambiguation from sparse data</Title> <Section position="2" start_page="0" end_page="42" type="abstr"> <SectionTitle> Abstract </SectionTitle>
<Paragraph position="0"> We describe a method for automatic word sense disambiguation using a text corpus and a machine-readable dictionary (MRD). The method is based on word similarity and context similarity measures. Words are considered similar if they appear in similar contexts; contexts are similar if they contain similar words. The circularity of this definition is resolved by an iterative, converging process, in which the system learns from the corpus a set of typical usages for each of the senses of the polysemous word listed in the MRD. A new instance of a polysemous word is assigned the sense associated with the typical usage most similar to its context. Experiments show that this method performs well, and can learn even from very sparse training data.</Paragraph>
<Paragraph position="1"> Introduction Word Sense Disambiguation (WSD) is the problem of assigning a sense to an ambiguous word, using its context. We assume that different senses of a word correspond to different entries in its dictionary definition. For example, suit has two senses listed in a dictionary: an action in court, and a suit of clothes. Given the sentence The union's lawyers are reviewing the suit, we would like the system to decide automatically that suit is used there in its court-related sense (we assume that the part of speech of the polysemous word is known).</Paragraph>
<Paragraph position="2"> In recent years, text corpora have been the main source of information for learning automatic WSD (see, e.g., (Gale et al., 1992)). A typical corpus-based algorithm constructs a training set from all contexts of a polysemous word W in the corpus, and uses it to learn a classifier that maps instances of W (each supplied with its context) into the senses. Because learning requires that the examples in the training set be partitioned into the different senses, and because sense information is not available in the corpus explicitly, this approach depends critically on manual sense tagging, a laborious and time-consuming process that has to be repeated for every word, in every language, and, more likely than not, for every topic of discourse or source of information.</Paragraph>
<Paragraph position="3"> The need for tagged examples creates a problem referred to in previous works as the knowledge acquisition bottleneck: training a disambiguator for W requires that the examples in the corpus be partitioned into senses, which, in turn, requires a fully operational disambiguator. The method we propose circumvents this problem by automatically tagging the training set examples for W using other examples that do not contain W but do contain related words extracted from its dictionary definition. For instance, in the training set for suit, we would use, in addition to the contexts of suit, all the contexts of court and of clothes in the corpus, because court and clothes appear in the MRD entry of suit that defines its two senses. Note that, unlike the contexts of suit, which may discuss either court action or clothing, the contexts of court are not likely to be especially related to clothing, and, similarly, those of clothes will normally have little to do with lawsuits. We will use this observation to tag the original contexts of suit.</Paragraph>
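A minimal sketch of this bootstrapping step follows. It is our own toy illustration, not the authors' implementation: the miniature corpus, the seed words taken to stand for the MRD definitions, and the crude word-overlap score are all assumptions made for the example.

from collections import defaultdict

# A miniature corpus; in the paper the contexts come from a large text corpus.
corpus = [
    "the union 's lawyers are reviewing the suit",
    "he wore a dark suit and a striped tie",
    "the lawyers asked the court to dismiss the case",
    "she bought new clothes and a striped scarf",
]

# Words assumed to come from the MRD definitions of the two senses of "suit".
sense_seeds = {
    "court-action": {"court"},
    "clothing": {"clothes"},
}

def words(sentence):
    return set(sentence.split())

# Step 1: seed the training set with sentences that contain a definition word
# but not the polysemous word itself; label them with the corresponding sense.
seeded = defaultdict(list)
for sent in corpus:
    if "suit" in words(sent):
        continue
    for sense, seeds in sense_seeds.items():
        if seeds & words(sent):
            seeded[sense].append(words(sent))

# Step 2: tag each original context of "suit" with the sense whose seeded
# examples it overlaps most (a stand-in for the paper's similarity measure).
for sent in corpus:
    if "suit" not in words(sent):
        continue
    best = max(sense_seeds,
               key=lambda sense: max((len(words(sent) & ex) for ex in seeded[sense]),
                                     default=0))
    print(f"{sent!r} -> {best}")

A real system would use the learned similarity measure described below rather than raw word overlap, and richer features than bag-of-words tokens; the point here is only the bootstrapping pattern: the definition words supply sense labels that the contexts of suit alone cannot.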
<Paragraph position="4"> Another problem that affects the corpus-based WSD methods is the sparseness of data: these methods typically rely on the statistics of cooccurrences of words, while many of the possible cooccurrences are not observed even in a very large corpus (Church and Mercer, 1993). We address this problem in several ways. First, instead of tallying word statistics for the examples of each sense (which may be unreliable when the examples are few), we collect sentence-level statistics, representing each sentence by the set of features it contains. Second, we define a similarity measure on the feature space, which allows us to pool the statistics of similar features. Third, in addition to the examples of the polysemous word W in the corpus, we also learn from the examples of all the words in the dictionary definition of W. In our experiments, this resulted in a training set that could be up to 20 times larger than the set of original examples.</Paragraph>
<Paragraph position="5"> The rest of this paper is organized as follows. Section 1 describes the approach we have developed. In Section 2, we report the results of tests we have conducted on the Treebank-2 corpus. Section 3 describes related work. Proofs and other details of our scheme can be found in (Karov and Edelman, 1996).</Paragraph>
</Section></Paper>
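The sketch below illustrates the circular word/sentence similarity idea stated in the abstract and the sentence-level feature representation mentioned above. The averaged best-match update rule, the fixed number of iterations, and the toy sentences are illustrative assumptions of ours, not the formulas of Karov and Edelman.

from itertools import product

sentences = [
    ["lawyers", "reviewed", "the", "suit"],
    ["the", "court", "heard", "the", "suit"],
    ["he", "wore", "a", "dark", "suit"],
    ["she", "bought", "new", "clothes"],
]
vocab = sorted({w for s in sentences for w in s})

# Word similarity starts out as the identity: each word is similar only to itself.
wsim = {(u, v): 1.0 if u == v else 0.0 for u, v in product(vocab, repeat=2)}

def sent_sim(s1, s2):
    # A sentence is similar to another to the extent that each of its words
    # has a well-matching word in the other sentence.
    return sum(max(wsim[(u, v)] for v in s2) for u in s1) / len(s1)

for _ in range(3):  # a few rounds suffice for this toy example
    ssim = {(i, j): sent_sim(si, sj)
            for i, si in enumerate(sentences)
            for j, sj in enumerate(sentences)}
    new_wsim = {}
    for u, v in product(vocab, repeat=2):
        occ_u = [i for i, s in enumerate(sentences) if u in s]
        occ_v = [j for j, s in enumerate(sentences) if v in s]
        # Words are similar to the extent that the sentences they occur in are similar.
        new_wsim[(u, v)] = sum(max(ssim[(i, j)] for j in occ_v) for i in occ_u) / len(occ_u)
    wsim = new_wsim

print(round(wsim[("court", "lawyers")], 2),
      round(wsim[("court", "clothes")], 2))

With these toy sentences, court ends up markedly more similar to lawyers than to clothes after a few iterations, even though the two words never cooccur in a sentence; this kind of transitive similarity is what allows statistics to be pooled across very sparse data.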