<?xml version="1.0" standalone="yes"?> <Paper uid="H94-1047"> <Title>A New Approach to Word Sense Disambiguation</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1. INTRODUCTION </SectionTitle> <Paragraph position="0"> Assigning sense tags to the words in a text can be viewed as a classification problem. A probabilistic classifier assigns to each word the tag that has the highest estimated probability of having occurred in the given context. Designing a probabilistic classifier for word-sense disambiguation includes two main sub-tasks: specifying an appropriate model and estimating the parameters of that model. The former involves selecting informative contextual features (such as collocations) and describing the joint distribution of the values of these features and the sense tags of the word to be classified. The parameters of a model are the characteristics of the entire population that are considered in the model. Practical applications require the use of estimates of the parameters.</Paragraph> <Paragraph position="1"> Such estimates are based on functions of a data sample (i.e., statistics) rather than the complete population. To make the estimation of parameters feasible, a model with a simplified form is created by limiting the number of contextual features considered and by expressing the joint distribution of features and sense tags in terms of only the most important systematic interactions among variables.</Paragraph> <Paragraph position="2"> To date, much of the work in statistical NLP has focused on parameter estimation (\[11\], \[13\], \[12\], \[4\]). Of the research directed toward identifying the optimum form of model, most has been concerned with the selection of individually informative features (\[2\], \[5\]), with relatively little attention directed toward the identification of an optimum approximation to the joint distribution of the values of the contextual features and object classes. 
Most previous efforts to formulate a probabilistic classifier for word-sense disambiguation did not attempt to systematically identify the interdependencies among contextual features that can be used to classify the meaning of an ambiguous word. Many researchers have performed disambiguation on the basis of only a single feature (\[6\], \[15\], \[2\]), while others who do consider multiple contextual features assume that all contextual features are either conditionally independent given the sense of the word (\[8\], \[14\]) or fully independent (\[10\], \[16\]).</Paragraph> <Paragraph position="3"> In earlier work, we describe a method for identifying an appropriate model for use in disambiguating a word given a set of contextual features. We chose a particular set of contextual features and, using this method, identified a model incorporating these features for use in disambiguating the noun interest. These features, which are assigned automatically, are of three types: morphological, collocation-specific, and class-based, with part-of-speech (POS) categories serving as the word classes (see \[3\] for how the features were chosen).</Paragraph> <Paragraph position="4"> The results of using the model to disambiguate the noun interest were encouraging. We suspect that the model provides a description of the distribution of sense tags and contextual features that is applicable to a wide range of content words. This paper provides suggestive evidence supporting this by testing its applicability to the disambiguation of several words. Specifically, for each word to be disambiguated, we created a model according to a schema, where that schema is a generalization of the model created for interest. 
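The conditional-independence assumption discussed above, combined with maximum likelihood (relative-frequency) parameter estimates, amounts to a naive-Bayes-style sense classifier. The following is a minimal illustrative sketch of that general technique, not the paper's actual model or feature set; the class name, data layout, and toy examples are hypothetical, and no smoothing is applied.

```python
from collections import Counter, defaultdict

class NaiveSenseClassifier:
    """Illustrative word-sense classifier: assumes contextual features are
    conditionally independent given the sense tag, with maximum likelihood
    (relative-frequency) parameter estimates. Hypothetical sketch only."""

    def fit(self, tagged_contexts):
        # tagged_contexts: iterable of (sense_tag, feature_list) pairs
        self.sense_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        for sense, features in tagged_contexts:
            self.sense_counts[sense] += 1
            for f in features:
                self.feature_counts[sense][f] += 1
        self.total = sum(self.sense_counts.values())
        return self

    def classify(self, features):
        # Pick the sense with the highest estimated joint probability:
        # P(sense) * product of P(feature | sense), each a relative frequency.
        def score(sense):
            n = self.sense_counts[sense]
            p = n / self.total
            for f in features:
                p *= self.feature_counts[sense][f] / n
            return p
        return max(self.sense_counts, key=score)
```

For example, trained on a few toy contexts for the noun interest, the classifier tags a new context with whichever sense scores highest under the independence assumption.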
We evaluate the performance of probabilistic word-sense classifiers that utilize maximum likelihood estimates for the parameters of models created for the following lexical items: the noun senses of bill and concern, the verb senses of close and help, and the adjective senses of common. We also identify upper and lower bounds for the performance of any probabilistic classifier utilizing the same set of contextual features, and compare, for each word, (1) the performance of a classifier using a model created according to the schema for that word with (2) the performance of a classifier using the model selected, per the procedure described in section 2, as the best model for that word given the same set of contextual features.</Paragraph> <Paragraph position="5"> Section 2 of this paper describes the method used for selecting the form of a probabilistic model given sense tags and a set of contextual features. In section 3, the model schema is presented and, in section 4, the experiments using models created according to the schema are described. Section 5 discusses the results of the experiments and section 6 discusses future work.</Paragraph> </Section> </Paper>