File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/92/h92-1046_intro.xml
Size: 5,560 bytes
Last Modified: 2025-10-06 14:05:16
<?xml version="1.0" standalone="yes"?>
<Paper uid="H92-1046">
<Title>Lexical Disambiguation using Simulated Annealing</Title>
<Section position="2" start_page="0" end_page="238" type="intro">
<SectionTitle> 1. Introduction </SectionTitle>
<Paragraph position="0"> The problem of word-sense disambiguation is central to text processing. Recently, promising computational methods have been suggested [Lesk, 1987; McDonald et al., 1990; Wilks et al., 1990; Zernik and Jacobs, 1990; Guthrie et al., 1991; Hearst, 1991] which attempt to use the local context of the word to be disambiguated, together with information about each of its word senses, to solve this problem.</Paragraph>
<Paragraph position="1"> Lesk [1987] described a technique which measured the amount of overlap between a dictionary sense definition and the local context of the word to be disambiguated, and used it to successfully disambiguate the word "cone" in the phrases "pine cone" and "ice cream cone". Later researchers have extended this basic idea in various ways. Wilks et al. [1990] identified neighborhoods of the 2,187 control vocabulary words in Longman's Dictionary of Contemporary English (LDOCE) [Procter, 1978] based on the co-occurrence of words in LDOCE dictionary definitions.</Paragraph>
<Paragraph position="2"> These neighborhoods were then used to expand the word sense definitions of the word to be disambiguated, and the overlap between the expanded definitions and the local context was used to select the correct sense of a word. A similar method reported by Guthrie et al. [1991] defined subject-specific neighborhoods of words, using the subject area markings in the machine-readable version of LDOCE. Hearst [1991] suggests using syntactic information and part-of-speech tagging to aid in the disambiguation; she gathers co-occurrence information from manually sense-tagged text. Zernik and Jacobs [1990] also derive their neighborhoods from a training text which has been sense-tagged by hand. Their method incorporates other clues to the sense of the word in question, found in the morphology or by first tagging the text for part of speech.</Paragraph>
<Paragraph position="3"> Although each of these techniques looks somewhat promising for disambiguation, they have been applied to only a few words, and the results have been based on experiments which repeatedly disambiguate a single word (or, in [Zernik and Jacobs, 1990], one of three words) in a large number of sentences. In the cases where a success rate for the technique is reported, the results vary from 35% to 80%, depending on whether the correct dictionary sense is desired or some coarser-grained distinction is considered acceptable.</Paragraph>
<Paragraph position="4"> For even the most successful of these techniques, processing of text is limited by the amount of computation necessary to disambiguate each word in a sentence. A sentence which has ten words, several of which have multiple senses, can easily generate a million possible combinations of senses. The figure below illustrates the number of combinations of word senses in the example sentences used in our experiment described below.</Paragraph>
[Figure: "Total Senses", the number of word-sense combinations for each example sentence]
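To make the overlap idea and the size of the search space concrete, here is a minimal Python sketch. It is not code from the paper: the helper names, the toy sense inventory, and the per-word sense counts are all invented for illustration.

```python
# Simplified Lesk-style disambiguation: score each sense definition by
# its word overlap with the local context. Toy data, not LDOCE.
from math import prod

def tokens(text):
    """Lowercase word tokens of a definition or context string."""
    return set(text.lower().split())

def lesk_overlap(definition, context):
    """Count the words shared by a sense definition and the local context."""
    return len(tokens(definition) & tokens(context))

# Hypothetical sense inventory: word -> list of sense definitions.
senses = {
    "cone": ["a scaly fruit of a pine or fir tree",
             "a thin wafer for holding ice cream"],
}

context = "pine cone"
for word, definitions in senses.items():
    best = max(definitions, key=lambda d: lesk_overlap(d, context))
    print(word, "->", best)   # picks the pine-tree sense of "cone"

# The search space is the product of the per-word sense counts, so a
# ten-word sentence easily exceeds a million readings (counts invented).
sense_counts = [10, 1, 8, 6, 1, 9, 5, 1, 10, 7]
print(prod(sense_counts))     # 1512000 combinations
```

Scoring one word against its context is cheap; it is the product over all ambiguous words that makes exhaustive search over whole-sentence sense assignments infeasible.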
<Paragraph position="5"> Furthermore, if only one sense is computed at a time, as is the case in all of the numerically based work on disambiguation, the question arises of whether, and how, to incorporate the fact that a sense has already been chosen for one word when attempting to disambiguate the next. Should this first choice be changed in light of how other word senses are selected? These problems have not yet been addressed.</Paragraph>
<Paragraph position="6"> In contrast to the somewhat numerical techniques described above, more principled methods based on linguistic information, such as semantic preferences [Wilks, 1975a; 1975b; Wilks and Fass, 1991], have also been used for lexical disambiguation. These methods require extensive hand crafting of lexical items by specialists: assigning semantic categories to nouns, preferences to verbs and adjectives, etc. Maintaining consistency in these categories and preferences is a problem, and these methods are also susceptible to the combinatorial explosion described above.</Paragraph>
<Paragraph position="7"> In this paper we suggest the application of a computational method called simulated annealing to this general class of methods (including some of the numerical methods referenced above) to allow all senses to be determined at once in a computationally effective way. We describe the application of simulated annealing to a basic method similar to that of Lesk [1987] which does not make use of features such as part-of-speech tagging, subject area, or the use of morphology to determine part of speech. The simplicity of the technique makes it fully automatic: it requires no hand-tagging of text and no hand-crafting of neighborhoods. When this basic method operates under the guidance of the simulated annealing algorithm, sense selections are made concurrently for all ambiguous words in the sentence in a way designed to optimize their choice. The system's performance on a set of test sentences was encouraging, and can be expected to improve when some of the refinements mentioned above are incorporated.</Paragraph>
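As a rough illustration of how simulated annealing can select all senses at once, here is a minimal Python sketch. It is not the authors' implementation: the energy function (a negated Lesk-style redundancy over the chosen definitions), the toy inventory, and the schedule constants are all assumptions made for illustration.

```python
# Simulated annealing over whole-sentence sense assignments: perturb one
# word's sense at a time, accepting worse configurations with a
# probability that shrinks as the temperature falls. Toy data only.
import math
import random

senses = {  # hypothetical word -> list of sense definitions
    "pine": ["an evergreen tree with needles and cones",
             "to long for something sadly"],
    "cone": ["a scaly fruit of a pine or evergreen tree",
             "a thin wafer for holding ice cream"],
}
words = ["pine", "cone"]

def energy(choice):
    """Negated redundancy: words shared across the chosen definitions."""
    defs = [set(senses[w][c].lower().split()) for w, c in zip(words, choice)]
    shared = sum(len(defs[i] & defs[j])
                 for i in range(len(defs)) for j in range(i + 1, len(defs)))
    return -shared

def anneal(temp=1.0, cooling=0.9, trials=50, rounds=40):
    choice = [0] * len(words)        # start with the first sense of every word
    e = energy(choice)
    for _ in range(rounds):
        for _ in range(trials):
            i = random.randrange(len(words))            # perturb one word
            old = choice[i]
            choice[i] = random.randrange(len(senses[words[i]]))
            delta = energy(choice) - e
            # Always accept improvements; accept worse moves with
            # probability exp(-delta/temp).
            if delta <= 0 or random.random() < math.exp(-delta / temp):
                e += delta
            else:
                choice[i] = old                         # reject: undo the move
        temp *= cooling                                 # cool the schedule
    return choice, -e

print(anneal())   # likely ([0, 0], 2): the two tree senses share words
```

The key design choice is that worse configurations are sometimes accepted while the temperature is high, which lets the search escape local optima before it settles; this is what makes it feasible to optimize all the sense choices in a sentence jointly rather than one word at a time.

</Section>
</Paper>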