File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/92/c92-1056_metho.xml
Size: 10,568 bytes
Last Modified: 2025-10-06 14:12:56
<?xml version="1.0" standalone="yes"?> <Paper uid="C92-1056"> <Title>Lexical Disambiguation using Simulated Annealing</Title> <Section position="4" start_page="79968" end_page="79968" type="metho"> <SectionTitle> 3. Word-Sense Disambiguation </SectionTitle> <Paragraph position="0"> Given a sentence with N words, we may represent the senses of the ith word as $s_{i1}, s_{i2}, \ldots, s_{ik_i}$, where $k_i$ is the number of senses of the ith word that appear in LDOCE. A configuration of the system is obtained by choosing a sense for each word in the sentence. Our goal is to choose the configuration that a human disambiguator would choose. To that end, we must define a function E whose minimum we may reasonably expect to correspond to the correct choice of word senses.</Paragraph> <Paragraph position="1"> The value of E for a given configuration is calculated from the definitions of the N senses that make it up. All words in these definitions are stemmed, and the results are stored in a list. If a subject code is given for a sense, the code is treated as a stemmed word. The redundancy R is computed by giving each stemmed word form that appears n times a score of $n-1$ and adding up the scores. Finally, E is defined as $E = \frac{1}{1+R}$. [PROC. OF COLING-92, NANTES, AUG. 23-28, 1992, p. 361] The rationale behind this choice of E is that word senses which belong together in a sentence will have more words and subject codes in common in their definitions (larger values of R) than senses which do not belong together. Minimizing E will therefore maximize R and determine our choice of word senses. The starting configuration C is the one in which sense number one of each word is chosen. Since the senses in LDOCE are generally listed with the most frequently used sense first, this is a likely starting point. The value of E is computed for this configuration. The next step is to choose at random a word number i and a sense $s_{ij}$ of that ith word.
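A minimal Python sketch of the redundancy R and the energy $E = 1/(1+R)$ defined above follows; it is an illustration, not the authors' code. It assumes each definition has already been stemmed and stripped of stop words, with any subject code included as an ordinary token, and the function names are hypothetical. As sample data it uses the stemmed word lists printed for the fish, river, bank and struggle senses of Example Sentence 1: across those four lists, water, river, bank and lake each occur twice, so R = 4 and E = 0.2.

```python
from collections import Counter


def redundancy(definitions: list[list[str]]) -> int:
    """R: every stemmed word (or subject code) that occurs n times in the
    pooled definition words of a configuration contributes n - 1."""
    counts = Counter(word for definition in definitions for word in definition)
    return sum(n - 1 for n in counts.values())


def energy(definitions: list[list[str]]) -> float:
    """E = 1 / (1 + R): the more words the definitions share, the lower E."""
    return 1.0 / (1 + redundancy(definitions))


# Stemmed definition words listed for Example Sentence 1 (fish, river, bank, struggle).
example = [
    "fish creature whose blood change temperature according around live water use its FIN tail swim".split(),
    "river wide nature stream water flow between bank lake another sea".split(),
    "bank land along side river lake".split(),
    "struggle violent move fight against thing".split(),
]

print(redundancy(example))  # 4  (water, river, bank, lake each occur twice)
print(energy(example))      # 0.2
```

Because E only decreases as R grows, comparing two configurations by E is the same as comparing them by R; the reciprocal form simply turns the search into a minimization.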
The configuration C' is constructed by replacing the old sense of the ith word by the sense $s_{ij}$. Let $\Delta E$ be the change from E to the value computed for C'. If $\Delta E$ is negative, then C' replaces C, and we make a new random change in C'.</Paragraph> <Paragraph position="2"> If $\Delta E$ is positive, we change to C' with probability $P = e^{-\Delta E / T}$,</Paragraph> <Paragraph position="4"> where T is a parameter (the temperature) whose initial value is 1, and the decision of whether or not to adopt C' is made by calling a random number generator. If the number generated is less than P, C is replaced by C'. Otherwise, C is retained.</Paragraph> <Paragraph position="5"> This process of generating new configurations and deciding whether or not to adopt them is repeated on the order of 1000 times; T is then replaced by 0.9T, and the loop is entered again. Once the loop is executed with no change in the configuration, the routine ends, and this final configuration determines which word senses are selected.</Paragraph> </Section> <Section position="5" start_page="79968" end_page="79968" type="metho"> <SectionTitle> 4. Experiments </SectionTitle> <Paragraph position="0"> To evaluate a method of word-sense disambiguation, it is necessary either to check the results by hand or to have text that has already been disambiguated by hand to use as test data. Since there is no general agreement on word senses, each system must have its own test data. Thus, even though the algorithm we have described is automatic and covers the 28,000 words in LDOCE, the evaluation requires the tedious hand work that the system is meant to ease or eliminate.</Paragraph> <Paragraph position="1"> In our first experiment, the algorithm described above was used to disambiguate 50 example sentences from LDOCE. Words on a stop list of very common words, such as &quot;the&quot;, &quot;as&quot;, and &quot;of&quot;, were removed from each sentence. The sentences then contained from two to fifteen words, with an average of 5.5 ambiguous words per sentence.
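With several ambiguous words per sentence, the number of configurations, $\prod_i k_i$, grows quickly, which is why the search described in Section 3 uses simulated annealing rather than enumeration. A minimal Python sketch of that annealing loop follows, together with a brute-force check that is feasible only for tiny inputs. It is not the authors' implementation and rests on stated assumptions: an uphill move is accepted with the standard Metropolis probability $P = e^{-\Delta E/T}$, a move with $\Delta E = 0$ (a case the text does not cover) is not counted as a change so that the stopping rule can fire, and the function names and toy sense lists are hypothetical.

```python
import itertools
import math
import random
from collections import Counter


def energy(definitions):
    """E = 1 / (1 + R), where a word occurring n times adds n - 1 to R."""
    counts = Counter(w for d in definitions for w in d)
    return 1.0 / (1 + sum(n - 1 for n in counts.values()))


def config_energy(sense_defs, config):
    """Energy of a configuration: config[i] is the chosen sense index of word i."""
    return energy([sense_defs[i][s] for i, s in enumerate(config)])


def anneal(sense_defs, trials=1000, t0=1.0, cooling=0.9, seed=None):
    """Simulated annealing over sense configurations; returns (config, E)."""
    rng = random.Random(seed)
    config = [0] * len(sense_defs)  # start with sense number one of every word
    current = config_energy(sense_defs, config)
    t = t0
    while True:
        changed = False
        for _ in range(trials):
            i = rng.randrange(len(sense_defs))     # random word
            j = rng.randrange(len(sense_defs[i]))  # random sense of that word
            if j == config[i]:
                continue
            candidate = config[:i] + [j] + config[i + 1:]
            new = config_energy(sense_defs, candidate)
            delta = new - current
            if delta == 0:
                continue  # assumption: ties do not count as a change
            if delta > 0 and rng.random() >= math.exp(-delta / t):
                continue  # uphill move rejected (Metropolis rule, assumed)
            config, current, changed = candidate, new, True
        if not changed:  # a whole loop without any change: stop
            return config, current
        t *= cooling  # T is replaced by 0.9T


def exhaustive_minimum(sense_defs):
    """Brute-force minimum over all prod(k_i) configurations (small inputs only)."""
    ranges = [range(len(senses)) for senses in sense_defs]
    best = min(itertools.product(*ranges), key=lambda c: config_energy(sense_defs, c))
    return list(best), config_energy(sense_defs, best)


# Hypothetical toy input: word 0 has one sense, word 1 has two.
senses = [
    [["wide", "stream", "water", "flow", "bank", "lake"]],
    [["money", "keep", "business"], ["land", "side", "river", "lake"]],
]
print(anneal(senses, seed=0))              # ([0, 1], 0.5)
print(exhaustive_minimum(senses))          # ([0, 1], 0.5)
print(math.prod(len(s) for s in senses))   # 2
```

On the toy input, annealing and enumeration agree on configuration [0, 1] with E = 0.5. With, say, ten words of ten senses each, enumeration would already face $10^{10}$ configurations, while each annealing step only re-evaluates one candidate.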
Definitions in LDOCE are broken down first into broad senses, which we call &quot;homographs&quot;, and then into individual senses, which distinguish among the various meanings. For example, one homograph of &quot;bank&quot; means roughly &quot;something piled up.&quot; There are five senses in this homograph, which distinguish whether the thing piled up is snow, clouds, earth by a river, etc.</Paragraph> <Paragraph position="2"> Results of the algorithm were evaluated by having a literate human disambiguate the sentences and comparing these choices of word senses with the output of the program.</Paragraph> <Paragraph position="3"> Using the human choices as the standard, the algorithm correctly disambiguated 47% of the words at the sense level and 72% at the homograph level.</Paragraph> <Paragraph position="4"> More recently, we have developed a software tool to improve the process of manually disambiguating test sentences.</Paragraph> <Paragraph position="5"> Slight modifications to the software allow it to be used in conjunction with the algorithm as a computer-aided disambiguation system.</Paragraph> <Paragraph position="6"> The software displays the text to be disambiguated in a window, and when the user chooses a word, all its definitions are displayed in another window. The user then selects the appropriate sense, and this selection is added to a file corresponding to the original text. This file is called the key, and the results of the algorithm are scored against it.</Paragraph> <Paragraph position="7"> Using this tool, 17 sentences from the Wall Street Journal were disambiguated by hand relative to LDOCE. The same stop list of common words was used as in the first experiment.
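Scoring the program's output against a key, at both the sense level and the homograph level reported for the first experiment, can be sketched in Python as follows. The representation is an assumption for illustration, since the key file format is not described here: each scored word position maps to a (homograph, sense) pair, mirroring the "hw ... sense ..." numbering in the example output, and the function name and data are hypothetical.

```python
def score(predicted, key):
    """Return (sense-level, homograph-level) accuracy of predicted against key.

    Both arguments map a word position to a (homograph, sense) pair; only the
    positions present in the key are scored.
    """
    if not key:
        raise ValueError("empty key")
    # Sense level: homograph and sense must both match the key.
    sense_hits = sum(predicted.get(pos) == choice for pos, choice in key.items())
    # Homograph level: only the homograph number has to match.
    homograph_hits = sum(
        pos in predicted and predicted[pos][0] == choice[0]
        for pos, choice in key.items()
    )
    return sense_hits / len(key), homograph_hits / len(key)


# Hypothetical key and program output for a four-word sentence.
key = {0: (1, 1), 1: (0, 1), 2: (1, 1), 3: (1, 0)}
predicted = {0: (1, 1), 1: (0, 2), 2: (1, 3), 3: (2, 1)}
print(score(predicted, key))  # (0.25, 0.75)
```

Every sense-level hit is also a homograph-level hit, so the second figure can never be lower than the first, which is consistent with the 47% and 72% reported above.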
The algorithm was used to disambiguate the 17 sentences, and the results were automatically scored against the key.</Paragraph> <Paragraph position="8"> Results for the Wall Street Journal sentences were similar to those for the first experiment. One difficulty with the present algorithm is that long definitions tend to be given preference over shorter ones, since a longer definition has more words that can be shared with other definitions and so tends to add more to R. Words defined succinctly by a synonym are greatly penalized. To improve performance, the function E must be made to model the problem better. On the other hand, the simulated annealing itself seems to do very well at finding the minimum: in those cases where the configuration selected is not the correct disambiguation of the sentence, the correct disambiguation never had a lower value of E than the configuration selected.</Paragraph> <Paragraph position="9"> Experiments in which we varied the initial temperature and the rate of cooling did not change the configuration ultimately selected, which suggests that the results are not very sensitive to those parameters.</Paragraph> <Paragraph position="10"> Direct comparisons of these success rates with those of other methods are difficult. Veronis and Ide [1990] propose a large-scale method, but results are reported for only one sentence, and no success rate is given.</Paragraph> <Paragraph position="11"> None of the other methods was used to disambiguate every ambiguous word in a sentence; they were applied to one, or at most a few, highly ambiguous words. It appears that, in some cases, the fact that our success rates include not only highly ambiguous words but also some words with only a few senses is offset by the fact that other researchers have used a broader definition of word sense.
For example, the four senses of &quot;interest&quot; used by Zernik and Jacobs [1990] may correspond more closely to our two homographs than to our ten senses of &quot;interest.&quot; Their success rate in tagging the three words &quot;interest&quot;, &quot;stock&quot;, and &quot;bond&quot; was 70%. Thus it appears that the method we propose is comparable in effectiveness to the other computational methods of word-sense disambiguation, and has the advantages of being automatically applicable to all 28,000 words in LDOCE and of being computationally practical.</Paragraph> <Paragraph position="12"> Below we give two examples of the results of the technique. The words following the arrow are the stemmed words selected from the definitions and used to calculate the redundancy. The headword and sense numbers are those used in the machine-readable version of LDOCE.</Paragraph> </Section> <Section position="6" start_page="79968" end_page="79968" type="metho"> <SectionTitle> EXAMPLE SENTENCE 1 </SectionTitle> <Paragraph position="0"> The fish floundered on the river bank, struggling to breathe</Paragraph> </Section> <Section position="7" start_page="79968" end_page="79968" type="metho"> <SectionTitle> DISAMBIGUATION </SectionTitle> <Paragraph position="0">
1) fish hw 1 sense 1 : DEF -> fish creature whose blood change temperature according around live water use its FIN tail swim
2) river hw 0 sense 1 : DEF -> river wide nature stream water flow between bank lake another sea
3) bank hw 1 sense 1 : DEF -> bank land along side river lake
4) struggle hw 1 sense 0 : DEF -> struggle violent move fight against thing
Finally, we show two graphs which illustrate the convergence of the simulated annealing technique to the minimum energy (E) level. The second graph is a close-up of the final cycles of the complete process shown in the first graph.</Paragraph> </Section> <Section position="8" start_page="79968" end_page="79968" type="metho"> <SectionTitle> 5.
Conclusion </SectionTitle> <Paragraph position="0"> This paper describes a method for word-sense disambiguation based on the simple technique of choosing senses of the words in a sentence so that their definitions in LDOCE have the most words and subject codes in common. The amount of computation necessary to find this optimal choice exactly quickly becomes prohibitive as the number of ambiguous words and the number of senses increase, since the number of possible configurations is the product of the numbers of senses of the words. The computational technique of simulated annealing allows a good approximation to be computed quickly.</Paragraph> <Paragraph position="1"> Thus all the words in a sentence are disambiguated simultaneously, in a reasonable time, and automatically (with no hand disambiguation of training text). Results using this technique are comparable to those of other computational techniques, and enhancements incorporating co-occurrence and part-of-speech information, which have been exploited in one-word-at-a-time techniques, may be expected to improve performance.</Paragraph> </Section> </Paper>