<?xml version="1.0" standalone="yes"?> <Paper uid="J98-1003"> <Title>Topical Clustering of MRD Senses Based on Information Retrieval Techniques</Title> <Section position="5" start_page="82" end_page="138" type="metho"> <SectionTitle> 6. Other Approaches </SectionTitle> <Paragraph position="0"> Sanfilippo and Poznanski (1992) propose a so-called dictionary correlation kit (DCK) in a dialogue-based environment for correlating word senses across a pair of MRDs such as the LDOCE and the LLOCE. The approach taken in DCK is essentially a heuristic one, based on a correlation in the headwords, grammar codes, definition, and examples between the senses in LDOCE and LLOCE. The authors indicate that for the heuristics to yield optimum results, the degree of overlap in the examples should be weighted twice as heavily as all other factors. However, they do not elaborate on how the comparisons are done, or on how effective the program is.</Paragraph> <Paragraph position="1"> Dolan (1994) describes a heuristic approach to forming unlabeled clusters of closely related senses in an MRD. The clustering program relies on LDOCE domain code, grammar code, and 25 types of semantic relations extracted from definitions such as Hypernym, Location, Manner, Purpose, PartOf, and IngredientOf. Matching two senses Computational Linguistics Volume 24, Number 1 involves comparing any values that have been identified for each of the semantic relation types. The author reports that straightforwardly comparing the values of the same semantic relation types, particularly the Hypernym relation, for two senses would be quite effective. In addition to such a comparison, a number of &quot;scrambled&quot; comparisons between values of different types of semantic relations are also helpful. 
For instance, in comparing the two senses of coffee, the value &quot;drink&quot; in the sense, &quot;the coffee as a drink&quot; is compared with that of the IngredientOf relation in another sense, &quot;the powder as an ingredient of the drink.&quot; Yarowsky (1992) describes a WSD method and an implementation based on Roget's Thesaurus and a very large corpus, the 10-million-word Grolier's Encyclopedia. He suggests that the method can be applied to disambiguation and merging of MRD definitions as well, and gives the results of applying the method to the senses of the word crane for the COBUILD and Collins dictionaries using Roget's categories as an example.</Paragraph> <Paragraph position="2"> It is not known how the method fares for words other than crane. In contrast to our approach, the method requires substantial data for training.</Paragraph> <Paragraph position="3"> In most of the above-mentioned works, experimental results are reported only for some senses of a few words. In this study, we have evaluated our method using all senses for 20 words that have been studied in the WSD literature. This evaluation provides an overall picture of the expected success rate of the method when applied to all word senses in the MRD. Direct comparison of methods is often difficult, but it is clear that, as compared to other methods discussed above, our algorithm is very simple, requires minimal preprocessing, and does not rely on information idiosyncratic to the MRD, such as the LDOCE subject code or grammar code. Thus, the algorithm described in this paper can be readily applied to other MRDs besides LDOCE. Although our algorithm makes use of defining words in various semantic relations with the sense, those relations need not be explicitly computed through an elaborate parsing and extraction process.</Paragraph> <Paragraph position="4"> Finally, it is interesting to compare our method with some aspects of the program for induction of sense division of Schütze (1992).
As mentioned in the introduction, the program uses distributional similarity of lexical co-occurrence to partition word instances into clusters that are likely to be related to sense division. Drawing on the work of latent semantic indexing in IR research, words and contexts are represented as vectors in a multidimensional space. Regression techniques of singular value decomposition are used to reduce the representation to a lower dimensional space. After that, sense division is derived through unsupervised clustering of these word instances. Our method, on the other hand, relies primarily on co-occurrence in an existing set of topical clusters, the topics in LLOCE or Roget's. The sense in question is simply merged to the nearest topical cluster. Low-cost distance calculation is done according to the overlap between words in a definition and a topical cluster.</Paragraph> <Paragraph position="5"> 7. Conclusions and Future Work This paper presents the issues of WSD using machine-readable dictionaries. It describes simple but effective algorithms for disambiguating and clustering dictionary senses to create a sense division for WSD. The proposed algorithms are effective for specific linguistic reasons. Although word sense is an abstract concept that relies on the subjective and subtle distinction of many factors, coarse word sense division can be attributed primarily to the subject and topic. This is evident from the observation that very topical genus and differentiae show up in dictionary definitions in rather rigid patterns. Therefore, an MRD coupled with a thesaurus organized according to subjects and topics is very effective for acquisition of sense division for WSD.</Paragraph> <Paragraph position="6"> Chen and Chang Topical Clustering In a broader context, this paper presents an approach to automatic construction of semantic lexicons through integration of lexicographic resources such as MRDs and thesauri. 
As noted in Dolan (1994), it is possible to run a sense-clustering algorithm on several MRDs to build an integrated lexical database with more complete coverage of word senses. If TopSense is run on several bilingual MRDs, there is a potential for creating an integrated multilingual lexicon enriched with thesaurus concepts as language-neutral signs to support knowledge-based machine translation. A similar idea has been put forward by Okumura and Hovy (1994).</Paragraph> <Paragraph position="7"> The TopSense algorithm's performance could definitely be improved by handling deictic, metonymic, and metaphoric sense definitions more appropriately. Nevertheless, the algorithm already produces clustered MRD sense entries that not only are exploitable as a workable sense division but also are likely to be an effective knowledge source for many NLP tasks related to semantic processing, such as WSD. In summary, this paper presents a functional core for automatic construction of the semantic lexicon.</Paragraph> <Paragraph position="8"> Appendix The following table shows the experimental results of running TopSense on the LDOCE senses in a test set of 20 highly polysemous words.</Paragraph> <Paragraph position="9"> bass * the fruit of a PINE or FIR, consisting of several partly separate seed-containing pieces laid over each other, shaped rather like this.</Paragraph> <Paragraph position="10"> * a hollow or solid object shaped like this.</Paragraph> <Paragraph position="11"> * a solid object with a round base and a point at the top.</Paragraph> <Paragraph position="12"> moving heavy objects by means of a very strong rope or wire fastened to a movable arm (JIB).</Paragraph> <Paragraph position="13"> * a type of large tall bird with very long legs and neck, which spends much time walking in water catching fish in its very long beak.</Paragraph> <Paragraph position="14"> (communicating) printer to hold the letters (TYPE) which have been arranged for the first stage of 
printing.</Paragraph> <Paragraph position="15"> * =GALLEY PROOF. * Gd (communicating) * Mf (shipping) * a ship which was rowed along by slaves.</Paragraph> <Paragraph position="16"> * Mf (shipping) * a ship's kitchen.</Paragraph> <Paragraph position="17"> interest Topical Clustering Definition Sentences Applicability Precision * *Bj (medicine) * a readiness to give attention. 100% 67% * *Gd * an activity, subject, etc., which (communicating) one gives time and attention to.</Paragraph> <Paragraph position="18"> * Je (banking) * advantage, advancement, or favour (esp. in the phrs. in the interest of (something)/in someone's interest).</Paragraph> <Paragraph position="19"> * money paid for the use of money.</Paragraph> <Paragraph position="20"> * a share (in a company, business, etc.</Paragraph> <Paragraph position="21"> * a quality of causing attention to be given.</Paragraph> <Paragraph position="22"> given out.</Paragraph> <Paragraph position="23"> * an important point.</Paragraph> <Paragraph position="24"> * old use and law children (esp.</Paragraph> <Paragraph position="25"> in the phr. die without issue).</Paragraph> <Paragraph position="26"> * the act of bringing out something in a new form.</Paragraph> <Paragraph position="27"> * a type of small insect-eating animal with very small eyes and soft dark fur, which digs holes and passage underground and makes its home in them.</Paragraph> <Paragraph position="28"> * a stone wall of great strength built out into the sea from the land as a defense against the force of the waves, or to act as a road.</Paragraph> <Paragraph position="29"> plant Topical Clustering Definition Sentences Applicability Precision * Ai (plants) * a living thing that has leaves and 100% 83% roots, and grows usu. 
in earth, esp.</Paragraph> <Paragraph position="30"> the kind smaller than trees.</Paragraph> <Paragraph position="31"> of people thought to be criminals in order to discover facts about them.</Paragraph> <Paragraph position="32"> * a thing, esp. stolen goods, hidden on a person so that he will seem guilty.</Paragraph> <Paragraph position="33"> something is or stands, esp. in relation to other objects, places, etc.</Paragraph> <Paragraph position="34"> * the place where someone or something is (in the phr. in position).</Paragraph> <Paragraph position="35"> * the place where someone or something is supposed to be; the proper place.</Paragraph> <Paragraph position="36"> the place of advantage in a struggle (in the phrs. manoeuvre/ jockey for position).</Paragraph> <Paragraph position="37"> the way or manner in which someone or something is placed or moves, stands, sits, etc.</Paragraph> <Paragraph position="38"> * a condition or state, esp. in relation to that of someone or something else.</Paragraph> <Paragraph position="39"> (communicating) statement, command, EXCLAMATION, or question, usu. 
contains a subject and a verb, and (in writing) begins with a capital letter and ends with one of the marks &quot;.!?&quot; to the SNAIL but with no shell, that often do damage to gardens.</Paragraph> <Paragraph position="40"> * Gd * a machine-made piece of metal (communicating) with a row of letters along the edge for printing.</Paragraph> <Paragraph position="41"> * Hc (specific * a machine-made piece of metal substances) with a row of letters along the edge for printing.</Paragraph> <Paragraph position="42"> * Hd (equipment) * a coin-shaped object unlawfully put into a machine in place of a coin.</Paragraph> <Paragraph position="43"> in length, width, or depth and regarded as not filled up; distance, area, or VOLUME (3); room.</Paragraph> <Paragraph position="44"> * a quantity or bit of this for a particular purpose.</Paragraph> <Paragraph position="45"> * that which surrounds all objects and continues outward in all directions.</Paragraph> <Paragraph position="46"> * what is outside the earth's air; where other heavenly bodies move.</Paragraph> <Paragraph position="47"> * land not built on (esp. in the phr. 
open space).</Paragraph> <Paragraph position="48"> * a period of time.</Paragraph> <Paragraph position="49"> * an area or distance left between written or printed words, lines etc.</Paragraph> <Paragraph position="50"> * the width of a letter on a * Dg (personal * a piece of metal in this shape for 100% 90% belongings) wearing as a mark of office, rank, honour, etc.</Paragraph> <Paragraph position="51"> * a 5- or more-pointed figure.</Paragraph> <Paragraph position="52"> * a famous or very skillful performer.</Paragraph> <Paragraph position="53"> * STARS.</Paragraph> <Paragraph position="54"> * a brightly-burning heavenly body of great size, such as the sun but esp.</Paragraph> <Paragraph position="55"> one very far away.</Paragraph> <Paragraph position="56"> * any heavenly body (such as a PLANET) that appears as a bright point in the sky.</Paragraph> <Paragraph position="57"> * a heavenly body regarded as determining one's fate.</Paragraph> <Paragraph position="58"> * a sign used with numbers from usu.</Paragraph> <Paragraph position="59"> 1 to 5 in various systems, and in the imagination, to judge quality.</Paragraph> <Paragraph position="60"> match, usu. including a short coat (JACKET) with trousers or skirt.</Paragraph> <Paragraph position="61"> * a garment or set of garments for a special purpose.</Paragraph> <Paragraph position="62"> * a set (of armour) (in the phrs.</Paragraph> <Paragraph position="63"> suit of armour/mail).</Paragraph> <Paragraph position="64"> * one of the 4 sets of cards used in games.</Paragraph> <Paragraph position="65"> woman to marry (esp. in the phrs. 
plead/press one's suit).</Paragraph> <Paragraph position="66"> supported by one or more upright legs.</Paragraph> <Paragraph position="67"> * made to be placed and used on such a piece of furniture.</Paragraph> <Paragraph position="68"> * such a piece of furniture specially made for the playing of various games.</Paragraph> <Paragraph position="69"> * the food served at a meal.</Paragraph> <Paragraph position="70"> * the people sitting at a table.</Paragraph> <Paragraph position="71"> * a printed or written collection of figures, facts, or information arranged in orderly rows across and down the page.</Paragraph> <Paragraph position="72"> * also multiplication table a list which young children repeat to learn what number results when a number from 1 to 12 is multiplied by any of the numbers from 1 to 12.</Paragraph> <Paragraph position="73"> or animal knows one food from another by its sweet, bitter, salty, etc.</Paragraph> <Paragraph position="74"> * the sensation that is produced when food or drink is put in the mouth and that makes it different from other foods or drinks by its salty, sweet, bitter, etc.</Paragraph> <Paragraph position="75"> * the ability to enjoy and judge beauty, style, art, music, etc.; ability to choose and use the best manners, behaviour, fashions, etc.</Paragraph> </Section> class="xml-element"></Paper>