<?xml version="1.0" standalone="yes"?> <Paper uid="C96-1005"> <Title>Word Sense Disambiguation using Conceptual Density</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Much recent work in lexical ambiguity resolution offers the prospect that a disambiguation system might be able to take unrestricted text as input and tag each word with its most likely sense with reasonable accuracy and efficiency. The most widespread approach uses the context of the word to be disambiguated, together with information about each of its senses, to solve this problem.</Paragraph> <Paragraph position="1"> Interesting experiments have been performed in recent years using preexisting lexical knowledge resources: [Cowie et al. 92], [Wilks et al. 93] with LDOCE, [Yarowsky 92] with Roget's International Thesaurus, and [Sussna 93], [Voorhees 93], [Richardson et al. 94], [Resnik 95] with WordNet.</Paragraph> <Paragraph position="2"> Although each of these techniques looks promising for disambiguation, they have either been applied only to a small number of words or a few sentences, or have not been applied to a public-domain corpus. For this reason we have tried to disambiguate all the nouns from real texts in the public-domain, sense-tagged version of the Brown corpus [Francis &amp; Kucera 67], [Miller et al. 93], also called Semantic Concordance, or SemCor for short¹.</Paragraph> <Paragraph position="3"> *Eneko Agirre was supported by a grant from the Basque Government. Part of this work is included in projects 141226-TA248/95 of the Basque Country University and PI95-054 of the Basque Government.</Paragraph> <Paragraph position="4"> **German Rigau was supported by a grant from the Ministerio de Educación y Ciencia.</Paragraph> <Paragraph position="5"> The words in SemCor are tagged with word senses from WordNet, a broad semantic taxonomy for English [Miller 90]². 
Thus, SemCor provides an appropriate environment for testing our procedures and comparing alternatives in a fully automatic way.</Paragraph> <Paragraph position="6"> The automatic decision procedure for lexical ambiguity resolution presented in this paper is based on an elaboration of the conceptual distance among concepts: Conceptual Density [Agirre &amp; Rigau 95].</Paragraph> <Paragraph position="7"> The system needs to know how words are clustered in semantic classes, and how semantic classes are hierarchically organised. For this purpose, we have used WordNet. Our system tries to resolve the lexical ambiguity of nouns by finding the combination of senses from a set of contiguous nouns that maximises the Conceptual Density among senses.</Paragraph> <Paragraph position="8"> The performance of the procedure was tested on four SemCor texts chosen at random. For comparison purposes, two other approaches, [Sussna 93] and [Yarowsky 92], were also tried. The results show that our algorithm performs better than both on the test set.</Paragraph> <Paragraph position="9"> Following this short introduction, section 2 presents the Conceptual Density formula. Section 3 sketches the main procedure for resolving the lexical ambiguity of nouns using Conceptual Density. Section 4 describes the experiments and their results in detail. Finally, sections 5 and 6 deal with further work and conclusions.</Paragraph> <Paragraph position="10"> ¹SemCor comprises approximately 250,000 words. The tagging was done manually, and the error rate measured by its authors is around 10% for polysemous words.</Paragraph> <Paragraph position="11"> ²The senses of a word are represented by synonym sets (or synsets), one for each word sense. The nominal part of WordNet can be viewed as a tangled hierarchy of hyponymy/hypernymy relations among synsets. Nominal relations also include three kinds of meronymic relations, which can be paraphrased as member-of, made-of and component-part-of. 
The version used in this work is WordNet 1.4. According to the authors, WordNet covers 96% of the senses of open-class words in SemCor.</Paragraph> </Section> </Paper>