<?xml version="1.0" standalone="yes"?> <Paper uid="P04-1036"> <Title>Finding Predominant Word Senses in Untagged Text</Title> <Section position="7" start_page="0" end_page="0" type="concl"> <SectionTitle> 7 Conclusions </SectionTitle> <Paragraph position="0"> We have devised a method that uses raw corpus data to automatically find a predominant sense for nouns in WordNet. We use an automatically acquired thesaurus and a WordNet Similarity measure. The automatically acquired predominant senses were evaluated against the hand-tagged resources SemCor and the SENSEVAL-2 English all-words task giving us a WSD precision of 64% on an all-nouns task.</Paragraph> <Paragraph position="1"> This is just 5% lower than results using the first sense in the manually labelled SemCor, and we obtain 67% precision on polysemous nouns that are not in SemCor.</Paragraph> <Paragraph position="2"> In many cases the sense ranking provided in SemCor differs to that obtained automatically because we used the BNC to produce our thesaurus. Indeed, the merit of our technique is the very possibility of obtaining predominant senses from the data at hand. We have demonstrated the possibility of finding predominant senses in domain specific corpora on a sample of nouns. In the future, we will perform a large scale evaluation on domain specific corpora. In particular, we will use balanced and domain specific corpora to isolate words having very different neighbours, and therefore rankings, in the different corpora and to detect and target words for which there is a highly skewed sense distribution in these corpora.</Paragraph> <Paragraph position="3"> There is plenty of scope for further work. We want to investigate the effect of frequency and choice of distributional similarity measure (Weeds et al., 2004). Additionally, we need to determine whether senses which do not occur in a wide variety of grammatical contexts fare badly using distributional measures of similarity, and what can be done to combat this problem using relation specific thesauruses. null Whilst we have used WordNet as our sense inventory, it would be possible to use this method with another inventory given a measure of semantic relatedness between the neighbours and the senses. The lesk measure for example, can be used with definitions in any standard machine readable dictionary.</Paragraph> </Section> class="xml-element"></Paper>