<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2228">
  <Title>Word Sense Disambiguation using Optimised Combinations of Knowledge Sources</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> This paper describes a system that integrates a number of partial sources of information to perform word sense disambiguation (WSD) of content words in general text at a high level of accuracy.</Paragraph>
    <Paragraph position="1"> The methodology and evaluation of WSD are somewhat different from those of other NLP modules, and one can distinguish three aspects of this difference, all of which come down to evaluation problems, as does so much in NLP these days. First, researchers are divided between a general method (that attempts to apply WSD to all the content words of texts, the option taken in this paper) and one that is applied only to a small trial selection of text words (for example (Schütze, 1992) (Yarowsky, 1995)). The latter researchers have obtained very high levels of success, in excess of 95%, close to the figures for other &quot;solved&quot; NLP modules; the issue is whether these small-sample methods and techniques will transfer to general WSD over all content words.</Paragraph>
    <Paragraph position="2"> Others (e.g. (Mahesh et al., 1997) (Harley and Glennon, 1997)) have pursued the general option on the grounds that it is the real task and should be tackled directly, but with rather lower success rates. The division between the approaches probably comes down to no more than the availability of gold standard text in sufficient quantities, which is more costly to obtain for WSD than for other tasks.</Paragraph>
    <Paragraph position="3"> In this paper we describe a method we have used for obtaining more test material by transforming one resource into another, an advance we believe is unique and helpful in this impasse.</Paragraph>
    <Paragraph position="4"> However, there have also been deeper problems about evaluation, which have led sceptics like (Kilgarriff, 1993) to question the whole WSD enterprise, for example on the grounds that it is harder for subjects to assign one and only one sense to a word in context (and hence to produce the test material itself) than to perform other NLP related tasks. One of the present authors has discussed Kilgarriff's figures elsewhere (Wilks, 1997) and argued that they are not, in fact, as gloomy as he suggests. Again, this is probably an area where there is an &quot;expertise effect&quot;: some subjects can almost certainly make finer, more intersubjective, sense distinctions than others in a reliable way, just as lexicographers do.</Paragraph>
    <Paragraph position="5"> But there is another, quite different, source of unease about the evaluation base: everyone agrees that new senses appear in corpora that cannot be assigned to any existing dictionary sense, and this is an issue of novelty, not just one of the difficulty of discrimination. If that is the case, it tends to undermine the standard mark-up-model-and-test methodology of most recent NLP, since it will not then be possible to mark up sense assignment in advance against a dictionary if new senses are present. We shall not tackle this difficult issue further here, but press on towards experiment.</Paragraph>
  </Section>
</Paper>