<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0816">
  <Title>Anselmo Peñas Dpto. Lenguajes y Sistemas Informáticos UNED, Spain anselmo@lsi.uned.es</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Word Sense Disambiguation (WSD) is the task of deciding the appropriate sense for a particular use of a polysemous word, given its textual or discursive context. A nontrivial preliminary step is to determine the inventory of meanings potentially attributable to that word. For this reason, WSD in Senseval is reformulated as a classification problem in which a dictionary serves as the class inventory. The disambiguation process then consists of assigning one or more of these classes to the ambiguous word in the given context. The Senseval evaluation forum provides a controlled framework in which different WSD systems can be tested and compared.</Paragraph>
    <Paragraph position="1"> Corpus-based methods have offered encouraging results in recent years. Methods of this kind use statistics from a training corpus, together with Machine Learning (ML) algorithms, to produce a classifier. Learning algorithms can be divided into two main categories: Supervised (where the correct answer for each training example is provided) and Unsupervised (where the training data is given without any indication of the answer). Senseval-3 runs tests in several languages, for which two main tasks are proposed: an all-words task and a lexical sample task. Participants are given a training corpus, a set of test examples and a sense inventory for each language. The training corpora are available in a labelled and an unlabelled format; the former is mainly for supervised systems and the latter mainly for unsupervised ones.</Paragraph>
    <Paragraph position="2"> Several supervised ML algorithms have been applied to WSD (Ide and Véronis, 1998; Escudero et al., 2000): Decision Lists, Neural Networks, Bayesian classifiers, Boosting, Exemplar-based learning, etc. We report here on the exemplar-based approach developed by UNED and tested at the Senseval-3 competition in the lexical sample tasks for English, Spanish, Catalan and Italian.</Paragraph>
    <Paragraph position="3"> After this brief introduction, Sections 2 and 3 are devoted, respectively, to the training data and the processing performed over these data.</Paragraph>
    <Paragraph position="4"> Section 4 characterizes the UNED WSD system.</Paragraph>
    <Paragraph position="5"> First, we describe the general approach, based on the representation of words, lemmas and senses in a Context Space. Then, we show how results are improved by applying standard similarity measures, such as cosine, in this Context Space. Once the representation framework is established, we define the criteria underlying the final similarity measure used at Senseval-3 and compare it with the previous similarity measures.</Paragraph>
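To make the idea of a standard similarity measure in a context space concrete, the following is a minimal sketch of exemplar-based disambiguation with cosine similarity. It is an illustration under stated assumptions, not the UNED system: the bag-of-words representation, the function names (context_vector, cosine, disambiguate), the sense labels and the toy examples are all hypothetical, and the final similarity measure used at Senseval-3 is not reproduced here.

```python
import math
from collections import Counter


def context_vector(tokens):
    """Bag-of-words vector: maps each context token to its count (an assumed representation)."""
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def disambiguate(test_tokens, labelled_examples):
    """Return the sense of the labelled example most similar to the test context."""
    test_vec = context_vector(test_tokens)
    best_sense, best_score = None, -1.0
    for sense, tokens in labelled_examples:
        score = cosine(test_vec, context_vector(tokens))
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hypothetical toy training data for the ambiguous word "bank".
examples = [
    ("bank/finance", ["deposit", "money", "account", "loan"]),
    ("bank/river", ["river", "water", "shore", "fishing"]),
]
print(disambiguate(["she", "opened", "an", "account", "to", "deposit", "money"], examples))
# prints "bank/finance"
```

In this sketch each labelled training example acts as an exemplar, and the test context receives the sense of its nearest exemplar. Ties and empty contexts fall back to the first example, a simplification chosen only to keep the sketch short.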
    <Paragraph position="6"> Section 5 reports the official results obtained at the Senseval-3 Lexical Sample tasks for English, Spanish, Italian and Catalan. Finally, we conclude and point out some future work.</Paragraph>
  </Section>
</Paper>