<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-3204">
<Title>Unsupervised WSD based on automatically retrieved examples: The importance of bias</Title>
<Section position="2" start_page="0" end_page="1" type="intro">
<SectionTitle>
1 Introduction
</SectionTitle>
<Paragraph position="0"> The results of recent WSD exercises, e.g. Senseval-2 (Edmonds and Cotton, 2001; see http://www.senseval.org), show clearly that WSD methods based on hand-tagged examples are the ones performing best. However, the main drawback of supervised WSD is the knowledge acquisition bottleneck: the systems need large amounts of costly hand-tagged data. The situation is even more dramatic for lesser-studied languages. In order to overcome this problem, different research lines have been explored: automatic acquisition of training examples (Mihalcea, 2002), bootstrapping techniques (Yarowsky, 1995), and active learning (Argamon-Engelson and Dagan, 1999). In this work, we have focused on the automatic acquisition of examples.</Paragraph>
<Paragraph position="1"> When supervised systems have no specific training examples for a target word, they need to rely on publicly available all-words sense-tagged corpora like Semcor (Miller et al., 1993), which is tagged with WordNet word senses. The systems performing best in the English all-words task in Senseval-2 were basically supervised systems trained on Semcor. Unfortunately, for most words this corpus provides only a handful of tagged examples. In fact, only a few systems could overcome the Most Frequent Sense (MFS) baseline, which tags each word with the sense occurring most frequently in Semcor. In our approach, we also rely on Semcor as the basic resource, both for training examples and as an indicator of the distribution of the senses of the target word.</Paragraph>
<Paragraph position="2"> The goal of our experiment is to evaluate to what extent we can automatically acquire examples for word senses and train accurate supervised WSD systems on them. This is a very promising line of research, but one which remains relatively under-studied (cf. Section 2). The method we applied is based on the monosemous relatives of the target words (Leacock et al., 1998), and we studied some parameters that affect the quality of the acquired corpus, such as the distribution of the number of training instances per word sense (bias) and the type of features used for disambiguation (local vs. topical).</Paragraph>
<Paragraph position="3"> We built three systems: one fully supervised (using examples from both Semcor and automatically acquired examples), one minimally supervised (using the distribution of senses in Semcor and automatically acquired examples), and one fully unsupervised (using an automatically acquired sense rank (McCarthy et al., 2004) and automatically acquired examples).</Paragraph>
<Paragraph position="4"> This paper is structured as follows. First, Section 2 describes previous work in the field. Section 3 introduces the experimental setting for evaluating the acquired corpus. Section 4 is devoted to the process of building the corpus, which is evaluated in Section 5. Finally, the conclusions are given in Section 6.</Paragraph>
</Section>
</Paper>
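
As an illustration of the MFS baseline described in the introduction (tag each word with the sense occurring most frequently in Semcor), the following minimal Python sketch shows the idea. The sense-count table, the sense keys, and the variable names are invented placeholders, not data or code from the paper.

    # Minimal sketch of the Most Frequent Sense (MFS) baseline.
    # Assumes a precomputed table of Semcor-style sense frequencies;
    # the words, sense keys, and counts below are illustrative only.
    from collections import Counter

    # Hypothetical mapping: word -> Counter of sense tags seen in Semcor.
    semcor_sense_counts = {
        "church": Counter({"church%1:06:00::": 22, "church%1:14:00::": 41}),
        "bank":   Counter({"bank%1:14:00::": 30, "bank%1:17:01::": 8}),
    }

    def mfs_tag(word):
        """Tag a word with its most frequent Semcor sense, if any is known."""
        counts = semcor_sense_counts.get(word)
        if not counts:
            return None  # no tagged examples for this word in Semcor
        return counts.most_common(1)[0][0]

    print(mfs_tag("bank"))  # -> bank%1:14:00::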
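The acquisition method the paper builds on (Leacock et al., 1998) gathers training examples through monosemous relatives: for each sense of a target word, WordNet relatives that have only one sense serve as unambiguous queries, and the retrieved sentences are labeled with the corresponding sense. The sketch below is a hedged reconstruction of that general idea, not the paper's implementation; it assumes NLTK's WordNet interface, restricts the monosemy check to one part of speech, and uses a hypothetical retrieval backend (`retrieve_sentences`).

    # Hedged sketch of example acquisition via monosemous relatives.
    # Requires NLTK with the WordNet data installed.
    from nltk.corpus import wordnet as wn

    def monosemous_relatives(word, pos=wn.NOUN):
        """Map each synset of `word` to its monosemous relative lemmas."""
        relatives = {}
        for synset in wn.synsets(word, pos=pos):
            candidates = set()
            neighbors = [synset] + synset.hypernyms() + synset.hyponyms()
            for neighbor in neighbors:
                for lemma in neighbor.lemmas():
                    name = lemma.name()
                    # keep only lemmas unambiguous in WordNet (for this POS)
                    if name != word and len(wn.synsets(name, pos=pos)) == 1:
                        candidates.add(name)
            relatives[synset.name()] = sorted(candidates)
        return relatives

    def retrieve_sentences(query):
        """Placeholder: return corpus sentences containing `query`."""
        return []

    def acquire_examples(word):
        """Label retrieved sentences with the sense of their monosemous relative."""
        examples = []
        for sense, rels in monosemous_relatives(word).items():
            for rel in rels:
                for sentence in retrieve_sentences(rel):
                    examples.append((sentence, sense))
        return examples

How the resulting examples are distributed across senses (the bias) is exactly the parameter the paper goes on to study.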