<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-1050">
  <Title>Domain Kernels for Word Sense Disambiguation</Title>
  <Section position="2" start_page="0" end_page="403" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The main limitation of many supervised approaches for Natural Language Processing (NLP) is the lack of available annotated training data. This problem is known as the Knowledge Acquisition Bottleneck.</Paragraph>
    <Paragraph position="1"> To reach high accuracy, state-of-the-art systems for Word Sense Disambiguation (WSD) are designed according to a supervised learning framework, in which the disambiguation of each word in the lexicon is performed by constructing a different classifier. A large set of sense-tagged examples is then required to train each classifier. This methodology is called the word expert approach (Small, 1980; Yarowsky and Florian, 2002). However, this is clearly unfeasible for all-words WSD tasks, in which all the words of an open text should be disambiguated. On the other hand, the word expert approach works very well for lexical sample WSD tasks (i.e.</Paragraph>
    <Paragraph position="2"> tasks in which it is required to disambiguate only those words for which enough training data is provided). As the original rationale of the lexical sample tasks was to define a clear experimental setting to enhance the comprehension of WSD, they should be considered preliminary exercises to all-words tasks. However, this is not actually the case. Algorithms designed for lexical sample WSD are often based on pure supervision and are hence data hungry.</Paragraph>
    <Paragraph position="3"> We think that lexical sample WSD should regain its original exploratory role and possibly use a minimal amount of training data, exploiting instead external knowledge acquired in an unsupervised way, while still reaching state-of-the-art performance.</Paragraph>
    <Paragraph position="4"> Indeed, minimal supervision is the basis of state-of-the-art systems for all-words tasks (e.g.</Paragraph>
    <Paragraph position="5"> (Mihalcea and Faruque, 2004; Decadt et al., 2004)), which are trained on small sense-tagged corpora (e.g.</Paragraph>
    <Paragraph position="6"> SemCor), in which only a few examples for a subset of the ambiguous words in the lexicon can be found. Thus, improving the performance of WSD systems with few learning examples is a fundamental step toward designing a WSD system that works well on real texts.</Paragraph>
    <Paragraph position="7"> In addition, it is widely held that the performance of state-of-the-art WSD systems is not yet satisfactory from an application point of view.</Paragraph>
    <Paragraph position="8"> To achieve these goals we identified two promising research directions: 1. Modeling domain and syntagmatic aspects of sense distinction independently, to improve the feature representation of sense-tagged examples (Gliozzo et al., 2004).</Paragraph>
    <Paragraph position="9"> 2. Leveraging external knowledge acquired from unlabeled corpora.</Paragraph>
    <Paragraph position="10"> The first direction is motivated by the linguistic assumption that syntagmatic and domain (associative) relations are both crucial to represent sense distinctions, even though they originate from very different phenomena. Syntagmatic relations hold among words that are typically located close to each other in the same sentence in a given temporal order, while domain relations hold among words that are typically used in the same semantic domain (i.e. in texts having similar topics (Gliozzo et al., 2004)). Their different nature suggests adopting different learning strategies to detect them.</Paragraph>
    <Paragraph position="11"> Regarding the second direction, external knowledge would be required to help WSD algorithms better generalize over the data available for training. However, most state-of-the-art supervised approaches to WSD are still based entirely on internal information (i.e. the only information available to the training algorithm is the set of manually annotated examples). For example, in the Senseval-3 evaluation exercise (Mihalcea and Edmonds, 2004) many lexical sample tasks were provided, in addition to the usual labeled training data, with a large set of unlabeled data. However, to our knowledge, none of the participants exploited this unlabeled material. Exploring this direction is the main focus of this paper. In particular, we acquire a Domain Model (DM) for the lexicon (i.e.</Paragraph>
    <Paragraph position="12"> a lexical resource representing domain associations among terms), and we exploit this information inside our supervised WSD algorithm. DMs can be automatically induced from unlabeled corpora, making the methodology portable across languages.</Paragraph>
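To make the acquisition step concrete, here is a minimal sketch of how a Domain Model can be induced from an unlabeled corpus with LSA, in the spirit of Section 2 but not the authors' implementation: a term-by-document matrix is factorized by truncated SVD, and each term's coordinates in the reduced space serve as its domain vector. The toy corpus, the number of latent domains k, and the function name induce_domain_model are illustrative assumptions.

```python
# Hypothetical sketch: inducing a Domain Model (a term-to-domain-vector map)
# from unlabeled text via Latent Semantic Analysis (truncated SVD).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

def induce_domain_model(documents, k):
    """Return (vocabulary, term-by-domain matrix) induced from raw documents."""
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(documents)   # documents x terms counts
    svd = TruncatedSVD(n_components=k)        # k latent "domains" (assumed)
    svd.fit(X)
    # Rows of components_ span the k latent domains; transposing gives each
    # term (row) a k-dimensional domain vector.
    return vectorizer.get_feature_names_out(), svd.components_.T

if __name__ == "__main__":
    corpus = [
        "the bank approved the loan and the mortgage",
        "interest rates at the bank rose again",
        "the river bank was covered with reeds",
        "fishing from the muddy bank of the river",
    ]
    vocab, dm = induce_domain_model(corpus, k=2)
    # dm[i] is the domain vector of vocab[i]; finance-like and river-like
    # terms end up with different coordinates in the latent domain space.
```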
    <Paragraph position="13"> We identified kernel methods as a viable framework in which to implement the assumptions above (Strapparava et al., 2004).</Paragraph>
    <Paragraph position="14"> Exploiting the properties of kernels, we have independently defined a set of domain and syntagmatic kernels and combined them in order to define a complete kernel for WSD. The domain kernels estimate the (domain) similarity (Magnini et al., 2002) among contexts, while the syntagmatic kernels evaluate the similarity among collocations.</Paragraph>
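As a concrete illustration of this combination scheme, the following is a minimal sketch under assumed vector representations: a domain kernel that compares bag-of-words context vectors after projecting them through a Domain Model matrix D (as induced above), and a composite kernel obtained by summing individually normalized kernels. The helper names and the treatment of the syntagmatic Gram matrix as a precomputed input are assumptions, not the paper's exact formulation.

```python
import numpy as np

def normalized(K):
    """Kernel normalization: K'(x, z) = K(x, z) / sqrt(K(x, x) * K(z, z))."""
    d = np.sqrt(np.diag(K)) + 1e-12   # epsilon guards empty contexts
    return K / np.outer(d, d)

def domain_kernel(X, D):
    """Domain similarity among contexts: k(x, z) = <D^T x, D^T z>.
    X: contexts x terms bag-of-words matrix; D: terms x domains matrix."""
    P = X @ D                          # project contexts into domain space
    return P @ P.T

def combined_kernel(X, D, K_syn):
    """Composite WSD kernel: sum of normalized domain and syntagmatic kernels.
    K_syn is any precomputed syntagmatic Gram matrix over the same contexts."""
    return normalized(domain_kernel(X, D)) + normalized(K_syn)
```

Summing normalized kernels keeps each component on a comparable scale, so neither the domain nor the syntagmatic evidence dominates by construction.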
    <Paragraph position="15"> We will demonstrate that using DMs induced from unlabeled corpora is a feasible strategy to increase the generalization capability of the WSD algorithm. Our system far outperforms the state-of-the-art systems in all the tasks on which it has been tested. Moreover, a comparative analysis of the learning curves shows that the use of DMs allows us to remarkably reduce the amount of sense-tagged examples, opening new scenarios for developing systems for all-words tasks with minimal supervision. The paper is structured as follows. Section 2 introduces the notion of Domain Model; in particular, an automatic acquisition technique based on Latent Semantic Analysis (LSA) is described. In Section 3 we present a WSD system based on a combination of kernels; in particular, we define a Domain Kernel (see Section 3.1) and a Syntagmatic Kernel (see Section 3.2), to model domain and syntagmatic aspects separately. In Section 4 our WSD system is evaluated on the Senseval-3 English, Italian, Spanish and Catalan lexical sample tasks.</Paragraph>
  </Section>
</Paper>