<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1037">
  <Title>A Concept-based Adaptive Approach to Word Sense Disambiguation</Title>
  <Section position="2" start_page="0" end_page="237" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Word sense disambiguation for unrestricted text is one of the most difficult tasks in the field of computational linguistics. The crux of the problem is to discover a model that relates the intended sense of a word to its context. It seems to be very difficult, if not impossible, to statistically acquire enough word-based knowledge about a language to build a robust system capable of automatically disambiguating senses in unrestricted text. For such a system to be effective, a great deal of balanced material must be assembled in order to cover the many idiosyncratic aspects of the language. A lexicalized statistical word sense disambiguation (WSD) model faces three issues: data sparseness, lack of a level of abstraction, and a static learning strategy.</Paragraph>
    <Paragraph position="1"> First, word-based models have a plethora of parameters that are difficult to estimate reliably even with a very large corpus. Under-trained models lead to low precision. Second, word-based models lack a degree of abstraction that is crucial for a broad-coverage system. Third, a static WSD model is unlikely to be robust and portable, since it is very difficult to make a single static model relevant to a wide variety of unrestricted texts. Recent WSD systems have been developed using word-based models for specific limited domains to disambiguate senses appearing in usually easy contexts (Leacock, Towell, and Voorhees 1996) with many typical salient words. For unrestricted text, however, the context tends to be very diverse and difficult to capture with a lexicalized model; a corpus-trained system is therefore unlikely to port to new domains and run off the shelf.</Paragraph>
    <Paragraph position="2"> Generality and adaptiveness are therefore key to a robust and portable WSD system. A concept-based model for WSD requires fewer parameters and has an element of generality built in (Liddy and Paik 1993). Conceptual classes make it possible to generalize from word-specific context in order to disambiguate a word sense appearing in a context that is particularly unfamiliar in terms of word recurrences. An adaptive system armed with an initial lexical and conceptual knowledge base extracted from machine-readable dictionaries (MRDs) has two strong advantages over static lexicalized models trained using a corpus. First, the initial knowledge is rich and unbiased such that a substantial portion of text can be disambiguated precisely. Second, based on the result of the initial disambiguation, the knowledge base is subsequently adjusted to suit the text at hand. The adjusted knowledge base is then</Paragraph>
  </Section>
</Paper>