<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0108">
  <Title>Domain-Specific Semantic Class Disambiguation Using WordNet</Title>
  <Section position="2" start_page="0" end_page="56" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The semantic classification of words refers to the abstraction of ambiguous (surface) words to unambiguous concepts. These concepts may be explicitly expressed in a pre-defined taxonomy of classes, or implicitly derived through the clustering of semantically related words. Semantic classification has proved useful in a range of application areas, such as information extraction (Soderland et al., 1995), acquisition of domain knowledge (Mikheev and Finch, 1995) and improvement of parsing accuracy through the specification of selectional restrictions (Grishman and Sterling, 1994; Grishman and Sterling, 1992).</Paragraph>
    <Paragraph position="1"> In this paper, we address the problem of semantic class disambiguation, with a view towards applying it to information extraction. The disambiguation of the semantic class of words in a particular context facilitates the generalization of semantic extraction patterns used in information extraction from word-based to class-based forms. This abstraction is effectively tapped by CRYSTAL (Soderland et al., 1995), one of the first few approaches to the automatic induction of extraction patterns.</Paragraph>
    <Paragraph position="3"> Many existing information extraction systems (MUC-6, 1996) rely on tedious knowledge engineering approaches to hard-code semantic classes of words in a semantic lexicon, thus hampering the portability of their systems to different domains.</Paragraph>
    <Paragraph position="4"> A notable exception is the approach taken by the University of Massachusetts. Its knowledge acquisition framework, Kenmore, uses a case-based learning mechanism to learn domain knowledge automatically (Cardie, 1993). Kenmore, being a supervised algorithm, relies on an annotated corpus of domain-specific classes. Grishman et al. (1992) too ventured towards automatic semantic acquisition for information extraction. However, they expressed reservations regarding the use of WordNet to augment their semantic hierarchy automatically, citing examples of unintended senses of words resulting in erroneous semantic classification.</Paragraph>
    <Paragraph position="5"> To circumvent the annotation bottleneck faced by Kenmore, our approach exploits general algorithms and resources for the disambiguation of domain-specific semantic classes. Unlike Grishman et al.'s approach, our application of general word sense disambiguation algorithms and semantic distance metrics allows for an effective use of the fine sense granularity of WordNet. Experiments carried out on the MUC-4 (1992) terrorism domain saw our approach outperforming supervised algorithms and matching human judgements.</Paragraph>
    <Paragraph position="7"/>
  </Section>
</Paper>