<?xml version="1.0" standalone="yes"?>
<Paper uid="P95-1026">
  <Title>UNSUPERVISED WORD SENSE DISAMBIGUATION RIVALING SUPERVISED METHODS</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> This paper presents an unsupervised algorithm that can accurately disambiguate word senses in a large, completely untagged corpus.¹ The algorithm avoids the need for costly hand-tagged training data by exploiting two powerful properties of human language:  1. One sense per collocation:² Nearby words provide strong and consistent clues to the sense of a target word, conditional on relative distance, order and syntactic relationship.</Paragraph>
    <Paragraph position="1"> 2. One sense per discourse: The sense of a target word is highly consistent within any given document.</Paragraph>
    <Paragraph position="2"> Moreover, language is highly redundant, so that the sense of a word is effectively overdetermined by (1) and (2) above. The algorithm uses these properties to incrementally identify collocations for target senses of a word, given a few seed collocations for each sense. This procedure is robust and self-correcting, and exhibits many strengths of supervised approaches, including sensitivity to word-order information lost in earlier unsupervised algorithms.</Paragraph>
    <Paragraph position="3"> ¹ Note that the problem here is sense disambiguation: assigning each instance of a word to established sense definitions (such as in a dictionary). This differs from sense induction: using distributional similarity to partition word instances into clusters that may have no relation to standard sense partitions.</Paragraph>
    <Paragraph position="4"> ² Here I use the traditional dictionary definition of collocation - &quot;appearing in the same location; a juxtaposition of words&quot;. No idiomatic or non-compositional interpretation is implied.</Paragraph>
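    <Paragraph position="5"> The incremental procedure described above, growing a set of collocations from a few seeds while using both properties, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's actual procedure: the function names, the smoothed log-ratio score, the threshold and smoothing values, and the toy data are all hypothetical. Contexts are treated as unordered word sets, so the sketch ignores the distance, order and syntactic conditioning mentioned in property 1.

```python
import math
from collections import Counter, defaultdict


def label_by_rules(instances, rules):
    """Label an instance only when all matching collocations agree on one sense."""
    labels = []
    for _doc, context in instances:
        matched = {rules[w] for w in context if w in rules}
        labels.append(next(iter(matched)) if len(matched) == 1 else None)
    return labels


def one_sense_per_discourse(instances, labels):
    """Give every instance in a document that document's strict-majority sense."""
    by_doc = defaultdict(Counter)
    for (doc, _ctx), sense in zip(instances, labels):
        if sense is not None:
            by_doc[doc][sense] += 1
    out = []
    for (doc, _ctx), sense in zip(instances, labels):
        ranked = by_doc[doc].most_common(2)
        if len(ranked) == 1 or (len(ranked) == 2 and ranked[0][1] > ranked[1][1]):
            out.append(ranked[0][0])
        else:
            out.append(sense)  # no labeled instance or a tie: keep as is
    return out


def learn_rules(instances, labels, seed_rules, threshold, smoothing):
    """Keep the seeds and add words that strongly favor one sense (assumed score)."""
    counts = defaultdict(Counter)
    for (_doc, context), sense in zip(instances, labels):
        if sense is not None:
            for w in context:
                counts[w][sense] += 1
    rules = dict(seed_rules)
    for w, c in counts.items():
        if w in rules:
            continue
        best, n_best = c.most_common(1)[0]
        n_rest = sum(c.values()) - n_best
        score = math.log((n_best + smoothing) / (n_rest + smoothing))
        if score >= threshold:
            rules[w] = best
    return rules


def bootstrap(instances, seeds, threshold=1.0, smoothing=0.1, max_iters=10):
    """instances: list of (doc_id, set of context words); seeds: sense -> seed words."""
    seed_rules = {w: sense for sense, words in seeds.items() for w in words}
    rules = dict(seed_rules)
    labels = [None] * len(instances)
    for _ in range(max_iters):
        # Labels are recomputed from scratch each pass, so earlier decisions can change.
        new_labels = one_sense_per_discourse(instances, label_by_rules(instances, rules))
        if new_labels == labels:
            break
        labels = new_labels
        rules = learn_rules(instances, labels, seed_rules, threshold, smoothing)
    return labels, rules


# Hypothetical toy data: context words around one ambiguous target word.
toy_instances = [
    ("d1", {"life", "species"}),
    ("d1", {"species", "animal"}),
    ("d2", {"manufacturing", "equipment"}),
    ("d2", {"equipment", "workers"}),
    ("d3", {"animal", "growth"}),
    ("d4", {"workers", "union"}),
]
toy_seeds = {"living": {"life"}, "factory": {"manufacturing"}}
toy_labels, toy_rules = bootstrap(toy_instances, toy_seeds)
```

On this toy data the two seeds label one instance each; the discourse constraint then spreads those labels within d1 and d2, and the newly learned collocations reach d3 and d4 on the next pass.</Paragraph>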
  </Section>
</Paper>