File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/92/c92-2070_abstr.xml

Size: 1,429 bytes

Last Modified: 2025-10-06 13:47:29

<?xml version="1.0" standalone="yes"?>
<Paper uid="C92-2070">
  <Title>Walker, Donald (1987), &quot;Knowledge Resource Tools for Accessing</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> This paper describes a program that disambiguates English word senses in unrestricted text using statistical models of the major Roget's Thesaurus categories.</Paragraph>
    <Paragraph position="1"> Roget's categories serve as approximations of conceptual classes. The categories listed for a word in Roget's index tend to correspond to sense distinctions; thus selecting the most likely category provides a useful level of sense disambiguation. The selection of categories is accomplished by identifying and weighting words that are indicative of each category when seen in context, using a Bayesian theoretical framework.</Paragraph>
    <Paragraph position="2"> Other statistical approaches have required special corpora or hand-labeled training examples for much of the lexicon. Our use of class models overcomes this knowledge acquisition bottleneck, enabling training on unrestricted monolingual text without human intervention. Applied to the 10-million-word Grolier's Encyclopedia, the system correctly disambiguated 92% of the instances of 12 polysemous words that have been previously studied in the literature.</Paragraph>
  </Section>
</Paper>
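
The abstract says that a category is selected by weighting context words that are indicative of each category within a Bayesian framework, but it gives no formulas. The following is a minimal sketch of one way such a selection could work, not the paper's confirmed method. It assumes each word's weight is log(P(word | category) / P(word)), that add-one smoothing is used, and that the category with the highest summed weight over the context wins. The category labels ANIMAL and TOOL, the toy contexts, and the function names are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def train_weights(category_contexts, smoothing=1.0):
    """Return {category: {word: log(P(word | category) / P(word))}}.

    category_contexts maps a category label to a list of tokenized
    contexts. Add-one style smoothing is an assumption of this sketch.
    """
    global_counts = Counter()
    for contexts in category_contexts.values():
        for ctx in contexts:
            global_counts.update(ctx)
    vocab_size = len(global_counts)
    global_total = sum(global_counts.values())

    weights = {}
    for category, contexts in category_contexts.items():
        cat_counts = Counter()
        for ctx in contexts:
            cat_counts.update(ctx)
        cat_total = sum(cat_counts.values())
        table = {}
        for word, count in global_counts.items():
            p_word_given_cat = (cat_counts[word] + smoothing) / (
                cat_total + smoothing * vocab_size
            )
            p_word = count / global_total
            table[word] = math.log(p_word_given_cat / p_word)
        weights[category] = table
    return weights

def choose_category(context, candidates, weights):
    """Pick the candidate category with the largest summed word weight.

    Words never seen in training contribute a weight of 0.
    """
    def score(category):
        table = weights[category]
        return sum(table.get(word, 0.0) for word in context)
    return max(candidates, key=score)

# Hypothetical toy training contexts for two senses of "crane".
toy_contexts = {
    "ANIMAL": [["bird", "nest", "wing"], ["bird", "marsh", "feathers"]],
    "TOOL": [["lift", "steel", "construction"], ["machine", "lift", "load"]],
}
weights = train_weights(toy_contexts)
bird_sense = choose_category(["the", "bird", "flew", "nest"], ["ANIMAL", "TOOL"], weights)
machine_sense = choose_category(["steel", "lift"], ["ANIMAL", "TOOL"], weights)
```

In this sketch the candidate list stands in for the categories listed for the ambiguous word in a thesaurus index, so the classifier only chooses among senses the word can actually have.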