<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-1016">
  <Title>Determining Word Sense Dominance Using a Thesaurus</Title>
  <Section position="12" start_page="127" end_page="127" type="concl">
    <SectionTitle>
8 Conclusions and Future Directions
</SectionTitle>
    <Paragraph position="0"> We proposed a new method for creating a word-category co-occurrence matrix (WCCM) using a published thesaurus and raw text, and applying simple sense disambiguation and bootstrapping techniques. We presented four methods to determine the degree of dominance of a sense of a word using the WCCM. We automatically generated sentences with a target word annotated with senses from the published thesaurus, which we used to perform an extensive evaluation of the dominance methods. We achieved near-upper-bound results using all combinations of the weighted methods (DI BNW and DEBNW) and three measures of association (odds, pmi, and Yule).</Paragraph>
    <Paragraph position="1"> We cannot compare accuracies with McCarthy et al. (2004) because we use a thesaurus instead of WordNet, and a comparison would require knowing exactly how the thesaurus senses map to WordNet senses.</Paragraph>
    <Paragraph position="2"> We used a thesaurus because such a resource, unlike WordNet, is available in more languages, provides us with coarse senses, and leads to a smaller WCCM (making computationally intensive operations viable). Further, unlike the McCarthy et al. system, we showed that our system gives accurate results without the need for a large similarly-sense-distributed text or retraining. The target texts used were much smaller (a few hundred sentences) than those needed for automatic creation of a thesaurus (a few million words).</Paragraph>
    <Paragraph position="3"> The WCCM has a number of other applications as well. The strength of association between a word and a word sense can be used to determine the (more intuitive) distributional similarity of word senses (as opposed to words). Conditional probabilities of lexical features can be calculated from the WCCM, which in turn can be used in unsupervised sense disambiguation. In conclusion, we provided a framework for capturing distributional properties of word senses from raw text and demonstrated one of its uses: determining word sense dominance.</Paragraph>
  </Section>
</Paper>