<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-1016">
  <Title>Determining Word Sense Dominance Using a Thesaurus</Title>
  <Section position="11" start_page="126" end_page="127" type="relat">
    <SectionTitle>
7 Related Work
</SectionTitle>
    <Paragraph position="0"> The WCCM has similarities with latent semantic analysis (LSA), and specifically with work by Schütze and Pedersen (1997), wherein the dimensionality of a word-word co-occurrence matrix is reduced to create a word-concept matrix. However, there is no non-heuristic way to determine when the dimension reduction should stop. Further, the generic concepts represented by the reduced dimensions are not interpretable, i.e., one cannot determine which concepts they represent in a given sense inventory. This means that LSA cannot be used directly for tasks such as unsupervised sense disambiguation or determining semantic similarity of known concepts. Our approach does not have these limitations.</Paragraph>
    <Paragraph position="1"> Yarowsky (1992) uses the product of a mutual information-like measure and frequency to identify words that best represent each category in Roget's Thesaurus and uses these words for sense disambiguation with a Bayesian model. We improved the accuracy of the WCCM using simple bootstrapping techniques, used all the words that co-occur with a category, and proposed four new methods to determine sense dominance, two of which do explicit sense disambiguation. Véronis (2005) presents a graph theory-based approach to identify the various senses of a word in a text corpus without the use of a dictionary. Highly interconnected components of the graph represent the different senses of the target word. The node (word) with the most connections in a component is representative of that sense, and its associations with words that occur in a test instance are used as evidence for that sense. However, these associations are at best only rough estimates of the associations between the sense and co-occurring words, since a sense in his system is represented by a single (possibly ambiguous) word. Pantel (2005) proposes a framework for ontologizing lexical resources. For example, co-occurrence vectors for the nodes in WordNet can be created using the co-occurrence vectors for words (or lexicals). However, if a leaf node has a single lexical, then once the appropriate co-occurring words for this node are identified (coup phase), they are assigned the same co-occurrence counts as that of the lexical. (Footnote 5: A word may have different, stronger-than-chance strengths of association with multiple senses of a lexical. These are different from the association of the word with the lexical.)</Paragraph>
  </Section>
</Paper>