<?xml version="1.0" standalone="yes"?>
<Paper uid="W95-0105">
  <Title>Disambiguating Noun Groupings with Respect to WordNet Senses</Title>
  <Section position="3" start_page="0" end_page="54" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Word groupings useful for language processing tasks are increasingly available, as thesauri appear on-line, and as distributional techniques become increasingly widespread (e.g. (Bensch and Savitch, 1992; Brill, 1991; Brown et al., 1992; Grefenstette, 1994; McKcown and Hatzivassiloglou, 1993; Pereira et al., 1993; Schtltze, 1993)). However, for many tasks, one is interested in relationships among word senses, not words.</Paragraph>
    <Paragraph position="1"> Consider, for example, the cluster containing attorney, counsel, trial, court, and judge, used by Brown et al. (1992) to illustrate a &amp;quot;semantically sticky&amp;quot; group of words. As is often the case where sense ambiguity is involved, we as readers impose the most coherent interpretation on the words within the group without being aware that we are doing so. Yet a computational system has no choice but to consider other, more awkward possibilities -- for example, this cluster might be capturing a distributional relationship between advice (as one sense of counsel) and royalty (as one sense of court). This would be a mistake for many applications, such as query expansion in information retrieval, where a surfeit of false connections can outweigh the benefits obtained by using lexical knowledge.</Paragraph>
    <Paragraph position="2"> One obvious solution to this problem would be to extend distributional grouping methods to word senses.</Paragraph>
    <Paragraph position="3"> For example, one could construct vector representations of senses on the basis of their co-occurrence with words or with other senses. Unfortunately, there are few corpora annotated with word sense information, and computing reliable statistics on word senses rather than words will require more data, rather than less. 1 Furthermore, one widely available example of a large, manually sense-tagged corpus -- the WordNet group's annotated subset of the Brown corpus 2 -- vividly illustrates the difficulty in obtaining suitable data.</Paragraph>
    <Paragraph position="4"> 1Actually, this depends on the fine-grainedness of sense distinctions; clearly one could annotate corpora with very high level semantic distinctions For example, Basili et al. (1994) take such a coarse-grained approach, utilizing on the order of 10 to 15 semantic tags for a given domain. I assume throughout this paper that finer-grained distinctions than that are necessary.  It is quite small, by current corpus standards (on the order of hundreds of thousands of words, rather than millions or tens of millions); the direct annotation methodology used to create it is labor intensive (Marcus et al. (1993) found that direct annotation takes twice as long as automatic tagging plus correction, for part-of-speech annotation); and the output quality reflects the difficulty of the task (inter-annotator disagreement is on the order of 10%, as contrasted with the approximately 3% error rate reported for part-of-speech annotation by Marcus et al.).</Paragraph>
    <Paragraph position="5"> There have been some attempts to capture the behavior of semantic categories in a distributional setting, despite the unavailability of sense-annotated corpora. For example, Hearst and Schtltze (1993) take steps toward a distributional treatment of WordNet-based classes, using Schtltze's (1993) approach to constructing vector representations from a large co-occurrence matrix. Yarowsky's (1992) algorithm for sense disambiguation can be thought of as a way of determining how Roget's thesaurus categories behave with respect to contextual features. And my own treatment of selectional constraints (Resnik, 1993) provides a way to describe the plausibility of co-occuffence in terms of WordNet's semantic categories, using co-occurrence relationships mediated by syntactic structure. In each case, one begins with known semantic categories (WordNet synsets, Roget's numbered classes) and non-sense-annotated text, and proceeds to a distributional characterization of semantic category behavior using co-occurrence relationships.</Paragraph>
    <Paragraph position="6"> This paper begins from a rather different starting point. As in the above-cited work, there is no presupposition that sense-annotated text is available. Here, however, I make the assumption that word groupings have been obtained through some black box procedure, e.g. from analysis of unannotated text, and the goal is to annotate the words within the groupings post hoc using a knowledge-based catalogue of senses. If successful, such an approach has obvious benefits: one can use whatever sources of good word groupings are available -- primarily unsupervised word clustering methods, but also on-line thesauri and the like -- without folding in the complexity of dealing with word senses at the same time) The resulting sense groupings should be useful for a variety of purposes, although ultimately this work is motivated by the goal of sense disarnbiguation for unrestricted text using unsupervised methods.</Paragraph>
  </Section>
class="xml-element"></Paper>