<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-3027">
  <Title>SenseClusters: Unsupervised Clustering and Labeling of Similar Contexts</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> SenseClusters uses lexical features and unsupervised clustering to group together units of text (referred to as contexts) that are similar to each other.</Paragraph>
    <Paragraph position="1"> Our initial work (Purandare and Pedersen, 2004) focused on word sense discrimination, which takes as input contexts that each contain a given target word, and produces as output clusters that are presumed to correspond to the different senses of the word. This follows the hypothesis of Miller and Charles (1991) that words that occur in similar contexts will have similar meanings.</Paragraph>
    <Paragraph position="2"> We have shown that these methods can be extended to proper name discrimination (Pedersen et al., 2005). People, places, or companies often share the same name, and this can cause a considerable amount of confusion when carrying out Web search or other information retrieval applications. Name discrimination seeks to group together the contexts that refer to a unique underlying individual, and allow the user to recognize that the same name is being used to refer to multiple entities.</Paragraph>
    <Paragraph position="3"> We have also extended SenseClusters to cluster contexts that are not centered around any target word, which we refer to as headless clustering.</Paragraph>
    <Paragraph position="4"> Automatic email categorization is an example of a headless clustering task, since each message can be considered a context. SenseClusters will group together messages if they are similar in content, without requiring that they share any particular target word between them.</Paragraph>
    <Paragraph position="5"> We are also addressing a well-known limitation of unsupervised clustering approaches. After clustering contexts, it is often difficult to determine what underlying concept or entity each cluster represents without manually inspecting its contents.</Paragraph>
    <Paragraph position="6"> Therefore, we are developing methods that automatically assign descriptive and discriminating labels to each discovered cluster, characterizing its contents in a way that a human can easily understand.</Paragraph>
  </Section>
</Paper>