<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-3027">
  <Title>SenseClusters: Unsupervised Clustering and Labeling of Similar Contexts</Title>
  <Section position="7" start_page="106" end_page="107" type="evalu">
    <SectionTitle>
5 Experimental Results and Discussion
</SectionTitle>
    <Paragraph position="0"> Table 1 presents the experimental results for 2-way and 3-way name discrimination experiments, and Table 2 presents results for a 2-way email categorization experiment. The results are reported in terms of the F-measure, which is the harmonic mean of precision and recall.</Paragraph>
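As a minimal sketch of the measure just described, the function below computes the harmonic mean of precision and recall. The name `f_measure` and the zero-division convention are illustrative assumptions, not part of the reported experiments.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    # Assumed convention: return 0.0 when both inputs are zero,
    # so that the division below is never by zero.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a clustering cannot reach a high F-measure by doing well on precision alone or on recall alone.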
    <Paragraph position="1"> The first column in both tables indicates the possible names or newsgroups, and the number of contexts associated with each. The next column indicates the percentage of the majority class (MAJ.) and the count (N) of the total number of contexts for the names or newsgroups. The majority percentage provides a simple baseline for the level of performance, as this is the F-measure that would be achieved if every context were simply placed in a single cluster. We refer to this as the unsupervised majority classifier.</Paragraph>
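The majority baseline can be illustrated with a short sketch. It assumes that the single cluster is mapped to the largest class and that precision and recall are both computed over all contexts, so that both equal the majority share and the F-measure does as well. The function name and the counts are hypothetical and are not taken from Table 1 or Table 2.

```python
def majority_baseline(class_counts: dict) -> float:
    """Percentage of contexts that belong to the largest class."""
    total = sum(class_counts.values())
    if total == 0:
        raise ValueError("no contexts")
    return 100.0 * max(class_counts.values()) / total


# Hypothetical counts for a 2-way case (not from the paper's tables).
example = {"name_a": 600, "name_b": 400}
print(majority_baseline(example))  # 60.0
```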
    <Paragraph position="2"> The next two columns show the F-measure associated with the order 1 and order 2 representations of context, with all other options being held constant. These experiments used bigram features, SVD was performed as appropriate for each representation, and the method of Repeated Bisections was used for clustering.</Paragraph>
    <Paragraph position="3"> Finally, note that the number of clusters to be discovered must be provided by the user. In these experiments we have taken the best-case approach and asked for a number of clusters equal to that which actually exists. We are currently working to develop methods that will automatically stop at an optimal number of clusters, to avoid setting this value manually. In general, all of our results significantly improve upon the majority classifier, which suggests that the clustering of contexts is successfully discriminating among ambiguous names and uncategorized email.</Paragraph>
    <Paragraph position="4"> Table 3 shows the descriptive and discriminating labels assigned to the 2-way experimental case of American Airlines and Tom Cruise, as well as the 3-way case of George Bush, Bill Gates and Tom Cruise. The boldface labels are those that serve as both descriptive and discriminating labels. The fact that most labels serve both roles suggests that the highest ranked bigrams in each cluster were also unique to that cluster. The normal font indicates labels that are only descriptive and are shared between multiple clusters. There are only a few such cases; for example, White House happens to be a significant bigram in all three of the clusters in the 3-way case. There were no labels that were exclusively discriminating in these experiments, suggesting that the clusters are fairly clearly distinguished. Please note that some labels include unigrams (e.g., President for George Bush). These are created from bigrams where the other word is the conflated form, which is not included in the labels since it is by definition ambiguous.</Paragraph>
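The distinction between labels that serve both roles and labels that are only descriptive can be sketched as a simple comparison across clusters. This is an illustration under stated assumptions, not the SenseClusters implementation: it assumes each cluster already has a list of its top-ranked bigrams, and it treats a bigram as discriminating when it appears in no other cluster's list. The function name, cluster names, and all bigrams except White House are hypothetical placeholders.

```python
def split_labels(top_bigrams: dict) -> dict:
    """Split each cluster's top-ranked bigrams into those found in no other
    cluster's list ("both") and those shared with another cluster
    ("descriptive_only")."""
    result = {}
    for cluster, bigrams in top_bigrams.items():
        # Collect the bigrams that appear in any other cluster's list.
        others = set()
        for other, other_bigrams in top_bigrams.items():
            if other != cluster:
                others.update(other_bigrams)
        result[cluster] = {
            "both": [b for b in bigrams if b not in others],
            "descriptive_only": [b for b in bigrams if b in others],
        }
    return result


# Toy input: only "White House" is mentioned in the text; the rest are placeholders.
toy = {
    "cluster_0": ["White House", "placeholder one"],
    "cluster_1": ["White House", "placeholder two"],
}
labels = split_labels(toy)
```

In this toy input, White House lands in the descriptive-only group for both clusters, mirroring the shared-label case described above.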
  </Section>
</Paper>