File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/06/e06-2007_intro.xml

Size: 6,972 bytes

Last Modified: 2025-10-06 14:03:23

<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-2007">
  <Title>Selecting the &quot;Right&quot; Number of Senses Based on Clustering Criterion Functions</Title>
  <Section position="3" start_page="0" end_page="112" type="intro">
    <SectionTitle>
2 Methodology
</SectionTitle>
    <Paragraph position="0"> In word sense discrimination, the number of contexts (N) to cluster is usually very large, and considering all possible values of k from 1...N would be inefficient. As the value of k increases, the criterion function will reach a plateau, indicating that dividing the contexts into more and more clusters does not improve the quality of the solution. Thus, we identify an upper bound to k that we refer to as deltaK by finding the point at which the criterion function only changes to a small degree as k increases. According to the H2 criterion function, the higher its ratio of within-cluster similarity to between-cluster similarity, the better the clustering. A large value indicates that the clusters have high internal similarity and are clearly separated from each other. Intuitively then, one solution to selecting k might be to examine the trend of H2 scores and look for the smallest k that results in a nearly maximum H2 value.</Paragraph>
    <Paragraph position="1"> However, a graph of H2 values for a clustering of the 4-sense verb serve, shown in Figure 1 (top), reveals the difficulties of such an approach. The curve in this graph rises gradually, and the maximum value (plateau) is not reached until k is greater than 100.</Paragraph>
    <Paragraph position="2"> We have developed three methods that take as input the H2 values generated from 1...deltaK and automatically determine the &quot;right&quot; value of k, based on finding when the changes in H2 as k increases are no longer significant.</Paragraph>
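    <Paragraph> The deltaK cutoff that bounds the input to these methods is described above only as the point where the criterion function changes to a small degree. A minimal Python sketch of one way to locate such a point follows. The function name find_delta_k, the relative-change test, and the tolerance default are all assumptions made for illustration, not the authors' procedure:

```python
def find_delta_k(h2_scores, tolerance=0.01):
    """Return an upper bound on k, called deltaK in the text.

    h2_scores[i] is the H2 criterion value for k = i + 1 clusters.
    Assumption: the plateau starts at the first k whose relative change
    from k - 1 is no larger than `tolerance` (a hypothetical parameter).
    Assumption: H2 scores are nonzero, so the division is safe.
    """
    for k in range(2, len(h2_scores) + 1):
        previous, current = h2_scores[k - 2], h2_scores[k - 1]
        relative_change = abs(current - previous) / abs(previous)
        if tolerance >= relative_change:
            return k
    # No plateau found in the scores supplied: use the largest k available.
    return len(h2_scores)
```

In practice the H2 scores would be computed one k at a time and the loop stopped as soon as the plateau is detected, so that clustering solutions beyond deltaK are never generated.</Paragraph>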
    <Section position="1" start_page="111" end_page="111" type="sub_section">
      <SectionTitle>
2.1 PK1
</SectionTitle>
      <Paragraph position="0"> The PK1 measure is based on (Mojena, 1977), which finds clustering solutions for all values of k from 1...N, and then determines the mean and standard deviation of the criterion function. Then, a score is computed for each value of k by subtracting the mean from the criterion function, and dividing by the standard deviation. We adapt this technique by using the H2 criterion function and limiting k to 1...deltaK:</Paragraph>
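      <Paragraph> Written out from the description above, with notation assumed here rather than taken from the source, the score is:

```latex
\mathrm{PK1}(k) = \frac{H2(k) - \mathrm{mean}\left(H2[1 \ldots \mathit{deltaK}]\right)}{\mathrm{std}\left(H2[1 \ldots \mathit{deltaK}]\right)}
```
</Paragraph>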
      <Paragraph position="2"> To select a value of k, a threshold must be set.</Paragraph>
      <Paragraph position="3"> Then, as soon as PK1(k) exceeds this threshold, k-1 is selected as the appropriate number of clusters. We have considered setting this threshold using the normal distribution based on interpreting PK1 as a z-score, although Mojena makes it clear that he views this method as an &quot;operational rule&quot; that is not based on any distributional assumptions.</Paragraph>
      <Paragraph position="4"> He suggests values of 2.75 to 3.50, but also states that they would need to be adjusted for different data sets. We have arrived at an empirically determined value of -0.70, which coincides approximately with the point in the standard normal distribution where 75% of the probability mass is associated with values greater than this.</Paragraph>
      <Paragraph position="5"> We observe that the distribution of PK1 scores tends to change with different data sets, making it hard to apply a single threshold. The graph of the PK1 scores shown in Figure 1 illustrates the difficulty: the slope of these scores is nearly linear, so the threshold (shown by the horizontal line) is a somewhat arbitrary cutoff.</Paragraph>
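      <Paragraph> The PK1 selection rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the sample standard deviation is assumed (the text does not say which estimator is used), and the two fallback return values are assumptions for cases the text does not cover.

```python
from statistics import mean, stdev


def pk1_scores(h2_scores):
    """Standardize H2 scores; h2_scores[i] is the score for k = i + 1."""
    mu = mean(h2_scores)
    sigma = stdev(h2_scores)  # assumption: sample standard deviation
    if sigma == 0:
        raise ValueError("H2 scores are constant, so PK1 is undefined")
    return [(score - mu) / sigma for score in h2_scores]


def select_k_pk1(h2_scores, threshold=-0.70):
    """Return k - 1 for the first k whose PK1 score exceeds the threshold."""
    for k, score in enumerate(pk1_scores(h2_scores), start=1):
        if score > threshold:
            # Assumption: never report fewer than one cluster.
            return max(k - 1, 1)
    # Assumption: if no score exceeds the threshold, fall back to deltaK.
    return len(h2_scores)
```

The default threshold of -0.70 is the empirically determined value reported above; as the text notes, it may need adjusting for other data sets.</Paragraph>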
    </Section>
    <Section position="2" start_page="111" end_page="112" type="sub_section">
      <SectionTitle>
2.2 PK2
</SectionTitle>
      <Paragraph position="0"> PK2 is similar to (Hartigan, 1975), in that both take the ratio of a criterion function at k and k-1, in order to assess the relative improvement when increasing the number of clusters.</Paragraph>
      <Paragraph position="1"> [Figure 1 caption, beginning lost: ... triangle (all), predicted number as square (PK1-3), and deltaK (17) shown as dot (H2) and upper limit of k (PK1-3).]</Paragraph>
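      <Paragraph> Written as a formula, with notation assumed here rather than taken from the source, the ratio of H2 at k to H2 at k-1 is:

```latex
\mathrm{PK2}(k) = \frac{H2(k)}{H2(k-1)}
```
</Paragraph>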
      <Paragraph position="3"> When this ratio approaches 1, the clustering has reached a plateau, and increasing k will have no benefit. If PK2 is greater than 1, then an additional cluster improves the solution and we should increase k. We compute the standard deviation of PK2 and use that to establish a boundary as to what it means to be &quot;close enough&quot; to 1 to consider that we have reached a plateau. Thus, PK2 will select k where PK2(k) is the closest to (but not less than) 1 + the standard deviation of PK2[1...deltaK].</Paragraph>
      <Paragraph position="4"> The graph of PK2 in Figure 1 shows an elbow that is near the actual number of senses. The critical region defined by the standard deviation is shaded; PK2 selected the value of k that was outside of (but closest to) that region.</Paragraph>
      <Paragraph position="5"> This is interpreted as being the last value of k that resulted in a significant improvement in clustering quality. Note that here PK2 predicts 3 senses (square) while in fact there are 4 actual senses (triangle). It is significant that the graph of PK2 provides a clearer representation of the plateau than does that of H2.</Paragraph>
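      <Paragraph> The PK2 selection rule can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the sample standard deviation is assumed, and returning None when no ratio reaches the boundary is an assumption for a case the text does not discuss.

```python
from statistics import stdev


def pk2_scores(h2_scores):
    """Return {k: H2(k) / H2(k-1)} for k = 2..deltaK.

    h2_scores[i] is the H2 score for k = i + 1 clusters.
    """
    return {
        k: h2_scores[k - 1] / h2_scores[k - 2]
        for k in range(2, len(h2_scores) + 1)
    }


def select_k_pk2(h2_scores):
    """Pick the k whose ratio is closest to, but not below, 1 + std(PK2)."""
    scores = pk2_scores(h2_scores)
    # Assumption: sample standard deviation over all PK2 values.
    boundary = 1 + stdev(list(scores.values()))
    candidates = {k: s for k, s in scores.items() if s >= boundary}
    if not candidates:
        return None  # assumption: no k lies outside the critical region
    return min(candidates, key=candidates.get)
```

At least three H2 scores are needed, since the standard deviation requires two or more PK2 values.</Paragraph>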
    </Section>
    <Section position="3" start_page="112" end_page="112" type="sub_section">
      <SectionTitle>
2.3 PK3
</SectionTitle>
      <Paragraph position="0"> PK3 utilizes three k values, in an attempt to find a point at which the criterion function increases and then suddenly decreases. Thus, for a given value of k we compare its criterion function to the preceding and following value of k:</Paragraph>
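      <Paragraph> One formula consistent with this description, and with the behaviour discussed in the rest of this section, is the following; the notation is assumed here rather than taken from the source:

```latex
\mathrm{PK3}(k) = \frac{2 \times H2(k)}{H2(k-1) + H2(k+1)}
```
</Paragraph>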
      <Paragraph position="2"> PK3 is close to 1 if the three H2 values form a line, meaning that they are either ascending or on the plateau. However, our use of deltaK eliminates the plateau, so in our case values of 1 show that k is resulting in consistent improvements to clustering quality, and that we should continue. When PK3 rises significantly above 1, we know that k+1 is not climbing as quickly, and we have reached a point where additional clustering may not be helpful. To select k we choose the largest value of PK3(k) that is closest to (but still greater than) the critical region defined by the standard deviation of PK3. This is the last point where a significant increase in H2 was observed.</Paragraph>
      <Paragraph position="3"> Note that the graph of PK3 in Figure 1 shows the value of PK3 rising and falling dramatically in the critical region, suggesting a need for additional points to make it less localized.</Paragraph>
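      <Paragraph> The PK3 selection rule can be sketched in Python as follows. Several points are assumptions rather than statements from the source: the score is taken to be twice H2 at k divided by the sum of H2 at k-1 and k+1, the critical region is taken to end at 1 plus the sample standard deviation of the PK3 scores, ties are broken in favour of the larger k, and None is returned when no score lies above the region. The function names are hypothetical.

```python
from statistics import stdev


def pk3_scores(h2_scores):
    """Return {k: PK3(k)} for k = 2..deltaK-1.

    h2_scores[i] is the H2 score for k = i + 1 clusters.
    Assumed form: PK3(k) = 2 * H2(k) / (H2(k-1) + H2(k+1)).
    """
    return {
        k: 2 * h2_scores[k - 1] / (h2_scores[k - 2] + h2_scores[k])
        for k in range(2, len(h2_scores))
    }


def select_k_pk3(h2_scores):
    """Pick the k whose PK3 score is closest to, but above, 1 + std(PK3)."""
    scores = pk3_scores(h2_scores)
    # Assumption: the critical region ends at 1 + sample standard deviation.
    boundary = 1 + stdev(list(scores.values()))
    candidates = {k: s for k, s in scores.items() if s > boundary}
    if not candidates:
        return None  # assumption: no k lies above the critical region
    # Smallest qualifying score wins; ties go to the larger k (assumption).
    return min(candidates, key=lambda k: (candidates[k], -k))
```

Because each score looks only at three neighbouring k values, a single noisy H2 value can move the selection, which matches the localized behaviour noted for the PK3 graph.</Paragraph>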
      <Paragraph position="4"> PK3 is similar in spirit to (Salvador and Chan, 2004), which introduces the L measure. This tries to find the point of maximum curvature in the criterion function graph, by fitting a pair of lines to the curve (where the intersection of these lines represents the selected k).</Paragraph>
    </Section>
  </Section>
</Paper>