<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1649">
  <Title>Sydney, July 2006. ©2006 Association for Computational Linguistics. Partially Supervised Sense Disambiguation by Learning Sense Number from Tagged and Untagged Corpora</Title>
  <Section position="8" start_page="417" end_page="420" type="evalu">
    <SectionTitle>
3 Experiments and Results
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="417" end_page="419" type="sub_section">
      <SectionTitle>
3.1 Experiment Design
</SectionTitle>
      <Paragraph position="0"> We evaluated the ELP based model order identification algorithm on the data in the English lexical sample task of SENSEVAL-3 (including all the 57 English words), and further empirically compared it with other state-of-the-art classification methods, including SVM (the state-of-the-art method for supervised word sense disambiguation (Mihalcea et al., 2004)), a one-class partially supervised classification algorithm (Liu et al., 2003), and a semi-supervised k-means clustering based model order identification algorithm.</Paragraph>
      <Paragraph position="1"> [Table 3: The percentage of official training data used as tagged data when instances with sense tags from S_subset are removed from the official training data.]</Paragraph>
      <Paragraph position="4"> The data for the English lexical sample task in SENSEVAL-3 consists of 7860 examples as official training data and 3944 examples as official test data for 57 English words. The number of senses of each English word varies from 3 to 11.</Paragraph>
      <Paragraph position="5"> We evaluated these four algorithms with different sizes of incomplete tagged data. Given the official training data of the word w, we constructed incomplete tagged data X_L by removing all the tagged instances from the official training data that have sense tags from S_subset, where S_subset is a subset of the ground-truth sense set S for w, and S consists of the sense tags in the official training set for w. The removed training data and the official test data of w were used as X_U. Note that S_L = S - S_subset.</Paragraph>
      <Paragraph position="6"> Then we ran these four algorithms for each target word w with X_L as tagged data and X_U as untagged data, and evaluated their performance using the accuracy on the official test data of all the 57 words. We conducted six experiments for each target word w by setting S_subset as {s1}, {s2}, {s3}, {s1, s2}, {s1, s3}, or {s2, s3}, where si is the i-th most frequent sense of w. S_subset cannot be set as {s4} since some words have only three senses. Table 3 lists the percentage of official training data used as tagged data (the number of examples in incomplete tagged data divided by the number of examples in official training data) when we removed the instances with sense tags from S_subset for all the 57 words. If S_subset = {s3}, then most of the sense tagged examples are still included in the tagged data. If S_subset = {s1, s2}, then there are very few tagged examples in the tagged data. If no instances are removed from the official training data, then the percentage is 100%.</Paragraph>
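The construction of the incomplete tagged data described above can be sketched as follows; this is an illustrative reimplementation, not the authors' code, and the toy instances are hypothetical.

```python
# Build the incomplete tagged set X_L by removing every official training
# instance whose sense tag falls in S_subset; the removed instances plus the
# official test set form the untagged set X_U (gold tags hidden at run time).
def split_tagged_untagged(train, test, s_subset):
    """train/test: lists of (context, sense_tag) pairs."""
    x_l = [(x, s) for (x, s) in train if s not in s_subset]
    removed = [(x, s) for (x, s) in train if s in s_subset]
    x_u = removed + test
    s_l = {s for (_, s) in x_l}  # S_L = S - S_subset
    return x_l, x_u, s_l

# Toy example: removing s1 leaves only s2/s3 instances as tagged data.
train = [("c1", "s1"), ("c2", "s2"), ("c3", "s3"), ("c4", "s1")]
test = [("c5", "s2")]
x_l, x_u, s_l = split_tagged_untagged(train, test, {"s1"})
```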
      <Paragraph position="7"> Given an incomplete tagged corpus for a target word, SVM does not have the ability to find the new senses from the untagged corpus. Therefore it labels all the instances in the untagged corpus with sense tags from S_L.</Paragraph>
      <Paragraph position="8"> Given a set of positive examples for a class and a set of unlabeled examples, the one-class partially supervised classification algorithm, LPU (Learning from Positive and Unlabeled examples) (Liu et al., 2003), learns a classifier in four steps: Step 1: Identify a small set of reliable negative examples from the unlabeled examples by the use of a classifier.</Paragraph>
      <Paragraph position="9"> Step 2: Build a classifier using the positive examples and the automatically selected negative examples. Step 3: Iteratively run the previous two steps until no unlabeled examples are classified as negative ones or the unlabeled set is empty.</Paragraph>
      <Paragraph position="10"> Step 4: Select a good classifier from the set of classifiers constructed above.</Paragraph>
      <Paragraph position="11"> For comparison, LPU was run to perform classification on X_U for each class in X_L. The label of each instance in X_U was determined by maximizing the classification score from the LPU output for each class. If the maximum score of an instance is negative, then this instance will be labeled as a new class. Note that LPU classifies X_{L+U} into k_{X_L} + 1 groups in most cases.</Paragraph>
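The per-instance labeling rule described above can be sketched as follows; this is a hypothetical illustration, with the real per-class scores assumed to come from the trained LPU classifiers.

```python
# One LPU classifier per known sense scores each untagged instance; the
# instance takes the highest-scoring sense, or is labeled as a new class
# when every score is negative.
def label_with_lpu_scores(scores):
    """scores: dict mapping sense tag to classification score."""
    best = max(scores, key=scores.get)
    return best if scores[best] >= 0 else "new"

assert label_with_lpu_scores({"s1": 0.4, "s2": -0.1}) == "s1"
assert label_with_lpu_scores({"s1": -0.3, "s2": -0.1}) == "new"
```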
      <Paragraph position="12"> The clustering based partially supervised sense disambiguation algorithm was implemented by replacing ELP with a semi-supervised k-means clustering algorithm (Wagstaff et al., 2001) in the model order identification procedure. (The three parameters in LPU were set as follows: "-s1 spy -s2 svm -c 1", i.e., we used the spy technique for Step 1 in LPU, the SVM algorithm for Step 2, and selected the first or the last classifier as the final classifier; this is identical to the algorithm Spy+SVM_IS in Liu et al. (2003).) The label information in the labeled data was used to guide the semi-supervised clustering on X_{L+U}. Firstly, the labeled data may be used to determine the initial cluster centroids. If the cluster number is greater than k_{X_L}, the initial centroids of the clusters for new classes will be assigned as randomly selected instances. Secondly, in the clustering process, the instances with the same class label will stay in the same cluster, while the instances with different class labels will belong to different clusters. For a better clustering solution, this clustering process is restarted three times. The clustering process is terminated when the clustering solution converges or the number of iteration steps exceeds 30. K_min = k_{X_L} = |S_L| and K_max = K_min + m, where m is set to 4.</Paragraph>
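A minimal seeded k-means sketch of the clustering procedure above (one-dimensional points and a single run for brevity; not the exact algorithm of Wagstaff et al. (2001), and the data are hypothetical):

```python
import random

def seeded_kmeans(labeled, unlabeled, k, iters=30):
    """labeled: list of (point, cluster_id); extra clusters model new senses."""
    # Centroids for the known clusters come from the labeled seeds.
    by_cluster = {}
    for x, c in labeled:
        by_cluster.setdefault(c, []).append(x)
    cent = [sum(v) / len(v) for _, v in sorted(by_cluster.items())]
    # Initial centroids for new-sense clusters: randomly selected instances.
    cent += random.sample(unlabeled, k - len(cent))
    for _ in range(iters):
        # Assign each unlabeled point to its nearest centroid.
        assign = [min(range(k), key=lambda j: abs(x - cent[j])) for x in unlabeled]
        # Recompute centroids; labeled points never leave their seed cluster.
        for j in range(k):
            members = by_cluster.get(j, []) + [x for x, a in zip(unlabeled, assign) if a == j]
            if members:
                cent[j] = sum(members) / len(members)
    return cent, assign

random.seed(0)
labeled = [(0.0, 0), (0.2, 0), (5.0, 1)]
unlabeled = [0.1, 5.1, 9.9, 10.1]
cent, assign = seeded_kmeans(labeled, unlabeled, k=3)
```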
      <Paragraph position="14"> We used Jensen-Shannon (JS) divergence (Lin, 1991) as the distance measure for semi-supervised clustering and ELP, since plain LP with JS divergence achieves better performance than LP with cosine similarity on SENSEVAL-3 data (Niu et al., 2005).</Paragraph>
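The JS divergence used as the distance measure can be computed as follows (a standard implementation with base-2 logarithm, so the value lies in [0, 1]):

```python
from math import log2

def kl(p, q):
    # Kullback-Leibler divergence between discrete distributions p and q.
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    # Jensen-Shannon divergence (Lin, 1991): average KL to the mixture.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Identical distributions are at distance 0; disjoint ones at 1.
assert js_divergence([0.5, 0.5], [0.5, 0.5]) == 0.0
assert js_divergence([1.0, 0.0], [0.0, 1.0]) == 1.0
```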
      <Paragraph position="15"> For the LP process in the ELP algorithm, we constructed connected graphs as follows: two instances u, v will be connected by an edge if u is among v's 10 nearest neighbors, or if v is among u's 10 nearest neighbors, as measured by the cosine or JS distance measure (following Zhu and Ghahramani (2002)).</Paragraph>
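The graph construction above can be sketched as follows; this is an illustrative implementation with a generic distance function, and the neighbor count is a parameter (the paper uses 10).

```python
def knn_graph(points, dist, knn):
    # Connect u and v when either is among the other's knn nearest
    # neighbors (a symmetrised kNN graph, union rule).
    n = len(points)
    neigh = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: dist(points[i], points[j]))
        neigh.append(set(order[:knn]))
    directed = {(i, j) for i in range(n) for j in neigh[i]}
    # Keep an undirected edge if the directed edge exists in either direction.
    return {(min(i, j), max(i, j)) for (i, j) in directed}

pts = [0.0, 0.1, 5.0]
g = knn_graph(pts, lambda a, b: abs(a - b), knn=1)
# 0 and 1 are mutual neighbors; 5.0's nearest neighbor is 0.1, giving edge 1-2.
assert g == {(0, 1), (1, 2)}
```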
      <Paragraph position="16"> We used three types of features to capture the information in all the contextual sentences of target words in SENSEVAL-3 data for all the four algorithms: part-of-speech of neighboring words with position information, words in topical context without position information (after removing stop words), and local collocations (the same as the feature set used in (Lee and Ng, 2002) except that we did not use syntactic relations). We removed the features with occurrence frequency (counted in both the training set and the test set) less than 3 times. If the estimated sense number is more than the sense number in the initial tagged corpus X_L, then the results from the order identification based methods will consist of the instances from clusters of unknown classes. When assessing the agreement between these classification results and the known results on the official test set, we encounter the problem that there is no sense tag for the instances in unknown classes. Slonim and Tishby (2000) proposed to assign the documents in each cluster the most dominant class label in that cluster, and then conducted evaluation on these labeled documents. Here we follow their method for assigning sense tags to unknown classes from LPU, the clustering based order identification process, and the ELP based order identification process: we assigned the instances from each unknown class the dominant sense tag in that cluster. (The result from LPU always includes only one cluster of the unknown class; its instances were likewise assigned the dominant sense tag in that cluster.) When all instances have their sense tags, we evaluated the results using the accuracy on the official test set.</Paragraph>
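The dominant-sense relabeling and accuracy evaluation described above can be sketched as follows; the cluster ids and sense tags are hypothetical.

```python
from collections import Counter

def dominant_label_accuracy(pred, gold, known):
    """pred: predicted cluster/sense per instance; gold: gold sense tags;
    known: the set of known sense tags. Following Slonim and Tishby (2000),
    instances in an unknown cluster inherit its most frequent gold sense."""
    relabel = {}
    for c in set(pred) - known:  # unknown clusters only
        tags = [g for p, g in zip(pred, gold) if p == c]
        relabel[c] = Counter(tags).most_common(1)[0][0]
    fixed = [relabel.get(p, p) for p in pred]
    return sum(f == g for f, g in zip(fixed, gold)) / len(gold)

pred = ["s1", "c7", "c7", "c7"]  # c7 is an unknown cluster
gold = ["s1", "s3", "s3", "s2"]
assert dominant_label_accuracy(pred, gold, {"s1", "s2"}) == 0.75
```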
    </Section>
    <Section position="2" start_page="419" end_page="420" type="sub_section">
      <SectionTitle>
3.2 Results on Sense Disambiguation
</SectionTitle>
      <Paragraph position="0"> Table 4 summarizes the accuracy of SVM, LPU, the semi-supervised k-means clustering algorithm with the correct sense number |S| or the estimated sense number ^k_{X_{L+U}} as input, and the ELP algorithm with the correct sense number |S| or the estimated sense number ^k_{X_{L+U}} as input, using various incomplete tagged data. The last row in Table 4 lists the average accuracy of each algorithm over the six experimental settings. Using |S| as input means that we do not perform the order identification procedure, while using ^k_{X_{L+U}} as input performs order identification and obtains the classification results on X_U at the same time.</Paragraph>
      <Paragraph position="1"> We can see that the ELP based method outperforms the clustering based method in terms of average accuracy under the same experiment setting, and these two methods outperform SVM and LPU. Moreover, using the correct sense number as input helps to improve the overall performance of both the clustering based method and the ELP based method.</Paragraph>
      <Paragraph position="2"> Comparing the performance of the same system with different sizes of tagged data (from the first experiment to the third experiment, and from the fourth experiment to the sixth experiment), we can see that the performance improved when given more labeled data. Furthermore, the ELP based method outperforms the other methods in terms of accuracy when rare senses (e.g., s3) are missing in the tagged data. It seems that the ELP based method has the ability to find rare senses with the use of tagged and untagged corpora.</Paragraph>
      <Paragraph position="3"> The LPU algorithm can deal with only a one-class classification problem. Therefore the labeled data of other classes cannot be used when determining the positive labeled data for the current class. ELP can use the labeled data of all the known classes to determine the seeds of unknown classes. This may explain why LPU's performance is worse than that of ELP based sense disambiguation although LPU can correctly estimate the sense number in X_{L+U} when only one sense is missing in X_L.</Paragraph>
      <Paragraph position="4"> [Table 4: accuracy of SVM, LPU, the semi-supervised k-means clustering algorithm with the correct sense number |S| or the estimated sense number ^k_{X_{L+U}} as input, and the ELP algorithm with the correct sense number |S| or the estimated sense number ^k_{X_{L+U}} as input, on the official test data of the ELS task in SENSEVAL-3 when given various incomplete tagged corpora. Table 5: mean and standard deviation of the absolute values of the difference between the ground-truth sense number |S| and the sense numbers estimated by the clustering or ELP based order identification procedure respectively; e.g., for S_subset = {s2, s3}: clustering based method 1.8+-0.5, ELP based method 1.8+-0.5.]</Paragraph>
      <Paragraph position="5"> When very few labeled examples are available, the noise in the labeled data makes it difficult to learn the classification score (each entry in Y_{D_U}). Therefore using the classification confidence criterion may lead to poor performance of seed selection for unknown classes if the classification score is not accurate. This may explain why the ELP based method does not outperform the clustering based method with small labeled data (e.g., S_subset = {s1}).</Paragraph>
    </Section>
    <Section position="3" start_page="420" end_page="420" type="sub_section">
      <SectionTitle>
3.3 Results on Sense Number Estimation
</SectionTitle>
      <Paragraph position="0"> Table 5 provides the mean and standard deviation of the absolute difference between the ground-truth sense number |S| and the sense numbers estimated by the clustering or ELP based order identification procedures respectively. For example, if the ground-truth sense number of the word w is k_w, and the estimated value is ^k_w, then the absolute value of the difference between these two values is |k_w - ^k_w|.</Paragraph>
      <Paragraph position="1"> Therefore we can have this value for each word.</Paragraph>
      <Paragraph position="2"> Then we calculated the mean and deviation of this array of absolute values. LPU does not have the order identification capability since it always assumes that there is at least one new class in the unlabeled data, and does not further differentiate the instances from these new classes. Therefore we do not provide the order identification results of LPU.</Paragraph>
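The Table 5 statistic can be computed as follows; this sketch uses hypothetical sense numbers, and pstdev is the population standard deviation, which may differ from the paper's convention.

```python
from statistics import mean, pstdev

# Mean and standard deviation of |k_w - ^k_w| over all target words
# (hypothetical ground-truth and estimated sense numbers).
true_k = {"bank": 5, "plan": 3, "arm": 4}
est_k = {"bank": 6, "plan": 3, "arm": 6}
abs_diff = [abs(true_k[w] - est_k[w]) for w in true_k]
print(f"{mean(abs_diff):.1f}+-{pstdev(abs_diff):.1f}")  # prints 1.0+-0.8
```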
      <Paragraph position="3"> From the results in Table 5, we can see that the estimated sense numbers are closer to the ground-truth results when given less labeled data for the clustering or ELP based methods. Moreover, the clustering based method performs better than the ELP based method in terms of order identification when given less labeled data (e.g., S_subset = {s1}). It seems that ELP is not robust to the noise in small labeled data, compared with the semi-supervised k-means clustering algorithm.</Paragraph>
    </Section>
  </Section>
</Paper>