<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-3012">
  <Title>Integrating Collocation Features in Chinese Word Sense Disambiguation</Title>
  <Section position="4" start_page="88" end_page="89" type="metho">
    <SectionTitle>
3 The Classifier With Topical Contextual and Local Collocation Features
</SectionTitle>
    <Section position="1" start_page="88" end_page="89" type="sub_section">
      <SectionTitle>
3.1 The Feature Set
</SectionTitle>
      <Paragraph position="0"> As stated early, an important issue is what features will be used to construct the classifier in WSD. Early researches have proven that using lexical statistical information, such as bi-gram co-occurrences was sufficient to produce close to the best results [10] for Chinese WSD. Instead of including bi-gram features as part of discrimination features, in our system, we consider both topical contextual features as well as local collocation features. These features are extracted form the 60MB human sense-tagged People's Daily News with segmentation information. null  Niu [11] proved in his experiments that Naive Bayes classifier achieved best disambiguation accuracy with small topical context window size (&lt; 10 words). We follow their method and set the contextual window size as 10 in our system. Each of the Chinese words except the stop words inside the window range will be considered as one topical feature. Their frequencies are calculated over the entire corpus with respect to each sense of an ambiguous word w. The sense definitions are obtained from HowNet.</Paragraph>
      <Paragraph position="1">  We chose collocations as the local features. A collocation is a recurrent and conventional fixed expression of words which holds syntactic and semantic relations [21]. Collocations can be classified as fully fixed collocations, fixed collocations, strong collocations and loose collocations. Fixed collocations means the appearance of one word implies the co-occurrence of another one such as &amp;quot;Z&gt; &amp;quot; (&amp;quot;burden of history&amp;quot;), while strong collocations allows very limited substitution of the components, for example, &amp;quot; L6 &amp;quot; (&amp;quot;local college&amp;quot;), or &amp;quot; : &amp;quot; (&amp;quot;local university&amp;quot;). The sense of ambiguous words can be uniquely determined in these two types of collocations, therefore are the collocations applied in our system. The sources of the collocations will be explained in Section 4.1.</Paragraph>
      <Paragraph position="2"> In both Niu [11] and Dang's [10] work, topical features as well as the so called collocational features were used. However, as discussed in Section 2, they both used bi-gram co-occurrences as the additional local features.</Paragraph>
      <Paragraph position="3"> However, bi-gram co-occurrences only indicate statistical significance which may not actually satisfy the conceptual definition of collocations. Thus instead of using co-occurrences of bigrams, we take the true bi-gram collocations extracted from our system and use this data to compare with bi-gram co-occurrences to test the usefulness of collocation for WSD. The local features in our system make use of the collocations using the template (wi, w) within a window size of ten (where i = +- 5). For example, &amp;quot;p</Paragraph>
      <Paragraph position="5"> departments and local government commanded that&amp;quot;) fits the bi-gram collocation template (w, w1) with the value of ( p ). During the training and the testing processes, the counting of frequency value of the collocation feature will be increased by 1 if a collocation containing the ambiguous word occurs in a sentence. To have a good analysis on collocation features, we have also developed an algorithm using lonely adjacent bi-gram as locals features(named Sys- null adjacent bi-gram as locals features(named System A) and another using collocation as local features(named System B).</Paragraph>
    </Section>
    <Section position="2" start_page="89" end_page="89" type="sub_section">
      <SectionTitle>
3.2 The Collocation Classifier
</SectionTitle>
      <Paragraph position="0"> We consider all the features in the features set F = Ft [?]Fl = {f1, f2, ... , fm } as independent, where Ft stands for the topical contextual features set, and Fl stands for the local collocation features set. For an ambiguous word w with n senses, let Sw = {ws1, ws2, ... , wsn } be the sense set. For the contextual features, we directly apply the Naive Bayes algorithm using Add-Lambda Smoothing to handle unknown words:</Paragraph>
      <Paragraph position="2"> To integrate the local collocation feature fj ) Fl with respect to each sense siw of w, we use the follows formula: )()()( 21 sisisi wscorewscorewscore *+= a (4) where a is tuned from experiments (Section 4.5), score1( siw ) refers the score of the topical contextual features based on formula (1) and score2( siw ) refers the score of collocation features with respect to the sense sjw of w defined below.</Paragraph>
      <Paragraph position="4"> where d(fj |sjw ) = 1 for fj ) Fl if the collocation occurs in the local context. Otherwise this term is set as 0.</Paragraph>
      <Paragraph position="5"> Finally, we choose the right skw so that )(maxarg sks wscores</Paragraph>
      <Paragraph position="7"/>
    </Section>
  </Section>
  <Section position="5" start_page="89" end_page="92" type="metho">
    <SectionTitle>
4 Experimental Results
</SectionTitle>
    <Paragraph position="0"> We have designed a set of experiments to compare the classifier with and without the collocation features. In system A, the classifier is built with local bi-gram features and topical contextual features. The classifier in system B is constructed from combining the local collocation features with topical features.</Paragraph>
    <Section position="1" start_page="89" end_page="90" type="sub_section">
      <SectionTitle>
4.1 Preparation of the Data Set
</SectionTitle>
      <Paragraph position="0"> We have selected 20 ambiguous words from nouns and verbs with the sense number as 4 in average. The sense definition is taken from HowNet [22]. To show the effect of the algorithm, we try to choose words with high degree of ambiguity, high frequency of use [23], and high frequency of constructing collocations. The selection of these 20 words is not completely random although within each criterion class we do try to pick word randomly.</Paragraph>
      <Paragraph position="1"> Based on the 20 words, we extracted 28,000 sentences from the 60 MB People's Daily News with segmentation information as our training/test set which is then manually sense-tagged. The collocation list is constructed from a combination of a digital collocation dictionary, a return result from a collocation automatic extraction system [21], and a hand collection from the People's Daily News. As we stated early, the sense of ambiguous words in the fixed collocations and strong collocations can be decided uniquely although they are not unique in loose collocations. For example, the ambiguous word &amp;quot;M6, &amp;quot; in the collocation &amp;quot;,XM6, &amp;quot; may have both the sense of &amp;quot;appearance|? &amp;quot; or &amp;quot;reputation |&amp;quot;. Therefore, when labeling the sense of collocations, we filter out the ones which cannot uniquely determine the sense of ambiguous words inside. However, this does not mean that loose collocations have no contribution in WSD classification. We simply reduce its weight when combining it with the contextual features compared with the fixed and strong collocations. The sense and collocation distribution over the 20 words on the training examples can be found in Table 1.</Paragraph>
      <Paragraph position="2">  T#: total number of sentences contain the ambiguous word s1- s6: sense no; co#: number of collocations in each sense</Paragraph>
    </Section>
    <Section position="2" start_page="90" end_page="90" type="sub_section">
      <SectionTitle>
4.2 The Effect of Collocation Features
</SectionTitle>
      <Paragraph position="0"> We recorded 6 trials with average precision over six-fold validation for each word. Their average precision for the six trials in the system A, and B can be found in Table 2 and Table 3. From Table 3, regarding to precision, there are 16 words have improved and 4 words remained the same in the system B. The results from the both system confirmed that collocation features do improve the precision. Note that 4 words have the same precision in the two systems, which fall into two cases. In the first case, it can be seen that these words already have very high precision in the system A (over 93%) which means that one sense dominates all other senses. In this case, the additional collation information is not necessary. In fact, when we checked the intermediate outputs, the score of the candidate senses of the ambiguous words contained in the collocations get improved. Even though, it would not change the result. Secondly, no collocation appeared in the sentences which are tagged incorrectly in the system A. This is confirmed when we check the error files. For example, the word &amp;quot;G&amp;}&amp;quot; with the sense as &amp;quot;!&amp;quot; (&amp;quot;closeness&amp;quot;) appeared in 4492 examples over the total 4885 examples (91.9%). In the mean time, 99% of collocation in its collocation list has the same sense of &amp;quot;!&amp;quot; (&amp;quot;closeness&amp;quot;). Only one collocation &amp;quot;G&amp;} &amp;quot; has the sense of &amp;quot; &amp;quot; (&amp;quot;power&amp;quot;). Therefore, the collocation features improved the score of sense &amp;quot;!&amp;quot; which is already the highest one based on the contextual features.</Paragraph>
      <Paragraph position="1"> As can be seen from Table 3, the collocation features work well for the sparse data. For example, the word &amp;quot;1u &amp;quot; in the training corpus has only one example with the sense &amp;quot; q&amp;quot; (&amp;quot;human&amp;quot;), the other 30 examples all have the sense &amp;quot;1u) &amp;quot; (&amp;quot;management&amp;quot;). Under this situation, the topical contextual features failed to identify the right sense for the only appearance of the</Paragraph>
      <Paragraph position="3"> ever, it can be correctly identified in the system B because the appearance of the collocation &amp;quot;</Paragraph>
      <Paragraph position="5"> To well show the effect of collocations on the accuracy of classifier for the task of WSD, we also tested both systems on SENSEVAL-3 data set, and the result is recorded in the Table 4. From the difference in the relative improvement of both data sets, we can see that collocation features work well when the statistical model is not sufficiently built up such as from a small corpus like SENSEVAL-3. Actually, in this case, the training examples appear in the corpus only once or twice so that the parameters for such sparse training examples may not be accurate to forecast the test examples, which convinces us that collocation features are effective on handling sparse training data even for unknown words. Fig. 1 shows the precision comparison in the system A, and B on SENVESAL-3.</Paragraph>
    </Section>
    <Section position="3" start_page="90" end_page="92" type="sub_section">
      <SectionTitle>
4.3 The Effect of Collocations on the Size
of Training Corpus Needed
</SectionTitle>
      <Paragraph position="0"> Hwee [21] stated that a large-scale, human sense-tagged corpus is critical for a supervised learning approach to achieve broad coverage and high accuracy WSD. He conducted a thorough study on the effect of training examples on the accuracy of supervised corpus based WSD.</Paragraph>
      <Paragraph position="1"> As the result showed, WSD accuracy continues to climb as the number of training examples increases. Similarly, we have tested the system A, and B with the different size of training corpus based on the PDN corpus we prepared. Our experiment results shown in Fig 2 follow the same fact. The purpose we did the testing is that we hope to disclose the effect of collocations on the size of training corpus needed. From Fig 2, we can see by using the collocation features, the precision of the system B has increased slower along with the growth of training examples than the precision of the system A. The result is reasonable because with collocation feature, the statistical contextual information over the entire corpus becomes side effect. Actually, as can be seen from Fig. 2, after using collocation features  in the system B, even we use 1/6 corpus as training, the precision is still higher than we use 5/6 train corpus in the system A.</Paragraph>
    </Section>
    <Section position="4" start_page="92" end_page="92" type="sub_section">
      <SectionTitle>
4.4 Investigation of Sense Distribution on the Effect of Collocation Features
</SectionTitle>
      <Paragraph position="0"> To investigate the effect of sense distribution on the usefulness of collocation features, we selected ambiguous words with the number of senses varying from 2 to 6. At each sense-count level, the words were selected randomly. Table 5 shows the results. From the table, we can see that collocation features work well when the sense distribution of an ambiguous word is even, which is exactly the case in which the classifier may otherwise get confused.</Paragraph>
      <Paragraph position="1">  o: 83% to 86% samples fall in one dominate sense</Paragraph>
    </Section>
    <Section position="5" start_page="92" end_page="92" type="sub_section">
      <SectionTitle>
4.5 The Test of α
</SectionTitle>
      <Paragraph position="0"> We have conducted a set of experiments based on both the PDN corpus and SENSEVLA-3 data to set the best value of a for the formula (4) described in Section 3.2. The best start value of a is tested based on the precision rate which is shown in Fig. 3. It is shown from the experiment that a takes the start value of 0.5 for both cor-</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>