<?xml version="1.0" standalone="yes"?> <Paper uid="W04-0822"> <Title>Augmenting Ensemble Classification for Word Sense Disambiguation with a Kernel PCA Model</Title> <Section position="4" start_page="0" end_page="0" type="evalu"> <SectionTitle> 3 Results and discussion </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Accuracy </SectionTitle> <Paragraph position="0"> Table 1 summarizes the results of the submitted systems along with the individual voting models. Since our models attempted to disambiguate all test instances, we report accuracy (precision and recall being equal). Earlier experiments on Senseval-2 data showed that the KPCA-based model significantly outperformed both naïve Bayes and maximum entropy models (Wu et al., 2004). On the Senseval-3 data, the maximum entropy model fares slightly better: it remains significantly worse on the Multilingual (ts) task, but achieves statistically the same accuracy on the English (fine) task and is slightly more accurate on the Multilingual (t) task. [Table caption fragments: "... models on the Senseval-3 Lexical Sample tasks"; "Percentages representing disagreement between KPCA and other voting models are shown in bold"; confusion-matrix header: kpca vs. me / boost / nb, incorrect / correct per task.]</Paragraph> <Paragraph position="1"> For unknown reasons--possibly the very small number of training instances per Chinese target word, as mentioned earlier--there is an exception on the Chinese task, where boosting outperforms the KPCA-based model.
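Since every test instance is attempted, precision and recall collapse to a single accuracy figure; a minimal sketch of that bookkeeping (function and variable names are ours, purely illustrative, not from the paper):

```python
def accuracy(predicted, gold):
    """With full coverage (every instance labeled), precision = recall
    = correct / total, so one number summarizes the system."""
    assert len(predicted) == len(gold)
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)
```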
We are investigating the possible causes.</Paragraph> <Paragraph position="2"> The naïve Bayes model remains significantly worse under all conditions.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Differentiated voting bias </SectionTitle> <Paragraph position="0"> For a new voting model to raise the accuracy of an existing classifier ensemble, it is not only important that the new voting model achieve accuracy comparable to that of the other voters, as shown above, but also that it provide a prediction bias significantly differentiated from that of the other voters. Otherwise, the accuracy is typically hurt rather than helped by the new voting model.</Paragraph> <Paragraph position="1"> To examine whether the KPCA-based model satisfies this requirement, we compared its predictions against each of the other classifiers (for those tasks where we have been given the answer key). Table 2 shows nine confusion matrices revealing the percentage of instances where the KPCA-based model votes differently from one of the other voters. The disagreement between KPCA and the other voting models ranges from 6.03% to 14.63%, as shown by the bold entries in the confusion matrices. Note that where there is disagreement, the KPCA-based model predicts the correct sense with significantly higher accuracy in nearly all cases.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Voting effectiveness </SectionTitle> <Paragraph position="0"> The KPCA-based model exhibits the accuracy and differentiation characteristics requisite for an effective additional voter, as shown in the foregoing sections. [Table 3 fragment: HKUST comb2 (me, boost, nb, kpca): 71.4 / 78.6 / 66.2 / 62.0 / 63.8.] To verify that adding the KPCA-based model to the voting ensemble indeed improves accuracy, we compared our voting ensemble's accuracies to those obtained with KPCA removed.
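As a rough sketch of how such a classifier ensemble can be combined by majority vote (the paper does not spell out its combination or tie-breaking rule; here, hypothetically, ties go to the model listed first):

```python
from collections import Counter

def ensemble_predict(model_predictions):
    """model_predictions: mapping from model name to a list of predicted
    senses, one per test instance. Returns the majority-vote sense for
    each instance; Counter preserves insertion order on ties, so the
    first-listed model's vote wins a tie."""
    per_model = list(model_predictions.values())
    n_instances = len(per_model[0])
    combined = []
    for i in range(n_instances):
        votes = [preds[i] for preds in per_model]
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined
```

Dropping one entry from `model_predictions` (e.g. the KPCA column) and re-scoring gives exactly the with/without-KPCA comparison described here.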
The results, shown in Table 3, confirm that the KPCA-based model generally helps on the Senseval-3 Lexical Sample tasks. The only exception is on Chinese, due to the aforementioned anomaly of boosting outperforming KPCA on that task. In the Multilingual (t) and (ts) cases, the improvement in accuracy is significant.</Paragraph> </Section> </Section> </Paper>