<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1190">
<Title>Semi-Supervised Training of a Kernel PCA-Based Model for Word Sense Disambiguation</Title>
<Section position="3" start_page="0" end_page="0" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> Wu et al. (2004) propose an efficient and accurate new supervised learning model for word sense disambiguation (WSD) that exploits a nonlinear Kernel Principal Component Analysis (KPCA) technique to make predictions implicitly based on generalizations over feature combinations. Experiments on the Senseval-2 English lexical sample data show that the KPCA-based word sense disambiguation method is capable of outperforming other widely used WSD models, including naïve Bayes, maximum entropy, and SVM models.</Paragraph>
<Paragraph position="1"> Despite the excellent average performance of the supervised KPCA-based WSD model, our further error analysis has revealed certain limitations. In particular, the supervised KPCA-based model often performs poorly when it encounters target words whose contexts are highly dissimilar to those of any previously seen instances in the training set. Empirically, the supervised KPCA-based model nearly always disambiguates target words of this kind to the most frequent sense. As a result, for this particular subset of test instances, the precision achieved by the KPCA-based model is essentially no higher than that of the most-frequent-sense baseline (which simply always selects the most frequent sense of the target word). The work reported in this paper stems from the hypothesis that the most-frequent-sense strategy can be bettered for this category of errors.1</Paragraph>
<Paragraph position="2"> 1 The author would like to thank the Hong Kong Research Grants Council (RGC) for supporting this research in part through grants RGC6083/99E, RGC6256/00E, and DAG03/04.EG09.</Paragraph>
<Paragraph position="3"> Since this is a case of data sparseness, the observation should not be very surprising. Such behavior is to be expected of classifiers in general, not just of the KPCA-based model. Put another way, even though KPCA is able to generalize over combinations of dependent features, there must be a sufficient number of training instances from which to generalize.</Paragraph>
<Paragraph position="4"> The nature of KPCA, however, suggests a strategy that is not available to many other conventional WSD models. In this paper, we propose a model that takes advantage of unsupervised training on large quantities of unannotated corpora to help compensate for sparse data.</Paragraph>
<Paragraph position="5"> Note that although we use the WSD task to explain the model, the proposed model is not limited to WSD applications. We have hypothesized that the KPCA-based method is likely to be widely applicable to other NLP tasks; since data sparseness is a common problem in many NLP tasks, a weakly supervised approach that allows the KPCA-based method to compensate for data sparseness is highly desirable. The general technique we describe here is applicable to any similar classification task where insufficient labeled training data is available.</Paragraph>
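[Editorial illustration] To make the proposed strategy concrete, the following minimal Python sketch shows the general shape of the semi-supervised step: learn the kernel PCA projection from labeled and unlabeled contexts together, then train the supervised classifier only on the labeled projections. It uses scikit-learn's KernelPCA and LogisticRegression as stand-ins for the paper's own implementation; the toy "bank" contexts, bag-of-words features, and hyperparameters are illustrative assumptions, not the authors' actual setup.

    # A sketch of semi-supervised KPCA-based WSD (assumptions noted above;
    # not the authors' code).
    from sklearn.decomposition import KernelPCA
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    # Toy sense-labeled contexts for the ambiguous word "bank" (hypothetical).
    labeled_contexts = [
        "the bank raised interest rates",
        "she deposited cash at the bank",
        "they fished from the river bank",
    ]
    senses = ["finance", "finance", "shore"]

    # Unlabeled contexts drawn from a large unannotated corpus (hypothetical).
    unlabeled_contexts = [
        "the bank approved the loan application",
        "reeds grew along the muddy bank",
    ]

    # Simple bag-of-words context features (a simplification of the
    # feature set a real WSD system would use).
    vectorizer = CountVectorizer()
    X_all = vectorizer.fit_transform(labeled_contexts + unlabeled_contexts).toarray()

    # Semi-supervised step: the nonlinear principal components are learned
    # from labeled AND unlabeled contexts, so the projection can reflect
    # feature combinations absent from the sparse labeled training set.
    kpca = KernelPCA(n_components=2, kernel="rbf").fit(X_all)

    # Supervised step: the classifier sees only the labeled projections.
    X_labeled = kpca.transform(X_all[: len(labeled_contexts)])
    clf = LogisticRegression().fit(X_labeled, senses)

    # Disambiguate a new context by projecting it into the same KPCA space.
    X_new = kpca.transform(vectorizer.transform(["a loan from the bank"]).toarray())
    print(clf.predict(X_new))  # plausibly ["finance"]

Note that the unlabeled contexts influence only the learned feature space; no sense labels are needed for them, which is what lets the approach exploit large unannotated corpora.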
<Paragraph position="6"> The paper is organized as follows. After a brief look at related work, we review the baseline supervised WSD model, which is based on Kernel PCA. We then discuss how data sparseness affects the model, and propose a new semi-supervised model that takes advantage of unlabeled data, along with a composite model that combines both the supervised and semi-supervised models.</Paragraph>
<Paragraph position="7"> Finally, details of the experimental setup and comparative results are given.</Paragraph>
</Section>
</Paper>