<?xml version="1.0" standalone="yes"?> <Paper uid="W05-0601"> <Title>Effective use of WordNet semantics via kernel-based learning</Title> <Section position="3" start_page="0" end_page="1" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The large literature on term clustering, term similarity and weighting schemes shows that document similarity is a central topic in Information Retrieval (IR). Research efforts have mostly been directed at enriching the document representation by using clustering (term generalization) or adding compounds (term specifications). These studies are based on the assumption that the similarity between two documents can be expressed as the similarity between pairs of matching terms. Following this idea, term clustering methods based on corpus term distributions or on external prior knowledge (e.g. provided by WordNet) were used to improve basic term matching.</Paragraph> <Paragraph position="1"> An example of statistical clustering is given in (Bekkerman et al., 2001): a feature selection technique that clusters similar features/words, called the Information Bottleneck (IB), was applied to Text Categorization (TC). Such a cluster-based representation outperformed the simple bag-of-words on only one of the three collections tested. The effective use of external prior knowledge is even more difficult, since no attempt to improve document retrieval or text classification accuracy with it has ever been successful (e.g. see (Smeaton, 1999; Sussna, 1993; Voorhees, 1993; Voorhees, 1994; Moschitti and Basili, 2004)).</Paragraph> <Paragraph position="2"> The main problem of term-cluster-based representations seems to be the unclear nature of the relationship between the word and cluster information levels. Even if (semantic) clusters tend to improve the system recall, simple terms are, on a large scale, more accurate (e.g. (Moschitti and Basili, 2004)).</Paragraph> <Paragraph position="3"> To overcome this problem, hybrid spaces containing both terms and clusters were tested (e.g. (Scott and Matwin, 1999)), but the results again showed that the mixed statistical distributions of clusters and terms affect the overall accuracy either marginally or even negatively.</Paragraph> <Paragraph position="4"> In (Voorhees, 1993; Smeaton, 1999), clusters of synonymous terms as defined in WordNet (WN) (Fellbaum, 1998) were used for document retrieval.</Paragraph> <Paragraph position="5"> The results showed that the misleading information due to the wrong choice of local term senses causes the overall accuracy to decrease. Word sense disambiguation (WSD) was thus applied beforehand by indexing the documents by means of disambiguated senses, i.e. synset codes (Smeaton, 1999; Sussna, 1993; Voorhees, 1993; Voorhees, 1994; Moschitti and Basili, 2004). However, even state-of-the-art methods for WSD did not improve the accuracy, because of the inherent noise introduced by disambiguation mistakes. The above studies suggest that term clusters decrease the precision of the system, as they force weakly related or unrelated (in case of disambiguation errors) terms to contribute to the similarity function. The successful introduction of prior external knowledge relies on solving this problem.</Paragraph> <Paragraph position="6"> In this paper, we propose a model that introduces the semantic lexical knowledge contained in the WN hierarchy into a supervised text classification task. Intuitively, the main idea is that documents $d$ are represented through the set of all pairs in the vocabulary $\langle t, t' \rangle \in V \times V$ originating from the terms $t \in d$ and all the words $t' \in V$, e.g. the WN nouns. When the similarity between two documents is evaluated, their matching pairs contribute to the final score. The weight given to each term pair is proportional to the similarity that the two terms have in WN. Thus, the term $t$ of the first document contributes to the document similarity according to its relatedness with any of the terms of the second document, and the prior external knowledge provided by WN quantifies the individual term-to-term relatedness. This approach has two advantages: (a) we obtain a well-defined space which supports the similarity between terms of different surface forms based on external knowledge, and (b) we avoid explicitly defining term or sense clusters, which inevitably introduce noise.</Paragraph> <Paragraph position="7"> The class of spaces which embeds the above pair information may comprise $O(|V|^2)$ dimensions. If we consider only the WN nouns (about $10^5$), our space contains about $10^{10}$ dimensions, which is not manageable by most learning algorithms. Kernel methods can solve this problem, as they allow us to use an implicit space representation in the learning algorithms. Among them, Support Vector Machines (SVMs) (Vapnik, 1995) are kernel-based learners which achieve high accuracy in the presence of many irrelevant features. This is another important property, as the selection of the informative pairs is left to the SVM learning. Minimal sketches of both ideas are given below.</Paragraph>
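The following sketch illustrates the pair-based document similarity described above. It is a hypothetical construction, not the paper's code: NLTK's WordNet interface and its path similarity are used as an illustrative stand-in for the WN-based term similarity that Section 2 actually defines, and documents are assumed to be plain lists of nouns.

```python
# A minimal sketch of the pair-based document similarity, assuming NLTK's
# WordNet corpus (nltk.download("wordnet")) and path similarity as a
# stand-in for the paper's term-similarity measure.

from nltk.corpus import wordnet as wn

def term_similarity(t1, t2):
    """Best WordNet path similarity over all noun-sense pairs of t1 and t2.

    Returns 0.0 when either term is unknown to WordNet, so unrelated
    terms contribute nothing to the document similarity.
    """
    best = 0.0
    for s1 in wn.synsets(t1, pos=wn.NOUN):
        for s2 in wn.synsets(t2, pos=wn.NOUN):
            sim = s1.path_similarity(s2)
            if sim is not None and sim > best:
                best = sim
    return best

def document_kernel(doc1, doc2):
    """Sum WN similarities over all pairs <t, t'> with t in doc1, t' in doc2.

    The O(|V|^2) pair space is evaluated implicitly: no explicit feature
    vector over V x V is ever built.
    """
    return sum(term_similarity(t1, t2) for t1 in doc1 for t2 in doc2)

# Example: document_kernel(["dog", "cat"], ["puppy", "feline"]) yields a
# higher score than document_kernel(["dog", "cat"], ["bond", "stock"]),
# even though the surface forms share no term.
```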
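To make the kernel-method point concrete, the fragment below plugs `document_kernel` from the previous sketch into an SVM via a precomputed Gram matrix. scikit-learn and the cosine-style normalization are illustrative assumptions, not the paper's experimental setup.

```python
# A sketch of using the WN document kernel with an SVM, assuming
# scikit-learn and the document_kernel function defined above.

import math
import numpy as np
from sklearn.svm import SVC

def gram_matrix(docs_a, docs_b):
    """Gram matrix of the normalized WN document kernel."""
    # Normalize by self-similarity so document length does not dominate;
    # fall back to 1.0 for empty documents.
    norm_a = [math.sqrt(document_kernel(d, d)) or 1.0 for d in docs_a]
    norm_b = [math.sqrt(document_kernel(d, d)) or 1.0 for d in docs_b]
    return np.array([[document_kernel(da, db) / (na * nb)
                      for db, nb in zip(docs_b, norm_b)]
                     for da, na in zip(docs_a, norm_a)])

# Training with few labeled documents per category, as in the
# experiments discussed below:
# clf = SVC(kernel="precomputed")
# clf.fit(gram_matrix(train_docs, train_docs), train_labels)
# preds = clf.predict(gram_matrix(test_docs, train_docs))
```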
<Paragraph position="8"> Moreover, as we believe that prior knowledge in TC is not very useful when there is a sufficient amount of training documents, we evaluated our model under poor training conditions (e.g. at most 20 documents per category). The improvements in accuracy, observed on the classification of the well-known Reuters and 20 NewsGroups corpora, show that our document similarity model is very promising for general IR tasks: unlike previous attempts, it justifies the adoption of external semantic resources (i.e. WN) in IR.</Paragraph> <Paragraph position="9"> Section 2 introduces the WordNet-based term similarity. Section 3 defines the new document similarity measure, the kernel function and its use within SVMs. Section 4 presents the comparative results between the traditional linear and the WN-based kernels within SVMs. Section 5 provides a comparative discussion against the related IR literature. Finally, Section 6 draws the conclusions.</Paragraph> </Section> </Paper>