<?xml version="1.0" standalone="yes"?> <Paper uid="W04-2406"> <Title>Word Sense Discrimination by Clustering Contexts in Vector and Similarity Spaces</Title> <Section position="4" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Previous Work </SectionTitle> <Paragraph position="0"> (Pedersen and Bruce, 1997) and (Pedersen and Bruce, 1998) propose a (dis)similarity based discrimination approach that computes (dis)similarity among each pair of instances of the target word. This information is recorded in a (dis)similarity matrix whose rows/columns represent the instances of the target word that are to be discriminated. The cell entries of the matrix show the degree to which the pair of instances represented by the corresponding row and column are (dis)similar. The (dis)similarity is computed from the first order context vectors of the instances which show each instance as a vector of features that directly occur near the target word in that instance.</Paragraph> <Paragraph position="1"> (Sch&quot;utze, 1998) introduces second order context vectors that represent an instance by averaging the feature vectors of the content words that occur in the context of the target word in that instance. These second order context vectors then become the input to the clustering algorithm which clusters the given contexts in vector space, instead of building the similarity matrix structure.</Paragraph> <Paragraph position="2"> There are some significant differences in the approaches suggested by Pedersen and Bruce and by Sch&quot;utze. As yet there has not been any systematic study to determine which set of techniques results in better sense discrimination. In the sections that follow, we highlight some of the differences between these approaches.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Context Representation </SectionTitle> <Paragraph position="0"> Pedersen and Bruce represent the context of each test instance as a vector of features that directly occur near the target word in that instance. We refer to this representation as the first order context vector. Sch&quot;utze, by contrast, uses the second order context representation that averages the first order context vectors of individual features that occur near the target word in the instance. Thus, Sch&quot;utze represents each feature as a vector of words that occur in its context and then computes the context of the target word by adding the feature vectors of significant content words that occur near the target word in that context.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Features </SectionTitle> <Paragraph position="0"> Pedersen and Bruce use a small number of local features that include co-occurrence and part of speech information near the target word. They select features from the same test data that is being discriminated, which is a common practice in clustering in general. Sch&quot;utze represents contexts in a high dimensional feature space that is created using a separate large corpus (referred to as the training corpus). He selects features based on their frequency counts or log-likelihood ratios in this corpus.</Paragraph> <Paragraph position="1"> In this paper, we adopt Sch&quot;utze's approach and select features from a separate corpus of training data, in part because the number of test instances may be relatively small and may not be suitable for selecting a good feature set. 
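To make the contrast between the two representations concrete, the following sketch builds both from a toy set of contexts. It is our illustration, not code from either paper: the sentences, the use of all non-target words as features, the cosine measure, and the reuse of the same toy contexts to build the feature co-occurrence vectors are all simplifying assumptions (Schütze derives feature vectors from a separate training corpus).

```python
# Toy sketch (illustrative assumptions, not the authors' code) of first
# order context vectors, the similarity matrix they induce, and second
# order context vectors built by averaging feature vectors.
import numpy as np

instances = [
    "draw a straight line between the two points",
    "the line between the points was straight",
    "please hold the line while I transfer your call",
    "she waited on the line for the operator",
]
target = "line"

# Assumed feature set: every non-target word in the toy contexts.
features = sorted({w for inst in instances for w in inst.split() if w != target})
f_index = {f: i for i, f in enumerate(features)}

def first_order(instance):
    # First order context vector (Pedersen and Bruce): counts of the
    # features occurring directly in this instance's context.
    vec = np.zeros(len(features))
    for w in instance.split():
        if w != target:
            vec[f_index[w]] += 1.0
    return vec

fo = np.array([first_order(inst) for inst in instances])

# Pairwise similarity matrix over instances (cosine); this matrix, not
# the vectors themselves, is what similarity-space clustering sees.
unit = fo / np.linalg.norm(fo, axis=1, keepdims=True)
sim_matrix = unit @ unit.T

# Feature vectors: first order context vectors of the feature words.
# Schütze builds these from a separate training corpus; for brevity we
# reuse the same toy contexts here.
cooc = np.zeros((len(features), len(features)))
for inst in instances:
    ws = [w for w in inst.split() if w != target]
    for a in ws:
        for b in ws:
            if a != b:
                cooc[f_index[a], f_index[b]] += 1.0

def second_order(instance):
    # Second order context vector (Schütze): average of the feature
    # vectors of the words occurring near the target word.
    ws = [w for w in instance.split() if w != target]
    return np.mean([cooc[f_index[w]] for w in ws], axis=0)

so = np.array([second_order(inst) for inst in instances])
print(np.round(sim_matrix, 2))  # input for similarity-space clustering
print(so.shape)                 # input for vector-space clustering
```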
<Section position="2" start_page="0" end_page="0" type="sub_section">
<SectionTitle>
2.2 Features
</SectionTitle>
<Paragraph position="0"> Pedersen and Bruce use a small number of local features that include co-occurrence and part of speech information near the target word. They select features from the same test data that is being discriminated, which is a common practice in clustering in general. Schütze represents contexts in a high dimensional feature space that is created using a separate large corpus (referred to as the training corpus). He selects features based on their frequency counts or log-likelihood ratios in this corpus.</Paragraph>
<Paragraph position="1"> In this paper, we adopt Schütze's approach and select features from a separate corpus of training data, in part because the number of test instances may be relatively small and therefore unsuitable for selecting a good feature set. In addition, this makes it possible to explore variations in the training data while maintaining a consistent test set. Since the training data used in unsupervised clustering does not need to be sense tagged, in future work we plan to develop methods for collecting very large amounts of raw text from the Web and other online sources and using it to extract features.</Paragraph>
<Paragraph position="2"> Schütze represents each feature as a vector of the words that co-occur with that feature in the training data. These feature vectors are in fact the first order context vectors of the feature words (not of the target word). The words that co-occur with the feature words form the dimensions of the feature space. Schütze reduces the dimensionality of this feature space using Singular Value Decomposition (SVD), which is also employed by related techniques such as Latent Semantic Indexing (Deerwester et al., 1990) and Latent Semantic Analysis (Landauer et al., 1998). SVD has the effect of converting a word level feature space into a concept level semantic space that smooths over the fine distinctions between features representing similar concepts.</Paragraph>
</Section>
<Section position="3" start_page="0" end_page="0" type="sub_section">
<SectionTitle>
2.3 Clustering Space
</SectionTitle>
<Paragraph position="0"> Pedersen and Bruce represent instances in a (dis)similarity space, where each instance can be seen as a point and the distance between any two points is a function of their mutual (dis)similarity. The (dis)similarity matrix holding the pair-wise (dis)similarities among the instances is given as the input to an agglomerative clustering algorithm. The context group discrimination method used by Schütze, on the other hand, operates on the vector representations of the instances and thus works in vector space. He also employs a hybrid clustering approach that uses both an agglomerative algorithm and the Expectation Maximization (EM) algorithm.</Paragraph>
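To ground the distinction, the self-contained sketch below clusters toy data in both spaces: agglomerative clustering over a pairwise dissimilarity matrix on one hand, and SVD-reduced vectors clustered with an agglomeratively seeded EM on the other. The synthetic data, dimensionalities, linkage methods, and the SciPy/scikit-learn routines are our assumptions for illustration; neither paper's actual implementation is reproduced here.

```python
# Sketch (assumed toy data, not either paper's implementation) of
# clustering in similarity space versus vector space.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.decomposition import TruncatedSVD
from sklearn.mixture import GaussianMixture

K = 2  # assumed number of senses
rng = np.random.default_rng(0)

# Stand-ins for second order context vectors of 20 instances over 50
# features, drawn as two loose groups that mimic two senses.
contexts = np.vstack([
    rng.normal(0.0, 1.0, size=(10, 50)),
    rng.normal(2.0, 1.0, size=(10, 50)),
])

# Similarity space (Pedersen and Bruce): the clustering algorithm sees
# only the pairwise (dis)similarity matrix, never the vectors.
unit = contexts / np.linalg.norm(contexts, axis=1, keepdims=True)
dissim = 1.0 - unit @ unit.T  # cosine dissimilarity
np.fill_diagonal(dissim, 0.0)
condensed = squareform(dissim, checks=False)
labels_pb = fcluster(linkage(condensed, method="average"), K, criterion="maxclust")

# Vector space (Schütze): SVD folds word level dimensions into a smaller
# concept level space before clustering.
reduced = TruncatedSVD(n_components=5, random_state=0).fit_transform(contexts)

# Hybrid clustering: agglomerative labels seed the means for EM.
seed = fcluster(linkage(reduced, method="ward"), K, criterion="maxclust")
means = np.array([reduced[seed == k].mean(axis=0) for k in range(1, K + 1)])
em = GaussianMixture(n_components=K, means_init=means, random_state=0)
labels_sc = em.fit_predict(reduced)

print(labels_pb)  # labels from similarity-space clustering
print(labels_sc)  # labels from vector-space hybrid clustering
```
</Section>
</Section>
</Paper>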