<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2501">
  <Title>Using WordNet-based Context Vectors to Estimate the Semantic Relatedness of Concepts</Title>
  <Section position="4" start_page="1" end_page="3" type="metho">
    <SectionTitle>
3 Gloss Vectors in Semantic Relatedness
</SectionTitle>
    <Paragraph position="0"> In this research, we create a Gloss Vector for each concept (or word sense) represented in a dictionary. While we use WordNet as our dictionary, the method can apply to other lexical resources.</Paragraph>
    <Section position="1" start_page="1" end_page="2" type="sub_section">
      <SectionTitle>
3.1 Creating Vectors from WordNet Glosses
</SectionTitle>
      <Paragraph position="0"> A Gloss Vector is a second order context vector formed by treating the dictionary definition of a  concept as a context, and finding the resultant of the first order context vectors of the words in the definition.</Paragraph>
      <Paragraph position="1"> In particular, we define a Word Space by creating first order context vectors for every word w that is not a stop word and that occurs above a minimum frequency in our corpus. The specific steps are as follows: 1. Initialize the first order context vector of w to the zero vector.</Paragraph>
      <Paragraph position="2"> 2. Find every occurrence of w in the given corpus. 3. For each occurrence of w, increment those dimensions of the context vector of w that correspond to words from the Word Space present within a given number of positions around w in the corpus.</Paragraph>
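The three steps above can be sketched as follows. This is a minimal illustration assuming a pre-tokenized corpus, a stop list, and a symmetric co-occurrence window; all function and variable names are invented for this example.

```python
from collections import Counter

def first_order_vectors(corpus_tokens, stop_words, min_freq=5, window=2):
    # Word Space: every non-stop word above the minimum frequency.
    freq = Counter(corpus_tokens)
    word_space = sorted(w for w, f in freq.items()
                        if f >= min_freq and w not in stop_words)
    index = {w: i for i, w in enumerate(word_space)}
    # Step 1: initialize each first order context vector to zero.
    vectors = {w: [0] * len(word_space) for w in word_space}
    # Steps 2 and 3: for every occurrence of w, count Word Space words
    # within `window` positions on either side.
    for pos, w in enumerate(corpus_tokens):
        if w not in vectors:
            continue
        lo = max(0, pos - window)
        hi = min(len(corpus_tokens), pos + window + 1)
        for ctx in corpus_tokens[lo:pos] + corpus_tokens[pos + 1:hi]:
            if ctx in index:
                vectors[w][index[ctx]] += 1
    return word_space, vectors
```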
      <Paragraph position="3"> The first order context vector of w, therefore, encodes the co-occurrence information of the word w. For example, consider the gloss of lamp - an artificial source of visible illumination. The Gloss Vector for lamp would be formed by adding the first order context vectors of artificial, source, visible and illumination.</Paragraph>
      <Paragraph position="4"> In these experiments, we use WordNet as the corpus of text for deriving first order context vectors. We take the glosses for all of the concepts in WordNet and view that as a large corpus of text. This corpus consists of approximately 1.4 million words, and results in a Word Space of approximately 20,000 dimensions, once low frequency and stop words are removed. We chose the WordNet glosses as a corpus because we felt the glosses were likely to contain content rich terms that would distinguish between the various concepts more distinctly than would text drawn from a more generic corpus. However, in our future work we will experiment with other corpora as the source of first order context vectors, and other dictionaries as the source of glosses.</Paragraph>
      <Paragraph position="5"> The first order context vectors as well as the Gloss Vectors usually have a very large number of dimensions (usually tens of thousands) and it is not easy to visualize this space. Figure 2 attempts to illustrate these vectors in two dimensions. The words tennis and food are the dimensions of this 2-dimensional space. We see that the first order context vector for serve is approximately halfway between tennis and food, since the word serve could</Paragraph>
    </Section>
    <Section position="2" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
Vector
</SectionTitle>
      <Paragraph position="0"> mean to &quot;serve the ball&quot; in the context of tennis, or could mean &quot;to serve food&quot; in another context.</Paragraph>
      <Paragraph position="1"> The first order context vectors for eat and cutlery are very close to food, since they do not have a sense that is related to tennis. The gloss for the word fork, &quot;cutlery used to serve and eat food&quot;, contains the words cutlery, serve, eat and food.</Paragraph>
      <Paragraph position="2"> The Gloss Vector for fork is formed by adding the first order context vectors of cutlery, serve, eat and food. Thus, fork has a Gloss Vector which is heavily weighted towards food. The concept of food, therefore, is in the same semantic space as and is related to the concept of fork.</Paragraph>
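The fork example can be made concrete in the two-dimensional Word Space of Figure 2, with dimensions (tennis, food). The vector values below are invented purely for illustration; they are not taken from the paper.

```python
import math

# Hand-set first order vectors over the dimensions (tennis, food).
first_order = {
    "cutlery": [0.0, 4.0],
    "serve":   [3.0, 3.0],
    "eat":     [0.5, 5.0],
    "food":    [1.0, 6.0],
}

def gloss_vector(gloss_words):
    # Resultant (element-wise sum) of the first order vectors of the
    # gloss words; words outside the Word Space are ignored.
    dims = len(next(iter(first_order.values())))
    total = [0.0] * dims
    for w in gloss_words:
        if w in first_order:
            total = [a + b for a, b in zip(total, first_order[w])]
    return total

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# Gloss of fork: "cutlery used to serve and eat food".
fork = gloss_vector(["cutlery", "used", "to", "serve", "and", "eat", "food"])
```

The resulting vector is heavily weighted toward the food dimension, and its cosine with the first order vector of food is close to 1, mirroring the intuition in the text.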
      <Paragraph position="3"> Similarly, we expect that in a high dimensional space, the Gloss Vector of fork would be heavily weighted towards all concepts that are semantically related to fork. Note, however, that this demonstration used only a small gloss to represent fork. Using the augmented glosses described in section 3.2, we obtain richer representations of concepts from which to build Gloss Vectors.</Paragraph>
    </Section>
    <Section position="3" start_page="2" end_page="3" type="sub_section">
      <SectionTitle>
3.2 Augmenting Glosses Using WordNet
Relations
</SectionTitle>
      <Paragraph position="0"> The formulation of the Gloss Vector measure described above is independent of the dictionary used and is independent of the corpus used. However, dictionary glosses tend to be rather short, and it is possible that even closely related concepts will be defined using different sets of words. Our belief is that two synonyms that are used in different glosses will tend to have similar Word Vectors (because their co-occurrence behavior should be similar). However, the brevity of dictionary glosses may still make it difficult to create Gloss Vectors that are truly representative of the concept.</Paragraph>
      <Paragraph position="1">  (Banerjee and Pedersen, 2003) encounter a similar issue when measuring semantic relatedness by counting the number of matching words between the glosses of two different concepts. They expand the glosses of concepts in WordNet with the glosses of concepts that are directly linked by a WordNet relation. We adopt the same technique here, and use the relations in WordNet to augment glosses for the Gloss Vector measure. We take the gloss of a given concept, and concatenate to it the glosses of all the concepts to which it is directly related according to WordNet. The Gloss Vector for that concept is then created from this large concatenated gloss.</Paragraph>
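The augmentation step can be sketched as follows. The tiny `glosses` and `relations` tables are invented stand-ins for WordNet glosses and relation links, not actual WordNet content.

```python
# Toy dictionary: concept -> gloss text.
glosses = {
    "fork":    "cutlery used to serve and eat food",
    "cutlery": "implements for cutting and eating food",
    "tine":    "prong on a fork",
}

# Toy relation table: concept -> directly related concepts
# (e.g. hypernyms, hyponyms, meronyms in WordNet).
relations = {
    "fork": ["cutlery", "tine"],
}

def augmented_gloss(concept):
    # Concatenate the concept's own gloss with the glosses of every
    # directly related concept; the Gloss Vector is then built from
    # this concatenated gloss.
    parts = [glosses[concept]]
    parts += [glosses[r] for r in relations.get(concept, [])]
    return " ".join(parts)
```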
    </Section>
  </Section>
  <Section position="5" start_page="3" end_page="3" type="metho">
    <SectionTitle>
4 Other Measures of Relatedness
</SectionTitle>
    <Paragraph position="0"> Below we briefly describe five alternative measures of semantic relatedness, and then go on to include them as points of comparison in our experimental evaluation of the Gloss Vector measure.</Paragraph>
    <Paragraph position="1"> All of these measures depend in some way upon WordNet. Four of them limit their measurements to nouns located in the WordNet is-a hierarchy.</Paragraph>
    <Paragraph position="2"> Each of these measures takes two WordNet concepts (i.e., word senses or synsets) c1 and c2 as input and returns a numeric score that quantifies their degree of relatedness.</Paragraph>
    <Paragraph position="3"> (Leacock and Chodorow, 1998) finds the path length between c1 and c2 in the is-a hierarchy of WordNet. The path length is then scaled by the depth of the hierarchy (D) in which they reside to obtain the relatedness of the two concepts.</Paragraph>
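Assuming the commonly cited form of this measure, rel(c1, c2) = -log(len(c1, c2) / 2D), where len is the shortest is-a path length and D is the depth of the taxonomy, a minimal sketch:

```python
import math

def lch(path_length, depth):
    # Leacock-Chodorow: negative log of the shortest is-a path length
    # scaled by twice the depth of the hierarchy. Shorter paths in a
    # deeper taxonomy yield higher relatedness.
    return -math.log(path_length / (2.0 * depth))
```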
    <Paragraph position="4"> (Resnik, 1995) introduced a measure that is based on information content, which are numeric quantities that indicate the specificity of concepts. These values are derived from corpora, and are used to augment the concepts in WordNet's is-a hierarchy. The measure of relatedness between two concepts is the information content of the most specific concept that both concepts have in common (i.e., their lowest common subsumer in the is-a hierarchy).</Paragraph>
    <Paragraph position="5"> (Jiang and Conrath, 1997) extends Resnik's measure to combine the information contents of c1, c2 and their lowest common subsumer.</Paragraph>
    <Paragraph position="6"> (Lin, 1998) also extends Resnik's measure, by taking the ratio of the shared information content to that of the individual concepts.</Paragraph>
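The three information-content measures can be sketched as below, assuming the IC values of the two concepts and of their lowest common subsumer are supplied externally. The inverted-distance form of Jiang-Conrath shown here is a common way to turn their distance into a relatedness score; that choice is an assumption of this sketch, not taken from the paper.

```python
def res(ic_lcs):
    # Resnik: information content of the lowest common subsumer.
    return ic_lcs

def jcn(ic_c1, ic_c2, ic_lcs):
    # Jiang-Conrath distance is IC(c1) + IC(c2) - 2 * IC(lcs);
    # relatedness is commonly taken as its inverse.
    return 1.0 / (ic_c1 + ic_c2 - 2.0 * ic_lcs)

def lin(ic_c1, ic_c2, ic_lcs):
    # Lin: ratio of shared information content to the total
    # information content of the individual concepts.
    return 2.0 * ic_lcs / (ic_c1 + ic_c2)
```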
    <Paragraph position="7"> (Banerjee and Pedersen, 2003) introduce Extended Gloss Overlaps, which is a measure that determines the relatedness of concepts proportional to the extent of overlap of their WordNet glosses. This simple definition is extended to take advantage of the complex network of relations in Word-Net, and allows the glosses of concepts to include the glosses of synsets to which they are directly related in WordNet.</Paragraph>
  </Section>
  <Section position="6" start_page="3" end_page="4" type="metho">
    <SectionTitle>
5 Evaluation
</SectionTitle>
    <Paragraph position="0"> As was done by (Budanitsky and Hirst, 2001), we evaluated the measures of relatedness in two ways.</Paragraph>
    <Paragraph position="1"> First, they were compared against human judgments of relatedness. Second, they were used in an application that would benefit from the measures.</Paragraph>
    <Paragraph position="2"> The effectiveness of the particular application was an indirect indicator of the accuracy of the relatedness measure used.</Paragraph>
    <Section position="1" start_page="3" end_page="4" type="sub_section">
      <SectionTitle>
5.1 Comparison with Human Judgment
</SectionTitle>
      <Paragraph position="0"> One obvious metric for evaluating a measure of semantic relatedness is its correspondence with the human perception of relatedness. Since semantic relatedness is subjective, and depends on the human view of the world, comparison with human judgments is a self-evident metric for evaluation.</Paragraph>
      <Paragraph position="1"> This was done by (Budanitsky and Hirst, 2001) in their comparison of five measures of semantic relatedness. We follow a similar approach in evaluating the Gloss Vector measure.</Paragraph>
      <Paragraph position="2"> We use a set of 30 word pairs from a study carried out by (Miller and Charles, 1991). These word pairs are a subset of 65 word pairs used by (Rubenstein and Goodenough, 1965), in a similar study almost 25 years earlier. In this study, human subjects assigned relatedness scores to the selected word pairs. The word pairs selected for this study ranged from highly related pairs to unrelated pairs.</Paragraph>
      <Paragraph position="3"> We use these human judgments for our evaluation.</Paragraph>
      <Paragraph position="4"> Each of the word pairs has been scored by humans on a scale of 0 to 5, where 5 is the most related. The mean of the scores assigned to each pair by all subjects is taken as the &quot;human relatedness score&quot; for that pair. The pairs are then ranked with respect to their scores: the most related pair is first on the list and the least related pair is at the end. We then have each of the measures of relatedness score the word pairs, and another ranking of the word pairs is created for each measure.</Paragraph>
      <Paragraph position="5"> Spearman's Rank Correlation Coefficient (Spearman, 1904) is used to assess the equivalence of two rankings. If the two rankings are exactly the same, the Spearman's correlation coefficient between them is 1. A completely reversed ranking gets a value of -1. The value is 0 when there is no relation between the rankings.</Paragraph>
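For rankings without ties, Spearman's coefficient reduces to the closed form rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the rank difference for each pair; a minimal sketch:

```python
def spearman(rank_x, rank_y):
    # Spearman's rho for two rankings of the same n items, no ties.
    n = len(rank_x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))
```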
      <Paragraph position="6"> We determine the correlation coefficient of the ranking of each measure with that of the human relatedness. We use the relatedness scores from both the human studies - the Miller and Charles study as well as the Rubenstein and Goodenough research. Table 1 summarizes the results of our experiment. We observe that the Gloss Vector has the highest correlation with humans in both cases.</Paragraph>
      <Paragraph position="7"> Note that in our experiments with the Gloss Vector measure, we have used not only the gloss of the concept but augmented that with the gloss of all the concepts directly related to it according to WordNet. We observed a significant drop in performance when we used just the glosses of the concept alone, showing that the expansion is necessary. In addition, the frequency cutoffs used to construct the Word Space played a critical role.</Paragraph>
      <Paragraph position="8"> The best setting of the frequency cutoffs removed both low and high frequency words, which eliminates two different sources of noise. Very low frequency words do not occur enough to draw distinctions among different glosses, whereas high frequency words occur in many glosses, and again do not provide useful information to distinguish among glosses.</Paragraph>
    </Section>
    <Section position="2" start_page="4" end_page="4" type="sub_section">
      <SectionTitle>
5.2 Application-based Evaluation
</SectionTitle>
      <Paragraph position="0"> An application-oriented comparison of five measures of semantic relatedness was presented in (Budanitsky and Hirst, 2001). In that study they evaluate five WordNet-based measures of semantic relatedness with respect to their performance in context sensitive spelling correction.</Paragraph>
      <Paragraph position="1"> We present the results of an application-oriented evaluation of the measures of semantic relatedness. Each of the six measures of semantic relatedness was used in a word sense disambiguation algorithm described by (Banerjee and Pedersen, 2003).</Paragraph>
      <Paragraph position="2"> Word sense disambiguation is the task of determining the meaning (from multiple possibilities) of a word in its given context. For example, in the sentence The ex-cons broke into the bank on Elm street, the word bank has the &quot;financial institution&quot; sense as opposed to the &quot;edge of a river&quot; sense. Banerjee and Pedersen attempt to perform this task by measuring the relatedness of the senses of the target word to those of the words in its context. The sense of the target word that is most related to its context is selected as the intended sense of the target word.</Paragraph>
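The selection strategy just described can be sketched generically; here `relatedness` stands in for any pairwise measure (for example the Gloss Vector cosine), and all names are hypothetical:

```python
def disambiguate(target_senses, context_words, relatedness):
    # Score each candidate sense of the target word by its total
    # relatedness to the words in the context, and pick the maximum.
    def score(sense):
        return sum(relatedness(sense, w) for w in context_words)
    return max(target_senses, key=score)
```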
      <Paragraph position="3"> The experimental data used for this evaluation is the SENSEVAL-2 test data. It consists of 4,328 instances (or contexts) that each includes a single ambiguous target word. Each instance consists of approximately 2-3 sentences and one occurrence of a target word. 1,754 of the instances include nouns as target words, while 1,806 are verbs and 768 are adjectives. We use the noun data to compare all six of the measures, since four of the measures are limited to nouns as input. The accuracy of disambiguation when performed using each of the measures for nouns is shown in Table 2.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="4" end_page="5" type="metho">
    <SectionTitle>
6 Gloss Vector Tuning
</SectionTitle>
    <Paragraph position="0"> As discussed in earlier sections, the Gloss Vector measure builds a word space consisting of first order context vectors corresponding to every word in a corpus. Gloss vectors are the resultant of a number of first order context vectors. All of these vectors encode semantic information about the concepts or the glosses that the vectors represent.</Paragraph>
    <Paragraph position="1"> We note that the quality of the words used as the dimensions of these vectors plays a pivotal role in getting accurate relatedness scores. We find that words that correspond to very specific concepts, and that are highly indicative of a few topics, make good dimensions. Words that are very general in nature and that appear in many different contexts add noise to the vectors.</Paragraph>
    <Paragraph position="2"> In an earlier section we discussed using stop words and frequency cutoffs to keep only the high &quot;information content&quot; words. In addition to those, we also experimented with a term frequency * inverse document frequency (tf * idf) cutoff.</Paragraph>
    <Paragraph position="3"> Term frequency and inverse document frequency are commonly used metrics in information retrieval. For a given word, term frequency (tf) is the number of times a word appears in the corpus.</Paragraph>
    <Paragraph position="4"> The document frequency is the number of documents in which the word occurs. Inverse document frequency (idf) is then computed as idf = log(Number of Documents / Document Frequency) (1). The tf * idf value is an indicator of the specificity of a word. The higher the tf * idf value, the lower the specificity.</Paragraph>
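These quantities can be sketched as below, under the assumption that each document is one tokenized gloss (the function name and input format are invented for illustration):

```python
import math

def tf_idf(word, documents):
    # tf: total occurrences of the word across the whole corpus.
    # df: number of documents (glosses) containing the word.
    # idf follows equation (1): log(number of documents / df).
    tf = sum(doc.count(word) for doc in documents)
    df = sum(1 for doc in documents if word in doc)
    return tf * math.log(len(documents) / df)
```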
    <Paragraph position="5"> Figure 3 shows a plot of tf * idf cutoff on the x-axis against the correlation of the Gloss Vector measure with human judgments on the y-axis.</Paragraph>
    <Paragraph position="6">  The tf * idf values ranged from 0 to about 4200.</Paragraph>
    <Paragraph position="7"> Note that we get lower correlation as the cutoff is raised.</Paragraph>
  </Section>
  <Section position="8" start_page="5" end_page="6" type="metho">
    <SectionTitle>
7 Analysis
</SectionTitle>
    <Paragraph position="0"> We observe from the experimental results that the Gloss Vector measure corresponds most closely with human judgments of relatedness (with a correlation of almost 0.9). We believe this is probably because the Gloss Vector measure most closely imitates the representation of concepts in the human mind. (Miller and Charles, 1991) suggest that the cognitive representation of a word is an abstraction derived from the contexts in which the person has encountered it. Their study also suggested that the semantic similarity of two words depends on the overlap between their contextual representations. The Gloss Vector measure uses the contexts of the words and creates a vector representation of these. The overlap between these vector representations is used to compute the semantic similarity of concepts.</Paragraph>
    <Paragraph position="1"> (Landauer and Dumais, 1997) additionally perform singular value decomposition (SVD) on their context vector representation of words and they show that reducing the number of dimensions of the vectors using SVD more accurately simulates learning in humans. We plan to try SVD on the Gloss Vector measure in future work.</Paragraph>
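A minimal sketch of the truncated-SVD dimensionality reduction mentioned above, on a small word-by-feature matrix, using numpy (the choice of tooling is an assumption; the paper does not specify one):

```python
import numpy as np

def reduce_dimensions(matrix, k):
    # Truncated SVD: keep the k largest singular values. Each row of
    # the result is a word's vector in the reduced k-dimensional space.
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]

# Tiny word-by-feature matrix; rows 0 and 2 are identical words.
m = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
reduced = reduce_dimensions(m, 2)
```

Identical rows of the input map to identical rows of the reduced matrix, so words with the same context profile stay together after reduction.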
    <Paragraph position="2"> In the application-oriented evaluation, the Gloss Vector measure performed relatively well (about 41% accuracy). However, unlike the human study, it did not outperform all the other measures. We think there are two possible explanations for this.</Paragraph>
    <Paragraph position="3"> First, the word pairs used in the human relatedness study are all nouns, and it is possible that the Gloss Vector measure performs better on nouns than on other parts of speech. In the application-oriented evaluation the measure had to make judgments for all parts of speech. Second, the application itself affects the performance of the measure. The Word Sense Disambiguation algorithm starts by selecting a context of 5 words from around the target word. These context words contain words from all parts of speech. Since the Jiang-Conrath measure assigns relatedness scores only to noun concepts, its behavior would differ from that of the Vector measure which would accept all words and would be affected by the noise introduced from unrelated concepts. Thus the context selection factors into the accuracy obtained. However, for evaluating the measure as being suitable for use in real applications, the Gloss Vector measure proves relatively accurate.</Paragraph>
    <Paragraph position="4"> The Gloss Vector measure can draw conclusions about any two concepts, irrespective of part of speech. The only other measure that can make this same claim is the Extended Gloss Overlaps measure. We would argue that Gloss Vectors present certain advantages over it. The Extended Gloss Overlap measure looks for exact string overlaps to measure relatedness. This &quot;exactness&quot; works against the measure, in that it misses potential matches that intuitively would contribute to the score (for example, silverware with spoon).</Paragraph>
    <Paragraph position="5"> The Gloss Vector measure is more robust than the Extended Gloss Overlap measure, in that exact matches are not required to identify relatedness.</Paragraph>
    <Paragraph position="6"> The Gloss Vector measure attempts to overcome this &quot;exactness&quot; by using vectors that capture the contextual representation of all words. So even though silverware and spoon do not overlap, their contextual representations would overlap to some extent.</Paragraph>
  </Section>
</Paper>