<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2501">
  <Title>Using WordNet-based Context Vectors to Estimate the Semantic Relatedness of Concepts</Title>
  <Section position="3" start_page="1" end_page="1" type="intro">
    <SectionTitle>
2 Second Order Context Vectors
</SectionTitle>
    <Paragraph position="0"> Context vectors are widely used in Information Retrieval and Natural Language Processing. Most often they represent first order co-occurrences, which are simply words that occur near each other in a corpus of text. For example, police and car are likely first order co-occurrences since they commonly occur together. A first order context vector for a given word would simply indicate all the first order co-occurrences of that word as found in a corpus.</Paragraph>
    <Paragraph position="1"> However, our Gloss Vector measure is based on second order co-occurrences (Sch&amp;quot;utze, 1998). For example, if car and mechanic are first order cooccurrences, then mechanic and police would be second order co-occurrences since they are both first order co-occurrences of car.</Paragraph>
    <Paragraph position="2"> Sch&amp;quot;utze's method starts by creating a Word Space, which is a co-occurrence matrix where each row can be viewed as a first order context vector. Each cell in this matrix represents the frequency with which two words occur near one another in a corpus of text. The Word Space is usually quite large and sparse, since there are many words in the corpus and most of them don't occur near each other. In order to reduce the dimensionality and the amount of noise, non-content stop words such as the, for, a, etc. are excluded from being rows or columns in the Word Space.</Paragraph>
    <Paragraph position="3"> Given a Word Space, a context can then be represented by second order co-occurrences (context vector). This is done by finding the resultant of the first order context vectors corresponding to each of the words in that context. If a word in a context does not have a first order context vector created for it, or if it is a stop word, then it is excluded from the resultant.</Paragraph>
    <Paragraph position="4"> For example, suppose we have the following context: The paintings were displayed in the art gallery.</Paragraph>
    <Paragraph position="5"> The second order context vector would be the resultant of the first order context vectors for painting, display, art, and gallery. The words were, in, and the are excluded from the resultant since we consider them as stop words in this example. Figure 1 shows how the second order context vector might be visualized in a 2-dimensional space.</Paragraph>
    <Paragraph position="6">  Intuitively, the orientation of each second order context vector is an indicator of the domains or topics (such as biology or baseball) that the context is associated with. Two context vectors that lie close together indicate a considerable contextual overlap, which suggests that they are pertaining to the same meaning of the target word.</Paragraph>
  </Section>
class="xml-element"></Paper>