<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1036">
  <Title>Feature Vector Quality and Distributional Similarity</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Empirical Analysis of Lin98 and
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Vector Quality Measure
</SectionTitle>
      <Paragraph position="0"> To gain a better understanding of distributional similarity we first analyzed the empirical behavior of Lin98, as a representative of the state of the art (see Section 5.1 for corpus details).</Paragraph>
      <Paragraph position="1"> As mentioned in the Introduction, distributional similarity may not correspond very tightly to meaning entailing substitutability. Under this judgment criterion two main types of errors occur: (1) word pairs that are of similar semantic types, but are not substitutable, like firm and government; and (2) word pairs that are of different semantic types, like firm and contract, which might (or might not) be related only at a topical level. Table 1 shows the top most similar words for the target word country according to Lin98. The two error types are easily recognized, e.g. world and city for the first type, and economy for the second.</Paragraph>
      <Paragraph position="2"> A deeper look at the word feature vectors reveals typical reasons for such errors. In many cases, high ranking features in a word vector, when sorting the features by their weight, do not seem very characteristic for the word meaning. This is demonstrated in Table 2, which shows the top-10 features in the vector of country. As can be seen, some of the top features are either too specific (landlocked, airspace), and so are less reliable, or too general (destination, ambition), and hence not indicative and may co-occur with many different types of words. On the other hand, more characteristic features, like population and governor, occur further down the list, at positions 461 and 832.</Paragraph>
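The feature ranking inspected in Table 2 can be sketched as follows. The corpus counts, the PMI weighting (standing in for the MI-based weights of Lin98), and all names below are illustrative assumptions, not the paper's actual data:

```python
import math
from collections import Counter

def pmi_weights(pair_counts, total):
    """Weight each (word, feature) pair by pointwise mutual information,
    a stand-in for the MI-style weighting used by Lin98 (illustrative)."""
    w_tot, f_tot = Counter(), Counter()
    for (w, f), c in pair_counts.items():
        w_tot[w] += c
        f_tot[f] += c
    return {(w, f): math.log((c / total) / ((w_tot[w] / total) * (f_tot[f] / total)))
            for (w, f), c in pair_counts.items()}

# Toy co-occurrence counts (hypothetical):
pairs = Counter({("country", "borders"): 8, ("country", "landlocked"): 2,
                 ("country", "destination"): 3, ("city", "destination"): 5})
weights = pmi_weights(pairs, sum(pairs.values()))

# Sort country's features by weight, as done for Table 2:
ranked = sorted((f for (w, f) in weights if w == "country"),
                key=lambda f: -weights[("country", f)])
print(ranked)
```

Note how the rare feature landlocked gets a PMI score as high as the frequent borders, illustrating why highly weighted features are not necessarily the most characteristic ones.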
      <Paragraph position="3"> Overall, features that characterize well the word meaning are scattered across the ranked list, while many non-indicative features receive high weights.</Paragraph>
      <Paragraph position="4"> This may yield high similarity scores for less similar word pairs, while missing other correct similarities. An objective indication of the problematic feature ranking is revealed by examining the common features that contribute most to the similarity score of a pair of similar words. We look at the common features of the two words and sort them by the sum of their weights in the two word vectors (which is the numerator of Lin's sim formula in Section 2.1). Table 3 shows the top-10 common features for a pair of substitutable words (country - state) and a pair of non-substitutable words (country - economy). In both cases the common features are scattered across each feature vector, making it difficult to distinguish between similar and non-similar word pairs.</Paragraph>
      <Paragraph position="5"> We suggest that the desired behavior of feature ranking is that the common features of truly similar words will be concentrated at the top ranks of their vectors. The common features for non-similar words are expected to be scattered all across each of the vectors. More formally, given a pair of similar words (judged as substitutable) w and v we define the top joint feature rank criterion for evaluating feature vector quality: rank_n(w,v) = (1/(2n)) * SUM_{f in F_n(w,v)} [rank(w,f) + rank(v,f)], where rank(w,f) is the feature's position in the sorted vector of the word w, and F_n(w,v) is the set of the n top joint features to consider (top-n), when sorted by the sum of their weights in the two word vectors. We thus expect that a good weighting function would yield (on average) a low top-rank score for truly similar words.</Paragraph>
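A minimal sketch of this criterion; the function name, the dict representation of weighted vectors, and the toy weights are our own illustrative assumptions:

```python
def top_joint_feature_rank(vec_w, vec_v, n=10):
    """Average position, in each word's weight-sorted vector, of the n
    common features with the highest summed weight. Low scores mean the
    shared features sit near the top of both vectors (sketch of the
    criterion above)."""
    def ranks(vec):
        # rank 1 = highest-weighted feature
        order = sorted(vec, key=lambda f: -vec[f])
        return {f: i + 1 for i, f in enumerate(order)}
    rw, rv = ranks(vec_w), ranks(vec_v)
    top_joint = sorted(set(vec_w) & set(vec_v),
                       key=lambda f: -(vec_w[f] + vec_v[f]))[:n]
    return sum(rw[f] + rv[f] for f in top_joint) / (2 * len(top_joint))

# Toy weighted vectors (hypothetical):
country = {"population": 5.0, "borders": 4.0, "airspace": 1.0}
state = {"population": 4.5, "borders": 3.5, "governor": 2.0}
print(top_joint_feature_rank(country, state, n=2))  # → 1.5
```

Here the two shared features occupy ranks 1 and 2 in both vectors, so the score is the ideal (1+2)/2 = 1.5; scattered common features would push it far higher.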
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Relative Feature Focus (RFF)
</SectionTitle>
    <Paragraph position="0"> Motivated by the observations above we propose a new feature weight function, called relative feature focus (RFF). The basic idea is to promote features which characterize many words that are highly similar to w. These features are considered as having a strong &amp;quot;focus&amp;quot; around w's meaning. Features which do not characterize sufficiently many words that are sufficiently similar to w are demoted. Even if such features happen to have a strong direct association with w they are not considered reliable, as they do not have sufficient statistical mass in w's semantic vicinity.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 RFF Definition
</SectionTitle>
      <Paragraph position="0"> RFF is defined as follows. First, a standard word similarity measure sim is computed to obtain initial approximation of the similarity space (Lin98 was used in this work). Then, we define the word set of a feature f, denoted by WS(f), as the set of words for which f is an active feature. The semantic neighborhood of w, denoted by N(w), is defined as the set of all words v which are considered sufficiently similar to w, satisfying sim(w,v)&gt;s where s is a threshold (0.04 in our experiments). RFF is then defined by: RFF(w,f) = SUM_{v in WS(f) ∩ N(w)} sim(w,v).</Paragraph>
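The definition can be sketched directly from WS(f) and N(w); the data structures and names below are illustrative assumptions:

```python
def rff(w, feature_sets, sims, threshold=0.04):
    """Relative Feature Focus for every feature of w: sum the similarity
    scores of w's sufficiently similar neighbors (sim > threshold) that
    are also characterized by the feature. `feature_sets` maps each word
    to its set of active features; `sims[w]` maps candidate neighbors of
    w to similarity scores. A sketch of the definition above."""
    neighborhood = {v: s for v, s in sims[w].items() if s > threshold}  # N(w)
    weights = {}
    for f in feature_sets[w]:
        ws_f = {v for v, feats in feature_sets.items() if f in feats}  # WS(f)
        weights[f] = sum(neighborhood[v] for v in ws_f & neighborhood.keys())
    return weights

# Toy data (hypothetical):
feats = {"country": {"population", "airspace"},
         "state": {"population"},
         "nation": {"population"}}
sims = {"country": {"state": 0.5, "nation": 0.25}}
w = rff("country", feats, sims)
print(w["population"], w["airspace"])  # → 0.75 0
```

The feature shared with both close neighbors gets a high weight, while the feature occurring only with the target word itself gets none, matching the intended "focus" behavior.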
      <Paragraph position="1"> That is, we identify all words v that are in the semantic neighborhood of w and are also characterized by f and sum their similarities to w.</Paragraph>
      <Paragraph position="2"> Notice that RFF is a sum of word similarity values rather than being a direct function of word-feature association values (which is the more common approach). It thus does not depend on the exact co-occurrence level between w and f. Instead, it depends on a more global assessment of the association between f and the semantic vicinity of w.</Paragraph>
      <Paragraph position="3"> Unlike the entropy measure, used in (Grefenstette, 1994), our &amp;quot;focused&amp;quot; global view is relative to each individual word w and is not a global independent function of the feature.</Paragraph>
      <Paragraph position="4"> We notice that summing the above similarity values captures simultaneously a desired balance between feature specificity and generality, addressing the observations in Section 3. Some features might characterize just a single word that is very similar to w. But then the sum of similarities will include a single element, yielding a relatively low weight.1 General features may characterize more words within N(w), but then on average the similarity with w over multiple words is likely to become lower, contributing smaller values to the sum. A reliable feature has to characterize multiple words (not too specific) that are highly similar to w (not too general).</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Re-computing Similarities
</SectionTitle>
      <Paragraph position="0"> Once RFF weights have been computed they are sufficiently accurate to allow for aggressive feature reduction. In our experiments it sufficed to use only the top 100 features for each word in order to obtain optimal results, since the most informative features now have the highest weights. Similarity between words is then recomputed over the reduced vectors using Lin's sim function (in Section 2.1), with RFF replacing MI as the new weight function.</Paragraph>
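Under the same assumed dict representation of weighted vectors, the reduction and re-computation step might look like this (a sketch, not the paper's implementation):

```python
def reduce_vector(weights, k=100):
    """Keep only the k highest-weighted features (aggressive reduction)."""
    top = sorted(weights, key=lambda f: -weights[f])[:k]
    return {f: weights[f] for f in top}

def lin_sim(vec_w, vec_v):
    """Lin's similarity scheme over weighted vectors: summed weights of
    shared features divided by summed weights of all features. Plugging
    in RFF values as the weights gives the re-computed similarity."""
    shared = set(vec_w) & set(vec_v)
    num = sum(vec_w[f] + vec_v[f] for f in shared)
    den = sum(vec_w.values()) + sum(vec_v.values())
    return num / den if den else 0.0

# Toy RFF-weighted vectors (hypothetical), reduced to top-2 features:
vec_w = reduce_vector({"a": 2.0, "b": 1.0, "c": 0.1}, k=2)
vec_v = reduce_vector({"a": 2.0, "d": 1.0}, k=2)
print(lin_sim(vec_w, vec_v))
```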
      <Paragraph position="1"> 1 This is why the sum of similarities is used rather than an average value, which might become too high by chance when computed over just a single element (or very few elements).</Paragraph>
      <Paragraph position="2"> [Table caption fragment: &amp;quot;... words by the RFF / Lin98 methods.&amp;quot;]</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Evaluation
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 Experimental Setting
</SectionTitle>
      <Paragraph position="0"> The performance of the RFF-based similarity measure was evaluated for a sample of nouns and compared with that of Lin98. The experiment was conducted using an 18-million-token subset of the Reuters RCV1 corpus,2 parsed by Lin's Minipar dependency parser (Lin, 1993). We first considered an evaluation based on WordNet data as a gold standard, as in (Lin, 1998; Weeds and Weir, 2003).</Paragraph>
      <Paragraph position="1"> However, we found that many word pairs from the Reuters Corpus that are clearly substitutable are not linked appropriately in WordNet.</Paragraph>
      <Paragraph position="2"> We therefore conducted a manual evaluation based on the judgments of two human subjects.</Paragraph>
      <Paragraph position="3"> The judgment criterion follows common evaluations of paraphrase acquisition (Lin and Pantel, 2001), (Barzilay and McKeown, 2001), and corresponds to the meaning-entailing substitutability criterion discussed in Section 1. Two words are judged as substitutable (correct similarity) if there are some contexts in which one of the words can be substituted by the other, such that the meaning of the original word can be inferred from the new one.</Paragraph>
      <Paragraph position="4"> Typically substitutability corresponds to certain ontological relations. Synonyms are substitutable in both directions. For example, worker and employee entail each other's meanings, as in the context &amp;quot;high salaried worker/employee&amp;quot;. Hyponyms typically entail their hypernyms. For example, dog entails animal, as in &amp;quot;I have a dog&amp;quot; which entails &amp;quot;I have an animal&amp;quot; (but not vice versa). In some cases part-whole and member-set relations satisfy the meaning-entailing substitutability criterion. For example, a discussion of division entails in many contexts the meaning of company. Similarly, the plural form of employee(s) often entails the meaning of staff. On the other hand, non-synonymous words that share a common hypernym (cohyponyms) like company and government, or country and city, are not substitutable since they always refer to different meanings (such as different entities).</Paragraph>
      <Paragraph position="5"> Our test set included a sample of 30 randomly selected nouns whose corpus frequency is above 500. 2 Known as Reuters Corpus, Volume 1, English Language, 1996-08-20 to 1997-08-19.</Paragraph>
      <Paragraph position="6"> For each noun we computed the top 40 most similar words by both similarity measures, yielding a total set of about 1600 (different) suggested word similarity pairs. Two independent assessors were assigned, each judging half of the test set (800 pairs). The output pairs from both methods were mixed so the assessor could not relate a pair with the method that suggested it.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.2 Similarity Results
</SectionTitle>
      <Paragraph position="0"> The evaluation results are displayed in Table 4.</Paragraph>
      <Paragraph position="1"> As can be seen, RFF outperformed Lin98 by 9-10 percentage points of precision at all Top-N levels, by both judges. Overall, RFF extracted 111 (21%) more correct similarity pairs than Lin98. The overall relative recall3 of RFF is quite high (89%), exceeding that of Lin98 (73%) by 16 percentage points. These figures indicate that our method covers most of the correct similarities found by Lin98, while identifying many additional correct pairs.</Paragraph>
      <Paragraph position="2"> We note that the obtained precision values for both judges are very close at all table rows. To further assess human agreement level for this task the first author of this paper judged two samples of 100 word pairs each, which were selected randomly from the two test sets of the original judges. The proportions of matching decisions between the author's judgments and the original ones were 91.3% (with Judge 1) and 88.9% (with Judge 2).</Paragraph>
      <Paragraph position="3"> The corresponding Kappa values are 0.83 (&amp;quot;very good agreement&amp;quot;) and 0.75 (&amp;quot;good agreement&amp;quot;). As for feature reduction, vector sizes were reduced on average to about one third of their original size in the Lin98 method (recall that standard feature reduction, tuned for the corpus, was already applied to the Lin98 vectors).</Paragraph>
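Agreement figures of this kind come from a standard Cohen's kappa computation; the toy labels below are illustrative, not the actual judgments:

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement between two annotators,
    corrected for the agreement expected by chance given each
    annotator's own label distribution."""
    n = len(labels_a)
    p_obs = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    cats = set(labels_a) | set(labels_b)
    p_exp = sum((labels_a.count(c) / n) * (labels_b.count(c) / n) for c in cats)
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical substitutability judgments by two annotators:
a = ["yes", "yes", "no", "no"]
b = ["yes", "no", "no", "no"]
print(cohens_kappa(a, b))  # → 0.5
```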
    </Section>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Relative recall shows the percentage of correct word similarities found by each method relative to the joint set of similarities that were extracted by both methods.
</SectionTitle>
    <Paragraph position="0"/>
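This footnote's definition amounts to the following computation; the pair sets are hypothetical:

```python
def relative_recall(correct_a, correct_b):
    """Relative recall of method A: the share of all correct pairs found
    by either method that method A itself found (per the footnote's
    definition)."""
    joint = correct_a | correct_b
    return len(correct_a) / len(joint)

# Hypothetical sets of correct pairs extracted by each method:
found_by_rff = {("country", "state"), ("country", "nation"), ("worker", "employee")}
found_by_lin98 = {("country", "state"), ("worker", "staff")}
print(relative_recall(found_by_rff, found_by_lin98))  # → 0.75
```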
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.3 Empirical Observations for RFF
</SectionTitle>
      <Paragraph position="0"> We now demonstrate the typical behavior of RFF relative to the observations and motivations of Section 3 (through the same example).</Paragraph>
      <Paragraph position="1"> Table 5 shows the top-10 features of country.</Paragraph>
      <Paragraph position="2"> We observe (subjectively) that the list now contains quite indicative and reliable features, where too specific (anecdotal) and too general features were demoted (compare with Table 2).</Paragraph>
      <Paragraph position="3"> More objectively, Table 6 shows that most of the top-10 common features for country-state are now ranked highly for both words. On the other hand, there are only two common features for the incorrect pair country-economy, both with quite low ranks (compare with Table 3). Overall, given the set of all the correct (judged as substitutable) word similarities produced by both methods, the average top joint feature rank of the top-10 common features by RFF is 21, satisfying the desired behavior which was suggested in Section 3. The same figure is much larger for the Lin98 vectors, which have an average top joint feature rank of 105.</Paragraph>
      <Paragraph position="4"> Consequently, Table 7 shows a substantial improvement in the similarity list for country, where most incorrect words, like economy and company, disappeared. Instead, additional correct similarities, like kingdom and land, were promoted (compare with Table 1). Some semantically related but non-substitutable words, like &amp;quot;world&amp;quot; and &amp;quot;city&amp;quot;, still remain in the list, but are somewhat demoted. In this case all the errors reflect quite close semantic relatedness, as all are geographic concepts.</Paragraph>
      <Paragraph position="5"> The remaining errors are mostly of the first type discussed in Section 3 - pairs of words that are ontologically or thematically related but are not substitutable. Typical examples are co-hyponyms (country - city) or agent-patient and agent-action pairs (industry - product, worker - job). Usually, such word pairs also have highly ranked common features since they naturally appear with similar characteristic features. It may therefore be difficult to filter out such non-substitutable similarities solely by the standard distributional similarity scheme, suggesting that additional mechanisms are required.</Paragraph>
    </Section>
  </Section>
</Paper>