<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0415">
  <Title>Using LSA and Noun Coordination Information to Improve the Precision and Recall of Automatic Hyponymy Extraction</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Improving Precision Using Latent
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Semantic Analysis
</SectionTitle>
      <Paragraph position="0"> Solving all of the problems with pattern-based hyponymy extraction that we describe above would require near-human-level language understanding, but we have applied a far simpler technique for filtering out many of the incorrect and spurious extracted relations with good results, using a variant of latent semantic analysis (LSA) (Deerwester et al., 1990; Baeza-Yates and Ribeiro-Neto, 1999, p. 44). LSA is a method for representing words as points in a vector space, whereby words which are related in meaning should be represented by points which are near to one another. The LSA model we built is similar to that described in (Schütze, 1998). First, 1000 frequent content words (i.e. not on the stoplist)5 were chosen as "content-bearing words". Using these content-bearing words as column labels, the other words in the corpus were assigned row vectors by counting the number of times they occurred within a 15-word context window of a content-bearing word. Singular-value decomposition (Deerwester et al., 1990) was then used to reduce the number of dimensions from 1000 to 100. Similarity between two vectors (points) was measured using the cosine of the angle between them, in the same way as the similarity between a query and a document is often measured</Paragraph>
      [Table caption: relations (of 513 extracted) to which each of the authors assigned the five available scores.]
      <Paragraph position="1"> in information retrieval (Baeza-Yates and Ribeiro-Neto, 1999, p. 28). Effectively, we could use LSA to measure the extent to which two words x and y usually occur in similar contexts. This LSA similarity score will be called sim(x,y).</Paragraph>
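As an illustrative sketch only (not the system actually used), the pipeline just described -- co-occurrence counts against content-bearing words, SVD reduction, cosine similarity -- might look as follows; the corpus, window size, and number of dimensions below are toy values invented for the example.

```python
import numpy as np

def build_lsa_vectors(corpus_tokens, content_words, window=15, k=100):
    """Count how often each word occurs within a +/-window context of each
    content-bearing word, then reduce dimensions with a truncated SVD
    (the paper uses 1000 content words reduced to 100 dimensions)."""
    vocab = sorted(set(corpus_tokens))
    col = {w: j for j, w in enumerate(content_words)}
    row = {w: i for i, w in enumerate(vocab)}
    M = np.zeros((len(vocab), len(content_words)))
    for i, w in enumerate(corpus_tokens):
        lo, hi = max(0, i - window), min(len(corpus_tokens), i + window + 1)
        for c in corpus_tokens[lo:hi]:
            if c in col:
                M[row[w], col[c]] += 1
    # M = U S V^T; rows of U*S are the reduced word vectors
    U, S, _ = np.linalg.svd(M, full_matrices=False)
    return {w: U[i, :k] * S[:k] for w, i in row.items()}

def sim(vecs, x, y):
    """Cosine of the angle between the LSA vectors of x and y."""
    a, b = vecs[x], vecs[y]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0
```

On a toy corpus, words sharing contexts with the same content-bearing words end up with high cosine similarity, which is the property the filtering below relies on.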
      <Paragraph position="2"> Since we expect a hyponym and its hypernym to be semantically similar, we can use the LSA similarity between two terms as a test of the plausibility of a putative hyponymy relation between those terms. If their similarity is low, it is likely that they do not have a true and useful hyponymy relationship; the relation was probably extracted erroneously for one or more of the reasons listed above. If the similarity between two terms is high, we have increased confidence that a hyponymy relationship exists between them, because we know that they are at least in similar &amp;quot;semantic regions&amp;quot;.</Paragraph>
      <Paragraph position="3"> We ranked the 513 putative hyponym/hypernym pairs that we extracted from our trial excerpt of the BNC according to the similarity between the putative hypernym and the putative hyponym in each pair; i.e. for each pair x and y where the relationship y a60 x had been suggested, we calculated the cosine similarity sim(x,y), then we ranked the extracted relations from highest to lowest similarity. We then manually evaluated the accuracy of the top 100 extracted relations according to this ranking using the 5-point scale described in Section 2. We found that 58 of these 100 top-ranked relations received scores of 4 or 3 according to our &amp;quot;gold standard&amp;quot; annotations. Comparing this 58% precision with the 40% precision obtained on a random sample in Section 2, we determine that LSA achieved a 30% reduction in error (see Table 3 for a breakdown of annotation results by author).6 Thus LSA proved quite an effective filter. LSA provides broad-based semantic information learned statistically over many occurences of words; lexicosyntactic hyponymy extraction learns semantic information from specific phrases within a corpus. Thus we have benefitted from combining local patterns with statistical in6It should be noted that 24 of the top 100 hyponymy relations evaluated in this section were also in the randomly-chosen sample of 100 relations described in Section 2. Thus there were a total of 176 distinct hyponymy relations across both test sets. formation. Considered in analogy with the process by which humans learn from reading, we might think of the semantic information learned by LSA as background knowledge that is applied by the reader when determining what can accurately be gleaned from a particular sentence when it is read.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Improving Recall Using Coordination Information
</SectionTitle>
    <Paragraph position="0"> One of the main challenges facing hyponymy extraction is that comparatively few of the correct relations that might be found in text are expressed overtly by the simple lexicosyntactic patterns used in Section 2, as was apparent in the results presented in that section.</Paragraph>
    <Paragraph position="1"> This problem has been addressed by Caraballo (1999), who describes a system that first builds an unlabelled hierarchy of noun clusters using agglomerative bottom-up clustering of vectors of noun coordination information.</Paragraph>
    <Paragraph position="2"> The leaves of this hierarchy (corresponding to nouns) are assigned hypernyms using Hearst-style lexicosyntactic patterns. Internal nodes in the hierarchy are then labelled with hypernyms of the leaves they subsume according to a vote of these subsumed leaves.</Paragraph>
    <Paragraph position="3"> We proceed along similar lines, using noun coordination information and an alternative graph-based clustering method. We do not build a complete hierarchy, but our method nonetheless obtains additional hypernym-hyponym pairs not extracted by lexicosyntactic patterns. Our method is based on the following sort of inference.</Paragraph>
    <Paragraph position="4"> Consider the sentence "This is not the case with sugar, honey, grape must, cloves and other spices which increase its merit." (BNC), which provides evidence that clove is a kind of spice.</Paragraph>
    <Paragraph position="5"> Given this, the sentence "Ships laden with nutmeg or cinnamon, cloves or coriander once battled the Seven Seas to bring home their precious cargo." (BNC) might suggest that nutmeg, cinnamon, and coriander are also spices, because they appear to be similar to cloves. Thus we can learn the hyponymy relations nutmeg &lt; spice, cinnamon &lt; spice, and coriander &lt; spice that are not directly attested by lexicosyntactic patterns in our training corpus.</Paragraph>
    <Paragraph position="6"> This kind of information from coordination patterns has been used for work in automatic lexical acquisition (Riloff and Shepherd, 1997; Roark and Charniak, 1998; Widdows and Dorow, 2002). The basic rationale behind these methods is that words that occur together in lists are usually semantically similar in some way: for example, the phrase y1, y2, and y3 suggests that there is some link between y1 and y2, etc. Performing this analysis on a whole corpus results in a data structure which holds a collection of nouns and observed noun-noun relationships. If we think of the nouns as nodes and the noun-noun relationships as edges, this data structure is a graph (Bollobás, 1998), and combinatoric methods can be used to analyze its structure. Work using such techniques for lexical acquisition has proceeded by building classes of related words from a single "seed-word" with some desired property (such as being a representative of a particular semantic class). For example, in order to extract a class of words referring to kinds of disease from a corpus, you start with a single seed-word such as typhoid, and then find other nouns that occur in lists with typhoid. Using the graph model described above, Widdows and Dorow (2002) developed a combinatoric algorithm for growing clusters from a single seed-word, and used these methods to find correct new members for chosen categories with an accuracy of over 80%.</Paragraph>
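The noun-coordination graph can be sketched in a few lines. This is an illustration of the general idea, not the implementation used in the experiments; the input lists are invented from the examples above.

```python
import itertools
from collections import defaultdict

def coordination_graph(noun_lists):
    """Build a graph whose nodes are nouns and whose weighted edges count
    how often two nouns appear together in a coordination pattern
    such as 'y1, y2 and y3'."""
    edge_counts = defaultdict(int)
    for nouns in noun_lists:
        for a, b in itertools.combinations(nouns, 2):
            edge_counts[frozenset((a, b))] += 1
    graph = defaultdict(dict)
    for pair, weight in edge_counts.items():
        a, b = tuple(pair)
        graph[a][b] = weight
        graph[b][a] = weight
    return graph
```

Run over a whole corpus, the edge weights reflect the "fruit and vegetables" vs. "fruit and potatoes" asymmetry discussed below: same-level pairs accumulate many edges, cross-level pairs very few.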
    <Paragraph position="7"> The idea that certain patterns can be identified using finite-state techniques and used as evidence for semantic relationships is the same as Hearst's (1992), but appears to be more effective for finding just similar words rather than hypernyms because there are many more instances of simple coordination patterns than of hypernymy patterns--in the lists we used to extract these relationships, we see much more co-occurrence of words on the same ontological level than between words from different ontological levels. For example, in the BNC there are 211 instances of the phrase "fruit and vegetables" and 9 instances of "carrots and potatoes", but no instances of "fruit and potatoes", only 1 instance of "apples and vegetables", and so on.</Paragraph>
    <Paragraph position="8"> This sort of approach should be ideal for improving the recall of automatic hyponymy extraction, by using the hyponym from each of the correct hypernym/hyponym pairs as a seed-word for the category represented by the hypernym--for example, from the relationship clove &lt; spice, the word clove could be taken as a seed-word, with the assumption that words which frequently occur in co-ordination with clove are also names of spices.</Paragraph>
    <Paragraph position="9"> We used the algorithm of (Widdows and Dorow, 2002) on the British National Corpus to see if many more hyponymy relations would be extracted in this way. For each correct pair y &lt; x where y was a single-word hyponym of x discovered by the lexicosyntactic patterns of Section 2, we collected the 10 words most similar to y according to this algorithm and tested to see if these neighbors were also hyponyms of x.</Paragraph>
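As a simplified stand-in for the Widdows and Dorow (2002) clustering algorithm, neighbor selection over the coordination graph might be sketched as follows; the real algorithm is combinatoric rather than a plain frequency cut-off, so this is only an approximation of the seed-expansion step. (The tie-keeping behavior mirrors the detail, noted below, that a tie for 10th place meant both neighbors were used.)

```python
def top_neighbors(graph, seed, n=10):
    """Return the n nouns that most often coordinate with the seed,
    keeping any neighbors tied with the one in n-th place."""
    ranked = sorted(graph.get(seed, {}).items(), key=lambda kv: -kv[1])
    if n >= len(ranked):
        return [word for word, _ in ranked]
    cutoff = ranked[n - 1][1]           # count of the n-th neighbor
    return [word for word, count in ranked if count >= cutoff]
```

Each returned neighbor of a seed hyponym y is then proposed as a further hyponym of y's known hypernym x.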
    <Paragraph position="10"> Of the 176 extracted hyponyms that we evaluated by hand in the overlapping test sets described in Section 2 and Section 3, 95 were rated 4 or 3 on our 5-point scoring system (Section 2) by at least one of the authors. Considering these correct or nearly-correct relations in their hand-corrected form, we found that 45 of these 95 relations involved single-word hyponyms. (We restricted our attention to these 45 relations because the graph model was built using only single words as nodes in the graph.) This set of 45 correct "seed-pairs" was extended by another potential 459 pairs (slightly more than 10 for each seed-pair because if there was a tie for 10th place both neighbors were used). Of these, 211 (46%) were judged to be correct hypernym pairs and 248 (54%) were not.7 This accuracy compares favorably with the accuracy of 40% obtained for the raw hyponymy extraction experiments in Section 2, suggesting that inferring new relations by using corpus-based similarities to previously known relations is more reliable than trying to learn completely new relations even if they are directly attested in the corpus. However, our accuracy falls well short of the figure of 82% reported by Widdows and Dorow (2002).</Paragraph>
    <Paragraph position="11"> We believe this is because the classes in (Widdows and Dorow, 2002) are built from carefully selected seed-examples; ours are built from an uncontrolled sample of seed-examples extracted automatically from a corpus.</Paragraph>
    <Paragraph position="12"> We outline three cases where this causes a critical difference. The ambiguity of "mass": One of the correct hyponymy relations extracted in our experiments in Section 2 was mass &lt; religious service.</Paragraph>
    <Paragraph position="13"> Using mass as a seed suggested the following candidates as potential hyponyms of religious service:</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Seed Semantically Similar Words
</SectionTitle>
      <Paragraph position="0"> mass: length, weight, angle, shape, depth, height, range, charge, size, momentum. All these neighbors are related to the "measurement of physical property" sense of the word mass rather than the "religious service" sense. The inferred hyponymy relations are all incorrect because of this mismatch.</Paragraph>
      <Paragraph position="1"> The specific properties of "nitrogen": Another true relation we extracted was nitrogen &lt; nutrient. Using the same process as above gave the following neighbors of nitrogen:</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Seed Semantically Similar Words
</SectionTitle>
      <Paragraph position="0"> nitrogen: methane, dioxide, carbon, hydrogen, methanol, vapour, ammonia, oxide, oxygen, monoxide, water. These neighboring terms are not in general nutrients, and the attempt to infer new hyponymy relations is a failure in this case.7 (Footnote 7: As before, we consider scores of 4 and 3 on our 5-point scale to be correct and lower scores to be incorrect. The precision figures for graph-model results (reported in this section and in Section 5), unlike those reported elsewhere, are based on the annotations of a single author.)</Paragraph>
      <Paragraph position="1"> While the relationship nitrogen &lt; nutrient is one of the many facts which go to make up the vast store of world-knowledge that an educated adult uses for reasoning, it is not a necessary property of nitrogen itself, and one could arguably "know" the meaning of nitrogen without being aware of this fact. In traditional lexicographic terms, the fact that nitrogen is a nutrient might be regarded as part of the differentiae rather than the genus of nitrogen. Had our seed-pair instead been nitrogen &lt; gas or nitrogen &lt; chemical element, many correct hyponymy relations would have been inferred by our method, and both of these classifications are central to the meaning of nitrogen.</Paragraph>
      <Paragraph position="2"> Accurate levels of abstraction for "dill": Finally, even when the hyponymy relationship y &lt; x used as a seed-case was central to the meaning of y and all of the neighbors of y were related to this meaning, they were still not always hyponyms of x but sometimes members of a more general category. For example, using the correct seed-pair dill &lt; herb we retrieved the following suggested hyponyms for herb:</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Seed Semantically Similar Words
</SectionTitle>
      <Paragraph position="0"> dill: rind, fennel, seasoning, juice, sauce, pepper, parsley, vinegar, oil, purée. All of these items are related to dill, but only some of them are herbs. The other items should also be placed in the same general area of a taxonomy as dill, but as cooking ingredients rather than specifically herbs.</Paragraph>
      <Paragraph position="1"> In spite of these problems, the algorithm for improving recall by adding neighbors of the correct hyponyms worked reasonably well, obtaining 211 correct relationships from 45 seeds, an almost fivefold increase in recall, with an accuracy of 46%, which is better than that of our baseline pattern-matching hyponymy extractor.</Paragraph>
      <Paragraph position="2"> It is possible that using coordination (such as co-occurrence in lists) as a measure of noun-noun similarity is well-adapted for this sort of work, because it mainly extracts "horizontal" relationships between items of similar specificity or similar generality. Continuing the geometric analogy, these mainly "horizontal" relationships might be expected to combine particularly well with seed examples of "vertical" relationships, i.e. hyponymy relationships.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Combining LSA and Coordination to
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Improve Precision and Recall
</SectionTitle>
      <Paragraph position="0"> Having used two separate techniques to improve precision and recall in isolation, it made sense to combine our methods to improve performance overall. This was accomplished by applying LSA filtering as described in Section 3 to the results obtained by extending our initial hypernym pairs with coordination patterns in Section 4.</Paragraph>
      <Paragraph position="1"> LSA filtering of extended results: phase I. The first application of filtering to the additional hyponymy relations obtained using noun co-occurrence was straightforward. We took the 459 potential hyponymy relationships obtained in Section 4. For each of the prospective hyponyms y of a given hypernym x, we computed the LSA similarity sim(x,y). We then considered only those potential hyponyms whose LSA similarity to the hypernym surpassed a certain threshold. Using this technique with an experimentally determined threshold of 0.15, we obtained a set of 260 hyponymy relations of which 166 were correct (64%, as opposed to the 46% correct in the unfiltered results). The LSA filtering had removed 154 incorrect relationships and only 45 correct ones, reducing the overall error rate by 33%.</Paragraph>
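The phase I filter reduces to a one-line predicate over candidate pairs. A minimal sketch, with an invented similarity table standing in for the LSA model (the 0.15 threshold is the one reported above):

```python
def lsa_filter(candidates, sim, threshold=0.15):
    """Keep only candidate (hyponym, hypernym) pairs whose LSA
    similarity sim(x, y) reaches the threshold."""
    return [(y, x) for (y, x) in candidates if sim(x, y) >= threshold]

# Toy similarity scores standing in for sim(x, y)
toy_sim = {("spice", "nutmeg"): 0.40, ("religious service", "length"): 0.05}
kept = lsa_filter([("nutmeg", "spice"), ("length", "religious service")],
                  lambda x, y: toy_sim[(x, y)])
```

In this toy run the spurious religious service candidate falls below the threshold and is discarded, mirroring the behavior described for the mass neighbors below.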
      <Paragraph position="2"> In particular, this technique removed all but one of the spurious religious service hyponyms which were obtained through inappropriate similarities with mass in the example in Section 4, though it was much less effective in filtering the neighbors of nitrogen and dill, as might be expected.</Paragraph>
      <Paragraph position="3"> LSA filtering of extended results: phase II. For some of the hyponymy relations to which we applied our extension technique, the hypernym had multiple words.8 In some of these cases, it was clear that one of the words in the hypernym had a meaning more closely related to the original (correct) hyponym. For instance, in the mass &lt; religious service relation, the word religious tells us more about the appropriate meaning of mass than does the word service. It thus seemed that, at least in certain cases, we might be able to get more traction in LSA filtering of potential additional hyponyms by first selecting a particular word from the hypernym as the "most important" and using that word rather than the entire hypernym for filtering.9 (Footnote 9: For the phase I filtering, a term-vector was produced for the multiword hypernym by averaging the LSA vectors for the constituent words.) We thus applied a simple two-step algorithm to refine the filtering technique presented above: first, select the word in the multiword hypernym most closely related to the original hyponym; second, filter the candidate hyponyms by their LSA similarity to that word alone.</Paragraph>
      <Paragraph position="4"> This filtering technique, with an LSA-similarity threshold of 0.15, resulted in the extraction of 35 correct and 25 incorrect relationships. In contrast, using LSA similarity with the whole expression rather than the most important word resulted in the extraction of 32 correct and 30 incorrect relationships for those hypernyms with multiple words. On the face of it, selecting only the most important part of the hypernym for comparison enabled us to obtain more correct and fewer incorrect relations, but it is also clear that by this stage in our experiments our sample of seed-relationships had become too small for these results to be statistically significant.</Paragraph>
      <Paragraph position="5"> However, the examples we considered did demonstrate another point--that LSA could help to determine which parts of a multiword expression were semantically relevant. For example, one of the seed-relationships was France &lt; European Community member. Finding that sim(france,european) &gt; sim(france,community), we could infer that the adjective European was central to the meaning of the hypernym, whereas for the example wallflowers &lt; hardy biennials the opposite conclusion, that hardy is an adjectival modifier which isn't central to the relationship, could be drawn. However, these conclusions could also be drawn by using established collocation extraction techniques (Manning and Schütze, 1999, Ch. 5) to find semantically significant multiword expressions.</Paragraph>
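The "most important word" selection is a single argmax over the hypernym's constituents. A hedged sketch, with invented similarity scores modeled on the France / European Community member example above:

```python
def most_important_word(hyponym, hypernym_words, sim):
    """Pick the constituent of a multiword hypernym whose LSA vector is
    closest to the hyponym; that word is then used for phase II filtering."""
    return max(hypernym_words, key=lambda w: sim(hyponym, w))

# Toy scores consistent with sim(france,european) exceeding sim(france,community)
toy_sim = {("france", "european"): 0.50,
           ("france", "community"): 0.20,
           ("france", "member"): 0.10}
```

Given these scores, the function selects "european" as the semantically central word of the hypernym.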
    </Section>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6 Obtaining Canonical Forms for
Relations
</SectionTitle>
    <Paragraph position="0"> An important part of extracting semantic relations like those discussed in this paper is converting the terms in the extracted relations to a canonical form. In the case of our extracted hyponymy relations, such normalization consists of two steps: 1. Removing extraneous articles and qualifiers. Our extracted hyponyms and hypernyms were often in the form "another x", "some x", and so forth, where x is the hypernym or hyponym that we actually want to consider.</Paragraph>
    <Paragraph position="1"> 2. Converting nouns to their singular form. This is elementary morphological analysis, or a limited form of lemmatization.</Paragraph>
    <Paragraph position="2"> We performed the second of these steps using the morph morphological analysis software (Minnen et al., 2001).10 (Footnote 10: This software is freely available from http://www.cogs.susx.ac.uk/lab/nlp/carroll/morph.html.) To perform the first step of removing modifiers, we implemented a Perl script to do the following: * Remove leading determiners from the beginning of the hypernym and from the beginning of the hyponym. * Remove leading prepositions from the beginning of the hypernym. Doing this after removing leading determiners eliminates the common "those of" construction. * Remove cardinal numbers from the hypernym and the hyponym.</Paragraph>
    <Paragraph position="3"> * Remove possessive prefixes from the hypernym and the hyponym.</Paragraph>
    <Paragraph position="4"> * Remove &amp;quot;set of&amp;quot; and &amp;quot;number of&amp;quot; from the hypernym and the hyponym. This ad hoc but reasonable procedure eliminates common troublesome constructions not covered by the above rules.</Paragraph>
    <Paragraph position="5"> * Remove leading adjectives from hypernyms, but not from hyponyms. In addition to removing "other", this amounts to playing it safe. By removing leading adjectives we make potential hypernyms more general, and thus more likely to be a superset of their potential hyponym. While this removal sometimes makes the learned relationship less useful, it seldom makes it incorrect. We leave adjectives on hyponyms to make them more specific, and thus more likely to be a subset of their purported hypernym.</Paragraph>
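Most of the rules above (all except adjective removal, which needs part-of-speech information, and the morph lemmatization step) can be sketched as a small normalization function. The word lists here are illustrative stand-ins, not the exact ones from the Perl script:

```python
# Illustrative word lists; the actual script's lists may differ
DETERMINERS = {"a", "an", "the", "another", "some", "any", "other",
               "those", "these"}
PREPOSITIONS = {"of"}

def normalize(term, is_hypernym=False):
    """Strip leading determiners, leading prepositions (hypernyms only),
    cardinal numbers, and 'set of' / 'number of' constructions."""
    words = term.lower().split()
    def strip_leading(wordset):
        while words and words[0] in wordset:
            words.pop(0)
    strip_leading(DETERMINERS)
    if is_hypernym:
        # After determiner removal, this handles "those of ..." as well
        strip_leading(PREPOSITIONS)
        strip_leading(DETERMINERS)
    words = [w for w in words if not w.isdigit()]   # cardinal numbers
    if words[:2] in (["set", "of"], ["number", "of"]):
        words = words[2:]
    return " ".join(words)
```

For example, normalize("another spice") yields "spice", and normalize("those of the spices", is_hypernym=True) yields "spices".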
      <Paragraph position="6"> Using these simple rules, we were able to convert 73 of the 78 relations originally scored as 3 (see Section 2) to relations receiving a score of 4. This demonstrates as a "proof of concept" that comparatively simple language processing techniques can be used to map relationships from the surface forms in which they were observed in text to a canonical form which could be included in a semantic resource.</Paragraph>
  </Section>
</Paper>