<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1014">
  <Title>Meaningful Clustering of Senses Helps Boost Word Sense Disambiguation Performance</Title>
  <Section position="4" start_page="105" end_page="108" type="metho">
    <SectionTitle>
2 Producing a Coarse-Grained Sense Inventory
</SectionTitle>
    <Paragraph position="0"> Inventory In this section, we present an approach to the automatic construction of a coarse-grained sense inventory based on the mapping of WordNet senses to coarse senses in the Oxford Dictionary of English. In section 2.1, we introduce the two dictionaries, in Section 2.2 we illustrate the creation of sense descriptions from both resources, while in Section 2.3 we describe a lexical and a semantic method for mapping sense descriptions of Word-Net senses to ODE coarse entries.</Paragraph>
    <Section position="1" start_page="105" end_page="105" type="sub_section">
      <SectionTitle>
2.1 The Dictionaries
</SectionTitle>
      <Paragraph position="0"> WordNet (Fellbaum, 1998) is a computational lexicon of English which encodes concepts as synonym sets (synsets), according to psycholinguistic principles. For each word sense, WordNet provides a gloss (i.e. a textual definition) and a set of relations such as hypernymy (e.g. apple kind-of edible fruit), meronymy (e.g. computer has-part CPU), etc.</Paragraph>
      <Paragraph position="1"> The Oxford Dictionary of English (ODE) (Soanes and Stevenson, 2003)1 provides a hierarchical structure of senses, distinguishing between homonymy (i.e. completely distinct senses, like race as a competition and race as a taxonomic group) and polysemy (e.g. race as a channel and as a current). Each polysemous sense is further divided into a core sense and a set of subsenses. For each sense (both core and subsenses), the ODE provides a textual definition, and possibly hypernyms and domain labels. Excluding monosemous senses, the ODE has an average number of 2.56 senses per word compared to the average polysemy of 3.21 in WordNet on the same words (with peaks for verbs of 2.73 and 3.75 senses, respectively). null In Table 1 we show an excerpt of the sense inventories of the noun race as provided by both dictionaries2. The ODE identifies 3 homonyms and 3 polysemous senses for the first homonym, while WordNet encodes a flat list of 6 senses, some of which strongly related (e.g. race#1 and  convention w#p#i where w is a word, p a part of speech and i is a sense number; analogously, we denote an ODE sense with the convention w#p#h:k where h is the homonym number and k is the k-th polysemous entry under homonym h. root) which is not taken into account in WordNet.</Paragraph>
      <Paragraph position="2"> The structure of the ODE senses is clearly hierarchical: if we were able to map with a high accuracy WordNet senses to ODE entries, then a sense clustering could be trivially induced from the mapping. As a result, the granularity of the WordNet inventory would be drastically reduced. Furthermore, disregarding errors, the clustering would be well-founded, as the ODE sense groupings were manually crafted by expert lexicographers. In the next section we illustrate a general way of constructing sense descriptions that we use for determining a complete, automatic mapping between the two dictionaries.</Paragraph>
    </Section>
    <Section position="2" start_page="105" end_page="106" type="sub_section">
      <SectionTitle>
2.2 Constructing Sense Descriptions
</SectionTitle>
      <Paragraph position="0"> For each word w, and for each sense S of w in a given dictionary D 2 fWORDNET; ODEg, we construct a sense description dD(S) as a bag of words:</Paragraph>
      <Paragraph position="2"> where: + def D(S) is the set of words in the textual definition of S (excluding usage examples), automatically lemmatized and part-of-speech tagged with the RASP statistical parser (Briscoe and Carroll, 2002); + hyperD(S) is the set of direct hypernyms of S in the taxonomy hierarchy of D (; if hypernymy is not available); + domainsD(S) includes the set of domain labels possibly assigned to sense S (; when no domain is assigned).</Paragraph>
      <Paragraph position="3"> Specifically, in the case of WordNet, we generate def WN(S) from the gloss of S, hyperWN(S) from the noun and verb taxonomy, and domainsWN(S) from the subject field codes, i.e. domain labels produced semi-automatically by Magnini and Cavagli`a (2000) for each Word-Net synset (we exclude the general-purpose label, called FACTOTUM).</Paragraph>
      <Paragraph position="4"> For example, for the first WordNet sense of race#n we obtain the following description:</Paragraph>
      <Paragraph position="6"> In the case of the ODE, def ODE(S) is generated from the definitions of the core sense and the subsenses of the entry S. Hypernymy (for nouns only) and domain labels, when available, are included in the respective sets hyperODE(S)  indicate a subsense in the ODE, arrows (!) indicate hypernymy, DOMAIN LABELS are in small caps). race#n (WordNet) #1 Any competition (! contest).</Paragraph>
      <Paragraph position="7"> #2 People who are believed to belong to the same genetic stock (! group).</Paragraph>
      <Paragraph position="8"> #3 A contest of speed (! contest).</Paragraph>
      <Paragraph position="9"> #4 The flow of air that is driven backwards by an aircraft propeller (! flow).</Paragraph>
      <Paragraph position="10"> #5 A taxonomic group that is a division of a species; usually arises as a consequence of geographical isolation within a species (! taxonomic group).</Paragraph>
      <Paragraph position="11"> #6 A canal for a current of water (! canal).</Paragraph>
      <Paragraph position="12"> race#n (ODE) #1.1 Core: SPORT A competition between runners, horses, vehicles, etc. + RACING A series of such competitions for horses or dogs + A situation in which individuals or groups compete (! contest) + AS-TRONOMY The course of the sun or moon through the heavens (! trajectory).</Paragraph>
      <Paragraph position="13"> #1.2 Core: NAUTICAL A strong or rapid current (! flow). #1.3 Core: A groove, channel, or passage. + MECHANICS A water channel + Smooth groove or guide for balls (! indentation, conduit) + FARMING Fenced passageway in a stockyard (! route) + TEXTILES The channel along which the shuttle moves. #2.1 Core: ANTHROPOLOGY Division of humankind (! ethnic group). + The condition of belonging to a racial division or group + A group of people sharing the same culture, history, language + BIOLOGY A group of people descended from a common ancestor. #3.1 Core: BOTANY, FOOD A ginger root (! plant part). and domainsODE(S). For example, the first ODE sense of race#n is described as follows:</Paragraph>
      <Paragraph position="15"> Notice that, for every S, dD(S) is non-empty as a definition is always provided by both dictionaries. This approach to sense descriptions is general enough to be applicable to any other dictionary with similar characteristics (e.g. the Longman Dictionary of Contemporary English in place of ODE).</Paragraph>
    </Section>
    <Section position="3" start_page="106" end_page="108" type="sub_section">
      <SectionTitle>
2.3 Mapping Word Senses
</SectionTitle>
      <Paragraph position="0"> In order to produce a coarse-grained version of the WordNet inventory, we aim at defining an automatic mapping between WordNet and ODE, i.e.</Paragraph>
      <Paragraph position="1"> a function ,, : SensesWN ! SensesODE [ f+g, where SensesD is the set of senses in the dictionary D and + is a special element assigned when no plausible option is available for mapping (e.g.</Paragraph>
      <Paragraph position="2"> when the ODE encodes no entry corresponding to a WordNet sense).</Paragraph>
      <Paragraph position="3"> Given a WordNet sense S 2 SensesWN(w) we define ^m(S), the best matching sense in the ODE, as:</Paragraph>
      <Paragraph position="5"> is a function that measures the degree of matching between the sense descriptions of S and S0. We define the mapping ,, as:</Paragraph>
      <Paragraph position="7"> where is a threshold below which a matching between sense descriptions is considered unreliable. Finally, we define the clustering of senses c(w) of a word w as:</Paragraph>
      <Paragraph position="9"> where ,,!1(S0) is the group of WordNet senses mapped to the same sense S0 of the ODE, while the second set includes singletons of WordNet senses for which no mapping can be provided according to the definition of ,,.</Paragraph>
      <Paragraph position="10"> For example, an ideal mapping between entries in Table 1 would be as follows:</Paragraph>
      <Paragraph position="12"> resulting in the following clustering:</Paragraph>
      <Paragraph position="14"> In Sections 2.3.1 and 2.3.2 we describe two different choices for the match function, respectively based on the use of lexical and semantic information. null  As a first approach, we adopted a purely lexical matching function based on the notion of lexical overlap (Lesk, 1986). The function counts the number of lemmas that two sense descriptions of a word have in common (we neglect parts of speech), and is normalized by the minimum of the two description lengths:</Paragraph>
      <Paragraph position="16"> where S 2 SensesWN(w) and S0 2 SensesODE(w). For instance:</Paragraph>
      <Paragraph position="18"> Notice that unrelated senses can get a positive score because of an overlap of the sense descriptions. In the example, group#n, the hypernym of race#n#2, is also present in the definition of race#n#1:1.</Paragraph>
      <Paragraph position="19">  Unfortunately, the very same concept can be defined with entirely different words. To match definitions in a semantic manner we adopted a knowledge-based Word Sense Disambiguation algorithm, Structural Semantic Interconnections (SSI, Navigli and Velardi (2004)).</Paragraph>
      <Paragraph position="20"> SSI3 exploits an extensive lexical knowledge base, built upon the WordNet lexicon and enriched with collocation information representing semantic relatedness between sense pairs. Collocations are acquired from existing resources (like the Oxford Collocations, the Longman Language Activator, collocation web sites, etc.). Each collocation is mapped to the WordNet sense inventory in a semi-automatic manner and transformed into a relatedness edge (Navigli and Velardi, 2005).</Paragraph>
      <Paragraph position="21"> Given a word context C = fw1;:::;wng, SSI builds a graph G = (V;E) such that V =</Paragraph>
      <Paragraph position="23"> SensesWN(wi) and (S;S0) 2 E if there is at least one semantic interconnection between S and S0 in the lexical knowledge base. A semantic interconnection pattern is a relevant sequence of edges selected according to a manually-created context-free grammar, i.e. a path connecting a pair of word senses, possibly including a number of intermediate concepts. The grammar consists of a small number of rules, inspired by the notion of lexical chains (Morris and Hirst, 1991).</Paragraph>
      <Paragraph position="24"> SSI performs disambiguation in an iterative fashion, by maintaining a set C of senses as a semantic context. Initially, C = V (the entire set of senses of words in C). At each step, for each sense S in C, the algorithm calculates a score of the degree of connectivity between S and the other senses in C:</Paragraph>
      <Paragraph position="26"> where IC(S;S0) is the set of interconnections between senses S and S0. The contribution of a single interconnection is given by the reciprocal of its length, calculated as the number of edges connecting its ends. The overall degree of connectivity is then normalized by the number of contributing interconnections. The highest ranking sense S of word w is chosen and the senses of w are removed from the semantic context C. The algorithm terminates when either C = ; or there is no sense such that its score exceeds a fixed threshold.</Paragraph>
      <Paragraph position="27"> Given a word w, semantic matching is performed in two steps. First, for each dictionary D 2 fWORDNET; ODEg, and for each sense S 2 SensesD(w), the sense description of S is disambiguated by applying SSI to dD(S). As a result, we obtain a semantic description as a bag of concepts dsemD (S). Notice that sense descriptions from both dictionaries are disambiguated with respect to the WordNet sense inventory.</Paragraph>
      <Paragraph position="28"> Second, given a WordNet sense S 2 SensesWN(w) and an ODE sense S0 2 SensesODE(w), we define matchSSI(S;S0) as a function of the direct relations connecting senses in dsemWN (S) and dsemODE (S0):</Paragraph>
      <Paragraph position="30"> where c ! c0 denotes the existence of a relation edge in the lexical knowledge base between a concept c in the description of S and a concept c0 in the description of S0. Edges include the WordNet relation set (synonymy, hypernymy, meronymy, antonymy, similarity, nominalization, etc.) and the relatedness edge mentioned above (we adopt only direct relations to maintain a high precision).</Paragraph>
      <Paragraph position="31"> For example, some of the relations found between concepts in dsemWN (race#n#3) and</Paragraph>
      <Paragraph position="33"> contributing to the final value of the function on the two senses:</Paragraph>
      <Paragraph position="35"> Due to the normalization factor in the denominator, these values are generally low, but unrelated  senses have values much closer to 0. We chose SSI for the semantic matching function as it has the best performance among untrained systems on unconstrained WSD (cf. Section 4.1).</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="108" end_page="109" type="metho">
    <SectionTitle>
3 Evaluating the Clustering
</SectionTitle>
    <Paragraph position="0"> We evaluated the accuracy of the mapping produced with the lexical and semantic methods described in Sections 2.3.1 and 2.3.2, respectively.</Paragraph>
    <Paragraph position="1"> We produced a gold-standard data set by manually mapping 5,077 WordNet senses of 763 randomly-selected words to the respective ODE entries (distributed as follows: 466 nouns, 231 verbs, 50 adjectives, 16 adverbs). The data set was created by two annotators and included only polysemous words. These words had 2,600 senses in the ODE.</Paragraph>
    <Paragraph position="2"> Overall, 4,599 out of the 5,077 WordNet senses had a corresponding sense in ODE (i.e. the ODE covered 90:58% of the WordNet senses in the data set), while 2,053 out of the 2,600 ODE senses had an analogous entry in WordNet (i.e. WordNet covered 78:69% of the ODE senses). The WordNet clustering induced by the manual mapping was 49.85% of the original size and the average degree of polysemy decreased from 6:65 to 3:32.</Paragraph>
    <Paragraph position="3"> The reliability of our data set is substantiated by a quantitative assessment: 548 WordNet senses of 60 words were mapped to ODE entries by both annotators, with a pairwise mapping agreement of 92:7%. The average Cohen's * agreement between the two annotators was 0:874.</Paragraph>
    <Paragraph position="4"> In Table 2 we report the precision and recall of the lexical and semantic functions in providing the appropriate association for the set of senses having a corresponding entry in ODE (i.e. excluding the cases where a sense + was assigned by the manual annotators, cf. Section 2.3). We also report in the Table the accuracy of the two functions when we view the problem as a classification task: an automatic association is correct if it corresponds to the manual association provided by the annotators or if both assign no answer (equivalently, if both provide an + label). All the differences between Lesk and SSI are statistically significant (p &lt; 0:01).</Paragraph>
    <Paragraph position="5"> As a second experiment, we used two information-theoretic measures, namely entropy and purity (Zhao and Karypis, 2004), to compare an automatic clustering c(w) (i.e. the sense groups acquired for word w) with a manual clustering ^c(w). The entropy quantifies the distribution of the senses of a group over manually-defined groups, while the purity measures the extent to which a group contains senses primarily from one manual group.</Paragraph>
    <Paragraph position="6"> Given a word w, and a sense group G 2 c(w), the entropy of G is defined as:</Paragraph>
    <Paragraph position="8"> i.e., the entropy4 of the distribution of senses of group G over the groups of the manual clustering ^c(w). The entropy of an entire clustering c(w) is defined as:</Paragraph>
    <Paragraph position="10"> that is, the entropy of each group weighted by its size. The purity of a sense group G 2 c(w) is defined as:</Paragraph>
    <Paragraph position="12"> i.e., the normalized size of the largest subset of G contained in a single group ^G of the manual clustering. The overall purity of a clustering is obtained as a weighted sum of the individual cluster purities:</Paragraph>
    <Paragraph position="14"> We calculated the entropy and purity of the clustering produced automatically with the lexical and the semantic method, when compared to the grouping induced by our manual mapping (ODE), and to the grouping manually produced for the English all-words task at Senseval-2 (3,499 senses of 403 nouns). We excluded from both gold standards words having a single cluster. The figures are shown in Table 3 (good entropy and purity values should be close to 0 and 1 respectively).</Paragraph>
    <Paragraph position="15"> Table 3 shows that the quality of the clustering induced with a semantic function outperforms both lexical overlap and a random baseline. The baseline was computed averaging among 200 random clustering solutions for each word. Random 4Notice that we are comparing clusterings against the manual clustering (rather than viceversa), as otherwise a completely unclustered solution would result in 1.0 entropy and 0.0 purity.</Paragraph>
    <Paragraph position="16">  clusterings were the result of a random mapping function between WordNet and ODE senses. As expected, the automatic clusterings have a lower purity when compared to the Senseval-2 noun grouping as the granularity of the latter is much finer than ODE (entropy is only partially affected by this difference, indicating that we are producing larger groups). Indeed, our gold standard (ODE), when compared to the Senseval groupings, obtains a low purity as well (0:75) and an entropy of 0:13.</Paragraph>
  </Section>
  <Section position="6" start_page="109" end_page="110" type="metho">
    <SectionTitle>
4 Evaluating Coarse-Grained WSD
</SectionTitle>
    <Paragraph position="0"> The main reason for building a clustering of Word-Net senses is to make Word Sense Disambiguation a feasible task, thus overcoming the obstacles that even humans encounter when annotating sentences with excessively fine-grained word senses.</Paragraph>
    <Paragraph position="1"> As the semantic method outperformed the lexical overlap in the evaluations of previous Section, we decided to acquire a clustering on the entire WordNet sense inventory using this approach. As a result, we obtained a reduction of 33.54% in the number of entries (from 60,302 to 40,079 senses) and a decrease of the polysemy degree from 3:14 to 2:09. These figures exclude monosemous senses and derivatives in WordNet.</Paragraph>
    <Paragraph position="2"> As we are experimenting on an automaticallyacquired clustering, all the figures are affected by the 22.06% error rate resulting from Table 2.</Paragraph>
    <Section position="1" start_page="109" end_page="110" type="sub_section">
      <SectionTitle>
4.1 Experiments on Senseval-3
</SectionTitle>
      <Paragraph position="0"> As a first experiment, we assessed the effect of the automatic sense clustering on the English all-words task at Senseval-3 (Snyder and Palmer, 2004). This task required WSD systems to provide a sense choice for 2,081 content words in a set of 301 sentences from the fiction, news story, and editorial domains.</Paragraph>
      <Paragraph position="1"> We considered the three best-ranking WSD sys- null vised system, namely IRST-DDD (Strapparava et al., 2004). We also included SSI as it outperforms all the untrained systems (Navigli and Velardi, 2005). To evaluate the performance of the five systems on our coarse clustering, we considered a fine-grained answer to be correct if it belongs to the same cluster as that of the correct answer. Table 4 reports the performance of the systems, together with the first sense and the random baseline (in the last column we report the performance on the original fine-grained test set).</Paragraph>
      <Paragraph position="2"> The best system, Gambl, obtains almost 78% precision and recall, an interesting figure compared to 65% performance in the fine-grained WSD task. An interesting aspect is that the ranking across systems was maintained when moving from a fine-grained to a coarse-grained sense inventory, although two systems (SSI and IRST-DDD) show the best improvement.</Paragraph>
      <Paragraph position="3"> In order to show that the general improvement is the result of an appropriate clustering, we assessed the performance of Gambl by averaging its results when using 100 randomly-generated different clusterings. We excluded monosemous clusters from the test set (i.e. words with all the senses mapped to the same ODE entry), so as to clarify the real impact of properly grouped clusters. As a result, the random setting obtained 64:56% average accuracy, while the performance when adopting our automatic clustering was 70:84% (1,025/1,447 items).</Paragraph>
      <Paragraph position="4"> To make it clear that the performance improvement is not only due to polysemy reduction, we considered a subset of the Senseval-3 test set including only the incorrect answers given by the fine-grained version of Gambl (623 items). In other words, on this data set Gambl performs with 0% accuracy. We compared the performance of  Gambl when adopting our automatic clustering with the accuracy of the random baseline. The results were respectively 34% and 15.32% accuracy. These experiments prove that the performance in Table 4 is not due to chance, but to an effective way of clustering word senses. Furthermore, the systems in the Table are not taking advantage of the information given by the clustering (trained systems could be retrained on the coarse clustering). To assess this aspect, we performed a further experiment. We modified the sense inventory of the SSI lexical knowledge base by adopting the coarse inventory acquired automatically. To this end, we merged the semantic interconnections belonging to the same cluster. We also disabled the first sense baseline heuristic, that most of the systems use as a back-off when they have no information about the word at hand. We call this new setting SSI/ (as opposed to SSI used in Table 4).</Paragraph>
      <Paragraph position="5"> In Table 5 we report the results. The algorithm obtains an improvement of 9.8% recall and 3.1% precision (both statistically significant, p &lt; 0:05). The increase in recall is mostly due to the fact that different senses belonging to the same cluster now contribute together to the choice of that cluster (rather than individually to the choice of a fine-grained sense).</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="110" end_page="110" type="metho">
    <SectionTitle>
5 Related Work
</SectionTitle>
    <Paragraph position="0"> Dolan (1994) describes a method for clustering word senses with the use of information provided in the electronic version of LDOCE (textual definitions, semantic relations, domain labels, etc.). Unfortunately, the approach is not described in detail and no evaluation is provided.</Paragraph>
    <Paragraph position="1"> Most of the approaches in the literature make use of the WordNet structure to cluster its senses.</Paragraph>
    <Paragraph position="2"> Peters et al. (1998) exploit specific patterns in the WordNet hierarchy (e.g. sisters, autohyponymy, twins, etc.) to group word senses. They study semantic regularities or generalizations obtained and analyze the effect of clustering on the compatibility of language-specific wordnets. Mihalcea and Moldovan (2001) study the structure of WordNet for the identification of sense regularities: to this end, they provide a set of semantic and probabilistic rules. An evaluation of the heuristics provided leads to a polysemy reduction of 39% and an error rate of 5.6%. A different principle for clustering WordNet senses, based on the Minimum Description Length, is described by Tomuro (2001). The clustering is evaluated against WordNet cousins and used for the study of inter-annotator disagreement. Another approach exploits the (dis)agreements of human annotators to derive coarse-grained sense clusters (Chklovski and Mihalcea, 2003), where sense similarity is computed from confusion matrices.</Paragraph>
    <Paragraph position="3"> Agirre and Lopez (2003) analyze a set of methods to cluster WordNet senses based on the use of confusion matrices from the results of WSD systems, translation equivalences, and topic signatures (word co-occurrences extracted from the web). They assess the acquired clusterings against 20 words from the Senseval-2 sense groupings.</Paragraph>
    <Paragraph position="4"> Finally, McCarthy (2006) proposes the use of ranked lists, based on distributionally nearest neighbours, to relate word senses. This softer notion of sense relatedness allows to adopt the most appropriate granularity for a specific application.</Paragraph>
    <Paragraph position="5"> Compared to our approach, most of these methods do not evaluate the clustering produced with respect to a gold-standard clustering. Indeed, such an evaluation would be difficult and time-consuming without a coarse sense inventory like that of ODE. A limited assessment of coarse WSD is performed by Fellbaum et al. (2001), who obtain a large improvement in the accuracy of a maximum-entropy system on clustered verbs.</Paragraph>
  </Section>
class="xml-element"></Paper>