<?xml version="1.0" standalone="yes"?> <Paper uid="W06-2503"> <Title>Relating WordNet Senses for Word Sense Disambiguation</Title> <Section position="5" start_page="18" end_page="19" type="metho"> <SectionTitle> 3 Methods for producing RLISTs </SectionTitle> <Paragraph position="0"> JCN This is a measure from the WordNet Similarity package (Patwardhan and Pedersen, 2003), originally proposed as a distance measure (Jiang and Conrath, 1997). JCN uses corpus data to populate classes (synsets) in the WordNet hierarchy with frequency counts. Each synset is incremented with the frequency counts from the corpus of all words belonging to that synset, directly or via the hyponymy relation. The frequency data is used to calculate the &quot;information content&quot; (IC) of a class (IC(s) = -log(p(s))), and with this, Jiang and Conrath specify a distance measure: Dist_jcn(s1, s2) = IC(s1) + IC(s2) - 2 IC(s3)</Paragraph> <Paragraph position="2"> where the third class (s3) is the most informative, or most specific, superordinate synset of the two senses s1 and s2. This is transformed from a distance measure into a similarity measure in the WN-Similarity package by taking the reciprocal: jcn(s1, s2) = 1 / Dist_jcn(s1, s2)</Paragraph> <Paragraph position="4"> We use raw BNC data for calculating IC values.</Paragraph> <Paragraph position="5"> DIST We use a distributional similarity measure (Lin, 1998) to obtain a fixed number (50) of the top-ranked nearest neighbours for the target nouns. As input we used grammatical relation data extracted with an automatic parser (Briscoe and Carroll, 2002) from the 90 million words of written English in the British National Corpus (BNC) (Leech, 1992). For each noun we collect co-occurrence triples featuring the noun in a grammatical relationship with another word.</Paragraph> <Paragraph position="6"> The words and relationships considered are co-occurring verbs in the direct object and subject relations, the modifying nouns in noun-noun relations, and the modifying adjectives in adjective-noun relations.
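As a concrete illustration, the JCN computation above can be sketched as follows; the probabilities, synset names and helper functions are invented for illustration, not taken from the paper's implementation:

```python
import math

# Hypothetical corpus-derived probabilities p(s) for three synsets: s1 and s2
# are the two senses being compared, and "lcs" stands in for s3, their most
# informative common superordinate. All values are made up.
p = {"s1": 0.001, "s2": 0.004, "lcs": 0.02}

def ic(s):
    """Information content of a class: IC(s) = -log p(s)."""
    return -math.log(p[s])

def jcn_distance(s1, s2, lcs):
    """Jiang-Conrath distance: IC(s1) + IC(s2) - 2 IC(s3)."""
    return ic(s1) + ic(s2) - 2 * ic(lcs)

def jcn_similarity(s1, s2, lcs):
    """The WN-Similarity package turns the distance into a similarity
    by taking the reciprocal."""
    return 1.0 / jcn_distance(s1, s2, lcs)

sim = jcn_similarity("s1", "s2", "lcs")
```

Note that a more probable (more general) class carries less information, which is why the distance grows as the senses' shared superordinate becomes more general.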
Using this data, we compute the distributional similarity proposed by Lin between each pair of nouns, where each noun has at least 10 triples. Each noun (w) is then listed with its k (= 50) most similar nouns (the nearest neighbours).</Paragraph> <Paragraph position="7"> The nearest neighbours for a target noun (w) share distributional contexts and are typically semantically related to the various senses (Sw) of w. The relationships between the various senses are brought out by the shared semantic relationships with the neighbours. For example, the top nearest neighbours of chair include: stool, bench, chairman, furniture, staff, president. The senses of chair are 1 seat, 2 professorship, 3 chairperson and 4 electric chair. The seat and electric chair senses share semantic relationships with neighbours such as stool, bench and furniture, whilst the professorship and chairperson senses are related via neighbours such as chairman, president and staff.</Paragraph> <Paragraph position="8"> The semantic similarity between a neighbour (n), e.g. stool, and a word sense (si ∈ Sw), e.g.</Paragraph> <Paragraph position="9"> electric chair, is measured using the JCN measure described above.</Paragraph> <Paragraph position="10"> To relate the set of senses (Sw) of a word (w), we produce a vector Vsi = (f1, ..., fk) with k features for each si ∈ Sw. The jth feature in Vsi is the highest JCN score between all senses of the jth neighbour and si. Figure 2 illustrates this process for chair. In contrast to using JCN between senses directly, the nearest neighbours permit senses in unrelated areas of WordNet to be related, e.g. painting - activity and painting - object, since both senses may have neighbours such as drawing in common. The vectors are used to produce RLISTs for each si. To produce the RLIST of a sense si of w, we obtain a value for the Spearman rank correlation coefficient (r) between the vector for si and that for each of the other senses of w (sl ∈
Sw, where l ≠ i). r is calculated by obtaining rankings for the neighbours on Vsi and Vsl, using the JCN values for ranking. We then list si with the other senses ordered according to the r value; for example, the RLIST for sense 1 of chair is [4 (0.50), 3 (0.34), 2 (0.20)], where the sense number is indicated before the bracketed r score.</Paragraph> </Section> <Section position="6" start_page="19" end_page="44" type="metho"> <SectionTitle> 4 Experiments </SectionTitle> <Paragraph position="0"> For our experiments we use the same set of 20 nouns used by Agirre and Lopez de Lacalle (2003). The gold standard used in that work was SEGR. These groupings were released for SENSEVAL-2, but we cannot find any documentation on how they were produced or on inter-annotator agreement. 4 We have therefore produced a new gold standard (referred to as RS) for these nouns, which we describe in section 4.1. We compare the results of our methods for relating senses, and of SEGR, to RS. We then look at the performance of both the gold-standard groupings (SEGR and RS) compared to our automatic methods for coarser-grained WSD of SEVAL-2 ENG LEX using some first sense heuristics.</Paragraph> <Section position="1" start_page="19" end_page="20" type="sub_section"> <SectionTitle> 4.1 Creating a Gold Standard </SectionTitle> <Paragraph position="0"> To create the gold standard we gave three native English speakers a questionnaire with all possible pairings of WordNet 1.7 word senses for each of the 20 nouns in turn. The pairs were derived from all possible combinations of senses of the given noun, and the judges were asked to indicate a &quot;related&quot;, &quot;unrelated&quot; or &quot;don't know&quot; response for each pair.
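The RLIST construction described in section 3 can be sketched as follows. The JCN feature vectors over the neighbours of chair are invented for illustration, and ties are ranked naively in first-seen order rather than with the averaged ranks a full Spearman implementation would use:

```python
def rank(values):
    # Rank positions (1 = highest value); ties broken by position,
    # a simplification of proper averaged tie-ranks.
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def spearman(v1, v2):
    """Spearman rank correlation: 1 - 6 sum(d^2) / (n (n^2 - 1))."""
    r1, r2 = rank(v1), rank(v2)
    n = len(v1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Invented JCN feature vectors over the same k neighbours
# (stool, bench, chairman, furniture, staff, president) for each sense of chair.
vectors = {
    1: [0.9, 0.8, 0.1, 0.7, 0.1, 0.2],   # seat
    2: [0.1, 0.1, 0.8, 0.1, 0.7, 0.8],   # professorship
    3: [0.2, 0.1, 0.9, 0.1, 0.8, 0.7],   # chairperson
    4: [0.8, 0.7, 0.2, 0.8, 0.1, 0.1],   # electric chair
}

def rlist(target, vectors):
    """List the other senses of the word, ordered by rank correlation
    of their neighbour rankings with the target sense's ranking."""
    scores = [(other, spearman(vectors[target], v))
              for other, v in vectors.items() if other != target]
    return sorted(scores, key=lambda x: -x[1])
```

With these made-up vectors, `rlist(1, vectors)` ranks sense 4 (electric chair) first, mirroring the pattern reported for chair: the seat sense correlates most strongly with the other furniture-like sense.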
5 This task allows a sense to be related to others which are not themselves related.</Paragraph> <Paragraph position="1"> The ordering of the senses was randomised, and fake IDs were generated instead of using the sense numbers provided with WordNet, to avoid possible bias from indications of sense predominance.</Paragraph> <Paragraph position="2"> The words were presented one at a time, and each combination of senses was presented along with the WordNet gloss. 6 Table 2 provides the pair-wise agreement (PWA) figures for each word along with the overall PWA figure. The number of word senses for each noun is given in brackets. Overall, more relationships were identified compared to the rather fine-grained classes in SEGR, although there was some variation. The proportions of related items for our three judges were 52.2%, 56.5% and 22.6% respectively. Given this variation, the last row gives the pairwise agreement for pairs where the more lenient judge has said the pair is unrelated. These figures are reasonable given that humans differ in their tendency to lump or split senses, and the fact that figures for sense annotation with three judges (as opposed to two, with a third to break ties) are reported in this region (Koeling et al., 2005). Again, there are no details on annotation and agreement for SEGR.</Paragraph> </Section> <Section position="2" start_page="20" end_page="20" type="sub_section"> <SectionTitle> 4.2 Agreement of automatic methods with RS </SectionTitle> <Paragraph position="0"> Figure 3 shows the PWA of the automatic methods JCN and DIST when calculated against the RS gold standard at various threshold cut-offs. The difference between the best performances of these two methods (61.1% for DIST and 62.2% for JCN) is not statistically significant (using the chi-squared test). The baseline, which assumes that all pairs are unrelated, is 54.1%. If we compare the SEGR to RS we get 68.9% accuracy.
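A minimal sketch of the pair-wise agreement (PWA) computation reported in table 2, with invented judgements; the actual calculation over the questionnaire data may differ in detail:

```python
from itertools import combinations

# Invented judgements: for each sense pair of a word, each of three judges
# answers "related", "unrelated", or "dk" (don't know).
judgements = {
    "judge1": ["related", "related", "unrelated", "related"],
    "judge2": ["related", "unrelated", "unrelated", "related"],
    "judge3": ["related", "unrelated", "unrelated", "unrelated"],
}

def pairwise_agreement(judgements):
    """Proportion of (judge pair, item) decisions on which both judges agree."""
    agree = total = 0
    for j1, j2 in combinations(judgements.values(), 2):
        for a, b in zip(j1, j2):
            total += 1
            agree += (a == b)
    return agree / total

pwa = pairwise_agreement(judgements)  # 8 of 12 paired decisions agree here
```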
7 This shows that the SEGR accords with RS more than the automatic methods do.</Paragraph> </Section> <Section position="3" start_page="20" end_page="44" type="sub_section"> <SectionTitle> 4.3 Application to SEVAL-2 ENG LEX </SectionTitle> <Paragraph position="0"> We used the same words as in the experiment above and applied our methods as back-off to naive WSD heuristics on the SEVAL-2 ENG LEX data. 7 Since these are groupings, there is only one possible answer and no thresholds are applied.</Paragraph> <Paragraph position="1"> Relating senses in this way would be useful as a back-off method where local context is not sufficient. Disambiguation is performed using the first sense heuristic from (i) SemCor (SemCor FS), (ii) automatic rankings from the BNC produced using the method proposed by McCarthy et al. (2004) (Auto FS), and (iii) an upper-bound first sense heuristic from the SEVAL-2 ENG LEX data itself (SEVAL-2 FS). This last represents how well the method would perform if we knew the first sense.</Paragraph> <Paragraph position="2"> The results are shown in table 3. The accuracy figures are equivalent to both recall and precision, as there were no words in this data without a first sense in either SemCor or the automatic rankings. The fourth row provides a random baseline which incorporates the number of related senses for each instance. Usually this is calculated over all word tokens as Σ_{w ∈ tokens} 1 / |Sw|. Since we are evaluating RLISTs as well as groups, the number of senses for a given word is not fixed, but depends on the token's sense. We therefore calculate the random baseline as Σ_{ws ∈ tokens} |related senses to ws| / |Sw|, where ws is a word sense of word w. The columns show the results for different ways of relating senses; senses count as related if they are in the same group, or above the threshold for RLISTs. The second column (fine-grained) gives the results for these first sense heuristics with the raw WordNet synsets. The third and fourth columns give the results for the SEGR and RS gold standards.
The final four columns give the results for RLISTs with JCN and DIST at the threshold indicated.</Paragraph> <Paragraph position="3"> 8 We performed the experiment on both the SENSEVAL-2 English lexical sample training and test data with very similar results, but show only the results on the test corpus due to lack of space.</Paragraph> <Paragraph position="4"> SemCor FS outperforms Auto FS, and is itself outperformed by the upper bound, SEVAL-2 FS.</Paragraph> <Paragraph position="5"> All methods of relating WordNet synsets increase the accuracy at the expense of an increased baseline, because the task is easier with fewer senses to discriminate between. Both JCN and DIST have threshold values at which they improve the performance of the first sense heuristics more than the manually created SEGR does, given a comparable or lower baseline (smaller classes, and so a harder task): this holds for SEVAL-2 FS and Auto FS with both types of RLISTs, though for SemCor FS only with JCN. RS should be compared to the performance of JCN and DIST at a similar baseline, so we show these in the 6th and 8th columns of the table. In this case RS seems to outperform the automatic methods, but the results for JCN are close enough to be encouraging, especially considering that its baseline (63.5) is lower than that for RS (65.3).</Paragraph> <Paragraph position="6"> The RLISTs permit a trade-off between accuracy and granularity. This can be seen in the graph in figure 5, which shows the accuracy obtained for the three first sense heuristics at a range of threshold values. The random baseline is also shown. The difference in performance compared to the baseline for a given heuristic is typically better on the fine-grained task; however, the benefits of a coarse-grained inventory will depend not on this difference, but on the utility of the relationships and distinctions made between senses.
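The random baseline described above can be sketched as follows. The token data is invented, and we assume the sum is normalised by the number of tokens so the result is an expected accuracy:

```python
# Invented token data: for each token, the number of senses of its word
# (|Sw|) and the number of senses that count as correct for that token --
# the gold sense itself plus any senses related to it.
tokens = [
    {"n_senses": 4, "n_related": 2},  # e.g. a "chair" token with one related sense
    {"n_senses": 4, "n_related": 1},  # fine-grained case: only the sense itself
    {"n_senses": 2, "n_related": 2},  # every sense of the word counts as correct
]

def random_baseline(tokens):
    """Expected accuracy of picking a sense uniformly at random, when any
    sense related to the gold sense (including itself) counts as correct."""
    return sum(t["n_related"] / t["n_senses"] for t in tokens) / len(tokens)

baseline = random_baseline(tokens)
```

With groups instead of RLISTs, `n_related` is simply the size of the gold sense's group, which recovers the usual sum of 1/|Sw| when every group is a singleton.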
We return to this point in the discussion and conclusions.</Paragraph> </Section> </Section> <Section position="7" start_page="44" end_page="44" type="metho"> <SectionTitle> 5 Discussion </SectionTitle> <Paragraph position="0"> The RLISTs show promising results when compared to the human-produced gold standards on a WSD task, and even outperform the SEGR in most cases. There are other methods proposed in the literature which also make use of information in WordNet, particularly looking for senses with related words in common (Tomuro, 2001; Mihalcea and Moldovan, 2001). Tomuro does this to find systematic polysemy, by looking for overlap in words in different areas of WordNet; evaluation is performed using WordNet cousins and inter-tagger agreement. Mihalcea and Moldovan look for related words in common between different senses of words in order to merge WordNet synsets. They also use the hand-tagged data in SemCor to remove low-frequency synsets. They demonstrate a large reduction in polysemy of the words in SemCor (up to 39%) with a small error rate (5.6%) measured on SemCor. Our DIST approach relates to Agirre and Lopez de Lacalle (2003), though they produced groups and evaluated against the SEGR. We use nearest neighbours and associate these with word senses, rather than finding occurrences of word senses in data directly. Nearest neighbours have been used previously to induce word senses from raw data (Pantel and Lin, 2002), but not for relating existing inventories of senses.
Measures of distance in the WordNet hierarchy such as JCN have been widely used for WSD (Patwardhan et al., 2003), as has the information contained in the structure of the hierarchy (Kohomban and Lee, 2005), which has been used for backing off when training a supervised system.</Paragraph> <Paragraph position="1"> Though coarser groupings can improve inter-tagger agreement and WSD, there is also a need to examine which distinctions are useful, since there are many ways that items can be grouped (Palmer et al., forthcoming). A major difference from previous work is our use of RLISTs, allowing the level of granularity to be determined for a given application, and allowing for &quot;soft relationships&quot; so that a sense can be related to several others which are not themselves related. This might also be done with soft hierarchical clusters, but has not yet been tried. The idea of relating word senses as a matter of degree also relates to the methods of Schütze (1998), although his work was evaluated using binary sense distinctions.</Paragraph> <Paragraph position="2"> The child example in table 1 demonstrates problems with hard, fixed groupings. Table 4 shows the RLISTs obtained with our methods, with the r scores in brackets. While many of the relationships in the SEGR are found, relationships to the other senses are also apparent. In SEGR no relationship is retained between the offspring sense (2) and the young person sense (1). According to the RS, all paired meanings of child are related. 9 A distance measure, rather than a fixed grouping, seems appropriate to us because one might want the young person sense to be related to both human offspring and immature person, but not have the latter two senses directly related.</Paragraph> </Section> </Paper>