
<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1173">
  <Title>Word Sense Disambiguation using a dictionary for sense similarity measure</Title>
  <Section position="6" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
7 Discussion
</SectionTitle>
    <Paragraph position="0"> The result for homographs is very good but not very significant given the low number of occurrences; this all the more true as we used a part-of-speech tagger to disambiguate homographs with different part-of-speech beforehand (these have been left out of the computation of the score).</Paragraph>
    <Paragraph position="1"> The scores we get are rather good for coarse polysemy, given the simplicity of the method.</Paragraph>
    <Paragraph position="2"> As a means of comparison, (Patwardhan et al., 2003) applies various measures of semantic proximity (due to a number of authors), using the WordNet hierarchy, to the task of word sense disambiguation of a few selected words with results ranging from 0.2 to 0.4 with respect to sense definition given in WordNet (the average of senses for each entry giving a random score of about 0.2).</Paragraph>
    <Paragraph position="3"> Our method already gives similar results on the fine polysemy task (which has an even harder random baseline) when using both nouns and verbs as nodes, and does not focus on selected targets.</Paragraph>
    <Paragraph position="4"> A method not evaluated by (Patwardhan et al., 2003) and using another semantic relatedness measure (&amp;quot;conceptual density&amp;quot;) is (Agirre and Rigau, 1996). It is also based on a distance within the WordNet hierarchy. They used a variable context size for the task and present results only for the best size (thus being a not fully unsupervised method). Their random baseline is around 30%, and their precision is 43% for 80% attempted disambiguations.</Paragraph>
    <Paragraph position="5"> Another study of disambiguation using a semantic similarity derived from WordNet is (Abney and Light, 1999); it sees the task as a Hidden Markov Model, whose parameters are estimated from corpus data, so this is a mixed model more than a purely dictionary-based model. With a baseline of 0.285, they reach a score of 0.423. Again, the method we used is much simpler, for comparable or better results.</Paragraph>
    <Paragraph position="6"> Besides, by using all connections simultaneously between words in the context to disambiguate and the rest of the lexicon, this method avoids the combinatorial explosion of methods purely based on a similarity measure, where every potential sense of every meaningful word in the context must be considered (unless every word sense of words other than the target is known beforehand, which is not a very realistic assumption), so that only local optimization can be achieved. In our case disambiguating a lot of different words appearing in the same context may result in poorer results than with only a few words, but it will not take longer.</Paragraph>
    <Paragraph position="7"> The only downside is heavy memory usage, as with any dictionary-based method.</Paragraph>
    <Paragraph position="8"> We have made the evaluation on dictionary entries because they are already part of the net- null work of senses, to avoid raising other issues too early. Thus, we are not exactly in the context of disambiguating free text. It could then be argued that our task is simpler than standard disambiguation, because dictionary definitions might just be written in a more constrained and precise language. That is why we give the score when taking always the first sense for each entry, as an approximation of the most common sense (since the dictionary does not have frequency information). We can see that this score is about 50% only for the coarse polysemy, and 40% for the fine polysemy, compared to a typical 70-80% in usual disambiguation test sets, for similar sense dispersion (given by the random baseline); in (Abney and Light, 1999), the first-sense baseline gives 82%. So we could in fact argue that disambiguating dictionary entries seems harder. This fact remains however to be confirmed with the actual most frequent senses. Let us point out again that our algorithm does not make use of the number of senses in definitions.</Paragraph>
    <Paragraph position="9"> Among the potential sources of improvement for the future, or sources of errors in the past, we can list at least the following: overlapping of some definitions for sub-senses of an entry. Some entries of the dictionary we used have sub-senses that are very hard to distinguish. In order to measure the impact of this, we should have multiple annotations of the same data and measure inter-annotator agreement, something that has not been done yet.</Paragraph>
    <Paragraph position="10"> part of speech tagging generates a few errors when confusing adjectives and nouns or adjectives and verbs having the same lemma; this should be compensated when we enrich the graph with entries for adjectives. null some time should be spent studying the precise influence of the length of the random walk considered; we have chosen a value a priori to take into account the average length of a path in the graph, but once we have more hand-tagged data we should be able to have a better idea of the best suited value for that parameter.</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML