<?xml version="1.0" standalone="yes"?>
<Paper uid="E93-1051">
  <Title>Lexical Disambiguation Using Constraint Handling In Prolog (CHIP) *</Title>
  <Section position="5" start_page="433" end_page="434" type="evalu">
    <SectionTitle>
5 Results
</SectionTitle>
    <Paragraph position="0"> Evaluation of a dictionary-based lexical disambiguation routine is difficult since the preselection of the correct senses is in practice very difficult and timeconsuming. The most obvious technique would seem to be to start by creating a benchmark of sentences, disambiguating these manually using intuitive linguistic and lexicographical expertise to assign the best sense-number to each word. However, distinctions between senses are often delicate and fine-grained in a dictionary, and it is often hard to fit a particular case into one and only one category. It is typical in work of this kind that researchers use human choices for the words or sentences to disambiguate and the senses they will attempt to recognise \[Guthrie, 1993\]. In most of the cases \[Hearst, 1991; McDonald et al., 1990; Guthrie et al., 1991; Guthrie et al., 1992\], the number of test sentences is rather small (less than 50) so that no exact comparison between different methods can be done. Our tests included a set of 20 sentences, from sentences cited in an NLP textbook \[Harris, 1985\] (used to illustrate non-MRD-based semantic disambiguation techniques) example sentences cited in \[Guthrie et al., 1992; Lesk, 1986; Hearst, 1991\] (for comparison between different lexical disambiguation routines) and examples taken from LDOCE (to assess the algorithm's performance with example sentences of particular senses in the dictionary-this might also be a way of testing the consistency of the relationship between different senses and their corresponding examples of a word in LDOCE). A sense chosen by our algorithm is compared with the 'intuitive' sense; but if there is not an exact match, we need to look further to judge how 'plausible' the predicted sense remains.</Paragraph>
    <Paragraph position="1"> After pruning of function words, length varied from 2 to 6 content words to be disambiguated, with an average of 3.1 ambiguous words per sentence. The number of different sense combinations ranged from 15 to 126000.</Paragraph>
    <Paragraph position="2"> Of the 62 ambiguous words, 36 were assigned senses exactly matching our prior intuitions, giving an overall success rate of 58%. Although accuracy of the results is far from 100%, the method confirms the potential contribution of the use of dictionary definitions to the problem of lexical sense disambiguation. Ambiguous words had between 2 and 44 different senses. Investigating the success at disambiguating a particular word depended on the number of alternative senses given in the dictionary we had the following results: No. senses No. words Disambiguated Success per word per range correctly  It might be expected that if the algorithm has to choose between a very large number of alternative senses it would be much likelier to fail; but in fact the algorithm held up well against the odds, showing graceful degradation in success rate with increasing ambiguity. Furthermore, success rate showed little variation with increased number of ambiguous words per sentence: No. amb. words No. sentences Success per sentence per range  This presumably indicates a balanced trade-off between competing factors. One might expect that each extra word brings with it more information to help disambiguate other words, improving overall success rate; on the other hand, it also brings with it spurious senses with primitives which may act as 'red herrings' favouring alternative senses for other words.</Paragraph>
    <Paragraph position="3"> The average overlap score per sentence for the best analysis rose in line with sentence length, or rather, number of ambiguous words in the sentence: No. ambiguous words Average overlap for per sentence best disambiguation  We noticed a trend towards choosing longer sensedefinitions over shorter ones (i.e senses defined by a larger set of semantic primitives tended to be preferred); 41 out of the 62 solutions given by the program (66%) were longer definitions than average.</Paragraph>
    <Paragraph position="4"> This is to be expected in an algorithm maximising overlap, as there are more primitives to overlap with in a larger definition. However, this tendency did NOT appear to imply wrong long sense were being preferred to correct short sense leading to a worsening overall success rate: of the 41 cases, 27 were correct, i.e 66% compared to 58% overall. A better interpretation of this result might be that longer definitions are more detailed and accurate, thus making a better 'target'.</Paragraph>
    <Paragraph position="5"> Of the 26 'failures', 5 were assigned senses which were in fact incompatible with the syntactic word-class in the given sentence. This indicates that if the algorithm was combined with a word-tagger such as CLAWS \[Atwell, 1983; Leech, 1983\], and lexical senses were constrained to those allowed by the wordtags predicted by CLAWS, the success rate could rise to 66%. This may also be necessary in cases where LDOCE's definitions are not accurate enough. For example, trying to disambiguate the words show, in.</Paragraph>
    <Paragraph position="6"> retest and music in the sentence 'He's showing an interest in music' \[Procter et al., 1978\]. the program chose the eighth noun sense of show and the second verb sense of interest. This was because the occurence of the word 'do' in both definitions resuited in a maximum overlap for that combination.</Paragraph>
    <Paragraph position="7"> However, the 'do's sense is completely different in each case. For the show 'do' was related to 'well done ~ and for interest to 'do something'.</Paragraph>
    <Paragraph position="8"> Optimisation with CHIP performed well in finding the optimal solution. In all cases no other sense combination had a better score than the one found. This was confirmed by testing our algorithm in a separate implementation without any of CHIP's optimisation procedures but using a conventional method for exploring the search space for the best solution. Optimisation with CHIP was found to be from 120% to 600% faster than the conventional approach.</Paragraph>
  </Section>
class="xml-element"></Paper>