<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0206">
  <Title>Analysis of a Hand-Tagging Task</Title>
  <Section position="9" start_page="36" end_page="37" type="evalu">
    <SectionTitle>
8 Results
</SectionTitle>
    <Paragraph position="0"> We first report the percentage of overlap between taggers' and experts' choices in terms of the three main variables: POS, degree of polysemy, and the order of senses in WordNet. We give the results in percentages here; however, calculation of the significant effects is based on analyses of variance carried out on the raw data.</Paragraph>
    <Paragraph position="1"> In the frequency condition, taggers overall chose the same sense as the experts 75.2% of the time; in the random condition, the overall agreement was 72.8%. In both conditions, performance was significantly (p &lt; 0.01) higher for nouns than for the other parts of speech. For all four parts of speech, we found more tagger-expert matches in the frequency condition than in the random condition. The difference, however, was significant (p &lt; 0.05) only for nouns.</Paragraph>
    <Paragraph position="2"> The target words were classified into four groups depending on their polysemy count. Group 1 contained words with 2 senses; Group 2 words with 3-4 senses; the words in Group 3 had 5-7, and in Group 4, 8 or more senses. The groups were created so that each contained approximately 25% of the words from each part of speech, i.e., the groups were similar in size for each syntactic category.</Paragraph>
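The grouping rule above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the authors' procedure: the function name polysemy_group is hypothetical, and the sketch assumes every target word has at least two senses, since no group for single-sense words is described.

```python
def polysemy_group(sense_count: int) -> int:
    """Map a WordNet sense count to the polysemy group (1-4) described in the text."""
    if sense_count >= 8:
        return 4  # Group 4: 8 or more senses
    if sense_count >= 5:
        return 3  # Group 3: 5-7 senses
    if sense_count >= 3:
        return 2  # Group 2: 3-4 senses
    if sense_count == 2:
        return 1  # Group 1: exactly 2 senses
    # Assumption: words with fewer than 2 senses are not part of the task.
    raise ValueError("target words are assumed to have at least 2 senses")
```

For example, polysemy_group(6) places a six-sense word in Group 3.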
    <Paragraph position="3"> Tagger-expert matches decreased significantly with increasing number of senses (p&lt;0.01) in both conditions. This effect was found for all parts of speech, but it was especially strong for adverbs, where performance dropped from a mean 83.3% tagger-expert agreement for adverbs with two senses to 32.5% for adverbs with 5-7 senses, and to only 29.4% for the most polysemous adverbs. Except for words with two senses, we found more tagger-expert matches in the frequency condition than in the random condition.</Paragraph>
    <Paragraph position="4"> In both conditions, significantly more tagger-expert matches occurred for all parts of speech when the expert choice was in first position than when it occurred in a subsequent position (80.2% vs. 70.5%, p&lt;0.01 for the frequency condition; 79% vs. 70%, p&lt;0.05 for the random condition). This effect was also found with the same level of significance for verbs alone, in both conditions. In the frequency condition, we found the effect of the expert choice being at the top of the list of senses to be particularly strong for the most polysemous words (p&lt;0.05); the overall effect of the expert choice being the first choice for all polysemy classes was significant at the p&lt;0.01 level. (For words with only two senses in WordNet, the position had no significant effect on the rate of agreement between taggers and experts.) We now turn to the sense choices that were made by most taggers. We asked, what percentage of taggers selected the most frequently chosen sense, and did the syntactic class membership of the words, their degree of polysemy, or the order of the senses in WordNet have an effect on the rate of agreement? Taggers agreed among themselves significantly more often than they did with the experts (82.5% in the frequency condition, and 82% in the random condition). Inter-tagger agreement followed the same pattern as tagger-expert matches: agreement decreased with increasing polysemy; agreement rates were highest for nouns and lowest for verbs and adjectives in both conditions.</Paragraph>
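One plausible reading of the inter-tagger measure used here (the percentage of taggers who selected the most frequently chosen sense) is sketched below. The text does not spell out the exact computation, so this is an assumption-laden illustration: the function name modal_sense_agreement and the sense labels in the usage note are hypothetical, and ties are resolved by whichever sense appears first in the input.

```python
from collections import Counter

def modal_sense_agreement(choices):
    """Return (most frequently chosen sense, percentage of taggers who chose it).

    choices holds one sense label per tagger for a single target word.
    """
    if not choices:
        raise ValueError("need at least one tagger choice")
    # most_common(1) yields the modal sense together with its count.
    sense, count = Counter(choices).most_common(1)[0]
    return sense, 100.0 * count / len(choices)
```

Called with the four hypothetical choices ["s1", "s1", "s2", "s1"], the function returns the pair ("s1", 75.0). Averaging the percentage over all target words in a condition would then give an overall agreement rate comparable in kind to the figures reported above.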
    <Paragraph position="5"> Inter-tagger agreement decreased significantly (p&lt;0.01) with increasing polysemy for all parts of speech in both conditions. This supports our expectation that more choices render the matching task more difficult, making agreement less likely. The decrease in inter-tagger agreement with increasing polysemy was especially strong in the case of adverbs. In the frequency order condition, the overall agreement was significantly (p&lt;0.01) higher (87%) when the agreed-upon sense was the first choice rather than a subsequent one (78%) on the list of alternative senses in the dictionary. This effect was also found separately for all POS except nouns. Similarly, we found that in the random order condition, inter-tagger agreement was higher for all POS when the agreed-upon sense was the first in the dictionary (85.5% vs. 79.6%). For the different polysemy groups, the choice most often made was in first position for low and medium high polysemy words, but for high polysemy words (5 or more senses), the most frequently selected sense was less often in the first position.</Paragraph>
  </Section>
</Paper>