<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0206"> <Title>Analysis of a Hand-Tagging Task</Title> <Section position="10" start_page="37" end_page="39" type="concl"> <SectionTitle> 9 Discussion </SectionTitle> <Paragraph position="0"> The rather high tagger-expert agreement indicated that the novice taggers found the annotation task feasible. We found the predicted main effects for degree of polysemy, POS, and the order in which the senses were presented in the dictionary booklet.</Paragraph> <Paragraph position="1"> Increasing polysemy of the target words produced less tagger-expert and inter-tagger agreement. Besides having to weigh and compare more options, the taggers needed to adjust their own ideas of the polysemous words' meanings to the particular way these are split up and represented in WordNet. The more alternative senses there were, the less likely it was that the taggers' mental representations of the senses overlapped significantly with those in WordNet.</Paragraph> <Paragraph position="2"> In both conditions, nouns were tagged significantly more often in agreement with the experts' choice than verbs and adjectives. For nouns, we found no significant increase in the number of agreed-upon choices when they were at the top of the list of alternative senses, indicating that the taggers were fairly sure of their choices independent of the order in which the different noun senses were listed in the dictionary. This effect could be attributed at best only partly to the relatively low polysemy of nouns. Nouns may be &quot;easier&quot; because they commonly denote concrete, imageable referents. Verb and adjective meanings, on the other hand, are more context-dependent, particularly on the meanings of the nouns with which they co-occur. People's mental representations of noun concepts may be more fixed and stable and less vague than those of verbs and adjectives.
In fact, the larger number of dictionary sense numbers for verbs in particular may be due less to actual meaning distinctions than to the lexicographer's attempt to account for the great semantic flexibility of many verbs.</Paragraph> <Paragraph position="3"> Overall, taggers chose the expert selection less frequently than they agreed on a sense among themselves. While it is possible that the expert choice did not always reflect the best match, we suspect that novice taggers annotate differently from lexicographers. The latter are necessarily highly sensitive to sense distinctions and have developed a facility to retrieve and distinguish the multiple meanings of a word more easily than naive language users, who may have a less rich representation of word meanings at their fingertips. This possibility is supported by (Jorgenson, 1990), whose naive subjects consistently distinguished fewer senses of a word than dictionaries do, even when they were given dictionaries to consult in the course of the sense discrimination task. Jorgenson's subjects agreed substantially on discriminating the three most central, salient senses of polysemous nouns but did not distinguish subsenses. Dictionaries likewise often agree with one another on the most central, core senses of words but differ in the number and kinds of subtle distinctions.</Paragraph> <Paragraph position="4"> But whereas lexicographers are trained in drawing fine distinctions, naive language users appear to be aware of large-grained sense differences only. Our results indicate, in the case of finer sense distinctions, a lack of shared mental representations among the taggers, and a decrease in agreement. This explanation is also consistent with the decrease in tagger-expert matches along with increasing polysemy.</Paragraph> <Paragraph position="5"> The salience and the shared mental representation of certain word senses might further account for our third main effect.
Taggers agreed with the experts and with each other significantly more often when the WordNet senses were presented in the order of frequency of occurrence. This was generally true for words from all polysemy groups and POS.</Paragraph> <Paragraph position="6"> We suggest that taggers recognized the most appropriate sense more easily in this condition because they did not use the same strategy as in the random order condition. In the frequency condition, the most salient, &quot;core,&quot; senses usually occurred first, or at least fairly high, on the list of senses. These senses also had a high chance of being the appropriate ones in the text, since we had selected a fiction passage with non-technical, everyday language.</Paragraph> <Paragraph position="7"> Taggers working in the frequency condition probably realized that the sense ordering resembled that of most standard dictionaries and chose the first sense that seemed at all to be a good match rather than examining all senses carefully, as they would have to do in the random order condition.</Paragraph> <Paragraph position="8"> When the first sense was also the one the lexicographers had chosen as the most appropriate one, the taggers' task was relatively easy. Given that they recognized that the first sense was appropriate, selecting it meant that they did not have to examine and compare the remaining senses in search of an even better choice. Weighing all available senses against each other and against the given usage can be a difficult task, especially for novice taggers, and we expected a general tendency to gravitate towards the first choice for this reason.
Ceasing to read once one has encountered the first sense that seems appropriate resembles the dictionary look-up strategy in which one stops reading the entry when one has found a sense that seems to match the given usage (Kilgarriff, 1993).</Paragraph> <Paragraph position="9"> The first senses in the frequency condition, which generally express the most salient and central meanings, might be most clearly represented in both naive and expert speakers' mental lexicons and might show the greatest overlap across speakers.</Paragraph> <Paragraph position="10"> These senses were presumably easily understood by the taggers and increased any reluctance to examine the remaining options.</Paragraph> <Paragraph position="11"> The difference between the tagger-expert matches for words in the first position and words in subsequent positions was particularly strong for verbs and (in the frequency order condition) for words with eight or more senses. These were the cases that were generally more difficult for the taggers, as reflected in lower tagger-expert agreement. The results therefore indicate that the expert choice being the first made the decision process for the taggers much easier by eliminating the need for a difficult comparison of all the available senses, and, in the frequency condition, by the fact that the first sense was generally the most salient one.</Paragraph> <Paragraph position="12"> The preference for the first among the available senses was even more pronounced in the inter-tagger agreement. There was a highly significant difference for the agreed-upon choice between the first and subsequent positions in the case of verbs and adjectives and words with eight or more senses in the frequency order condition (p &lt; 0.01).
Again, the taggers probably understood the first, most frequent and often most salient sense easily and were reluctant to consider more fine-grained sense differentiations.</Paragraph> <Paragraph position="13"> In the random order condition, no bias towards the first sense existed, so the strategy of choosing the first sense or an appropriate sense near the top of the list was not available. The taggers had to examine and consider each sense in the entry, which made the task more difficult. This is reflected in lower inter-tagger and tagger-expert agreement rates. Yet the high percentages of matches in this condition show that the taggers worked well. When the expert sense was the first on the list, taggers working in the random order condition selected the expert sense less frequently than the taggers working in the frequency order condition. This result further indicates that taggers here were not biased towards the first sense, but considered all senses equally.</Paragraph> <Paragraph position="14"> In sum, we found that matching word usages to word senses in a dictionary is a hard task, whose difficulty depends on the part of speech of the target word and increases with the number of senses given in the dictionary. Among the available choices, the first sense of each polysemous word was a significant attractor.</Paragraph> <Paragraph position="15"> Our findings suggest that randomly ordered senses would weaken taggers' strategy of relying on the first sense being the best match and encourage more scrupulous examination of the available choices. Confidence ratings reflected the degree of difficulty of the items in that they paralleled the taggers' performance as measured by tagger-expert and inter-tagger agreement. Highly polysemous words were tagged with less confidence, and taggers were more confident when tagging nouns rather than verbs and modifiers.
Confidence was slightly higher for inter-tagger than tagger-expert matches, supporting the reality of a &quot;naive&quot; lexicon as opposed to the representation of polysemous words in the mental lexicon of practiced lexicographers or linguists. In the random order condition, taggers made their decisions with more confidence than in the frequency order condition, although there was less agreement with the experts. We believe that this result further supports the claim that taggers in the two conditions proceeded differently: Taggers working with a randomly ordered list of senses did not rely on the first sense being the correct one. They worked more scrupulously, which is reflected in the higher confidence ratings.</Paragraph> </Section> </Paper>