<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0206">
  <Title>Analysis of a Hand-Tagging Task</Title>
  <Section position="4" start_page="34" end_page="34" type="metho">
    <SectionTitle>
3 Polysemy
</SectionTitle>
    <Paragraph position="0"> Arguably, the degree of polysemy of a word is related to the degree of difficulty of the tagging process. The fact that dictionaries differ frequently with respect to the number of senses for polysemous words points to the difficulty of representing different meanings of a word as discrete and non-overlapping sense distinctions. In some cases (homonymy), the division between different senses seems fairly clear and agreed upon among different lexicographers, while for others, it is not at all obvious how many senses should be distinguished.</Paragraph>
  </Section>
  <Section position="5" start_page="34" end_page="34" type="metho">
    <SectionTitle>
4 Number of senses in WordNet
</SectionTitle>
    <Paragraph position="0"> The dictionary that the taggers had available for the tagging task is WordNet (Miller, 1990; Miller and Fellbaum, 1991). WordNet makes fairly fine-grained distinctions, roughly comparable to those of a collegiate dictionary. We reasoned that the more senses WordNet listed for a target word, the harder the taggers' task of evaluating the different sense distinctions would become. We predicted that a greater degree of polysemy would lead to greater discrepancies between the taggers' matches and those of the experimenters, as well as among the taggers themselves. 1</Paragraph>
  </Section>
  <Section position="6" start_page="34" end_page="35" type="metho">
    <SectionTitle>
5 Part of speech
</SectionTitle>
    <Paragraph position="0"> The semantic make-up of some words makes them more difficult to interpret, and hence harder to match to dictionary senses, than others. Some concepts are less well-defined or definable, and more abstract, than others (Schwanenfluegel, 1991). Words referring to concrete and imageable entities such as objects and persons may generally be easier to interpret. If such words are polysemous, the different meanings should be relatively easy to distinguish on the grounds that each meaning has a fairly clear representation. By this reasoning, we expected nouns to present fewer difficulties to taggers. (Of course, many nouns have abstract referents, but as a class, we predicted nouns to be easier to annotate than verbs or modifiers. The nouns in the text we chose for our analysis had mostly concrete, imageable referents.) 1 &quot;Polysemy&quot; in WordNet subsumes homonymy as well as polysemy; however, the latter is far more common: in most cases, the different senses of a word are semantically related. No clearly discernible homonyms occurred in the data we analyzed for this report.</Paragraph>
    <Paragraph position="1"> Modifiers like adjectives and adverbs often derive much of their meanings in particular contexts from the words they modify (Katz, 1964; Pustejovsky, 1995). During sequential tagging, each content word in a running text is tagged, so the meanings of highly polysemous adjectives often become clear as the tagger looks to the head noun. However, adjectives in WordNet are highly polysemous and show a good deal of overlap, so that the context does not always uniquely pick out one sense. The kinds of polysemy and overlap found among the adjectives are carried over to the many derived adverbs in WordNet.</Paragraph>
    <Paragraph position="2"> Whereas the meanings of nouns tend to be stable in the presence of different verbs, verbs can show subtle meaning variations depending on the kinds of noun arguments with which they co-occur. Moreover, the boundary between literal and metaphoric language seems particularly elusive in the case of verbs. Gentner and France (1988) demonstrated the &quot;high mutability&quot; of verbs, showing people's willingness to assign very flexible meanings to verbs while noun meanings were held constant. They argue that verb meanings are more easily altered because they are less cohesive than those of nouns. We expected the semantic flexibility of verbs to create additional difficulties for tagging. Discrete dictionary senses could be particularly ill-suited to usages where core senses have been extended beyond what the dictionary definitions cover, and where taggers must abstract from a creative usage to a more general, inclusive sense. In other cases, a usage can be assigned to several senses that have been accorded polyseme status on the basis of previously encountered usages, but may overlap with respect to other usages. We therefore expected less overall agreement for verb tags than for noun tags.</Paragraph>
    <Paragraph position="3"> Polysemy and syntactic class membership interact: verbs and adjectives have on average more senses than nouns, both in conventional dictionaries and in WordNet. Both the number of senses and the syntactic class membership of verbs and modifiers may conspire to make these words more difficult to tag.</Paragraph>
  </Section>
  <Section position="7" start_page="35" end_page="35" type="metho">
    <SectionTitle>
6 Sense ordering in WordNet
</SectionTitle>
    <Paragraph position="0"> The order in which WordNet lists the different senses of a word corresponds to the frequency with which each sense has been tagged to words in the Brown Corpus (Landes et al., in press). Statistically, one would therefore expect the first sense to be the one that is chosen as the most appropriate one in most cases. Gale et al. (1992) estimate that automatic sense disambiguation would be at least 75% correct if a system ignored context and assigned the most frequently occurring sense. Miller et al. (1994) found that automatic assignment of polysemous words in the Brown Corpus to senses in WordNet was correct 58% of the time with a guessing heuristic that assumed the most frequently occurring sense to be the correct one.</Paragraph>
    <Paragraph position="1"> The taggers whose work is analyzed here were not aware of the frequency ordering of the senses. However, other reasons led us to predict a preference for the first sense. The most frequently tagged sense also usually represents the most &quot;central&quot; or &quot;core&quot; meaning of the word in question. When it covers the largest semantic &quot;territory,&quot; the first sense may seem like the safest choice.</Paragraph>
    <Paragraph position="2"> Taggers may often be reluctant to examine a large number of senses when one appears quite appropriate. While reading each new WordNet entry for a given word, taggers must modify the corresponding entry in their mental lexicons. When encountering a sense that appears to match the usage, taggers do not know whether another sense, which they have not yet read, will present a still more subtle meaning difference. Since the first sense usually represents the most inclusive meaning of the word, taggers daunted by the task of examining a large number of closely related senses, or unsure about certain sense distinctions, may simply choose the first sense rather than continue searching for further subdifferentiations. We therefore predicted a tendency on the part of the taggers to select the first sense even when it was not the one chosen by us.</Paragraph>
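The most-frequent-sense heuristic described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure used by Gale et al. or Miller et al.: the function name, the sense labels, and the toy data are all invented for the example.

```python
from collections import Counter

def most_frequent_sense_baseline(training, test):
    """Tag each test word with its most frequent training sense; return accuracy."""
    counts = {}
    for word, sense in training:
        counts.setdefault(word, Counter())[sense] += 1
    # most_common(1) yields the single most frequent (sense, count) pair per word
    first_sense = {word: c.most_common(1)[0][0] for word, c in counts.items()}
    # Only score test words that were seen in training
    scored = [(word, sense) for word, sense in test if word in first_sense]
    if not scored:
        return 0.0
    correct = sum(1 for word, sense in scored if first_sense[word] == sense)
    return correct / len(scored)

# Hypothetical sense-tagged tokens (word, sense label); not real corpus data
training = [("bank", "bank.1"), ("bank", "bank.1"), ("bank", "bank.2"),
            ("run", "run.3"), ("run", "run.3"), ("run", "run.1")]
test = [("bank", "bank.1"), ("bank", "bank.2"), ("run", "run.3"), ("run", "run.3")]

accuracy = most_frequent_sense_baseline(training, test)
print(accuracy)
```

The baseline ignores context entirely, which is why its accuracy serves as a floor against which context-sensitive tagging, by humans or by systems, can be compared.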
  </Section>
  <Section position="8" start_page="35" end_page="36" type="metho">
    <SectionTitle>
7 The experiment
</SectionTitle>
    <Paragraph position="0"> We analyzed the data from the paid training session that all taggers underwent before they were assigned to work on the semantic concordance (Landes et al., in press). The taggers were 17 undergraduate and graduate students (6 male, 11 female). In all cases, the taggers' sense selections were compared to those made by two of the authors, who have years of experience in lexicography. While these &quot;expert&quot; sense selections constituted the standard for evaluating the taggers' performance, they should not be regarded as the &quot;right&quot; choice, implying that all other choices are &quot;wrong.&quot; Rather, the matches between taggers' and experts' choices reflect the extent to which the ability to match mental representations of meanings with dictionary entries overlaps between untrained annotators and lexicographers practiced in drawing subtle sense distinctions and familiar with the limitations of dictionary representations.</Paragraph>
    <Paragraph position="1"> In addition to evaluating the taggers' annotations against those of the &quot;experts,&quot; we examined the degree of inter-tagger agreement, which would shed some light on the representation of meanings in the lexicons of novice taggers unpracticed at drawing a large number of fine-grained sense distinctions, and on their ability to deal with potentially overlapping and redundant entries in WordNet. A high inter-tagger agreement rate would be indicative of the stability of naive inter-subject meaning discrimination. We expected less agreement for words that we predicted to be more difficult. Significant disagreement for highly polysemous words would be compatible with Jorgenson (1990), whose subjects discriminated only about three senses of highly polysemous nouns.</Paragraph>
    <Paragraph position="2"> Moreover, we expected less inter-tagger agreement for verbs and modifiers than for nouns.</Paragraph>
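One simple way to quantify inter-tagger agreement is the mean pairwise match rate. The sketch below is an assumption made for illustration: the text does not state which agreement measure was used, and the tagger names and sense numbers are invented.

```python
from itertools import combinations

def pairwise_agreement(tags_by_tagger):
    """Mean share of words on which two taggers chose the same sense, over all tagger pairs."""
    rates = []
    for a, b in combinations(tags_by_tagger.values(), 2):
        same = sum(1 for x, y in zip(a, b) if x == y)
        rates.append(same / len(a))
    return sum(rates) / len(rates)

# Hypothetical sense choices (sense numbers) by three taggers for the same four words
tags = {
    "tagger_a": [1, 2, 1, 3],
    "tagger_b": [1, 2, 2, 3],
    "tagger_c": [1, 1, 1, 3],
}

agreement = pairwise_agreement(tags)
print(round(agreement, 3))
```

The same function could be applied separately to nouns, verbs, and modifiers to compare agreement across parts of speech. A raw match rate does not correct for chance agreement, which matters more for words with few senses.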
    <Paragraph position="3"> The material was a 660-word section taken from a fiction passage in the Brown Corpus. We eliminated the 336 function words and proper nouns, and the 70 monosemous content words. Of the remaining 254 polysemous words, 88 were nouns, 100 were verbs, 39 were adjectives, and 27 were adverbs, a distribution similar to that found in standard prose texts. The task of the taggers was to select appropriate senses from WordNet for these 254 words. 2 The number of alternative WordNet senses per word ranged from two to forty-one (the mean across all parts of speech was 6.62). The mean number of WordNet senses for the verbs in the text was 8.63; for adjectives 7.95; for nouns 4.74; for adverbs 3.37.</Paragraph>
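The counts and means reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures given in the text; the variable names are illustrative, and because the per-class means are themselves rounded, the recomputed overall mean is only expected to match to two decimal places.

```python
# Word counts and mean sense counts per part of speech, as reported in the text
counts = {"noun": 88, "verb": 100, "adjective": 39, "adverb": 27}
mean_senses = {"noun": 4.74, "verb": 8.63, "adjective": 7.95, "adverb": 3.37}

total_words = 660
function_and_proper = 336
monosemous = 70

# Words left for tagging after removing function words, proper nouns, and monosemous words
polysemous = total_words - function_and_proper - monosemous
assert polysemous == sum(counts.values())

# Overall mean is the count-weighted average of the per-class means
overall_mean = sum(counts[pos] * mean_senses[pos] for pos in counts) / polysemous
print(polysemous, round(overall_mean, 2))
```

The weighted average reproduces the reported overall mean of 6.62 senses per word, and the four part-of-speech counts sum to the 254 polysemous words.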
    <Paragraph position="4"> Taggers received a specially created booklet with the typed text and a box in which they marked their sense choices. 3 Taggers further received a dictionary booklet containing the senses for the words to be tagged as they are represented in WordNet. Word senses were provided as synonym sets along with defining glosses.</Paragraph>
    <Paragraph position="5"> For nouns and verbs, the corresponding superordinate synonym sets were presented; adjectives were given with their antonyms. Two versions of the dictionary booklet were prepared, one for each training condition.</Paragraph>
    <Paragraph position="6"> 2 We had made a few minor alterations to the text; for example, we omitted short phrases containing word senses that had previously occurred in the text. 3 In addition, the participants indicated the degree of confidence with which they made their choice; these ratings are reported in (Fellbaum et al., in press).</Paragraph>
    <Paragraph position="7"> In the first condition (&quot;frequency&quot; condition), 8 taggers were given a dictionary booklet listing the WordNet senses in the order of frequency with which they appear in the already tagged Brown Corpus. If, in the frequency condition, there was a significant tendency to choose the first sense, which was usually also the most inclusive, general one, it would indicate that the taggers adopted a &quot;safe&quot; strategy in picking the core sense rather than continuing to search for more subtle distinctions. While the taggers were not told anything about the sense ordering in the dictionary booklet, we expected those taggers working in the frequency condition to realize fairly quickly in the course of their annotations that the sense listed at the top was often the most inclusive or salient one.</Paragraph>
    <Paragraph position="8"> In the second condition (&quot;random order&quot; condition), the remaining 9 taggers were given a dictionary booklet with the same WordNet senses arranged in a random order produced by a random number generator. Here, the first sense was no longer necessarily the most inclusive, general one.</Paragraph>
    <Paragraph position="9"> A strong tendency towards picking the first sense in the random order condition would point to a reluctance to examine and evaluate all available senses, independent of whether this sense represented the most salient or core sense.</Paragraph>
    <Paragraph position="10"> Not surprisingly, the expert choice was at the top of the list in the frequency condition for most words.</Paragraph>
    <Paragraph position="11"> The mean position of the expert choice for all parts of speech in the frequency condition was 2.29; in the random order condition, the mean position of the expert choice was 3.55.</Paragraph>
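The random-order booklet and the position of the expert choice can be illustrated with a short sketch. Everything here is an assumption for illustration: the sense labels are hypothetical, the seed is arbitrary, and the text does not describe how the random ordering was actually generated beyond the use of a random number generator.

```python
import random

def shuffled_position(senses, expert_sense, rng):
    """Return the 1-based position of the expert's sense after a random reordering."""
    order = list(senses)  # copy, so the frequency-ordered list is left untouched
    rng.shuffle(order)
    return order.index(expert_sense) + 1

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
senses = ["s1", "s2", "s3", "s4", "s5"]  # hypothetical senses in frequency order
position = shuffled_position(senses, "s1", rng)

# For n senses in a uniformly random order, the expected position of any
# given sense is (n + 1) / 2. Plugging in the reported mean of 6.62 senses:
expected_mean_position = (6.62 + 1) / 2
print(position, round(expected_mean_position, 2))
```

Under uniform shuffling, the expected mean position works out to roughly 3.81, of the same order as the 3.55 reported for the single random ordering used in the booklet. Both are well above the 2.29 observed in the frequency condition.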
    <Paragraph position="12"> The taggers, who worked independently of each other, were not aware of having been assigned to one of two groups of participants. They finished the task within 4-6 hours.</Paragraph>
  </Section>
</Paper>