<?xml version="1.0" standalone="yes"?> <Paper uid="W02-0817"> <Title>Building a Sense Tagged Corpus with Open Mind Word Expert</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Sense Tagged Corpora </SectionTitle> <Paragraph position="0"> The availability of large amounts of semantically tagged data is crucial for creating successful WSD systems. Yet, as of today, only a few sense tagged corpora are publicly available.</Paragraph> <Paragraph position="1"> One of the first large scale hand tagging efforts is reported in (Miller et al., 1993), where a subset of the Brown corpus was tagged with WordNet senses. The corpus includes a total of 234,136 tagged word occurrences, out of which 186,575 are polysemous. There are 88,058 noun occurrences, of which 70,214 are polysemous.</Paragraph> <Paragraph position="2"> The next significant hand tagging task was reported in (Bruce and Wiebe, 1994), where 2,476 usages of the word interest were manually assigned sense tags from the Longman Dictionary of Contemporary English (LDOCE). This corpus was used in various experiments, with classification accuracies ranging from 75% to 90%, depending on the algorithm and features employed.</Paragraph> <Paragraph position="3"> The high accuracy of the LEXAS system (Ng and Lee, 1996) is due in part to the use of large corpora. For this system, 192,800 word occurrences have been manually tagged with senses from WordNet. The set of tagged words consists of the 191 most frequently occurring nouns and verbs. The authors mention that approximately one man-year of effort was spent in tagging the data set.</Paragraph> <Paragraph position="4"> More recently, the SENSEVAL competitions have provided a good environment for the development of supervised WSD systems, making freely available large amounts of sense tagged data for about 100 words. During SENSEVAL-1 (Kilgarriff and Palmer, 2000), data for 35 words was made available, adding up to about 20,000 examples tagged with respect to the Hector dictionary. The size of the tagged corpus increased with SENSEVAL-2 (Kilgarriff, 2001), when 13,000 additional examples were released for 73 polysemous words. This time, the semantic annotations were performed with respect to WordNet.</Paragraph> <Paragraph position="5"> Additionally, (Kilgarriff, 1998) mentions the Hector corpus, which comprises about 300 word types with 300-1000 tagged instances for each word, selected from a 17 million word corpus.</Paragraph> <Paragraph position="6"> Sense tagged corpora have thus been central to accurate WSD systems. 
Estimates made in (Ng, 1997) indicated that a high-accuracy, domain-independent WSD system would probably need a corpus of about 3.2 million sense tagged words.</Paragraph> <Paragraph position="7"> At a throughput of one word per minute (Edmonds, 2000), this would require about 27 man-years of human annotation effort.</Paragraph> <Paragraph position="8"> With Open Mind Word Expert we aim to create a very large sense tagged corpus by drawing on the knowledge of millions of Web users, combined with techniques for active learning.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Open Mind Word Expert </SectionTitle> <Paragraph position="0"> Open Mind Word Expert is a Web-based interface where users can tag words with their WordNet senses. Tagging is organized by word. That is, for each ambiguous word for which we want to build a sense tagged corpus, users are presented with a set of natural language (English) sentences that include an instance of the ambiguous word.</Paragraph> <Paragraph position="1"> Initially, example sentences are extracted from a large textual corpus. If other training data is not available, a number of these sentences are presented to the users for tagging in Stage 1. Next, this tagged collection is used as training data, and active learning is used to identify in the remaining corpus the examples that are &quot;hard to tag&quot;. These are the examples that are presented to the users for tagging in Stage 2. For all tagging, users are asked to select the sense they find to be the most appropriate in the given sentence, from a drop-down list that contains all WordNet senses, plus two additional choices, &quot;unclear&quot; and &quot;none of the above&quot;. The results of any automatic classification, and the classifications submitted by other users, are not presented, so as not to bias the contributor's decisions. Based on early feedback from both researchers and contributors, a future version of Open Mind Word Expert may allow contributors to specify more than one sense for any word. A prototype of the system has been implemented and is available at http://www.teachcomputers.org. Figure 1 shows a screen shot from the system interface, illustrating the screen presented to users when tagging the noun &quot;child&quot;.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Data </SectionTitle> <Paragraph position="0"> The starting corpus we use is formed by a mix of three different sources of data, namely the Penn Treebank corpus (Marcus et al., 1993), the Los Angeles Times collection, as provided during the TREC conferences, and Open Mind Common Sense, a collection of about 400,000 common-sense assertions in English contributed by volunteers over the Web. A mix of several sources, each covering a different spectrum of usage, is used to increase the coverage of word senses and writing styles. While the first two sources are well known to the NLP community, Open Mind Common Sense is a fairly new textual corpus. It consists mostly of simple single sentences. These sentences tend to be explanations and assertions similar to the glosses of a dictionary, but phrased in more common language and with many sentences per sense. For example, the collection includes such assertions as &quot;keys are used to unlock doors&quot;, and &quot;pressing a typewriter key makes a letter&quot;. 
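As an aside, a minimal sketch of the Stage 1 extraction step described above is given here; the corpus file names, the naive matching, and the sampling cap are our own illustrative assumptions, not details taken from the paper.

import random
import re

def extract_examples(target, corpus_files, max_examples=500, seed=0):
    # Collect sentences that contain an instance of the target word.
    # A naive sketch: a real system would use proper sentence splitting,
    # lemmatization, and part-of-speech filtering to keep only noun uses.
    pattern = re.compile(r"\b%s\b" % re.escape(target), re.IGNORECASE)
    examples = []
    for path in corpus_files:
        with open(path, encoding="utf-8") as f:
            for line in f:
                sentence = line.strip()
                if sentence and pattern.search(sentence):
                    examples.append(sentence)
    random.Random(seed).shuffle(examples)
    return examples[:max_examples]

# Hypothetical files standing in for the three sources described above.
files = ["penn_treebank.txt", "la_times_trec.txt", "open_mind_common_sense.txt"]
child_examples = extract_examples("child", files)

Mixing the three sources in this way is what gives each target word examples from several genres rather than from a single collection.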
We believe the Open Mind Common Sense sentences may be a relatively clean source of keywords that can aid in disambiguation. For details on the data and how it has been collected, see (Singh, 2002).</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Active Learning </SectionTitle> <Paragraph position="0"> To minimize the amount of human annotation effort needed to build a tagged corpus for a given ambiguous word, Open Mind Word Expert includes an active learning component whose role is to select for annotation only those examples that are the most informative.</Paragraph> <Paragraph position="1"> According to (Dagan et al., 1995), there are two main types of active learning. The first uses membership queries, in which the learner constructs examples and asks a user to label them. In natural language processing tasks, this approach is not always applicable, since it is hard, and not always possible, to construct meaningful unlabeled examples for training. Instead, a second type of active learning can be applied to these tasks: selective sampling. In this case, several classifiers examine the unlabeled data and identify only those examples that are the most informative, that is, the examples where a certain level of disagreement is measured among the classifiers. We use a simplified form of active learning with selective sampling, where the instances to be tagged are those on which the labels assigned by two different classifiers disagree. The two classifiers are trained on a relatively small corpus of tagged data, which is formed either with (1) Senseval training examples, in the case of Senseval words, or (2) examples obtained with the Open Mind Word Expert system itself, when no other training data is available.</Paragraph> <Paragraph position="2"> The first classifier is a Semantic Tagger with Active Feature Selection (STAFS). This system (previously known as SMUls) is one of the top-ranked systems in the English lexical sample task at SENSEVAL-2. The system consists of an instance-based learning algorithm improved with a scheme for automatic feature selection. It relies on the observation that different sets of features have different effects depending on the ambiguous word considered. Rather than creating a general learning model for all polysemous words, STAFS builds a separate feature space for each individual word. The features are selected from a pool of eighteen different features that have previously been acknowledged as good indicators of word sense, including: part of speech of the ambiguous word itself, surrounding words and their parts of speech, keywords in context, noun before and after, verb before and after, and others. An iterative forward search algorithm identifies at each step the feature that leads to the highest cross-validation precision computed on the training data. More details on this system can be found in (Mihalcea, 2002b).</Paragraph> <Paragraph position="3"> The second classifier is a COnstraint-BAsed Language Tagger (COBALT). The system treats every training example as a set of soft constraints on the sense of the word of interest. WordNet glosses, hyponyms, hyponym glosses and other WordNet data are also used to create soft constraints. 
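Before returning to COBALT's constraints, here is a minimal sketch of the kind of iterative forward feature selection just described for STAFS; the candidate feature names, the synthetic data, and the use of a k-nearest-neighbor learner with scikit-learn cross-validation are our own illustrative assumptions, not the actual STAFS implementation.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def forward_feature_selection(feature_matrix, labels, feature_names, cv=5):
    # Greedy forward search: at each step add the feature that most improves
    # cross-validated accuracy (standing in for the cross-validation precision
    # used by STAFS) of an instance-based learner.
    selected = []
    best_score = 0.0
    remaining = list(range(len(feature_names)))
    while remaining:
        scores = []
        for j in remaining:
            cols = selected + [j]
            clf = KNeighborsClassifier(n_neighbors=3)
            score = cross_val_score(clf, feature_matrix[:, cols], labels, cv=cv).mean()
            scores.append((score, j))
        score, j = max(scores)
        if score <= best_score:
            break  # stop when no candidate feature improves the score
        best_score, selected = score, selected + [j]
        remaining.remove(j)
    return [feature_names[j] for j in selected], best_score

# Toy run on synthetic data; in practice each column would encode one contextual feature.
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(200, 6)).astype(float)
y = (X[:, 0] + X[:, 2] > 4).astype(int)
names = ["pos_of_target", "word_before", "word_after",
         "keyword_in_context", "noun_before", "verb_after"]
print(forward_feature_selection(X, y, names))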
Currently, only the &quot;keywords in context&quot; type of constraint is implemented, with weights accounting for the distance from the target word.</Paragraph> <Paragraph position="4"> The tagging is performed by finding the sense that minimizes the violation of constraints in the instance being tagged. COBALT generates confidences in its tagging of a given instance based on the degree to which the constraints were satisfied or violated for that instance.</Paragraph> <Paragraph position="5"> Both taggers use WordNet 1.7 dictionary glosses and relations. The performance of the two systems and their level of agreement were evaluated on the Senseval noun data set. The two systems agreed in their classification decision in 54.96% of the cases. This low agreement level is a good indication that the two approaches are fairly orthogonal, and therefore we may hope for high disambiguation precision on the agreement set. Indeed, the tagging accuracy measured on the set where both COBALT and STAFS assign the same label is 82.5%, a figure that is close to the 85.5% inter-annotator agreement measured for the SENSEVAL-2 nouns (Kilgarriff, 2002).</Paragraph> <Paragraph position="6"> Table 1 lists the precision of the individual classifiers and of their agreement and disagreement sets. The low precision on the instances in the disagreement set justifies referring to these as &quot;hard to tag&quot;. In Open Mind Word Expert, these are the instances that are presented to the users for tagging in the active learning stage.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Ensuring Quality </SectionTitle> <Paragraph position="0"> Collecting from the general public holds the promise of providing much data at low cost. It also makes attending to two aspects of data collection more important: (1) ensuring contribution quality, and (2) making the contribution process engaging to the contributors.</Paragraph> <Paragraph position="1"> We have already implemented several steps to ensure quality, and propose additional ones. First, redundant tagging is collected for each item. Open Mind Word Expert currently uses the following rules in presenting items to volunteer contributors: - Two tags per item. Once an item has two tags associated with it, it is not presented for further tagging.</Paragraph> <Paragraph position="2"> - One tag per item per contributor. We allow contributors to submit tagging either anonymously or having logged in. Anonymous contributors are not shown any items already tagged by contributors (anonymous or not) from the same IP address. Logged-in contributors are not shown items they have already tagged.</Paragraph> <Paragraph position="3"> Second, inaccurate sessions will be discarded. This can be accomplished in two ways, roughly by checking agreement and precision (a minimal sketch of both checks follows below): - Using the redundancy of tags collected for each item, any given session (a tagging done all in one sitting) will be checked for agreement with the tagging of the same items collected outside of this session.</Paragraph> <Paragraph position="4"> - If necessary, the precision of a given contributor with respect to a preexisting gold standard (such as SemCor or Senseval training data) can be estimated directly by presenting the contributor with examples from the gold standard. 
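A minimal sketch of the two checks listed above, under our own assumptions about how sessions and gold-standard items are represented (this is not the actual Open Mind Word Expert code):

def session_agreement(session_tags, other_tags):
    # Fraction of items in this session whose tag matches the tag collected
    # for the same item outside the session (only items seen elsewhere count).
    shared = [item for item in session_tags if item in other_tags]
    if not shared:
        return None  # nothing to compare against yet
    agree = sum(1 for item in shared if session_tags[item] == other_tags[item])
    return agree / len(shared)

def gold_precision(session_tags, gold_tags):
    # Precision of a contributor on items drawn from a preexisting gold standard.
    gold_items = [item for item in session_tags if item in gold_tags]
    if not gold_items:
        return None
    correct = sum(1 for item in gold_items if session_tags[item] == gold_tags[item])
    return correct / len(gold_items)

def keep_session(session_tags, other_tags, gold_tags,
                 min_agreement=0.5, min_precision=0.5):
    # Discard a session that clearly disagrees with other contributors or
    # performs poorly on gold-standard items (thresholds are illustrative).
    agreement = session_agreement(session_tags, other_tags)
    precision = gold_precision(session_tags, gold_tags)
    if agreement is not None and agreement < min_agreement:
        return False
    if precision is not None and precision < min_precision:
        return False
    return True

A session failing either check would simply be excluded when the final corpus is assembled.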
Such gold-standard checks will be implemented if the pilot indicates a need for them; they will help screen out contributors who, for example, always select the first sense (and are in high agreement with other contributors who do the same).</Paragraph> <Paragraph position="5"> In all, automatic assessment of the quality of tagging seems possible, and, based on the experience of prior volunteer contribution projects (Singh, 2002), the rate of maliciously misleading or incorrect contributions is surprisingly low. Additionally, the tagging quality will be estimated by comparing the agreement level among Web contributors with the agreement level measured in previous sense tagging projects. An analysis of the semantic annotation task performed by novice taggers as part of the SemCor project (Fellbaum et al., 1997) revealed an agreement of about 82.5% among novice taggers, and 75.2% between novice taggers and lexicographers. Moreover, since we plan to use paid, trained taggers to create a separate test corpus for each of the words tagged with Open Mind Word Expert, these same paid taggers could also validate a small percentage of the training data for which no gold standard exists.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.4 Engaging the Contributors </SectionTitle> <Paragraph position="0"> We believe that making the contribution process as engaging and as &quot;game-like&quot; for the contributors as possible is crucial to collecting a large volume of data. With that goal, Open Mind Word Expert tracks, for each contributor, the number of items tagged for each topic. When tagging items, a contributor is shown the number of items (for this word) she has tagged and the record number of items tagged (for this word) by a single user.</Paragraph> <Paragraph position="1"> If the contributor sets a record, it is recognized with a congratulatory message on the contribution screen, and the user is placed in the Hall of Fame for the site. Also, the user can always access a real-time graph summarizing, by topic, their contribution versus the current record for that topic.</Paragraph> <Paragraph position="2"> Interestingly, it seems that relatively simple word games can enjoy tremendous user acceptance. For example, WordZap (http://wordzap.com), a game that pits players against each other or against a computer to be the first to make seven words from several presented letters (with some additional rules), has been downloaded by well over a million users, and reviewers describe the game as &quot;addictive&quot;. If sense tagging can enjoy a fraction of such popularity, very large tagged corpora will be generated.</Paragraph> <Paragraph position="3"> Additionally, NLP instructors can use the site as an aid in teaching lexical semantics. An instructor can create an &quot;activity code&quot; and then, for users who have opted in as participants of that activity (by entering the activity code when creating their profiles), access the amount tagged by each participant and the percentage agreement of the tagging of each contributor who opted in for this activity. 
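The percentage agreement mentioned above could be computed in several ways; the following is a minimal sketch that scores each contributor against the tags other contributors gave to the same items (the data layout is our own assumption, not the system's actual schema).

from collections import defaultdict

def contributor_agreement(tags):
    # tags: list of (contributor, item_id, sense) triples.
    # Returns, per contributor, the fraction of their tags that match a tag
    # given to the same item by a different contributor.
    by_item = defaultdict(list)
    for contributor, item, sense in tags:
        by_item[item].append((contributor, sense))

    matched = defaultdict(int)
    total = defaultdict(int)
    for contributor, item, sense in tags:
        others = [s for c, s in by_item[item] if c != contributor]
        if not others:
            continue  # item not yet tagged by anyone else
        total[contributor] += 1
        if sense in others:
            matched[contributor] += 1
    return {c: matched[c] / total[c] for c in total}

# Toy example with two doubly tagged items.
example = [("ann", "child#12", 1), ("bob", "child#12", 1),
           ("ann", "child#47", 2), ("bob", "child#47", 3)]
print(contributor_agreement(example))   # {'ann': 0.5, 'bob': 0.5}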
With these activity codes, instructors can assign Open Mind Word Expert tagging as part of a homework assignment or a test.</Paragraph> <Paragraph position="4"> Also, assuming there is a test set of already tagged examples for a given ambiguous word, we may add the capability of showing the increase in disambiguation precision on that test set as it results from the samples the user is currently tagging.</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Proposed Task for SENSEVAL-3 </SectionTitle> <Paragraph position="0"> The Open Mind Word Expert system will be used to build large sense tagged corpora for some of the most frequent ambiguous words in English.</Paragraph> <Paragraph position="1"> The tagging will be collected over the Web from volunteer contributors. We propose to organize a task in SENSEVAL-3 where systems will disambiguate words using the corpus created with this system.</Paragraph> <Paragraph position="2"> We will initially select a set of 100 nouns, and collect for each of them 75+15n tagged samples (Edmonds, 2000), where n is the number of senses of the noun. It is worth mentioning that, unlike previous SENSEVAL evaluations, where multi-word expressions were considered as possible senses for a constituent ambiguous word, we filter these expressions a priori with an automatic tool for collocation extraction. The examples we collect therefore refer only to single ambiguous words, and hence we expect a lower inter-tagger agreement rate and lower WSD tagging precision, since multi-word expressions are usually not ambiguous and constitute some of the &quot;easy cases&quot; in sense tagging.</Paragraph> <Paragraph position="3"> This initial set of tagged examples will then be used to train the two classifiers described in Section 3.2 and to annotate an additional set of examples. From these, the users will be presented only with those examples where there is a disagreement between the labels assigned by the two classifiers. The final corpus for each ambiguous word will consist of (1) the original set of 75+15n tagged examples, plus (2) the examples selected by the active learning component, sense tagged by users.</Paragraph> <Paragraph position="4"> Words will be selected based on their frequencies, as computed on SemCor. Once the tagging process of the initial set of 100 words is completed, additional nouns will be incrementally added to the Open Mind Word Expert interface. As we go along, words with other parts of speech will be considered as well.</Paragraph> <Paragraph position="5"> To enable comparison with Senseval-2, the set of words will also include the 29 nouns used in the Senseval-2 lexical sample tasks. This would allow us to assess how much the collected data helps on the Senseval-2 task.</Paragraph> <Paragraph position="6"> As shown in Section 3.3, redundant tags will be collected for each item, and overall quality will be assessed. 
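To make the active learning step of the proposed task concrete, here is a minimal sketch of disagreement-based selective sampling with two classifiers, in the spirit of Section 3.2; the toy keyword taggers are placeholders of our own, not the actual STAFS or COBALT systems.

def select_hard_examples(tagger_a, tagger_b, untagged_examples):
    # Keep only the instances on which the two sense taggers disagree;
    # these are the "hard to tag" cases handed to human contributors.
    return [ex for ex in untagged_examples
            if tagger_a.classify(ex) != tagger_b.classify(ex)]

class KeywordSenseTagger:
    # Toy stand-in for a trained sense tagger: picks a sense from cue words.
    def __init__(self, cues, default_sense):
        self.cues = cues              # {cue word: sense number}
        self.default = default_sense
    def classify(self, example):
        for word, sense in self.cues.items():
            if word in example.lower():
                return sense
        return self.default

# Two toy taggers that share a cue but fall back to different default senses.
tagger_a = KeywordSenseTagger({"parent": 1}, default_sense=2)
tagger_b = KeywordSenseTagger({"parent": 1}, default_sense=1)
examples = ["Every parent loves their child.",
            "She is a child of the sixties."]
print(select_hard_examples(tagger_a, tagger_b, examples))
# -> ['She is a child of the sixties.']  (the only instance the taggers disagree on)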
Moreover, starting with the initial set of 75+15n examples labeled for each word, we will create confusion matrices that indicate the similarity between word senses and help us create the sense mappings for the coarse-grained evaluations.</Paragraph> <Paragraph position="7"> One of the next steps we plan to take is to replace the &quot;two tags per item&quot; scheme with the &quot;tag until at least two tags agree&quot; scheme proposed and used during the SENSEVAL-2 tagging (Kilgarriff, 2002). Additionally, the set of meanings that constitute the possible choices for a certain ambiguous example will be enriched with groups of similar meanings, determined either from a priori provided sense mappings (if any are available) or from the confusion matrices mentioned above.</Paragraph> <Paragraph position="8"> For each word with sense tagged data created with Open Mind Word Expert, a test corpus will be built by trained human taggers, starting with examples extracted from the corpus mentioned in Section 3.1. This process will be set up independently of the Open Mind Word Expert Web interface. The test corpus will be released during SENSEVAL-3.</Paragraph> </Section> </Paper>