<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0202">
  <Title>Experience in WordNet Sense Tagging in the Wall Street Journal</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 The Verbs and the Basic Tag
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Format
</SectionTitle>
      <Paragraph position="0"> The following are the verbs that were tagged. The total number of occurrences is 6,197.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="8" type="metho">
    <SectionTitle>
VERB NUMBER VERB NUMBER
</SectionTitle>
    <Paragraph position="0"> have 2740 make 473
take 316 get 231
add 118 pay 189
see 159 call 151
decline 84 hold 127
come 191 give 168
keep 101 know 87
find 130 lose 82
believe 103 raise 124
drop 61 lead 105
work 101 leave 81
run 105 look 95
meet 75
The basic tags have the following form. Extensions will be given below.
word_&lt;lemma, WordNet POS, WordNet sense number&gt;
For example: The Sacramento-based S&amp;L had_(have verb 4) assets of $2.4 billion at the end of September.</Paragraph>
    <Paragraph position="1"> That is, 'had' is a form of the main verb 'have' occurring as WordNet sense number 4.</Paragraph>
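    <Paragraph position="2"> As an illustration of the basic tag form, the following minimal Python sketch extracts tags written with parentheses, as in the example sentence above. The names SenseTag, TAG_RE, and find_tags are hypothetical and are not part of the original annotation tools. The pattern matches only tags that carry all three fields, so a tag without a sense number is skipped.
<![CDATA[
```python
import re
from typing import List, NamedTuple


class SenseTag(NamedTuple):
    word: str   # surface form as it appears in the text, e.g. "had"
    lemma: str  # base form; multiword lemmas are joined by "_"
    pos: str    # WordNet part of speech, e.g. "verb"
    sense: str  # WordNet sense number, kept as a string


# Assumed surface syntax, copied from the running examples: word_(lemma pos sense)
TAG_RE = re.compile(r"(\w+)_\((\w+) (\w+) (\w+)\)")


def find_tags(sentence: str) -> List[SenseTag]:
    """Return every three-field sense tag found in the sentence, in order."""
    return [SenseTag(*m.groups()) for m in TAG_RE.finditer(sentence)]


example = ("The Sacramento-based S&L had_(have verb 4) assets of "
           "$2.4 billion at the end of September.")
tags = find_tags(example)
```
]]>
For the example sentence, find_tags returns a single record with word 'had', lemma 'have', part of speech 'verb', and sense '4'.</Paragraph>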
  </Section>
  <Section position="6" start_page="8" end_page="8" type="metho">
    <SectionTitle>
3 Refinements
</SectionTitle>
    <Paragraph position="0"> We consistently break out certain uses of verbs to a greater extent than WordNet does, in particular, idioms and verbs of intermediate (and auxiliary) function. There are several reasons for doing so.</Paragraph>
    <Paragraph position="1"> The primary reason is to perform more accurate tagging. Not all such uses are covered by WordNet entries.</Paragraph>
    <Paragraph position="2"> A second reason is to support identifying better features for automatic tagging. Some of these special-case uses can be identified with good accuracy with simple grammars, while the more semantically weighty uses of the same verb generally cannot be. Thus, different features will be appropriate for the special-case versus other uses. By tagging them as separate categories, one can search for separate features characterizing each class.</Paragraph>
    <Paragraph position="3"> Finally, it is helpful to the human tagger for the preprocessor to target these distinguished classes, for which relatively high-accuracy automatic solutions are possible.</Paragraph>
    <Section position="1" start_page="8" end_page="8" type="sub_section">
      <SectionTitle>
3.1 Auxiliary Verbs
</SectionTitle>
      <Paragraph position="0"> WordNet does not provide sense information for auxiliary uses of verbs. SEMCOR (Miller et al. 1994) leaves these uses untagged. Among the verbs tagged in our corpus, only 'have' has an auxiliary use, which we tag as follows, with the string &quot;aux&quot; replacing the sense number: South Korea has_(have verb_aux) recorded a trade surplus of 71 million dollars so far this year.</Paragraph>
      <Paragraph position="1"> As they can be recognized automatically with high accuracy, auxiliaries are automatically annotated by the preprocessor.</Paragraph>
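      <Paragraph position="2"> The rule used by the preprocessor is not spelled out here. One plausible heuristic, given purely as an assumption, treats a form of 'have' as auxiliary when the next non-adverb token is a past participle. The sketch below assumes Penn Treebank part-of-speech tags on the input; the names HAVE_FORMS, ADVERB_TAGS, and is_auxiliary_have are hypothetical.
<![CDATA[
```python
from typing import List, Tuple

# Assumed inventory of inflected forms of 'have' (contractions are ignored).
HAVE_FORMS = {"have", "has", "had", "having"}
# Penn Treebank adverb tags that may intervene, as in "has not yet recorded".
ADVERB_TAGS = {"RB", "RBR", "RBS"}


def is_auxiliary_have(tagged: List[Tuple[str, str]], i: int) -> bool:
    """Guess whether token i is auxiliary 'have'.

    tagged is a list of (word, POS) pairs. The guess is True when token i
    is a form of 'have' and the first following token that is not an
    adverb is tagged VBN (past participle).
    """
    if tagged[i][0].lower() not in HAVE_FORMS:
        return False
    j = i + 1
    while j < len(tagged) and tagged[j][1] in ADVERB_TAGS:
        j += 1
    return j < len(tagged) and tagged[j][1] == "VBN"


sentence = [("South", "NNP"), ("Korea", "NNP"), ("has", "VBZ"),
            ("recorded", "VBN"), ("a", "DT"), ("trade", "NN"),
            ("surplus", "NN")]
aux_guess = is_auxiliary_have(sentence, 2)
```
]]>
A heuristic of this kind depends on the part-of-speech tagger separating participles from adjectives, so its accuracy is bounded by the tagger's.</Paragraph>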
    </Section>
    <Section position="2" start_page="8" end_page="8" type="sub_section">
      <SectionTitle>
3.2 Intermediate Verbs
</SectionTitle>
      <Paragraph position="0"> &quot;Intermediate verb&quot; is a term used in Quirk et al. (1985; pp. 138-148), defined as an occurrence &quot;whose status is in some degree intermediate between auxiliaries and main verbs.&quot; Quirk et al. arrange verbs on a scale ranging from modal auxiliaries to main verbs, and &quot;many of the intermediate verbs, particularly those at the higher end of the scale, have meanings associated with aspect, tense, and modality: meanings which are primarily expressed through auxiliary verb constructions.&quot; Among the verbs tagged in our corpus, 'have', 'get', and 'keep' are used with intermediate function in the following constructions: 'had better' (or 'had best') and 'have got to' (called &quot;modal idioms&quot; by Quirk et al.), 'have to' (called a &quot;semi-auxiliary&quot;), 'get' + -ed participle, and 'keep' + -ing participle (which are given the title &quot;catenatives&quot;).</Paragraph>
      <Paragraph position="1"> Some but not all of these are represented by senses in WordNet (and none are identified as having this special function). Since WordNet senses cannot be consistently assigned to these occurrences, we use a new tag, &quot;int&quot;, in place of a sense number (or in addition to one, when there is an appropriate sense), creating a new category, as we did with the auxiliaries. An example of an intermediate verb occurrence is the following. Note that sense 5 of 'have' is an appropriate WordNet sense for this occurrence: Apple II owners, for example, had_(have_to verb_int 5) to use their television sets as screens and stored data on audiocassettes.</Paragraph>
      <Paragraph position="2"> These intermediate occurrences can also be recognized with good accuracy, and so are also added to the corpus by the preprocessor.</Paragraph>
      <Paragraph position="3"> The auxiliary and intermediate uses of 'have' together represent well over half of the occurrences, so breaking these out as separate categories enables the preprocessor to assist the tagger greatly. In addition, it would allow for separate evaluation of an automatic classifier tagging 'have'.</Paragraph>
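      <Paragraph position="4"> The intermediate constructions listed in this section are defined by short surface patterns, which suggests how a preprocessor might flag them. The following rough sketch is an assumption, not the preprocessor actually used: it expects Penn Treebank part-of-speech tags, and the function name and the returned labels are hypothetical.
<![CDATA[
```python
from typing import List, Optional, Tuple

HAVE_FORMS = {"have", "has", "had", "having"}
GET_FORMS = {"get", "gets", "got", "getting", "gotten"}
KEEP_FORMS = {"keep", "keeps", "kept", "keeping"}

Token = Tuple[str, str]  # (word, Penn Treebank POS tag)


def _at(tagged: List[Token], j: int) -> Token:
    """Return token j, or an empty placeholder past the end of the list."""
    return tagged[j] if j < len(tagged) else ("", "")


def intermediate_construction(tagged: List[Token], i: int) -> Optional[str]:
    """Return a label if token i starts an intermediate construction."""
    word = tagged[i][0].lower()
    next_word, next_tag = _at(tagged, i + 1)
    next2_word, next2_tag = _at(tagged, i + 2)
    next_word, next2_word = next_word.lower(), next2_word.lower()

    if word == "had" and next_word in {"better", "best"}:
        return "had_better"          # modal idiom
    if word in HAVE_FORMS and next_word == "got" and next2_word == "to":
        return "have_got_to"         # modal idiom
    if word in HAVE_FORMS and next_word == "to" and next2_tag == "VB":
        return "have_to"             # semi-auxiliary
    if word in GET_FORMS and next_tag == "VBN":
        return "get_ed_participle"   # catenative
    if word in KEEP_FORMS and next_tag == "VBG":
        return "keep_ing_participle" # catenative
    return None


apple = [("owners", "NNS"), ("had", "VBD"), ("to", "TO"),
         ("use", "VB"), ("their", "PRP$"), ("sets", "NNS")]
label = intermediate_construction(apple, 1)
```
]]>
The patterns look only at adjacent tokens, so they would miss constructions with intervening material and can misfire on strings such as 'had best' followed by a noun; a real preprocessor would need a fuller grammar.</Paragraph>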
    </Section>
  </Section>
  <Section position="7" start_page="8" end_page="10" type="metho">
    <SectionTitle>
4 Verb Idioms
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="8" end_page="10" type="sub_section">
      <SectionTitle>
4.1 Manual Annotation
</SectionTitle>
      <Paragraph position="0"> Verb idioms (semantic units consisting of a verb followed by a particle or other modifying word) accounted for a recognizable segment, about 6%, of the tagged data. For example: The borrowing to raise these funds would be paid_(pay_off verb 1) off as assets of sick thrifts are sold.</Paragraph>
      <Paragraph position="1"> WordNet does not provide entries for all idioms, and the entries it does provide do not always include a sense for the occurrences observed in the corpus. It is important to recognize idioms, because interpreting their constituent words separately would often change the meaning of the sentence (cf., e.g., Wilks 1977 and Wilensky &amp; Arens 1980). Our coding instructions specify that the tagger should attempt to identify idioms even if WordNet does not provide entries for them. The preprocessor assists in this task by identifying potential idioms.</Paragraph>
      <Paragraph position="2"> The following are strategies we found useful in dealing with the difficult problem of manually identifying idioms.</Paragraph>
      <Paragraph position="3">  1. Does the word following the verb cease to have any of its usual or literal meanings as supplied by WordNet when used with that verb? If America can keep_(keep_up verb 1) up the present situation ... the economies of these countries would be totally restructured to be able to almost sustain growth by themselves.</Paragraph>
      <Paragraph position="4"> The 'situation' here does not need to be kept in a lofty position, but rather maintained. The use of 'up' as a particle takes away its literal, physical meaning, and attaches it semantically to 'keep', making an idiom definition necessary.
2. Could the idiom be replaced with a single verb which has the same meaning? For example: But the New York Stock Exchange chairman said he doesn't support reinstating a &quot;collar&quot; on program trading, arguing that firms could get_(get_around verb 2) around such a limit.</Paragraph>
      <Paragraph position="5"> WordNet's entry for this sense of &quot;get around&quot; includes as synonyms &quot;avoid&quot; and &quot;bypass&quot;, which, if used in place of the idiom, do not change the meaning of the sentence.</Paragraph>
      <Paragraph position="6"> 3. Would the particle be mistaken for a preposition beginning a prepositional phrase, thereby changing the meaning of the sentence, if viewed as separate from the main verb? Consider this example: Coleco failed to come_(come_up_with verb 1) up with another winner and filed for bankruptcy-law protection ...</Paragraph>
      <Paragraph position="7"> This example actually meets all three criteria. 'Come up with' must be considered a single idiom partly to avoid a literal interpretation that would change the meaning of the sentence, as described in criterion (1), and it also has the meaning &quot;locate&quot;, which further qualifies this occurrence as an idiom according to criterion (2). If this sentence were given a literal reading, perhaps by an automatic tagger, 'with another winner' might be identified as an acceptable prepositional phrase.</Paragraph>
    </Section>
    <Section position="2" start_page="10" end_page="10" type="sub_section">
      <SectionTitle>
4.2 A Flexible Tag Format
</SectionTitle>
      <Paragraph position="0"> For the purposes of the larger project of which this annotation project is a part, the words are annotated with information in addition to the WordNet sense tags. A simple example is the richer part-of-speech tags produced by Brill's tagger (1992). We note here a problem that we encountered using SEMCOR's tag format for idioms: SEMCOR merges the component words of the idiom into one annotation, thereby making it impossible to unambiguously represent information about the individual words. Representing split idioms is also a problem with this scheme.</Paragraph>
      <Paragraph position="1"> To maintain separate annotations and also tie the constituents of an idiom together, we suggest the format below (or an analogous one), which is generated by the preprocessor. The annotations for the individual words are delimited by &quot;&lt;wf&quot; and &quot;&lt;/wf&gt;&quot;. The human annotator's tags are included in the individual word annotations. For example, below the annotator tagged &quot;take&quot; with the first WordNet entry for 'take place'. When there is an appropriate WordNet entry for the idiom as a whole, we store that entry with the first word of the idiom (but the entry could be stored with both). Appropriate WordNet entries for the individual words can also be stored in the individual word annotations. The Brill part-of-speech tags illustrate other information we would like to retain for the individual words.</Paragraph>
      <Paragraph position="2"> &lt;wf BrillPOS=VBD idiom=take_place-1
wnentry=_&lt;take_place verb 1&gt;&gt;took&lt;/wf&gt;
&lt;wf pos=NN idiom=take_place-2&gt;place&lt;/wf&gt;
The first two lines contain the annotation for the first word in the idiom. It contains a Brill POS tag for 'take' and a WordNet entry for 'take place'. The string 'take_place-1' encodes the fact that this is the first word of a 'take place' idiom.</Paragraph>
      <Paragraph position="3"> The third line represents the second word in the idiom ('take_place-2'), which is a noun ('NN').</Paragraph>
      <Paragraph position="4"> An intervening adverb, for example, would simply be represented with its own annotation placed between the annotations for the words in the idiom.</Paragraph>
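      <Paragraph position="5"> To make the layout concrete, here is a minimal Python sketch that emits one wf annotation per word and links the words of an idiom through numbered idiom attributes. It is an illustration under assumptions, not the preprocessor itself: the function names are hypothetical, every word is given a BrillPOS attribute for uniformity, and the WordNet entry is passed in as an opaque string copied from the example above.
<![CDATA[
```python
from typing import Dict, List, Optional, Tuple


def wf(word: str, attrs: Dict[str, str]) -> str:
    """Format a single word annotation with unquoted attribute values."""
    attr_text = " ".join(f"{key}={value}" for key, value in attrs.items())
    return f"<wf {attr_text}>{word}</wf>"


def annotate(tokens: List[Tuple[str, str]],
             idiom: str,
             parts: List[int],
             wn_entry: Optional[str] = None) -> List[str]:
    """Annotate (word, Brill POS) tokens, one output string per word.

    parts lists, in order, the token positions that belong to the idiom.
    The WordNet entry for the whole idiom is stored with its first word.
    Words outside the idiom keep their own annotation, so an intervening
    adverb simply sits between the idiom's words.
    """
    out = []
    for index, (word, pos) in enumerate(tokens):
        attrs = {"BrillPOS": pos}
        if index in parts:
            k = parts.index(index) + 1  # 1-based position within the idiom
            attrs["idiom"] = f"{idiom}-{k}"
            if k == 1 and wn_entry is not None:
                attrs["wnentry"] = wn_entry
        out.append(wf(word, attrs))
    return out


took_place = annotate([("took", "VBD"), ("place", "NN")],
                      "take_place", [0, 1], "_<take_place verb 1>")
split = annotate([("paid", "VBN"), ("quickly", "RB"), ("off", "RP")],
                 "pay_off", [0, 2])
```
]]>
In the second call, the invented adverb 'quickly' receives an ordinary annotation with no idiom attribute, while 'paid' and 'off' are marked pay_off-1 and pay_off-2, which is how a split idiom stays recoverable.</Paragraph>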
    </Section>
  </Section>
  <Section position="8" start_page="10" end_page="10" type="metho">
    <SectionTitle>
5 Challenging Ambiguities
</SectionTitle>
    <Paragraph position="0"> There are some instances in the corpus that we found to be truly ambiguous. These instances support two completely different interpretations even with the help of the context. For example:  In this sentence, two interpretations of the verb 'have' are equally possible, even when the sentence is viewed in context: 'Have' can be seen as an auxiliary, meaning that the group have themselves done the forecasting, or as WordNet sense 1 (in which case 'forecast' is an adjective), implying that someone else has given them an amount, 56.9 billion francs, that represents their expected revenue.</Paragraph>
    <Paragraph position="1"> A problem found several times in the corpus occurred when a single verb is used in a sentence that has two objects, and each object suggests a different sense of the verb. In the sentence below, for example, two senses of the main verb 'have' are represented simultaneously in the sentence. Sense 4 carries the idea of ownership, which should be applied to the object 'papers', while sense 3 has the meaning &quot;to experience or receive&quot;, which should be applied to the object 'sales'.</Paragraph>
    <Paragraph position="2"> PAPERS: Backe Group Inc. agreed to acquire Atlantic Publications Inc., which has_(have verb 4114) 30 community papers and annual sales of $7 million.</Paragraph>
    <Paragraph position="3"> Such cases are borderline, hovering in between two distinct meanings.</Paragraph>
  </Section>
</Paper>