<?xml version="1.0" standalone="yes"?>
<Paper uid="M98-1029">
  <Title>Source: Lynette Hirschman (mailto:lynette@MITRE.org)</Title>
  <Section position="2" start_page="0" end_page="1" type="metho">
    <SectionTitle>
2. GENERAL NOTATION
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 SGML Tagging
</SectionTitle>
      <Paragraph position="0"> The annotation for coreference is SGML tagging within the text stream. Referring expressions and their antecedents are tagged as follows:</Paragraph>
      <Paragraph position="2"> The basic annotation contains the information needed to establish some type of link between an explicitly marked pair of noun phrases. In the above example, the pronoun "it" is tagged as referring to the same entity as the phrase "Lawson Mardon Group Ltd." There is one markup per string. Other links can be inferred from the explicit links. We assume that the coreference relation is symmetric and transitive: if phrase A is marked as coreferential with B (indicated by a REF pointer from A to B), we can infer that B is coreferential with A; and if A is coreferential with B, and B is coreferential with C, we can infer that A is coreferential with C.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 The "TYPE" Attribute
</SectionTitle>
      <Paragraph position="0"> The purpose of the TYPE attribute is to indicate the relationship between the anaphor and the antecedent. At present only one such relationship, "IDENT" (for identity), is being annotated.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 The "ID" and "REF" Attributes
</SectionTitle>
      <Paragraph position="0"> The ID and REF attributes are used to indicate that there is a coreference link between two strings. The ID is arbitrarily but uniquely assigned to the string during markup. The REF uses that ID to indicate the coreference link.</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.4 The "MIN" Attribute
</SectionTitle>
      <Paragraph position="0"> The MIN attribute is used in the answer key ("key") to indicate the minimum string that the system under evaluation must include in the COREF tag in order to receive full credit for its output ("response"). So, in the next example, if the system response had omitted "of Surrey, England" from the COREF tag, the response would nonetheless receive full credit because it identified the minimum string.</Paragraph>
      <Paragraph position="1"> &lt;COREF ID="100" MIN="Haden MacLellan PLC"&gt;Haden MacLellan PLC of Surrey, England&lt;/COREF&gt; ... &lt;COREF ID="101" TYPE="IDENT" REF="100"&gt;Haden MacLellan&lt;/COREF&gt;
Any response that includes the MIN string and does not include any tokens beyond those enclosed in the &lt;COREF&gt;...&lt;/COREF&gt; tags is valid. The MIN string will in general be the HEAD of the phrase; see section 5 for a full discussion of this issue. Note that only the answer key distinguishes between the maximal string and the MIN string; the system response does not have a MIN attribute.</Paragraph>
    </Section>
    <Section position="5" start_page="0" end_page="1" type="sub_section">
      <SectionTitle>
2.5 The "STATUS" Attribute
</SectionTitle>
      <Paragraph position="0"> The STATUS ("status") attribute is used in the answer key when the markup is optional. The only value for this attribute is OPT ("optional").</Paragraph>
      <Paragraph position="1"> The evaluation software will not score a string that is marked OPT in the key unless the response has markup on that string. A potential example is given below. (It is marked OPT because a reader may not be certain that "Livingston Street" refers to the Board of Education.) Note that the optionality is marked only on the anaphor.</Paragraph>
      <Paragraph position="2"> At the Feb. 96 meeting of the Coreference and Ellipsis working group, it was suggested that markups that are optional because of textual ambiguity be distinguished from markups that are optional because of unclear or missing markup guidelines. Although this suggestion seems workable, some experimentation may be advisable before it is implemented.</Paragraph>
      <Paragraph position="3"> Our &lt;COREF ID="102" MIN="Board of Education"&gt;Board of Education&lt;/COREF&gt; budget is just too high, the Mayor said. &lt;COREF ID="103" STATUS="OPT" TYPE="IDENT" REF="102"&gt;Livingston Street&lt;/COREF&gt; has lost control.</Paragraph>
    </Section>
  </Section>
  <Section position="3" start_page="1" end_page="1" type="metho">
    <SectionTitle>
3. WHAT PART OF THE TEXT TO ANNOTATE
</SectionTitle>
    <Paragraph position="0"> Coreference markup should be made on the body of the text and on corpus-specific portions of the header. The SGML tags that are used to identify the body and the various portions of the header may vary from one corpus to another.</Paragraph>
    <Section position="1" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
3.1 Specific Guidance for MUC-7 Corpus
</SectionTitle>
      <Paragraph position="0"> The annotation of coreference is to be performed within the text delimited by the SLUG, DATE, NWORDS, PREAMBLE, TEXT, and TRAILER tags.</Paragraph>
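      <Paragraph position="1"> One way to restrict annotation to these zones is sketched below with Python's standard html.parser module, which accepts SGML-style tags and reports tag names in lower case. This is an illustration under assumptions, not the MUC-7 tooling: it treats the six zone tags as the only regions to keep and discards everything else; ZoneCollector and annotatable_text are hypothetical names.

```python
# Illustrative sketch: keep only the text inside the MUC-7 zones listed
# above. html.parser accepts SGML-style tags and lower-cases tag names.
# ZoneCollector and annotatable_text are hypothetical names.
from html.parser import HTMLParser

MUC7_ZONES = {"slug", "date", "nwords", "preamble", "text", "trailer"}


class ZoneCollector(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.depth = 0  # number of open zone tags around the current text
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in MUC7_ZONES:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in MUC7_ZONES and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:  # text outside every zone is not annotated
            self.chunks.append(data)


def annotatable_text(document: str) -> str:
    collector = ZoneCollector()
    collector.feed(document)
    collector.close()
    return "".join(collector.chunks)
```

Joining the chunks discards zone boundaries; a practical pipeline would more likely record character offsets so that COREF tags can be written back into the original file.</Paragraph>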
    </Section>
    <Section position="2" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
3.2 Specific Guidance for Speech Transcriptions
</SectionTitle>
      <Paragraph position="0"> If the transcript contains disfluencies or verbal erasures, the "erased" portion should not be annotated for coreference. It is therefore helpful to have the input text annotated for disfluencies before coreference annotation begins, so that there is agreement on what is "verbally deleted" and what is part of the final output.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="1" end_page="1" type="metho">
    <SectionTitle>
4. WHAT THINGS TO ANNOTATE
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
4.1 Markables
</SectionTitle>
      <Paragraph position="0"> The coreference relation will be marked between elements of the following categories: NOUNS, NOUN PHRASES, and PRONOUNS. Elements of these categories are MARKABLES. PRONOUNS include both personal and demonstrative pronouns; personal pronouns are markable in all cases, including the possessive. Dates ("January 23"), currency expressions ("$1.2 billion"), and percentages ("17%") are considered noun phrases.</Paragraph>
      <Paragraph position="1"> A noun phrase is markable whether it is the object of an assertion, a negation, or a question. Thus, "a machete" is markable in all of the following examples: I have a machete.</Paragraph>
      <Paragraph position="2"> I don't have a machete.</Paragraph>
      <Paragraph position="3"> Do you have a machete? Note in particular that an object is often first introduced into the discourse by an indefinite noun phrase ("Do you have a machete?" or "I saw *a truck*; *it* turned the corner..."). Also note that the fact that an element is "markable" does not mean that it is referred to later; that is, it may or may not participate in coreference. This may be true even of pronouns; see section 4.5 for further discussion. Interrogative "wh-" noun phrases are NOT markables, e.g., "Which engine" and "Who" in the following queries: Which engine would you like to use? Who is your boss? The relation is marked only between pairs of elements that are both markables. This means that some markables that look anaphoric will not be linked to an antecedent, including pronouns, demonstratives, and definite NPs whose antecedent is a clause rather than a markable. For example, in the following sentence, Program trading is "a racket," complains Edward Egnuss, a White Plains, N.Y., investor and electronics sales executive, "and *it's not to the benefit of the small investor*, *that*'s for sure." Though "that" is related to "it's not to the benefit of the small investor", the latter is not markable, so no antecedent is annotated for "that".</Paragraph>
    </Section>
    <Section position="2" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
4.2 Terminology for Mark-Up
</SectionTitle>
      <Paragraph position="0"> It is useful to define some terms to support a discussion of the difficult coreference issues. This section defines the terms "extensional descriptor", "intensional descriptor" and "grounding instance".</Paragraph>
      <Paragraph position="1"> An extensional descriptor is an enumeration of the member(s) of a set by (unique) names. In the context of the coreference task definition, this amounts to the use of proper names, e.g., *Jane Z. Smith*, *Chrysler Corporation*, or numerical values (The stock price was *$4.02*).</Paragraph>
      <Paragraph position="2"> An intensional description is a predicate that is true of an entity or set of entities -- that is, it characterizes or defines the members of the set: "the prime numbers", "the president of Chrysler Corporation". Any nonconcrete common noun, taken on its own, is an intensional description: it functions at the "type" level ("president", "problem") or, if it takes a quantifiable value, at the "function" level ("rate", "temperature"). Intensional descriptions are useful for sets which have no finite extension ("the set of odd numbers"), or in cases where we don't know the extension ("the gene sequence responsible for encoding the immune response").</Paragraph>
      <Paragraph position="3"> They can also be used to refer to instances of the type (*my Ford*... *the car*), or to values of the function (the [per share value of [$4.02]] ... [The stock price]).</Paragraph>
      <Paragraph position="4"> The grounding instance in a coreference chain is the first extensional description in the chain (most often, the first element in the chain). This terminology is useful in the discussion of function-value relations, time-dependent entities and bare nominals. Thus in the sequence "Henry Higgins, who was formerly sales director for Sudsy Soaps, became president of Dreamy Detergents", we have a sequence consisting of the extensional description *Henry Higgins* (which is the grounding instance), together with two intensional descriptions, *sales director for Sudsy Soaps* and *president of Dreamy Detergents*. In addition, there are two other extensional descriptions, *Sudsy Soaps* and *Dreamy Detergents*.</Paragraph>
      <Paragraph position="5"> In the sentence The temperature rose to 90 degrees before dropping to 70 degrees.</Paragraph>
      <Paragraph position="6"> we have a function, "temperature", which takes on a value ("90 degrees") at one point in time, and at a later point in time, a second value, "70 degrees". Because there is only one occurrence of the noun phrase "the temperature", we have a problem marking the coreferring expressions. "The temperature" is a function expression, and is grounded first by "90 degrees"; an implicit second occurrence is coreferential with "70 degrees". However, if we mark all of these as coreferential, we find that "90 degrees" is IDENT with "70 degrees", which is clearly wrong. What we want to say is: "90 degrees" instantiates "The temperature" at time t1; "70 degrees" instantiates "The temperature" at time t2; and these are both of type "temperature" (but not IDENT).</Paragraph>
      <Paragraph position="7"> In section 1.3, we proposed conventions that prevent the collapsing of coreference chains, at the expense of losing some type coreference. Given that our mark-up conventions are already incomplete (we don't mark verb coreference), this seems like a small price to pay for making the chains we do mark useful in other information extraction tasks, e.g., the Template Element task. We provide marked-up examples of this in section 6.4.</Paragraph>
    </Section>
    <Section position="3" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
4.3 Names and Other Named Entities
</SectionTitle>
      <Paragraph position="0"> Names and other Named Entities (as defined in the MUC-6 document titled "Named Entity Task Definition" -- dates, times, currency amounts, and percentages) are all markables. A substring of a Named Entity, however, is not a markable. Thus in *London* ... *London*-based ...</Paragraph>
      <Paragraph position="1"> the two instances of London are to be marked coreferential; in "*Reuters Holding PLC* ... *Reuters* announced that", the strings "Reuters Holding PLC" and "Reuters" are to be marked coreferential. But in Equitable of Iowa Cos. ... located in Iowa.</Paragraph>
      <Paragraph position="2"> the two instances of "Iowa" are NOT to be marked as coreferential, since the first is not a markable: it is a substring of a Named Entity. In addition to names as defined for the Named Entity task, other identifiers that are, in the opinion of the annotator, clearly not decomposable should also be treated as atomic, e.g., "Widener Library" and "E two" in &lt;COREF ID="0" MIN="building"&gt;the large strange-looking building, which is &lt;COREF ID="1" TYPE="IDENT" REF="0"&gt;Widener Library&lt;/COREF&gt;&lt;/COREF&gt; and in okay then I'll take &lt;COREF ID="0" MIN="E two"&gt;engine E two&lt;/COREF&gt; ... so uh the plan is to take &lt;COREF ID="1" TYPE="IDENT" REF="0" MIN="E two"&gt;engine E two&lt;/COREF&gt; ... Date expressions recognized by the Named Entity task are also treated as atomic; components of a date are not separate markables. Thus, in "In a report issued January 5, 1995, the program manager said that there would be no new funds this year." no relation is to be marked between "1995" and "this year".</Paragraph>
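      <Paragraph position="3"> The substring restriction can be checked mechanically once Named Entity spans are known. The sketch assumes half-open token offsets for both the candidate and the Named Entity spans (the offsets in the example are illustrative); the function name markable_wrt_names is hypothetical.

```python
# Illustrative sketch: a candidate that is a proper substring of a Named
# Entity is not a markable. Spans are half-open (start, end) token offsets;
# the function name is hypothetical.


def markable_wrt_names(candidate: tuple[int, int],
                       ne_spans: list[tuple[int, int]]) -> bool:
    start, end = candidate
    for ne_start, ne_end in ne_spans:
        inside = start >= ne_start and ne_end >= end
        if inside and candidate != (ne_start, ne_end):
            return False  # strictly inside a Named Entity
    return True


# Equitable(0) of(1) Iowa(2) Cos.(3) ... located(10) in(11) Iowa(12)
ne_spans = [(0, 4), (12, 13)]
print(markable_wrt_names((2, 3), ne_spans))    # False: "Iowa" inside the company name
print(markable_wrt_names((12, 13), ne_spans))  # True: the NE itself is markable
```
</Paragraph>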
    </Section>
    <Section position="4" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
4.4 Gerunds
</SectionTitle>
      <Paragraph position="0"> Gerunds (verbal forms using a present participle) are not markable. In "*Slowing the economy* is supported by some Fed officials; *it* is repudiated by others." one should not mark the relation between "slowing the economy" and "it". A phrase headed by a present participle is taken to be verbal if it can take an object (as in the above example) or can be modified by an adverb.</Paragraph>
      <Paragraph position="1"> Present participles that are modified by other nouns or adjectives ("program trading", "excessive spending"), are preceded by a determiner ("a", "the", "my", etc.), or are followed by an "of" phrase ("slowing of the economy") are to be considered noun-like and ARE markable.</Paragraph>
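      <Paragraph position="2"> The cues above can be organized as a small decision function. In this sketch the cues are assumed to come from a parser as booleans, and checking the noun-like cues first is an assumption of the sketch: the guidelines do not say what to do when cues conflict. ParticipleCues and participle_is_markable are hypothetical names.

```python
# Illustrative sketch of the gerund test. Cues are assumed to be supplied
# by a parser; checking noun-like cues first is an assumption, since the
# guidelines do not rank conflicting cues. Names are hypothetical.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParticipleCues:
    can_take_object: bool = False         # "slowing the economy"
    can_take_adverb: bool = False         # modifiable by an adverb
    noun_or_adj_modifier: bool = False    # "program trading", "excessive spending"
    preceded_by_determiner: bool = False  # "a", "the", "my", ...
    followed_by_of_phrase: bool = False   # "slowing of the economy"


def participle_is_markable(cues: ParticipleCues) -> Optional[bool]:
    if (cues.noun_or_adj_modifier or cues.preceded_by_determiner
            or cues.followed_by_of_phrase):
        return True   # noun-like: markable
    if cues.can_take_object or cues.can_take_adverb:
        return False  # verbal gerund: not markable
    return None       # no cue fires; leave the decision to the annotator


print(participle_is_markable(ParticipleCues(can_take_object=True)))        # False
print(participle_is_markable(ParticipleCues(followed_by_of_phrase=True)))  # True
```
</Paragraph>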
    </Section>
    <Section position="5" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
4.5 Pronouns
</SectionTitle>
      <Paragraph position="0"> The possessive forms of pronouns used as determiners are markable. Thus in "its chairperson" there are two potential markables for relations: "its" and the entire NP, "its chairperson". Similarly, in "the man's arm", there are two markables, "[the] man" and "the man's arm". The general question of what is to be treated as a lexical token (apostrophes in this case) is discussed in the MUC document titled "Tokenization Rules." First-, second-, and third-person pronouns are all markable, so in "There is no business reason for *my* departure", *he* added.</Paragraph>
      <Paragraph position="1"> "my" and "he" should be marked as coreferential. Reflexive pronouns are markable, so in *He* shot *himself* with *his* revolver.</Paragraph>
      <Paragraph position="2"> "He", "himself", and "his" should all be marked coreferential. Emphatic pronouns are also markable, so "He" and "himself" are marked coreferential in: *He* is, *himself*, unsure of the outcome.</Paragraph>
      <Paragraph position="3"> In certain cases, pronouns may not have an antecedent ("It's raining") or they may refer to something unmarkable, for example, a clausal construction -- see section 4.1 above.</Paragraph>
    </Section>
    <Section position="6" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
4.6 Bare Nouns
</SectionTitle>
      <Paragraph position="0"> Prenominal modifiers (e.g., *ocean drilling* in "the ocean drilling company") are markable only if the prenominal modifier is coreferential either with a named entity or with the syntactic head of a maximal noun phrase.</Paragraph>
      <Paragraph position="1"> That is, there must be one element in the coreference chain that is a head or a name, not a modifier. Thus the following instance is markable, because the prenominal modifier "aluminum" is coreferential with the head noun "aluminum" in the phrase "market for aluminum".</Paragraph>
      <Paragraph position="2"> The price of *aluminum* siding has steadily increased, as the market for *aluminum* reacts to the strike in Chile.</Paragraph>
      <Paragraph position="3"> Similarly, the following two occurrences of "drug" would be marked: He was accused of money laundering and *drug* trafficking. However, the trade in *drugs*....</Paragraph>
      <Paragraph position="4"> Contrast this with the following occurrences of "contract" and "contract drilling", which would NOT be marked, because neither occurs in the passage except as a prenominal modifier: Ocean Drilling &amp; Exploration Co. will sell its *contract drilling* business. ... Ocean Drilling said it will offer 15% to 20% of the *contract drilling* business through an initial public offering in the near future.</Paragraph>
      <Paragraph position="5"> Note that the occurrences of *its*, *its contract drilling business* and *the contract drilling business* would all be markable -- see section 6.5.</Paragraph>
      <Paragraph position="6"> While nouns in prenominal positions are sometimes markable, the noun which appears at the head of a noun phrase is not separately markable -- it is markable only as part of the entire noun phrase. Thus in the passage Linguists are a strange bunch. Some linguists even like spinach.</Paragraph>
      <Paragraph position="7"> it would not be correct to link the two instances of "linguists". Similarly, in the sentence: The rate, which was 6 percent, was higher than that offered by the other bank.</Paragraph>
      <Paragraph position="8"> the noun phrase "the rate" is a function expression, instantiated by the predicate "6 percent", so these two would be marked coreferential, as follows: &lt;COREF ID="0" MIN="rate"&gt;The rate, which was &lt;COREF ID="1" REF="0"&gt;6 percent&lt;/COREF&gt;,&lt;/COREF&gt; was higher than that offered by the other bank.</Paragraph>
      <Paragraph position="9"> In this example, the pronoun *that* is coreferential at the FUNCTION level with *The rate*. However, *that* occurs as the head of a noun phrase, *that offered by the other bank*, which is NOT coreferential with *The rate* and *6 percent* (indeed, it refers to a higher rate), so *that* is an instance of a pronoun that cannot be marked in our current framework, even though we lose some type coreference information by not marking it.</Paragraph>
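      <Paragraph position="10"> The condition on prenominal modifiers reduces to a check over the roles of a chain's members. The role labels used here ("modifier", "head", "name") are hypothetical and assume that an earlier step has classified each occurrence.

```python
# Illustrative sketch. Each chain member carries a hypothetical role label:
# "modifier" (prenominal noun), "head" (head of a maximal NP) or "name".


def chain_is_markable(roles: list[str]) -> bool:
    # At least one member must be a head or a name; a chain made up only
    # of prenominal modifiers is not marked.
    return any(role in ("head", "name") for role in roles)


# aluminum siding ... the market for aluminum
print(chain_is_markable(["modifier", "head"]))      # True
# its contract drilling business ... the contract drilling business
print(chain_is_markable(["modifier", "modifier"]))  # False
```
</Paragraph>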
    </Section>
    <Section position="7" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
4.7 Implicit Pronouns
</SectionTitle>
      <Paragraph position="0"> Assume that English has no zero pronouns; in other words, the empty string is not markable. In Bill called John and spoke with him for an hour.</Paragraph>
      <Paragraph position="1"> there is no relation between the implicit subject of "spoke" and "Bill". Do not code relations between a relative pronoun and the NP to which it attaches or the gap that it fills. Thus, in "the movie which I saw", the relative pronoun "which" bears no markable relation either to "the movie" (the head to which the relative pronoun attaches) or to the implicit object of "saw" (the gap that the pronoun fills).</Paragraph>
    </Section>
    <Section position="8" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
4.8 Conjoined Noun Phrases
</SectionTitle>
      <Paragraph position="0"> Noun phrases which contain two or more heads (as defined in section 4.1) are marked by defining the MINimal string (see section 5) as the span from the first "head" through the last "head", including all material in between. The MAXimal string includes the entire maximal conjoined noun phrase. Thus we mark coreference between "The sleepy boys and girls" and "their" as follows: &lt;COREF ID="1" MIN="boys and girls"&gt;The sleepy boys and girls&lt;/COREF&gt; enjoy &lt;COREF ID="2" REF="1" TYPE="IDENT"&gt;their&lt;/COREF&gt; breakfast.</Paragraph>
      <Paragraph position="1"> In addition, the individual conjuncts are markable if they are separately coreferential with other phrases:</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="1" end_page="1" type="metho">
    <SectionTitle>
5. HOW MUCH OF THE MARKABLE TO ANNOTATE
</SectionTitle>
    <Paragraph position="0"> The task is defined in order to allow maximal latitude for systems in identifying markables, and to decouple the evaluation from that of accurately parsing noun phrases. Accordingly, the string generated by a system to identify a markable must include the head of the markable (as defined below) and may include any additional text up to a maximal noun phrase (as defined below).</Paragraph>
    <Paragraph position="1"> In preparing the key, the text element to be enclosed in SGML tags is the maximal noun phrase; the head will be designated by the MIN attribute.</Paragraph>
    <Paragraph position="2"> [We expect that in the future it may be possible, when separate noun phrase bracketings are available, to automatically generate the maximal NP markup from a markup using only heads.]</Paragraph>
    <Section position="1" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
5.1 Head of a Phrase
</SectionTitle>
      <Paragraph position="0"> For most noun phrases, the head will be the main noun, without its left and right modifiers.</Paragraph>
      <Paragraph position="1"> If the head is a name, the entire name is marked. This includes suffixes such as "Sr.", "III", etc. on personal names and "Corp." on organization names; it does not include personal titles or any modifiers. In this regard we follow the rules for marking personal and organization names for the Named Entity task, as well as for other non-geographic names (e.g., "New Year"): For constructions that are idioms or collocations, the minimal phrase ignores the fact that the construction is a collocation and uses the syntactic head, because the definition of a collocation is often domain-specific. In the following examples, the MIN is indicated by asterisks: income *taxes*; light *year*; *run* of the mill. If the maximal noun phrase is the same as the head, the MIN need not be marked. Also, if the maximal noun phrase differs from the MIN only by the articles "a" or "the", the MIN need not be marked, because the scoring program will automatically strip these before comparing answers.</Paragraph>
    </Section>
    <Section position="2" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
5.2 Maximal Noun Phrase
</SectionTitle>
      <Paragraph position="0"> The maximal noun phrase includes all text which may be considered a modifier of the noun phrase. This includes (among other modifiers) appositional phrases, non-restrictive relative clauses, and prepositional phrases which may be viewed as modifiers of the noun phrase or of a containing clause:
*Mr. Holland*
*the senior of the executives who will assume Holland's duties*
*the rumor that the war had ended*
*Fred Frosty, the ice cream king of Tyson's Corner,*
*the Penn Central Co., which used to run a railroad,*
XYZ Inc. formed *a joint venture with Sony*
Note that in the fourth and fifth cases the final comma may be viewed as part of the NP, and so is included in the maximal NP. The system does not need to worry about punctuation, since the scorer strips punctuation before comparing key to response. In the last case, "with Sony" could equally well be taken to modify "venture" or "formed", and so is included as part of the maximal NP around "venture". Note also that in the "Fred Frosty" example, there is a coreference between the entire noun phrase and the appositional phrase, "the ice cream king of Tyson's Corner"; see section 6.3 for a discussion of this construct. In the case of a conjoined noun phrase with shared complements or modifiers, the maximal noun phrase for the conjoined phrase is the entire maximal noun phrase, including the shared material. The minimal noun phrase will begin at the minimal phrase for the first conjunct and include everything up to the end of the minimal phrase for the last conjunct.</Paragraph>
      <Paragraph position="1"> If the conjuncts are referenced individually, their maximal noun phrases will NOT include the conjunction. The maximal NP for the first conjunct will include all of the NP up to the conjunction; the maximal NP for the second conjunct will include all of the NP following the conjunction: &lt;COREF ID="1" MIN="Fribble"&gt;Ms. Fribble&lt;/COREF&gt; was &lt;COREF ID="2" REF="1" TYPE="IDENT" STATUS="OPT"&gt;president&lt;/COREF&gt; and &lt;COREF ID="3" REF="1" TYPE="IDENT" STATUS="OPT" MIN="CEO"&gt; CEO of Amalgamated Text Processing Inc.&lt;/COREF&gt; It is possible that the maximal span of the noun phrase is interrupted by material that is not part of the noun phrase. Such discontinuous noun phrases should nonetheless be enclosed within a single COREF tag. [In the future, it may be possible to capture the discontinuity explicitly by some special notation.] In the MUC-6 corpus, discontinuous noun phrases frequently appear in headlines, since the non-first lines of a headline are often marked with "@", which is external to the preceding and subsequent text. An annotated example of a discontinuous markable is shown in the example below:</Paragraph>
    </Section>
    <Section position="3" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
5.3 Exceptions: Articles
</SectionTitle>
      <Paragraph position="0"> If the only difference between the head and the maximal noun phrase is the presence of an article -- the word "the", "a", or "an" at the beginning of the noun phrase -- the MIN need not be explicitly marked. (The scoring program will automatically strip leading articles before comparing strings.)</Paragraph>
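      <Paragraph position="1"> The scorer is described as stripping punctuation (section 5.2) and leading articles before comparing key and response strings. A rough Python approximation follows; it assumes whitespace tokenization and ASCII punctuation (string.punctuation), so hyphens inside words are removed too, which the actual MUC scorer may not do. normalize_for_scoring is a hypothetical name.

```python
# Illustrative approximation of the scorer's string normalization, assuming
# whitespace tokenization and ASCII punctuation; the MUC scorer itself may
# differ (for example on hyphens). normalize_for_scoring is hypothetical.
import string

ARTICLES = ("a", "an", "the")
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def normalize_for_scoring(text: str) -> str:
    words = text.translate(_STRIP_PUNCT).split()  # drop punctuation
    if words and words[0].lower() in ARTICLES:
        words = words[1:]  # drop one leading article
    return " ".join(words)


key_string = "the Penn Central Co.,"
response_string = "Penn Central Co."
print(normalize_for_scoring(key_string) == normalize_for_scoring(response_string))  # True
```
</Paragraph>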
    </Section>
  </Section>
  <Section position="6" start_page="1" end_page="2" type="metho">
    <SectionTitle>
6. WHICH RELATIONSHIPS TO ANNOTATE
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
6.1 Basic Coreference
</SectionTitle>
      <Paragraph position="0"> The basic criterion for linking two markables is whether they are coreferential: whether they refer to the same object, set, activity, etc. It is not a requirement that one of the markables be "semantically dependent" on the other, or be an anaphoric phrase.</Paragraph>
    </Section>
    <Section position="2" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
6.2 Bound Anaphors
</SectionTitle>
      <Paragraph position="0"> We also make a coreference link between a "bound anaphor" and the noun phrase which binds it (even though one may argue that such elements are not coreferential in the usual sense). Thus we would link a quantified noun phrase and a pronoun dependent on that quantification: *Most computational linguists* prefer *their* own parsers.</Paragraph>
      <Paragraph position="1"> Note that a quantified noun phrase would also be linked to subsequent anaphors outside the scope of quantification, through the usual identity relation of coreference. Thus in the following text all three noun phrases would be linked: *Every TV network* reported *its* profits yesterday. *They* plan to release full quarterly statements tomorrow.</Paragraph>
      <Paragraph position="2"> By this rule, a pronoun in a relative clause which is bound to the head of the clause would get a coreference link to the entire NP. Thus, for "every man who knows his own mind", we would establish a coreference link between "his" and the entire noun phrase "every man who knows his own mind": &lt;COREF ID="1" MIN="man"&gt;every man who knows &lt;COREF ID="2" REF="1" TYPE="IDENT"&gt;his&lt;/COREF&gt; own mind&lt;/COREF&gt;</Paragraph>
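      <Paragraph position="3"> Hand-annotated nested markup such as the example above is easy to get wrong. The following checker, built on the standard html.parser module (which lower-cases tag and attribute names), reports unbalanced COREF tags and REF values that name no ID. It is a hypothetical helper for annotators, not part of the MUC scoring software.

```python
# Illustrative checker for COREF markup: unbalanced tags and dangling REFs.
# Built on html.parser, which lower-cases tag and attribute names.
# CorefChecker and check_coref_markup are hypothetical names.
from html.parser import HTMLParser


class CorefChecker(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.open_tags = 0
        self.ids: set[str] = set()
        self.refs: list[str] = []
        self.errors: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "coref":
            return
        self.open_tags += 1
        attributes = dict(attrs)
        if "id" in attributes:
            self.ids.add(attributes["id"])
        if "ref" in attributes:
            self.refs.append(attributes["ref"])

    def handle_endtag(self, tag):
        if tag != "coref":
            return
        if self.open_tags == 0:
            self.errors.append("closing COREF tag without an opening tag")
        else:
            self.open_tags -= 1


def check_coref_markup(markup: str) -> list[str]:
    checker = CorefChecker()
    checker.feed(markup)
    checker.close()
    errors = list(checker.errors)
    if checker.open_tags:
        errors.append(f"{checker.open_tags} COREF tag(s) left open")
    # REFs are checked after parsing, so forward references are allowed.
    errors += [f"REF={ref} has no matching ID"
               for ref in checker.refs if ref not in checker.ids]
    return errors
```

An empty list means the markup passed both checks; it says nothing about whether the links themselves follow these guidelines.</Paragraph>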
    </Section>
    <Section position="3" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
6.3 Apposition
</SectionTitle>
      <Paragraph position="0"> A typical use of an appositional phrase is to provide an alternative description or name for an object, as in &amp;quot;Julius Caesar, the well-known emperor&amp;quot;, &amp;quot;Julius Caesar, a well-known emperor&amp;quot;, and &amp;quot;the well-known emperor, Julius Caesar&amp;quot;. This identity of reference is represented by a coreference link between the appositional phrase, &amp;quot;the well-known emperor&amp;quot;, and the ENTIRE noun phrase, &amp;quot;Julius Caesar, the/a well-known emperor&amp;quot;: &lt;COREF ID=&amp;quot;1&amp;quot; MIN=&amp;quot;Julius Caesar&amp;quot;&gt;Julius Caesar, &lt;COREF ID=&amp;quot;2&amp;quot; REF=&amp;quot;1&amp;quot; MIN=&amp;quot;emperor&amp;quot; TYPE=&amp;quot;IDENT&amp;quot;&gt;the/a well-known emperor,&lt;/COREF&gt;&lt;/COREF&gt; The appositional phrase may be separated from the head by other modifiers. Thus &amp;quot;Peter Holland, 45, deputy general manager, ...&amp;quot; is annotated as follows:</Paragraph>
      <Paragraph position="1"> &lt;COREF ID=&amp;quot;1&amp;quot; MIN=&amp;quot;Peter Holland&amp;quot;&gt;Peter Holland, 45, &lt;COREF ID=&amp;quot;2&amp;quot; REF=&amp;quot;1&amp;quot; TYPE=&amp;quot;IDENT&amp;quot; MIN=&amp;quot;manager&amp;quot;&gt;deputy general manager,&lt;/COREF&gt;&lt;/COREF&gt; Appositional phrases are markable (and support the Descriptor slot in the Template Element task in the MUC-7 Information Extraction Task Definition) even when indefinite, e.g., &amp;quot;Ms. Ima Head, a 10-year MUC veteran&amp;quot; and &amp;quot;San Diego, one of America's finest cities&amp;quot;. An appositional phrase is also marked in the specifier relation, e.g., &lt;COREF ID=&amp;quot;1&amp;quot; MIN=&amp;quot;job&amp;quot;&gt;The job of &lt;COREF ID=&amp;quot;2&amp;quot; REF=&amp;quot;1&amp;quot;&gt;manager&lt;/COREF&gt;&lt;/COREF&gt;. However, appositional phrases are NOT marked when they are negative, as in &amp;quot;Ms. Ima Head, never a great MUC fan&amp;quot;, or when there is only partial overlap of sets, as in &amp;quot;The criminals, often legal immigrants, ...&amp;quot;.</Paragraph>
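      <Paragraph> The apposition convention can be checked mechanically: since the appositive is linked to the ENTIRE noun phrase, its span should lie inside the span of the markable it points to. The minimal sketch below assumes markables are stored as character offsets (the Span class and check_apposition are hypothetical names) and also assumes, as the examples suggest, that each MIN value is a substring of its markable.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    """Hypothetical markable stored as character offsets into a sentence."""

    mid: str
    start: int
    end: int  # exclusive
    min_text: Optional[str] = None
    ref: Optional["Span"] = None


def check_apposition(sentence: str, appositive: Span) -> List[str]:
    """List violations of the apposition conventions for one appositive."""
    head = appositive.ref
    if head is None:
        return ["appositive has no REF"]
    problems = []
    # The appositive must point to the ENTIRE NP, which therefore contains it.
    if not (appositive.start >= head.start and head.end >= appositive.end):
        problems.append("appositive is not inside the NP it points to")
    # Assumed: a MIN value is a substring of its markable's text.
    for span in (head, appositive):
        if span.min_text and span.min_text not in sentence[span.start:span.end]:
            problems.append(f"MIN {span.min_text!r} not found in markable {span.mid}")
    return problems


sentence = "Peter Holland, 45, deputy general manager,"
start = sentence.index("deputy")

whole_np = Span("1", 0, len(sentence), min_text="Peter Holland")
good = Span("2", start, len(sentence), min_text="manager", ref=whole_np)
print(check_apposition(sentence, good))  # []

# Linking the appositive to the bare name violates the ENTIRE-NP convention.
name_only = Span("1", 0, len("Peter Holland"), min_text="Peter Holland")
bad = Span("2", start, len(sentence), min_text="manager", ref=name_only)
print(check_apposition(sentence, bad))
# ['appositive is not inside the NP it points to']
```

The same containment test covers the specifier case, where manager lies inside The job of manager.
      </Paragraph>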
      <Paragraph position="2"> Appositional phrases are marked only when they constitute a separate noun phrase following the head. In written text, appositives are generally set off by commas; in transcripts of spoken language, the commas may well be absent, because punctuation is generally not captured in speech-to-text transcription. Some constructions look similar to an appositive but occur within a single noun phrase as a title or modifier, e.g., &amp;quot;the real estate company Century 21&amp;quot;. Such single-noun-phrase constructions are not considered markable as appositions, so no coreference is marked in cases such as the following: the real estate company Century 21; the realtor Century 21; presidential advisor Joe Smarty; Treasury Secretary Bucks.</Paragraph>
      <Paragraph position="3"> But the following phrase is marked up: *the job of *manager**</Paragraph>
    </Section>
    <Section position="5" start_page="1" end_page="2" type="sub_section">
      <SectionTitle>
6.4 Predicate Nominals and Time-dependent Identity
</SectionTitle>
      <Paragraph position="0"> Predicate nominals are also typically coreferential with the subject. Thus in the example *Bill Clinton* is *the President of the United States*.</Paragraph>
      <Paragraph position="1"> we would record a coreference link between &amp;quot;Bill Clinton&amp;quot; and &amp;quot;the President of the United States&amp;quot;. Coreference should NOT be recorded if the text only asserts the possibility of identity between two markables. Thus no coreference is to be recorded in any of the following: Phinneas Flounder may be the dumbest man who ever lived.</Paragraph>
      <Paragraph position="2"> Phinneas Flounder was almost the first president of the corporation.</Paragraph>
      <Paragraph position="3"> If elected, Phinneas Flounder would be the first Californian in the Oval Office.</Paragraph>
      <Paragraph position="4"> We also allow coreference to be recorded when the predicate nominal is indefinite, e.g.: *Mediation* is *a viable alternative to bankruptcy*.</Paragraph>
      <Paragraph position="5"> *Farm-debt mediation* is *one of the Farm Belt's success stories*.</Paragraph>
      <Paragraph position="6"> *ARPA program managers* are *nice people*.</Paragraph>
      <Paragraph position="7"> However, as with apposition, if the text asserts only a possibility or a partial overlap of sets, no coreference is marked, because there is no set identity and hence no IDENT relation: Mediation is often a viable alternative to bankruptcy.</Paragraph>
      <Paragraph position="8"> Mediation may be a viable alternative to bankruptcy.</Paragraph>
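      <Paragraph> A system that proposes copular links automatically needs some way to withhold them when the text asserts only possibility or partial overlap. The sketch below is deliberately crude: the cue-word list HEDGE_CUES and the function allows_ident are hypothetical, chosen only to cover the examples in this section, and are not rules from these guidelines.

```python
import re
from typing import FrozenSet

# Hypothetical cue words for possibility, near-identity, or partial overlap,
# chosen only to cover the examples in this section.
HEDGE_CUES: FrozenSet[str] = frozenset(
    {"may", "might", "could", "would", "almost", "often", "sometimes", "if"}
)


def allows_ident(clause: str) -> bool:
    """True unless the copular clause contains one of the hedge cue words."""
    words = set(re.findall(r"[a-z]+", clause.lower()))
    return words.isdisjoint(HEDGE_CUES)


examples = [
    "Bill Clinton is the President of the United States.",
    "Mediation is a viable alternative to bankruptcy.",
    "Phinneas Flounder may be the dumbest man who ever lived.",
    "Phinneas Flounder was almost the first president of the corporation.",
    "If elected, Phinneas Flounder would be the first Californian in the Oval Office.",
    "Mediation is often a viable alternative to bankruptcy.",
]
for clause in examples:
    print(allows_ident(clause), clause)
```

Surface cues like these misfire in both directions, and section 7 bases annotators' decisions on their understanding of the text, not on cue words; the sketch is at most a starting point for a system.
      </Paragraph>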
      <Paragraph position="9"> Two markables should be recorded as coreferential if the text asserts them to be coreferential at ANY TIME. Thus the sentence &amp;quot;Henry Higgins, who was formerly sales director for Sudsy Soaps, became president of Dreamy Detergents&amp;quot; should be annotated as &lt;COREF ID=&amp;quot;1&amp;quot; MIN=&amp;quot;Henry Higgins&amp;quot;&gt;Henry Higgins, who was formerly &lt;COREF ID=&amp;quot;2&amp;quot; MIN=&amp;quot;director&amp;quot; REF=&amp;quot;1&amp;quot; TYPE=&amp;quot;IDENT&amp;quot;&gt;sales director for Sudsy Soaps,&lt;/COREF&gt;&lt;/COREF&gt; became &lt;COREF ID=&amp;quot;3&amp;quot; MIN=&amp;quot;president&amp;quot; REF=&amp;quot;1&amp;quot; TYPE=&amp;quot;IDENT&amp;quot;&gt;president of Dreamy Detergents&lt;/COREF&gt; This is one portion of the guidelines that will clearly need modification once a decision is made about enhancing the notation to distinguish time-dependent coreference links from other coreference links. However, the distinction between the two types applies not only to predicate nominals but also to apposition, function-value, and other construction types. Thus the phrase &amp;quot;and Time-dependent Identity&amp;quot; should probably be removed from the title of section 6.4, and a new section could cover the general issue of time-dependent coreference. Also, general information about any new type of notation should go in section 2, and the meaning of the new notation should be documented in section 6 (which may need a different section title).</Paragraph>
      <Paragraph position="10"> Even if the copula or inchoative verb is embedded, coreference should be marked, as in &amp;quot;Dreamy Detergents named Henry Higgins to be president&amp;quot;, which should be annotated as Dreamy Detergents named &lt;COREF ID=&amp;quot;1&amp;quot;&gt;Henry Higgins&lt;/COREF&gt; to be &lt;COREF ID=&amp;quot;2&amp;quot; REF=&amp;quot;1&amp;quot; TYPE=&amp;quot;IDENT&amp;quot;&gt;president&lt;/COREF&gt;. When the copula is clearly implied by the semantics of the verb, coreference should also be marked, and expressions of equivalence involving the word &amp;quot;as&amp;quot; are marked as well. The NPs enclosed in asterisks in the following examples are marked coreferential: Dreamy Detergents named *Henry Higgins* *president*. *Henry Higgins* is considered *Sudsy Soaps' best sales director*. *Higgins* will serve as *president of Dreamy Detergents*. Cases may arise where an intensional descriptor can apply to two distinct entities, e.g., when both the current president of a company and a previous president are mentioned. The conventions require that the extensional descriptions guide the mark-up process, and therefore that these two chains *NOT* be collapsed into a single chain: Both coreference chains contain the same intensional predicate, &amp;quot;sales director for Sudsy Soaps&amp;quot;, but these have different temporal extensional realizations. Although the occurrences of &amp;quot;sales director&amp;quot; are coreferential at the type level, the extensionally grounded chains take precedence, because it is critical to preserve the independence of chains grounded in different extensions, that is, to prevent obviously different individuals from ending up in the same IDENT coreference chain or equivalence class.
Thus we &amp;quot;cut&amp;quot; the chain in the above example at the first &amp;quot;type&amp;quot; coreference that would cause collapsing (that is, one that can point to a new extension), as in: &lt;COREF ID=&amp;quot;5&amp;quot;&gt;Fred&lt;/COREF&gt; resigned as &lt;COREF ID=&amp;quot;6&amp;quot; MIN=&amp;quot;president&amp;quot; REF=&amp;quot;5&amp;quot;&gt;president of IBM&lt;/COREF&gt;; next month, &lt;COREF ID=&amp;quot;7&amp;quot;&gt;the president&lt;/COREF&gt; will be &lt;COREF ID=&amp;quot;8&amp;quot; REF=&amp;quot;7&amp;quot;&gt;Mary&lt;/COREF&gt;.</Paragraph>
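      <Paragraph> The effect of cutting the chain can be checked by computing the equivalence classes that the REF pointers induce. In this minimal sketch the IDs and pointers are copied from the Fred and Mary markup; the extra pointer from 7 to 6 is a hypothetical, uncut annotation added for contrast. The function ident_classes (an illustrative name) finds connected components with breadth-first search.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple


def ident_classes(texts: Dict[int, str], refs: List[Tuple[int, int]]) -> List[List[str]]:
    """Connected components of the undirected graph formed by REF links."""
    graph: Dict[int, Set[int]] = defaultdict(set)
    for src, dst in refs:
        graph[src].add(dst)
        graph[dst].add(src)  # coreference is symmetric
    seen: Set[int] = set()
    classes: List[List[str]] = []
    for start in texts:
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), [start]
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    members.append(nxt)
                    queue.append(nxt)
        classes.append([texts[i] for i in sorted(members)])
    return classes


# IDs and REF pointers copied from the Fred and Mary markup.
texts = {5: "Fred", 6: "president of IBM", 7: "the president", 8: "Mary"}
cut = [(6, 5), (8, 7)]
# Hypothetical extra type-level link from 7 to 6, which the cut leaves out.
uncut = cut + [(7, 6)]

print(ident_classes(texts, cut))
# [['Fred', 'president of IBM'], ['the president', 'Mary']]
print(ident_classes(texts, uncut))
# [['Fred', 'president of IBM', 'the president', 'Mary']]
```

With the cut, Fred and Mary stay in separate IDENT classes; the single extra type-level link merges them, which is exactly the collapse the convention is meant to prevent.
      </Paragraph>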
    </Section>
    <Section position="6" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
6.5 Types and Tokens
</SectionTitle>
      <Paragraph position="0"> The general principle for annotating coreference is that two markables are coreferential if they both refer to sets and the sets are identical, or if they both refer to types and the types are identical. In a number of problematic cases it is debatable whether something is a set or a type, and there is no simple algorithm for determining the ontological category of a referent. There are, though, some useful rules. Most occurrences of bare plurals refer to types or kinds, not to sets. Consider: ...*producers* don't like to see a hit wine increase in price... *Producers* have seen this market opening up and *they*'re now creating wines that appeal to these people.</Paragraph>
      <Paragraph position="1"> Here &amp;quot;producers&amp;quot;, &amp;quot;Producers&amp;quot;, and &amp;quot;they&amp;quot; refer to types, and they all refer to the same type. If interpreted as referring to sets, they would not all refer to the same set; more precisely, there would be no reason to think they corefer, since not all the producers who have seen the market opening up have created new wines.</Paragraph>
      <Paragraph position="2"> Note that a type can be referred to by a bare plural, a definite singular NP (&amp;quot;the tiger is fast becoming extinct&amp;quot;), or a (bare) prenominal. In the following passage, the two starred occurrences corefer, both referring to the type shareholder (of Intelogic): The action followed by one day an Intelogic announcement that it will retain an investment banker to explore alternatives &amp;quot;to maximize *shareholder* value,&amp;quot; including the possible sale of the company. Mr. Edelman declined to specify what prompted the recent moves, saying they are meant only to benefit *shareholders* when &amp;quot;the company is on a roll.&amp;quot;</Paragraph>
    </Section>
    <Section position="7" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
6.6 Functions and Values
</SectionTitle>
      <Paragraph position="0"> Consider: GM announced *its third quarter profit*. *It* was *$0.02*.</Paragraph>
      <Paragraph position="1"> All three starred phrases refer to an amount of money, and they all refer to the same amount of money; hence they are coreferential. The first phrase, in context, refers to that amount by referring to a function, say, of companies and quarters of a year (or times). (In addition, the &amp;quot;its&amp;quot; in the first NP would be linked to GM.) Now consider: General Motors announced {their third quarter profit of *$0.02*}.</Paragraph>
      <Paragraph position="2"> The bracketed and starred phrases are coreferential: they refer to one and the same amount of money. Note that here, as in the case of apposition, the result is that a phrase is marked as being coreferential with a part of the phrase.</Paragraph>
      <Paragraph position="3"> Consider: *The temperature* is *90*. ... The temperature is rising.</Paragraph>
      <Paragraph position="4"> The first occurrence of &amp;quot;the temperature&amp;quot; is an intensional expression referring to the value (extension) of the function at arguments (places, times) supplied by context. *The temperature* is coreferential with &amp;quot;90&amp;quot;, which grounds it. In the second occurrence, &amp;quot;the temperature&amp;quot; refers to the function (indirectly, by way of referring to the derivative of the function), so it is not coreferential with the first occurrence or with &amp;quot;90&amp;quot;. The sequence &amp;quot;The temperature was 90 yesterday and has already reached 95 today. This sets a new record high.&amp;quot; poses a different problem: there are two extensional descriptions (90, 95) for the temperature, but only a single occurrence of the intensional description &amp;quot;temperature&amp;quot;. In this case, &amp;quot;temperature&amp;quot; is coreferential with the extensional description occurring in the same clause (&amp;quot;90&amp;quot;). As a result, &amp;quot;95&amp;quot; is in its own coreference class, and we are not able to mark the fact that it too is a temperature. However, &amp;quot;95&amp;quot; is coreferential with &amp;quot;This&amp;quot; and &amp;quot;a new record high&amp;quot;. This is marked as follows: &lt;COREF ID=&amp;quot;4&amp;quot;&gt;The temperature&lt;/COREF&gt; was &lt;COREF ID=&amp;quot;5&amp;quot; REF=&amp;quot;4&amp;quot;&gt;90&lt;/COREF&gt; yesterday and has already reached &lt;COREF ID=&amp;quot;6&amp;quot;&gt;95&lt;/COREF&gt; today. &lt;COREF ID=&amp;quot;7&amp;quot; REF=&amp;quot;6&amp;quot;&gt;This&lt;/COREF&gt; sets &lt;COREF ID=&amp;quot;8&amp;quot; MIN=&amp;quot;high&amp;quot; REF=&amp;quot;7&amp;quot;&gt;a new record high&lt;/COREF&gt;.</Paragraph>
      <Paragraph position="5"> Both extensions may also occur in the same clause, as in &amp;quot;The stock value rose from $8.05 to $9.15&amp;quot; or &amp;quot;The per share value of $8.05 rose to $9.15 at the end of trading.&amp;quot;</Paragraph>
      <Paragraph position="6"> In that case the function takes on the most &amp;quot;current&amp;quot; value in its clause: *stock value* and *$9.15* are marked coreferential, and $8.05 is left in its own class.</Paragraph>
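      <Paragraph> The clause-level rule for functions and values can be sketched as follows: within each clause, every function mention is linked to the last value in that clause, and any other values start their own classes. This Python sketch assumes clauses are already segmented and encoded as hypothetical (kind, text) pairs, and that values appear in temporal order so that the last one is the most current; a clause such as fell to 90 from 95 would violate that assumption. The function name link_function_values is illustrative.

```python
from typing import List, Tuple

# Hypothetical clause encoding: ordered (kind, text) pairs, where kind is
# "function" for an intensional description and "value" for an extension.
Clause = List[Tuple[str, str]]


def link_function_values(clauses: List[Clause]) -> List[List[str]]:
    """Coreference classes under the most-current-value-in-clause rule."""
    classes: List[List[str]] = []
    for clause in clauses:
        functions = [text for kind, text in clause if kind == "function"]
        values = [text for kind, text in clause if kind == "value"]
        if functions and values:
            # Assumes textual order is temporal order, so the last value
            # in the clause is the most current one.
            classes.append(functions + [values[-1]])
            classes.extend([v] for v in values[:-1])
        else:
            # Nothing to link in this clause: each mention stands alone.
            classes.extend([t] for t in functions + values)
    return classes


stock = [[("function", "stock value"), ("value", "$8.05"), ("value", "$9.15")]]
print(link_function_values(stock))
# [['stock value', '$9.15'], ['$8.05']]

temperature = [
    [("function", "The temperature"), ("value", "90")],  # ... was 90 yesterday
    [("value", "95")],                                   # ... reached 95 today
]
print(link_function_values(temperature))
# [['The temperature', '90'], ['95']]
```

The sketch covers only the function and value mentions; links such as This and a new record high to 95 would be added by ordinary anaphora resolution.
      </Paragraph>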
    </Section>
    <Section position="8" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
6.7 Metonymy
</SectionTitle>
      <Paragraph position="0"> The pervasive phenomenon of metonymy raises a problem for coreference relations: do we annotate and recognize the relation before or after coercion? Here are some texts to consider: (1) *The White House* sent its health care proposal to Congress yesterday. Senator Dole said *the administration*'s bill had little chance of passing.</Paragraph>
      <Paragraph position="1"> (2) *Ford* announced a new product line yesterday. *Ford* spokesman John Smith said *they* will start manufacturing widgets.</Paragraph>
      <Paragraph position="2"> (3) I bought the New York Times this morning. I read that the editor of the New York Times is resigning. (4) *The United States* is a democracy. *The United States* has an area of 3.5 million square miles. We propose that coreference be determined with respect to coerced entities. Of course, this still leaves open the question of the circumstances under which coercion is required. In (1) there is a coercion from the White House to the administration operating out of the White House, and that is IDENT with &amp;quot;the administration&amp;quot;; so &amp;quot;White House&amp;quot; and &amp;quot;administration&amp;quot; are IDENT. (Notice that there is also a question as to whether the administration's proposal is the same as its bill. This too requires a coercion of sorts.) In (2), while there might seem to be a coercion from Ford to a spokesman for Ford, we believe that such a coercion is not necessary, for it is plausible that corporations, as legal persons, can do many of the things that people can do, such as &amp;quot;announce.&amp;quot; They may have to do some or all such things through other agents, but many people do many things that way. And if Ford can announce, then it, through one of its spokesmen, can &amp;quot;say&amp;quot;. Believing that no coercion is required, we would mark as coreferential the first instance of &amp;quot;Ford&amp;quot;, the second instance of &amp;quot;Ford&amp;quot; (in the phrase &amp;quot;Ford spokesman John Smith&amp;quot;), and &amp;quot;they&amp;quot;, but would NOT mark the phrase &amp;quot;Ford spokesman John Smith&amp;quot; as coreferential with anything else in this passage. In (3) the first &amp;quot;New York Times&amp;quot; is coerced into a copy of the paper published by the New York Times and the second is coerced into the organization; so they are not IDENT. Example (4) is somewhat akin to (2): countries are both geographical entities and governmental units.
Thus, no coercion is necessary, and the two starred occurrences are coreferential.</Paragraph>
      <Paragraph position="3"> In the absence of general principles, a body of such decisions will need to be developed to codify the rules for coercion and coreference. In cases where there has been no clear precedent, the answer keys for formal evaluations will need to mark coreference as optional.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="2" end_page="2" type="metho">
    <SectionTitle>
7. BASIS OF JUDGMENT
</SectionTitle>
    <Paragraph position="0"> Coreference judgments should be based on the intelligent reader's knowledge of the world, as informed by his or her best understanding of the text. They should not be based on a theory of the structure of the text, on a linguistic theory of how NPs are resolved, or on estimates of what the typical NLP system could do. This means that some relations will be impossible for current NLP systems to recover, but that is precisely how the task will push the technology. Annotators should assume that they are typical intelligent readers.</Paragraph>
  </Section>
  <Section position="8" type="metho">
    <SectionTitle>
8. SCORING AND THE ORDERING OF LINKS
</SectionTitle>
    <Paragraph position="0"> If three markables, A, B, and C, are coreferential, this relationship could be recorded in the key in several ways: for example, by a REF pointer in both B and C pointing to A, or by a REF pointer in B pointing to A and a REF pointer in C pointing to B. A similar range of variations is possible in a system response. The current scoring rules provide that any correct key, when compared to any correct response, will yield a 100% recall/100% precision score, independent of the way the coreference relation is encoded in the key by REF pointers.</Paragraph>
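    <Paragraph> Encoding independence follows if the scorer compares chains rather than individual REF pointers. The sketch below assumes the link-based (model-theoretic) measure of Vilain et al. (1995), introduced for MUC-6: for each key chain S, recall credits |S| - |p(S)| of its |S| - 1 links, where p(S) is the partition of S induced by the response, and precision is the same computation with key and response swapped. It is an illustration under that assumption, not the official MUC-7 scoring software, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def chain_root(m: str, refs: Dict[str, Optional[str]]) -> str:
    """Follow REF pointers from markable m to the start of its chain."""
    seen: Set[str] = set()
    while refs[m] is not None:
        if m in seen:
            raise ValueError("REF pointers form a cycle")
        seen.add(m)
        m = refs[m]
    return m


def chains_from_refs(refs: Dict[str, Optional[str]]) -> List[Set[str]]:
    """Group markables by chain root, so the REF layout does not matter."""
    groups: Dict[str, Set[str]] = {}
    for m in refs:
        groups.setdefault(chain_root(m, refs), set()).add(m)
    return list(groups.values())


def muc_recall(key: List[Set[str]], response: List[Set[str]]) -> float:
    """Link-based recall: sum(|S| - |p(S)|) / sum(|S| - 1) over key chains S."""
    owner = {m: str(i) for i, chain in enumerate(response) for m in chain}
    numerator = denominator = 0
    for chain in key:
        # p(S): pieces of S under the response; missing mentions are singletons.
        pieces = {owner.get(m, "missing:" + m) for m in chain}
        numerator += len(chain) - len(pieces)
        denominator += len(chain) - 1
    return numerator / denominator if denominator else 1.0


# Two encodings of the same chain A-B-C, as described in the text.
key = chains_from_refs({"A": None, "B": "A", "C": "A"})
response = chains_from_refs({"A": None, "B": "A", "C": "B"})
print(muc_recall(key, response), muc_recall(response, key))  # 1.0 1.0

# A response that misses the link to C loses recall but not precision.
partial = chains_from_refs({"A": None, "B": "A", "C": None})
print(muc_recall(key, partial), muc_recall(partial, key))  # 0.5 1.0
```

Because both the key and the response are first reduced to chains, the two REF layouts described above score identically.
    </Paragraph>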
  </Section>
</Paper>