<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-2015">
  <Title>OntoNotes: The 90% Solution</Title>
  <Section position="4" start_page="57" end_page="57" type="metho">
    <SectionTitle>
3 PropBanking
</SectionTitle>
    <Paragraph position="0"> The Penn Proposition Bank, funded by ACE (DOD), focuses on the argument structure of verbs, and provides a corpus annotated with semantic roles, including participants traditionally viewed as arguments and adjuncts. The 1M word Penn Treebank II Wall Street Journal corpus has been successfully annotated with semantic argument structures for verbs and is now available via the Penn Linguistic Data Consortium as PropBank I (Palmer et al., 2005). Links from the argument labels in the Frames Files to FrameNet frame elements and VerbNet thematic roles are being added.</Paragraph>
    <Paragraph position="1"> This style of annotation has also been successfully applied to other genres and languages.</Paragraph>
  </Section>
  <Section position="5" start_page="57" end_page="58" type="metho">
    <SectionTitle>
4 Word Sense
</SectionTitle>
    <Paragraph position="0"> Word sense ambiguity is a continuing major obstacle to accurate information extraction, summarization and machine translation. The subtle fine-grained sense distinctions in WordNet have not lent themselves to high agreement between human annotators or high automatic tagging performance.</Paragraph>
    <Paragraph position="1"> Building on results in grouping fine-grained WordNet senses into more coarse-grained senses that led to improved inter-annotator agreement (ITA) and system performance (Palmer et al., 2004; Palmer et al., 2006), we have developed a process for rapid sense inventory creation and annotation that includes critical links between the grouped word senses and the Omega ontology (Philpot et al., 2005; see Section 5 below).</Paragraph>
    <Paragraph position="2"> This process is based on recognizing that sense distinctions can be represented by linguists in a hierarchical structure, similar to a decision tree, that is rooted in very coarse-grained distinctions which become increasingly fine-grained until reaching WordNet senses at the leaves. Sets of senses under specific nodes of the tree are grouped together into single entries, along with the syntactic and semantic criteria for their groupings, to be presented to the annotators.</Paragraph>
    <Paragraph position="3"> As shown in Figure 1, a 50-sentence sample of instances is annotated and immediately checked for inter-annotator agreement. ITA scores below 90% lead to a revision and clarification of the groupings by the linguist. It is only after the groupings have passed the ITA hurdle that each individual group is linked to a conceptual node in the ontology. In addition to higher accuracy, we find at least a three-fold increase in annotator productivity.</Paragraph>
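The agreement check described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual tooling: it assumes ITA is simple percentage agreement between two annotators over the same sample, and the function names and the sample labels are hypothetical.

```python
def pairwise_agreement(labels_a, labels_b):
    """Fraction of instances on which two annotators chose the same sense group.

    Assumption: ITA is plain percentage agreement; the paper does not
    spell out the exact measure.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("annotators must label the same instances")
    if not labels_a:
        raise ValueError("sample is empty")
    matches = sum(1 for a, b in zip(labels_a, labels_b) if a == b)
    return matches / len(labels_a)


def groupings_pass(labels_a, labels_b, threshold=0.90):
    """True when agreement reaches the threshold; otherwise the groupings are revised."""
    return pairwise_agreement(labels_a, labels_b) >= threshold


# Hypothetical 50-instance sample for one verb; the annotators differ on two instances.
annotator_1 = ["G1"] * 30 + ["G2"] * 20
annotator_2 = ["G1"] * 28 + ["G2"] * 22

ita = pairwise_agreement(annotator_1, annotator_2)
print(f"ITA = {ita:.0%}, pass = {groupings_pass(annotator_1, annotator_2)}")
```

Only groupings for which `groupings_pass` holds would move on to ontology linking; a failing sample would send the groupings back to the linguist for revision.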
    <Paragraph position="4"> As part of OntoNotes we are annotating the most frequent noun and verb senses in a 300K subset of the PropBank, and will have this data available for release in early 2007.</Paragraph>
    <Section position="1" start_page="57" end_page="58" type="sub_section">
      <SectionTitle>
4.1 Verbs
</SectionTitle>
      <Paragraph position="0"> Our initial goal is to annotate the 700 most frequently occurring verbs in our data, which are typically also the most polysemous; so far 300 verbs have been grouped and 150 double annotated. Subcategorization frames and semantic classes of arguments play major roles in determining the groupings, as illustrated by the grouping for the 22 WN 2.1 senses for drive in Figure 2. In addition to improved annotator productivity and accuracy, we predict a corresponding improvement in word sense disambiguation performance. Training on this new data, Chen and Palmer (2005) report 86.3% accuracy for verbs using a smoothed maximum entropy model and rich linguistic features, which is 10% higher than their earlier, state-of-the-art performance on ungrouped, fine-grained senses.
      Figure 2: Sense groupings for drive.
      G1: WN1 "Can you drive a truck?", WN2 "drive to school", WN3 "drive her to school", WN12 "this truck drives well", WN13 "he drives a taxi", WN14 "The car drove around the corner", WN16 "drive the turnpike to work"
      G2: force to a position or stance; NP drive NP/PP/infinitival; WN4 "He drives me mad", WN6 "drive back the invaders", WN7 "She finally drove him to change jobs", WN8 "drive a nail", WN15 "drive the herd", WN22 "drive the game"
      G3: to exert energy on behalf of something; NP drive NP/infinitival; WN5 "Her passion drives her", WN10 "He is driving away at his thesis"
      G4: cause object to move rapidly by striking it; NP drive NP; WN9 "drive the ball into the outfield", WN17 "drive a golf ball", WN18 "drive a ball"</Paragraph>
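One way to picture how such groupings collapse fine-grained senses is a simple lookup table. The sketch below is illustrative only: the sense numbers are transcribed from Figure 2, treating the senses listed before G2 as group G1 is an assumption, and the names `DRIVE_GROUPS`, `build_sense_to_group`, and `coarsen` are hypothetical.

```python
# Sense groupings for "drive", transcribed from Figure 2 (WordNet 2.1 sense numbers).
# Assumption: the senses listed before G2 belong to group G1.
DRIVE_GROUPS = {
    "G1": [1, 2, 3, 12, 13, 14, 16],
    "G2": [4, 6, 7, 8, 15, 22],
    "G3": [5, 10],
    "G4": [9, 17, 18],
}


def build_sense_to_group(groups):
    """Invert group -> senses into sense -> group, rejecting a sense listed twice."""
    mapping = {}
    for group, senses in groups.items():
        for sense in senses:
            if sense in mapping:
                raise ValueError(f"sense {sense} appears in two groups")
            mapping[sense] = group
    return mapping


SENSE_TO_GROUP = build_sense_to_group(DRIVE_GROUPS)


def coarsen(fine_sense):
    """Return the coarse group for a fine-grained sense number, or None if it is not listed."""
    return SENSE_TO_GROUP.get(fine_sense)


print(coarsen(7))   # a sense listed under G2
print(coarsen(11))  # a sense that does not appear in the figure as given here
```

Annotators then choose among four groups rather than 22 senses, which is the kind of reduction the text credits for higher agreement and productivity.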
    </Section>
    <Section position="2" start_page="58" end_page="58" type="sub_section">
      <SectionTitle>
4.2 Nouns
</SectionTitle>
      <Paragraph position="0"> We follow a similar procedure for the annotation of nouns. The same individual who groups WordNet verb senses also creates noun senses, starting with WordNet and other dictionaries. We aim to double-annotate the 1100 most frequent polysemous nouns in the initial corpus by the end of 2006, while maximizing overlap with the sentences containing annotated verbs.</Paragraph>
      <Paragraph position="1"> Certain nouns carry predicate structure; these include nominalizations (whose structure obviously is derived from their verbal form) and various types of relational nouns (like father, President, and believer, that express relations between entities, often stated using of). We have identified a limited set of these whose structural relations can be semi-automatically annotated with high accuracy.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="58" end_page="58" type="metho">
    <SectionTitle>
5 Ontology
</SectionTitle>
    <Paragraph position="0"> In standard dictionaries, the senses for each word are simply listed. In order to allow access to additional useful information, such as subsumption, property inheritance, predicate frames from other sources, links to instances, and so on, our goal is to link the senses to an ontology. This requires decomposing the hierarchical structure into subtrees which can then be inserted at the appropriate conceptual node in the ontology.</Paragraph>
    <Paragraph position="1"> The OntoNotes terms are represented in the 110,000-node Omega ontology (Philpot et al., 2005), under continued construction and extension at ISI. Omega, which has been used for MT, summarization, and database alignment, has been assembled semi-automatically by merging a variety of sources, including Princeton's WordNet, New Mexico State University's Mikrokosmos, and a variety of Upper Models, including DOLCE (Gangemi et al., 2002), SUMO (Niles and Pease, 2001), and ISI's Upper Model, which are in the process of being reconciled. The verb frames from PropBank, FrameNet, WordNet, and Lexical Conceptual Structures (Dorr and Habash, 2001) have all been included and cross-linked.</Paragraph>
    <Paragraph position="2"> In work planned for later this year, verb and noun sense groupings will be manually inserted into Omega, replacing the current (primarily WordNet-derived) contents. For example, of the verb groups for drive in the table above, G1 and G4 will be placed into the area of "controlled motion", while G2 will then sort with "attitudes".</Paragraph>
  </Section>
  <Section position="7" start_page="58" end_page="58" type="metho">
    <SectionTitle>
6 Coreference
</SectionTitle>
    <Paragraph position="0"> The coreference annotation in OntoNotes connects coreferring instances of specific referring expressions, meaning primarily NPs that introduce or access a discourse entity. For example, "Elco Industries, Inc.", "the Rockford, Ill. maker of fasteners", and "it" could all corefer. (Non-specific references like "officials" in "Later, officials reported..." are not included, since coreference for them is frequently unclear.) In addition, proper premodifiers and verb phrases can be marked when coreferent with an NP, such as linking "when the company withdrew from the bidding" to "the withdrawal of New England Electric".</Paragraph>
    <Paragraph position="1"> Unlike the coreference task as defined in the ACE program, attributives are not generally marked. For example, the "veterinarian" NP would not be marked in "Baxter Black is a large animal veterinarian". Adjectival modifiers like "American" in "the American embassy" are also not subject to coreference.</Paragraph>
    <Paragraph position="2"> Appositives are annotated as a special kind of coreference, so that later processing will be able to supply and interpret the implicit copula link.</Paragraph>
    <Paragraph position="3"> All of the coreference annotation is being doubly annotated and adjudicated. In our initial English batch, the average agreement scores between each annotator and the adjudicated results were 91.8% for normal coreference and 94.2% for appositives.</Paragraph>
  </Section>
</Paper>