<?xml version="1.0" standalone="yes"?>
<Paper uid="J05-1004">
  <Title>The Proposition Bank: An Annotated Corpus of Semantic Roles</Title>
  <Section position="3" start_page="73" end_page="76" type="metho">
    <SectionTitle>
2. Semantic Roles and Syntactic Alternation
</SectionTitle>
    <Paragraph position="0"> Our work in examining verb alternation behavior is inspired by previous research into the linking between semantic roles and syntactic realization, in particular, the comprehensive study of Levin (1993). Levin argues that syntactic frames are a direct reflection of the underlying semantics; the sets of syntactic frames associated with a particular Levin class reflect underlying semantic components that constrain allowable arguments. On this principle, Levin defines verb classes based on the ability of particular verbs to occur or not occur in pairs of syntactic frames that are in some sense meaning-preserving (diathesis alternations). The classes also tend to share some semantic component. For example, the break examples above are related by a transitive/intransitive alternation called the causative/inchoative alternation. Break and other verbs such as shatter and smash are also characterized by their ability to appear in the middle construction, as in Glass breaks/shatters/smashes easily. Cut, a similar change-of-state verb, seems to share in this syntactic behavior and can also appear in the transitive (causative) as well as the middle construction: John cut the bread, This loaf cuts easily. However, it cannot also occur in the simple intransitive: The window broke/*The bread cut. In contrast, cut verbs can occur in the conative--John valiantly cut/hacked at the frozen loaf, but his knife was too dull to make a dent in it--whereas break verbs cannot: *John broke at the window. The explanation given is that cut describes a series of actions directed at achieving the goal of separating some object into pieces.</Paragraph>
    <Paragraph position="1"> These actions consist of grasping an instrument with a sharp edge such as a knife and applying it in a cutting fashion to the object. It is possible for these actions to be performed without the end result being achieved, but such that the cutting manner can still be recognized, for example, John cut at the loaf. Where break is concerned, the only thing specified is the resulting change of state, in which the object becomes separated into pieces.</Paragraph>
    <Paragraph position="2"> VerbNet (Kipper, Dang, and Palmer 2000; Kipper, Palmer, and Rambow 2002) extends Levin's classes by adding an abstract representation of the syntactic frames for each class with explicit correspondences between syntactic positions and the semantic roles they express, as in Agent REL Patient or Patient REL into pieces for break.</Paragraph>
    <Paragraph position="3">  (For other extensions of Levin, see also Dorr and Jones [2000] and Korhonen, Krymolowsky, and Marx [2003].) The original Levin classes constitute the first few levels in the hierarchy, with each class subsequently refined to account for further semantic and syntactic differences within a class. The argument list consists of thematic labels from a set of 20 such possible labels (Agent, Patient, Theme, Experiencer, etc.). The syntactic frames represent a mapping of the list of thematic labels to deep-syntactic arguments.</Paragraph>
    <Paragraph position="4"> Additional semantic information for the verbs is expressed as a set (i.e., conjunction) of semantic predicates, such as motion, contact, transfer_info. Currently, all Levin verb classes have been assigned thematic labels and syntactic frames, and over half the classes are completely described, including their semantic predicates. In many cases, the additional information that VerbNet provides for each class has caused it to subdivide, or use intersections of, Levin's original classes, adding an additional level to the hierarchy (Dang et al. 1998). We are also extending the coverage by adding new classes (Korhonen and Briscoe 2004).</Paragraph>
    <Paragraph position="5"> Our objective with the Proposition Bank is not a theoretical account of how and why syntactic alternation takes place, but rather to provide a useful level of representation and a corpus of annotated data to enable empirical study of these issues. We have referred to Levin's classes wherever possible to ensure that verbs in the same classes are given consistent role labels. However, there is only a 50% overlap between verbs in VerbNet and those in the Penn TreeBank II, and PropBank itself does not define a set of classes, nor does it attempt to formalize the semantics of the roles it defines.</Paragraph>
    <Paragraph position="6"> While lexical resources such as Levin's classes and VerbNet provide information about alternation patterns and their semantics, the frequency of these alternations and their effect on language understanding systems has never been carefully quantified.</Paragraph>
    <Paragraph position="7"> While learning syntactic subcategorization frames from corpora has been shown to be possible with reasonable accuracy (Manning 1993; Brent 1993; Briscoe and Carroll 1997), this work does not address the semantic roles associated with the syntactic arguments. More recent work has attempted to group verbs into classes based on alternations, usually taking Levin's classes as a gold standard (McCarthy 2000; Merlo and Stevenson 2001; Schulte im Walde 2000; Schulte im Walde and Brew 2002). But without an annotated corpus of semantic roles, this line of research has not been able to measure the frequency of alternations directly, or more generally, to ascertain how well the classes defined by Levin correspond to real-world data.</Paragraph>
    <Paragraph position="8"> We believe that a shallow labeled dependency structure provides a feasible level of annotation which, coupled with minimal coreference links, could provide the foundation for a major advance in our ability to extract salient relationships from text. This will in turn improve the performance of basic parsing and generation components, as well as facilitate advances in text understanding, machine translation, and fact retrieval.</Paragraph>
    <Paragraph position="9"> (Footnote 2) These can be thought of as a notational variant of tree-adjoining grammar elementary trees or tree-adjoining grammar partial derivations (Kipper, Dang, and Palmer 2000).</Paragraph>
    <Paragraph position="10"> 3. Annotation Scheme: Choosing the Set of Semantic Roles
Because of the difficulty of defining a universal set of semantic or thematic roles covering all types of predicates, PropBank defines semantic roles on a verb-by-verb basis. An individual verb's semantic arguments are numbered, beginning with zero.</Paragraph>
    <Paragraph position="11"> For a particular verb, Arg0 is generally the argument exhibiting features of a Prototypical Agent (Dowty 1991), while Arg1 is a Prototypical Patient or Theme. No consistent generalizations can be made across verbs for the higher-numbered arguments, though an effort has been made to consistently define roles across members of VerbNet classes. In addition to verb-specific numbered roles, PropBank defines several more general roles that can apply to any verb. The remainder of this section describes in detail the criteria used in assigning both types of roles.</Paragraph>
    <Paragraph position="12"> As examples of verb-specific numbered roles, we give entries for the verbs accept and kick below. These examples are taken from the guidelines presented to the annotators and are also available on the Web at http://www.cis.upenn.edu/~cotton/cgi-bin/pblex_fmt.cgi.</Paragraph>
    <Paragraph position="14"> the football], but Mary pulled it away at the last moment. A set of roles corresponding to a distinct usage of a verb is called a roleset and can be associated with a set of syntactic frames indicating allowable syntactic variations in the expression of that set of roles. The roleset with its associated frames is called a frameset. A polysemous verb may have more than one frameset when the differences in meaning are distinct enough to require a different set of roles, one for each frameset. The tagging guidelines include a ''descriptor'' field for each role, such as ''kicker'' or ''instrument,'' which is intended for use during annotation and as documentation but does not have any theoretical standing. In addition, each frameset is complemented by a set of examples, which attempt to cover the range of syntactic alternations afforded by that usage. The collection of frameset entries for a verb is referred to as the verb's frames file.</Paragraph>
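The lexical organization just described (rolesets grouped into framesets, each with role descriptors and example sentences, collected into a per-verb frames file) can be pictured as a small data model. The class and field names below are illustrative only, not the actual PropBank file format; the kick entry's descriptors follow the guideline examples mentioned in the text.

```python
from dataclasses import dataclass, field

@dataclass
class Role:
    label: str       # numbered argument, e.g. "Arg0"
    descriptor: str  # mnemonic such as "kicker"; documentation only, no theoretical standing

@dataclass
class Frameset:
    # One coarse sense of a verb: a roleset plus its illustrative examples.
    sense_id: str
    roles: list[Role] = field(default_factory=list)
    examples: list[str] = field(default_factory=list)

@dataclass
class FramesFile:
    # The collection of frameset entries for one verb.
    lemma: str
    framesets: list[Frameset] = field(default_factory=list)

kick = FramesFile("kick", [
    Frameset("kick.01",
             roles=[Role("Arg0", "kicker"),
                    Role("Arg1", "thing kicked"),
                    Role("Arg2", "instrument")],
             examples=["John kicked the football."]),
])
```

A polysemous verb would simply carry several Frameset entries in its FramesFile, one per distinct roleset.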
    <Paragraph position="15"> The use of numbered arguments and their mnemonic names was instituted for a number of reasons. Foremost, the numbered arguments plot a middle course among many different theoretical viewpoints.</Paragraph>
    <Paragraph position="16">  The numbered arguments can then be mapped easily and consistently onto any theory of argument structure, such as traditional theta roles (Kipper, Palmer, and Rambow 2002), lexical-conceptual structure (Rambow et al. 2003), or Prague tectogrammatics (Hajičová and Kučerová 2002).</Paragraph>
    <Paragraph position="17"> While most rolesets have two to four numbered roles, as many as six can appear, in particular for certain verbs of motion:  in last year's third quarter]. (wsj_1210) Because of the use of Arg0 for agency, there arose a small set of verbs in which an external force could cause the Agent to execute the action in question. For example, in the sentence . . . Mr. Dinkins would march his staff out of board meetings and into his private office . . . (wsj_0765), the staff is unmistakably the marcher, the agentive role. Yet Mr. Dinkins also has some degree of agency, since he is causing the staff to do the marching. To capture this, a special tag, ArgA, is used for the agent of an induced action. This ArgA tag is used only for verbs of volitional motion such as march and walk, modern uses of volunteer (e.g., Mary volunteered John to clean the garage, or more likely the passive of that, John was volunteered to clean the garage), and, with some hesitation, graduate based on usages such as Penn only graduates 35% of its students. (This usage does not occur as such in the Penn Treebank corpus, although it is evoked in the sentence No student should be permitted to be graduated from elementary school without having mastered the 3 R's at the level that prevailed 20 years ago. (wsj_1286)) In addition to the semantic roles described in the rolesets, verbs can take any of a set of general, adjunct-like arguments (ArgMs), distinguished by one of the function tags shown in Table 1. Although they are not considered adjuncts, NEG for verb-level negation (e.g., John didn't eat his peas) and MOD for modal verbs (e.g., John would eat</Paragraph>
  </Section>
  <Section position="4" start_page="76" end_page="83" type="metho">
    <SectionTitle>
3 By following the treebank, however, we are following a very loose government-binding framework.
4 We make no attempt to adhere to any linguistic distinction between arguments and adjuncts.
</SectionTitle>
    <Paragraph position="0"> While many linguists would consider any argument higher than Arg2 or Arg3 to be an adjunct, such arguments occur frequently enough with their respective verbs, or classes of verbs, that they are assigned a number in order to ensure consistent annotation.</Paragraph>
    <Paragraph position="1"> everything else) are also included in this list to allow every constituent surrounding the verb to be annotated. DIS is also not an adjunct but is included to ease future discourse connective annotation.</Paragraph>
    <Section position="1" start_page="77" end_page="77" type="sub_section">
      <SectionTitle>
3.1 Distinguishing Framesets
</SectionTitle>
      <Paragraph position="0"> The criteria for distinguishing framesets are based on both semantics and syntax. Two verb meanings are distinguished as different framesets if they take different numbers of arguments. For example, the verb decline has two framesets. However, alternations which preserve verb meaning, such as causative/inchoative or object deletion, are considered to be one frameset only, as shown in example (17). Both the transitive and intransitive uses of the verb open correspond to the same frameset, with some of the arguments left unspecified. Table 1. Subtypes of the ArgM modifier tag.</Paragraph>
      <Paragraph position="1"> LOC: location; CAU: cause; EXT: extent; TMP: time; DIS: discourse connectives; PNC: purpose; ADV: general purpose; MNR: manner; NEG: negation marker; DIR: direction; MOD: modal verb. Moreover, differences in the syntactic type of the arguments do not constitute criteria for distinguishing among framesets. For example, see.01 allows for either an NP object or a clause object:  my intake of chocolate].</Paragraph>
      <Paragraph position="2"> Note that the verb and particle do not need to be contiguous; (20) above could just as well be phrased The seed companies cut the tassels of each plant off. For the WSJ text, there are frames for over 3,300 verbs, with a total of just over 4,500 framesets described, implying an average polysemy of 1.36. Of these verb frames, only 21.6% (721/3342) have more than one frameset, while fewer than 100 verbs have four or more. Each instance of a polysemous verb is marked as to which frameset it belongs to, with interannotator (ITA) agreement of 94%. The framesets can be viewed as extremely coarse-grained sense distinctions, with each frameset corresponding to one or more of the Senseval 2 WordNet 1.7 verb groupings. Each grouping in turn corresponds to several WordNet 1.7 senses (Palmer, Babko-Malaya, and Dang 2004).</Paragraph>
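The frequency figures in this paragraph can be checked with elementary arithmetic; only counts quoted in the text are used (4,500 is the lower bound implied by ''just over 4,500''):

```python
# Counts quoted above: frames for 3,342 verbs, 721 of them with more than
# one frameset, and "just over 4,500" framesets in total.
total_verbs = 3342
multi_frameset_verbs = 721
total_framesets = 4500  # lower bound; exact count not given in the text

share_polysemous = multi_frameset_verbs / total_verbs
print(f"{share_polysemous:.1%}")  # 21.6%, matching the figure reported above

avg_polysemy = total_framesets / total_verbs
print(f"{avg_polysemy:.2f}")      # about 1.35; "just over 4,500" yields the reported 1.36
```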
    </Section>
    <Section position="2" start_page="77" end_page="77" type="sub_section">
      <SectionTitle>
3.2 Secondary Predications
</SectionTitle>
      <Paragraph position="0"> There are two other functional tags which, unlike those listed above, can also be associated with numbered arguments in the frames files. The first one, EXT (extent), indicates that a constituent is a numerical argument on its verb, as in climbed 15% or walked 3 miles. The second, PRD (secondary predication), marks a more subtle relationship. If one thinks of the arguments of a verb as existing in a dependency tree, all arguments depend directly on the verb, and each argument is basically independent of the others. Some verbs, however, predict a predicative relationship between their arguments. A canonical example of this is call in the sense of ''attach a label to,'' as in Mary called John an idiot. In this case there is a relationship between John and an idiot (at least in Mary's mind). The PRD tag is associated with the Arg2 label in the frames file for this frameset, since it is predictable that the Arg2 predicates on the Arg1 John. This helps to disambiguate the crucial difference between the two readings of a sentence such as Mary called John a doctor: in the predicative reading, a doctor is Arg2-PRD (an attribute of John); in the ditransitive reading, a doctor is Arg1 (the thing summoned). It is also possible for ArgMs to predicate on another argument. Since this must be decided on a case-by-case basis, the PRD function tag is added to the ArgM by the annotator, as in example (28).</Paragraph>
      <Paragraph position="1"> 5 This sense could also be stated in the dative: Mary called a doctor for John.</Paragraph>
    </Section>
    <Section position="3" start_page="77" end_page="77" type="sub_section">
      <SectionTitle>
3.3 Subsumed Arguments
</SectionTitle>
      <Paragraph position="0"> Because verbs which share a VerbNet class are rarely synonyms, their shared argument structure occasionally takes on odd characteristics. Of primary interest among these are the cases in which an argument predicted by one member of a class cannot be attested by another member of the same class. For a relatively simple example, consider the verb hit, in VerbNet classes 18.1 and 18.4. This takes three very obvious arguments:  VerbNet classes 18.1 and 18.4 are filled with verbs of hitting, such as beat, hammer, kick, knock, strike, tap, and whack. For some of these the instrument of hitting is necessarily included in the semantics of the verb itself. For example, kick is essentially ''hit with the foot'' and hammer is exactly ''hit with a hammer.'' For these verbs, then, the Arg2 might not be available, depending on how strongly the instrument is incorporated into the verb. Kick, for example, shows 28 instances in the treebank but only one instance of a (somewhat marginal) instrument:  endless trail of bad news]. (wsj_2428) Another perhaps more interesting case is that in which two arguments can be merged into one in certain syntactic situations. Consider the case of meet, which canonically takes two arguments:  to discuss global trade as well as regional matters such as transportation and telecommunications]. (wsj_0043) In these cases there is an assumed or default Arg1 along the lines of ''each other'':</Paragraph>
    </Section>
    <Section position="4" start_page="77" end_page="83" type="sub_section">
      <SectionTitle>
3.4 Role Labels and Syntactic Trees
</SectionTitle>
      <Paragraph position="0"> The Proposition Bank assigns semantic roles to nodes in the syntactic trees of the Penn Treebank. Annotators are presented with the roleset descriptions and the syntactic tree and mark the appropriate nodes in the tree with role labels. The lexical heads of constituents are not explicitly marked either in the treebank trees or in the semantic labeling layered on top of them. Annotators cannot change the syntactic parse, but they are not otherwise restricted in assigning the labels. In certain cases, more than one node may be assigned the same role. The annotation software does not require that the nodes being assigned labels be in any syntactic relation to the verb. We discuss the ways in which we handle the specifics of the treebank syntactic annotation style in this section.</Paragraph>
      <Paragraph position="1"> The level at which prepositional-phrase arguments are annotated depends on several factors. On one hand, if a given argument is defined as a ''destination,'' then in a sentence such as John poured the water into the bottle, the destination of the water is clearly the bottle, not ''into the bottle.'' The fact that the water is going into the bottle is inherent in the description ''destination''; the preposition merely adds the specific information that the water will end up inside the bottle. Thus arguments should properly be associated with the NP heads of prepositional phrases. On the other hand, however, ArgMs which are prepositional phrases are annotated at the PP level, not the NP level. For the sake of consistency, then, numbered arguments are also tagged at the PP level. This also facilitates the treatment of multiword prepositions such as out of, according to, and up to but not including.</Paragraph>
      <Paragraph position="2">  in the first 9 months of 1989] (wsj_0067)  Empty constituents are represented in the treebank as traces, which are often coindexed with other constituents in the tree. When a trace is assigned a role label by an annotator, the coindexed constituent is automatically added to the annotation, as in  the football], but Mary pulled it away at the last moment.</Paragraph>
      <Paragraph position="3"> Verbs such as cause, force, and persuade, known as object control verbs, pose a problem for the analysis and annotation of semantic structure. Consider a sentence such as Commonwealth Edison said the ruling could force it to slash its 1989 earnings by $1.55 a share. (wsj_0015). The Penn Treebank's analysis assigns a single sentential (S) constituent to the entire string it to slash . . . a share, making it a single syntactic argument to the verb force. In the PropBank annotation, we split the sentential complement into two semantic roles for the verb force, assigning roles to the noun phrase and verb phrase but not to the S node which subsumes them:  While it is the Arg0 of force, it is the Arg1 of slash. Similarly, subject control verbs such as promise result in the subject of the main clause being assigned two roles, one for each verb:  by the year 2000].</Paragraph>
      <Paragraph position="4"> We did not find a single case of a subject control verb used with a direct object and an infinitival clause (e.g., John promised Mary to come) in the Penn Treebank. The cases above must be contrasted with verbs such as expect, often referred to as exceptional case marking (ECM) verbs, where an infinitival subordinate clause is a single semantic argument. Verbs of saying have the property that the verb and its subject can be inserted almost anywhere within another of the verb's arguments. While the canonical realization is John said (that) Mary was going to eat outside at lunchtime today, it is common to say Mary, John said, was going to eat outside at lunchtime today or Mary was going to eat outside, John said, at lunchtime today. In this situation, there is no constituent holding the whole of the utterance while not also holding the verb of saying. We annotate these cases by allowing a single semantic role to point to the component pieces of the split constituent in order to cover the correct, discontinuous substring of the sentence.</Paragraph>
      <Paragraph position="5">  the new funds have become ''extremely attractive to Japanese and other investors outside the U.S.''] (wsj_0029) In the flat structure we have been using for example sentences, this looks like a case of repeated role labels. Internally, however, there is one role label pointing to multiple constituents of the tree, shown in Figure 1.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="83" end_page="87" type="metho">
    <SectionTitle>
4. The Propbank Development Process
</SectionTitle>
    <Paragraph position="0"> Since the Proposition Bank consists of two portions, the lexicon of frames files and the annotated corpus, the process is similarly divided into framing and annotation.</Paragraph>
    <Section position="1" start_page="83" end_page="83" type="sub_section">
      <SectionTitle>
4.1 Framing
</SectionTitle>
      <Paragraph position="0"> The process of creating the frames files, that is, the collection of framesets for each lexeme, begins with the examination of a sample of the sentences from the corpus containing the verb under consideration. These instances are grouped into one or more major senses, and each major sense is turned into a single frameset. To show all the possible syntactic realizations of the frameset, many sentences from the corpus are included in the frames file, in the same format as the examples above. In many cases a particular realization will not be attested within the Penn Treebank corpus; in these cases, a constructed sentence is used, usually identifiable by the presence of the characters John and Mary. Care was taken during the framing process to make synonymous verbs (mostly in the sense of ''sharing a VerbNet class'') have the same framing, with the same number of roles and the same descriptors on those roles.</Paragraph>
      <Paragraph position="1"> Generally speaking, a given lexeme/sense pair required 10-15 minutes to frame, although highly polysemous verbs could require longer. With the 4,500+ framesets currently in place for PropBank, this is clearly a substantial time investment, and the frames files represent an important resource in their own right. We were able to use membership in a VerbNet class which already had consistent framing to project accurate frames files for up to 300 verbs. If the overlap between VerbNet and PropBank had been more than 50%, this number might have been higher.</Paragraph>
    </Section>
    <Section position="2" start_page="83" end_page="87" type="sub_section">
      <SectionTitle>
4.2 Annotation
</SectionTitle>
      <Paragraph position="0"> We begin the annotation process by running a rule-based argument tagger (Palmer, Rosenzweig, and Cotton 2001) on the corpus. This tagger incorporates an extensive lexicon, entirely separate from that used by PropBank, which encodes class-based  mappings between grammatical and semantic roles. The rule-based tagger achieved 83% accuracy on pilot data, with many of the errors due to differing assumptions made in defining the roles for a particular verb. The output of this tagger is then corrected by hand. Annotators are presented with an interface which gives them access to both the frameset descriptions and the full syntactic parse of any sentence from the treebank and allows them to select nodes in the parse tree for labeling as arguments of the predicate selected. For any verb they are able to examine both the descriptions of the arguments and the example tagged sentences, much as they have been presented here. The tagging is done on a verb-by-verb basis, known as lexical sampling, rather than all-words annotation of running text.</Paragraph>
      <Paragraph position="1"> The downside of this approach is that it does not quickly provide a stretch of fully annotated text, needed for early assessment of the usefulness of the resource (see subsequent sections). For this reason a domain-specific subcorpus was automatically extracted from the entirety of the treebank, consisting of texts primarily concerned with financial reporting, identified by the presence of a dollar sign anywhere in the text. This ''financial'' subcorpus comprised approximately one-third of the treebank and served as the initial focus of annotation.</Paragraph>
      <Paragraph position="2"> The treebank as a whole contains 3,185 unique verb lemmas, while the financial subcorpus contains 1,826. These verbs are arrayed in a classic Zipfian distribution, with a few verbs occurring very often (say, for example, is the most common verb, with over 10,000 instances in its various inflectional forms) and most verbs occurring two or fewer times. As with the distribution of the lexical items themselves, the framesets also display a Zipfian distribution: A small number of verbs have many framesets (go has 20 when including phrasal variants, and come, get, make, pass, take, and turn each have more than a dozen) while the majority of verbs (2581/3342) have only one frameset.</Paragraph>
      <Paragraph position="3"> For polysemous verbs annotators had to determine which frameset was appropriate for a given usage in order to assign the correct argument structure, although this information was explicitly marked only during a separate pass.</Paragraph>
      <Paragraph position="4"> Annotations were stored in a stand-off notation, referring to nodes within the Penn Treebank without actually replicating any of the lexical material or structure of that corpus. The process of annotation was a two-pass, blind procedure followed by an adjudication phase to resolve differences between the two initial passes. Both role labeling decisions and the choice of frameset were adjudicated.</Paragraph>
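The stand-off design can be pictured with a minimal record type: annotations hold pointers into treebank trees rather than copies of the text. The sketch below illustrates the idea only; the field names and the pointer scheme (first-terminal index plus height above it) are assumptions for exposition, not the released PropBank format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NodePointer:
    # Identifies a treebank constituent without copying it: the index of its
    # first terminal and its height above that terminal. (Illustrative scheme.)
    terminal: int
    height: int

@dataclass
class PropAnnotation:
    # One predicate instance, referencing (never replicating) the treebank tree.
    file_id: str                               # e.g. "wsj_0015" (hypothetical id)
    sentence: int
    frameset: str                              # e.g. "force.01"
    args: dict[str, tuple[NodePointer, ...]]   # label -> one or more constituents

ann = PropAnnotation(
    file_id="wsj_0015", sentence=3, frameset="force.01",
    args={
        "rel": (NodePointer(6, 0),),
        "Arg0": (NodePointer(4, 1),),
        "Arg1": (NodePointer(7, 1),),
    },
)
```

Mapping each label to a tuple of pointers is one way a single semantic role can cover the discontinuous split constituents discussed in Section 3.4.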
      <Paragraph position="5"> The annotators themselves were drawn from a variety of backgrounds, from undergraduates to holders of doctorates, including linguists, computer scientists, and others. Undergraduates have the advantage of being inexpensive but tend to work for only a few months each, so they require frequent training. Linguists make the best overall judgments although several of our nonlinguist annotators also had excellent skills. The learning curve for the annotation task tended to be very steep, with most annotators becoming comfortable with the process within three days of work. This contrasts favorably with syntactic annotation, which has a much longer learning curve (Marcus, personal communication), and indicates one of the advantages of using a corpus already syntactically parsed as the basis of semantic annotation. Over 30 annotators contributed to the project, some for just a few weeks, some for up to three years. The framesets were created and annotation disagreements were adjudicated by a small team of highly trained linguists: Paul Kingsbury created the frames files and managed the annotators, and Olga Babko-Malaya checked the frames files for consistency and did the bulk of the adjudication.</Paragraph>
      <Paragraph position="6"> We measured agreement between the two annotations before the adjudication step using the kappa statistic (Siegel and Castellan 1988), which is defined with respect to the probability of interannotator agreement, P(A), and the agreement expected by chance, P(E): kappa = (P(A) - P(E)) / (1 - P(E)).</Paragraph>
      <Paragraph position="8"> Measuring interannotator agreement for PropBank is complicated by the large number of possible annotations for each verb. For role identification, we expect agreement between annotators to be much higher than chance, because while any node in the parse tree can be annotated, the vast majority of arguments are chosen from the small number of nodes near the verb. In order to isolate the role classification decisions from this effect and avoid artificially inflating the kappa score, we split role identification (role vs. nonrole) from role classification (Arg0 vs. Arg1 vs. ...) and calculate kappa for each decision separately. Thus, for the role identification kappa, the interannotator agreement probability P(A) is the number of node observation agreements divided by the total number of nodes considered, which is the number of nodes in each parse tree multiplied by the number of predicates annotated in the sentence. All the PropBank data were annotated by two people, and in calculating kappa we compare these two annotations, ignoring the specific identities of the annotators for the predicate (in practice, agreement varied with the training and skill of individual annotators). For the role classification kappa, we consider only nodes that were marked as arguments by both annotators and compute kappa over the choices of possible argument labels. For both role identification and role classification, we compute kappa for two ways of treating ArgM labels. The first is to treat ArgM labels as arguments like any other, in which case ArgM-TMP, ArgM-LOC, and so on are considered separate labels for the role classification kappa. In the second scenario, we ignore ArgM labels, treating them as unlabeled nodes, and calculate agreement for identification and classification of numbered arguments only.</Paragraph>
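The calculation described here amounts to Cohen's kappa over paired label sequences. A generic sketch follows; the enumeration of candidate parse-tree nodes in the actual study is simplified to a flat list of decisions, and the toy labels are invented for illustration.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Kappa = (P(A) - P(E)) / (1 - P(E)) over two annotators' label sequences."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # P(A): observed agreement, the fraction of items labeled identically.
    p_a = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # P(E): chance agreement from the two annotators' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_a - p_e) / (1 - p_e)

# Role identification treated as a binary decision (role vs. nonrole) per node:
a = ["role", "none", "none", "role", "none", "none"]
b = ["role", "none", "none", "none", "none", "none"]
print(round(cohens_kappa(a, b), 2))  # → 0.57
```

Splitting identification from classification, as the text describes, simply means running this computation twice over different label sets.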
      <Paragraph position="9"> Kappa statistics for these various decisions are shown in Table 2. Agreement on role identification is very high (.99 under both treatments of ArgM), given the large number of obviously irrelevant nodes. Reassuringly, kappas for the more difficult role classification task are also high: .93 including all types of ArgM and .96 considering only numbered arguments. Kappas on the combined identification and classification decision, calculated over all nodes in the tree, are .91 including all subtypes of ArgM and .93 over numbered arguments only. Interannotator agreement among nodes that either annotator identified as an argument was .84 including ArgMs and .87 excluding ArgMs.</Paragraph>
      <Paragraph position="10"> Discrepancies between annotators tended to be less on numbered arguments than on the selection of function tags, as shown in the confusion matrices of Tables 3 and 4.</Paragraph>
      <Paragraph position="11">  Certain types of functions, particularly those represented by the tags ADV, MNR, and DIS, can be difficult to distinguish. For example, in the sentence Also, substantially lower Dutch corporate tax rates helped the company keep its tax outlay flat relative to earnings growth (wsj_0132), the phrase relative to earnings growth could be interpreted as a manner adverbial (MNR), describing how the tax outlays were kept flat, or as a general-purpose adverbial (ADV), merely providing more information on the keeping event. Similarly, a word such as then can have several functions. It is canonically a temporal adverb marking time or a sequence of events (. . . the Senate then broadened the list further . . . (wsj_0101)) but can also mark a consequence of another action (. . . if for any reason I don't have the values, then I won't recommend it. (wsj_0331)) or simply serve as a placeholder in conversation (It's possible then that Santa Fe's real estate . . . could one day fetch a king's ransom (wsj_0331)). These three usages require three different taggings (TMP, ADV, and DIS, respectively) and can easily trip up an annotator.</Paragraph>
      <Paragraph position="12"> The financial subcorpus was completely annotated and given a preadjudication release in June 2002. The fully annotated and adjudicated corpus was completed in March 2004. Both of these are available through the Linguistic Data Consortium, although because of the use of the stand-off notation, prior possession of the treebank is also necessary. The frames files are distributed separately and are available through the project Web site at http://www.cis.upenn.edu/~ace/.</Paragraph>
      <Paragraph position="13"> Table 3. Confusion matrix for argument labels, with ArgM labels collapsed into one category. Entries are a fraction of total annotations; true zeros are omitted, while other entries are rounded to zero. Table 4. Confusion matrix among subtypes of ArgM, defined in Table 1. Entries are a fraction of all ArgM labels; true zeros are omitted, while other entries are rounded to zero.</Paragraph>
    </Section>
  </Section>
</Paper>