<?xml version="1.0" standalone="yes"?> <Paper uid="J05-1004"> <Title>The Proposition Bank: An Annotated Corpus of Semantic Roles</Title> <Section position="2" start_page="0" end_page="73" type="abstr"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> Robust syntactic parsers, made possible by new statistical techniques (Ratnaparkhi 1997; Collins 1999, 2000; Bangalore and Joshi 1999; Charniak 2000) and by the availability of large, hand-annotated training corpora (Marcus, Santorini, and Marcinkiewicz 1993; Abeillé 2003), have had a major impact on the field of natural language processing in recent years. However, the syntactic analyses produced by these parsers are a long way from representing the full meaning of the sentences that are parsed. As a simple example, in the sentences (1) John broke the window.</Paragraph> <Paragraph position="1"> (2) The window broke.</Paragraph> <Paragraph position="2"> a syntactic analysis will represent the window as the verb's direct object in the first sentence and as its subject in the second, but it does not indicate that the window plays the same underlying semantic role in both cases. Note that both sentences are in the active voice and that this alternation in subject between transitive and intransitive uses of the verb does not always occur; for example, in the sentences (3) The sergeant played taps.</Paragraph> <Paragraph position="3"> (4) The sergeant played.</Paragraph> <Paragraph position="4"> the subject has the same semantic role in both uses. The same verb can also undergo syntactic alternation, as in (5) Taps played quietly in the background.</Paragraph> <Paragraph position="5"> and even in transitive uses, the role of the verb's direct object can differ: (6) The sergeant played taps.</Paragraph> <Paragraph position="6"> (7) The sergeant played a beat-up old bugle.</Paragraph> <Paragraph position="7"> © 2005 Association for Computational Linguistics. Department of Computer and Information Science, University of Pennsylvania, 3330 Walnut Street, Philadelphia, PA 19104. Email: mpalmer@cis.upenn.edu. Department of Computer Science, University of Rochester, PO Box 270226, Rochester, NY 14627. Email: gildea@cs.rochester.edu. Submission received: 9th December 2003; Accepted for publication: 11th July 2004.</Paragraph> <Paragraph position="8"> Alternation in the syntactic realization of semantic arguments is widespread, affecting most English verbs in some way, and the patterns exhibited by specific verbs vary widely (Levin 1993).</Paragraph> <Paragraph position="9"> The syntactic annotation of the Penn Treebank makes it possible to identify the subjects and objects of verbs in sentences such as the above examples. While the treebank provides semantic function tags such as temporal and locative for certain constituents (generally syntactic adjuncts), it does not distinguish the different roles played by a verb's grammatical subject or object in the above examples. Because the same verb used with the same syntactic subcategorization can assign different semantic roles, roles cannot be added to the treebank deterministically by an automatic conversion process with 100% accuracy.
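As a brief illustration of the point just made, that roles cannot be recovered from syntax alone, the following Python sketch (illustrative only, not part of the paper's annotation process; the role descriptions are informal shorthand, not PropBank labels) indexes the play examples by verb and grammatical function and shows the resulting mapping is ambiguous:

```python
# Informal shorthand annotations for the paper's "play" examples (6) and (7):
# the same verb with the same subcategorization assigns different roles
# to its direct object.
annotated = [
    # (verb, direct object, informal role of that object)
    ("play", "taps", "thing performed"),
    ("play", "a beat-up old bugle", "instrument"),
]

# Index the annotations by syntax alone: verb plus grammatical function.
by_syntax = {}
for verb, obj, role in annotated:
    by_syntax.setdefault((verb, "direct object"), set()).add(role)

# The single key (play, direct object) now maps to two distinct roles,
# so no deterministic syntax-to-role conversion exists for it.
assert len(by_syntax[("play", "direct object")]) == 2
```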
Our semantic-role annotation process begins with a rule-based automatic tagger, the output of which is then hand-corrected (see section 4 for details).</Paragraph> <Paragraph position="10"> The Proposition Bank aims to provide a broad-coverage hand-annotated corpus of such phenomena, enabling the development of better domain-independent language understanding systems and the quantitative study of how and why these syntactic alternations take place. We define a set of underlying semantic roles for each verb and annotate each occurrence in the text of the original Penn Treebank. Each verb's roles are numbered, as in the occurrences of the verb offer in our data. We believe that providing this level of semantic representation is important for applications including information extraction, question answering, and machine translation. Over the past decade, most work in the field of information extraction has shifted from complex rule-based systems designed to handle a wide variety of semantic phenomena, including quantification, anaphora, aspect, and modality (e.g., Alshawi 1992), to more robust finite-state or statistical systems (Hobbs et al. 1997; Miller et al. 1998). These newer systems rely on a shallower level of semantic representation, similar to the level we adopt for the Proposition Bank, but have also tended to be very domain specific. The systems are trained and evaluated on corpora annotated for semantic relations pertaining to, for example, corporate acquisitions or terrorist events. The Proposition Bank (PropBank) takes a similar approach in that we annotate predicates' semantic roles, while steering clear of the issues involved in quantification and discourse-level structure. By annotating semantic roles for every verb in our corpus, we provide a more domain-independent resource, which we hope will lead to more robust and broad-coverage natural language understanding systems.
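A minimal sketch of the numbered-role representation described above, assuming conventional PropBank-style labels for break (Arg0 for the breaker, Arg1 for the thing broken); the role descriptions here are illustrative paraphrases, not quotations from the corpus:

```python
# Hypothetical verb-specific roleset, in the spirit of PropBank's
# numbered arguments; descriptions are illustrative, not quoted.
roleset = {"break": {"Arg0": "breaker", "Arg1": "thing broken"}}

def annotate(verb, args):
    """Pair each numbered argument with its verb-specific description."""
    return {n: (filler, roleset[verb][n]) for n, filler in args.items()}

# (1) John broke the window.   Arg0 = John, Arg1 = the window
transitive = annotate("break", {"Arg0": "John", "Arg1": "the window"})
# (2) The window broke.        Arg1 = the window (no Arg0 expressed)
intransitive = annotate("break", {"Arg1": "the window"})

# "the window" receives the same numbered role in both sentences,
# despite being the object in (1) and the subject in (2).
assert transitive["Arg1"] == intransitive["Arg1"] == ("the window", "thing broken")
```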
The Proposition Bank focuses on the argument structure of verbs and provides a complete corpus annotated with semantic roles, including roles traditionally viewed as arguments and as adjuncts. It allows us for the first time to determine the frequency of syntactic variations in practice, the problems they pose for natural language understanding, and the strategies to which they may be susceptible.</Paragraph> <Paragraph position="11"> We begin the article by giving examples of the variation in the syntactic realization of semantic arguments and drawing connections to previous research into verb alternation behavior. In section 3 we describe our approach to semantic-role annotation, including the types of roles chosen and the guidelines for the annotators. Section 5 compares our PropBank methodology and choice of semantic-role labels to those of another semantic annotation project, FrameNet. We conclude the article with several preliminary experiments we have performed using the PropBank annotations and a discussion of their implications for natural language research.</Paragraph> </Section></Paper>