<?xml version="1.0" standalone="yes"?> <Paper uid="J05-1004"> <Title>The Proposition Bank: An Annotated Corpus of Semantic Roles</Title> <Section position="7" start_page="101" end_page="102" type="concl"> <SectionTitle> 8. Conclusion </SectionTitle> <Paragraph position="0"> The Proposition Bank takes the comprehensive corpus annotation of the Penn Treebank one step closer to a detailed semantic representation by adding semantic-role labels. Analysis of the data shows that the relationships between syntactic and semantic structures are more complex than one might at first expect. Alternations in the realization of semantic arguments of the type described by Levin (1993) turn out to be common in practice as well as in theory, even in the limited genre of Wall Street Journal articles. Even so, by using detailed guidelines for the annotation of each individual verb, rapid, consistent annotation has been achieved, and the corpus is available through the Linguistic Data Consortium. For information on obtaining the frames file, please consult http://www.cis.upenn.edu/~ace/.</Paragraph> <Paragraph position="1"> <!-- Running page header: Palmer, Gildea, and Kingsbury, The Proposition Bank --> The broad-coverage annotation has proven to be suitable for training automatic taggers, and in addition to ourselves there is a growing body of researchers engaged in this task. Chen and Rambow (2003) make use of extracted tree-adjoining grammars. Most recently, the Gildea and Palmer (2002) scores presented here have been improved markedly through the use of support-vector machines together with additional features for named entity tags, headword POS tags, and verb clusters for back-off (Pradhan et al. 2003), and through the use of maximum-entropy classifiers (He and Gildea 2004; Xue and Palmer 2004). This group also used Charniak's parser instead of Collins's and tested the system on TDT data. The performance on a new genre is lower, as would be expected.
Despite the complex relationship between syntactic and semantic structures, we find that statistical parsers, although computationally expensive, do a good job of providing information relevant for this level of semantic interpretation. In addition to the constituent structure, the headword information, produced as a by-product, is an important feature. Automatic parsers, however, still have a long way to go. Our results using hand-annotated parse trees, including traces, show that improvements in parsing should translate directly into more accurate semantic representations.</Paragraph> <Paragraph position="2"> There has already been a demonstration that a preliminary version of these data can be used to simplify the effort involved in developing information extraction (IE) systems. Researchers were able to construct a reasonable IE system by simply mapping specific Arg labels for a set of verbs to template slots, completely avoiding the necessity of building explicit regular expression pattern matchers (Surdeanu et al. 2003).</Paragraph> <Paragraph position="3"> There is equal hope for benefits to machine translation, and proposition banks in Chinese (Xue and Palmer 2003) and Korean are already being built, focusing where possible on parallel data. The general approach ports well to new languages, with the major effort continuing to go into the creation of frames files for verbs.</Paragraph> <Paragraph position="4"> There are many directions for future work. Our preliminary linguistic analyses have merely scratched the surface of what is possible with the current annotation, and yet that annotation is only a first approximation at capturing the richness of semantic representation. Annotation of nominalizations and other noun predicates is currently being added by New York University, and a Phase II (Babko-Malaya et al.)
that will include eventuality variables, nominal references, additional sense tagging, and discourse connectives is underway.</Paragraph> <Paragraph position="5"> We have several plans for improving the performance of our automatic semantic-role labeling. As a first step we are producing a version of PropBank that uses more informative thematic labels based on VerbNet thematic labels (Kipper, Palmer, and Rambow 2002). We are also working with FrameNet to produce a mapping between our annotation and theirs, which will allow us to merge the two annotated data sets.</Paragraph> <Paragraph position="6"> Finally, we will explore alternative machine-learning approaches and closer integration of semantic-role labeling and sense tagging with the parsing process.</Paragraph> </Section> </Paper>