<?xml version="1.0" standalone="yes"?> <Paper uid="N06-2026"> <Title>Accurate Parsing of the Proposition Bank</Title> <Section position="2" start_page="0" end_page="101" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Recent successes in statistical syntactic parsing based on supervised techniques trained on a large corpus of syntactic trees (Collins, 1999; Charniak, 2000; Henderson, 2003) have brought the hope that the same approach could be applied to the more ambitious goal of recovering the propositional content and the frame semantics of a sentence. Moving towards a shallow semantic level of representation has immediate applications in question-answering and information extraction. For example, an automatic flight reservation system processing the sentence I want to book a flight from Geneva to New York will need to know that from Geneva indicates the origin of the flight and to New York the destination.</Paragraph> <Paragraph position="1"> (Gildea and Jurafsky, 2002) define this shallow semantic task as a classification problem where the semantic role to be assigned to each constituent is inferred on the basis of probability distributions of syntactic features extracted from parse trees. They use learning features such as phrase type, position, voice, and parse tree path. Consider, for example, a sentence such as The authority dropped at midnight Tuesday to $ 2.80 trillion (taken from section 00 of PropBank (Palmer et al., 2005)). The fact that to $ 2.80 trillion receives a direction semantic label is highly correlated to the fact that it is a Prepositional Phrase (PP), that it follows the verb dropped, a verb of change of state requiring an end point, that the verb is in the active voice, and that the PP is in a certain tree configuration with the governing verb.</Paragraph> <Paragraph position="2"> All the recent systems proposed for semantic role labelling (SRL) follow this same assumption (CoNLL, 2005).</Paragraph> <Paragraph position="3"> The assumption that syntactic distributions will be predictive of semantic role assignments is based on linking theory. Linking theory assumes the existence of a hierarchy of semantic roles which are mapped by default on a hierarchy of syntactic positions. It also shows that regular mappings from the semantic to the syntactic level can be posited even for those verbs whose arguments can take several syntactic positions, such as psychological verbs, locatives, or datives, requiring a more complex theory. (See (Hale and Keyser, 1993; Levin and Rappaport Hovav, 1995) among many others.) If the internal semantics of a predicate determines the syntactic expressions of constituents bearing a semantic role, it is then reasonable to expect that knowledge about semantic roles in a sentence will be informative of its syntactic structure, and that learning semantic role labels at the same time as parsing will be beneficial to parsing accuracy.</Paragraph> <Paragraph position="4"> We present work to test the hypothesis that a current statistical parser (Henderson, 2003) can output rich information comprising both a parse tree and semantic role labels robustly, that is without any significant degradation of the parser's accuracy on the original parsing task. 
<Paragraph position="2"> All the recent systems proposed for semantic role labelling (SRL) rest on this same assumption (CoNLL, 2005).</Paragraph>
<Paragraph position="3"> The assumption that syntactic distributions will be predictive of semantic role assignments is based on linking theory. Linking theory assumes the existence of a hierarchy of semantic roles which are mapped by default onto a hierarchy of syntactic positions. It also shows that regular mappings from the semantic to the syntactic level can be posited even for verbs whose arguments can occupy several syntactic positions, such as psychological verbs, locatives, or datives, although these cases require a more complex theory. (See (Hale and Keyser, 1993; Levin and Rappaport Hovav, 1995) among many others.) If the internal semantics of a predicate determines the syntactic expression of the constituents bearing its semantic roles, it is then reasonable to expect that knowledge about the semantic roles in a sentence will be informative about its syntactic structure, and that learning semantic role labels at the same time as parsing will benefit parsing accuracy.</Paragraph>
<Paragraph position="4"> We present work to test the hypothesis that a current statistical parser (Henderson, 2003) can robustly output rich information comprising both a parse tree and semantic role labels, that is, without any significant degradation of the parser's accuracy on the original parsing task. We achieve promising results both on the simple parsing task, where the accuracy of the parser is measured by the standard Parseval measures, and on the parsing task where more complex labels, comprising both syntactic labels and semantic roles, are taken into account.</Paragraph>
<Paragraph position="5"> These results have several consequences. First, we show that it is possible to build a single integrated system successfully. This is a meaningful achievement, as a task combining semantic role labelling and parsing is more complex than syntactic parsing alone. While the shallow semantics of a constituent and its structural position are often correlated, they sometimes diverge. For example, some nominal temporal modifiers occupy an object position without being objects, like Tuesday in the Penn Treebank representation of the sentence above.</Paragraph>
<Paragraph position="6"> The indirectness of the relation is also confirmed by the difficulty of exploiting semantic information for parsing; previous attempts have not been successful. Klein and Manning (2003) report that the accuracy of an unlexicalised PCFG drops from 77.8% to 72.9% when Penn Treebank function labels are used in training. The two existing systems that use function labels successfully either inherit Collins' modelling of the notion of complement (Gabbard, Kulick and Marcus, 2006) or model function labels directly (Musillo and Merlo, 2005). Furthermore, our results indicate that the proposed models are robust. To model our task accurately, additional parameters must be estimated, but given the current limited availability of annotated treebanks, this more complex task will have to be solved with the same overall amount of data, aggravating the sparse-data problems involved in estimating the model's parameters.</Paragraph>
</Section>
</Paper>