<?xml version="1.0" standalone="yes"?> <Paper uid="W05-1517"> <Title>Efficient extraction of grammatical relations</Title> <Section position="3" start_page="0" end_page="160" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> RASP is a robust statistical analysis system for English developed by Briscoe and Carroll (2002).</Paragraph> <Paragraph position="1"> It contains a syntactic parser which can output analyses in a number of formats, including (n-best) syntactic trees, robust minimal recursion semantics (Copestake, 2003), grammatical relations (GRs), and weighted GRs. The weighted GRs for a sentence comprise the set of grammatical relations in all parses licensed for that sentence; each GR is weighted based on the probabilities of the parses in which it occurs. This weight is normalised to fall within the range [0,1], where 1 indicates that all parses contain the GR. Therefore, high precision GR sets can be determined by thresholding on the GR weight (Carroll and Briscoe, 2002). Carroll and Briscoe compute weighted GRs by first unpacking all parses or the n-best subset from the parse forest.</Paragraph> <Paragraph position="2"> Hence, this approach is either (a) inefficient (and for some examples impracticable) if a large number of parses are licensed by the grammar, or (b) inaccurate if the number of parses unpacked is less than the number licensed by the grammar.</Paragraph> <Paragraph position="3"> In this paper, we show how to obviate the need to trade off efficiency and accuracy by extracting weighted GRs directly from the parse forest using a dynamic programming approach based on the Inside-Outside algorithm (IOA) (Baker, 1979; Lari and Young, 1990). This approach enables efficient calculation of weighted GRs over all parses and substantially improves the throughput and memory usage of the parser. 
Since the parser is unification-based, we also modify the parsing algorithm so that local ambiguity packing is based on feature structure equivalence rather than subsumption.</Paragraph> <Paragraph position="4"> Similar dynamic programming techniques that are variants of the IOA have been applied to related tasks, such as parse selection (Johnson, 2001; Schmid and Rooth, 2001; Geman and Johnson, 2002; Miyao and Tsujii, 2002; Kaplan et al., 2004; Taskar et al., 2004). The approach we take is similar to Schmid and Rooth's (2001) adaptation of the algorithm, where 'expected governors' (similar to our 'GR specifications') are determined for each tree, and alternative nodes in the parse forest have the same lexical head. Initially, they create a packed parse forest, and during a second pass the parse forest nodes are split if multiple lexical heads occur. The IOA is applied over this split data structure. Similarly, Clark and Curran (2004) alter their packing algorithm so that nodes in the packed chart have the same semantic head and 'unfilled' GRs. Our approach is novel in that while calculating inside probabilities we allow any node in the parse forest to have multiple semantic heads.</Paragraph> <Paragraph position="5"> Clark and Curran (2004) apply Miyao and Tsujii's (2002) dynamic programming approach to determine weighted GRs. They outline an alternative parse selection method based on the resulting weighted GRs: select the (consistent) GR set with the highest average weighted GR score. We apply this parse selection approach and achieve a 3.01% relative reduction in error. Further, the GR set output by this approach is a consistent set, whereas the high precision GR sets outlined in (Carroll and Briscoe, 2002) are neither consistent nor coherent.</Paragraph> <Paragraph position="6"> The remainder of this paper is organised as follows: Section 2 gives details of the RASP system that are relevant to this work. 
Section 3 describes our test suite and experimental environment.</Paragraph> <Paragraph position="7"> Changes required to the current parse forest creation algorithm are discussed in Section 4, while Section 5 outlines our dynamic programming approach for extracting weighted GRs (EWG). Section 6 presents experimental results showing (a) improved efficiency achieved by EWG, (b) increased upper bounds of precision and recall achieved using EWG, and (c) increased accuracy achieved by a parse selection algorithm that would otherwise be too inefficient to consider. Finally, Section 7 outlines our conclusions and future lines of research.</Paragraph> </Section></Paper>