File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/02/w02-1014_evalu.xml
Size: 6,819 bytes
Last Modified: 2025-10-06 13:58:51
<?xml version="1.0" standalone="yes"?> <Paper uid="W02-1014"> <Title>Fast LR Parsing Using Rich (Tree Adjoining) Grammars</Title> <Section position="4" start_page="0" end_page="0" type="evalu"> <SectionTitle> 4 Evaluation </SectionTitle> <Paragraph position="0"> We evaluated the approach using the Penn Treebank WSJ Corpus, release 2 (Marcus et al., 1994), using Sections 2 to 21 for grammar extraction and training, Section 0 for development, and Section 23 for testing.</Paragraph> <Paragraph position="1"> Only parts-of-speech were used as input.7 8 A smoothed ranking function is defined, combining the two-state statistics with the one-state statistics through a smoothing weight.</Paragraph> <Paragraph position="3"> The best value of the smoothing weight was experimentally determined to be 1.</Paragraph> <Paragraph position="4"> That is, in general, even if there is minimal evidence for the context including the second state, the statistics using this context lead to a better result than using only one state.</Paragraph> <Paragraph position="5"> For each sentence there is an initial parsing attempt using only the two-state ranking function, with a maximum of 500 backtracking occurrences. If it fails, the sentence is parsed using the smoothed ranking function, with a maximum of 3,000 backtracking occurrences.</Paragraph> <Paragraph position="6"> In Table 1 we report the following figures for the development set (Section 0) and the test set (Section 23). (Elsewhere in the paper we have omitted the explicit reference to the state.)</Paragraph> <Paragraph position="7"> 7However, two new categories were defined: one for time nouns, namely those that appear in the Penn Treebank as heads of constituents marked &quot;TMP&quot; (for temporal); another for the word &quot;that&quot;. This is similar to Collins's (1997) and Charniak's (1997) definition of a separate category for auxiliary verbs. 8We also included some punctuation symbols among the terminals, such as comma, colon and semicolon. They are extracted into the grammar as if they were regular modifiers. Their main use is in guiding parsing decisions.</Paragraph> <Paragraph position="8"> Table 1 has columns Section, %failed, tput, recall, prec., and F. recall and prec. are the labeled parsing recall and precision, respectively, as defined in (Collins, 1997) (slightly different from (Black et al., 1991)); F is their harmonic average. tput is the average number of sentences parsed per second. To obtain this average, the number of sentences submitted as input (not only those that parsed successfully) is divided by the total time (excluding the overhead before parsing of the first sentence starts). The programs were run under Linux, on a PC with a Pentium III 930MHz processor.</Paragraph> <Paragraph position="9"> The first two lines of the table report the measures for the parsed sentences as originally generated by the parser. We purposefully do not report precision.</Paragraph>
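As an aside, the labeled bracketing measures defined above can be sketched in a few lines of code. This is only an illustrative reconstruction under our own assumptions (the (label, start, end) span representation and the function name are not from the paper), but the toy example makes concrete why extra hierarchical structure leaves recall untouched while lowering precision.

    # Hedged sketch of labeled bracketing scores in the style of (Collins, 1997).
    # The span representation and names below are assumptions for illustration only.
    def labeled_scores(gold, predicted):
        """Return (recall, precision, F) over labeled constituent spans."""
        gold_set, pred_set = set(gold), set(predicted)
        matched = len(gold_set & pred_set)
        recall = matched / len(gold_set) if gold_set else 0.0
        precision = matched / len(pred_set) if pred_set else 0.0
        f = (2 * recall * precision / (recall + precision)
             if recall + precision > 0 else 0.0)
        return recall, precision, f

    # A parse with extra intermediate nodes keeps every gold bracket (recall 1.0)
    # but adds unmatched predicted brackets, so only precision drops.
    gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5)]
    rich = gold + [("NP", 3, 5), ("VP", 3, 5)]   # hypothetical extra structure
    print(labeled_scores(gold, rich))            # (1.0, 0.6, 0.75)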
<Paragraph position="10"> As we mentioned in the beginning of the paper, the parser assigns to the sentences a much richer hierarchical structure than the Penn Treebank does, which is penalized by the precision measure. The reason for this increase in structure is not so much a particular decision of ours as a consequence of using a sound grammar under the TAG grammatical formalism.9 That said, we understand that algorithms that try to keep precision as high as recall necessarily lose some recall compared with algorithms that ignore precision. Therefore, to allow a fair comparison with them and to improve the credibility of our results, we flattened the parse trees in a post-processing step, using a simple rule-based technique on top of frequency measures for individual grammar trees gathered by (Xia, 2001); the result is presented in the bottom lines of the table.</Paragraph> <Paragraph position="11"> 9By sound we mean a grammar that properly factors recursion in one way or another. Grammars have been extracted in which the right-hand side of a rule reflects exactly each single-level expansion found in the Penn Treebank. We are also aware of a few alternatives in grammatical formalisms that could capture such flatness, e.g., sister adjunction (Chiang, 2000).</Paragraph> <Paragraph position="12"> The most salient positive result is that the parser is able to parse sentences at a rate of about 20 sentences per second. Most of the medium-to-high accuracy parsers take at least a few seconds per sentence under the same conditions.10 This is an enormous speed-up. As for accuracy, it is not far from that of the top-performing parser on parts-of-speech input that we are aware of, reported by (Sima'an, 2000). Perhaps the most similar work to ours is Briscoe and Carroll's (1992; 1993; 1995; 1996). They implemented a standard LR parser for CFGs with a probabilistic method for conflict resolution similar to ours in that decisions are conditioned on the LR states, but with different methods. In particular, they proceed in parallel, accumulating probabilities along the paths and applying a Viterbi decoder at the end. Their best published result is unlabeled bracket recall and precision of 74% and 73%, parsing the Susanne corpus. Since the unlabeled bracket measures are much easier than the labeled bracket measures we report, our results are clearly superior to theirs. Moreover, the Susanne corpus is easier than the Penn Treebank.</Paragraph> <Paragraph position="13"> There are two additional points we want to make.</Paragraph> <Paragraph position="14"> The first concerns the two-state ranking function. It is a very rich statistic, but it suffers from sparse-data problems. Parsing Section 0 with only this statistic (no form of smoothing) and a backtracking limit of 3,000 attempts, we could parse only 31% of the sentences, but the non-flattened recall was 88.33%, which is quite high for using only parts-of-speech. The second observation is that when parsing with the smoothed function, most sentences use very few backtracking attempts. In fact, a graph relating the number of backtracking attempts n to the number of sentences that parse using n attempts shows a 1/n relation characteristic of Zipf's law. Most of the computation time is spent on sentences that either fail to parse or parse with difficulty, showing low bracketing accuracy.</Paragraph> <Paragraph position="15"> 10The fastest parser we are aware of is from BBN, with a throughput of 3 sentences per second under similar conditions.</Paragraph>
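To make the ranking functions discussed above more concrete, the following is a minimal sketch of a backed-off ranking over conflicting LR actions, together with the two-pass strategy with the 500 and 3,000 backtracking limits. The count tables, the additive combination, the weight parameter, and parse_with_limit are our own assumptions for illustration; the paper's equation gives the precise smoothed form, with the best weight found to be 1.

    from collections import Counter

    # Hedged sketch: rank conflicting LR actions by counts conditioned on the
    # top two stack states, backing off to counts conditioned on one state.
    # Table names and the exact combination are assumptions, not the paper's equation.
    two_state_counts = Counter()   # (state_below, state_top, action) -> frequency
    one_state_counts = Counter()   # (state_top, action) -> frequency

    def smoothed_rank(state_below, state_top, actions, weight=1.0):
        """Order conflicting actions, richest (two-state) evidence first."""
        def score(action):
            return (two_state_counts[(state_below, state_top, action)]
                    + weight * one_state_counts[(state_top, action)])
        return sorted(actions, key=score, reverse=True)

    def parse_sentence(tags, parse_with_limit):
        """Two-pass strategy from the text: try the unsmoothed two-state ranking
        with a small backtracking budget, then fall back to the smoothed one.
        parse_with_limit is a hypothetical wrapper around the LR parser."""
        tree = parse_with_limit(tags, ranking="two_state", max_backtracks=500)
        if tree is None:
            tree = parse_with_limit(tags, ranking="smoothed", max_backtracks=3000)
        return tree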
<Paragraph position="16"> We also emphasize that we have not yet taken particular care to optimize for speed.</Paragraph> </Section> </Paper>