<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1115"> <Title>Compacting the Penn Treebank Grammar</Title> <Section position="3" start_page="699" end_page="699" type="intro"> <SectionTitle> II </SectionTitle> <Paragraph position="0"> bank grammar resulted in two major findings. The first is that the grammar can be compacted to about 7% of its original size, and that the rule number growth of the compacted grammar stops at some point. The second is that a 58% reduction can be achieved with no loss in parsing performance, whereas a 69% reduction yields a gain in recall but a loss in precision.</Paragraph> <Paragraph position="1"> This, we believe, gives further support to the utility of treebank grammars and to the compaction method. For example, compaction methods can be applied within the DOP framework to reduce the number of trees. Also, by partially lexicalising the rule extraction process (i.e., by using some of the more frequent words as well as the part-of-speech tags), we may be able to achieve parsing performance similar to the best results in the field, reported in (Collins, 1996).</Paragraph> <Paragraph position="2"> 2 Growth of the Rule Set One could investigate whether there is a finite grammar that should account for any text within a class of related texts (i.e., a domain-oriented sub-grammar of English). If there is, the number of extracted rules will approach a limit as more sentences are processed, i.e., as the rule number approaches the size of such an underlying and finite grammar.</Paragraph> <Paragraph position="3"> We had hoped that some approach to a limit would be seen using PTB II (Marcus et al., 1994), which is larger and more consistent in its bracketing than PTB I. As shown in Figure 1, however, the rule number growth continues unabated even after more than 1 million part-of-speech tokens have been processed.</Paragraph> </Section> </Paper>