<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1055">
  <Title>Learning Accurate, Compact, and Interpretable Tree Annotation</Title>
  <Section position="6" start_page="439" end_page="439" type="concl">
    <SectionTitle>4 Conclusions</SectionTitle>
    <Paragraph position="0">By using a split-and-merge strategy and beginning with the barest possible initial structure, our method reliably learns a PCFG that is remarkably good at parsing. Hierarchical split/merge training enables us to learn compact but accurate grammars, ranging from extremely compact (an F1 of 78% with only 147 symbols) to extremely accurate (an F1 of 90.2% for our largest grammar, with only 1043 symbols). Splitting provides a tight fit to the training data, while merging improves generalization and controls grammar size. To overcome data fragmentation and overfitting, we smooth our parameters. Smoothing allows us to add a larger number of annotations, each specializing in only a fraction of the data, without overfitting the training set. As Table 4 shows, the resulting parser ranks among the best lexicalized parsers, beating those of Collins (1999) and Charniak and Johnson (2005). Its F1 represents a 27% reduction in error over Matsuzaki et al. (2005) and Klein and Manning (2003). Not only is our parser more accurate, but the learned grammar is also significantly smaller than those of previous work. While all of this is accomplished with purely automatic learning, the resulting grammar is human-interpretable: it exhibits most of the manually introduced annotations discussed by Klein and Manning (2003), but also learns other linguistic phenomena.</Paragraph>
  </Section>
</Paper>
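
The conclusion compresses the whole training recipe into a few sentences. The Python sketch below shows the shape of one split/merge round on a toy grammar, under stated assumptions: the grammar encoding, the helper names (`split`, `smooth`, `merge`), and all constants are our own illustrations, not the paper's implementation, and the EM re-estimation and likelihood-based merge test are stubbed out as comments.

```python
# Toy sketch of one split/merge round for a latent-variable PCFG, in the
# spirit of the recipe summarized above. Illustrative only, not the
# paper's code: constants and helper names are hypothetical.
import itertools
import random
from collections import defaultdict

random.seed(0)

def normalize(grammar):
    """Renormalize so each left-hand side's rule probabilities sum to 1."""
    for rules in grammar.values():
        z = sum(rules.values())
        for rhs in rules:
            rules[rhs] /= z
    return grammar

def split(grammar, noise=0.01):
    """Split every nonterminal X into X_0 and X_1, copying each rule to all
    subsymbol combinations with a little random noise so EM can break the
    symmetry between the two copies. Lowercase symbols act as terminals."""
    def variants(sym):
        return [sym] if sym.islower() else [sym + "_0", sym + "_1"]
    new = defaultdict(dict)
    for lhs, rules in grammar.items():
        for rhs, p in rules.items():
            for new_lhs in variants(lhs):
                for new_rhs in itertools.product(*(variants(s) for s in rhs)):
                    new[new_lhs][new_rhs] = p * (1 + random.uniform(-noise, noise))
    return normalize(dict(new))

def smooth(grammar, alpha=0.01):
    """Shrink each subsymbol's rule probabilities toward the mean over its
    sibling subsymbols, so a subsymbol that specializes in a small slice of
    the data stays close to its siblings instead of overfitting it."""
    base = lambda s: s.rsplit("_", 1)[0]
    siblings = defaultdict(list)
    for lhs in grammar:
        siblings[base(lhs)].append(lhs)
    out = {}
    for lhs, rules in grammar.items():
        group = siblings[base(lhs)]
        out[lhs] = {}
        for rhs, p in rules.items():
            mean = sum(grammar[s].get(rhs, 0.0) for s in group) / len(group)
            out[lhs][rhs] = (1 - alpha) * p + alpha * mean
    return normalize(out)

def merge(grammar, a, b):
    """Collapse subsymbols a and b back into their parent: average their
    rule distributions on the left-hand side and sum probability mass over
    them on the right-hand side. (The paper weights this average by how
    often each subsymbol occurs; a plain average keeps the sketch short.)"""
    parent = a.rsplit("_", 1)[0]
    rename = lambda s: parent if s in (a, b) else s
    new = defaultdict(lambda: defaultdict(float))
    for lhs, rules in grammar.items():
        w = 0.5 if lhs in (a, b) else 1.0
        for rhs, p in rules.items():
            new[rename(lhs)][tuple(rename(s) for s in rhs)] += w * p
    return normalize({l: dict(r) for l, r in new.items()})

toy = normalize({
    "S":  {("NP", "VP"): 1.0},
    "NP": {("dt", "nn"): 0.7, ("nn",): 0.3},
    "VP": {("vb", "NP"): 1.0},
})
g = split(toy)                 # 1. split every symbol in two
# ... run EM over the treebank here to specialize the subsymbols ...
g = smooth(g)                  # 2. smooth subsymbols toward their siblings
g = merge(g, "VP_0", "VP_1")   # 3. merge back splits that bought little likelihood
print(sorted(g))               # ['NP_0', 'NP_1', 'S_0', 'S_1', 'VP']
```

In the paper, which splits to merge back and how heavily to smooth are determined from training-data likelihood; the fixed choices above stand in for those steps purely for illustration.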