File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/06/p06-1130_concl.xml
Size: 2,505 bytes
Last Modified: 2025-10-06 13:55:18
<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1130">
<Title>Robust PCFG-Based Generation using Automatically Acquired LFG Approximations</Title>
<Section position="8" start_page="1038" end_page="1039" type="concl">
<SectionTitle>6 Conclusion and Further Work</SectionTitle>
<Paragraph position="0"> We have presented a new architecture for stochastic LFG surface realisation using the automatically annotated treebanks and extracted PCFG-based LFG approximations of Cahill et al. (2004). Our model maximises the probability of a tree given an f-structure, supporting a simple and efficient implementation that scales to wide-coverage treebank-based resources. An improved model would maximise the probability of a string given an f-structure by summing over trees with the same yield. More research is required to implement such a model efficiently using packed representations (Carroll and Oepen, 2005). Simple PCFG-based models, while effective and computationally efficient, can only provide approximations to LFG and similar constraint-based formalisms (Abney, 1997). Research on discriminative disambiguation methods (Velldal and Oepen, 2005; Nakanishi et al., 2005) is therefore important. Kaplan and Wedekind (2000) show that, for certain linguistically interesting classes of LFG (and PATR etc.) grammars, generation from f-structures yields a context-free language. Their proof involves the notion of a "refinement" grammar in which f-structure information is compiled into CFG rules. Our probabilistic generation grammars bear a conceptual similarity to Kaplan and Wedekind's "refinement" grammars. It would be interesting to explore possible connections between the treebank-based empirical work presented here and the theoretical constructs in Kaplan and Wedekind's proofs.</Paragraph>
<Paragraph position="1"> We presented a full set of generation experiments on varying sentence lengths, training on Sections 02-21 of the Penn Treebank and evaluating on Section 23 strings. Sentences of length ≤20 achieve coverage of 95.26%, a BLEU score of 0.7227, and a string accuracy of 0.7476 against the raw Section 23 text. Sentences of all lengths achieve coverage of 89.49%, a BLEU score of 0.6979, and a string accuracy of 0.7012. Our method is robust to noise in the f-structure input to generation and will attempt to produce partial output rather than fail.</Paragraph>
</Section>
</Paper>
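Editor's note: the contrast drawn in the conclusion between the current and the improved model can be made explicit. In the sketch below (not from the original paper), t ranges over trees, s over strings, f is the input f-structure, and yield(t) denotes the string of leaves of t:

```latex
% Current model: pick the single most probable tree for f-structure f,
% then read the output string off its yield (Viterbi tree).
t^{*} \;=\; \operatorname*{arg\,max}_{t}\; P(t \mid f),
\qquad s^{*} \;=\; \mathrm{yield}(t^{*})

% Improved model: pick the most probable string, summing the
% probabilities of all trees that share that yield.
s^{*} \;=\; \operatorname*{arg\,max}_{s}\; \sum_{t \,:\, \mathrm{yield}(t) = s} P(t \mid f)
```

The second objective is harder because the sum ranges over exponentially many trees per string, which is why the authors point to packed representations (Carroll and Oepen, 2005) as the route to an efficient implementation.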
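To make the "most probable tree given an f-structure" idea concrete, here is a minimal, self-contained toy sketch. The f-structure encoding, rule format, predicate names, and probabilities are all invented for illustration; this is not the authors' system, which operates over wide-coverage PCFG approximations extracted from the Penn Treebank.

```python
# Toy sketch of Viterbi-style generation: for each local f-structure, choose
# the highest-probability linearisation rule for its PRED and recurse into
# the grammatical-function slots. All names and numbers here are invented.
import math
from typing import Dict, List, Tuple

FStructure = Dict[str, object]  # PRED plus grammatical-function sub-f-structures

# Hypothetical "grammar": PRED -> list of (surface order of slots, probability).
# "PRED" in a template stands for the predicate's own surface form.
GRAMMAR: Dict[str, List[Tuple[Tuple[str, ...], float]]] = {
    "sees": [(("SUBJ", "PRED", "OBJ"), 0.9), (("OBJ", "PRED", "SUBJ"), 0.1)],
    "John": [(("PRED",), 1.0)],
    "Mary": [(("PRED",), 1.0)],
}

def realise(f: FStructure) -> Tuple[List[str], float]:
    """Return the most probable realisation of f and its log-probability.

    Because the model factorises over local f-structures, taking the best
    rule at each node yields the globally most probable tree (Viterbi).
    """
    pred = str(f["PRED"])
    order, p = max(GRAMMAR[pred], key=lambda rule: rule[1])
    words: List[str] = []
    logp = math.log(p)
    for slot in order:
        if slot == "PRED":
            words.append(pred)
        else:
            sub_words, sub_logp = realise(f[slot])  # recurse into SUBJ, OBJ, ...
            words.extend(sub_words)
            logp += sub_logp
    return words, logp

if __name__ == "__main__":
    f = {"PRED": "sees", "SUBJ": {"PRED": "John"}, "OBJ": {"PRED": "Mary"}}
    words, logp = realise(f)
    print(" ".join(words), math.exp(logp))  # -> "John sees Mary" 0.9
```

A string-probability model would instead aggregate the probabilities of all rule choices that produce the same yield before taking the arg max, which is exactly where the packed representations mentioned in the conclusion become necessary.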