<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1130">
  <Title>Robust PCFG-Based Generation using Automatically Acquired LFG Approximations</Title>
  <Section position="8" start_page="1038" end_page="1039" type="concl">
    <SectionTitle>
6 Conclusion and Further Work
</SectionTitle>
    <Paragraph position="0"> We present a new architecture for stochastic LFG surface realisation using the automatically annotated treebanks and extracted PCFG-based LFG approximations of Cahill et al. (2004). Our model maximises the probability of a tree given an fstructure, supporting a simple and efficient implementation that scales to wide-coverage treebank-based resources. An improved model would maximise the probability of a string given an f-structure by summing over trees with the same yield. More research is required to implement such a model efficiently using packed representations (Carroll and Oepen, 2005). Simple PCFG-based models, while effective and computationally efficient, can only provide approximations to LFG and similar constraint-based formalisms (Abney, 1997). Research on discriminative disambiguation methods (Valldal and Oepen, 2005; Nakanishi et al., 2005) is important. Kaplan and Wedekind (2000) show that for certain linguistically interesting classes of LFG (and PATR etc.) grammars, generation from f-structures yields a context free language. Their proof involves the notion of a  &amp;quot;refinement&amp;quot; grammar where f-structure information is compiled into CFG rules. Our probabilistic generation grammars bear a conceptual similarity to Kaplan and Wedekind's &amp;quot;refinement&amp;quot; grammars. It would be interesting to explore possible connections between the treebank-based empirical work presented here and the theoretical constructs in Kaplan and Wedekind's proofs.</Paragraph>
    <Paragraph position="1"> We presented a full set of generation experiments on varying sentence lengths training on Sections 02-21 of the Penn Treebank and evaluating on Section 23 strings. Sentences of length [?]20 achieve coverage of 95.26%, BLEU score of 0.7227 and string accuracy of 0.7476 against the raw Section 23 text. Sentences of all lengths achieve coverage of 89.49%, BLEU score of 0.6979 and string accuracy of 0.7012. Our method is robust and can cope with noise in the f-structure input to generation and will attempt to produce partial output rather than fail.</Paragraph>
  </Section>
class="xml-element"></Paper>