<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1130">
<Title>Robust PCFG-Based Generation using Automatically Acquired LFG Approximations</Title>
<Section position="3" start_page="0" end_page="0" type="intro">
<SectionTitle>1 Introduction</SectionTitle>
<Paragraph position="0"> Wide-coverage grammars automatically extracted from treebanks are a cornerstone technology in state-of-the-art probabilistic parsing. They achieve robustness and coverage at a fraction of the development cost of hand-crafted grammars. It is surprising to note that, to date, such grammars do not usually figure in the complementary operation to parsing: natural language surface realisation.</Paragraph>
<Paragraph position="1"> Research on statistical natural language surface realisation has taken three broad forms, differing in where statistical information is applied in the generation process. Langkilde (2000), for example, uses n-gram word statistics to rank alternative output strings from a symbolic hand-crafted generator by selecting paths in parse forest representations. Bangalore and Rambow (2000) use n-gram word sequence statistics in a TAG-based generation model to rank output strings, together with additional statistical and symbolic resources at intermediate generation stages. Ratnaparkhi (2000) uses maximum entropy models to drive generation from word bigram or dependency representations, taking into account (unrealised) semantic features.</Paragraph>
<Paragraph position="2"> Velldal and Oepen (2005) present a discriminative disambiguation model that uses a hand-crafted HPSG grammar for generation. Belz (2005) describes a method for building statistical generation models using an automatically created generation treebank for weather forecasts. None of these probabilistic approaches to NLG uses a full treebank grammar to drive generation.</Paragraph>
<Paragraph position="3"> Bangalore et al. (2001) investigate the effect of training-set size on performance when using grammars automatically extracted from the Penn-II Treebank (Marcus et al., 1994) for generation. Using an automatically extracted XTAG grammar, they achieve a string accuracy of 0.749 on their test set. Nakanishi et al. (2005) present probabilistic models for a chart generator using an HPSG grammar acquired from the Penn-II Treebank (the Enju HPSG). They investigate discriminative disambiguation models following Velldal and Oepen (2005); their best model achieves coverage of 90.56% and a BLEU score of 0.7723 on Penn-II WSJ Section 23 sentences of length ≤ 20.</Paragraph>
<Paragraph position="4"> In this paper we present a novel PCFG-based architecture for probabilistic generation based on wide-coverage, robust Lexical Functional Grammar (LFG) approximations automatically extracted from treebanks (Cahill et al., 2004). In Section 2 we briefly describe LFG (Kaplan and Bresnan, 1982). Section 3 presents our generation architecture. Section 4 presents evaluation results on the Penn-II WSJ Section 23 test set using string-based metrics. Section 5 compares our approach with alternative approaches in the literature. Section 6 concludes and outlines further research.</Paragraph>
</Section>
</Paper>