<?xml version="1.0" standalone="yes"?>
<Paper uid="A00-2023">
<Title>Forest-Based Statistical Sentence Generation</Title>
<Section position="2" start_page="0" end_page="0" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> Large textual corpora offer the possibility of a statistical approach to the task of sentence generation. Like any large-scale NLP or AI task, sentence generation requires immense amounts of knowledge, including lexicons, grammars, ontologies, collocation lists, and morphological tables. Acquiring and applying accurate, detailed knowledge of this breadth poses difficult problems.</Paragraph>
<Paragraph position="1"> Knight and Hatzivassiloglou (1995) suggested overcoming the knowledge acquisition bottleneck in generation by tapping the information inherent in textual corpora. They performed experiments showing that automatically acquired, corpus-based knowledge greatly reduced the need for deep, hand-crafted knowledge. At the same time, this approach to generation improved scalability and robustness, offering the potential for higher quality output in the future. In their approach, Knight and Hatzivassiloglou adapted techniques used in speech recognition. Corpus-based statistical knowledge was applied to the generation process after encoding many alternative phrasings into a structure called a lattice (see Figure 1). A lattice was able to represent a large number of alternative phrases without requiring the large amount of space that an explicitly enumerated list of individual alternatives would require. The alternative sentences in the lattice were then ranked according to a statistical language model, and the most likely sentence was chosen as output. Since the number of phrases that needed to be considered typically grew exponentially with the length of the phrase, the lattice was usually too large for an exhaustive search, and an n-best algorithm was instead used to heuristically narrow the search.</Paragraph>
<Paragraph position="2"> The lattice-based method, though promising, had several drawbacks that will be discussed shortly. This paper presents a different method of statistical generation based on a forest structure (a packed set of trees). A forest is more compact than a lattice, and it offers a hierarchical organization that is conducive to representing syntactic information. Furthermore, it facilitates dramatically more efficient statistical ranking, since constraints can be localized and the combinatorial explosion of possibilities that need to be considered can be reduced. In addition to describing the forest data structure we use, this paper presents a forest-based ranking algorithm and reports experimental results on its efficiency in both time and space. It also compares these results favorably to the performance of a lattice-based approach.</Paragraph>
</Section>
</Paper>
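As a concrete illustration of the lattice-based ranking described in the introduction, the following is a minimal sketch, not Knight and Hatzivassiloglou's implementation. It assumes a hypothetical toy lattice (LATTICE), made-up bigram log-probabilities (BIGRAM_LOGPROB), and a small beam width, and shows how alternative phrasings share lattice states while an n-best beam search ranks paths with a bigram language model instead of enumerating every path.

# A minimal sketch of lattice-based ranking (not Knight & Hatzivassiloglou's
# implementation).  A word lattice is a DAG whose edges carry words, so
# alternative phrasings share states and the lattice stays small even when
# the number of distinct paths is exponential.  Paths are scored with a toy
# bigram language model, and a beam (n-best) search keeps only the top
# hypotheses at each state instead of enumerating every path.

from collections import defaultdict

# Hypothetical toy lattice: state -> list of (next_state, word) edges.
LATTICE = {
    0: [(1, "the"), (1, "a")],
    1: [(2, "quick"), (2, "fast")],
    2: [(3, "fox")],
    3: [],  # final state
}

# Hypothetical bigram log-probabilities; unseen bigrams get a floor score.
BIGRAM_LOGPROB = {
    ("<s>", "the"): -0.5, ("<s>", "a"): -1.2,
    ("the", "quick"): -0.7, ("the", "fast"): -2.0,
    ("a", "quick"): -1.5, ("a", "fast"): -1.0,
    ("quick", "fox"): -0.4, ("fast", "fox"): -1.8,
}
FLOOR = -10.0


def nbest(lattice, start=0, final=3, beam=2):
    """Beam search over the lattice, keeping `beam` best hypotheses per state."""
    # Each hypothesis is (log_score, last_word, words_so_far).
    hyps = defaultdict(list)
    hyps[start] = [(0.0, "<s>", [])]
    # States are visited in topological order (state ids are already topological here).
    for state in sorted(lattice):
        for score, prev, words in hyps[state]:
            for nxt, word in lattice[state]:
                s = score + BIGRAM_LOGPROB.get((prev, word), FLOOR)
                hyps[nxt].append((s, word, words + [word]))
        for st in hyps:
            hyps[st] = sorted(hyps[st], key=lambda h: -h[0])[:beam]
    return [(score, " ".join(words)) for score, _, words in hyps[final]]


if __name__ == "__main__":
    for score, sentence in nbest(LATTICE):
        print(f"{score:6.2f}  {sentence}")

Under the made-up scores above, the search keeps two hypotheses per state and prints "the quick fox" as the top-ranked sentence; the beam is what keeps the work per state bounded even though the number of complete paths grows multiplicatively.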
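The forest-based ranking idea can be sketched in the same spirit. The following is a simplified illustration, not the paper's actual algorithm: it assumes the same hypothetical bigram scores, a made-up packed Node type whose internal nodes hold alternative derivations, and a small per-node beam, and it shows how hypotheses are built and scored locally at each node, with shared (packed) nodes ranked only once.

# A simplified sketch of bottom-up ranking over a packed forest (an
# illustration of the general idea, not the paper's actual algorithm).
# Each forest node packs one or more alternative derivations (lists of
# children); leaves carry words.  Hypotheses are built and scored locally
# at each node with a toy bigram model, only a few best survive per node,
# and shared (packed) nodes are ranked once, which is where the savings
# over enumerating every tree come from.

from dataclasses import dataclass, field

BIGRAM_LOGPROB = {  # hypothetical toy bigram log-probabilities
    ("the", "quick"): -0.7, ("the", "fast"): -2.0,
    ("a", "quick"): -1.5, ("a", "fast"): -1.0,
    ("quick", "fox"): -0.4, ("fast", "fox"): -1.8,
}
FLOOR = -10.0


@dataclass
class Node:
    """A packed forest node: a leaf has a word; an internal node has one or
    more alternative derivations, each a list of child nodes."""
    word: str | None = None
    derivations: list = field(default_factory=list)


def rank(node, beam=2, memo=None):
    """Return up to `beam` best (score, words) hypotheses for `node`,
    memoising shared nodes so each packed node is ranked only once."""
    if memo is None:
        memo = {}
    if id(node) in memo:
        return memo[id(node)]
    if node.word is not None:  # leaf
        best = [(0.0, [node.word])]
    else:
        hyps = []
        for children in node.derivations:      # each packed alternative
            partial = [(0.0, [])]
            for child in children:             # concatenate children left to right
                combined = []
                for score, words in partial:
                    for c_score, c_words in rank(child, beam, memo):
                        s = score + c_score
                        if words:              # score the bigram at the join point
                            s += BIGRAM_LOGPROB.get((words[-1], c_words[0]), FLOOR)
                        combined.append((s, words + c_words))
                partial = sorted(combined, key=lambda h: -h[0])[:beam]
            hyps.extend(partial)
        best = sorted(hyps, key=lambda h: -h[0])[:beam]
    memo[id(node)] = best
    return best


if __name__ == "__main__":
    # A small packed forest for "(the | a) (quick | fast) fox".
    det = Node(derivations=[[Node(word="the")], [Node(word="a")]])
    adj = Node(derivations=[[Node(word="quick")], [Node(word="fast")]])
    np = Node(derivations=[[det, adj, Node(word="fox")]])
    for score, words in rank(np):
        print(f"{score:6.2f}  {' '.join(words)}")

With a full n-gram model, each surviving hypothesis would also need to record its boundary words so that scores at a join point remain exact; the toy bigram version above simply keeps the whole word sequence for clarity.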