<?xml version="1.0" standalone="yes"?> <Paper uid="N06-1046"> <Title>Aggregation via Set Partitioning for Natural Language Generation</Title> <Section position="3" start_page="359" end_page="360" type="relat"> <SectionTitle> 2 Related Work </SectionTitle> <Paragraph position="0"> Due to its importance in producing coherent and fluent text, aggregation has been extensively studied in the text generation community.1 Typically, semantic grouping and sentence structuring are interleaved in one step, thus enabling the aggregation component to operate over a rich feature space. The common assumption is that other parts of the generation system are already in place during aggregation, and thus the aggregation component has access to discourse, syntactic, and lexical constraints.</Paragraph> <Paragraph position="1"> The interplay of different constraints is usually captured by a set of hand-crafted rules that guide the aggregation process (Scott and de Souza, 1990; Hovy, 1990; Dalianis, 1999; Shaw, 1998). Alternatively, these rules can be learned from a corpus. For instance, Walker et al. (2001) propose an overgenerate-and-rank approach to aggregation within the context of a spoken dialog application.</Paragraph> <Paragraph position="2"> Their system relies on a preference function for selecting an appropriate aggregation among multiple alternatives and assumes access to a large feature space expressing syntactic and pragmatic features of the input representations. The preference function is learned from a corpus of candidate aggregations marked with human ratings. Another approach is put forward by Cheng and Mellish (2000) who use a genetic algorithm in combination with a hand-crafted preference function to opportunistically find a text that satisfies aggregation and planning constraints.</Paragraph> <Paragraph position="3"> Our approach differs from previous work in two important respects. First, our ultimate goal is a generation system which can be entirely induced from a parallel corpus of sentences and their corresponding database entries. This means that our generator will operate over more impoverished representations than are traditionally assumed. For example we do 1The approaches are too numerous to list; we refer the interested reader to Reiter and Dale (2000) and Reape and Mellish (1999) for comprehensive overviews.</Paragraph> <Paragraph position="4"> This fragment will give rise to 6 sentences in the final text. not presume to know all possible ways in which our database entries can be lexicalized, nor do we presume to know which semantic or discourse relations exist between different entries. In this framework, aggregation is the task of grouping semantic content without making any decisions about sentence structure or its surface realization. Second, we strive for an approach to the aggregation problem which is as domain- and representation-independent as possible.</Paragraph> </Section> class="xml-element"></Paper>