<?xml version="1.0" standalone="yes"?> <Paper uid="W05-1611"> <Title>Discrete Optimization as an Alternative to Sequential Processing in NLG</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> From an engineering perspective, one of the major considerations in building a Natural Language Generation (NLG) system is the choice of architecture. Two important issues that need to be considered at this stage are, first, the modularization of the linguistic decisions involved in the generation process and, second, the processing flow (cf. [De Smedt et al., 1996]).</Paragraph> <Paragraph position="1"> On one side of the spectrum lie integrated systems, with all linguistic decisions being handled within a single process (e.g. [Appelt, 1985]). Such architectures are theoretically attractive, as they assume a close coordination of different types of linguistic decisions, which are known to be dependent on one another (cf. e.g. [Danlos, 1984]). A major disadvantage of integrated models is the complexity that they necessarily involve, which results in poor portability and scalability. On the other side of the spectrum there are highly modularized pipeline architectures. A prominent example of this second case is the consensus pipeline architecture recognized by [Reiter, 1994] and further elaborated in [Reiter and Dale, 2000].</Paragraph> <Paragraph position="2"> The modularization of Reiter's model occurs at two levels.</Paragraph> <Paragraph position="3"> First, individual linguistic decisions of the same type (e.g.</Paragraph> <Paragraph position="4"> involving lexical or syntactic choice) are grouped together within single low-level tasks, such as lexicalization, aggregation or ordering. Second, tasks are allocated to three high-level generation stages, i.e. Document Planning, Microplanning and Surface Realization.
The processing flow in the pipeline architecture is sequential, with individual tasks being executed in a predetermined order.</Paragraph> <Paragraph position="5"> A study of applied NLG systems [Cahill and Reape, 1999] reveals, however, that while most applied NLG systems rely on sequential processing, they do not follow the strict modularization that the consensus model assumes. Low-level tasks are spread over various generation stages and may in fact be executed more than once at different positions in the pipeline.</Paragraph> <Paragraph position="6"> The Reference Architecture for Generation Systems (RAGS) [Mellish et al., 2004] is an attempt to account for the commonalities that many NLG systems share without imposing as many restrictions as Reiter's &quot;consensus&quot; model does. RAGS is an abstract specification of an NLG architecture that focuses on two issues: the data types that the generation process manipulates and a generic model of the interactions between modules, based on a common central server. An important feature of RAGS is that it leaves the question of processing flow to the actual implementation. Hence it is theoretically possible to build both fully integrated and pipeline-based systems that observe the RAGS principles. Two implementations of RAGS presented in [Mellish and Evans, 2004] demonstrate an intermediate approach.</Paragraph> <Paragraph position="7"> In this paper we present a novel approach to building an integrated NLG system, in which the generation process is modeled as a discrete optimization problem. It provides an extension to the classification-based generation framework presented in [Marciniak and Strube, 2004].
We first assume modularization of the generation process at the lowest possible level: individual tasks correspond to realizations of single form elements (FEs) that build up a linguistic expression.</Paragraph> <Paragraph position="8"> The decisions that these tasks involve are then represented as classification tasks and integrated via an Integer Linear Programming (ILP) formulation (see e.g. [Nemhauser and Wolsey, 1999]). This way we avoid the well-known ordering problem that is present in all pipeline-based systems. Observing, at least partially, the methodological principles of RAGS, we specify the architecture of our system at two independent levels. At the abstract level, the low-level generation tasks are defined, all based on the same input/output interface. At the implementation level, the processing flow and integration method are determined.</Paragraph> <Paragraph position="9"> The rest of the paper is organized as follows: in Section 2 we briefly present the classification-based generation framework and remark on the shortcomings of pipeline-based processing. In Section 3 we introduce the ILP formulation of the generation task, and in Section 4 we report on the experiments and evaluation of the system.</Paragraph> </Section> </Paper>