<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1713">
  <Title>XtraGen -- A Natural Language Generation System Using XML- and Java-Technologies</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
Backus-Naur Form
</SectionTitle>
    <Paragraph position="0"> of conditions: Simple-Conditions and Complex-Conditions. They in turn are the supertypes for more specific conditions: Simple-Condition: These form the actual tests that are applied to the input structure. A set of commonly used conditions is already predefined, such as ones that test for equality or that test whether certain information exists in the input structure. If there is a need for some very specific conditional testing that cannot be realized with the existing means, a developer is free to implement and add their own condition types.</Paragraph>
    <Paragraph position="1"> Complex-Condition: This type of condition makes it possible to combine several conditions into a more complex one. Three predefined Complex-Conditions exist: the And-Condition, the Or-Condition and the Not-Condition. Additional Complex-Conditions can also be added by providing an implementation for them.</Paragraph>
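    <Paragraph position="2"><![CDATA[The two kinds of conditions described above can be sketched as a small composite structure. This is only an illustration under stated assumptions: the names Condition, Conditions, exists, equalTo, and, or and not are hypothetical and do not come from XtraGen, and the input structure is simplified to a flat map instead of an XML document.

```java
import java.util.Map;

// Hypothetical interface: a condition is a test applied to the input structure.
// The input is simplified to a flat key/value map for this sketch.
interface Condition {
    boolean test(Map<String, String> input);
}

final class Conditions {
    private Conditions() {}

    // Simple-Condition: succeeds if the input contains the given key.
    static Condition exists(String key) {
        return input -> input.containsKey(key);
    }

    // Simple-Condition: succeeds if the value stored under key equals expected.
    static Condition equalTo(String key, String expected) {
        return input -> expected.equals(input.get(key));
    }

    // Complex-Condition: succeeds only if every part succeeds.
    static Condition and(Condition... parts) {
        return input -> {
            for (Condition c : parts) {
                if (!c.test(input)) return false;
            }
            return true;
        };
    }

    // Complex-Condition: succeeds if at least one part succeeds.
    static Condition or(Condition... parts) {
        return input -> {
            for (Condition c : parts) {
                if (c.test(input)) return true;
            }
            return false;
        };
    }

    // Complex-Condition: inverts the result of the wrapped condition.
    static Condition not(Condition inner) {
        return input -> !inner.test(input);
    }
}
```

Because Complex-Conditions accept any Condition, a developer-supplied condition type can be combined with the predefined ones without further changes.]]></Paragraph>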
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Parameterization
</SectionTitle>
      <Paragraph position="0"> Parameterization is an easy and flexible means to guide and control the generation process with regard to different linguistic preferences such as matters of style or rhetorical structures. Parameterization works by introducing a preference mechanism that provides the possibility of dynamically sorting the application of templates according to a given set of parameters.</Paragraph>
      <Paragraph position="1"> Parameterization in our system is a two-step process. Adding parameters to templates: During the design of a generation grammar the writer adds one or more parameters to some templates, as in the example in figure 3.</Paragraph>
      <Paragraph position="2"> Here the upper template is intended to be used during the generation of text targeted at experts and the lower one in case text is to be produced for novices (level is expert in one template and novice in the other). Both of the templates are preferably used when a low verbosity level is desired (verbosity is low in both cases).</Paragraph>
      <Paragraph position="3"> Setting the parameters at runtime: At runtime the parameters corresponding to the ones defined in the grammar are set to the desired values. To continue our example, we now set the value of the parameter level to expert (see figure 4), and hence the template in the upper box would be selected.</Paragraph>
      <Paragraph position="4"> The particularity of our system is that parameters can be assigned a weight and thus a priority. In our example we might want to give a higher priority to the parameter level than to the parameter verbosity, as shown in figure 5. The application of templates is then sorted first according to their level of verbosity, and the result is further sorted according to the level of expertise.</Paragraph>
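      <Paragraph position="5"><![CDATA[One plausible way to realize such a weighted preference is sketched below. It is an assumption, not the documented XtraGen algorithm: each template is scored by the sum of the weights of the runtime parameters it agrees with, and templates are tried in order of descending score. The class ParameterRanking and its nested Template type are hypothetical; only the shape of addParameter(name, value, weight) mirrors the call described in section 5.2.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a weighted preference mechanism for templates.
final class ParameterRanking {
    // A template together with the parameters attached to it in the grammar.
    static final class Template {
        final String id;
        final Map<String, String> params;

        Template(String id, Map<String, String> params) {
            this.id = id;
            this.params = params;
        }
    }

    private final Map<String, String> values = new LinkedHashMap<>();
    private final Map<String, Double> weights = new LinkedHashMap<>();

    // Sets a runtime parameter together with its weight (priority).
    void addParameter(String name, String value, double weight) {
        values.put(name, value);
        weights.put(name, weight);
    }

    // Sum of the weights of all runtime parameters the template agrees with.
    double score(Template t) {
        double sum = 0.0;
        for (Map.Entry<String, String> e : values.entrySet()) {
            if (e.getValue().equals(t.params.get(e.getKey()))) {
                sum += weights.get(e.getKey());
            }
        }
        return sum;
    }

    // Returns the templates ordered by descending score.
    // List.sort is stable, so templates with equal scores keep their grammar order.
    List<Template> rank(List<Template> templates) {
        List<Template> sorted = new ArrayList<>(templates);
        sorted.sort(Comparator.comparingDouble((Template t) -> score(t)).reversed());
        return sorted;
    }
}
```

With level weighted 2.0 and verbosity weighted 1.0, a template marked level=expert, verbosity=low scores 3.0 and is tried before one marked level=novice, verbosity=low, which scores 1.0.]]></Paragraph>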
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.4 Actions
</SectionTitle>
      <Paragraph position="0"> Once all conditions of a given template have been tested successfully (see section 2.2), the actions contained in the actions part of the template are executed.</Paragraph>
      <Paragraph position="1"> There are four different types of actions that can appear: String-Action, Getter-Action, Inflection-Action and Selection-Action. The actual purpose of each of them is quite different but all of them return a result string when executed successfully.</Paragraph>
      <Paragraph position="2"> String-Action: This type of action simply returns a statically specified string as a result -- a so-called canned text.</Paragraph>
      <Paragraph position="3"> Getter-Action: With a Getter-Action it is possible to directly access and retrieve data from the input structure. The syntax used for specifying the path to the data conforms to the syntax of XPath (World Wide Web Consortium, 1999). No additional processing is done on the returned data.</Paragraph>
      <Paragraph position="4"> &lt;get path="/values/startTime"/&gt; Inflection-Action: This action inflects a stem according to the defined morphological constraints and returns the result.</Paragraph>
      <Paragraph position="5"> The stem can be stated statically in the grammar as in case (a) or can be dynamically retrieved from the input structure as in case (b). The needed morphological constraints are furnished by the constraints part of the template, to which the given label provides a link (see section 2.5 below for details).</Paragraph>
      <Paragraph position="6">  ally be seen as the most important of the actions since it accounts for the context-free backbone of the system.</Paragraph>
      <Paragraph position="7"> It allows another template to be selected directly via a specified identifier, as in case (a), or via a given category, as in case (b). In the second case several templates might have the given category, and hence backtracking might be invoked at this point (see section 5.1). (a) &lt;select id="top"/&gt; (b) &lt;select category="top" optional="true"/&gt; Selections can also be declared optional, as in (b), which means that if the selection of the template fails, no backtracking is invoked and an empty string is simply returned.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.5 Constraints and Morphology
</SectionTitle>
      <Paragraph position="0"> The treatment of morphology is naturally one of the major issues in the context of a complete natural language generation system, especially when working with morphologically rich languages such as German, Russian or Finnish. Therefore we took great care to design and develop a morphological subsystem that is powerful and flexible yet easy to understand and use. The actual realization of the component is based on a constraint-based inheritance algorithm that follows the example of PATR-II (Shieber et al., 1989).</Paragraph>
      <Paragraph position="1"> In the (overly simplified) example in figure 6 one can get a glimpse of how the morphology works: There are two Selection-Actions, the first one labelled X0, the second one labelled X1. The given constraint states that the attribute number of X0 is the same as the attribute number of X1 and sets it dynamically to the value retrieved by the Getter-</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.6 Compilation
</SectionTitle>
      <Paragraph position="0"> In order to work with a generation grammar, the generation engine requires the grammar (and its templates) to exist in the form of a Java object. Since the original format of the grammar is plain XML, this format must be transformed through a compilation process into the internally needed representation. Our system is able to perform such a compilation in two different ways: Just-in-time Compilation: With this technique the required templates are compiled from their XML source into their corresponding Java objects at runtime of the generation engine, i.e. during the actual generation process.</Paragraph>
      <Paragraph position="1"> This type of compilation is advised only for smaller grammars or during the development and testing of a new grammar, since the constant interleaving of compilation on the one hand and the actual generation process on the other leads to quite noticeable overhead. This overhead is naturally not acceptable when XtraGen is used in real-time applications.</Paragraph>
      <Paragraph position="2"> Pre-Compilation: This type of compilation allows the whole grammar to be compiled before its actual deployment during the generation process. The pre-compilation of grammars can improve the performance of the generator-engine tremendously and is therefore to be preferred in most situations. (The pre-compilation of generation grammars is very similar to the Translets approach in XSL (Apache XML Project, 2002) where XSL stylesheets are compiled in advance into Java objects.)</Paragraph>
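      <Paragraph position="3"><![CDATA[The difference between the two compilation modes can be illustrated with a small cache. This is a minimal sketch under stated assumptions, not XtraGen code: the class TemplateCache and its methods are hypothetical, and the actual compiler is abstracted as a function from XML source to some compiled object.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: T stands for whatever compiled representation the engine uses.
final class TemplateCache<T> {
    private final Map<String, String> sources;   // template id -> XML source
    private final Function<String, T> compiler;  // XML source -> compiled object
    private final Map<String, T> compiled = new HashMap<>();
    private int compileCalls = 0;

    TemplateCache(Map<String, String> sources, Function<String, T> compiler) {
        this.sources = sources;
        this.compiler = compiler;
    }

    // Just-in-time mode: a template is compiled the first time it is requested,
    // so compilation work is interleaved with generation.
    T get(String id) {
        T result = compiled.get(id);
        if (result == null) {
            String source = sources.get(id);
            if (source == null) {
                throw new IllegalArgumentException("unknown template: " + id);
            }
            result = compiler.apply(source);
            compileCalls++;
            compiled.put(id, result);
        }
        return result;
    }

    // Pre-compilation mode: compile every template before generation starts,
    // so later calls to get() only perform a map lookup.
    void precompileAll() {
        for (String id : sources.keySet()) {
            get(id);
        }
    }

    // Number of times the compiler function has actually been invoked.
    int compileCalls() {
        return compileCalls;
    }
}
```

Calling precompileAll() once at start-up moves the whole compilation cost out of the generation phase, which is the trade-off the section describes.]]></Paragraph>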
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Input
</SectionTitle>
    <Paragraph position="0"> In contrast to other generation systems that require their input to adhere to some particular (and most of the time proprietary) encoding format, the core engine of our system only requires its input to be a valid XML structure.</Paragraph>
    <Paragraph position="1"> The actual restrictions on the input are imposed only at the level of the generation grammars in terms of their access to the input (see section 2.4 on Getter-Actions and Inflection-Actions). This can lead to a severe drawback: if either the generation grammar or the input structure changes substantially, a complete mismatch may emerge between the XPath expressions specified in the actions and the actual structure of the input.</Paragraph>
    <Paragraph position="2"> When it is not feasible to change either the input structure or the grammar, we propose to introduce an additional mapping layer between input and generator that is based on an XSL stylesheet and that dynamically maps the input into the form needed by the grammar.</Paragraph>
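    <Paragraph position="3"><![CDATA[Such a mapping layer can be sketched with the standard Java XML APIs (javax.xml.parsers and javax.xml.transform). The class InputMapper, the element names data and begin, and the example stylesheet are hypothetical and chosen only for illustration; the target path /values/startTime is taken from the Getter-Action example in section 2.4.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper that maps an input document into the shape a grammar expects.
final class InputMapper {
    // Example stylesheet: rewrites <data><begin>..</begin></data>
    // into <values><startTime>..</startTime></values>.
    static final String EXAMPLE_XSL =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"/data\">"
        + "<values><startTime><xsl:value-of select=\"begin\"/></startTime></values>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    private InputMapper() {}

    // Parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    // Applies the stylesheet to the input and returns the mapped document.
    static Document map(Document input, String stylesheet) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            DOMResult result = new DOMResult();
            transformer.transform(new DOMSource(input), result);
            return (Document) result.getNode();
        } catch (Exception e) {
            throw new IllegalStateException("could not map input", e);
        }
    }
}
```

The mapped Document can then be handed to the generator in place of the original input, so that neither the grammar nor the producer of the input has to change.]]></Paragraph>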
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Editor
</SectionTitle>
    <Paragraph position="0"> We have stressed in the preceding sections that we believe our formalism to be powerful yet very straightforward to implement and use. But when developing larger grammars for real-world applications, it becomes quite a demanding, non-trivial task to keep track of all the templates and especially of the relations between them (e.g. relations on the level of morphological constraints). Common XML editors are of no help at this point since they cannot show such relations at all.</Paragraph>
    <Paragraph position="1"> Therefore the development of egram, a Java-based graphical editor for generation grammars, is under way at the site of our cooperation partner DFKI (German Research Center for Artificial Intelligence). After the completion of its development this tool will allow all aspects of generation grammars and templates to be edited comfortably. Among many other things, the editor will be able to depict the whole generation grammar and process in a graphical tree format in which dependencies between templates are shown in an intuitive way.</Paragraph>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Implementation
</SectionTitle>
    <Paragraph position="0"> The realization and implementation of the XtraGen system is based entirely on the two cornerstones Java and XML.</Paragraph>
    <Paragraph position="1"> XML was chosen because it has become the de-facto standard language in many (if not most) scenarios where information transfer takes place. This in turn is caused by its unique capabilities to encode information in a way that is easy to read, process, and generate (even for human beings as in the case of our formalism).</Paragraph>
    <Paragraph position="2"> Java was chosen because it provides many mechanisms that bolster the productivity of a programmer during the development of new software, such as an extensive programming interface and automatic memory management. An additional advantage of Java is the availability of many free and readily usable open-source packages that provide a host of diverse functionalities. The most important ones in our project were the different XML packages, in particular the XML parser Xerces and the XSLT engine Xalan (Apache XML Project, 2002).</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 Backtracking
</SectionTitle>
      <Paragraph position="0"> During the generation process it is possible that two different templates are applicable at the same time (i.e. they have the same category and all of their conditions are satisfied). Now if one of the templates is selected this leads to one of two different results: * The application of the template was successful, which means that all of its actions could be successfully executed and a result was returned. * The application of the template failed because the execution of one or more of its actions was not successful.</Paragraph>
      <Paragraph position="1"> But the failure of a template described above does not mean that there exists no solution at all. Therefore we backtrack to the point where the unsuccessful template was selected and apply another template. This procedure is repeated until there are no more templates at this backtrack point.</Paragraph>
      <Paragraph position="2"> The underlying implementation of the backtracking mechanism is quite elaborate since it has to take several important issues into account. The most important ones are: Performance issues: We implemented several different mechanisms that help to tremendously enhance the performance during the backtracking phase, such as the memoization of partial solutions.</Paragraph>
      <Paragraph position="3"> Constraint issues We had to take great care of the constraint inheritance mechanism during the backtracking implementation so that an invocation of backtracking does not lead to a misguided percolation of constraints and hence a corrupted morphology.</Paragraph>
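      <Paragraph position="4"><![CDATA[The basic control flow described in this section can be sketched as follows. This is a simplified illustration, not the XtraGen implementation: the class BacktrackingSketch and its Action interface are hypothetical, conditions and constraint inheritance are left out, results are cached per category only, and the grammar is assumed to be non-recursive.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of template selection with backtracking and a result cache.
final class BacktrackingSketch {
    // An action yields a result string, or Optional.empty() if it fails.
    interface Action {
        Optional<String> run(BacktrackingSketch engine);
    }

    private final Map<String, List<List<Action>>> templatesByCategory = new HashMap<>();
    private final Map<String, Optional<String>> cache = new HashMap<>();

    // Registers a template (a list of actions) under a category.
    void addTemplate(String category, List<Action> actions) {
        templatesByCategory.computeIfAbsent(category, k -> new ArrayList<>()).add(actions);
    }

    // Action that returns a fixed string (canned text).
    static Action text(String s) {
        return engine -> Optional.of(s);
    }

    // Action that always fails; used here to force backtracking.
    static Action fail() {
        return engine -> Optional.empty();
    }

    // Action that selects another template by category.
    static Action select(String category) {
        return engine -> engine.generate(category);
    }

    // Tries the templates of a category in order until one succeeds.
    Optional<String> generate(String category) {
        Optional<String> cached = cache.get(category);
        if (cached != null) {
            return cached; // reuse a previously computed partial solution
        }
        Optional<String> result = Optional.empty();
        for (List<Action> template : templatesByCategory.getOrDefault(category, List.of())) {
            result = apply(template);
            if (result.isPresent()) {
                break; // success: no further alternatives are needed
            }
            // failure: return to this choice point and try the next template
        }
        cache.put(category, result);
        return result;
    }

    // A template succeeds only if all of its actions succeed.
    private Optional<String> apply(List<Action> template) {
        StringBuilder out = new StringBuilder();
        for (Action action : template) {
            Optional<String> part = action.run(this);
            if (!part.isPresent()) {
                return Optional.empty();
            }
            out.append(part.get());
        }
        return Optional.of(out.toString());
    }
}
```

Partial output of a failed template is discarded together with its StringBuilder, which is the string-level analogue of not letting constraints from a failed attempt percolate into later ones.]]></Paragraph>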
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.2 Programming Interface
</SectionTitle>
      <Paragraph position="0"> So far we have talked about the deployment of XtraGen only on the level of generation grammars and their XML-based formalism. Now we turn to the description of the tasks that have to be undertaken on the level of programming code to make the system run.</Paragraph>
      <Paragraph position="1"> The following shows the individual steps to be taken to generate some output with XtraGen: Creating a new generator-engine: The very first thing to do in order to get the whole system running is to create a new Generator object, which represents the core generation engine:</Paragraph>
      <Paragraph position="3"> By doing so one implicitly creates objects for the internal subcomponents, such as the already mentioned morphological component, and puts them under the control of the generation engine. Setting the start-category or -id: The generation engine needs to know which template it should start from. This is done by specifying either a start-category as in case (a) or a start-identifier as in case (b).</Paragraph>
      <Paragraph position="4"> In the first case a Document object (World Wide Web Consortium, 2000) that contains the grammar in parsed XML format is passed; in the second case a pre-compiled Grammar object is passed (see section 2.6). Setting the input: In addition to the grammar, the generation engine needs an input structure to generate from. This can be set as follows: generator.setInputDocument(</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Document input);
</SectionTitle>
      <Paragraph position="0"> Again, the parameter passed is a Document object that contains the input in parsed XML format. The input can be reassigned between two calls to the generation engine.</Paragraph>
      <Paragraph position="1"> Setting parameters: In subsection 2.3 we talked about the use of parameters to control and guide the generation process. The way parameterization works is explained in detail there. To set parameters at runtime one calls the following methods: (a) generator.addParameter(String name, String value); (b) generator.addParameter(String name, String value, double weight); This step is only needed if parameterization is desired. Otherwise these calls can be omitted and parameterization is automatically turned off.</Paragraph>
      <Paragraph position="2"> Running the generation process: To actually start the generation process and get some output, one of the following calls can be used: (a) String result = generator.generate(); (b) Document result = generator.generateDocument(); The difference between the two calls is that in case (a) a simple String containing the result is returned, whereas in case (b) a Document object is passed back.</Paragraph>
    </Section>
  </Section>
</Paper>