File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/97/p97-1028_metho.xml
Size: 19,254 bytes
Last Modified: 2025-10-06 14:14:38
<?xml version="1.0" standalone="yes"?> <Paper uid="P97-1028"> <Title>Applying Explanation-based Learning to Control and Speeding-up Natural Language Generation</Title> <Section position="5" start_page="214" end_page="217" type="metho"> <SectionTitle> 3 Overview of the method </SectionTitle> <Paragraph position="0"/> <Paragraph position="2"> The above figure displays the overall architecture of the EBL learning method. The right-hand part of the diagram shows the linguistic competence base (LCB) and the left the EBL-based subgrammar processing component (SGP).</Paragraph> <Paragraph position="3"> LCB corresponds to the tactical component of a general natural language generation system NLG. In this paper we assume that the strategic component of the NLG has already computed the MRS representation of the information of an underlying computer program. SGP consists of a training module TM, an application module AM, and the subgrammar, automatically determined by TM and applied by AM. 2 But note, our approach does not depend on a flat representation of logical forms. However, in the case of a conventional representation form, the mechanisms for indexing the trained structures would require more complex abstract data types (see sec. 4 for more details).</Paragraph> <Paragraph position="4"> Briefly, the flow of control is as follows: During the training phase of the system, a new logical form mrs is given as input to the LCB. After grammatical processing, the resulting feature structure fs(mrs) (i.e., a feature structure that contains, among others, the input MRS, the computed string and a representation of the derivation tree) is passed to TM. TM extracts and generalizes the derivation tree of fs(mrs), which we call the template templ(mrs) of fs(mrs). templ(mrs) is then stored in a decision tree, where indices are computed from the MRS found under the root of templ(mrs). During the application phase, a new semantic input mrs' is used for the retrieval of the decision tree. 
If a candidate template can be found and successfully instantiated, the resulting feature structure fs(mrs') constitutes the generation result of mrs'.</Paragraph> <Paragraph position="5"> Thus described, the approach seems to facilitate only exact retrieval and matching of a new semantic input. However, before we describe how partial matching is realized, we will demonstrate in more detail the exact matching strategy using the example MRS shown in figure 1.</Paragraph> <Paragraph position="6"> Training phase The training module TM starts right after the resulting feature structure fs for the input MRS mrs has been computed. In the first phase, TM extracts and generalizes the derivation tree of fs, called the template of fs. Each node of the template contains the rule name used in the corresponding derivation step and a generalization of the local MRS. A generalized MRS is the abstraction of the LISZT value of a MRS where each element only contains the (lexical semantic) type and HANDEL information (the HANDEL information is used for directing lexical choice (see below)).</Paragraph> <Paragraph position="7"> For our example mrs, figure 2 displays the generalized MRS mrsg. For convenience, we will use the more compact notation: {(SandyRel h4), (GiveRel h1), (TempOver h1), (Some h9), (ChairRel h10), (To h12), (KimRel h14)} Using this notation, figure 4 (see next page) displays the template templ(mrs) obtained from fs.</Paragraph> <Paragraph position="8"> Note that it memorizes not only the rule application structure of a successful process but also the way the grammar mutually relates the compositional parts of the input MRS.</Paragraph> <Paragraph position="9"> In the next step of the training module TM, the generalized MRS mrsg information of the root node of templ(mrs) is used for building up an index in a decision tree. Remember that the relative order of the elements of a MRS is immaterial. 
For that reason, the elements of mrsg are alphabetically ordered, so that we can treat it as a sequence when used as a new index in the decision tree.</Paragraph> <Paragraph position="10"> The alphabetic ordering has two advantages.</Paragraph> <Paragraph position="11"> Firstly, we can store different templates under a common prefix, which allows for efficient storage and retrieval. Secondly, it allows for a simple and efficient treatment of MRS as sets during the retrieval step of the application phase.</Paragraph> <Paragraph position="12"> are in bold.</Paragraph> <Paragraph position="13"> Application phase The application module AM basically performs the following steps: 1. Retrieval: For a new MRS mrs' we first construct the alphabetically sorted generalized MRS mrs'g. mrs'g is then used as a path description for traversing the decision tree. For reasons we will explain soon, traversal is directed by type subsumption. Traversal is successful if mrs'g has been completely processed and if the end node in the decision tree contains a template. Note that because of the alphabetic ordering, the relative order of the elements of the new input mrs' is immaterial.</Paragraph> <Paragraph position="14"> 2. Expansion: A successfully retrieved template templ is expanded by deterministically applying the rules denoted by the non-terminal elements from the top downwards in the order specified by templ. In some sense, expansion just re-plays the derivation obtained in the past. This will result in a grammatically fully expanded feature structure, where only lexical specific information is still missing. But note that through structure sharing the terminal elements will already be constrained by syntactic information.3 3 It is possible to perform the expansion step off-line as early as the training phase, in which case the application phase can be sped up, however at the price of more memory being taken up.</Paragraph> <Paragraph position="15"> 3. 
Lexical lookup: From each terminal element of the unexpanded template templ the type and HANDEL information is used to select the corresponding element from the input MRS mrs' (note that in general the MRS elements of mrs' are much more constrained than their corresponding elements in the generalized MRS mrs'g). The chosen input MRS element is then used for performing lexical lookup, where lexical elements are indexed by their relation name. In general this will lead to a set of lexical candidates. 4. Lexical instantiation: In the last step of the application phase, the set of selected lexical elements is unified with the constraints of the terminal elements in the order specified by the terminal yield. We also call this step terminal-matching. In our current system terminal-matching is performed from left to right. Since the ordering of the terminal yield is given by the template, it is also possible to follow other selection strategies, e.g., a semantic head-driven strategy, which could lead to more efficient terminal-matching, because the head element is supposed to provide selectional restriction information for its dependents.</Paragraph> <Paragraph position="16"> A template together with its corresponding index describes all sentences of the language that share the same derivation and whose MRS are consistent with that of the index. Furthermore, the index and the MRS of a template together define a normalization for the permutation of the elements of a new input MRS. The proposed EBL method guarantees soundness because retaining and applying the original derivation in a template enforces the full constraints of the original grammar.</Paragraph> <Paragraph position="17"> Achieving more generality So far, the application phase will only be able to re-use templates for a semantic input which has the same semantic type information. However, it is possible to achieve more generality if we apply a further abstraction step on a generalized MRS. 
This is simply achieved by selecting a supertype of a MRS element instead of the given specialized type.</Paragraph> <Paragraph position="18"> The type abstraction step is based on the standard assumption that the word-specific lexical semantic types can be grouped into classes representing morpho-syntactic paradigms. These classes define the upper bounds for the abstraction process. In our current system, these upper bounds are directly used as the supertypes to be considered during the type abstraction step. More precisely, for each element x of a generalized MRS mrsg it is checked whether its type Tx is subsumed by an upper bound Ts (we assume disjoint sets). Only if this is the case, Ts replaces Tx in mrsg.4 Applying this type abstraction strategy on the MRS of figure 1, we obtain: where e.g., NAMED is the common supertype of SANDYREL and KIMREL, and ACTUNDPREP is the supertype of GIVEREL. Figure 5 shows the template templg obtained from fs using the more general MRS information. Note that the MRS of the root node is used for building up an index in the decision tree.</Paragraph> <Paragraph position="19"> Now, if retrieval of the decision tree is directed by type subsumption, the same template can be retrieved and potentially instantiated for a wider range of new MRS input, namely for those which are type compatible with respect to the subsumption relation. Thus, the template templg can now be used to generate, e.g., the string &quot;Kim gives a table to Peter&quot;, as well as the string &quot;Noam donates a book to Peter&quot;. 
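
The effect of type abstraction on retrieval can be illustrated with a minimal Python sketch. It is not the implemented system: the flat dictionary standing in for the decision tree, the class COMMON_NOUN, the membership lists of all three classes, and every function name are assumptions made for illustration (only NAMED and ACTUNDPREP are named in the text). Handle information is ignored, and because the upper bounds are assumed to be disjoint, the sketch abstracts the input types before lookup instead of walking the tree under subsumption, which is a simplification.

```python
# Hypothetical upper bounds (disjoint classes). NAMED and ACTUNDPREP come from
# the text; COMMON_NOUN and all membership lists are invented for this sketch.
UPPER_BOUNDS = {
    "NAMED": {"SandyRel", "KimRel", "PeterRel", "NoamRel"},
    "ACTUNDPREP": {"GiveRel", "DonateRel"},
    "COMMON_NOUN": {"ChairRel", "TableRel", "BookRel", "ManRel"},
}


def abstract_type(lexical_type):
    """Return the upper bound that subsumes lexical_type, else the type itself."""
    for upper, members in UPPER_BOUNDS.items():
        if lexical_type in members:
            return upper
    return lexical_type


def index_key(mrs_types):
    """Abstract every element type and sort, so element order is immaterial."""
    return tuple(sorted(abstract_type(t) for t in mrs_types))


# Stand-in for the decision tree: one template stored under its abstracted index.
TRAINING = ["SandyRel", "GiveRel", "TempOver", "Some", "ChairRel", "To", "KimRel"]
templates = {index_key(TRAINING): "templ_g"}


def retrieve(mrs_types):
    """Return the stored template whose index matches, or None if retrieval fails."""
    return templates.get(index_key(mrs_types))


kim_gives = ["KimRel", "GiveRel", "TempOver", "Some", "TableRel", "To", "PeterRel"]
noam_donates = ["To", "PeterRel", "Some", "BookRel", "TempOver", "DonateRel", "NoamRel"]
a_man_gives = ["Some", "ManRel", "GiveRel", "TempOver", "Some", "BookRel", "To", "KimRel"]

print(retrieve(kim_gives))     # templ_g
print(retrieve(noam_donates))  # templ_g
print(retrieve(a_man_gives))   # None: eight elements, no matching index
```

In this sketch the third input fails at retrieval simply because its abstracted index has a different shape from the stored one.
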
However, templg will not be able to generate a sentence like &quot;A man gives a book to Kim&quot;, since the retrieval</Paragraph> </Section> <Section position="6" start_page="217" end_page="217" type="metho"> <SectionTitle> 4 Of course, if a very fine-grained lexical semantic type </SectionTitle> <Paragraph position="0"> hierarchy is defined then a more careful selection would be possible to obtain different degrees of type abstraction and to achieve a more domain-sensitive determination of the subgrammars. However, more complex type abstraction strategies are then needed which would be able to find appropriate supertypes automatically.</Paragraph> <Paragraph position="1"> phase will already fail. In the next section, we will show how to overcome even this kind of restriction.</Paragraph> </Section> <Section position="7" start_page="217" end_page="218" type="metho"> <SectionTitle> 4 Partial Matching </SectionTitle> <Paragraph position="0"> The core idea behind partial matching is that in case an exact match of an input MRS fails we want at least as many subparts as possible to be instantiated.</Paragraph> <Paragraph position="1"> Since the instantiated template of a MRS subpart corresponds to a phrasal sign, we also call it a phrasal template. For example, assuming that the training phase has been performed only for the example in figure 1, then for the MRS of &quot;A man gives a book to Kim&quot;, a partial match would generate the strings &quot;a man&quot; and &quot;gives a book to Kim&quot;.5 The instantiated phrasal templates are then combined by the tactical component to produce larger units (if possible, see below).</Paragraph> <Paragraph position="2"> Extended training phase The training module is adapted as follows: Starting from a template templ obtained for the training example in the manner described above, we extract recursively all possible subtrees templs, also called phrasal templates. 
Next, each phrasal template is inserted in the decision tree in the way described above.</Paragraph> <Paragraph position="3"> It is possible to direct the subtree extraction process with the application of filters, which are applied to the whole remaining subtree in each recursive step. By using these filters it is possible to restrict the range of structural properties of candidate phrasal templates (e.g., extract only saturated NPs, or subtrees having at least two daughters, or subtrees which have no immediate recursive structures). These filters serve the same purpose as the &quot;chunking criteria&quot; described in (Rayner and Carter, 1996).</Paragraph> <Paragraph position="4"> During the training phase it is recognized for each phrasal template templs whether the decision tree already contains a path pointing to a previously extracted and already stored phrasal template templ's, such that templs = templ's. In that case, templs is not inserted and the recursion stops at that branch.</Paragraph> <Paragraph position="5"> Extended application phase For the application module, only the retrieval operation of the decision tree need be adapted.</Paragraph> <Paragraph position="6"> Remember that the input of the retrieval operation is the sorted generalized MRS mrsg of the input MRS mrs. Therefore, mrsg can be handled like a sequence. The task of the retrieval operation in the case of a partial match is now to potentially find all subsequences of mrsg which lead to a template.</Paragraph> <Paragraph position="7"> 5 If we allowed for an exhaustive partial match (see below) then the strings &quot;a book&quot; and &quot;Kim&quot; would additionally be generated.</Paragraph> <Paragraph position="8"> In the case of the exact matching strategy, the decision tree must be visited only once for a new input. 
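
The difference between this single traversal and repeated retrieval can be sketched in Python. This is a hedged illustration, not the implemented system: the Node class and all function names are invented, single characters stand in for sorted generalized MRS elements, and the template/index pairs are those of the example given later in this section. The sketch assumes that a match covers a contiguous run of the sorted input and that all matches are collected (the exhaustive mode); both are assumptions, chosen because they reproduce the results stated for that example.

```python
class Node:
    """Decision-tree node: children keyed by index element, optional template."""

    def __init__(self):
        self.children = {}
        self.template = None


def insert(root, index, template):
    """Store template at the end of the path spelled out by index."""
    node = root
    for element in index:
        node = node.children.setdefault(element, Node())
    node.template = template


def exact_retrieve(root, index):
    """One traversal: succeeds only if the whole index ends at a template."""
    node = root
    for element in index:
        node = node.children.get(element)
        if node is None:
            return None
    return node.template


def partial_retrieve(root, index):
    """Restart the traversal at every position and collect each template reached."""
    found = []
    for start in range(len(index)):
        node = root
        for element in index[start:]:
            node = node.children.get(element)
            if node is None:
                break
            if node.template is not None and node.template not in found:
                found.append(node.template)
    return sorted(found)


root = Node()
for index, template in [("ab", "t1"), ("abcd", "t2"), ("bcd", "t3")]:
    insert(root, index, template)

print(exact_retrieve(root, "abcd"))      # t2
print(partial_retrieve(root, "abcd"))    # ['t1', 't2', 't3']
print(partial_retrieve(root, "aabbcd"))  # ['t1', 't3']
print(partial_retrieve(root, "abc"))     # ['t1']
```

Restarting at the root for every start position mirrors the idea that each endpoint of an index implicitly points back to the root of the decision tree.
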
In the case of partial matching, however, the decision tree describes only possible prefixes for a new input.</Paragraph> <Paragraph position="9"> Hence, we have to recursively repeat retrieval of the decision tree as long as the remaining suffix is not empty. In other words, the decision tree is now a finite representation of an infinite structure, because implicitly, each endpoint of an index bears a pointer to the root of the decision tree.</Paragraph> <Paragraph position="10"> Assume that the following template/index pairs have been inserted into the decision tree: (ab, t1), (abcd, t2), (bcd, t3). Then retrieval using the path abcd will return all three templates, retrieval using aabbcd will return templates t1 and t3, and abc will only return t1.6 Interleaving with normal processing Our EBL method can easily be integrated with normal processing, because each instantiated template can be used directly as an already found sub-solution.</Paragraph> <Paragraph position="11"> In the case of an agenda-driven chart generator of the kind described in (Neumann, 1994a; Kay, 1996), an instantiated template can be directly added as a passive edge to the generator's agenda. If passive edges with a wider span are given higher priority than those with a smaller span, the tactical generator would try to combine the largest derivations before smaller ones, i.e., it would prefer those structures determined by EBL.</Paragraph> </Section> <Section position="8" start_page="218" end_page="218" type="metho"> <SectionTitle> 5 Implementation </SectionTitle> <Paragraph position="0"> The EBL method just described has been fully implemented and tested with a broad coverage HPSG-based English grammar including more than 2000 fully specified lexical entries. 
7 The TDL grammar formalism is very powerful, supporting distributed disjunction, full negation, as well as full boolean type logic.</Paragraph> <Paragraph position="1"> In our current system, an efficient chart-based bidirectional parser is used for performing the training phase. During training, the user can interactively select which of the parser's readings should be considered by the EBL module. In this way the user can control which sort of structural ambiguities should be avoided because they are known to cause misunderstandings. For interleaving the EBL application phase with normal processing a first prototype of a chart generator has been implemented using the same grammar as used for parsing.</Paragraph> <Paragraph position="2"> 6 It is possible to parameterize our system to perform an exhaustive or a non-exhaustive strategy. In the non-exhaustive mode, the longest matching prefixes are preferred.</Paragraph> <Paragraph position="3"> 7 This grammar has been developed at CSLI, Stanford, and has kindly been provided to the author.</Paragraph> <Paragraph position="4"> First tests have been carried out using a small test set of 179 sentences. Currently, a parser is used for processing the test set during training. Generation of the extracted templates is performed solely by the EBL application phase (i.e., we did not consider integration of EBL and chart generation). The application phase is very efficient. The average processing time for indexing and instantiation of a sentence level template (determined through parsing) of an input MRS is approximately one second.8 Compared to parsing the corresponding string, the speed-up factor is between 10 and 20. A closer look at the four basic EBL-generation steps (indexing, instantiation, lexical lookup, and terminal matching) showed that the latter is the most expensive one (up to 70% of computing time). The main reasons are that 1.) 
lexical lookup often returns several lexical readings for an MRS element (which introduces lexical non-determinism) and 2.) the lexical elements introduce most of the disjunctive constraints, which makes unification very complex. Currently, terminal matching is performed left to right. However, we hope to increase the efficiency of this step by using head-oriented strategies, since this might help to resolve disjunctive constraints as early as possible.</Paragraph> </Section> <Section position="9" start_page="218" end_page="219" type="metho"> <SectionTitle> 6 Discussion </SectionTitle> <Paragraph position="0"> The only other approach I am aware of which also considers EBL for NLG is (Samuelsson, 1995a; Samuelsson, 1995b). However, he focuses on the compilation of a logic grammar using LR-compiling techniques, where EBL-related methods are used to optimize the compiled LR tables, in order to avoid spurious non-determinisms during normal generation. He considers neither the extraction of a specialized grammar for supporting controlled language generation, nor strong integration with the normal generator.</Paragraph> <Paragraph position="1"> However, these properties are very important for achieving high applicability. Automatic grammar extraction is worthwhile because it can be used to support the definition of a controlled domain-specific language use on the basis of training with a general source grammar. Furthermore, in case exact matching is requested, only the application module is needed for processing the subgrammar. In the case of normal processing, our EBL method serves as a speed-up mechanism for those structures which have &quot;actually been used or uttered&quot;. However, completeness is preserved.</Paragraph> <Paragraph position="2"> 8 EBL-based generation of all possible templates of an input MRS takes less than 2 seconds. The tests have been performed using a Sun UltraSparc.</Paragraph>
<Paragraph position="3"> We view generation systems which are based on &quot;canned text&quot; and linguistically-based systems simply as two endpoints of a contiguous scale of possible system architectures (see also (Dale et al., 1994)). Thus viewed, our approach is directed towards the automatic creation of application-specific generation systems.</Paragraph> </Section> </Paper>