<?xml version="1.0" standalone="yes"?>
<Paper uid="H90-1051">
  <Title>Using Explanation-Based Learning to Increase Performance in a Large-Scale NL Query System</Title>
  <Section position="1" start_page="0" end_page="254" type="abstr">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> Explanation-based learning (EBL) is a machine-learning technique, closely connected to other techniques like macrooperator learning, chunking, and partial evaluation; a phrase we have found useful for describing the method to logic programmers is example-guided partial evaluation. The basic ideas of the method are well-described in an overview article which recently appeared in Artificial Intelligence \[1\], to which we refer the reader who wants to understand the theoretical principles; here, we will only summarize briefly what EBL means in the context of natural-language processing. A detailed presentation can be found in \[3\] and \[4\].</Paragraph>
    <Paragraph position="1"> What EBL tries to do in the context of NLP is exploit the well-known observation that users of an NL interface tend to ask the same types of question most of the time; lacking exact figures, it seems reasonable to guess that at least 80% of all questions posed to a given specific NL application will be accounted for by the 100 most common question-types. If one had some simple way of automatically identifying these &amp;quot;common&amp;quot; question-types, it would be possible to win a great deal of efficiency by bypassing the normal parsing mechanism in all but the hard cases, Unfortunately, it is not feasible simply to add 100 extra rules to the grammar, since the common question-types vary depending on the application: a construction which occurs constantly in one domain may hardly exist in another.</Paragraph>
    <Paragraph position="2"> Something more sophisticated is required, which is capable of taking examples of common types of query and synthesizing the corresponding special rules. This is exactly what the EBL method offers. The normal route through the parser is extended with an EBL bypass, which contains special rules for efficient processing of common queries; these rules are not coded by the programmer, but rather are produced automatically by inspecting the solutions to previously posed queries of the same type. EBL can thus best be thought of as a way of automatically tuning an NL system to produce increased performance in a particular domain.</Paragraph>
    <Paragraph position="3"> The EBL module can consequently be divided into its compile-time and run-time parts. The compile-time part extracts the learned rules from sample queries: the central component is the generalizer, which in our version is essentially a type of Prolog interpreter. The run-time part then applies the rules to input queries, the compile-time system having previously indexed them so as to make them readily accessible to some kind of table look-up facility. In \[4\], we demonstrated, using examples taken from an application of the EBL method to CHAT-80 \[2\], that table look-up methods of this kind can be implemented quite simply in Prolog with a minimal overhead.</Paragraph>
    <Paragraph position="4"> In the current paper, we describe the results of experiments carried out at IBM Nordic Laboratories, where the EBL method was used on a large-scale NL query interface prototype. The EBL module learns a &amp;quot;two-lever' set of special grammar rules; the top-level rules for S's treat NP's as primitive, and these are supplemented by a second set of rules for common NP's. Both types of rules are learned automatically in the way described above. In the remainder of the paper, we first give a brief overview of the IBM system, concentrating on the features that presented problems for the implementation of the EBL process; we then describe the architecture of the EBL module's compile-time and run-time components. In section 4, we present our experimental results, which indicate fairly unambiguously that the EBL method gives a real, and quite substantial, speed-up of the system as a whole; the final section contains our conclusions together with suggested directions for  * further research.</Paragraph>
    <Paragraph position="5"> 2. Relevant characteristics of the  target NL system The system used for our experiments was a large-scale NL query prototype, implemented in Prolog, which is intended to provide good coverage of a fairly large portion of English. The main components perform the tasks of parsing, semantic interpretation, paraphrasing and database query generation; since the first of these is both the &amp;quot;cleanest&amp;quot; and by far the most time-consuming, we decided only to attempt to apply EBL to this phase of the process. We will thus concentrate exclusively in the following description on the grammar formalism, grammar and parser. As explained in \[4\], the main difficulties derive from the fact that our implementation of the EBL method requires the grammar to be reduced to a set of Horn-clauses: in our earlier experiments with CHAT-80, this was fairly simple, and only involved some minor editing of the code. Here, however, the gap between the grammar and an equivalent &amp;quot;clean&amp;quot; version was non-trivial. This was much more important than the mere increase in its size (~1000 rules, as opposed to 150 for CHAT-80), which in fact caused no problems at all.</Paragraph>
    <Paragraph position="6">  The two major hurdles with regard to the grammar formalism were its non-standard treatment of features and movement. The basic feature operation is not unification, but priority merge: movement is handled not by gap features, but rather by &amp;quot;non-restrictive&amp;quot; rules, in which more than one non-terminal can occur on the left-hand side of the rule as well as the right. Partly due to this, an unusual parsing mechanism is used, in which extra-logical predicates (especially &amp;quot;assert&amp;quot;) play an integral part. To give the flavour of the formalism, the following is a slightly modified version of a typical non-restrictive rule, in this case intended to cover free relatives like the one in &amp;quot;John mentioned a book yesterday which you should read&amp;quot;: s (2,prm=l, fpe (2)) &amp; temp_advp (i, trm=l, fpe (i)) -&gt; temp_advp (dng=0) &amp; s (rel=l) The rule reverses the sequence of temporal adverbial and relative clause, in effect transforming the sentence into &amp;quot;John mentioned a book which you should read yesterday&amp;quot;. The &amp;quot;2&amp;quot; in the first argument position in the left-hand &amp;quot;s&amp;quot; indicates that its features are to be inherited from those in the second constituent on the right-hand side; &amp;quot;prra=l&amp;quot; means that the &amp;quot;13 rra&amp;quot; feature in the inherited set will if necessary be overridden and set to 1.</Paragraph>
    <Paragraph position="7"> As we shall see in section 3.3, the parsing mechanism turns out to be irrelevant for our purposes; all that is significant is the grammar, viewed as a declarative description. We shall accordingly conclude our description of the target system at this point.</Paragraph>
    <Paragraph position="8">  3. Design of the EBL module</Paragraph>
    <Section position="1" start_page="251" end_page="251" type="sub_section">
      <SectionTitle>
3.1 Overall architecture
</SectionTitle>
      <Paragraph position="0"> As explained above, the EBL module can naturally be divided into its compile-time and run-time components, which we will further describe in the following sections.</Paragraph>
      <Paragraph position="1"> For convenience, we will sub-divide the compile-time system into three smaller components. These are the grammar pre-processor, which converts the grammar into a suitable pure Horn-clause representation; the generalizer, which performs the actual extraction of learned rules; and the simplifier, which attempts to reduce them in size by removing unnecessary calls. We now examine each of these in turn.</Paragraph>
    </Section>
    <Section position="2" start_page="251" end_page="251" type="sub_section">
      <SectionTitle>
3.2 The grammar pre-processor
</SectionTitle>
      <Paragraph position="0"> This component performs the job of converting the original grammar into a pure DCG form, in which the first argument of each non-terminal contains a term encoding its derivation history; the motivation for this additional condition will be apparent in the next section. The only non-trivial part of the process, from our viewpoint, was dealing with unrestricted rules, since the other problems had already been taken care of by the normal grammar compiler.</Paragraph>
      <Paragraph position="1"> However, it turned out that this problem could also be solved simply, by first representing the unrestricted rules in Pereira's Extraposition Grammar (XG) format; using the XG compiler from \[2\], it is then straight-forward to turn the grammar into pure Horn-clauses. Conceptually, the XG compiler turns the ~mrestricted grammar into a DCG, where each non-terminal is given an extra pair of arguments (the &amp;quot;extraposition list&amp;quot;), to pass around the additional left-hand constituents. To give an example, the rule quoted at the end of section 2 is represented (again in a slightly edited form) as follows: s(s(rulell2,S,T),Feats l,Sem i, X_in, x (nogap, nonterm~nal,</Paragraph>
      <Paragraph position="3"> {get_feature (Feat s_4, rel, 1 ), put_feature (Feat s_3, prm, i, Feat s_l ), put feature (Feat s_4, trm, I, Feat s_2) }.</Paragraph>
      <Paragraph position="4"> The DCG produced can potentially contain left-recursive rules. However, we shall see in the next section that this causes no problems, since it is not used for normal, unrestricted parsing; the non-terminating branches in the search space can thus never be entered.</Paragraph>
    </Section>
    <Section position="3" start_page="251" end_page="253" type="sub_section">
      <SectionTitle>
3.3 The generalizer
</SectionTitle>
      <Paragraph position="0"> Since a detailed description of the generalizer can be found in \[4\], we will restrict ourselves here to an example and a brief overview. The basic idea is first to define the class of operational goals; by this, we mean the goals which will be allowed to appear on the right-hand-side of learned rules.</Paragraph>
      <Paragraph position="1"> Having done this, a successfully processed example is generalized by (notionally) constructing a derivation tree for it, and then chopping off all the branches rooted in operational goals; the leaves in the new, &amp;quot;generalized&amp;quot; derivation will be the conditions in the learned rule (and thus by construction operational), and the root will be a more general version of the goal corresponding to that in the example. In the simplest (one-level) version of the scheme, operational goals will coincide with lexical ones: thus generalization will be at the word level. An illustrative example is shown in diagram 1.</Paragraph>
      <Paragraph position="2"> A slight refinement is to allow non-lexical operational goals, in particular ones corresponding to NP's. The basic method can now be applied recursively, first to the proof tree corresponding to the entire example, and then to each tree rooted in an operational NP goal; in the latter case, the operationality criterion is once again lexical. This results in the acquisition of two sets of rules, corresponding to the two different operationality criteria: the top-level rules construct S's from NP's and lexical items, and the second-level ones construct NP's from lexical items alone.</Paragraph>
      <Paragraph position="4"> The generalizer is basically a Prolog meta-interpreter, which means that generalization is from a computational perspective essentially the parsing of a query with a DCG; this means that care has to be taken to ensure that parsing efficiency is acceptably high, and even more importantly that infinite recursions are not caused by left-recursive grammar rules. Luckily, there is a simple and uniform way to solve this problem, by exploiting the fact that the first argument in each rule has been set up to hold the derivation history. The query is first run through the normal, &amp;quot;dirty&amp;quot; grammar, to find the intended instantiation of the derivation argument; this is then used to guide DCG parser used by the generalizer, effectively making the &amp;quot;parsing&amp;quot; deterministic. The top-level is thus schematically:  where the predicate names have their obvious meanings.</Paragraph>
    </Section>
    <Section position="4" start_page="253" end_page="254" type="sub_section">
      <SectionTitle>
3.4 The simplifier
</SectionTitle>
      <Paragraph position="0"> The purpose of this module is to attempt to reduce the size of learned rules, in particular calls to feature-manipulation primitives; these make up most of the body of typical rules with on average about 50 calls per rule. The basic mechanism is to take each feature-value, and trace its update history backwards through successive updates.</Paragraph>
      <Paragraph position="1"> Dividing feature-manipulation into &amp;quot;gets&amp;quot; and &amp;quot;puts&amp;quot;, we can optimize in at least the following ways: - Removing &amp;quot;gets&amp;quot; which can already be seen at compile-time to succeed. Since learned rules are compositions of normal ones, this case occurs when one component rule &amp;quot;gets&amp;quot; a feature that an earlier component has &amp;quot;put&amp;quot;. - Removing duplicate copies, when the same &amp;quot;get&amp;quot; occurs more than once in the rule.</Paragraph>
      <Paragraph position="2"> - Reordering the rule body so that all structure-building takes place at the end: this ensures that structure will only be built if the rule succeeds.</Paragraph>
      <Paragraph position="3"> If features were only used for syntax, it would also be possible to perform a further kind of optimization for Slevel rules; having traced each &amp;quot;get&amp;quot; back through the chain of &amp;quot;puts&amp;quot; ending in the feature set it accesses, we could then remove the &amp;quot;puts&amp;quot; altogether. This would represent a very considerable reduction in average rule-size. Semantic processing in the target system is unfortunately not structured so as to allow this, but we think it likely that the method could be applicable in other, similar, contexts.</Paragraph>
      <Paragraph position="4"> The following pseudo-code characterizes the simplification algorithm: Phase 1  1. Combine &amp;quot;gets&amp;quot; and &amp;quot;puts&amp;quot; accessing the same feature set into groups. Replace each group with a corresponding call to get_group or put_group.</Paragraph>
      <Paragraph position="5"> 2. Collect all calls to structure-building routines.</Paragraph>
      <Paragraph position="7"> Go through the body of the rule, passing an alist of annotations; this is used to replace or simplify calls to &amp;quot;get_group&amp;quot;. The alist associates with each feature set a history of its derivation. This is one of</Paragraph>
      <Paragraph position="9"> the feature set was derived from 01 d_ f e at u r e s by the chain of updates Update_set.</Paragraph>
      <Paragraph position="10"> For each literal L in the rule body, do one of the following. i) If L is of the form put group (Old, Updates, New ), then add a suitable entry to the alist, constructed from L and the derivation history of Old.</Paragraph>
      <Paragraph position="11"> ii) IlL is of the form getgroup (Feature_set, Ac c e s s i i st ), replace it with a literal of the form get_gro'up (Ori gi nal, Acce s s_l i st_l ), where: a) Original is the base of the update chain that Feature set belongs to.</Paragraph>
      <Paragraph position="12"> b) Access list_l is derived from Access_list as follows: for each element F=V, if F=VI is in the list of updates, unify V with Vl and throw away F=V.</Paragraph>
      <Paragraph position="13"> iii) If L is of any other form, keep it unaltered. Phase 3  1. Remove duplicate calls.</Paragraph>
      <Paragraph position="14"> 2. Re-expand calls to get_group and put_group. 3. Add structure-building calls to the end of the rule body. 3.5. The pattern-matcher  Since the learned rules acquired by the generalizer in effect comprise a specialized grammar, it would be possible to apply the normal parsing mechanism to them. However, this fails to exploit the grammar's unusually simple structure: the depth of a derivation-tree cannot exceed two, and NP is the only non-lexical category. Thinking about the problem in this way should make the pattern-matcher's construction easy to understand. The rules are compiled into a trie-structure, indexed by constituent category; this can either be &amp;quot;NP&amp;quot;, or some lexical category. The pattern-matcher then locates potentially suitable rules by a kind of non-deterministic LR parsing method, driven by the trie-structure and otherwise optimized to exploit the peculiarities of the situation; a well-formed substring table is used to remember previously located NP's. Our tests indicate that this method is at least five times faster than the target system's normal parser.</Paragraph>
      <Paragraph position="15"> The following pseudo-code characterizes the algorithm. Positions in the input string are marked from 0 to *end*;  * t r i e- root* denotes the root-node of the trie-structure; pointer marks the place we have reached in the input string, t r i e_no de the current position in the rule trie, and nps the sequence of NP's so far located between 0 and p oi nte r. We assume that lexical analysis has already been  performed, so that we can discover by a suitable look-up operation whether or not there is an item of a given lexical category at a given location in the input string.</Paragraph>
      <Paragraph position="16"> Pattern-matching algorithm 1. Set pointer to 0. Set trie-node to *trieroot*. null 2. Set category to the lexical category of the item at pointer.</Paragraph>
      <Paragraph position="17"> 3. Non-determinisfically do one of: a) If there is a tde arc from trie-node to next- null node triggering on category then set trie-node to next-node. Bump pointer andgo back to 2.</Paragraph>
      <Paragraph position="18"> b) If there is atde arc from trie-node to next-node triggering on &amp;quot;NP&amp;quot;, and there is an NP from pointer to next-pointer, set trie-node to next-node, set pointer to nextpointer, push the found NP onto nps, and go back to 2.</Paragraph>
      <Paragraph position="19"> c) If pointer = *end*, and trie-node is a leaf of the trie marked with a rule, then try to apply it to the whole input string, if necessary looking up NP's in sequence from rips.</Paragraph>
      <Paragraph position="20"> The subroutine for finding NP's is similar, though slightly simpler; the variable and constant names correspond in the obvious way to those in the first algorithm. To find an NP from pointer to next-pointer:  1. If the well-formed substring table records that NP's have been searched for at pointer, pick one non-deterministically and return, else 2. Set NP-pointer to pointer. Set NP-trie-node to *NP-trie-root*.</Paragraph>
      <Paragraph position="21"> 3. Set NP-category to the lexical category of the item at NP-pointer.</Paragraph>
      <Paragraph position="22"> 4. Non-deterministically do one of: a) Find a trie arc from NP-trie-node to NP- null next-node UJggering on category. Set NP-trie-node to NP-next-node. Bump NP-pointer and go back to 3.</Paragraph>
      <Paragraph position="23"> b) If there is a reduction rule at NP-trie-node, attempt to apply it to the segment of the input string joining pointer to NP-pointer, and record the result in the well-formed substring table. Then return.</Paragraph>
      <Paragraph position="24"> c) If NP-pointer = pointer and there are no alternatives left, record in the well-formed substring table that NP's have been searched for at pointer, and return with failure.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML