<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0306">
  <Title>An Efficient Algorithm to Induce Minimum Average Lookahead Grammars for Incremental LR Parsing</Title>
  <Section position="2" start_page="0" end_page="3" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Marcus' (1980) Determinism Hypothesis proposed that natural language can be parsed by a mechanism that operates &amp;quot;strictly deterministically&amp;quot; in that it does not simulate a nondeterministic machine. Although the structural details of the deterministic LR  The author would like to thank the Hong Kong Research Grants Council (RGC) for supporting this research in part through research grants RGC6083/99E, RGC6256/00E, and DAG03/04.EG09.</Paragraph>
    <Paragraph position="1"> parsing model we employ in this paper diverge from those of Marcus, fundamentally we adhere to his constraints that (1) all syntactic substructures created are permanent, which prohibits simulating determinism by backtracking, (2) all syntactic sub-structures created for a given input must be part of the output structure assigned to that input,which prohibits memoizing intermediate results as in dynamic programming or beam search, and (3) no temporary syntactic structures are encoded within the internal state of the machine, which prohibits the moving of temporary structures into procedural codes.</Paragraph>
    <Paragraph position="2"> A key issue is that, to give the Determinism Hypothesis teeth, it is necessary to limit the size of the decision window. Otherwise, it is always possible to circumvent the constraints simply by increasing the degree of lookahead or, equivalently, increasing the buffer size (which we might call the degree of &amp;quot;look-behind&amp;quot;); either way, increasing the decision window essentially delays decisions until enough disambiguating information is seen. In the limit, a decision window equal to the sentence length renders the claim of incremental parsing meaningless. Marcus simply postulated that a maximum buffer size of three was sufficient. In contrast, our approach permits greater flexibility and finer gradations, where the average degree of lookahead required can be minimized with the aim of assisting grammar induction.</Paragraph>
    <Paragraph position="3"> Since Marcus' work, a significant body of work on incremental parsing has developed in the sentence processing community, but much of this work has actually suggested models with an increased amount of nondeterminism, often with probabilistic weights (e.g., Narayanan &amp; Jurafsky (1998); Hale (2001)).</Paragraph>
    <Paragraph position="4"> Meanwhile, in the way of formal methods, Tomita (1986) introduced Generalized LR parsing, which offers an interesting hybrid of nondeterministic dynamic programming surrounding LR parsing methods that were originally deterministic.</Paragraph>
    <Paragraph position="5"> Additionally, methods for determinizing and minimizing finite-state machines are well known (e.g., Mohri (2000), B  al &amp; Carton (1968)). However, such methods (a) do not operate at the context-free level, (b) do not directly minimize lookahead, and (c) do not induce grammars under environmental constraints.</Paragraph>
    <Paragraph position="6"> Unfortunately, there has still been relatively little work on automatic learning of grammars for deterministic parsers to date. Hermjakob &amp; Mooney (1997) describe a semi-automatic procedure for learning a deterministic parser from a treebank, which requires the intervention of a human expert in the loop to determine appropriate derivation order, to resolve parsing conflicts between certain actions such as &amp;quot;merge&amp;quot; and &amp;quot;add-into&amp;quot;, and to identify specific features for disambiguating actions. In our earlier work we described a deterministic parser with a fully automatically learned decision algorithm (Wong and Wu, 1999). But unlike our present work, the decision algorithms in both Hermjakob &amp; Mooney (1997) and Wong &amp; Wu (1999) are procedural; there is no explicit representation of the grammar that can be meaningfully inspected.</Paragraph>
    <Paragraph position="7"> Finally, we observe that there are also trainable stochastic shift-reduce parser models (Briscoe and Carroll, 1993), which are theoretically related to shift-reduce parsing, but operate in a highly nondeterministic fashion during parsing.</Paragraph>
    <Paragraph position="8"> We believe the shortage of learning models for deterministic parsing is in no small part due to the difficulty of overcoming computational complexity barriers in the optimization problems this would involve. Many types of factors need to be optimized in learning, because deterministic parsing is much more sensitive to incorrect choice of structural features (e.g., categories, rules) than nondeterministic parsing that employ robustness mechanisms such as weighted charts.</Paragraph>
    <Paragraph position="9"> Consequently, we suggest shifting attention to the development of new methods that directly address the problem of optimizing criteria associated with deterministic parsing, in computationally feasible ways. In particular, we aim in this paper to develop a method that efficiently searches for a parser under a minimum average lookahead cost function.</Paragraph>
    <Paragraph position="10"> It should be emphasized that we view the role of a deterministic parser as one component in a larger model. A deterministic parsing stage can be expected to handle most input sentences, but not all. Other nondeterministic mechanisms will clearly be needed to handle a minority of cases, the most obvious being garden-path sentences.</Paragraph>
    <Paragraph position="11"> In the sections that follow, we first formalize the learning problem. We then describe an efficient approximate algorithm for this task. The operation of this algorithm is illustrated with an example. Finally, we give an analysis of the complexity characteristics of the algorithm.</Paragraph>
    <Paragraph position="13"> where the average lookahead objective function</Paragraph>
    <Paragraph position="15"> that an LR parser for G needs in order to deterministically parse the sample S without any shift-reduce or reduce-reduce conflicts. If G is ambiguous in the sense that it generates more than one parse for any sentence in S,then ^ k (G)=[?] since a conflict is unavoidable no matter how much lookahead is used. In other words, G</Paragraph>
    <Paragraph position="17"> that requires the smallest number of lookaheads on average so as to make parsing S using this subset of G deterministic.</Paragraph>
    <Paragraph position="18"> Note that the constraining grammar G C by nature is not deterministic. The constraining grammar serves merely as an abstract model of environmental constraints that confirm or reject potential parses. Since such feedback is typically quite permissive, the constraining grammar typically allows a great deal of ambiguity. This of course renders the constraining grammar itself poorly suited for an incremental parsing model, since it gives rise to a high degree of nondeterminism in parsing. In other words, we should not expect the constraining grammar alone to contain sufficient information to choose a deterministically parsable grammar.</Paragraph>
    <Paragraph position="19"> For expository simplicity we assume all grammars are in standard context-free form in the discussion that follows, although numerous notational variations, generalizations, and restricted cases are certainly possible. We note also that, although the formalization is couched in terms of standard syntactic phrase structures, there is no reason why one could not employ categories and/or attributes on parse nodes representing semantic features. Doing so would permit the framework to accommodate some semantic information in minimizing lookahead for deterministic parsing, which would be more realistic from a cognitive modeling standpoint. (Of course, further extensions to integrate more complex incremental semantic interpretation mechanisms into this framework could be explored as well.) Finding the minimum average lookahead grammar is clearly a difficult optimization problem. To compute the value of ^ k (G), one needs the LR table for that particular G, which is expensive to compute. Computing the LR table for all G [?] G C would be infeasible. It is a natural conjecture, in fact, that the problem of learning MAL grammars is NP-hard. We therefore seek an approximation algorithm with good performance, as discussed next.</Paragraph>
    <Paragraph position="20"> 3 An efficient approximate algorithm for learning incremental MAL parsers We now describe an approximate method for efficiently learning a MAL grammar. During learning, the MAL grammar is represented simultaneously as both a set of standard production rules as well as an LR parsing table. Thus the learning algorithm outputs an explicit declarative rule set together with a corresponding compiled LR table.</Paragraph>
    <Section position="1" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
3.1 Approximating assumptions
</SectionTitle>
      <Paragraph position="0"> To overcome the obstacles mentioned in the previous section, we make the following approximations: 1. Incremental approximation for MAL rule set computation. We assume that the MAL grammar for a given corpus is approximately equal to the sequential union of the MAL grammar rules for each sentence in the corpus, where the set of MAL grammar rules for each sentence is determined relative to the set of all rules selected from preceding sentences in the corpus.</Paragraph>
      <Paragraph position="1"> 2. Incremental approximation for LR state set computation. We assume that the correct set of LR states for a given set of rules is approximately equal to that obtained by incrementally modifying the LR table and states from a slightly smaller subset of the rules. (Our approach exploits the fact that the correct set of states for LR (k) parsers is always independent of k.) Combining these approximation assumptions enables us to utilize a sentence-by-sentence greedy approach to seeking a MAL grammar. Specifically, the algorithm iteratively computes a minimum average lookahead set of rules for each sentence in the training corpus, accumulating all rules found, while keeping the LR state set and table updated. The full algorithm is fairly complex; we discuss its key aspects here.</Paragraph>
    </Section>
    <Section position="2" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
3.2 Structure of the iteration
</SectionTitle>
      <Paragraph position="0"> , and outputs the LR table for a parser that can deterministically parse the entire training corpus using a minimum average lookahead.</Paragraph>
      <Paragraph position="1"> The algorithm consists of an initialization step followed by an iterative loop. In the initialization step in lines 1-3, we create an empty LR table T, along with an empty set A of parsing action sequences defined as follows. A parsing action sequence A(P) for a given parse P is the sequence of triples (state, input, action) that a shift-reduce parser follows in order to construct P. At any given point, T will hold the LR table computed from the MAL parse of all sentences already processed, and A will hold the corresponding parsing sequences for the MAL parses.</Paragraph>
      <Paragraph position="2"> Entering the loop, we iteratively augment T by adding the states arising from the MAL parse F [?] of each successive sentence in the training corpus and, in addition, cache the corresponding parsing action sequence A(F [?] ) into the set A. This is done by first computing a chart for the sentence, in line 4, by</Paragraph>
      <Paragraph position="4"> ing the standard Earley (1970) procedure. We then call find MAL parse in line 5, to compute the parse that requires minimum average lookahead to resolve ambiguity. The items and states produced by the rules in F [?] are added to the LR table T by calling incremental update LR in line 6, and the parsing action sequence of F [?] is appended to A in line 7.</Paragraph>
      <Paragraph position="5"> Note that the indices of the original states in T are not changed and only items are added into them if need be so that A is not changed by adding items and states to T, and there might be new states introduced which are also indexed.</Paragraph>
      <Paragraph position="6"> By definition, the true MAL grammar does not depend on the order of the sentences the learning algorithm inspects. However, find MAL parser processes the example sentences in order, and attempts to find the MAL grammar sentence by sentence. The order of the sentences impacts the grammar produced by the learning algorithm, so it is not guaranteed to find the true MAL grammar. However the approximation is well motivated particularly when we have large numbers of example sentences.</Paragraph>
    </Section>
    <Section position="3" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
3.3 Incrementally updating the LR table
</SectionTitle>
      <Paragraph position="0"> Given the structure of the loop, it can be seen that efficient learning of the set of MAL rules cannot be achieved without a component that can update the LR table incrementally as each rule is added into the  current MAL grammar. Otherwise, every time a rule is found to be capable of reducing average lookahead and therefore is added into the MAL grammar, the LR table must be recomputed from scratch, which is sufficiently time consuming as to be infeasible when trying to learn a MAL grammar with a realistically large input corpus and/or constraining grammar.</Paragraph>
      <Paragraph position="1"> The incremental update LR function incrementally updates the LR table in an efficient fashion that avoids recomputing the entire table. The inputs to incremental update LR are a pre-existing LR table T, and a set of new rules R to be added.</Paragraph>
      <Paragraph position="2"> This algorithm is derived from the incremental LR parser generator algorithm ilalr and is relatively complex; see Horspool (1988) for details. Historically, work on incremental parser generators first concentrated on LL(1) parsers. Fischer (1980) was first to describe a method for incrementally updating an LR(1) table. Heering et al.(1990) use the principle of lazy evaluation to attack the same problem. Our design of incremental update LR is more closely related to ilalr for the following reasons:  * ilalr has the property that when updating the LR table to contain the newly added rules, it does not change the index of each already existing state. This is important for our task as the slightest change in the states might affect significantly the parsing sequences of the sen- null tences that have already been processed.</Paragraph>
      <Paragraph position="3"> * Although worst case complexity for ilalr is exponential in the number of rules in the grammar, empirically it is quite efficient in practical use. Heuristics are used to improve the speed of the algorithm, and as we do not need to compute lookahead sets, the speed of the algorithm can be further improved.</Paragraph>
      <Paragraph position="4">  The method is approximate, and may yield slight, acceptable deviations from the optimal table. ilalr is not an exact LR table generator in the sense that it may create states that should not exist and may miss some states that should exist. The algorithm is based on the assumption that most states in the original LR table occur with the same kernel items in the updated LR table. Empirically, the assumption is valid as the proportion of superfluous states is typically only in the 0.1% to 1.3% range.</Paragraph>
    </Section>
    <Section position="4" start_page="1" end_page="3" type="sub_section">
      <SectionTitle>
3.4 Finding minimum average lookahead parses
</SectionTitle>
      <Paragraph position="0"> parses The function find MAL parse selects the full parse F[?] of a given sentence that requires the least av-</Paragraph>
      <Paragraph position="2"> reduce or reduce-reduce conflicts with a set A of parsing action sequences, such that F[?] is a sub-set of a chart C. The inputs to find MAL parse, more specifically, are a chart C containing all the partial parses in the input sentence, and the set A containing the parsing action sequences of the MAL parse of all sentences processed so far. The algorithm operates by constructing a graph-structured stack of the same form as in GLR parsing (Tomita, 1986)(Tomita and Ng, 1991) while simultaneously computing the minimum average lookahead. Note that Tomita's original method for constructing the graph-structured stack has exponential time com-</Paragraph>
      <Paragraph position="4"> ,inwhichn and r are the length of the sentence and the length of the longest rhs of any rule in the grammar. As a result, Tomita's algorithm  parenrightbig for grammars in Chomsky normal form but is potentially exponential when productions are of unrestricted length, which in practice is the case with most parsing problems. We follow Kipps (1991) in modifying Tomita's algorithm to allow it to run in time proportional to n  for grammars with productions of arbitrary length. The most time consuming part in Tomita's algorithm is when reduce actions are executed in which the ancestors of the current node have to be found incurring time complexity n r . To avoid this we employ an ancestor table to keep the ancestors of each node in the GLR forest which is updated dynamically as the GLR forest is being constructed. This modification brings down the time complexity of reduce actions to n  in the worst case, and allows the function build GLR forest to construct the graph-structured stack in O parenleftbig n  parenrightbig . Aside from constructing the graph-structured stack, we compute the average lookahead for each LR state transition taken during the construction. Whenever there is a shift or reduce action in the algorithm, a new vertex for the graph-structured stack is generated, and the function compute average lookahead is called to ascertain the average lookahead of the new vertex. Finally, reconstruct MAL parse is called to recover the full parse F[?] for the MAL parsing action sequence. null Figure 2 shows the compute average lookahead function, which estimates the average lookahead of a vertex v generated by an LR state transition r. To facilitate computations involving average lookahead, we use a 6-tuple (i,s,a,k,r,d) instead of the more common triple form (i,s,a) to represent each vertex in the graph-structured stack, where: * i: The index of the right side of the coverage of the vertex. The vertices with the same right side i will be kept in U</Paragraph>
      <Paragraph position="6"> * s: The state in which the vertex is in.</Paragraph>
      <Paragraph position="7"> * a: The ancestor table of the vertex.</Paragraph>
      <Paragraph position="8"> * k: The average lookahead information, in the  form of a pair (m,l) where l is the minimum average lookahead of all paths leading from the root to this vertex and m is the number of state transitions in that MAL path.</Paragraph>
      <Paragraph position="9"> * r: The parsing action that generates the vertex along the path that needs minimum average lookahead. r is a triple (d  (1) S - NP VP (2) VP - vNP (3) VP - vPP (4) VP - v (5) VP - vp (6) VP - vdet (7) PP - pNP (8) NP - NP PP (9) NP - n (10) NP - det n (11) VP - VP n d are the index of vertices and f is an action, and the set A containing the parsing action sequences of the MAL parse of all sentences processed so far. Let v =(i,s,a,k,r,d) be the new vertex with index d, and let v  . The function proceeds by first computing the lookahead needed to resolve conflicts between r and A. Next we check whether v is a packed node and initialize k in v; if not, k is initialized to (0,0), and otherwise it is copied from the packed node. We then compute the average lookahead needed to go from v prime to v and check whether it provides a more economical way to resolve conflicts. The average lookahead of a vertex v generated by applying an action f on vertex v prime can be computed from k prime of v prime and the lookahead needed to generate v from v prime . v can be generated by applying different actions on different vertices and k keeps the one that needs minimum average lookahead and f keeps that action. null Finally, the reconstruct MAL parse function is called after construction of the entire graph-structured stack is complete in order to recover the full minimum average lookahead parse tree. We assume the grammar has only one start symbol and rebuild the parse tree from the state that is labelled with the start symbol.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>