<?xml version="1.0" standalone="yes"?> <Paper uid="P03-1026"> <Title>A Tabulation-Based Parsing Method that Reduces Copying</Title>
<Section position="3" start_page="0" end_page="0" type="metho">
<SectionTitle> 2 Why Prolog? </SectionTitle>
<Paragraph position="0"> Apology! This paper is not an attempt to show that a Prolog-based parser could be as fast as a phrase-structure parser implemented in an imperative programming language such as C. Indeed, if the categories of a grammar are discretely ordered, chart edges can be used for further parsing in situ, i.e., with no copying out of the table, in an imperative programming language. Nevertheless, when the categories are partially ordered, as in unification-based grammars, there are certain breadth-first parsing control strategies that require even imperatively implemented parsers to copy edges out of their tables. What is more important is the tradeoff at stake between efficiency and expressiveness. By improving the performance of Prolog-based parsing, the computational cost of its extra available expressive devices is effectively reduced. The alternative, simple phrase-structure parsing, or extended phrase-structure-based parsing with categories such as typed feature structures, is extremely cumbersome for large-scale grammar design. Even in the handful of instances in which it does seem to have been successful, which includes the recent HPSG English Resource Grammar and a handful of Lexical-Functional Grammars, the results are by no means graceful, not at all modular, and arguably not reusable by anyone except their designers.</Paragraph>
<Paragraph position="1"> The particular interest in Prolog's expressiveness arises, of course, from the interest in generalized context-free parsing beginning with definite clause grammars (Pereira and Shieber, 1987), as an instance of a logic programming control strategy. The connection between logic programming and parsing is well known and has been a very fruitful one for parsing, particularly with respect to the application of logic programming transformations (Stabler, 1993) and of constraint logic programming techniques to more recent constraint-based grammatical theories. Relational predicates also make grammars more modular and readable than pure phrase-structure-based grammars.</Paragraph>
<Paragraph position="2"> Commercial Prolog implementations are quite difficult to beat with imperative implementations when it is general logic programming that is required. This is no less true with respect to the more recent data structures found in lexicalized grammatical theories. A recent comparison (Penn, 2000) between a version of ALE (which is written in Prolog) that reduces typed feature structures to Prolog term encodings and LiLFeS (Makino et al., 1998), the fastest imperative re-implementation of an ALE-like language, showed that ALE was slightly over 10 times faster on large-scale parses with its HPSG reference grammar than LiLFeS was with a slightly more efficient version of that grammar.</Paragraph>
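<Paragraph position="3"> The term encoding mentioned above can be illustrated with a small sketch in Prolog. This is not ALE's actual encoding; the types, features, and predicate names below are hypothetical. The idea is that, once the set of appropriate features for each type is fixed, a typed feature structure can be flattened into a term whose functor is its type and whose arguments are its feature values, so that feature-structure unification reduces to ordinary first-order term unification:

  % A minimal sketch of term encoding (not ALE's actual scheme).
  % A type sign with features CAT and AGR is flattened to
  % sign(Cat,Agr); a type agr with features PER and NUM to
  % agr(Per,Num).

  % An underspecified third-person sign:
  third_person_sign(sign(_Cat, agr(third, _Num))).

  % An underspecified plural noun-phrase sign:
  plural_np_sign(sign(np, agr(_Per, plural))).

  % Their unification, sign(np, agr(third, plural)), is computed
  % by plain term unification:
  demo(FS) :-
      third_person_sign(FS),
      plural_np_sign(FS).
</Paragraph>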
</Section> <Section position="4" start_page="0" end_page="0" type="metho">
<SectionTitle> 3 Empirical Efficiency </SectionTitle>
<Paragraph position="0"> Whether this algorithm will outperform standard Prolog parsers is also largely an empirical question, because:
1. one of the two copies is kept on the heap itself and not released until the end of the parse. For large parses over large data structures, that can increase the size of the heap significantly, and will result in a greater number of cache misses and page swaps.
2. the new algorithm also requires an off-line partial evaluation of the grammar rules that increases the number of rules that must be iterated through at run-time during depth-first closure. This can result in redundant operations being performed among rules and their partially evaluated instances to match daughter categories, unless those rules and their partial evaluations are folded together with local disjunctions to share as much compiled code as possible.</Paragraph>
<Paragraph position="1"> A preliminary empirical evaluation is presented in Section 8.</Paragraph>
<Paragraph position="2"> Oepen and Carroll (2000), by far the most comprehensive attempt to profile and optimize the performance of feature-structure-based grammars, also found copying to be a significant issue in their imperative implementations of several HPSG parsers, to the extent that it even warranted recomputing unifications in places, and modifying the manner in which active edges are used in their fastest attempt (called hyper-active parsing). The results of the present study can only cautiously be compared to theirs so far, because of our lack of access to the successive stages of their implementations and the lack of a common grammar ported to all of the systems involved. Some parallels can be drawn, however, particularly with respect to the utility of indexing and the maintenance of active edges, which suggest that the algorithm presented below makes Prolog behave in a more "C-like" manner on parsing tasks.</Paragraph>
</Section> <Section position="5" start_page="0" end_page="0" type="metho">
<SectionTitle> 4 Theoretical Benefits </SectionTitle>
<Paragraph position="0"> The principal benefits of this algorithm are that:
1. it reduces copying, as mentioned above.
2. it does not suffer from a problem that textbook algorithms suffer from when running under non-ISO-compatible Prologs (which is to say most of them): on such Prologs, asserted empty category edges that can match leftmost daughter descriptions of rules are not able to combine with the outputs of those rules.
3. keeping a copy of the chart on the heap allows more sophisticated indexing strategies to be applied to memoized categories; such strategies would otherwise be overwhelmed by the cost of copying an edge before matching it against an index. Indexing is also briefly considered in Section 8.</Paragraph>
<Paragraph position="1"> Indexing is not the same thing as filtering (Torisawa and Tsujii, 1995), which extracts an approximation of the grammar to parse with first, in order to increase the likelihood of early unification failure. If the filter parse succeeds, the system then proceeds to perform the entire unification operation, as if the approximation had never been applied. Malouf et al. (2000) cite an improvement of 35-45% using a "quick check" algorithm that they appear to believe finds the optimal selection of n feature paths for quick-checking. It is in fact only a greedy approximation: the underlying optimization problem is exponential in the number of feature paths used for the check.</Paragraph>
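<Paragraph position="2"> The quick-check idea can be sketched as follows. The representation (nested Feature-Value lists) and the particular feature paths chosen are assumptions of this illustration, not Malouf et al.'s code:

  :- use_module(library(lists)).  % member/2, memberchk/2

  % Feature structures here are nested Feature-Value lists, e.g.
  %   [cat-np, agr-[per-third, num-plural]].

  % path_value(+Path, +FS, -Value): follow a feature path.
  path_value([], Value, Value).
  path_value([F|Fs], FS, Value) :-
      memberchk(F-V, FS),
      path_value(Fs, V, Value).

  % Two distinct atomic values can never unify.
  clash(V1, V2) :-
      atom(V1), atom(V2), V1 \== V2.

  % quick_check(+FS1, +FS2): fail fast if any of the selected
  % feature paths reveals a clash; the full unification is
  % attempted only if this cheap test survives.
  quick_check(FS1, FS2) :-
      \+ (  member(Path, [[cat], [agr,num]]),  % the selected paths
            path_value(Path, FS1, V1),
            path_value(Path, FS2, V2),
            clash(V1, V2)
         ).

Choosing which paths to check is precisely the optimization problem discussed above.</Paragraph>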
<Paragraph position="3"> Penn (1999) cites an improvement of 15-40% simply by re-ordering the sister features of only two types in the signature of the ALE HPSG grammar during normal unification.</Paragraph>
<Paragraph position="4"> True indexing re-orders required operations without repeating them. Penn and Popescu (1997) build an automaton-based index for surface realization with large lexica, and suggest an extension to statistically trained decision trees. Ninomiya et al. (2002) take a more computationally brute-force approach to index very large databases of feature structures for some kind of information retrieval application. Neither of these is suitable for indexing chart edges during parsing, because the edges are discarded after every sentence, before the expense of building the index can be satisfactorily amortized. There is a fair amount of relevant work in the database and programming language communities, but many of the results are negative (Graf, 1996): very little time can be spent on constructing the index.</Paragraph>
<Paragraph position="5"> A moment's thought reveals that the very notion of an active edge, tabulating the well-formed prefixes of rule right-hand sides, presumes that copying is not a significant enough issue to merit the overhead of more specialized indexing. While the present paper proceeds from Carpenter's algorithm, in which no active edges are used, it will become clear from our evaluation that active edges, or their equivalent within a more sophisticated indexing strategy, are an issue that should be re-investigated now that the cost of copying can provably be reduced in Prolog.</Paragraph>
</Section> <Section position="6" start_page="0" end_page="0" type="metho">
<SectionTitle> 5 The Algorithm </SectionTitle>
<Paragraph position="0"> In this section, it will be assumed that the phrase-structure grammar to be parsed with obeys the following property:
Definition 1 An (extended) context-free grammar, G, is empty-first-daughter-closed (EFD-closed) iff, for every production rule, X0 → X1 ... Xn in G, n ≥ 1 and there are no empty productions (empty categories) derivable from the non-terminal X1.</Paragraph>
<Paragraph position="1"> The next section will show how to transform any phrase-structure grammar into an EFD-closed grammar.</Paragraph>
<Paragraph position="2"> This algorithm, like Carpenter's algorithm, proceeds breadth-first, right-to-left through the string, at each step applying the grammar rules depth-first, matching daughter categories left-to-right. The first step is then to reverse the input string, compute its length (performed by reverse_count/5), and initialize the chart, as sketched below.</Paragraph>
<Paragraph position="3"> Two copies of the chart are used in this presentation. One is represented by a term chart(E1,...,EL), where the i-th argument holds the list of edges whose left node is i. Edges at the beginning of the chart (left node 0) do not need to be stored in this copy, nor do edges beginning at the end of the chart (specifically, empty categories with left node and right node Length). This will be called the term copy of the chart. The other copy is kept in a dynamic predicate, edge/3, as in a textbook Prolog chart parser. This will be called the asserted copy of the chart.</Paragraph>
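<Paragraph position="4"> The following is a reconstruction of this first step, offered only as a sketch: reverse_count/5, build/3, and the chart(E1,...,EL) term copy are named in the text, but the clause bodies, the chart arity, and the top-level predicate parse/2 are our own assumptions:

  % reverse_count(+Ws, +Acc, -Reversed, +N0, -N): reverse the
  % input string and count its length in a single pass.
  reverse_count([], Ws, Ws, N, N).
  reverse_count([W|Ws], Acc, Reversed, N0, N) :-
      N1 is N0 + 1,
      reverse_count(Ws, [W|Acc], Reversed, N1, N).

  % Hypothetical top level. The term copy is created with unbound
  % arguments; argument I is bound to its final edge list only when
  % the right-to-left pass of build/3 (sketched below) reaches
  % node I. Since neither node 0 nor node Length is stored, an
  % arity of Length-1 suffices.
  parse(Words, Length) :-
      reverse_count(Words, [], Reversed, 0, Length),
      Arity is Length - 1,
      functor(Chart, chart, Arity),
      build(Reversed, Length, Chart).
</Paragraph>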
<Paragraph position="5"> Neither copy of the chart stores empty categories. These are assumed to be available in a separate predicate, empty_cat/1. Since the grammar is EFD-closed, no grammar rule can produce a new empty category. Lexical items are assumed to be available in the predicate lex/2.</Paragraph>
<Paragraph position="6"> The predicate build/3 actually builds the chart (its base clause is simply build([],_,_)); a sketch of it and of the other chart-building predicates appears at the end of this section. The precondition upon each call to build(Ws,R,Chart) is that Chart contains the complete term copy of the non-loop edges of the parsing chart from node R to the end, while Ws contains the (reversed) input string from node R to the beginning. Each pass through the first clause of build/3 then decrements Right, and seeds the chart with every category for the lexical item that spans from R-1 to R. The predicate add_edge/4 actually adds the lexical edge to the asserted copy of the chart, and then closes the chart depth-first under rule applications in a failure-driven loop. When it has finished, if Ws is not empty (RMinus1 is not 0), then build/3 retracts all of the new edges from the asserted copy of the chart (with rebuild_edges/2, described below) and adds them to the (R-1)st argument of the term copy before continuing to the next word.</Paragraph>
<Paragraph position="7"> add_edge/4 matches non-leftmost daughter descriptions either from the term copy of the chart, thus eliminating the need for additional copying of non-empty edges, or from empty_cat/1. Whenever it adds an edge, however, it adds it to the asserted copy of the chart. This is necessary because add_edge/4 works in a failure-driven loop, and any edges added to the term copy of the chart would be removed during backtracking. The remaining daughters of a rule are matched by the goal match_rest(Dtrs,R,Chart,Mother,L).</Paragraph>
<Paragraph position="8"> Note that we never need to be concerned with updating the term copy of the chart during the operation of add_edge/4, because EFD-closure guarantees that all non-leftmost daughters must have left nodes strictly greater than the Left passed as the first argument to add_edge/4.</Paragraph>
<Paragraph position="9"> Moving new edges from the asserted copy to the term copy is straightforwardly achieved by retracting them with rebuild_edges/2. The two copies required by this algorithm are thus: 1) copying a new edge to the asserted copy of the chart by add_edge/4, and 2) copying new edges from the asserted copy of the chart to the term copy of the chart by rebuild_edges/2. The asserted copy is only being used to protect the term copy from being unwound by backtracking.</Paragraph>
<Paragraph position="10"> Asymptotically, this parsing algorithm has the same cubic complexity as standard chart parsers; only its memory consumption and copying behavior are different.</Paragraph>
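<Paragraph position="11"> Pulling the pieces of this section together, the four predicates can be reconstructed as below. The signatures (build/3, add_edge/4, match_rest/5, rebuild_edges/2, edge/3, empty_cat/1, lex/2) are as given in the text; the clause bodies and the rule(Mother,Dtrs) representation of grammar rules (daughters listed leftmost-first) are a sketch of our own, not the authors' code:

  :- use_module(library(lists)).  % member/2
  :- dynamic edge/3.

  % build(+Ws, +R, +Chart): Ws is the reversed input from node R
  % leftwards; Chart already holds the term copy from node R to
  % the end of the chart.
  build([], _, _).
  build([W|Ws], R, Chart) :-
      RMinus1 is R - 1,
      (   lex(W, Cat),                    % failure-driven loop:
          add_edge(RMinus1, R, Cat, Chart),
          fail                            % seed every lexical edge
      ;   true
      ),
      (   RMinus1 =:= 0 -> true
      ;   rebuild_edges(RMinus1, Edges),  % move the new edges into
          arg(RMinus1, Chart, Edges),     % the (R-1)st argument
          build(Ws, RMinus1, Chart)
      ).

  % add_edge(+Left, +Right, +Cat, +Chart): record the edge in the
  % asserted copy (so that it survives backtracking), then close
  % the chart depth-first under every rule whose leftmost daughter
  % matches it.
  add_edge(Left, Right, Cat, Chart) :-
      assertz(edge(Left, Right, Cat)),
      rule(Mother, [Cat|Dtrs]),
      match_rest(Dtrs, Right, Chart, Mother, Left).

  % match_rest(+Dtrs, +R, +Chart, +Mother, +L): match non-leftmost
  % daughters left-to-right, either from the term copy (bound from
  % node R rightwards by the precondition) or from empty_cat/1.
  match_rest([], R, Chart, Mother, L) :-
      add_edge(L, R, Mother, Chart).      % all daughters matched
  match_rest([Dtr|Dtrs], R, Chart, Mother, L) :-
      (   arg(R, Chart, Edges),
          member(edge(Dtr, NewR), Edges),
          match_rest(Dtrs, NewR, Chart, Mother, L)
      ;   empty_cat(Dtr),
          match_rest(Dtrs, R, Chart, Mother, L)
      ).

  % rebuild_edges(+Left, -Edges): retract the newly asserted edges
  % with this left node and collect them for the term copy.
  rebuild_edges(Left, Edges) :-
      (   retract(edge(Left, R, Cat))
      ->  Edges = [edge(Cat, R)|Rest],
          rebuild_edges(Left, Rest)
      ;   Edges = []
      ).

On this reconstruction, a successful parse of the whole string surfaces as an asserted edge(0,Length,Cat) fact, since edges with left node 0 are never moved into the term copy.</Paragraph>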
</Section> <Section position="7" start_page="0" end_page="0" type="metho">
<SectionTitle> 6 EFD-closure </SectionTitle>
<Paragraph position="0"> To convert an (extended) context-free grammar to one in which EFD-closure holds, we must partially evaluate those rules for which empty categories could be the first daughter over the available empty categories. If all daughters of some rule can be empty categories, then that rule may create new empty categories, over which the rules must be partially evaluated again, and so on.</Paragraph>
<Paragraph position="1"> The closure algorithm is presented in Figure 1 in pseudo-code and assumes the existence of six auxiliary lists:
- Es: a list of empty categories over which partial evaluation is to occur,
- Rs: a list of rules to be used in partial evaluation,
- NEs: new empty categories, created by partial evaluation (when all daughters have matched empty categories),
- NRs: new rules, created by partial evaluation (consisting of a rule to the leftmost daughter of which an empty category has applied, with only its non-leftmost daughters remaining),
- EAs: an accumulator of empty categories already partially evaluated once on Rs, and
- RAs: an accumulator of rules already used in partial evaluation once on Es.</Paragraph>
<Paragraph position="2">
initialize Es to the empty categories of the grammar;
initialize Rs to the rules of the input grammar;
initialize the other four lists to [];
loop:
   while Es =/= [] do
      for each E in Es do
         for each R in Rs do
            unify E with the leftmost unmatched category description of R;
            if it does not match, continue;
            if the leftmost category was rightmost (unary rule),
               then add the new empty category to NEs,
               otherwise add the new rule (with leftmost category marked
                  as matched) to NRs
         od
      od;
      move Es to EAs; move NEs to Es;
      add RAs back to Rs, emptying RAs
   od;
   if NRs = [],
      then end: EAs are the closed empty categories, Rs are the closed rules,
      else move Rs to RAs; move NRs to Rs; move EAs to Es;
           go to loop
</Paragraph>
<Paragraph position="3"> Each pass through the while-loop attempts to match the empty categories in Es against the leftmost daughter description of every rule in Rs. If new empty categories are created in the process (because some rule in Rs is unary and its daughter matches), they are also attempted; EAs holds the others until they are done. Every time a rule's leftmost daughter matches an empty category, this effectively creates a new rule consisting only of the non-leftmost daughters of the old rule. In a unification-based setting, these non-leftmost daughters could also have some of their variables instantiated to information from the matching empty category.</Paragraph>
<Paragraph position="4"> If the while-loop terminates (see the next section), then the rules of Rs are stored in an accumulator, RAs, until the new rules, NRs, have had a chance to match their leftmost daughters against all of the empty categories that Rs has seen. Partial evaluation with NRs may create new empty categories that Rs has never seen and must therefore be applied to. This is taken care of within the while-loop, where RAs are added back to Rs for the second and subsequent passes through the loop.</Paragraph>
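<Paragraph position="5"> For the special case of atomic (classical CFG) categories, where unifying two category descriptions amounts to testing equality, the procedure of Figure 1 can be rendered executably as follows. The predicate names, the rule(Mother,Dtrs) representation, and the argument threading are our own illustration, not the authors' implementation:

  :- use_module(library(lists)).  % append/3

  % efd_closure(+EmptyCats, +Rules, -ClosedEs, -ClosedRs)
  efd_closure(EmptyCats, Rules, ClosedEs, ClosedRs) :-
      efd_close(EmptyCats, Rules, [], [], [], [], ClosedEs, ClosedRs).

  % efd_close(+Es, +Rs, +NEs, +NRs, +EAs, +RAs, -ClosedEs, -ClosedRs):
  % the six lists play the roles described above.
  efd_close([], Rs, NEs, NRs, EAs, RAs, ClosedEs, ClosedRs) :-
      append(Rs, RAs, AllRs),            % add shelved rules back to Rs
      (   NEs \== []                     % new empty cats: another pass
      ->  efd_close(NEs, AllRs, [], NRs, EAs, [], ClosedEs, ClosedRs)
      ;   NRs == []                      % fixpoint reached
      ->  ClosedEs = EAs, ClosedRs = AllRs
      ;   % shelve the old rules and evaluate the new rules over all
          % empty categories seen so far
          efd_close(EAs, NRs, [], [], [], AllRs, ClosedEs, ClosedRs)
      ).
  efd_close([E|Es], Rs, NEs0, NRs0, EAs, RAs, ClosedEs, ClosedRs) :-
      eval_empty(Rs, E, NEs0, NEs, NRs0, NRs),
      efd_close(Es, Rs, NEs, NRs, [E|EAs], RAs, ClosedEs, ClosedRs).

  % eval_empty(+Rs, +E, +NEs0, -NEs, +NRs0, -NRs): match the empty
  % category E against the leftmost daughter of every rule.
  eval_empty([], _, NEs, NEs, NRs, NRs).
  eval_empty([rule(M, [D|Ds])|Rs], E, NEs0, NEs, NRs0, NRs) :-
      (   D == E
      ->  (   Ds == []                   % unary rule: new empty cat
          ->  NEs1 = [M|NEs0], NRs1 = NRs0
          ;   NEs1 = NEs0, NRs1 = [rule(M, Ds)|NRs0]
          )
      ;   NEs1 = NEs0, NRs1 = NRs0
      ),
      eval_empty(Rs, E, NEs1, NEs, NRs1, NRs).

  % For an assumed toy grammar in which b is an empty category:
  % ?- efd_closure([b], [rule(s,[b,c]), rule(c,[t])], Es, Rs).
  % Es = [b],
  % Rs = [rule(s,[c]), rule(s,[b,c]), rule(c,[t])].

As the next section notes, this loop need not terminate when the grammar licenses infinitely many distinct empty categories.</Paragraph>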
</Section> <Section position="8" start_page="0" end_page="0" type="metho">
<SectionTitle> 7 Termination Properties </SectionTitle>
<Paragraph position="0"> The parsing algorithm itself always terminates because the leftmost daughter always consumes input.</Paragraph>
<Paragraph position="1"> Off-line EFD-closure may not terminate when infinitely many new empty categories can be produced by the production rules.</Paragraph>
<Paragraph position="2"> We say that an extended context-free grammar, by which classical CFGs as well as unification-based phrase-structure grammars are implied, is ε-offline-parseable (ε-OP) iff the empty string is not infinitely ambiguous in the grammar. Every ε-OP grammar can be converted to a weakly equivalent grammar which has the EFD-closure property. The proof of this statement, which establishes the correctness of the algorithm, is omitted for brevity.</Paragraph>
<Paragraph position="3"> EFD-closure bears some resemblance in its intentions to Greibach Normal Form, but: (1) it is far more conservative in the number of extra rules it must create; (2) it is linked directly to the derivable empty categories of the grammar, whereas GNF conversion proceeds from an already ε-eliminated grammar (the EFD-closure of any ε-free grammar, in fact, is the grammar itself); and (3) GNF is rather more difficult to define in the case of unification-based grammars than with classical CFGs, and in the one generalization we are aware of (Dymetman, 1992), EFD-closure is actually not guaranteed by it; Dymetman's generalization, moreover, only works for classically offline-parseable grammars.</Paragraph>
<Paragraph position="4"> In the case of non-ε-OP grammars, a standard bottom-up parser without EFD-closure would not terminate at run-time either. Our new algorithm is thus neither better nor worse than a textbook bottom-up parser with respect to termination. A remaining topic for consideration is the adaptation of this method to strategies with better termination properties than the pure bottom-up strategy.</Paragraph>
</Section> </Paper>