<?xml version="1.0" standalone="yes"?>
<Paper uid="P94-1040">
<Title>RELATING COMPLEXITY TO PRACTICAL PERFORMANCE IN PARSING WITH WIDE-COVERAGE UNIFICATION GRAMMARS</Title>
<Section position="4" start_page="287" end_page="287" type="metho">
<SectionTitle> 2. THE PARSERS </SectionTitle>
<Paragraph position="0"> The three parsers in this study are: a bottom-up left-corner parser, a (non-deterministic) LR parser, and an LR-like parser based on an algorithm devised by Schabes (1991). All three parsers accept grammars written in the ANLT formalism (Briscoe et al., 1987a), and the first two are distributed as part of the ANLT package. The parsers create parse forests (Tomita, 1987) that incorporate subtree sharing (in which identical sub-analyses are shared between differing superordinate analyses) and node packing (in which sub-analyses covering the same portion of input whose root categories are in a subsumption relationship are merged into a single node).</Paragraph>
</Section>
<Section position="5" start_page="287" end_page="287" type="metho">
<SectionTitle> THE BOTTOM-UP LEFT-CORNER PARSER </SectionTitle>
<Paragraph position="0"> The bottom-up left-corner (BU-LC) parser operates left-to-right and breadth-first, storing partial (active) constituents in a chart; Carroll (1993) gives a full description. Although pure bottom-up parsing is not usually thought of as providing high performance, the actual implementation achieves very good throughput (see section 4) due to a number of significant optimisations, amongst which are:
* Efficient rule invocation from cheap (static) rule indexing, using discrimination trees keyed on the feature values in each rule's first daughter to interleave rule access with unification and also to share unification results across groups of rules.</Paragraph>
<Paragraph position="1"> * Dynamic indexing of partial and complete constituents on category types to avoid attempting unification or subsumption operations which static analysis shows will always fail.</Paragraph>
<Paragraph position="2"> * Dynamic storage minimisation, deferring structure copying--e.g. required by the unification operation or by constituent creation--until absolutely necessary (e.g. unification success or parse success, respectively).</Paragraph>
<Paragraph position="3"> The optimisations improve throughput by a factor of more than three.</Paragraph>
</Section>
<Section position="6" start_page="287" end_page="287" type="metho">
<SectionTitle> THE NON-DETERMINISTIC LR PARSER </SectionTitle>
<Paragraph position="0"> Briscoe & Carroll (1993) describe a methodology for constructing an LR parser for a unification-based grammar, in which a CF 'backbone' grammar is automatically constructed from the unification grammar, a parse table is constructed from the backbone grammar, and a parser is driven by the table and further controlled by unification of the 'residue' of features in the unification grammar that are not encoded in the backbone. In this parser, the LALR(1) technique (Aho, Sethi & Ullman, 1986) is used, in conjunction with a graph-structured stack (Tomita, 1987), adapting for unification-based parsing Kipps' (1989) Tomita-like recogniser, which achieves polynomial complexity on input length through caching.</Paragraph>
<Paragraph position="1"> On each reduction the parser performs the unifications specified by the unification grammar version of the CF backbone rule being applied.</Paragraph>
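As a minimal illustrative sketch of this reduce-and-unify step (in Python, with hypothetical names; flat feature dictionaries stand in for the grammar's nested categories, and a plain list stands in for the graph-structured stack, so this is only a picture of the idea, not the ANLT implementation):

    # Sketch: an on-line reduce step in which the feature 'residue' of the
    # unification-grammar rule is checked as soon as the corresponding CF
    # backbone rule is reduced.  Categories are simplified to flat dicts.

    def unify_flat(cat1, cat2):
        """Return the merge of two flat categories, or None on a feature clash."""
        result = dict(cat1)
        for feat, val in cat2.items():
            if feat in result and result[feat] != val:
                return None          # conflicting values: unification fails
            result[feat] = val
        return result

    def reduce_online(stack, rule):
        """Pop the rule's daughters off a (linear, not graph-structured) stack,
        unify them with the rule's daughter categories, and push the mother.
        Returns the new stack, or None if any unification fails."""
        mother_cat, daughter_cats = rule
        n = len(daughter_cats)
        if len(stack) < n:
            return None
        popped, rest = stack[-n:], stack[:-n]
        for found, wanted in zip(popped, daughter_cats):
            if unify_flat(found, wanted) is None:
                return None          # the reduction is pruned immediately (on-line)
        # The mother here is simply the rule's mother category; the real parser
        # also propagates feature bindings from the daughters.
        return rest + [dict(mother_cat)]

    # Toy rule S -> NP VP, where the NP daughter must be singular.
    rule = ({"cat": "S"}, [{"cat": "NP", "num": "sg"}, {"cat": "VP"}])
    stack = [{"cat": "NP", "num": "sg"}, {"cat": "VP", "vform": "fin"}]
    print(reduce_online(stack, rule))   # -> [{'cat': 'S'}]
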
<Paragraph position="2"> This constitutes an on-line parsing algorithm. In the general case, the off-line variant (in which all unifications are deferred until the complete CF parse forest has been constructed) is not guaranteed to terminate; indeed, it usually does not do so with the ANLT grammar. However, a drawback to the on-line algorithm is that a variant of Kipps' caching cannot be used, since the cache must necessarily assume that all reductions at a given vertex with all rules with the same number of daughters build exactly the same constituent every time; in general this is not the case when the daughters are unification categories. A weaker kind of cache on partial analyses (and thus unification results) was nevertheless found to be necessary in the implementation, to avoid duplication of unifications; this sped the parser up by a factor of about three, at little space cost.</Paragraph>
</Section>
<Section position="7" start_page="287" end_page="288" type="metho">
<SectionTitle> THE COMPILED-EARLEY PARSER </SectionTitle>
<Paragraph position="0"> The Compiled-Earley (CE) parser is based on a predictive chart-based CF parsing algorithm devised by Schabes (1991) which is driven by a table compiling out the predictive component of Earley's (1970) parser. The size of the table is related linearly to the size of the grammar (unlike the LR technique). Schabes demonstrates that this parser always takes fewer steps than Earley's, although its time complexity is the same: O(n^3). The space complexity is also cubic, since the parser uses Earley's representation of parse forests.</Paragraph>
<Paragraph position="1"> The incorporation of unification into the CE parser follows the methodology developed for unification-based LR parsing described in the previous section: a table is computed from a CF 'backbone', and a parser, augmented with on-line unification and feature-based subsumption operations, is driven by the table. To allow meaningful comparison with the LR parser, the CE parser uses a one-word lookahead version of the table, constructed using a modified LALR technique (Carroll, 1993).[3]</Paragraph>
<Paragraph position="2"> To achieve the cubic time bound, the parser must be able to retrieve in unit time all items in the chart having a given state, and start and end position in the input string. However, the obvious array implementation, for say a ten-word sentence with the ANLT grammar, would contain almost 500,000 elements. For this reason, the implementation employs a sparse representation for the array, since only a small proportion of the elements are ever filled. In this parser, the same sort of duplication of unifications occurs as in the LR parser, so lists of partial analyses are cached in the same way.</Paragraph>
</Section>
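A minimal sketch of such a sparse chart, assuming a Python dictionary keyed by the (state, start, end) triple in place of the mostly-empty dense three-dimensional array (all names here are hypothetical, not the ANLT implementation):

    from collections import defaultdict

    class SparseChart:
        def __init__(self):
            # (state, start, end) -> list of items (e.g. partial analyses)
            self._index = defaultdict(list)

        def add(self, state, start, end, item):
            self._index[(state, start, end)].append(item)

        def get(self, state, start, end):
            # Average-case constant-time retrieval, as the cubic bound requires,
            # without allocating |states| * n * n cells up front.
            return self._index.get((state, start, end), [])

    chart = SparseChart()
    chart.add(state=17, start=0, end=3, item="NP analysis")
    print(chart.get(17, 0, 3))   # -> ['NP analysis']
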
<Section position="8" start_page="288" end_page="288" type="metho">
<SectionTitle> 3. COMPLEXITIES OF THE PARSERS </SectionTitle>
<Paragraph position="0"> The two variables that determine a parser's computational complexity are the grammar and the input string (Barton, Berwick & Ristad, 1987).</Paragraph>
<Paragraph position="1"> These are considered separately in the next two sections.</Paragraph>
</Section>
<Section position="9" start_page="288" end_page="288" type="metho">
<SectionTitle> GRAMMAR-DEPENDENT COMPLEXITY </SectionTitle>
<Paragraph position="0"> The term dependent on the grammar in the time complexity of the BU-LC unification-based parser described above is O(|C|^2 |R|^3), where |C| is the number of categories implicit in the grammar and |R| the number of rules. The space complexity is dominated by the size of the parse forest, O(|C|) (these results are proved by Carroll, 1993). For the ANLT grammar, in which features are nested to a maximum depth of two, |C| is finite but nevertheless extremely large (Briscoe et al., 1987b).[4] The grammar-dependent complexity of the LR parser makes it also appear intractable: Johnson (1989) shows that the number of LR(0) states for certain (pathological) grammars is exponentially related to the size of the grammar, and that there are some inputs which force an LR parser to visit all of these states in the course of a parse.</Paragraph>
<Paragraph position="1"> [3] Schabes describes a table with no lookahead; the successful application of this technique supports Schabes' (1991:109) assertion that "several other methods (such as LR(k)-like and SLR(k)-like) can also be used for constructing the parsing tables [...]"
[4] Barton, Berwick & Ristad (1987:221) calculate that GPSG, also with a maximum nesting depth of two, licenses more than 10^775 distinct syntactic categories. The number of categories is actually infinite in grammars that use a fully recursive feature system.</Paragraph>
<Paragraph position="2"> Thus the total number of operations performed, and also the space consumed (by the vertices in the graph-structured stack), is an exponential function of the size of the grammar.</Paragraph>
<Paragraph position="3"> To avoid this complexity, the CE parser employs a table construction method which ensures that the number of states in the parse table is linearly related to the size of the grammar, resulting in the number of operations performed by the parser being at worst a polynomial function of grammar size.</Paragraph>
</Section>
<Section position="10" start_page="288" end_page="289" type="metho">
<SectionTitle> INPUT-DEPENDENT COMPLEXITY </SectionTitle>
<Paragraph position="0"> Although the complexity of returning all parses for a string is always related exponentially to its length (since the number of parses is exponential, and they must all at least be enumerated), the complexity of a parser is usually measured for the computation of a parse forest (unless extracting a single analysis from the forest is worse than linear).[5]</Paragraph>
<Paragraph position="1"> If one of the features of the ANLT grammar formalism, the Kleene operator (allowing indefinite repetition of rule daughters), is disallowed, then the complexity of the BU-LC parser with respect to the length of the input string is O(n^(p+1)), where p is the maximum number of daughters in a rule (Carroll, 1993). The inclusion of the operator increases the complexity to exponential. To retain the polynomial time bound, new rules can be introduced to produce recursive tree structures instead of an iterated flat tree structure. However, when this technique is applied to the ANLT grammar the increased overheads in rule invocation and structure building actually slow the parser down.</Paragraph>
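A minimal sketch of that grammar transformation, assuming a simple (hypothetical) rule representation in which a Kleene-starred daughter is written ('*', symbol); a starred daughter is replaced by a fresh recursive category so that each rule has a bounded number of daughters and the parser builds a recursive rather than a flat iterated structure:

    def eliminate_kleene(rules):
        """Rewrite rules so that a ('*', X) daughter becomes a fresh category
        X_STAR with the recursive rules  X_STAR -> X X_STAR  and  X_STAR -> (empty)."""
        new_rules = []
        star_cats = set()
        for mother, daughters in rules:
            rewritten = []
            for d in daughters:
                if isinstance(d, tuple) and d[0] == '*':
                    star_cat = d[1] + '_STAR'
                    if star_cat not in star_cats:
                        star_cats.add(star_cat)
                        new_rules.append((star_cat, [d[1], star_cat]))  # X_STAR -> X X_STAR
                        new_rules.append((star_cat, []))                # X_STAR -> empty
                    rewritten.append(star_cat)
                else:
                    rewritten.append(d)
            new_rules.append((mother, rewritten))
        return new_rules

    # Example: VP -> V NP* PP becomes VP -> V NP_STAR PP plus two NP_STAR rules.
    print(eliminate_kleene([('VP', ['V', ('*', 'NP'), 'PP'])]))
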
<Paragraph position="2"> Although the time and space complexities of CF versions of the LR and CE parsers are O(n^3), the unification versions of these parsers both turn out to have time bounds that are greater than cubic in the general case. The CF versions implicitly pack identical sequences of sub-analyses, and in all reductions at a given point with rules with the same number of daughters, the packed sequences can be formed into higher-level constituents as they stand, without further processing. However, in the unification versions, on each reduce action the daughters of the rule involved have to be unified with every possible alternative sequence of the sub-analyses that are being consumed by the rule (in effect expanding and flattening out the packed sequences), leading to a bound of n^(p+1) on the total number of unifications.
[5] This complexity measure does correspond to real-world usage of a parser, since practical systems can usually afford to extract only a small number of parses from the frequently very large number encoded in a forest; this is often done on the basis of preference-based or probabilistic factors (e.g. Carroll & Briscoe, 1992).</Paragraph>
</Section>
<Section position="11" start_page="289" end_page="289" type="metho">
<SectionTitle> 4. PRACTICAL RESULTS </SectionTitle>
<Paragraph position="0"> To assess the practical performance of the three unification-based parsers described above, a series of experiments was conducted using the ANLT grammar (Grover, Carroll & Briscoe, 1993), a wide-coverage grammar of English. The grammar is defined in a metagrammatical formalism which is compiled into a unification-based 'object grammar'--a syntactic variant of the Definite Clause Grammar formalism (Pereira & Warren, 1980)--containing 84 features and 782 phrase structure rules. Parsing uses fixed-arity term unification. The grammar provides full coverage of the following constructions: declarative sentences, imperatives and questions (yes/no, tag and wh-questions); all unbounded dependency types (topicalisation, relativisation, wh-questions); a relatively exhaustive treatment of verb and adjective complement types; phrasal and prepositional verbs of many complement types; passivisation; verb phrase extraposition; sentence and verb phrase modification; noun phrase complements and pre- and post-modification; partitives; coordination of all major category types; and nominal and adjectival comparatives.</Paragraph>
<Paragraph position="1"> Although the grammar is linked to a lexicon containing definitions for 40,000 base forms of words, the experiments draw on a much smaller lexicon of 600 words (consisting of closed-class vocabulary and, for open-class vocabulary, definitions of just a sample of words which taken together exhibit the full range of possible complementation patterns), since issues of lexical coverage are of no concern here.</Paragraph>
</Section>
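The object grammar above is parsed with fixed-arity term unification; as a rough, self-contained illustration of that operation (hypothetical representation, not the ANLT unifier: variables are strings beginning with '?', terms are atoms or fixed-length tuples, and there is no occurs check):

    def walk(term, subst):
        """Follow variable bindings until an unbound variable or non-variable is reached."""
        while isinstance(term, str) and term.startswith('?') and term in subst:
            term = subst[term]
        return term

    def unify(t1, t2, subst=None):
        """Return an extended substitution unifying t1 and t2, or None on failure."""
        subst = dict(subst or {})
        t1, t2 = walk(t1, subst), walk(t2, subst)
        if t1 == t2:
            return subst
        if isinstance(t1, str) and t1.startswith('?'):
            subst[t1] = t2
            return subst
        if isinstance(t2, str) and t2.startswith('?'):
            subst[t2] = t1
            return subst
        if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
            for a, b in zip(t1, t2):      # fixed arity: positions line up directly
                subst = unify(a, b, subst)
                if subst is None:
                    return None
            return subst
        return None                        # clash between distinct atoms or arities

    # Example: unify two toy category terms NP(num, case).
    print(unify(('NP', '?n', 'acc'), ('NP', 'sg', '?c')))   # -> {'?n': 'sg', '?c': 'acc'}
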
<Section position="12" start_page="289" end_page="290" type="metho">
<SectionTitle> COMPARING THE PARSERS </SectionTitle>
<Paragraph position="0"> In the first experiment, the ANLT grammar was loaded and a set of sentences was input to each of the three parsers. In order to provide an independent basis for comparison, the same sentences were also input to the SRI Core Language Engine (CLE) parser (Moore & Alshawi, 1992) with the CLARE2.5 grammar (Alshawi et al., 1992), a state-of-the-art system accessible to the author. The sentences were taken from an initial sample of 175 representative sentences extracted from a corpus of approximately 1500 that form part of the ANLT package. This corpus, implicitly defining the types of construction the grammar is intended to cover, was written by the linguist who developed the ANLT grammar and is used to check for any adverse effects on coverage when the grammar is modified during grammar development.</Paragraph>
[Table 1 (not reproduced in this extraction): total parse times (Sparc ELC workstation) and storage allocated (in megabytes) while parsing the 129 test sentences (1-12 words in length).]
<Paragraph position="2"> Of the initial 175 sentences, the CLARE2.5 grammar failed to parse 42 (in several cases because punctuation is strictly required but is missing from the corpus). The ANLT grammar also failed to parse three of these, plus an additional four. These sentences were removed from the sample, leaving 129 (mean length 6.7 words), of which 47 were declarative sentences, 38 wh-questions and other sentences with gaps, 20 passives, and 24 sentences containing co-ordination.</Paragraph>
<Paragraph position="3"> Table 1 shows the total parse times and storage allocated for the BU-LC parser, the LR parser, and the CE parser, all with the ANLT grammar and lexicon. All three parsers have been implemented by the author to a similar high standard: similar implementation techniques are used in all the parsers, the parsers share the same unification module, run in the same Lisp environment, have been compiled with the same optimisation settings, and have all been profiled with the same tools and hand-optimised to a similar extent. (Thus any difference in performance of more than around 15% is likely to stem from algorithmic rather than implementational reasons.) Both of the predictive parsers employ one symbol of lookahead, incorporated into the parsing tables by the LALR technique. Table 1 also shows the results for the CLE parser with the CLARE2.5 grammar and lexicon. The figures include garbage collection time and phrasal (where appropriate) processing, but not parse forest unpacking. Both grammars give a total of around 280 analyses at a similar level of detail.</Paragraph>
<Paragraph position="4"> The results show that the LR parser is approximately 35% faster than the BU-LC parser, and allocates about 30% less storage. The magnitude of the speed-up is less than might be expected, given the enthusiastic advocacy of non-deterministic CF LR parsing for NL by some researchers (e.g. Tomita, 1987; Wright, Wrigley & Sharman, 1991), and in the light of improvements observed for predictive over pure bottom-up parsing (e.g. Moore & Dowding, 1991). However, on the assumption that incorrect prediction of gaps is the main avoidable source of performance degradation (cf. Moore & Dowding), further investigation shows that the speed-up is near the maximum that is possible with the ANLT grammar (around 50%). The throughput of the CE parser is half that of the LR parser, and also less than that of the BU-LC parser. However, it is intermediate between the two in terms of storage allocated. Part of the difference in performance between it and the LR parser is due to the fact that it performs around 15% more unifications. This might be expected, since the corresponding finite-state automaton is not determinised--to avoid theoretical exponential time complexity on grammar size--thus paying a price at run time.
Additional reasons for the relatively poor performance of the CE parser are the overheads involved in maintaining a sparse representation of the chart, and the fact that with the ANLT grammar it generates less "densely packed" parse forests, since its parse table, with 14% more states (though fewer actions) than the LALR(1) table, encodes more contextual distinctions (Billot & Lang, 1989:146).</Paragraph>
<Paragraph position="5"> Given that the ANLT and CLARE2.5 grammars have broadly similar (wide) coverage and return very similar numbers of syntactic analyses for the same inputs, the significantly better throughput of the three parsers described in this paper over the CLE parser[6] indicates that they do not contain any significant implementational deficiencies which would bias the results.[7]
[6] Although the ANLT parser is implemented in Common Lisp and the CLE parser in Prolog, comparing parse times is a valid exercise since current compiler and run-time support technologies for both languages are quite well-developed, and in fact the CLE parser takes advantage of Prolog's built-in unification operation, which will have been very tightly coded.
[7] The ANLT's speed advantage over CLARE is less pronounced if the time for morphological analysis and creation of logical forms is taken into account, probably because the systems use different processing techniques in these modules.</Paragraph>
</Section>
<Section position="13" start_page="290" end_page="290" type="metho">
<SectionTitle> SWAPPING THE GRAMMARS OVER </SectionTitle>
<Paragraph position="0"> A second experiment was carried out with the CLE parser, in which the built-in grammar and lexicon were replaced by versions of the ANLT object grammar and lexical entries translated (automatically) into the CLE formalism. (The reverse of this configuration, in which the CLARE2.5 grammar is translated into the ANLT formalism, is not possible since some central rules contain sequences of daughters specified by a single 'list' variable, which has no counterpart in the ANLT and cannot directly be simulated.) The throughput of this configuration was only one fiftieth of that of the BU-LC parser. The ANLT grammar contains more than five times as many rules as does the sentence-level portion of the CLARE2.5 grammar, and Alshawi (personal communication) points out that the CLE parser had not previously been run with a grammar containing such a large number of rules, in contrast to the ANLT parsers.</Paragraph>
</Section>
<Section position="14" start_page="290" end_page="291" type="metho">
<SectionTitle> THE EFFECT OF SENTENCE LENGTH </SectionTitle>
<Paragraph position="0"> Although the mean sentence length in the first two experiments is much shorter than the 20-30 word length (depending on genre etc.) that is common in real texts, the test sentences cover a wide range of syntactic constructions and exhibit less constructional bias than would a set of sentences extracted at random from a single corpus. However, to investigate performance on longer sentences and the relationship between sentence length and parse time, a further set of 100 sentences with lengths distributed uniformly between 13 and 30 words was created by hand by the author and added to the previous test data.
Table 2 shows the relationship between sentence length and mean parse time with the BU-LC and LR parsers.</Paragraph>
<Paragraph position="1"> In contrast to the results from the first experiment, the throughput of the LR parser is only 4% better than that of the BU-LC parser for sentences of 13-27 words in length. The former parses many sentences up to twice as fast, but a small proportion of the others are parsed almost twice as slowly. As well as their wide variability with respect to the BU-LC parser, the absolute variability of the LR parse times is high (reflected in large standard deviations--see Table 2). Most of the sentences for which LR performance is worse contain more than one occurrence of the passive construction; due to their length this is particularly the case for the group of sentences of 28-30 words, with which the LR parser performed particularly badly. However, it is likely that if the constraining power of the parse table were improved in this area the difference in throughput between LR and BU-LC would revert to nearer the 35% figure seen in the first experiment.</Paragraph>
<Paragraph position="2"> The standard deviations for numbers of parses are also relatively large. The maximum number of parses was 2736 for one 29-word sentence, but on the other hand some of even the longest sentences had fewer than ten parses. (But note that since the time taken for parse forest unpacking is not included in parse times, the latter do not vary by such a large magnitude.)</Paragraph>
<Paragraph position="3"> The results of this experiment are displayed graphically in Figure 1, together with a quadratic function. Comparison with the function suggests that, at least for the BU-LC parser, parse time is related roughly quadratically to input length.
[Caption fragment from a table or figure not reproduced in this extraction: "... numbers of parses for the 229 test sentences (1-30 words in length) with the BU-LC and LR parsers."]</Paragraph>
<Paragraph position="4"> In previous work with the ANLT (Briscoe & Carroll, 1993), throughput with raw corpus data was worse than that observed in these experiments, though probably only by a constant factor. This could be due to the fact that the vocabulary of the corpus concerned exhibits significantly higher lexical ambiguity; however, for sentences taken from a specific corpus, constructional bias observed in a training phase could be exploited to improve performance (e.g. Samuelsson & Rayner, 1991).</Paragraph>
</Section>
</Paper>