<?xml version="1.0" standalone="yes"?>
<Paper uid="P94-1040">
  <Title>RELATING COMPLEXITY TO PRACTICAL PERFORMANCE IN PARSING WITH WIDE-COVERAGE UNIFICATION GRAMMARS</Title>
  <Section position="15" start_page="291" end_page="292" type="evalu">
    <SectionTitle>
5. DISCUSSION
</SectionTitle>
    <Paragraph position="0"> All three of the parsers have theoretical worst-case complexities that are either exponential, or polynomial on grammar size but with an extremely large multiplier. Despite this, in the practical experiments reported in the previous section the parsers achieve relatively good throughput with a general-purpose wide-coverage grammar of a natural language. It therefore seems likely that grammars of the type considered in this paper (i.e. with relatively detailed phrase structure components, but comparatively simple from a unification perspective), although realistic, do not bring the parsing algorithms involved anywhere near the worst-case complexity.</Paragraph>
    <Paragraph position="1"> In the experiments, the CE technique results in a parser with worse performance than the normal LR technique. Indeed, for the ANLT grammar, the number of states (the term that the CE technique reduces from exponential to linear in grammar size) is actually smaller in the standard LALR(1) table. This suggests that, when considering the complexity of parsers, the issue of parse table size is of minor importance for realistic NL grammars (as long as an implementation represents the table compactly), and that improvements to complexity results with respect to grammar size, although interesting from a theoretical standpoint, may have little practical relevance for the processing of natural language.</Paragraph>
    <Paragraph position="2"> Although Schabes (1991:107) claims that the problem of exponential grammar complexity &amp;quot;is particularly acute for natural language processing since in this context the input length is typically small (10-20 words) and the grammar size very large (hundreds or thousands of rules and symbols)&amp;quot;, the experiments indicate that, with a wide-coverage NL grammar, inputs of this length can be parsed quite quickly; however, longer inputs (of more than about 30 words)--which occur relatively frequently in written text--are a problem. Unless grammar size takes on proportionately much more significance for such longer inputs, which seems implausible, it appears that the major problems lie not in the area of grammar size, but in input length.</Paragraph>
    <Paragraph position="3"> All three parsers have worst-case complexities that are exponential in input length. This theoretical bound might suggest that parsing performance would be severely degraded on long sentences; however, the relationship between sentence length and parse time with the ANLT grammar and the sentences tested appears to be approximately only quadratic. There are probably many reasons why performance is much better than the complexity results suggest, but the most important may be that: * Kleene star is used only in a very limited context (for the analysis of coordination), * more than 90% of the rules in the grammar have no more than two daughters, and * very few rules license both left and right recursion (for instance of the sort that is typically used to analyse noun compounding, i.e. N --&gt; N N).</Paragraph>
    <Paragraph position="4"> [Figure: parse time against sentence length for the BU-LC and LR parsers; a quadratic function is also displayed.]</Paragraph>
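The third point above is the crucial one: a rule such as N --&gt; N N is both left- and right-recursive, so it licenses every binary bracketing of a k-noun compound, and the number of analyses grows with the Catalan numbers, i.e. exponentially in compound length. The following minimal sketch (an illustration of this combinatorial point, not code from the original experiments) counts those bracketings:

```python
from math import comb

def catalan(n: int) -> int:
    # n-th Catalan number: C(n) = binom(2n, n) / (n + 1)
    return comb(2 * n, n) // (n + 1)

def compound_parses(k: int) -> int:
    # Number of distinct binary bracketings of a k-noun compound
    # under a rule of the form N -> N N: the (k-1)-th Catalan number.
    return catalan(k - 1)

if __name__ == "__main__":
    for k in (2, 3, 4, 6, 10):
        print(k, compound_parses(k))
```

For a 3-noun compound there are 2 analyses, for 4 nouns 5, and for 10 nouns already 4,862, which illustrates why a grammar that largely avoids such doubly recursive rules keeps practical parse times far below the exponential worst case.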
    <Paragraph position="5"> Despite little apparent theoretical difference between the CLE and ANLT grammar formalisms, and the fact that no explicit or formal process of 'tuning' parsers and grammars to perform well with each other has been carried out in either the ANLT or CLARE system, the results of the experiment comparing the performance of the respective parsers using the ANLT grammar suggest that the parallel development of the software and grammars has nevertheless caused such tuning to happen automatically. It therefore seems likely that implementational decisions and optimisations based on subtle properties of specific grammars can be, and very often are, more important than worst-case complexity when considering the practical performance of parsing algorithms.</Paragraph>
  </Section>
</Paper>