<?xml version="1.0" standalone="yes"?>
<Paper uid="E93-1036">
  <Title>Generalized Left-Corner Parsing</Title>
  <Section position="2" start_page="0" end_page="305" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Generalized LR parsing was first described by Tomita [Tomita, 1986; Tomita, 1987]. It has been regarded as the most efficient parsing technique for context-free grammars. The technique has been adapted to formalisms other than context-free grammars in [Tomita, 1988].</Paragraph>
    <Paragraph position="1"> A useful property of generalized LR parsing (henceforth abbreviated to GLR parsing) is that input is parsed in polynomial time. To be exact, if the length of the right side of the longest rule is p, and if the length of the input is n, then the time complexity is O(n^{p+1}). Theoretically, this may be worse than the time complexity of Earley's algorithm [Earley, 1970], which is O(n^3). For practical cases in natural language processing, however, GLR parsing seems to give the best results. (*Supported by the Dutch Organization for Scientific Research (NWO), under grant 00-62-518.)</Paragraph>
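To make the comparison with Earley's bound concrete, the GLR bound can be instantiated for a few rule lengths. The values of p below are chosen only for illustration and do not come from any particular grammar.

```latex
\[
p = 2:\; O(n^{p+1}) = O(n^{3}), \qquad
p = 3:\; O(n^{p+1}) = O(n^{4}), \qquad
p = 5:\; O(n^{p+1}) = O(n^{6})
\]
```

Under this bound, GLR parsing is no worse than Earley's O(n^3) only when no right side is longer than two symbols; every longer rule raises the exponent by one.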
    <Paragraph position="2"> The polynomial time complexity is established by using a graph-structured stack, which is a generalization of the notion of parse stack, in which pointers are used to connect stack elements. If nondeterminism occurs, then the search paths are investigated simultaneously, where the initial part of the parse stack which is common to all search paths is represented only once. If two search paths share the state of the top elements of their imaginary individual parse stacks, then the top element is represented only once, so that any computation which thereupon pushes elements onto the stack is performed only once.</Paragraph>
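The sharing described above can be sketched in a few lines of Python. This is a minimal illustration, not Tomita's algorithm: it models only pushes (no reductions and no LR tables), and the names Node, GraphStack, push, count_paths and reachable, as well as the integer states, are assumptions made for this example.

```python
class Node:
    """One element of a graph-structured stack (hypothetical minimal form)."""

    def __init__(self, state):
        self.state = state
        self.preds = set()  # pointers to the elements directly below this one


class GraphStack:
    def __init__(self, initial_state):
        self.bottom = Node(initial_state)
        # Current top elements, keyed by state: at most one top per state.
        self.tops = {initial_state: self.bottom}

    def push(self, moves):
        """Apply (old_top_state, new_state) pushes for all search paths at once."""
        new_tops = {}
        for old_state, new_state in moves:
            below = self.tops[old_state]
            node = new_tops.get(new_state)
            if node is None:
                # First path reaching this state creates the shared top element.
                node = Node(new_state)
                new_tops[new_state] = node
            # Later paths reaching the same state only add a pointer.
            node.preds.add(below)
        self.tops = new_tops


def count_paths(node):
    """Number of imaginary individual parse stacks ending in this element."""
    if not node.preds:
        return 1
    return sum(count_paths(p) for p in node.preds)


def reachable(node, seen=None):
    """Set of all stack elements reachable from this element."""
    seen = set() if seen is None else seen
    if node not in seen:
        seen.add(node)
        for p in node.preds:
            reachable(p, seen)
    return seen


gss = GraphStack(0)
gss.push([(0, 1), (0, 2)])  # nondeterminism: two tops sharing the bottom element
gss.push([(1, 3), (2, 3)])  # both paths reach state 3: the top is represented once
gss.push([(3, 4)])          # this push is performed once for both search paths

top = gss.tops[4]
print(count_paths(top))     # 2 search paths are represented
print(len(reachable(top)))  # by only 5 stack elements
```

Two separate parse stacks would hold eight elements here (four each); the graph holds five, and the final push happens once rather than twice.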
    <Paragraph position="3"> Another useful property of GLR parsing is that the output is a concise representation of all possible parses, the so-called parse forest, which can be seen as a generalization of the notion of parse tree.</Paragraph>
    <Paragraph position="4"> (By some authors, parse forests are more specifically called shared, shared-packed, or packed shared (parse) forests.) The parse forests produced by the algorithm can be represented using O(n^{p+1}) space. Efficient decoration of parse forests with attribute values has been investigated in [Dekkers et al., 1992].</Paragraph>
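How a forest stays concise while the number of parses explodes can be seen in a small Python sketch. It is an illustration under stated assumptions, not the output format of the GLR algorithm: the grammar S -> S S | a, the span-keyed node representation, and the names build_forest and count_trees are all chosen for this example.

```python
def build_forest(n):
    """Packed forest for the grammar S -> S S | a over the input a^n.

    Each node (i, j) stands for "S derives input positions i..j" and maps to
    its list of packed alternatives; subtrees are shared through these keys.
    """
    forest = {}
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            if length == 1:
                forest[(i, j)] = [("a",)]  # S -> a
            else:
                # S -> S S, one alternative per split point k
                forest[(i, j)] = [((i, k), (k, j)) for k in range(i + 1, j)]
    return forest


def count_trees(forest, node, memo=None):
    """Count the parse trees below a node without enumerating them."""
    memo = {} if memo is None else memo
    if node not in memo:
        total = 0
        for alternative in forest[node]:
            product = 1
            for child in alternative:
                if child != "a":  # terminals contribute a factor of 1
                    product *= count_trees(forest, child, memo)
            total += product
        memo[node] = total
    return memo[node]


forest = build_forest(4)
print(len(forest))                  # 10 nodes
print(count_trees(forest, (0, 4)))  # 5 distinct parse trees
```

For an input of ten symbols the same construction gives 55 nodes representing 4862 trees. For this grammar the longest right side has p = 2, and the total number of packed alternatives grows cubically in n, which agrees with the O(n^{p+1}) space bound quoted above.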
    <Paragraph position="5"> There are, however, some drawbacks to GLR parsing. In order of decreasing importance, these are:</Paragraph>
    <Paragraph position="6"> * The parsing technique is based on the use of LR tables, which may be very large for grammars describing natural languages. (Footnote 1: [Purdom, 1974] argues that grammars for programming languages require LR tables whose size is about linear in the size of the grammar. It is generally considered doubtful that similar observations can be made for grammars for natural languages.) Related to this is the large amount of time needed to construct a parser. Incremental construction of parsers may in some cases alleviate this problem [Rekers, 1992].</Paragraph>
    <Paragraph position="7"> * The parse forests produced by the algorithm are not as compact as they might be. This is because packing of subtrees is guided by the merging of search paths due to equal LR states, instead of by the equality of the derived nonterminals. The solution presented in [Rekers, 1992] implies much computational overhead.</Paragraph>
    <Paragraph position="8"> * Adapting the technique to arbitrary grammars requires the generalization to cyclic graph-structured stacks [Nozohoor-Farshi, 1991], which may complicate the implementation.</Paragraph>
    <Paragraph position="9"> * A minor disadvantage is that the theoretical time complexity worsens if p becomes larger.</Paragraph>
    <Paragraph position="10"> The solution given in [Kipps, 1991] to obtain a variant of the parsing technique which has a fixed time complexity of O(n^3), independent of p, implies an overhead in computation costs which worsens instead of improves the time complexity in practical cases.</Paragraph>
    <Paragraph position="11"> These disadvantages of generalized LR parsing are mainly consequences of the LR parsing technique, more than consequences of the use of graph-structured stacks and parse forests.</Paragraph>
    <Paragraph position="12"> Lang [Lang, 1974; Lang, 1988c] gives a general construction of deterministic parsing algorithms from nondeterministic push-down automata. The produced data structures have a strong similarity to parse forests, as argued in [Billot and Lang, 1989; Lang, 1991]. The general idea of Lang has been applied to formalisms other than context-free grammars in [Lang, 1988a; Lang, 1988b; Lang, 1988d]. The idea of a graph-structured stack, however, does not immediately follow from Lang's construction. Instead, Lang uses the abstract notion of a table to store information, without trying to find the best implementation for this table. (Footnote 2: [Sikkel, 1990] argues that the way in which the table is implemented (using a two-dimensional matrix as in the case of Earley's algorithm, or using a graph-structured stack) is only of secondary importance to the global behaviour of the parsing algorithm.)</Paragraph>
    <Paragraph position="13"> One of the parsing techniques which can with some minor difficulties be derived from the construction of Lang is generalized left-corner parsing (henceforth abbreviated to GLC parsing). (Footnote 3: The term "generalized left-corner parsing" has been used before in [Demers, 1977] for a different parsing technique. Demers generalizes "left corner of a right side" to be a prefix of a right side which does not necessarily consist of one member, whereas we generalize LC parsing with zero lookahead to grammars which are not LC(0).) The starting-point is left-corner parsing, which was first formally defined in [Rosenkrantz and Lewis II, 1970]. Generalized left-corner parsing, albeit under a different name, was first investigated in [Pratt, 1975]. (See also [Tanaka et al., 1979; Bear, 1983; Sikkel and Op den Akker, 1992].)</Paragraph>
    <Paragraph position="14"> In [Shann, 1991] it was shown that the parsing technique can be a serious rival to generalized LR parsing with regard to the time complexities. (Other papers discussing the time complexity of GLC parsing are [Slocum, 1981; Wirén, 1987].) A functional variant of GLC parsing for definite clause grammars has been discussed in [Matsumoto and Sugimura, 1987]. This algorithm does not achieve a polynomial time complexity, however, because no "packing" takes place.</Paragraph>
    <Paragraph position="15"> A variant of Earley's algorithm discussed in [Leiss, 1990] is also very similar to GLC parsing, although the top-down nature of Earley's algorithm is preserved. GLC parsing has been rediscovered a number of times (e.g. in [Leermakers, 1989; Leermakers, 1992], [Schabes, 1991], and [Perlin, 1991]), but without any mention of the connection with LC parsing, which made the presentations unnecessarily difficult to understand. This also prevented discovery of a number of optimizations which are obvious from the viewpoint of left-corner parsing.</Paragraph>
    <Paragraph position="16"> In this paper we reinvestigate GLC parsing in combination with graph-structured stacks and parse forests. It is shown that this parsing technique is not subject to the four disadvantages of the algorithm of Tomita.</Paragraph>
    <Paragraph position="17"> The structure of this paper is as follows. In Section 2 we explain nondeterministic LC parsing. This parsing algorithm is the starting-point of Section 3, which shows how a deterministic algorithm can be defined which uses a graph-structured stack and produces parse forests. Section 4 discusses how this generalized LC parsing algorithm can be adapted to arbitrary context-free grammars.</Paragraph>
    <Paragraph position="18"> How the algorithm can be improved to operate in cubic time is shown in Section 5. The improved algorithm produces parse forests in a non-standard representation, which requires only cubic space. One more class of optimizations is discussed in Section 6. Preliminary results with an implementation of our algorithm are discussed in Section 7.</Paragraph>
  </Section>
</Paper>