<?xml version="1.0" standalone="yes"?>
<Paper uid="P94-1017">
  <Title>AN OPTIMAL TABULAR PARSING ALGORITHM</Title>
  <Section position="3" start_page="121" end_page="123" type="intro">
    <SectionTitle>
3. Add [→ A] to T_{j,i} for [→ α] ∈ T_{j,i}
</SectionTitle>
    <Paragraph position="0"> where there are A → α, D → Aδ ∈ P†. However, for certain i there may be many [Δ → β] ∈ T_{j,i-1}, for some j, and each may give rise to a different Δ' which is non-empty. In this way, Clause 1 may add several items [Δ' → a] to T_{i-1,i}, some possibly with overlapping sets Δ'. Since items represent computation of subderivations, the algorithm may therefore compute the same subderivation several times.</Paragraph>
    <Paragraph position="1"> In the resulting algorithm, no set T_{i,j} depends on any set T_{g,h} with g &lt; i. In [15] this fact is used to construct a parallel parser with n processors P_0, ..., P_{n-1}, with each P_i processing the sets T_{i,j} for all j &gt; i. The flow of data is strictly from right to left, i.e. items computed by P_i are only passed on to P_0, ..., P_{i-1}.</Paragraph>
    <Paragraph position="2">
Tabular ELR parsing
The tabular form of ELR parsing allows an optimization which constitutes an interesting example of how a tabular algorithm can have a property not shared by its nondeterministic origin. (Footnote 5: such optimizations are applicable to tabular realisations of logical push-down automata, but not to these automata themselves.) First note that we can compute the columns of a parse table strictly from left to right, that is, for fixed i we can compute all sets T_{j,i} before we compute the sets T_{j,i+1}. If we formulate a tabular ELR algorithm in a naive way analogously to Algorithm 5, as is done in [5], then for example the first clause is given by:
1. Add [Δ' → a] to T_{i-1,i} for a = a_i and [Δ → β] ∈ T_{j,i-1}, where Δ' = {A | A → aα, B → βCγ ∈ P† [B ∈ Δ ∧ A ∠* C]} is non-empty</Paragraph>
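    <Paragraph>
To make the discussion concrete, the following is a minimal Python sketch, not taken from the paper: the parse-table representation, the rule list, the precomputed left-corner relation lc, and all function names are assumptions introduced here purely for illustration. It shows the per-item ("naive") first clause, in which a separate item [Δ' → a] is produced for every item [Δ → β] ending at position i-1, so several items with overlapping sets Δ' can end up in T_{i-1,i}.

# Assumed toy representation (hypothetical, not from the paper):
#   T      dict mapping a pair of input positions (j, i) to a set of items,
#          where an ELR item [Delta → alpha] is modelled as the pair
#          (frozenset of nonterminals Delta, tuple of grammar symbols alpha)
#   rules  list of (lhs, rhs-tuple) pairs for the rules of P†
#   lc     set of pairs (D, C) with D ∠* C (reflexive and transitive,
#          relating nonterminals only)
def naive_clause_1(T, i, a, rules, lc):
    # Per-item ("naive") first clause: one new item [Delta' → a] per source
    # item [Delta → beta] in some T[(j, i-1)], possibly with overlapping
    # sets Delta'.
    for (j, h), items in list(T.items()):
        if h != i - 1:
            continue
        for delta, beta in items:
            delta_new = set()
            for A, rhs_a in rules:              # candidate rules A → a alpha
                if rhs_a[:1] != (a,):
                    continue
                for B, rhs_b in rules:          # rules B → beta C gamma with B in Delta
                    if B in delta and rhs_b[:len(beta)] == beta and len(rhs_b) != len(beta):
                        if (A, rhs_b[len(beta)]) in lc:
                            delta_new.add(A)
                            break
            if delta_new:
                T.setdefault((i - 1, i), set()).add((frozenset(delta_new), (a,)))
</Paragraph>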
    <Paragraph position="3"> We propose an optimization which makes use of the fact that all possible items [Δ → β] ∈ T_{j,i-1} are already present when we compute items in T_{i-1,i}: we compute one single item [Δ' → a], where Δ' is a large set computed using all [Δ → β] ∈ T_{j,i-1}, for any j.</Paragraph>
    <Paragraph position="5"> A similar optimization can be made for the third clause.</Paragraph>
    <Paragraph position="6"> Algorithm 6 (Tabular extended LR) Sets T_{i,j} of the table are to be subsets of I_ELR. Start with an empty table. Add [{S'} → ] to T_{0,0}. For i = 1, ..., n, in this order, perform one of the following steps until no more items can be added.</Paragraph>
    <Paragraph position="7">
1. Add [Δ' → a] to T_{i-1,i} for a = a_i, where Δ' = {A | ∃j ∃[Δ → β] ∈ T_{j,i-1} ∃A → aα, B → βCγ ∈ P† [B ∈ Δ ∧ A ∠* C]} is non-empty
2. Add [Δ' → αa] to T_{j,i} for a = a_i and [Δ → α] ∈ T_{j,i-1}, where Δ' = {A ∈ Δ | A → αaβ ∈ P†} is non-empty
3. Add [Δ'' → A] to T_{j,i} for [Δ' → α] ∈ T_{j,i}, where there is A → α ∈ P† with A ∈ Δ', and Δ'' = {D | ∃h ∃[Δ → β] ∈ T_{h,j} ∃D → Aδ, B → βCγ ∈ P† [B ∈ Δ ∧ D ∠* C]} is non-empty</Paragraph>
    <Paragraph position="9"> Report recognition of the input if [{S'} → S] ∈ T_{0,n}.</Paragraph>
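    <Paragraph>
Under the same assumed toy representation as in the earlier sketch (rules as (lhs, rhs-tuple) pairs, lc the precomputed left-corner relation; all names hypothetical), the merging performed by the first clause of Algorithm 6 might look as follows. The point is that all items [Δ → β] in any T_{j,i-1} contribute to one merged set Δ', so at most one item [Δ' → a] is added per input position.

def optimized_clause_1(T, i, a, rules, lc):
    # First clause of Algorithm 6 (sketch): merge the contributions of all
    # items [Delta → beta] ending at position i-1 into one set Delta'.
    delta_new = set()
    for (j, h), items in list(T.items()):
        if h != i - 1:
            continue
        for delta, beta in items:
            for A, rhs_a in rules:              # rules A → a alpha
                if rhs_a[:1] != (a,):
                    continue
                for B, rhs_b in rules:          # rules B → beta C gamma with B in Delta
                    if B in delta and rhs_b[:len(beta)] == beta and len(rhs_b) != len(beta):
                        if (A, rhs_b[len(beta)]) in lc:
                            delta_new.add(A)
                            break
    if delta_new:
        T.setdefault((i - 1, i), set()).add((frozenset(delta_new), (a,)))

def recognized(T, n):
    # Recognition test of Algorithm 6, assuming start symbol S and augmented
    # symbol S': the item [{S'} → S] must be in T[(0, n)].
    return (frozenset(["S'"]), ("S",)) in T.get((0, n), set())

Clauses 2 and 3 would be realised analogously; the sketch only illustrates the merging that distinguishes Algorithm 6 from the naive formulation.</Paragraph>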
    <Paragraph position="10"> Informally, the top-down filtering in the first and third clauses is realised by investigating all left corners D of nonterminals C (i.e. D ∠* C) which are expected from a certain input position. For input position i these nonterminals D are given by S_i = {D | ∃j ∃[Δ → β] ∈ T_{j,i} ∃B → βCγ ∈ P† [B ∈ Δ ∧ D ∠* C]}.</Paragraph>
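    <Paragraph>
As a hedged sketch under the same assumed representation (function and parameter names hypothetical), the set S_i could be computed from column i as follows, and it then reduces the first clause to a single set intersection, as the simplified clauses given next show.

def predicted_left_corners(T, i, rules, lc):
    # Sketch of S_i: all left corners D (with D ∠* C) of nonterminals C that
    # are expected immediately after some item [Delta → beta] in column i.
    s = set()
    for (j, h), items in T.items():
        if h != i:
            continue
        for delta, beta in items:
            for B, rhs in rules:                # rules B → beta C gamma with B in Delta
                if B in delta and rhs[:len(beta)] == beta and len(rhs) != len(beta):
                    C = rhs[len(beta)]
                    s.update(D for (D, C2) in lc if C2 == C)
    return s

def simplified_clause_1(T, i, a, rules, s_prev):
    # Simplified first clause: {A | A → a alpha in P†} intersected with the
    # precomputed prediction set S_{i-1}, passed in as s_prev.
    delta_new = frozenset(A for A, rhs in rules if rhs[:1] == (a,)).intersection(s_prev)
    if delta_new:
        T.setdefault((i - 1, i), set()).add((delta_new, (a,)))
</Paragraph>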
    <Paragraph position="12">
If each set S_i is computed upon completion of the i-th column of the table, the first and third clauses can be simplified to:
1. Add [Δ' → a] to T_{i-1,i} for a = a_i, where Δ' = {A | A → aα ∈ P†} ∩ S_{i-1} is non-empty
3. Add [Δ'' → A] to T_{j,i} for [Δ' → α] ∈ T_{j,i}, where there is A → α ∈ P† with A ∈ Δ', and Δ'' = {D | D → Aδ ∈ P†} ∩ S_j is non-empty
which may lead to more practical implementations. Note that the tabular ELR algorithm may manipulate items of the form [Δ → α] which would not occur in any search path of the nondeterministic ELR algorithm, because in general such a Δ is the union of many sets Δ' of items [Δ' → α] which would be manipulated at the same input position by the nondeterministic algorithm in different search paths. With minor differences, the above tabular ELR algorithm is described in [21]. A tabular version of pseudo ELR parsing is presented in [20]. Some useful data structures for practical implementation of tabular and non-tabular PLR, ELR and CP parsing are described in [8].
Finding an optimal tabular algorithm
In [14] Schabes derives the LC algorithm from LR parsing, similar to the way that ELR parsing can be derived from LR parsing. The LC algorithm is obtained by not only splitting up the goto function into goto_1 and goto_2, but also splitting up goto_2 even further, so that it non-deterministically yields the closure of one single kernel item. (This idea was described earlier in [5], and more recently in [10].) Schabes then argues that the LC algorithm can be determinized (i.e. made more deterministic) by manipulating the goto functions. One application of this idea is to take a fixed grammar and choose different goto functions for different parts of the grammar, in order to tune the parser to the grammar.</Paragraph>
    <Paragraph position="13"> In this section we discuss a different application of this idea: we consider various goto functions which are global, i.e. which are the same for all parts of a grammar. One example is ELR parsing, as its goto_2 function can be seen as a determinized version of the goto_2 function of LC parsing. In a similar way we obtain PLR parsing. Traditional LR parsing is obtained by taking the full determinization, i.e. by taking the normal goto function which is not split up. (Footnote 6: Schabes more or less also argues that LC itself can be obtained by determinizing TD parsing. In lieu of TD parsing he mentions Earley's algorithm, which is its tabular realisation.) We conclude that we have a family consisting of LC, PLR, ELR, and LR parsing, which are increasingly deterministic. In general, the more deterministic an algorithm is, the more parser states it requires. For example, the LC algorithm requires a number of states (the items in I_LC) which is linear in the size of the grammar. By contrast, the LR algorithm requires a number of states (the sets of items) which is exponential in the size of the grammar [2].</Paragraph>
    <Paragraph position="14"> The differences in the number of states complicate the choice of a tabular algorithm as the one giving optimal behaviour for all grammars. If a grammar is very simple, then a sophisticated algorithm such as LR may allow completely deterministic parsing, which requires a linear number of entries to be added to the parse table, measured in the size of the grammar.</Paragraph>
    <Paragraph position="15"> If, on the other hand, the grammar is very ambiguous, such that even LR parsing is very nondeterministic, then the tabular realisation may at worst add each state to each set T_{i,j}, so that the more states there are, the more work the parser needs to do. This favours simple algorithms such as LC over more sophisticated ones such as LR. Furthermore, if more than one state represents the same subderivation, then computation of that subderivation may be done more than once, which leads to parse forests (compact representations of collections of parse trees) which are not optimally dense [1, 12, 7]. Schabes proposes to tune a parser to a grammar, or in other words, to use a combination of parsing techniques in order to find an optimal parser for a certain grammar. This idea has until now not been realised.</Paragraph>
    <Paragraph position="16"> However, when we try to find a single parsing algorithm which performs well for all grammars, then the tabular ELR algorithm we have presented may be a serious candidate, for the following reasons:
* For all i, j, and α at most one item of the form [Δ → α] is added to T_{i,j}. Therefore, identical subderivations are not computed more than once. (This is a consequence of our optimization in Algorithm 6.) Note that this also holds for the tabular CP algorithm.
* ELR parsing guarantees the correct-prefix property, contrary to the CP algorithm. This prevents computation of all subderivations which are useless with regard to the already processed input.</Paragraph>
    <Paragraph position="17"> * ELR parsing is more deterministic than LC and PLR parsing, because it allows shared processing of all common prefixes. It is hard to imagine a practical parsing technique more deterministic than ELR parsing which also satisfies the previous two properties. In particular, we argue in [8] that refinement of the LR technique in such a way that the first property above holds would require an impractically large number of LR states.</Paragraph>
    <Paragraph position="18">
Epsilon rules
Epsilon rules cause two problems for bottom-up parsing. The first is non-termination for simple realisations of nondeterminism (such as backtrack parsing), caused by hidden left recursion [7]. The second problem occurs when we optimize TD filtering, e.g. using the sets S_i: it is no longer possible to completely construct a set S_i before it is used, because the computation of a derivation deriving the empty string requires S_i for TD filtering, but at the same time its result causes new elements to be added to S_i. Both problems can be overcome [8].</Paragraph>
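    <Paragraph>
As an illustration of the first problem (an example of ours, not the paper's): under the usual definition, a nonterminal A is hidden left recursive if there is a rule A → B_1 ... B_k A α with k ≥ 1 and every B_h deriving the empty string. The sketch below (hypothetical names, Python) detects the direct case for a small grammar such as S → B S a, S → a, B → ε, on which a naive backtracking realisation can keep deriving B from the empty string without consuming input.

def nullable_nonterminals(rules):
    # Fixed point: the nonterminals that derive the empty string.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for A, rhs in rules:
            if A not in nullable and all(X in nullable for X in rhs):
                nullable.add(A)
                changed = True
    return nullable

def hidden_left_recursive(rules):
    # Direct case only, for brevity: A → B_1 ... B_k A alpha with k ≥ 1
    # and all B_h nullable.
    nullable = nullable_nonterminals(rules)
    out = set()
    for A, rhs in rules:
        for k, X in enumerate(rhs):
            if X == A and k != 0 and all(Y in nullable for Y in rhs[:k]):
                out.add(A)
            if X not in nullable:
                break
    return out

# Example grammar (hypothetical): S → B S a, S → a, B → ε.
rules = [("S", ("B", "S", "a")), ("S", ("a",)), ("B", ())]
assert hidden_left_recursive(rules) == {"S"}
</Paragraph>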
    <Paragraph position="19">
Conclusions
We have discussed a range of different parsing algorithms, which have their roots in compiler construction, expression parsing, and natural language processing.</Paragraph>
    <Paragraph position="20"> We have shown that these algorithms can be described in a common framework.</Paragraph>
    <Paragraph position="21"> We further discussed tabular realisations of these algorithms, and concluded that we have found an optimal algorithm, which in most cases leads to parse tables containing fewer entries than other algorithms do, but which nevertheless avoids computing identical subderivations more than once.</Paragraph>
  </Section>
</Paper>