File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/ackno/00/a00-2036_ackno.xml
Size: 5,949 bytes
Last Modified: 2025-10-06 13:49:56
<?xml version="1.0" standalone="yes"?> <Paper uid="A00-2036"> <Title>Left-To-Right Parsing and Bilexical Context-Free Grammars</Title> <Section position="8" start_page="276" end_page="279" type="ackno"> <SectionTitle> Acknowledgements </SectionTitle> <Paragraph position="0"> We would like to thank Jason Eisner and Mehryar Mohri for fruitful discussions. The first author is supported by the German Federal Ministry of Education, Science, Research and Technology (BMBF) in the framework of the VERBMOBIL Project under Grant 01 IV 701 V0, and was employed at AT&amp;T Shannon Laboratory during part of the period in which this paper was written. The second author is supported by MURST under project PRIN: BioInformatica e Ricerca Genomica and by the University of Padua, under project Sviluppo di Sistemi ad Addestramento Automatico per l'Analisi del Linguaggio Naturale.</Paragraph> <Paragraph position="1"> A. Recognition in time independent of the lexicon. In Section 5 we have shown that it is unlikely that correct-prefix property parsing for a bilexical CFG can be carried out in polynomial time and independently of the lexicon size, when only polynomial-time off-line compilation of the grammar is allowed. To complete our presentation, we show here that correct-prefix property parsing in time independent of the lexicon size is indeed possible if we spend exponential time on grammar precompilation.</Paragraph> <Paragraph position="2"> We first consider tabular LR parsing (Tomita, 1986), a technique which satisfies the correct-prefix property, and apply it to bilexical CFGs. Our presentation relies on definitions from (Nederhof and Satta, 1996). Let $w \in V_T^*$ be some input string. A property of LR parsing is that any state that can be reached after reading prefix $w[1,j]$, $j \le |w|$, must be of the form $\mathit{goto}(\mathit{goto}(\ldots(\mathit{goto}(q_{in}, X_1), \ldots), X_{m-1}), X_m)$, where $q_{in}$ is the initial LR state, and $X_1, \ldots, X_m$ are terminals or nonterminals such that $X_1 \cdots X_m \Rightarrow^* w[1,j]$. 
For a bilexical CFG, each $X_k$ is of the form $b_k$ or of the form $B_k[b_k]$, where $b_1, \ldots, b_m$ is some subsequence of $w[1,j]$. This means that there are at most $(2 + |V_D|)^n$ distinct states that can be reached by the recognizer, apart from $q_{in}$. In the algorithm, the tabulation prevents repeated manipulation of states for a triple of input positions, leading to a time complexity of $\mathcal{O}(n^3 |V_D|^n)$, where $n = |w|$. Hence, when we apply precompilation of the grammar, we can carry out recognition in time exponential in the length of the input string, yet independent of the lexicon size.</Paragraph> <Paragraph position="3"> Note however that the precompilation for LR parsing takes exponential time.</Paragraph> <Paragraph position="4"> The second algorithm with the correct-prefix property (CPP) that we will consider can be derived from Earley's algorithm (Earley, 1970). For this new recognizer, we achieve a time complexity completely independent of the size of the whole grammar, not merely independent of the size of the lexicon as in the case of tabular LR parsing. Furthermore, the input grammar can be any general CFG, not necessarily a bilexical one. In terms of the length of the input, the complexity is polynomial rather than exponential.</Paragraph> <Paragraph position="5"> Earley's algorithm is outlined in what follows, with minor modifications with respect to its original presentation. An item is an object of the form $[A \to \alpha \bullet \beta]$, where $A \to \alpha\beta$ is a production from the grammar. The recognition algorithm consists in an incremental construction of an $(n+1) \times (n+1)$ two-dimensional table $T$, where $n$ is the length of the input string. At each stage, each entry $T[i,j]$ in the table contains a set of items, which is initially the empty set. After an initial item is added to entry $T[0,0]$ in the table, other items in other entries are derived from it, directly or indirectly, using three steps called predictor, scanner and completer. 
When no more new items can be derived, the presence or absence of a final item in entry $T[0,n]$ indicates whether the input is recognized.</Paragraph> <Paragraph position="6"> The recognition process can be precompiled, based on the observation that for any grammar the set of all possible items is finite, and thereby all potential contents of $T$'s entries can be enumerated.</Paragraph> <Paragraph position="7"> Furthermore, the dependence of entries on one another is not cyclic; one item in $T[i,j]$ may be derived from a second item in the same entry, but it is not possible that, for example, an item in $T[i,j]$ is derived from an item in $T[i',j']$, with $(i,j) \neq (i',j')$, which is in turn derived from an item in $T[i,j]$.</Paragraph> <Paragraph position="8"> A consequence is that entries can be computed in a strict order, and an operation that involves the combination of, say, the items from two entries $T[i,j]$ and $T[j,k]$ by means of the completer step can be implemented by a simple table lookup. More precisely, each set of items is represented by an atomic state, and combining two sets of items according to the completer step is implemented by indexing a 2-dimensional array by the two states representing those two sets, yielding a third state representing the resulting set of items. Similarly, the scanner and predictor steps and the union operation on sets of items can all be implemented by table lookup.</Paragraph> <Paragraph position="9"> The time complexity of recognition can straightforwardly be shown to be $\mathcal{O}(n^3)$, independent of the size of the grammar. However, massive precompilation is involved in enumerating all possible sets of items and precomputing the operations on them. 
The motivation for discussing this algorithm is therefore purely theoretical: it illustrates the unfavourable complexity properties that Theorem 2, together with the conjecture about quasideterminizers, attributes to the recognition problem if the correct-prefix property is to be ensured.</Paragraph> </Section> </Paper>
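<!--
The Earley-style recognition outlined in the appendix (a table T filled by the predictor, scanner and completer steps) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `earley_recognize`, the tuple encoding of items and the dictionary encoding of the grammar are all hypothetical choices made for this example. The item sets are computed directly with a naive fixpoint loop; they are not replaced by precompiled atomic states, so the sketch does not achieve the grammar-independent time bound discussed in the text.

```python
from typing import Dict, List, Set, Tuple

# An item [A -> alpha . beta] is encoded as (lhs, rhs, dot), rhs being a tuple of symbols.
# (Hypothetical encoding chosen for this sketch.)
Item = Tuple[str, Tuple[str, ...], int]
# A grammar maps each nonterminal to its right-hand sides; any symbol that is
# not a key of the dictionary is treated as a terminal.
Grammar = Dict[str, List[Tuple[str, ...]]]


def earley_recognize(grammar: Grammar, start: str, words: List[str]) -> bool:
    n = len(words)
    # T[i][j] holds the items whose recognized part (before the dot) spans words[i:j].
    T: List[List[Set[Item]]] = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    # Initial items for the start symbol go into T[0][0].
    for rhs in grammar.get(start, []):
        T[0][0].add((start, rhs, 0))

    for j in range(n + 1):
        changed = True
        while changed:  # naive fixpoint over column j
            changed = False
            for i in range(j + 1):
                for (lhs, rhs, dot) in list(T[i][j]):
                    if dot == len(rhs):
                        # Completer: advance items in T[k][i] that wait for lhs.
                        for k in range(i + 1):
                            for (a, r, d) in list(T[k][i]):
                                if d < len(r) and r[d] == lhs:
                                    new = (a, r, d + 1)
                                    if new not in T[k][j]:
                                        T[k][j].add(new)
                                        changed = True
                    elif rhs[dot] in grammar:
                        # Predictor: start every production of the expected nonterminal.
                        for gamma in grammar[rhs[dot]]:
                            new = (rhs[dot], gamma, 0)
                            if new not in T[j][j]:
                                T[j][j].add(new)
                                changed = True
        if j < n:
            # Scanner: move the dot over the next input symbol into column j + 1.
            for i in range(j + 1):
                for (lhs, rhs, dot) in T[i][j]:
                    if dot < len(rhs) and rhs[dot] not in grammar and rhs[dot] == words[j]:
                        T[i][j + 1].add((lhs, rhs, dot + 1))

    # The input is recognized iff a completed start item sits in T[0][n].
    return any((start, rhs, len(rhs)) in T[0][n] for rhs in grammar.get(start, []))
```

For example, with the toy grammar `{"S": [("S", "+", "S"), ("a",)]}` (an assumption for illustration only), `earley_recognize(g, "S", ["a", "+", "a"])` accepts the input, while the incomplete input `["a", "+"]` is rejected. Replacing each distinct set of items by an integer state and tabulating the three steps over those states would correspond to the precompilation idea described above.
-->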