<?xml version="1.0" standalone="yes"?> <Paper uid="W04-0306"> <Title>An Efficient Algorithm to Induce Minimum Average Lookahead Grammars for Incremental LR Parsing</Title> <Section position="3" start_page="3" end_page="3" type="metho"> <SectionTitle> 4 An example </SectionTitle> <Paragraph position="0"> We now walk through a simplified example so as to fix ideas and illustrate the operation of the algorithm. Table 1 shows the simple constraining grammar G_C which we will use for this example.</Paragraph> <Paragraph position="1"> Now consider the small training corpus: 1. I did.</Paragraph> <Paragraph position="2"> 2. He went to Africa.</Paragraph> <Paragraph position="3"> 3. I bought a ticket.</Paragraph> <Paragraph position="5"> To begin with, find_MAL_parser considers sentence 1. In this particular case, chart_parse(S_1) finds only one valid parse. The GLR forest is built, giving the LR state transitions and parsing actions shown in Table 2, where each tuple (d', d, f, k) gives the state prior to the action, the state resulting from the action, the action itself, and the lookahead needed by the action. Here compute_average_lookahead determines that the average lookahead k̂ is 0. From this parse tree, incremental_update_LR accepts rules (1), (4), and (9) and updates the previously empty LR table T.</Paragraph> <Paragraph position="8"> Next, find_MAL_parser considers sentence 2. Here, chart_parse(S_2) finds two possible parses, leading to the LR state transitions and parsing actions shown in Table 3. This time, the average lookahead calculation is sensitive to what was already entered into the LR table T during the previous step of processing sentence 1. For example, in the first parse, the fourth transition (4, 6, sh, 1) requires a lookahead of 1 in order to avoid a shift-reduce conflict with (4, 5, re4, 0) from sentence 1. The sixth transition (1, 9, re9, 2) requires a lookahead of 2. It turns out that the first parse has an average lookahead of 0.20, while the second parse has an average lookahead of 0.33. We thus prefer the first parse tree, calling incremental_update_LR to further update the LR table T using rules (3) and (7).</Paragraph> <Paragraph position="9"> Finally, find_MAL_parser considers sentence 3. Here, chart_parse(S_3) finds two possible parses, leading this time to the LR state transitions and parsing actions shown in Table 4. Various lookaheads are again needed to avoid conflicts with the existing rules in T. The first parse has an average lookahead of 0.22, and is selected in preference to the second parse, which has an average lookahead of 0.33. From the first parse tree, incremental_update_LR accepts rules (2) and (10) to again update the LR table T.</Paragraph> <Paragraph position="10"> Thus the final output MAL grammar, requiring a lookahead of 1, is shown in Table 5.</Paragraph> <Paragraph position="11"> Table 4, second parse of sentence 3 (average lookahead 3/9 = 0.33):
(0, 1, sh, 0) (1, 2, re9, 1) (2, 4, sh, 0) (4, 8, sh, 1) (8, 5, re6, 0) (5, 10, sh, 1) (10, 5, re11, 0) (5, 3, re1, 0) (3, acc)
Table 5, the final output MAL grammar:
(1) S → NP VP
(2) VP → v NP
(3) VP → v PP
(4) VP → v
(7) PP → p NP
(9) NP → n
(10) NP → det n</Paragraph> </Section>
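To make the control flow of the walkthrough concrete, below is a minimal, self-contained Python sketch of the greedy per-sentence loop. All names and data structures here are hypothetical simplifications for exposition (a parse is reduced to a precomputed list of (state, action, follow) triples, and the lookahead rule to finding the shortest remaining-input prefix that separates an action from any conflicting action already recorded in T); it is not the paper's actual implementation.

    # Hypothetical encoding: a candidate parse is a list of triples
    # (state, action, follow), where `follow` is the input remaining
    # after the action; the LR table T maps a state to the actions
    # recorded for it so far.

    def lookahead_needed(state, action, follow, table):
        """Smallest k such that the next k input symbols separate `action`
        from every conflicting action already recorded for `state`."""
        conflicts = [f for (a, f) in table.get(state, []) if a != action]
        k = 0
        while k < len(follow) and any(f[:k] == follow[:k] for f in conflicts):
            k += 1
        return k

    def average_lookahead(parse, table):
        """Average lookahead the whole action sequence needs, given T."""
        return sum(lookahead_needed(s, a, f, table)
                   for (s, a, f) in parse) / len(parse)

    def update_table(table, parse):
        """Fold an accepted action sequence into the LR table T."""
        for (s, a, f) in parse:
            table.setdefault(s, []).append((a, f))

    def find_mal_parser(candidate_parses_per_sentence):
        """Greedy MAL induction over a corpus: for each sentence, score the
        candidate parses found under G_C and keep the cheapest one."""
        table = {}
        for parses in candidate_parses_per_sentence:
            best = min(parses, key=lambda p: average_lookahead(p, table))
            update_table(table, best)
        return table

On the corpus above, this loop would accept sentence 1's single parse, then prefer the 0.20-average parse of sentence 2 and the 0.22-average parse of sentence 3, exactly as in the walkthrough.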
<Section position="4" start_page="3" end_page="3" type="metho"> <SectionTitle> 5 Complexity analysis </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 5.1 Time complexity </SectionTitle> <Paragraph position="0"> Since the algorithm executes each of its five main steps once for each sentence in the corpus, the time complexity of the algorithm is upper bounded by the sum of the time complexities of those five steps.</Paragraph> <Paragraph position="1"> Suppose n is the maximum length of any sentence in the corpus, and m is the number of rules in the grammar. Then consider the five steps in turn. Since the number of lookaheads needed by each parsing action is computed by comparing the parsing action with the MAL parsing action sequences for all previous sentences, the time complexity of this step depends on the maximum length of any sentence that has already been processed, which is bounded by n.</Paragraph> <Paragraph position="2"> The dynamic programming method used to locate the most economical parse in terms of average lookahead, described above, can be seen to be quadratic in n. Note, however, that Tanaka et al. (1992) propose an enhancement that can reconstruct the parse trees in time linear in n; this is a direction for future improvement of our algorithm.</Paragraph> <Paragraph position="3"> The fifth step, incremental_update_LR, is O(2^m). As with ilalr, the worst-case time complexity is theoretically exponential in the number of rules in the existing grammar. However, various heuristics can be employed to make the algorithm quicker, and in practical experiments the algorithm is quite fast and precise in producing LR tables, particularly since m is very small relative to |S|.</Paragraph> <Paragraph position="4"> The time complexity of the algorithm for each sentence is thus bounded by the sum of these per-step bounds.</Paragraph> </Section>
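As an illustration of how such a dynamic programming pass can remain polynomial, the following sketch performs a bottom-up traversal of a packed parse forest, returning for each node the total lookahead and action count of its cheapest sub-derivation; memoising shared nodes means each node is processed only once. The Node layout, and the policy of comparing packed alternatives by the average lookahead of their sub-derivations, are illustrative assumptions rather than the paper's exact formulation.

    # Hypothetical packed-forest encoding: a node with `alternatives` is an
    # OR-node packing ambiguous sub-derivations; otherwise it is one parsing
    # action (with its required lookahead) plus its children.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Node:
        lookahead: int = 0                                         # lookahead this action needs
        children: List["Node"] = field(default_factory=list)      # AND-children
        alternatives: List["Node"] = field(default_factory=list)  # OR-packing

    def best_subparse(node, memo=None):
        """Return (total lookahead, number of actions) of the cheapest
        derivation rooted at `node`, visiting each shared node only once."""
        memo = {} if memo is None else memo
        if id(node) in memo:
            return memo[id(node)]
        if node.alternatives:   # ambiguity: keep the cheapest packed alternative
            result = min((best_subparse(alt, memo) for alt in node.alternatives),
                         key=lambda tc: tc[0] / tc[1])
        else:                   # one action plus its sub-derivations
            total, count = node.lookahead, 1
            for child in node.children:
                t, c = best_subparse(child, memo)
                total, count = total + t, count + c
            result = (total, count)
        memo[id(node)] = result
        return result

The average lookahead of a candidate parse is then simply the root's total divided by its action count.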
<Section position="2" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 5.2 Space complexity </SectionTitle> <Paragraph position="0"> As with time complexity, an upper bound on the space complexity can be obtained from the five main steps. The space usage of compute_average_lookahead directly corresponds to the dynamic programming structure, as with the time complexity.
4. reconstruct_MAL_parse is O(n). This is bounded by the number of vertices in the graph-structured stack, which is O(n).
5. incremental_update_LR is O(2^m). As with time complexity, although the worst-case complexity is exponential in the number of rules in the existing grammar, in practice this is not the major bottleneck.</Paragraph> <Paragraph position="1"> The space complexity of the algorithm is thus likewise bounded by the sum of these per-step bounds.</Paragraph> </Section> </Section> <Section position="5" start_page="3" end_page="3" type="metho"> <SectionTitle> 6 Conclusion </SectionTitle> <Paragraph position="1"> We have defined a new grammar learning task based on the concept of a minimum average lookahead (MAL) objective criterion. This approach provides an alternative direction for modeling incremental parsing: it emphatically avoids increasing the amount of nondeterminism in the parsing models, as has been done across a wide range of recent models, including probabilized dynamic programming parsers as well as GLR approaches. In contrast, the objective here is to learn completely deterministic parsers from unannotated corpora, with loose environmental guidance from nondeterministic constraining grammars.</Paragraph> <Paragraph position="2"> Within this context, we have presented a greedy algorithm for the difficult task of learning approximately MAL grammars for deterministic incremental LR(k) parsers, with the time complexity analyzed in Section 5. This algorithm is efficient in practice, and thus enables a broad range of applications where degree of lookahead serves as a grammar induction bias.</Paragraph> <Paragraph position="3"> Numerous future directions are suggested by this model. One obvious line of work involves experiments varying the types of corpora as well as the numerous parameters within the MAL grammar learning algorithm, to test predictions against various modeling criteria. More efficient algorithms and heuristics could help further increase the applicability of the model. In addition, the accuracy of the model could be strengthened by reducing sensitivity to some of the approximating assumptions.</Paragraph> </Section> </Paper>