<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1014">
  <Title>Fast LR Parsing Using Rich (Tree Adjoining) Grammars</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The LR approach for parsing has long been considered for natural language parsing (Lang, 1974; Tomita, 1985; Wright and Wrigley, 1991; Shieber, 1983; Pereira, 1985; Merlo, 1996), but it was not until a more recent past, with the advent of corpus-based techniques made possible by the availability of large treebanks, that parsing results and evaluation started being reported (Briscoe and Carroll, 1993; Inui et al., 1997; Carroll and Briscoe, 1996; Ruland, 2000).</Paragraph>
    <Paragraph position="1"> The appeal of LR parsing (Knuth, 1965) derives from its high capacity of postponement of structural decisions, therefore allowing for much of the spurious local ambiguity to be automatically discarded. But it is still the case that conflicts arise in the LR table for natural language grammars, and in large quantity. The key question is how one can use the contextual information contained in the parsing stack to cope with the remaining (local) ambiguity manifested as conflicts in the LR tables. The aforementioned work has concentrated on LR parsing for CFGs which has a clear deficiency in making available sufficient context in the LR states. (Shieber and Johnson, 1993) hints at the relevance of rich grammars on this respect. They use Tree Adjoining Grammars (TAGs) (Joshi and Schabes, 1997; Joshi et al., 1975) to defend the possibility of granular incremental computations in LR parsing. Incidentally or not, they make use of disambiguation contexts that are only possible in a state of a conceptual LR parser for a rich grammar formalism such as TAG, but not for a CFG.</Paragraph>
    <Paragraph position="2"> Concrete LR-like algorithms for TAGs have only recently been proposed (Prolo, 2000; Nederhof, 1998), though their evaluation was restricted to the quality of the parsing table (see also (Schabes and Vijay-Shanker, 1990; Kinyon, 1997) for earlier attempts). null In this paper, we revisit the LR parsing technique, applied to a rich grammar formalism: TAG. Following (Briscoe and Carroll, 1993), conflict resolution is based on contextual information extracted from the so called Instantaneous Description or Configuration: a stack, representing the control memory of the LR parser, and a lookahead sequence, here limited to one symbol.1 However, while Briscoe and Carroll invested on massive parallel computation of the possible parsing paths, with pruning and posterior ranking, we ex1Unlike (Wright and Wrigley, 1991)'s approach who tries to transpose PCFG probabilities to LR tables, facing difficulties which, to the best of our knowledge, have not been yet solved to content (cf. also (Ng and Tomita, 1991; Wright et al., 1991; Abney et al., 1999)).</Paragraph>
    <Paragraph position="3"> Association for Computational Linguistics.</Paragraph>
    <Paragraph position="4"> Language Processing (EMNLP), Philadelphia, July 2002, pp. 103-110. Proceedings of the Conference on Empirical Methods in Natural periment with a simple greedy depth-first technique with limited amount of backtracking, that resembles to a certain extent the commitment/recovery models from the psycholinguistic research on human language processing, supported by the occurrence of &amp;quot;garden paths&amp;quot;.2 We use the Penn Treebank WSJ corpus, release 2 (Marcus et al., 1994), to evaluate the approach.</Paragraph>
    <Paragraph position="5"> 2 The architecture of the parser Table 1 shows the architecture of our parsing application. We extract a TAG from a piece of the Penn Treebank, the training corpus, and submit it to an LR parser generator. The same training corpus is used again to extract statistical information that is used by the driver as follows. The grammar generation process generates as a subproduct the TAG derivation trees for the annotated sentences compatible with the extracted grammar trees. This derivation tree is then converted into the sequence of LR parsing actions that would have to be used by the parser to generate exactly that analysis. A parser execution simulation is then performed, guided by the obtained sequence of parsing actions, collecting the statistical information defined in Section 3.</Paragraph>
    <Paragraph position="6"> In possession of the LR table, grammar and statistical information, the parser is then able to parse fast natural language sentences annotated for partsof-speech. null</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 The extracted grammar
</SectionTitle>
      <Paragraph position="0"> Our target grammar is extracted with a customized version of the extractor defined in (Xia, 2001), which we will not describe here. However, a key aspect to mention is that grammar trees are extracted by factoring of recursion. Even constituents annotated flat in the Treebank are first given a more hierarchical, recursive structure. Therefore the trees generated during parsing will be richer than those in the Treebank. We will return to this point later.</Paragraph>
      <Paragraph position="1"> Before grammar extraction, Treebank labels are merged to allow for the generation of a more compact grammar and parsing table, and to concentrate statistical information (e.g., NN and NNS; NNP and NNPS; all labels for finite verb forms). The gram2See, e.g., (Tanenhaus and Trueswell, 1995) for a survey on human sentence comprehension.</Paragraph>
      <Paragraph position="2">  mar extractor assigns a plausibility judgment to each extracted tree. When a tree is judged implausible, it is discarded from the grammar, and so are the sentences in the training corpus in which the tree is used. This reduced our training corpus by about 15 %.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 The LR parser generator
</SectionTitle>
      <Paragraph position="0"> We used the grammar generator in (Prolo, 2000). In this section we only present a parsing example, to illustrate the kinds of action inserted in the generated tables; details concerning how the table is generated are omitted. Consider the TAG fragment in Figure 2 for simple sentences with modal and adverb adjunction. Figure 3 contains a derivation for the sentence</Paragraph>
      <Paragraph position="2"> We sketch in Figure 4 the sequence of actions executed by the parser. Technically, each element of the stack would be a pair: the second element being b) derivation treea) derived tree  an LR state; the first can be either a grammar symbol or an embedded stack. Although the state is the only relevant component of the pair for parsing, in the figure, for presentational purposes, we omit the state and instead show only the symbol/embedded stack component (despite the misleading presence of embedded stacks, actions are executed in constant time). Stacks are surrounded by square brackets. Only the parts of speech have been represented. The bpack action is not standard in LR parsing. It represents an earlier partial commitment for structure. In its first appearance, it acknowledges that some material will be adjoined to a VP that dominates the element at the top of the stack (in fact it dominates the a0 topmost elements, where a0 is the second parameter of bpack). The material is then enclosed in a substack (the subscript VP at the left bracket is for presentation purposes only; that information is in fact in the LR state that would pair with the substack). The next line contains another bpack with a0a2a1a4a3 , that proposes another adjunction, dominating the VB and the RB. Reductions for auxiliary trees leave no visible trace in the stack after they are executed. The parser executes reductions in a bottom up order with respect to the derivation tree3</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>