<?xml version="1.0" standalone="yes"?> <Paper uid="P96-1012"> <Title>Another Facet of LIG Parsing</Title> <Section position="2" start_page="0" end_page="87" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The class of mildly context-sensitive languages can be described by several equivalent grammar types.</Paragraph> <Paragraph position="1"> Notable among these types are tree adjoining grammars (TAGs) and linear indexed grammars (LIGs). In (Vijay-Shanker and Weir, 1994) TAGs are transformed into equivalent LIGs. Though context-sensitive linguistic phenomena seem to be more naturally expressed in the TAG formalism, many authors think that, from a computational point of view, LIGs play a central role, and that understanding LIGs and LIG parsing is therefore important. For example, to quote (Schabes and Shieber, 1994): &quot;The LIG version of TAG can be used for recognition and parsing. Because the LIG formalism is based on augmented rewriting, the parsing algorithms can be much simpler to understand and easier to modify, and no loss of generality is incurred&quot;. In (Vijay-Shanker and Weir, 1993) LIGs are used to express the derivations of a sentence in TAGs. In (Vijay-Shanker, Weir and Rambow, 1995) the approach used for parsing a new formalism, the D-Tree Grammars (DTG), is to translate a DTG into a Linear Prioritized Multiset Grammar, which is similar to a LIG but uses multisets in place of stacks.</Paragraph> <Paragraph position="2"> LIGs can be seen as usual context-free grammars (CFGs) upon which constraints are imposed. These constraints are expressed by stacks of symbols associated with non-terminals. 
We study parsing of LIGs, our goal being to define a structure that verifies the LIG constraints and encodes all (and only) the parse trees deriving sentences.</Paragraph> <Paragraph position="3"> Since derivations in LIGs are constrained CF derivations, we can think of a scheme where the CF derivations for a given input are expressed by a shared forest from which individual parse trees which do not satisfy the LIG constraints are erased. Unfortunately this view is too simplistic, since the erasing of individual trees whose parts can be shared with other valid trees can only be performed after some unfolding (unsharing) that can produce a forest whose size is exponential or even unbounded. In (Vijay-Shanker and Weir, 1993), the context-freeness of adjunction in TAGs is captured by giving a CFG to represent the set of all possible derivation sequences. In this paper we study a new parsing scheme for LIGs which is based upon similar principles and which, on the other hand, emphasizes, like (Lang, 1991) and (Lang, 1994), the use of grammars (shared forests) to represent parse trees. It is an extension of our previous work (Boullier, 1995).</Paragraph> <Paragraph position="4"> This previous paper describes a recognition algorithm for LIGs, but not a parser. For a LIG and an input string, all valid parse trees are actually encoded in the CF shared parse forest used by this recognizer, but, on some parse trees of this forest, the checking of the LIG constraints may fail.</Paragraph> <Paragraph position="5"> At first sight, there are two conceivable ways to extend this recognizer into a parser: 1. only &quot;good&quot; trees are kept; 2. the LIG constraints are [re-]checked while the extraction of valid trees is performed.</Paragraph> <Paragraph position="6"> As explained above, the first solution can produce an unbounded number of trees. 
The second solution is also unsatisfactory since it requires re-evaluating the LIG conditions on each tree and, in doing so, we move away from the usual idea that individual parse trees can be extracted by a simple walk through a structure.</Paragraph> <Paragraph position="7"> In this paper, we advocate a third way which uses (see section 4) the same basic material as (Boullier, 1995). For a given LIG $L$ and an input string $x$, we exhibit an unambiguous CFG whose sentences are all possible valid derivation sequences in $L$ which lead to $x$. We show that this CFG can be constructed in $\mathcal{O}(n^6)$ time and that individual parses can be extracted in time linear in the size of the extracted tree.</Paragraph> <Section position="1" start_page="87" end_page="87" type="sub_section"> <SectionTitle> 2 Derivation Grammar and CF Parse Forest </SectionTitle> <Paragraph position="0"> In a CFG $G = (V_N, V_T, P, S)$, the derives relation $\Rightarrow_G$ is the set $\{(\sigma B \sigma', \sigma \beta \sigma') \mid B \to \beta \in P \wedge V = V_N \cup V_T \wedge \sigma, \sigma' \in V^*\}$. A derivation is a sequence of strings in $V^*$ such that the derives relation holds between any two consecutive strings. In a rightmost derivation, at each step, the rightmost non-terminal, say $B$, is replaced by the right-hand side (RHS) of a $B$-production. Equivalently, if $\sigma_0 \overset{r_1}{\underset{G}{\Rightarrow}} \cdots \overset{r_n}{\underset{G}{\Rightarrow}} \sigma_n$ is a rightmost derivation, where the relation symbol is overlined by the production used at each step, we say that $r_1 \ldots r_n$ is a rightmost $\sigma_0/\sigma_n$-derivation.</Paragraph> <Paragraph position="1"> For a CFG $G$, the set of its rightmost $S/x$-derivations, where $x \in \mathcal{L}(G)$, can itself be defined by a grammar.</Paragraph> <Paragraph position="2"> Definition 1 Let $G = (V_N, V_T, P, S)$ be a CFG. Its rightmost derivation grammar is the CFG $D = (V_N, P, P^D, S)$ where $P^D = \{A_0 \to A_1 \ldots A_q\, r \mid r = A_0 \to w_0 A_1 w_1 \ldots w_{q-1} A_q w_q \in P \wedge w_i \in V_T^* \wedge A_j \in V_N\}$. From the natural bijection between $P$ and $P^D$, we can easily prove that</Paragraph> <Paragraph position="4"> $\mathcal{L}(D) = \{r_n \ldots r_1 \mid r_1 \ldots r_n \text{ is a rightmost } S/x\text{-derivation in } G\}$. This shows that the rightmost derivation language of a CFG is also CF. We will show in section 4 that a similar result holds for LIGs.</Paragraph> <Paragraph position="5"> Following (Lang, 1994), CF parsing is the intersection of a CFG and a finite-state automaton (FSA) which models the input string $x$. The result of this intersection is a CFG $G^x = (V_N^x, V_T^x, P^x, [S]_0^n)$, called a shared parse forest, which is a specialization of the initial CFG $G = (V_N, V_T, P, S)$ to $x$. Each production $r_i^j \in P^x$ is the production $r_i \in P$ up to some non-terminal renaming. The non-terminal symbols in $V_N^x$ are triples denoted $[A]_p^q$ where $A \in V_N$, and $p$ and $q$ are states. When such a non-terminal is productive, $[A]_p^q \overset{+}{\underset{G^x}{\Rightarrow}} w$, we have $q \in \delta(p, w)$. If we build the rightmost derivation grammar associated with a shared parse forest, and we remove all its useless symbols, we get a reduced CFG, say $D^x$. The CF recognition problem for $(G, x)$ is equivalent to the existence of an $[S]_0^n$-production in $D^x$. Moreover, each rightmost $S/x$-derivation in $G$ is (the reverse of) a sentence in $\mathcal{L}(D^x)$. However, this result is not very interesting since individual parse trees can be extracted just as easily directly from the parse forest. This is due to the fact that, in the CF case, a tree that is derived (a parse tree) contains all the information about its derivation (the sequence of rewritings used), and therefore there is no need to distinguish between these two notions. Though this is not always the case with non-CF formalisms, we will see in the next sections that a similar approach, when applied to LIGs, leads to a shared parse forest which is a LIG, while it is possible to define a derivation grammar which is CF.</Paragraph> </Section> </Section> </Paper>