<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1521">
  <Title>Parsing TAG with Abstract Categorial Grammar</Title>
  <Section position="4" start_page="0" end_page="141" type="metho">
    <SectionTitle>
2 The linear $\lambda$-calculus
</SectionTitle>
    <Paragraph position="0"> We begin by giving a brief definition of linear types and linear $\lambda$-terms together with some standard notations. We assume that the reader is familiar with the usual notions related to the $\lambda$-calculus ($\beta$-conversion, free variables, capture-avoiding substitutions, ...); for more details about the $\lambda$-calculus, one may consult (Barendregt, 1984).</Paragraph>
    <Paragraph position="1"> Definition 1 The set of linear types, $T$, is the smallest set containing $\{\ast\}$ and such that if $\alpha, \beta \in T$ then $(\alpha \multimap \beta) \in T$.</Paragraph>
    <Paragraph position="2"> Given a type $(\alpha_1 \multimap (\cdots(\alpha_n \multimap \ast)\cdots))$, we write it $(\alpha_1,\ldots,\alpha_n) \multimap \ast$.</Paragraph>
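As a minimal sketch of Definition 1 and of the abbreviation just introduced, linear types can be modelled as a small recursive data structure. The names Atom, Arrow, flatten, STR and CONCAT are illustrative assumptions and do not come from the paper; STR anticipates the type of strings used later in this section.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Atom:
    """The unique atomic type."""


@dataclass(frozen=True)
class Arrow:
    """The linear implication (source -o target)."""
    source: "LinearType"
    target: "LinearType"


LinearType = Union[Atom, Arrow]


def flatten(ty: LinearType) -> Tuple[List[LinearType], Atom]:
    """Split a1 -o (... (an -o atom)) into ([a1, ..., an], atom)."""
    args = []
    while isinstance(ty, Arrow):
        args.append(ty.source)
        ty = ty.target
    return args, ty


# Hypothetical examples: the type of strings and a type taking two strings.
STR = Arrow(Atom(), Atom())
CONCAT = Arrow(STR, Arrow(STR, STR))
```

Calling flatten(CONCAT) yields the argument list [STR, STR, Atom()] together with the atomic result type, which is the tuple notation of the abbreviation.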
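    <Paragraph position="3"> Definition 2 Given an infinite enumerable set of variables, $X$, and an alphabet $\Sigma$, we define the set of linear $\lambda$-terms of type $\alpha \in T$, $\Lambda^{\alpha}$, as the smallest set satisfying the following properties:  1. $x \in X \Rightarrow x^{\alpha} \in \Lambda^{\alpha}$ 2. $t \in \Lambda^{\alpha} \wedge x^{\beta} \in FV(t) \Rightarrow \lambda x^{\beta}.t \in \Lambda^{\beta \multimap \alpha}$ 3. $a \in \Sigma \Rightarrow a \in \Lambda^{\ast \multimap \ast}$</Paragraph>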
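The linearity conditions of Definition 2 can be illustrated with a small untyped checker. This is only a sketch under stated assumptions: types are ignored, the class names Var, Const, Lam and App are hypothetical, and the treatment of application (the two subterms must have disjoint sets of free variables) is the usual condition for linear terms, assumed here because the clause for application does not appear in the text above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Lam:
    var: str
    body: "Term"


@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"


Term = Union[Var, Const, Lam, App]


def free_vars(t: Term) -> FrozenSet[str]:
    """Return the set of free variables of t."""
    if isinstance(t, Var):
        return frozenset({t.name})
    if isinstance(t, Const):
        return frozenset()
    if isinstance(t, Lam):
        return free_vars(t.body) - {t.var}
    return free_vars(t.fun) | free_vars(t.arg)


def is_linear(t: Term) -> bool:
    """Check that every bound variable is used exactly once."""
    if isinstance(t, (Var, Const)):
        return True
    if isinstance(t, Lam):
        # Clause 2: the abstracted variable must occur free in the body.
        return t.var in free_vars(t.body) and is_linear(t.body)
    # Assumed application clause: no variable is shared by the two subterms.
    disjoint = free_vars(t.fun).isdisjoint(free_vars(t.arg))
    return disjoint and is_linear(t.fun) and is_linear(t.arg)
```

For instance, the term Lam("y", App(Const("a"), Var("y"))) is accepted, while Lam("x", Const("a")) is rejected because x is never used.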
    <Paragraph position="5"> In general, we write $\lambda x_1 \ldots x_n.t$ for $\lambda x_1.\ldots\lambda x_n.t$ and we write $t_0 t_1 \ldots t_n$ for $(\ldots(t_0 t_1)\ldots t_n)$. Strings are represented by closed linear $\lambda$-terms of type $str = \ast \multimap \ast$. A string $abcde$ is represented by the following linear $\lambda$-term: $\lambda y^{\ast}.a(b(c(d(e\,y^{\ast}))))$; $/w/$ denotes the set of terms which are $\beta$-convertible to the $\lambda$-term representing the string $w$. Concatenation is represented by the term $+ = \lambda w_1 w_2 y^{\ast}.w_1(w_2\,y^{\ast})$, and $(+\,w_1)\,w_2$</Paragraph>
    <Paragraph position="7"> will be written $w_1 + w_2$. Concatenation is moreover associative; we may thus write $w_1 + \cdots + w_n$.</Paragraph>
    <Paragraph position="9"> For the description of our algorithm, we rely on contexts: Definition 3 A context is a $\lambda$-term with a hole. Contexts are defined by the following grammar: $C[\,] ::= [\,] \mid \lambda x.C[\,] \mid C[\,]\,t \mid t\,C[\,]$</Paragraph>
    <Paragraph position="11"> The insertion of a term within a context is done in the obvious way. Note, however, that when a term $t$ is inserted in a context $C[\,]$, the context can bind variables that are free in $t$. For example, if $C[\,] = \lambda x.[\,]$ and $t = x$ then $C[t] = \lambda x.x$, and $x$, which was free in $t$, is no longer free in $C[t]$.</Paragraph>
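The capturing behaviour of insertion can be made concrete with a short sketch. Terms are encoded here as nested tuples, an encoding chosen only for illustration; the names HOLE, plug and free_vars are hypothetical.

```python
# Terms: ("var", x), ("const", a), ("lam", x, body), ("app", fun, arg).
HOLE = ("hole",)


def plug(context, term):
    """Return C[t]: replace the hole of `context` by `term`, without renaming."""
    if context == HOLE:
        return term
    tag = context[0]
    if tag == "lam":
        return ("lam", context[1], plug(context[2], term))
    if tag == "app":
        return ("app", plug(context[1], term), plug(context[2], term))
    return context  # variables and constants contain no hole


def free_vars(term):
    """Return the set of free variables of a term or context."""
    tag = term[0]
    if tag == "var":
        return {term[1]}
    if tag == "lam":
        return free_vars(term[2]) - {term[1]}
    if tag == "app":
        return free_vars(term[1]) | free_vars(term[2])
    return set()  # constants and the hole


# The example of the text: the context binding x, and the term x.
CONTEXT = ("lam", "x", HOLE)
TERM = ("var", "x")
```

With these definitions, free_vars(TERM) contains x, whereas free_vars(plug(CONTEXT, TERM)) is empty: the variable has been captured by the binder of the context.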
  </Section>
  <Section position="5" start_page="141" end_page="142" type="metho">
    <SectionTitle>
3 Indices as syntactic descriptions
</SectionTitle>
    <Paragraph position="0"> Usually the items of Earley algorithms use indices to represent positions in the input string. The algorithm we describe is a particular instance of a more general one which parses linear $\lambda$-terms rather than strings. In that case, positions cannot be described in a simple way by means of indices. Instead of indices, positions in a term $t$ are represented with zippers (Huet, 1997), i.e. pairs $(C[\,],v)$ of a context and a term such that $C[v] = t$. Figure 1 makes the correspondence between indices and zippers explicit via an example.</Paragraph>
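The correspondence between indices and zippers can be sketched for strings as follows. The sketch renders contexts and terms as text, assuming the encoding of strings as terms of the previous section; the function names zipper and positions are illustrative assumptions, and the figure of the paper is not reproduced.

```python
LAMBDA = "\u03bb"  # the Greek letter lambda, used only for display


def nest(symbols, inner):
    """Render a1(a2(...(an inner))) with the innermost application juxtaposed."""
    if not symbols:
        return inner
    acc = symbols[-1] + inner
    for s in reversed(symbols[:-1]):
        acc = f"{s}({acc})"
    return acc


def zipper(word, i, var="y"):
    """Return the pair (context, term) standing for position i in `word`."""
    context = LAMBDA + var + "." + nest(word[:i], "[]")
    term = nest(word[i:], var)
    return context, term


def positions(word, var="y"):
    """List the zippers of all positions 0..len(word) of `word`."""
    return [zipper(word, i, var) for i in range(len(word) + 1)]
```

For the string abcde, position 3 is rendered as the context binding x around a(b(c[])) paired with the term d(ex), and position 0 pairs the empty context with the whole term.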
    <Paragraph position="1"> The items of Earley algorithms for TAGs use pairs of indices to describe portions of the input string. In our algorithm, this role is played by linear types built upon zippers; the parsing process can be seen as a type-checking process in a particular type system. We will not present this system here, but we will give a flavor of the meaning of those types, called syntactic descriptions (Salvati, 2006). In order to represent the portion of a string between the indices $i$ and $j$, we use the zippers $(C_i[\,],v_i)$ and $(C_j[\,],v_j)$, which respectively represent the positions $i$ and $j$ in the string. The portion of string is represented by the syntactic description $(C_j[\,],v_j) \multimap (C_i[\,],v_i)$; this syntactic description can be used to type functions which take $v_j$ as argument and return $v_i$ as a result. For example, the syntactic description $(\lambda x.a(b(c[\,])),d(e\,x)) \multimap (\lambda x.a[\,],b(c(d(e\,x))))$ represents the set of functions that result in terms $\beta$-convertible to $b(c(d(e\,x)))$ when they take $d(e\,x)$ as an argument; this set is exactly $/bc/$. Our algorithm uses representations of string contexts with syntactic descriptions such as $\delta = ((C_1[\,],v_1) \multimap (C_2[\,],v_2)) \multimap (C_3[\,],v_3) \multimap (C_4[\,],v_4)$ (in the following we write $((C_1[\,],v_1) \multimap (C_2[\,],v_2),(C_3[\,],v_3)) \multimap (C_4[\,],v_4)$ for such syntactic descriptions). Assume that $(C_1[\,],v_1) \multimap (C_2[\,],v_2)$ represents $/bc/$ and that $(C_3[\,],v_3) \multimap (C_4[\,],v_4)$ represents $/abcde/$; then $\delta$ describes the terms which give a result in $/abcde/$ when they are applied to an element of $/bc/$. Thus, $\delta$ describes the set of terms $\beta$-convertible to $\lambda f y.a(f(d(e\,y)))$, the set of terms representing the string context $a[\ ]de$.</Paragraph>
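The functional reading of syntactic descriptions can be tried out with ordinary closures. In this sketch, which is an illustration and not the type system of the paper, a term of atomic type is interpreted as the suffix of the input it stands for, a letter as a function prepending one symbol, and concatenation as function composition (the usual encoding, assumed here). The names letter, word, concat, bc and context_a_de are hypothetical.

```python
def letter(ch):
    """A letter of the alphabet: a function prepending itself to a suffix."""
    return lambda rest: ch + rest


a, b, c, d, e = (letter(ch) for ch in "abcde")


def word(text):
    """The term binding y around a1(a2(...(an y))) for text = a1 a2 ... an."""
    def term(y):
        for ch in reversed(text):
            y = letter(ch)(y)
        return y
    return term


def concat(w1, w2):
    """Concatenation as composition: apply w2 first, then w1."""
    return lambda y: w1(w2(y))


# A member of /bc/ maps the suffix d(e x) to b(c(d(e x))).
bc = word("bc")

# The string context a[ ]de, i.e. the term binding f and y in a(f(d(e y))).
context_a_de = lambda f: lambda y: a(f(d(e(y))))
```

Applying bc to the suffix "de" returns "bcde", and applying context_a_de to bc and then to the empty suffix returns "abcde", in line with the reading of $\delta$ given above.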
    <Paragraph position="2"> Some of the syntactic descriptions we use may contain variables denoting non-specified syntactic descriptions that may be instantiated during parsing. In particular, the syntactic description variable $F$ will always be used as a non-specified</Paragraph>
    <Paragraph position="4"> syntactic description representing strings (i.e. $F$ may only be substituted by a syntactic description of the form $(C_1[\,],v_1) \multimap (C_2[\,],v_2)$); such syntactic descriptions will represent the foot of an auxiliary tree. We will also use $Y$ to represent a non-specified point in the input sentence (i.e. $Y$ may only be substituted by syntactic descriptions of the form $(C[\,],v)$); such syntactic descriptions will represent the end of an elementary tree.</Paragraph>
    <Paragraph position="5"> As syntactic descriptions are types for the linear $\lambda$-calculus, we introduce the notion of typing context for syntactic descriptions.</Paragraph>
    <Paragraph position="6">  Definition 4 A typing context $\Gamma$ (context for short) is a set of pairs of the form $x : \delta$, where $x$ is a variable and $\delta$ is a syntactic description, such that if $x : \delta \in \Gamma$ and $x : \epsilon \in \Gamma$ then $\delta = \epsilon$.</Paragraph>
    <Paragraph position="7"> If $x : \delta \in \Gamma$, then we say that $x$ is declared with type $\delta$ in $\Gamma$.</Paragraph>
    <Paragraph position="8"> Typing contexts $\Gamma$ must not be confused with contexts $C[\,]$. If a typing context $\Gamma$ is the set</Paragraph>
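The condition of Definition 4, namely that a variable is declared with a single syntactic description, can be sketched with a dictionary. Syntactic descriptions are represented here by plain strings, and the function name declare is a hypothetical helper, not part of the paper.

```python
def declare(context: dict, var: str, description: str) -> dict:
    """Return a new typing context extended with var : description."""
    if var in context and context[var] != description:
        # A variable cannot be declared with two different descriptions.
        raise ValueError(f"{var} is already declared with type {context[var]}")
    return {**context, var: description}


# Hypothetical example: a foot variable declared with the variable F.
EMPTY = {}
FOOT = declare(EMPTY, "f", "F")
```

Declaring the same pair twice leaves the typing context unchanged, whereas declaring f with another description raises an error.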
    <Paragraph position="10"> contexts may declare at most two variables.</Paragraph>
  </Section>
  <Section position="6" start_page="142" end_page="142" type="metho">
    <SectionTitle>
4 Representing TAG with second order
ACGs
</SectionTitle>
    <Paragraph position="0"> We cannot give a detailed definition of second order ACGs here. We therefore directly explain how to transform TAGs into lexical entries representing a second order ACG that can be directly used by the algorithm.</Paragraph>
    <Paragraph position="1"> We represent a TAG $G$ by a set of lexical entries $L_G$. Lexical entries are triples $(\Gamma,t,\alpha)$ where $\Gamma$ is a typing context, $t$ is a linear $\lambda$-term and $\alpha$ is either $N_a$, $N_s$ or $N_{a.1}$, where $N$ is a non-terminal of the considered TAG. Without loss of generality, we consider that adjunction at an interior node of an elementary tree is either mandatory or forbidden (footnote 1). Footnote 1: We do not treat here the case of optional adjunction, but our method can be straightforwardly extended to cope with it, following ideas from (de Groote, 2002). It only modifies the way we encode a TAG with a set of lexical entries; the algorithm remains unchanged.</Paragraph>
    <Paragraph position="2"> We adopt the convention of representing adjunction nodes labeled with $N$ by the variable $x^{str \multimap str}_{N_a}$, the substitution nodes labeled with $N\downarrow$ by the variable $x^{str}_{N_s}$, and the foot node of an auxiliary tree labeled with $N^{\ast}$ by the variable $f^{str}_{N_{a.1}}$; the variable $y^{\ast}$ represents the end of strings. When necessary, in order to respect the linearity constraints of the $\lambda$-terms, indices are used to distinguish those variables. This convention being settled, the type annotations on variables are no longer necessary; we will thus write $x_{N_a}$, $x_{N_s}$, $f_{N_{a.1}}$ and $y$. To translate the TAG, we use the function $\varphi$ defined in figure 2. Given an initial tree $T$ whose root is labeled by $N$ and $t$ the normal form of $\varphi(T)$, $(\ ,t,N_s)$ is the lexical entry associated to $T$; if $T$ is an auxiliary tree whose root is labeled by $N$ and $t$ is the normal form of $\varphi(T)$, then $(\ ,\lambda f_{N_{a.1}}.t,N_a)$ is the lexical entry associated to $T$. A TAG $G$ is represented by $L_G$, the smallest set verifying:  1. if $T$ is an elementary tree of $G$ then the lexical entry associated to $T$ is in $L_G$.</Paragraph>
    <Paragraph position="3"> 2. if $(\ ,t,\alpha) \in L_G$, with $\alpha$ equal to $N_a$ or $N_s$, and $t = C[x_{N_a} t_1 t_2]$, then $(\Gamma,t_1,N_{a.1}) \in L_G$, where $\Gamma = f_{M_{a.1}} : F$ if $f_{M_{a.1}} \in FV(t_1)$; otherwise $\Gamma$ is the empty typing context.</Paragraph>
    <Paragraph position="4"> Given a term $t$ such that $x_{\alpha} \in FV(t)$, and $(\Gamma,t',\alpha) \in L_G$, we say that $t$ is rewritten as $t[x_{\alpha} := t']$, written $t \Rightarrow t[x_{\alpha} := t']$. Furthermore, if $x_{\alpha}$ is the leftmost variable, we write $t \Rightarrow_{l} t[x_{\alpha} := t']$. It is easy to check that if $t \Rightarrow^{*} t'$ with $FV(t') = \emptyset$, then $t \Rightarrow^{*}_{l} t'$. A string $w$ is generated by $L_G$ whenever $x_{S_s} \Rightarrow^{*} t$ and $t \in /w/$ ($S$ being the start symbol of $G$). Straightforwardly, the set of strings generated by $L_G$ is exactly the language of $G$.</Paragraph>
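Leftmost rewriting can be illustrated on a deliberately simplified case. The sketch below assumes a grammar with substitution variables only (no adjunction), so that every term of string type flattens to a sequence of letters and variables and no $\beta$-reduction is needed; it also assumes that no entry is empty, which justifies pruning by length. The names generates and LEXICON are hypothetical, and the toy lexicon is not taken from the paper.

```python
from collections import deque


def generates(lexicon, start, word):
    """Decide whether `word` is derivable from `start` by leftmost rewriting."""
    target = tuple(word)
    seen = {(start,)}
    queue = deque(seen)
    while queue:
        form = queue.popleft()
        # Index of the leftmost variable, if any.
        pos = next((i for i, s in enumerate(form) if s in lexicon), None)
        if pos is None:
            if form == target:
                return True
            continue
        if form[:pos] != target[:pos]:
            continue  # the letters before the variable disagree with the input
        for body in lexicon[form[pos]]:
            new = form[:pos] + tuple(body) + form[pos + 1:]
            # Entries are assumed non-empty, so forms never shrink.
            if len(new) > len(target) or new in seen:
                continue
            seen.add(new)
            queue.append(new)
    return False


# Toy lexicon: the variable S rewrites to a S b or to a b.
LEXICON = {"S": [("a", "S", "b"), ("a", "b")]}
```

With this lexicon, generates(LEXICON, "S", "aabb") holds while generates(LEXICON, "S", "aab") does not; only the leftmost variable is ever rewritten, which by the remark above loses no derivable closed term.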
  </Section>
  <Section position="7" start_page="142" end_page="144" type="metho">
    <SectionTitle>
5 The algorithm
</SectionTitle>
    <Paragraph position="0"> As we want to emphasize the fact that the algorithm we propose borrows much from type checking, we use sequents in the items the algorithm manipulates. Sequents are objects of the form $\Gamma \vdash t : \delta$</Paragraph>
    <Paragraph position="2"> where $\Gamma$ is a typing context, $t$ is a linear $\lambda$-term, and $\delta$ is a syntactic description.</Paragraph>
    <Paragraph position="3"> The algorithm uses two kinds of items: either items of the form $(\alpha;\Gamma \vdash t : \delta;L)$ (where $L$ is a list of sequents, the subgoals; here $L$ contains either zero or one element) or items of the form $[N_{a.1};\Gamma;t;(C_1[\,],v_1) \multimap (C_2[\,],v_2)]$. All the possible instances of the items are given in figure 3. The algorithm is a recognizer but can easily be extended into a parser (footnote 3). It iteratively fills a chart until a fixed-point is reached. Elements are added to the chart by means of the inference rules given in figure 4, in a deductive parsing fashion (Shieber et al., 1995). Inference rules contain two parts: the first part is a set of premises which state conditions on elements that are already in the chart. The second part gives the new element to add to the chart if it is not already present. For the more general algorithm, the rules are not much more numerous as they can be abstracted into more general schemes.</Paragraph>
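The chart-filling loop of deductive parsing can be sketched independently of the particular items and rules. The code below is a generic agenda-driven closure in the style of Shieber et al. (1995), not the rule set of figure 4: the function names closure and compose are hypothetical, and the toy rule works on pairs of indices rather than on the items of this paper.

```python
def closure(axioms, rules):
    """Fill a chart with every item derivable from `axioms` using `rules`."""
    chart = set()
    agenda = list(axioms)
    while agenda:
        item = agenda.pop()
        if item in chart:
            continue  # an element is added only if it is not already present
        chart.add(item)
        for rule in rules:
            # A rule maps the new item and the chart to its consequences.
            for consequence in rule(item, chart):
                if consequence not in chart:
                    agenda.append(consequence)
    return chart  # the agenda is empty: a fixed-point has been reached


def compose(item, chart):
    """Toy rule: from (i, j) and (j, k) in the chart, derive (i, k)."""
    i, j = item
    for (m, n) in list(chart):
        if j == m:
            yield (i, n)
        if n == i:
            yield (m, j)
```

Starting from the axioms (0, 1), (1, 2) and (2, 3), the loop stops with a chart that also contains (0, 2), (1, 3) and (0, 3). Each new item is combined with the items already present, both as left and as right premise, so no consequence is missed.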
    <Paragraph position="4"> An item of the form $(\alpha;\Gamma_1 \vdash t_1 : \delta;\Gamma_2 \vdash t_2 : (C_1[\,],v_1))$ verifies: 1. $(\Gamma'_1,t_1,\alpha) \in L_G$ where $\Gamma'_1 = f_{N_{a.1}} : F$ if $\Gamma_1 = f_{N_{a.1}} : \epsilon$, or $\Gamma'_1 = \Gamma_1$ otherwise.</Paragraph>
    <Paragraph position="5"> 2. there is a context $C[\,]$ such that $t_1 = C[t_2]$ and  if $\delta$ is of the form $(\delta_1,\ldots,\delta_n) \multimap (C_2[\,],v_2)$ ($n$ must be 1 or 2) then $C[y] \Rightarrow^{*}_{l} t'$ so that $t'$ is described by $(C_1[\,],v_1) \multimap (C_2[\,],v_2)$.</Paragraph>
    <Paragraph position="7"> or if $\delta = ((C_3[\,],v_3) \multimap (C_4[\,],v_4),Y) \multimap (C_2[\,],v_2)$ and $t_1 = \lambda f_{N_{a.1}} y.v$ then $f_{N_{a.1}} \Rightarrow_{l} t''$ and $t''$ is described by $(C_3[\,],v_3) \multimap (C_4[\,],v_4)$. An item of the form $(\alpha;\Gamma \vdash t : \delta;\ )$ verifies: 1. $(\Gamma',t,\alpha) \in L_G$ where $\Gamma' = f_{N_{a.1}} : F$ if $\Gamma = f_{N_{a.1}} : \epsilon$, or $\Gamma' = \Gamma$ otherwise. 2. $\delta$ does not contain non-specified syntactic descriptions. Footnote 3: If the algorithm is extended into a parser, it will output the shared forest of the derivation trees; (de Groote, 2002) explains how to obtain the derived trees from the derivation trees in the framework of ACGs.</Paragraph>
    <Paragraph position="8"> 3. $t \Rightarrow^{*}_{l} t'$ and $t'$ is described by $\delta$ ($\delta$ may either represent a string context or a string).</Paragraph>
    <Paragraph position="9"> 4. if $\Gamma = f_{N_{a.1}} : (C_3[\,],v_3) \multimap (C_4[\,],v_4)$ or if  the chart.</Paragraph>
    <Paragraph position="10"> An input $\lambda y.C[y]$ is recognized iff, when the fixed-point is reached, the chart contains an item of the form $(S_s;\ \vdash t : (\lambda y.C[\,],y) \multimap (\lambda y.[\,],C[y]);\ )$ (where $S$ is the start symbol of the</Paragraph>
  </Section>
</Paper>