<?xml version="1.0" standalone="yes"?>
<Paper uid="E93-1042">
  <Title>NEW FRONTIERS BEYOND CONTEXT-FREENESS: DI-GRAMMARS AND DI-AUTOMATA.</Title>
  <Section position="4" start_page="358" end_page="361" type="evalu">
    <SectionTitle>
2 DI-Grammars and DI-Languages
</SectionTitle>
    <Paragraph position="0"> A DI-grammar is a 5-tuple G = (N,T,F,P,S), where N, T, S are as usual, F is an alphabet of indices, and P is a set of rules of the following form:</Paragraph>
    <Paragraph position="2"> The relation "=&gt;" or "directly derives" is defined as follows:</Paragraph>
    <Paragraph position="4"> and for B_i ∈ T: index_i = ε (i.e. the empty word) (o~a) The reflexive and transitive closure =*=&gt; of =&gt; is defined as usual.</Paragraph>
    <Paragraph position="5"> Replacing (*) by "index_i = index for B_i ∈ N, index_i = ε for B_i ∈ T" changes the above definition into a definition of Aho's well-known indexed grammars. How index percolation differs in indexed grammars and DI-grammars is illustrated in (4).</Paragraph>
    <Paragraph position="7"> m(y) is the mirror image of y, and D_1 is the Dyck language generated by the following CFG G_k (D_1 = L(G_k)): G_k = ({S}, {[, ]}, R_k, S), where R_k = {S → [S], S → SS, S → ε}. (5.4) L_4 = {a^k; k = n^n, n ≥ 1} (L_4 is not an indexed language, see Hayashi (1973)).</Paragraph>
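    <Paragraph> The rules of G_k can be turned into a simple membership test for D_1. The following Python sketch is an illustration only: the function name is_dyck is hypothetical, and the counter-based test is the standard equivalent of the grammar (no prefix closes more brackets than it has opened, and all brackets are closed at the end), not a procedure given in the text.

```python
def is_dyck(word):
    """Return True if word belongs to D_1, the Dyck language over '[' and ']'."""
    depth = 0  # number of currently open brackets
    for symbol in word:
        if symbol == "[":
            depth += 1
        elif symbol == "]":
            if depth == 0:
                return False  # closing bracket without a matching opener
            depth -= 1
        else:
            return False  # symbol outside the terminal alphabet of G_k
    return depth == 0  # the empty word is accepted, matching the epsilon rule
```

For instance, is_dyck("[[][]]") holds, while is_dyck("][") and is_dyck("[[]") do not.</Paragraph>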
    <Paragraph position="8"> By definition (see above), the intersection of the class of indexed languages and the class of DI-languages includes the context-free (cfr) languages. The inclusion is proper, since the (non-cfr) language L_1 is generated by G_1 = ({S,A,B}, {a,b,c}, {f,g}, R_1, S), where R_1 = {S → aAfc, A → aAgc, A → B, Bg → bB, Bf → b}, and G_1 obviously is both a DI-grammar and an indexed grammar.</Paragraph>
    <Paragraph position="9"> Like cfr languages and unlike indexed languages, DI-languages have the constant growth property (i.e. for every DI-grammar G there exists a k ∈ ℕ such that for every w ∈ L(G) with |w| &gt; k there exists a sequence w_1 (= w), w_2, w_3, ... (w_i ∈ L(G)) such that |w_n| &lt; |w_{n+1}| &lt; (n+1) × |w| for every member w_n of the sequence). Hence L_2, and a fortiori L_4, is not a DI-language. But L_2 is an indexed language, since it is generated by the indexed grammar G_2 = ({S,A,D}, {a}, {f,g}, R_2, S), where R_2</Paragraph>
    <Paragraph position="11"/>
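    <Paragraph> The derivations of the grammar G_1 given above can be traced mechanically. Every rule of G_1 has at most one nonterminal on its right-hand side, so the whole index stack is always handed to that single nonterminal; this is presumably why the indexed and the DI reading of the rules coincide here. The Python sketch below is an illustration only: the function name derive_g1 and its parameter pushes (the number of applications of A → aAgc) are hypothetical and not part of the paper.

```python
def derive_g1(pushes):
    """Trace one derivation of G_1 and return the derived terminal word."""
    left, right = ["a"], ["c"]   # S → a A f c: the index f is pushed first
    stack = ["f"]                # index stack of the single nonterminal, top at the end
    for _ in range(pushes):      # A → a A g c: emit a and c, push g
        left.append("a")
        right.append("c")
        stack.append("g")
    middle = []                  # A → B: the stack is passed on unchanged
    while stack:
        stack.pop()              # B g → b B and B f → b each consume one index
        middle.append("b")
    return "".join(left + middle + right)
```

Under these assumptions derive_g1(2) yields aaabbbccc, i.e. a word with equally many occurrences of a, b, and c.</Paragraph>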
    <Section position="1" start_page="359" end_page="360" type="sub_section">
      <SectionTitle>
2.1 DI-Grammars and Indexed Grammars
</SectionTitle>
      <Paragraph position="0"> Considering the well-known generative strength of indexed grammars, it is by no means obvious that L_3 is not an indexed language. In view of the complexity of the proof that L_3 is not indexed, only some important points can be indicated here, referring to the three main parts of every word x ∈ L_3 by x_l, [x_m], x_r, as illustrated in example (6):</Paragraph>
      <Paragraph position="2"> Assume that there is an indexed grammar G_I = (N,T,F,P,S) such that L_3 = L(G_I): 1. Since G_I cannot be context-free, it follows from the intercalation (or "pumping") lemma for indexed grammars proved by Takeshi Hayashi in (Hayashi, 1973) that there exists for G_I an integer k such that for any x  The interdependently extendible parts of x, s_1...s_n, t_n...t_1, t_1'...t_n', r_n r_n', and s_n'...s_1', cannot all be subwords of the central component [x_m] of x (or all be subwords of the peripheral components x_l x_r); otherwise [x_m] (or x_l x_r) could be increased and decreased independently of the peripheral components x_l and x_r (or of [x_m], respectively) of x, contradicting the assumption that x ∈ L_3. Rather, the structure of x necessitates that s_1...s_n and s_n'...s_1' be subwords of x_l x_r and that the "pumped" index (f# 3 n be discharged deriving the central component [x_m]. Thus, we know that for every l &gt; 0 there exist an index μ ∈ F^+, an x ∈ L_3, and a subword [x_m'] of the central part [x_m] of x such that |[x_m']| &gt; l and Mμ =*=&gt; [x_m'] (M = B or the nonterminal of a descendant of A(f# 3nfo). To simplify our exposition we write [x_m] instead of [x_m'] and have (7) Mμ =*=&gt; [x_m] with the structure of x_l and x_r being encoded and stored in the index μ.</Paragraph>
      <Paragraph position="3"> 2. The balanced parentheses of [x_m] cannot be encoded in the index μ in (7) in such a manner that [x_m] is a homomorphic image of μ. For the set I = {μ'; S =*=&gt; x_l Mμ' x_r =*=&gt; x_l[x_m]x_r ∈ L_3} of all indices which satisfy (7) is regular (or of Type 3), but because of the Dyck structure of [x_m], L_M = {[x_m]; x_l[x_m]x_r ∈ L_3} is not regular but essentially context-free or of Type 2. 3. In the derivation underlying (7) essential use of branching rules of the type A → B_1B_2...B_k (k ≥ 2) has to be made, in the sense that the effect of the rules cannot be simulated by linear rules. Otherwise the central part [x_m] could only have linear and not trans-linear Dyck structure. Without branching rules the required trans-linear parenthetical structure could only be generated by the use of additional index-introducing rules in (7), in order to "store" and coordinate parentheses, which, however, would destroy the dependence of [x_m] on x_l and</Paragraph>
      <Paragraph position="5"> where k = 2^n, w_i ∈ {a,b}^+ for 1 ≤ i ≤ 2^n; m(w_i) is the mirror image of w_i, i.e. the central part [x_m] of such a word contains 2^(n+1) - 1 pairs of parentheses, as shown in (9) for n = 3: (9) [[[[w_8][w_7]][[w_6][w_5]]][[[w_4][w_3]][[w_2][w_1]]]] According to our assumption, G_I generates all words having the form (8). Referring to the derivation in (7), consider a path from Mμ to any of the parenthesized parts w_i of [x_m] in (8).</Paragraph>
      <Paragraph position="6"> Ignoring for expositional purposes the possibility of "storing" (a constant amount of) parentheses in nonterminal nodes, because of 2. and 3. an injective mapping can be defined from the set of pairs of parentheses containing at least two other (and, because of the structure of (8), disjoint) pairs of parentheses into the set of branching nodes with (at least) two nonterminal daughters. Call a node in the range of the mapping a P-node. Assuming without loss of generality that each node has at most two nonterminal daughters, there are 2^n - 1 such P-nodes in the subtree rooted in Mμ and yielding the parenthesized part [x_m] of (8). Furthermore, every path from Mμ to the root W_i of the subtree yielding [w_i] contains exactly n P-nodes (where 2^n = k in (8)).</Paragraph>
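      <Paragraph> The counting behind this argument can be checked on small cases. The Python sketch below builds the bracket skeleton of the central part shown in (9) and counts its pairs of parentheses; the function names and the representation of each w_i by a placeholder string such as w1 or w2 are assumptions made purely for illustration.

```python
def skeleton(lo, hi):
    """Bracket structure over the placeholders w_hi ... w_lo, halved recursively."""
    if lo == hi:
        return "[w%d]" % lo                  # innermost pair around a single w_i
    mid = (lo + hi) // 2
    return "[" + skeleton(mid + 1, hi) + skeleton(lo, mid) + "]"


def central_part(n):
    """Central part with k = 2**n parenthesized subwords, as in (9) for n = 3."""
    return skeleton(1, 2 ** n)


def count_pairs(word):
    """Return (all pairs, pairs that contain at least two other pairs)."""
    total = word.count("[")
    leaves = word.count("[w")                # pairs enclosing just one w_i
    return total, total - leaves
```

For n = 3, central_part(3) reproduces the bracketing of (9), and count_pairs reports 15 pairs in total, 7 of which contain further pairs, in line with the counts 2^(n+1) - 1 and 2^n - 1.</Paragraph>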
      <Paragraph position="7"> Call an index symbol f inside the index stack μ a w_i-index if f is discharged into a terminal constituting a parenthesized w_i in (8) (or equivalently, if f encodes a symbol of the peripheral x_l...x_r).</Paragraph>
      <Paragraph position="8"> Let f_t be the first (or leftmost) w_i-index from above in the index stack μ, and let w_t be the subword of [x_m] containing the terminal into which f_t is discharged, i.e. all other w_i-indices in μ are only accessible after f_t has been consumed. Thus, for μ=alto we get from (7) Mafto-=+=&gt;uBt [v.t fto]v=+=&gt;utWt [~fto]vt and Wt[ffto]ffi+=&gt;wt The path P_t from Mμ to w_t contains n B-nodes, for k = 2^n in (8). For every B-node B_j (0 ≤ j &lt; n) of P_t we obtain, because of the index multiplication effected by nonterminal branching:</Paragraph>
      <Paragraph position="10"> Every path P_j branching off from P_t at Bj[xjfto] leads to a word w_j derived exclusively by discharging w_i-indices situated in μ below (or on the right side of) f_t.</Paragraph>
      <Paragraph position="11"> Consequently, f_t has to be deleted on every such path P_j before the appropriate indices become accessible, i.e. we get for every j with 0 ≤ j &lt; n: ajE'jft"] = &gt;ujRjt j t,,Jyj =* = &gt; yjqtfto] , (B_j, R_j, C_j ∈ N, xj,o F*,ft Thus, for n &gt; |N| in (8) (|N| being the cardinality of the nonterminal alphabet N of G_I, ignoring, as before, the constant amount of parenthesis-storing in nonterminals), because of |{C_j; 0 ≤ j &lt; n}| = n the node label Cj[fto] occurs twice on two different paths branching off from P_t.</Paragraph>
      <Paragraph position="12"> That is, there exist p, q (0 ≤ p &lt; q &lt; n) such that:</Paragraph>
      <Paragraph position="14"> I.e. G_I generates words w'' = x_l''[x_m'']x_r'', the central part of which contains a duplication (of "z" in [x_m''] = y_1 z y_2 z y_3) without correspondence in x_l'' or x_r'', thus contradicting the general form of words of L_3.</Paragraph>
      <Paragraph position="15"> Hence L_3 is not indexed.</Paragraph>
    </Section>
    <Section position="2" start_page="360" end_page="361" type="sub_section">
      <SectionTitle>
2.2 DI-Grammars and Linear Indexed Grammars ¹
</SectionTitle>
      <Paragraph position="0"> As already mentioned above, Gazdar in (Gazdar, 1988) introduced and discussed a grammar formalism, afterwards (e.g. in (Weir and Joshi, 1988)) called linear indexed grammars (LIGs), using index stacks in which only one nonterminal on the right-hand side of a rule can inherit the stack from the left-hand side, i.e. the rules of a LIG G = (N,T,F,P,S), with N, F, T, S as above, are of the form  i. A[..] → A_1[]...A_i[..]...A_n[]  ii. A[..] → A_1[]...A_i[f..]...A_n[]  iii. A[f..] → A_1[]...A_i[..]...A_n[]  iv. A[] → a  where A_1,...,A_n ∈ N, f ∈ F, and a ∈ T ∪ {ε}. The "derives" relation =&gt; is defined as follows: αA[f_1..f_n]β =&gt; αA_1[]...A_i[f_1..f_n]...A_n[]β if A[..] → A_1[]...A_i[..]...A_n[] ∈ P  ¹ Thanks to the anonymous referees for suggestions for this section and the next one.</Paragraph>
      <Paragraph position="1">  if A[] → a ∈ P. =*=&gt; is the reflexive and transitive closure of =&gt;, and L(G) = {w; w ∈ T* &amp; S[] =*=&gt; w}.</Paragraph>
      <Paragraph position="2"> Gazdar has shown that LIGs are a (proper) subclass of indexed grammars. Joshi, Vijay-Shanker, and Weir (Joshi, Vijay-Shanker, and Weir, 1989; Weir and Joshi, 1988) have shown that LIGs, Combinatory Categorial Grammars (CCGs), Tree Adjoining Grammars (TAGs), and Head Grammars (HGs) are weakly equivalent. Thus, if an inclusion relation can be shown to hold between DI-languages (DILs) and LILs, it simultaneously holds between the DIL-class and all members of the family.</Paragraph>
      <Paragraph position="3"> To simulate the restriction on stack transmission in a LIG G_1 = (N_1, T, F_1, P_1, S_1), the following construction of a DI-grammar G_d suggests itself: Let G_d = (N, T, F, P, S) where N − {S} = {X'; X ∈ N_1},</Paragraph>
      <Paragraph position="5"> It follows by induction on the number of derivation steps that for X' ∈ N, X ∈ N_1, μ' ∈ F*, μ ∈ F_1*, and w ∈ T*: (10) X'μ'# =*=&gt; w in G_d if and only if X[μ] =*=&gt; w in G_1, where X' = h(X) and μ' = h(μ) (h is the homomorphism from (N_1 ∪ F_1)* into (N ∪ F)* with h(Z) = Z'). For the nontrivial part of the induction, note that A'#μ'' cannot be terminated in G_d.</Paragraph>
      <Paragraph position="6"> Together with S =&gt; S_1'#, (10) yields L(G_1) = L(G_d). The inclusion of the LIG-class in the DI-class is proper, since L_3 above is not a LIG-language, or, to give a simpler example: L_w = {a^n a_1^(n_1) a_2^(n_1) b_1^(n_2) b_2^(n_2) b^n | n = n_1 + n_2} is, according to (Vijay-Shanker, Weir and Joshi, 1987), not in TAL, hence not in LIL. But (the indexed language) L_w is generated by the DI-grammar G_w = ({S,A,B}, {a,b,a_1,a_2,b_1,b_2}, {f}, {S → aSfb, S → AB, Af → a_1Aa_2, Bf → b_1Bb_2, Af → a_1a_2, Bf → b_1b_2, A → ε, B → ε}, S).</Paragraph>
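      <Paragraph> The set definition of L_w can be made concrete by enumeration. The Python sketch below lists, for a given n, every word of L_w by trying all splits n = n_1 + n_2; this presumably mirrors the way the n indices collected by the recursive rule for S are shared out between A and B. The function name words_of_lw and the representation of words as lists of symbol names are assumptions made for illustration.

```python
def words_of_lw(n):
    """List all words of L_w whose outer blocks consist of n symbols a and n symbols b."""
    words = []
    for n1 in range(n + 1):          # n1 indices go to A, the remaining n2 to B
        n2 = n - n1
        word = (["a"] * n + ["a1"] * n1 + ["a2"] * n1
                + ["b1"] * n2 + ["b2"] * n2 + ["b"] * n)
        words.append(word)
    return words
```

For n = 1 this gives the two words a b1 b2 b and a a1 a2 b; in general there are n + 1 words for each n, each of length 4n.</Paragraph>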
    </Section>
    <Section position="3" start_page="361" end_page="361" type="sub_section">
      <SectionTitle>
2.3 Generalized Composition and Combinatory
Categorial Grammars
</SectionTitle>
      <Paragraph position="0"> The relation of DI-grammars to Steedman's Combinatory Categorial Grammars with Generalized Composition (GC-CCG for short) in the sense of (Weir and Joshi, 1988) is not so easy to determine. If for each n ≥ 1 composition rules of the form</Paragraph>
      <Paragraph position="2"> are permitted, the generative power of the resulting grammars is known to be stronger than that of TAGs (Weir and Joshi, 1988).</Paragraph>
      <Paragraph position="3"> Now, the GC-CCG given by f(ε) = {#}, f(a_1) = {S/X/#, S/X\#, #/X/#, #/X\#}, f(a) = {A, X\A}, f(b_1) = {S/Y/#, S/Y\#, #/Y/#, #/Y\#}, f(b) = {B, Y\B}, f([) = {K}, f(]) = {#/#\K, ~#~Z} generates a language L_c which, when intersected with the regular set {a,b}^+{[,],a_1,b_1}^+{a,b}^+, yields a language L_p which is, for similar reasons as L_3, not even an indexed language. But L_p does not seem to be a DI-language either. Hence, since indexed languages and DI-languages are closed under intersection with regular sets, L_c is neither an indexed nor (so it appears) a DI-language.</Paragraph>
      <Paragraph position="4"> The problem with comparing DI-grammars and GC-CCGs is that, in spite of all appearances, the combination of generalized forward and backward composition can neither directly simulate nor be simulated by index distribution, at least so it seems.</Paragraph>
    </Section>
  </Section>
</Paper>