<?xml version="1.0" standalone="yes"?>
<Paper uid="P86-1012">
  <Title>CATEGORIAL AND NON-CATEGORIAL LANGUAGES</Title>
  <Section position="3" start_page="75" end_page="75" type="metho">
    <SectionTitle>
I. NON-CONTEXT-FREE CATEGORIAL
LANGUAGES
</SectionTitle>
    <Paragraph position="0"> In this section we present a characterization theorem for the categorial systems that generate only context-free languages.</Paragraph>
    <Paragraph position="1"> First, we introduce a lexicon FEQ with the property that, for any choice R of metarules, every string in L(CGR) has equal numbers of a, b, and c.</Paragraph>
    <Paragraph position="2"> We define the lexicon FEQ as FEQ(a) = {A},</Paragraph>
    <Paragraph position="4"> We will also make use of two languages over the alphabet {a, b, c, d, e}: L1 = $\{a^n d b^n e c^n \mid n \geq 1\}$ and LEQ = $\{w \mid \#a = \#b = \#c \geq 1,\ \#d = \#e = 1\}$.</Paragraph>
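    <Paragraph> The two languages can be made concrete with a small membership check. The following Python sketch is illustrative only and is not part of the original paper; the function names in_L1 and in_LEQ are hypothetical, and words are assumed to be plain Python strings over the letters a to e.

```python
import re
from collections import Counter


def in_L1(w: str) -> bool:
    """True iff w = a^n d b^n e c^n for some n >= 1 (hypothetical helper)."""
    m = re.fullmatch(r"(a+)d(b+)e(c+)", w)
    if m is None:
        return False
    # The three blocks must have the same length n.
    return len(m.group(1)) == len(m.group(2)) == len(m.group(3))


def in_LEQ(w: str) -> bool:
    """True iff #a = #b = #c >= 1 and #d = #e = 1 (hypothetical helper)."""
    if set(w) - set("abcde"):
        return False  # symbol outside the alphabet
    n = Counter(w)
    return (
        n["a"] == n["b"] == n["c"]
        and n["a"] >= 1
        and n["d"] == 1
        and n["e"] == 1
    )
```

    Every word accepted by in_L1 is also accepted by in_LEQ, while a scrambled word such as "abcde" is accepted only by in_LEQ, which mirrors the inclusion of L1 in LEQ.</Paragraph>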
    <Paragraph position="5"> The following lemma shows that, with any set R of rules, the lexicon FEQ yields a subset of LEQ.</Paragraph>
    <Paragraph position="6"> Lemma 1 Let G be any categorial grammar CGR(VT, VA, S, FEQ), where VT = {a, b, c, d, e},</Paragraph>
    <Paragraph position="8"> w = $w_1 \ldots w_n$ be a corresponding morpheme string. To differentiate between the occurrence of a symbol as a head and otherwise, write C/A/C/B = $C A^{-1} C^{-1} B^{-1}$, S/A/C/B = $S A^{-1} C^{-1} B^{-1}$ and C/D = $C D^{-1}$. For any rule system R, a redex is a pair of adjacent categories, the tail of one matching the head of the other, which is reduced to a single category by cancelling the matching symbols. Since all occurrences of A must cancel to yield a reduction to S, $\#A = \#A^{-1}$. This holds for all atomic categories except S, for which $\#S = \#S^{-1} + 1$.</Paragraph>
    <Paragraph position="9"> This lexicon has the property that any derivable category either has exactly one S and is S-headed, or has no occurrence of S. Hence in x, #S = 1, i.e., w has exactly one e. Let the number of occurrences in x of C/A/C/B and C/D be p and q respectively. It follows that $\#C = p + q$ and $\#C^{-1} = p + 1$. Since $\#C = \#C^{-1}$, q = 1 and w has exactly one d. Each occurrence of C/A/C/B introduces one $A^{-1}$ and one $B^{-1}$. Since w has one e, $\#A^{-1} = \#B^{-1} = p + 1$. Hence $\#A = \#B = p + 1$.</Paragraph>
    <Paragraph position="10"> Since for each A, B and C in x there must be exactly one a, b and c respectively, #a = #b = #c. □ We show next that in the restricted case where R contains only the two rules FP and Bs, the language L1 is obtained.</Paragraph>
    <Paragraph position="11"> Lemma 2 Let CGR be the categorial grammar with lexicon FEQ and rule set R = {FP, Bs}. Then</Paragraph>
    <Paragraph position="13"> any x having a parse must have exactly one e. Further, all b's and c's can appear only on the left and right of e respectively. Any derivable category having an A has the form $S/(A/)^n U$, where U does not have any A. Thus all A's appear consecutively on the left of the e. For the rightmost c, F(c) = C/D. A d must be in between the a's and b's. By Lemma 1, #a = #b = #c. Thus $x = a^n d b^n e c^n$ for some n. Hence L1 = L(CGR). □ The next lemma shows that no language intermediate between L1 and LEQ can be context-free. It does not involve categorial grammar at all.</Paragraph>
    <Paragraph position="14"> Lemma 3 If L1 $\subseteq$ L $\subseteq$ LEQ, then L is not context-free.</Paragraph>
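    <Paragraph position="15"> Proof Suppose L is context-free. Since L contains L1, it has arbitrarily long strings of the form $a^n d b^n e c^n$. Let k and K be pumping lemma constants. Choose n &gt; max(K, k). Every string in L has exactly one d and one e, so the pumped substrings contain neither, and since they lie within a window shorter than n they cannot meet all three of the a-, b- and c-blocks. Pumping this string therefore changes at least one and at most two of #a, #b and #c, and yields a string not in LEQ, a contradiction. □ Corollary Let {FP, Bs} $\subseteq$ R. Then there is a non-context-free language L(CGR).</Paragraph>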
    <Paragraph position="16"> Proof Use the lexicon FEQ. Then by Lemma 1, L(CGR) $\subseteq$ LEQ. But {FP, Bs} $\subseteq$ R, so L1 $\subseteq$ L(CGR). Hence by Lemma 3, L(CGR) is not context-free. □ The following theorem summarizes the results by characterizing the rule sets that can be used to generate context-sensitive languages.</Paragraph>
    <Paragraph position="17"> Main Theorem A categorial system with rule set R can generate a context-sensitive language if and only if R contains a partial combination rule and a combination rule in the reverse direction.</Paragraph>
    <Paragraph position="18"> Proof The &quot;if&quot; part follows for {FP, Bs} by Lemmas 1, 2, and 3. It follows for {BP, F} by symmetry. For the &quot;only if&quot; part, first note that any unidirectional system (a system whose rules are all forward, or all backward) can generate only context-free languages.$^5$ The only remaining cases are {F, B} and {FP, BP}. The first generates only context-free languages.$^5$ The second generates only the empty language, since no atomic symbol can be derived using only these two rules.</Paragraph>
  </Section>
  <Section position="4" start_page="75" end_page="76" type="metho">
    <SectionTitle>
II. CATEGORIAL LANGUAGES ARE PERMUTATIONS OF CONTEXT-FREE LANGUAGES
</SectionTitle>
    <Paragraph position="0"> Let VT = $\{a_1, a_2, \ldots, a_k\}$. A Parikh mapping$^6$ $\Psi$ is a mapping from morpheme strings to vectors such that $\Psi(w) = (\#a_1, \#a_2, \ldots, \#a_k)$. A string u is a permutation of v iff $\Psi(u) = \Psi(v)$. Let $\Psi(L) = \{\Psi(w) \mid w \in L\}$. A language L' is a permutation of L iff $\Psi(L') = \Psi(L)$. We define a rotation as follows. In the parse tree for $u \in L$, at any node corresponding to a B-redex or BP-redex, exchange its left and right subtrees, obtaining an F-redex or an FP-redex. Let v be the resulting terminal string. We say that u has been transformed into v by rotation.</Paragraph>
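    <Paragraph> The Parikh mapping and the two notions of permutation can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' construction: the names parikh, is_permutation and parikh_image are hypothetical, a word is a Python string, the alphabet is given as an ordered string of symbols, and a language is a finite set of words.

```python
from collections import Counter


def parikh(w, alphabet):
    """Vector of symbol counts of w, in the order given by alphabet.

    Symbols of w that are not listed in alphabet are ignored.
    """
    counts = Counter(w)
    return tuple(counts[s] for s in alphabet)


def is_permutation(u, v, alphabet):
    """u is a permutation of v iff their Parikh vectors coincide."""
    return parikh(u, alphabet) == parikh(v, alphabet)


def parikh_image(language, alphabet):
    """Set of Parikh vectors of a finite language (a set of strings)."""
    return {parikh(w, alphabet) for w in language}
```

    Two finite languages are permutations of each other in the sense above exactly when parikh_image returns the same set for both; for example, {"ab", "aab"} and {"ba", "aba"} have the same image over the alphabet "ab".</Paragraph>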
    <Paragraph position="1"> We now obtain results that are helpful in showing that certain languages cannot be generated by categorial grammars. First we show that every categorial language is a permutation of a context-free language. This will enable us to show that properties of context-free languages that depend only on the symbol counts must also hold of categorial languages.</Paragraph>
    <Paragraph position="2"> Theorem Let R $\subseteq$ {F, FP, B, BP}. Then there exists a language LCF such that $\Psi$(L(CGR)) = $\Psi$(LCF), where LCF is context-free.</Paragraph>
    <Paragraph position="3"> Proof Let $x \in$ L(CGR). In its parse tree, at each node corresponding to a B-redex or a BP-redex perform a rotation, so that it becomes an F-redex or an FP-redex. Since the transformed string y is obtained by rearranging the parse tree, $\Psi(x) = \Psi(y)$. Also, y is derivable using R' = {FP, F} only. Hence the set of such y obtained as a permutation of some x is the same as L(CGR'), which is context-free,$^5$ i.e., L(CGR') = LCF. □ Corollary For any R $\subseteq$ {F, FP, B, BP}, L(CGR) is semilinear, Parikh bounded and has the linear growth property.</Paragraph>
    <Paragraph position="4"> Semilinearity follows from Parikh's Lemma and linear growth from the pumping lemma for context-free languages. Parikh boundedness follows from the fact that any context-free language is Parikh bounded.$^6$ □ Proposition Any language generated by a one-symbol categorial grammar is regular. Note that if L is a semilinear subset of the nonnegative integers, $\{a^n \mid n \in L\}$ is regular.</Paragraph>
    <Paragraph position="5"> III. NON-CATEGORIAL LANGUAGES We now exhibit some non-categorial languages and compare categorial languages with others. From the corollary of the previous section we have the following results. Theorem Categorial languages are properly contained in the context-sensitive languages.</Paragraph>
    <Paragraph position="6"> Proof The languages $\{a^{h(n)} \mid n \geq 0\}$, where $h(n) = n^2$ or $h(n) = 2^n$, which do not have the linear growth property, are not generated by any CGR. These are context-sensitive. Also, $\{a^m b^n \mid$ either $m > n$, or $m$ is prime and $n \leq m$ and $m$ is prime$\}$ is not semilinear$^7$ and hence not categorial.</Paragraph>
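    <Paragraph> The failure of linear growth for the first two languages can be seen numerically. The sketch below is illustrative and rests on an assumption not spelled out in this section: that linear growth requires the gaps between consecutive word lengths of the language to be bounded by a constant. The helper name length_gaps is hypothetical.

```python
def length_gaps(h, count):
    """Differences between consecutive word lengths h(0), h(1), ..., h(count - 1)."""
    lengths = [h(n) for n in range(count)]
    return [b - a for a, b in zip(lengths, lengths[1:])]


# Word lengths of {a^(n^2)} and {a^(2^n)} for n = 0, ..., 7.
square_gaps = length_gaps(lambda n: n * n, 8)
power_gaps = length_gaps(lambda n: 2 ** n, 8)
```

    In both cases the gaps keep increasing (by 2 each step for the squares, doubling for the powers of two), so no constant bounds them.</Paragraph>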
    <Paragraph position="7"> It is interesting to note that lexical functional grammar can generate the first two languages mentioned above$^8$ and indexed languages can generate $\{a^n b^{n^2} a^n \mid n \geq 1\}$.</Paragraph>
    <Section position="1" start_page="76" end_page="76" type="sub_section">
      <SectionTitle>
Linguistic Properties
</SectionTitle>
      <Paragraph position="0"> We now look at some languages that exhibit cross-serial dependencies.</Paragraph>
      <Paragraph position="1"> Let G3 be the CGR with R = {FP, Bs},</Paragraph>
      <Paragraph position="3"> similar to that of Lemma 1. First, #c = #d = 1, from #S = 1. Since we have the Bs rule, c occurs on the left of d, and all occurrences of a and b on the left of c get assigned A and B respectively. Similarly, all a and b on the right of c get assigned the complex categories defined by F. It follows that all symbols to the right of d get combined by the FP rule and those on the left by the Bs rule. Hence a symbol occurring n symbols to the right of d must be matched by an occurrence n symbols to the right of the left-most symbol.</Paragraph>
      <Paragraph position="4"> For any k, let G4(k) be the CGR with</Paragraph>
      <Paragraph position="6"> for any k. Note that $\#A_i = \#A_i^{-1}$. This implies $\#b_i = \#a_i$. The rest of the argument parallels that for L3 above. Thus {FP, Bs} has the power to express unbounded cross-serial dependencies.</Paragraph>
      <Paragraph position="7"> Now we can compare with Tree Adjoining Grammars (TAG).$^8$ A TAG without local constraints cannot generate L3. A TAG with local constraints can generate this, but it cannot generate L6 = $\{a^m b^n c^m d^n \mid m, n \geq 1\}$. L4(2) can be transformed into L6 by the homomorphism erasing ca, d and e. TAG languages are closed under homomorphisms and thus the categorial language L4(2) is not a TAG language. TAG languages exhibit only limited cross-serial dependencies. Thus, though TAG languages and CG languages share some properties, such as linear growth, semilinearity, generation of all context-free languages, limited context-sensitive power, and Parikh boundedness, they differ in their generative capacities.</Paragraph>
      <Paragraph position="8"> Acknowledgements We would like to thank Weiguo Wang and Dawei Dai for helpful discussions.</Paragraph>
    </Section>
  </Section>
</Paper>