<?xml version="1.0" standalone="yes"?>
<Paper uid="P96-1013">
  <Title>Parsing for Semidirectional Lambek Grammar is NP-Complete</Title>
  <Section position="3" start_page="95" end_page="99" type="metho">
    <SectionTitle>
2 Semidirectional Lambek Grammar
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="95" end_page="97" type="sub_section">
      <SectionTitle>
2.1 Lambek calculus
</SectionTitle>
      <Paragraph position="0"> The semidirectional Lambek calculus (henceforth SDL) is a variant of J. Lambek's original (Lambek 58) calculus of syntactic types. We start by defining the Lambek calculus and extend it to obtain SDL.</Paragraph>
      <Paragraph position="1"> Formulae (also called "syntactic types") are built from a set of propositional variables (or "primitive types") $\mathcal{B} = \{b_1, b_2, \ldots\}$ and the three binary connectives $\bullet$, $\backslash$, $/$, called product, left implication, and right implication. We generally use capital letters $A, B, C, \ldots$ to denote formulae and capitals towards the end of the alphabet $T, U, V, \ldots$ to denote sequences of formulae. The concatenation of sequences $U$ and $V$ is denoted by $(U, V)$.</Paragraph>
      <Paragraph position="2"> The (usual) formal framework of these logics is a Gentzen-style sequent calculus. Sequents are pairs $(U, A)$, written as $U \Rightarrow A$, where $A$ is a type and $U$ is a sequence of types (see footnote 5). The claim embodied by the sequent $U \Rightarrow A$ can be read as "formula $A$ is derivable from the structured database $U$". Figure 2 shows Lambek's original calculus L.</Paragraph>
      <Paragraph position="3"> First of all, since we do not need products to obtain our results and since they only complicate matters, we eliminate products from consideration in the sequel. In the Semidirectional Lambek Calculus we add, as an additional connective, the LP implication $\multimap$, but equip it only with a right rule.</Paragraph>
      <Paragraph position="4"> $$\frac{U, B, V \Rightarrow A}{T \Rightarrow B \multimap A}\ (\multimap R) \quad \text{if } T = (U, V) \text{ is nonempty.}$$ Footnote 5: In contrast to Linear Logic (Girard 87), the order of types in $U$ is essential, since the structural rule of permutation is not assumed to hold. Moreover, the fact that only a single formula may appear on the right of $\Rightarrow$ makes the Lambek calculus an intuitionistic fragment of the multiplicative fragment of non-commutative propositional Linear Logic.</Paragraph>
      <Paragraph position="5"> Let us define the polarity of a subformula of a sequent $A_1, \ldots, A_m \Rightarrow A$ as follows: $A$ has positive polarity, each $A_i$ has negative polarity, and if $B/C$ or $C \backslash B$ has polarity $p$, then $B$ also has polarity $p$ and $C$ has the opposite polarity of $p$ in the sequent. A consequence of only allowing the $(\multimap R)$ rule, which is easily proved by induction, is that in any derivable sequent $\multimap$ may only appear in positive polarity. Hence, $\multimap$ may not occur in the (cut) formula $A$ of a (Cut) application, and any subformula $B \multimap A$ which occurs somewhere in the proof must also occur in the final sequent. When we assume the final sequent's RHS to be primitive (or $\multimap$-less), then the $(\multimap R)$ rule will be used exactly once for each (positively) occurring $\multimap$-subformula. In other words, $(\multimap R)$ may only do what it is supposed to do, namely extraction, and we can directly read off from the category assignment which extractions there will be.</Paragraph>
      <Paragraph position="6"> We can show Cut Elimination for this calculus by a straightforward adaptation of the Cut Elimination proof for L. We omit the proof for reasons of space.</Paragraph>
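      <Paragraph> The polarity bookkeeping can be made concrete with a small Python sketch. It is illustrative only and not part of the original presentation: the tuple encoding of formulae and all function names are assumptions, and $\multimap$ (written "-o" in the code) is assumed to pass polarities on in the same way as the two directional implications. <![CDATA[
```python
from typing import Iterator, List, Tuple, Union

# Encoding (an assumption of this sketch): a primitive type is a str; an
# implication is a triple (op, result, argument) with op "/", "\\" or "-o".
# ("/", "B", "C") encodes B/C and ("\\", "B", "C") encodes C\B.
Formula = Union[str, Tuple[str, "Formula", "Formula"]]


def polarities(formula: Formula, polarity: int) -> Iterator[Tuple[Formula, int]]:
    """Yield every subformula together with its polarity (+1 or -1)."""
    yield formula, polarity
    if not isinstance(formula, str):
        _op, result, argument = formula
        # The result keeps the polarity; the argument gets the opposite one.
        yield from polarities(result, polarity)
        yield from polarities(argument, -polarity)


def sequent_polarities(lhs: List[Formula], rhs: Formula) -> List[Tuple[Formula, int]]:
    """Polarities of all subformulae of the sequent lhs => rhs."""
    out = list(polarities(rhs, +1))      # the RHS is positive
    for formula in lhs:
        out.extend(polarities(formula, -1))  # every LHS type is negative
    return out


def lollipop_only_positive(lhs: List[Formula], rhs: Formula) -> bool:
    """Necessary condition for derivability: -o occurs only positively."""
    return all(
        pol == +1
        for sub, pol in sequent_polarities(lhs, rhs)
        if not isinstance(sub, str) and sub[0] == "-o"
    )
```
]]> For the sequent $x/(b \multimap y) \Rightarrow x$ the check succeeds, because the argument of a negative $/$ is positive; for $b \multimap y \Rightarrow y$ it fails, because $\multimap$ then occurs negatively.</Paragraph>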
      <Paragraph position="7"> Proposition 1 (Cut Elimination) Each SDL-derivable sequent has a cut-free proof.</Paragraph>
      <Paragraph position="8"> The cut-free system enjoys, as usual for Lambek-like logics, the Subformula Property: in any proof only subformulae of the goal sequent may appear.</Paragraph>
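      <Paragraph position="9"> In our considerations below we will make heavy use of the well-known count invariant for Lambek systems (Benthem 88), which is an expression of the resource-consciousness of these logics. Define $\#_b(A)$ (the $b$-count of $A$), a function counting positive and negative occurrences of the primitive type $b$ in an arbitrary type $A$, to be $\#_b(A) = 1$ if $A = b$; $\#_b(A) = 0$ if $A$ is primitive and $A \neq b$; and $\#_b(A) = \#_b(B) - \#_b(C)$ if $A = B/C$, $A = C \backslash B$, or $A = C \multimap B$. The count of a sequence is the sum of the counts of its members: $\#_b(A_1, \ldots, A_m) = \#_b(A_1) + \ldots + \#_b(A_m)$.</Paragraph>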
      <Paragraph position="11"> The invariant now states that for any primitive $b$, the $b$-counts of the RHS and the LHS of any derivable sequent are the same. By noticing that this invariant is true for (Ax) and is preserved by the rules, we can immediately state: Proposition 2 (Count Invariant) If $\vdash_{\mathrm{SDL}} U \Rightarrow A$, then $\#_b(U) = \#_b(A)$ for any $b \in \mathcal{B}$.</Paragraph>
      <Paragraph position="12"> Let us, in parallel to SDL, consider the fragment of it in which (/R) and (\R) are disallowed. We call this fragment SDL$^-$. What is remarkable about this fragment is that any positive occurrence of an implication must be $\multimap$ and any negative one must be $/$ or $\backslash$.</Paragraph>
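      <Paragraph> As a hedged illustration of the count invariant, the following Python sketch computes $b$-counts and tests whether a sequent is balanced. It assumes the usual clause that an implication contributes the count of its result minus the count of its argument; the tuple encoding and the function names are hypothetical. <![CDATA[
```python
from typing import Iterable, List, Tuple, Union

# Encoding (an assumption of this sketch): a primitive type is a str; an
# implication is (op, result, argument), so ("/", "x", "y") encodes x/y.
Formula = Union[str, Tuple[str, "Formula", "Formula"]]


def count(b: str, formula: Formula) -> int:
    """b-count of a single type."""
    if isinstance(formula, str):
        return 1 if formula == b else 0
    _op, result, argument = formula
    # Assumed clause: result count minus argument count for /, \ and -o.
    return count(b, result) - count(b, argument)


def sequence_count(b: str, formulas: List[Formula]) -> int:
    """b-count of a sequence of types: the sum over its members."""
    return sum(count(b, f) for f in formulas)


def balanced(lhs: List[Formula], rhs: Formula, primitives: Iterable[str]) -> bool:
    """True if LHS and RHS agree on the count of every listed primitive."""
    return all(sequence_count(b, lhs) == count(b, rhs) for b in primitives)
```
]]> Balanced counts are a necessary condition only. Both $x/y, y \Rightarrow x$ and $y, x/y \Rightarrow x$ pass the check, but in the second sequent the order of the types does not permit an (/L) step.</Paragraph>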
    </Section>
    <Section position="2" start_page="97" end_page="98" type="sub_section">
      <SectionTitle>
2.2 Lambek Grammar
</SectionTitle>
      <Paragraph position="0"> Definition 3 We define a Lambek grammar to be a quadruple $(\Sigma, \mathcal{F}, b_s, l)$ consisting of the finite alphabet of terminals $\Sigma$, the set $\mathcal{F}$ of all Lambek formulae generated from some set of propositional variables which includes the distinguished variable $b_s$, and the lexical map $l : \Sigma \to 2^{\mathcal{F}}$, which maps each terminal to a finite subset of $\mathcal{F}$.</Paragraph>
      <Paragraph position="1"> We extend the lexical map $l$ to nonempty strings of terminals by setting $l(w_1 w_2 \ldots w_n) := l(w_1) \times l(w_2) \times \ldots \times l(w_n)$ for $w_1 w_2 \ldots w_n \in \Sigma^+$.</Paragraph>
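      <Paragraph> The extension of the lexical map to strings is a plain Cartesian product, which the following Python sketch spells out. The toy lexicon in the usage note and the function name are hypothetical and serve only as an illustration. <![CDATA[
```python
from itertools import product
from typing import Dict, List, Sequence, Tuple


def extend_lexical_map(lexicon: Dict[str, Sequence[object]], word: str) -> List[Tuple[object, ...]]:
    """Return l(w1 ... wn) = l(w1) x ... x l(wn) for a nonempty word.

    `lexicon` maps each terminal to its finite collection of types.
    """
    if not word:
        raise ValueError("the lexical map is only extended to nonempty strings")
    return list(product(*(lexicon[terminal] for terminal in word)))
```
]]> With the toy lexicon {"a": ["x", ("/", "x", "y")], "b": ["y"]}, the word "ab" receives two candidate type sequences, one for each type of "a". The number of candidate sequences is the product of the sizes of the lexical entries, so it can grow exponentially with the length of the word.</Paragraph>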
      <Paragraph position="2"> The language generated by a Lambek grammar $G = (\Sigma, \mathcal{F}, b_s, l)$ is defined as the set of all strings $w_1 w_2 \ldots w_n \in \Sigma^+$ for which there exists a sequence of types $U \in l(w_1 w_2 \ldots w_n)$ and $\vdash_{\mathrm{L}} U \Rightarrow b_s$. We denote this language by $L(G)$. [Figure 3 (proof tree, only partly recoverable): the derivation contains the sequents $x \Rightarrow x$; $B_1^n, B_2, C_1^n, C_2, c^{n+1}, b^{n+1} \Rightarrow y$ (*); $B_1^n, B_2, C_1^n, C_2, c^n, b^n \Rightarrow c \multimap (b \multimap y)$; and $A_2, B_1^n, B_2, C_1^n, C_2, c^n, b^n \Rightarrow x$.]</Paragraph>
      <Paragraph position="3"> An SDL-grammar is defined exactly like a Lambek grammar, except that $\vdash_{\mathrm{SDL}}$ replaces $\vdash_{\mathrm{L}}$.</Paragraph>
      <Paragraph position="4"> Given a grammar $G$ and a string $w = w_1 w_2 \ldots w_n$, the parsing (or recognition) problem asks whether $w$ is in $L(G)$.</Paragraph>
      <Paragraph position="5"> It is not immediately obvious how the generative capacity of SDL-grammars relates to that of Lambek grammars or nondirectional Lambek grammars (based on the calculus LP). Whereas Lambek grammars generate exactly the context-free languages (modulo the missing empty word) (Pentus 93), the latter generate all permutation closures of context-free languages (Benthem 88). This excludes many context-free or even regular languages, but includes some context-sensitive ones, e.g., the permutation closure of $a^n b^n c^n$.</Paragraph>
      <Paragraph position="6"> Concerning SDL, it is straightforward to show that all context-free languages can be generated by SDL-grammars. Proposition 4 Every context-free language is generated by some SDL-grammar.</Paragraph>
      <Paragraph position="7"> Proof. We can use the standard transformation of an arbitrary context-free grammar $G = (N, T, P, S)$ to a categorial grammar $G'$. Since $\multimap$ does not appear in $G'$, each SDL-proof of a lexical assignment must also be an L-proof, i.e., exactly the same strings are judged grammatical by SDL as are judged by L. $\Box$ Note that since the {(Ax), (/L), (\L)} subset of L already accounts for the context-free languages, this observation extends to SDL$^-$.</Paragraph>
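      <Paragraph> To make the difference between a language and its permutation closure tangible, here is a minimal Python sketch with two membership tests. The function names are hypothetical, and the sketch assumes $n \geq 1$, since the empty word is excluded. <![CDATA[
```python
from collections import Counter


def in_anbncn(word: str) -> bool:
    """Membership in { a^n b^n c^n | n >= 1 } (order matters)."""
    n, rest = divmod(len(word), 3)
    return n >= 1 and rest == 0 and word == "a" * n + "b" * n + "c" * n


def in_permutation_closure(word: str) -> bool:
    """Membership in the permutation closure: only the letter counts matter."""
    counts = Counter(word)
    return set(counts) == {"a", "b", "c"} and counts["a"] == counts["b"] == counts["c"]
```
]]> The string "cab" belongs to the permutation closure but not to $a^n b^n c^n$ itself. A grammar based on LP cannot separate the two, because its calculus ignores the order of the types.</Paragraph>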
      <Paragraph position="8"> Moreover, some languages which are not context-free can also be generated.</Paragraph>
      <Paragraph position="9"> Example. Consider the following grammar $G$ for the language $a^n b^n c^n$. We use primitive types $\mathcal{B} = \{b, c, x, y, z\}$ and define the lexical map for $\Sigma = \{a, b, c\}$ as follows:</Paragraph>
      <Paragraph position="11"> The distinguished primitive type is $x$. To simplify the argumentation, we abbreviate types as indicated above. Now, observe that a sequent $U \Rightarrow x$, where $U$ is the image of some string over $\Sigma$, may have balanced primitive counts only if $U$ contains exactly one occurrence of each of $A_2$, $B_2$ and $C_2$ (accounting for the one supernumerary $x$ and balanced $y$ and $z$ counts) and, for some number $n \geq 0$, $n$ occurrences of each of $A_1$, $B_1$, and $C_1$ (because, in resource-oriented terms, each $B_i$ and $C_i$ "consumes" a $b$ and a $c$, respectively, and each $A_i$ "provides" a pair $b$, $c$). Hence, only strings containing the same number of $a$'s, $b$'s and $c$'s may be produced. Furthermore, due to the Subformula Property we know that in a cut-free proof of $U \Rightarrow x$, the main formula in abstractions (right rules) may only be either $c \multimap (b \multimap X)$ or $b \multimap X$, where $X \in \{x, y\}$, since all other implication types have primitive antecedents. Hence, the LHS of any sequent in the proof must be a subsequence of $U$, with some additional $b$ types and $c$ types interspersed.</Paragraph>
      <Paragraph position="12"> But then it is easy to show that $U$ can only be of the form $A_1^n, A_2, B_1^n, B_2, C_1^n, C_2$, since any $/$ connective in $U$ needs to be introduced via (/L).</Paragraph>
      <Paragraph position="13"> It remains to be shown that there is actually a proof for such a sequent. It is given in Figure 3.</Paragraph>
      <Paragraph position="14"> The sequent marked with * is easily seen to be derivable without abstractions.</Paragraph>
      <Paragraph position="15"> A remarkable point about SDL's ability to cover this language is that neither L nor LP can generate it. Hence, this example substantiates the claim made in (Moortgat 94) that the inferential capacity of mixed Lambek systems may be greater than the sum of its component parts. Moreover, the attentive reader will have noticed that our encoding also extends to languages having more groups of $n$ symbols, i.e., to languages of the form $a_1^n a_2^n \ldots a_k^n$. Finally, we note in passing that for this grammar the rules (/R) and (\R) are irrelevant, i.e., that it is at the same time an SDL$^-$-grammar.</Paragraph>
    </Section>
    <Section position="3" start_page="98" end_page="99" type="sub_section">
      <SectionTitle>
3 NP-Completeness of the Parsing
Problem
</SectionTitle>
      <Paragraph position="0"> We show that the Parsing Problem for SDL-grammars is NP-complete by a reduction of the 3-Partition Problem to it (see footnote 6). This well-known NP-complete problem is cited in (GareyJohnson 79) as follows.</Paragraph>
      <Paragraph position="1"> Instance: Set $\mathcal{A}$ of $3m$ elements, a bound $N \in \mathbb{Z}^+$, and a size $s(a) \in \mathbb{Z}^+$ for each $a \in \mathcal{A}$ such that $N/4 &lt; s(a) &lt; N/2$ and $\sum_{a \in \mathcal{A}} s(a) = mN$.</Paragraph>
      <Paragraph position="2"> Question: Can $\mathcal{A}$ be partitioned into $m$ disjoint sets $\mathcal{A}_1, \mathcal{A}_2, \ldots, \mathcal{A}_m$ such that, for $1 \leq i \leq m$, $\sum_{a \in \mathcal{A}_i} s(a) = N$ (note that each $\mathcal{A}_i$ must therefore contain exactly 3 elements from $\mathcal{A}$)? Comment: NP-complete in the strong sense. Here is our reduction. Let $\Gamma = (\mathcal{A}, m, N, s)$ be a given 3-Partition instance. For notational convenience we abbreviate $(\ldots((A/B_1)/B_2)/\ldots)/B_n$ by $A / B_n \bullet \ldots \bullet B_2 \bullet B_1$ and similarly $B_n \multimap (\ldots (B_1 \multimap A) \ldots)$ by $B_n \bullet \ldots \bullet B_2 \bullet B_1 \multimap A$, but note that this is just an abbreviation in the product-free fragment. Moreover, the notation $A^k$ stands for $A \bullet A \bullet \ldots \bullet A$ ($k$ times). We then define the SDL-grammar $G_\Gamma = (\Sigma, \mathcal{F}, b_s, l)$ as follows:</Paragraph>
      <Paragraph position="4"> kler 94) to show that derivability in the multiplicative fragment of propositional Linear Logic with only the connectives $\multimap$ and $\otimes$ (equivalently, the Lambek calculus with permutation, LP) is NP-complete.</Paragraph>
      <Paragraph position="5"> The word we are interested in is $v\, w_1 w_2 \ldots w_{3m}$. We do not care about other words that might be generated by $G_\Gamma$. Our claim now is that a given 3-Partition problem $\Gamma$ is solvable if and only if $v\, w_1 \ldots w_{3m}$ is in $L(G_\Gamma)$. We consider each direction in turn.</Paragraph>
      <Paragraph position="6"> Lemma 5 (Soundness) If a 3-Partition problem $\Gamma = (\mathcal{A}, m, N, s)$ has a solution, then $v\, w_1 \ldots w_{3m}$ is in $L(G_\Gamma)$.</Paragraph>
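      <Paragraph> For readers who want to experiment with the 3-Partition problem stated above, the following Python sketch validates an instance, checks a candidate solution, and searches small instances exhaustively. The function names and the list-of-sizes representation are assumptions of the sketch, and the search takes exponential time in general, as one would expect for a strongly NP-complete problem. <![CDATA[
```python
from itertools import combinations
from typing import List, Optional, Tuple

Triple = Tuple[int, int, int]


def is_valid_instance(sizes: List[int], m: int, N: int) -> bool:
    """3m elements, N/4 < s(a) < N/2 for every element, total size m*N."""
    return (
        len(sizes) == 3 * m
        and all(4 * s > N and 2 * s < N for s in sizes)
        and sum(sizes) == m * N
    )


def is_solution(sizes: List[int], N: int, triples: List[Triple]) -> bool:
    """Every index is used exactly once and every triple sums to N."""
    used = sorted(i for triple in triples for i in triple)
    return used == list(range(len(sizes))) and all(
        sum(sizes[i] for i in triple) == N for triple in triples
    )


def solve(sizes: List[int], N: int) -> Optional[List[Triple]]:
    """Exhaustive search over triples of indices; None if no solution exists."""

    def go(remaining: List[int]) -> Optional[List[Triple]]:
        if not remaining:
            return []
        first, rest = remaining[0], remaining[1:]
        # The smallest remaining index must belong to some triple.
        for j, k in combinations(rest, 2):
            if sizes[first] + sizes[j] + sizes[k] == N:
                tail = go([i for i in rest if i not in (j, k)])
                if tail is not None:
                    return [(first, j, k)] + tail
        return None

    return go(list(range(len(sizes))))
```
]]> For example, the sizes 6, 7, 7, 6, 6, 8 with $m = 2$ and $N = 20$ form a valid instance with the solution {6, 7, 7}, {6, 6, 8}, whereas the sizes 6, 6, 6, 6, 7, 9 form a valid instance without a solution.</Paragraph>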
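      <Paragraph position="7"> Proof. We have to show, when given a solution to $\Gamma$, how to choose a type sequence $U \in l(v\, w_1 \ldots w_{3m})$ and construct an SDL proof for $U \Rightarrow a$. Suppose $\mathcal{A} = \{a_1, a_2, \ldots, a_{3m}\}$. From a given solution (set of triples) $\mathcal{A}_1, \mathcal{A}_2, \ldots, \mathcal{A}_m$ we can compute in polynomial time a mapping $k$ that sends the index of an element to the index of its solution triple, i.e., $k(i) = j$ iff $a_i \in \mathcal{A}_j$. To obtain the required sequence $U$, we simply choose for the terminals $w_i$ the type $d / d \bullet b_{k(i)} \bullet c_{k(i)}^{s(a_i)}$ (resp. $d / b_{k(3m)} \bullet c_{k(3m)}^{s(a_{3m})}$ for $w_{3m}$). Hence the complete sequent to solve is: $$a / (b_1^3 \bullet b_2^3 \bullet \ldots \bullet b_m^3 \bullet c_1^N \bullet c_2^N \bullet \ldots \bullet c_m^N \multimap d),\ d / d \bullet b_{k(1)} \bullet c_{k(1)}^{s(a_1)},\ \ldots,\ d / b_{k(3m)} \bullet c_{k(3m)}^{s(a_{3m})} \Rightarrow a \quad (*)$$</Paragraph>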
      <Paragraph position="9"> Let $a/B_0, B_1, \ldots, B_{3m} \Rightarrow a$ be a shorthand for (*), and let $X$ stand for the sequence of primitive types $b_{k(3m)}, c_{k(3m)}^{s(a_{3m})}, b_{k(3m-1)}, c_{k(3m-1)}^{s(a_{3m-1})}, \ldots, b_{k(1)}, c_{k(1)}^{s(a_1)}$. Using rule (/L) only, we can obviously prove $B_1, \ldots, B_{3m}, X \Rightarrow d$. Now, applying $(\multimap R)$ $3m + Nm$ times we can obtain $B_1, \ldots, B_{3m} \Rightarrow B_0$, since there are in total, for each $i$, 3 $b_i$ and $N$ $c_i$ in $X$. As the final step we have $$\frac{B_1, \ldots, B_{3m} \Rightarrow B_0 \qquad a \Rightarrow a}{a/B_0, B_1, \ldots, B_{3m} \Rightarrow a}\ (/L)$$ which completes the proof. $\Box$ Lemma 6 (Completeness) Let $\Gamma = (\mathcal{A}, m, N, s)$ be an arbitrary 3-Partition problem and $G_\Gamma$ the corresponding SDL-grammar as defined above. Then $\Gamma$ has a solution if $v\, w_1 \ldots w_{3m}$ is in $L(G_\Gamma)$.</Paragraph>
      <Paragraph position="10"> Proof. Let $v\, w_1 \ldots w_{3m} \in L(G_\Gamma)$ and let $$a / (b_1^3 \bullet \ldots \bullet c_m^N \multimap d),\ B_1, \ldots, B_{3m} \Rightarrow a$$ be a witnessing derivable sequent, i.e., for $1 \leq i \leq 3m$, $B_i \in l(w_i)$. Now, since the counts of this sequent must be balanced, the sequence $B_1, \ldots, B_{3m}$ must contain for each $1 \leq j \leq m$ exactly 3 $b_j$ and exactly $N$ $c_j$ as subformulae. Therefore we can read off the solution to $\Gamma$ from this sequent by including in $\mathcal{A}_j$ (for $1 \leq j \leq m$) those three $a_i$ for which $B_i$ has an occurrence of $b_j$, say these are $a_{j(1)}$, $a_{j(2)}$ and $a_{j(3)}$. We verify, again via balancedness of the primitive counts, that $s(a_{j(1)}) + s(a_{j(2)}) + s(a_{j(3)}) = N$ holds, because these are the numbers of positive and negative occurrences of $c_j$ in the sequent. This completes the proof. $\Box$ The reduction above proves NP-hardness of the parsing problem. We need strong NP-completeness of 3-Partition here, since our reduction uses a unary encoding. Moreover, the parsing problem also lies within NP, since for a given grammar $G$ the size of proofs is linearly bounded by the length of the string and hence we can simply guess a proof and check it in polynomial time. Therefore we can state the following: Theorem 7 The parsing problem for SDL is NP-complete. Finally, we observe that for this reduction the rules (/R) and (\R) are again irrelevant and that we can extend this result to SDL$^-$.</Paragraph>
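      <Paragraph> The counting argument behind Lemma 6 can be replayed mechanically. The Python sketch below builds the sequent (*) for a chosen assignment $k$ and tests whether its primitive counts are balanced. It is a sketch under stated assumptions: the lexical types are those used in the proof of Lemma 5, the tuple encoding and all function names are hypothetical, and the count of an implication is assumed to be the count of its result minus the count of its argument. <![CDATA[
```python
from typing import List, Tuple, Union

# Encoding (an assumption of this sketch): a primitive type is a str; an
# implication is (op, result, argument) with op "/" or "-o".
Formula = Union[str, Tuple[str, "Formula", "Formula"]]


def count(b: str, formula: Formula) -> int:
    """b-count: 1 for b itself, result count minus argument count otherwise."""
    if isinstance(formula, str):
        return 1 if formula == b else 0
    _op, result, argument = formula
    return count(b, result) - count(b, argument)


def nest(op: str, result: Formula, args: List[Formula]) -> Formula:
    """Expand the abbreviation for args = [Bn, ..., B1]; B1 ends up innermost."""
    formula = result
    for arg in reversed(args):
        formula = (op, formula, arg)
    return formula


def reduction_sequent(sizes: List[int], m: int, N: int, k: List[int]):
    """Sequent (*) for the choice k; k[i] in 1..m is the triple of element i."""
    goal_args = [f"b{j}" for j in range(1, m + 1) for _ in range(3)]
    goal_args += [f"c{j}" for j in range(1, m + 1) for _ in range(N)]
    lhs = [("/", "a", nest("-o", "d", goal_args))]  # type chosen for v
    for i, size in enumerate(sizes):
        args = [f"b{k[i]}"] + [f"c{k[i]}"] * size  # unary encoding of s(a_i)
        if i != len(sizes) - 1:  # every w_i except the last also consumes a d
            args = ["d"] + args
        lhs.append(nest("/", "d", args))
    return lhs, "a"


def balanced(lhs: List[Formula], rhs: Formula, m: int) -> bool:
    """Check the count invariant for a, d and all b_j, c_j."""
    primitives = ["a", "d"] + [f"{p}{j}" for p in "bc" for j in range(1, m + 1)]
    return all(sum(count(b, f) for f in lhs) == count(b, rhs) for b in primitives)
```
]]> With the sizes 6, 7, 7, 6, 6, 8, $m = 2$ and $N = 20$, the assignment $k = (1, 1, 1, 2, 2, 2)$ yields a balanced sequent, whereas $k = (1, 1, 2, 1, 2, 2)$ does not, because the first group then sums to 19. Balancedness is only the necessary condition used in the completeness direction; derivability itself is established by the proof of Lemma 5.</Paragraph>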
    </Section>
  </Section>
</Paper>