<?xml version="1.0" standalone="yes"?> <Paper uid="J88-4001"> <Title>LFP: A LOGIC FOR LINGUISTIC DESCRIPTIONS AND AN ANALYSIS OF ITS COMPLEXITY</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 CLFP GRAMMARS: GRAMMARS BASED ON CONCATENATION THEORY 2.1 SYNTAX OF CLFP </SectionTitle> <Paragraph position="0"> We present a standard version of the first-order theory of concatenation, augmented with the least-fixed-point operator. Before proceeding with the formal description, we give an example to illustrate the scheme we have in mind. Consider the following context-free fragment, adapted directly from Johnson (1985).</Paragraph> <Paragraph position="2"> Here is the corresponding CLFP fragment:</Paragraph> <Paragraph position="4"> In this formulation, x, y, and z range over strings of symbols (morphemes), and NP, VP, etc. are predicates over strings. The second clause is here an abbreviation for two clauses, in which the feature case takes its two values, namely +Gen and -Gen. At present we do not treat the problem of calculating complex feature structures, but there seems to be no reason that the notation cannot be suitably extended.</Paragraph> <Paragraph position="5"> This example illustrates the most complex case of a CLFP formula. It is a recursion scheme, which assigns to the predicate variables S, NP, etc. certain formulas (the right-hand sides of the corresponding clauses in the definition). The whole scheme is understood as the simultaneous recursive definition of the predicate variables on the left-hand sides of the definition. To handle the fact that string variables occur on the left-hand side of each clause, we will understand each clause as assigning to the given predicate symbol both the formula on its right and the set of individual variables mentioned on its left.</Paragraph> <Paragraph position="6"> We now proceed with the formal definition of CLFP.</Paragraph> <Paragraph position="7"> Let Ivar be a set {x0, x1, ...} of individual variables ranging over strings. Let Σ be a finite set of terminal symbols. These are the constants of our theory. Λ is another constant, denoting the null string. Terms are built from variables and constants using the binary operation of concatenation. We also require a set Pvar of predicate variables, listed as the set {P1, P2, ...}.</Paragraph> <Paragraph position="8"> Each predicate variable P is equipped with an arity ar(P), indicating the number of individual arguments that a relation assigned to this variable will have. (The example CLFP scheme given above employs only the unary predicate variables S, NP, VP, and Det.) The set of CLFP formulas is given by the following inductive clauses.</Paragraph> <Paragraph position="9"> 1. If P ∈ Pvar and (x1, ..., xn) is a sequence of Ivar with length n = ar(P), then P(x1, ..., xn) is in CLFP; 2. If t1 and t2 are terms, then t1 = t2 is in CLFP; 3. If x ∈ Ivar and φ is in CLFP, then ∃xφ and ∀xφ are in CLFP; 4. The usual Boolean combinations of CLFP formulas are in CLFP.</Paragraph> <Paragraph position="10"> 5. This clause requires more definitions. Let ℛ be a finite nonempty subset of Pvar with a distinguished element S ∈ ℛ. Let Φ : ℛ → 𝒫(Ivar) × CLFP. (Φ(R) is going to be the defining clause for the predicate R.)</Paragraph> <Paragraph position="12"> Writing Φ(R) = (BΦ(R), CΦ(R)), the first component BΦ(R) will thus be a finite set of individual variables. Now we say that the whole map Φ is a recursion scheme iff each P ∈ ℛ occurs only positively in Φ(R) for any R ∈ ℛ; that is, within the scope of an even number of negation signs. Finally, condition 5 states that if Φ is a recursion scheme, with distinguished variable S, then μSΦ (the least fixed point of Φ) is in CLFP.</Paragraph> <Paragraph position="13"> Example 1.
Consider the following scheme, which defines a regular language.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Computational Linguistics, Volume 14, Number 4, December 1988 </SectionTitle> <Paragraph position="0"> William C. Rounds LFP: A Logic for Linguistic Descriptions and an Analysis of its Complexity</Paragraph> <Paragraph position="2"> Similarly, BΦ(T) = {v}, and CΦ(T) is the second formula in the scheme.</Paragraph> <Paragraph position="3"> In the example, we have written our recursion scheme in a conventional style to emphasize its direct connection to the usual grammatical presentations. Thus the variable x is bound by the left-hand side of (1), so this clause has been written with S(x) on the left to make this fact apparent. Also, the use of the ⇔ sign is conventional in writing out Φ. In our example, the first clause is taken as defining the distinguished predicate S of our scheme. Finally, there are no occurrences of free predicate variables in this example, but there are in our first example (e.g., noun).</Paragraph> <Paragraph position="4"> The usual rules for calculating free individual variables apply; if Flvar(φ) is the set of free individual variables of φ, then Flvar(P(x1, ..., xn)) = {x1, ..., xn}. The quantifier and Boolean cases are handled as in standard text presentations. However, if Φ is a recursion scheme, then Flvar(μSΦ) will be calculated as follows. For each R ∈ ℛ, find Flvar(CΦ(R)). Remove from this set any variables in BΦ(R). The union of the resulting sets over R ∈ ℛ is defined to be the set Flvar(μSΦ).</Paragraph> <Paragraph position="5"> The rules for free predicate variables are a bit simpler. In the atomic formula P(x1, ..., xn), P is a free predicate variable. In a recursion scheme Φ with domain ℛ, the set FPvar(μSΦ) is the union of the sets FPvar(CΦ(R)), minus the set ℛ.</Paragraph> <Paragraph position="6"> A final remark on notation: we will use the notation ψ(t1, ..., tn) to stand for the formula ∃x1 ... ∃xn(ψ(x1, ..., xn) ∧ x1 = t1 ∧ ... ∧ xn = tn), where the ti are terms, and the xi are individual variables not appearing in any tj. This will not affect our complexity results in any way.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 SEMANTICS FOR CLFP </SectionTitle> <Paragraph position="0"> We borrow some notation from the Oxford school of denotational semantics to help us explain the meaning of our logic grammars. If X and Y are sets, then [X → Y] is the set of functions from X to Y. Let A = [Ivar → Σ*] be the set of assignments of values to individual variables. We wish to define when a given assignment, say α, satisfies a given CLFP formula φ. This will depend on the meaning assigned to the free predicate variables in φ, so we need to consider such assignments. Let PA be the set of maps ρ from Pvar to the class of relations on Σ* such that the arity of ρ(P) is ar(P). We are now ready to define, for each formula φ and predicate assignment ρ, the set 𝒜[[φ]]ρ ⊆ A of individual assignments satisfying φ with respect to ρ.</Paragraph> <Paragraph position="1"> 1. 𝒜[[P(x1, ..., xn)]]ρ = {α | (α(x1), ..., α(xn)) ∈ ρ(P)}; 2. 𝒜[[t1 = t2]]ρ = {α | t1α = t2α}, where tα is the evaluation of t with variables assigned values by α; 3. 𝒜[[∃xφ]]ρ = {α | ∃u ∈ Σ* : α(x/u) ∈ 𝒜[[φ]]ρ}, and similarly for universal quantification; 4. 𝒜[[φ ∨ ψ]]ρ = 𝒜[[φ]]ρ ∪ 𝒜[[ψ]]ρ, and similarly for other Boolean connectives.</Paragraph> <Paragraph position="2"> 5. 𝒜[[μSΦ]]ρ = {α | (∃k)(α ∈ 𝒜[[Φk(S)]]ρ)}, where, for each k, Φk is a recursion scheme with the same domain ℛ as Φ, and is defined as follows by induction on k.
First, we stipulate that for each P ∈ ℛ and every k, BΦk(P) = BΦ(P). Then we set</Paragraph> <Paragraph position="4"> where the notation ψ[R ← θ(R) : R ∈ ℛ] denotes the simultaneous replacement of atomic subformulas R(w1, ..., wk) in ψ (where R is a free occurrence) by the formula θ(R)(w1, ..., wk), in such a way that no free occurrences of other variables in θ(R) are captured by a quantifier or a μ-operator in ψ. (We may always change the bound variables of ψ first, to accomplish this.) This definition appears much more clumsy than it really is, and we continue our example to illustrate it.</Paragraph> <Paragraph position="5"> Refer to the example of a regular grammar in the previous section. In the clause for S we are required to substitute the formula FALSE for occurrences of both S and T. This gives, after simplification,</Paragraph> <Paragraph position="7"> Similarly, substitution of FALSE into the clause for T gives Φ0(T)(v) = FALSE. Now substitution of these new formulas for S and T into Φ gives (after simplification):</Paragraph> <Paragraph position="9"> It is easy to see that continuing this process will simulate all possible derivations in the grammar, and also that it explains the meaning of the scheme Φ in terms of the meanings of its subformulas.</Paragraph> <Paragraph position="10"> Some remarks are in order to explain why we use the term "least fixed point", and to explain why, in a recursion scheme, all occurrences of recursively called predicates are required to be positive. Let Φ : ℛ → CLFP be a recursion scheme. Define the map [[Φ]] :</Paragraph> <Paragraph position="12"> where (x1, ..., xn) is the sequence of variables in BΦ(R), listed in increasing order of subscripts. If R ∉ ℛ, then [[Φ]]ρ(R) = ρ(R). Next, let [[μRΦ]]ρ = ⋃k≥0 [[Φ]](k)(ρ[R ← ∅ : R ∈ ℛ]), where unions are coordinatewise, F(k) is the k-th iterate of F, and ρ[R ← ∅ : R ∈ ℛ] is ρ with the empty relation assigned to each predicate variable in ℛ. This formula is just the familiar least-fixed-point formula ⋃k≥0 F(k)(⊥) from denotational semantics. Then we can check that [[μSΦ]]ρ is in PA, and is the least fixed point of the continuous map [[Φ]]. It is then possible to prove that</Paragraph> <Paragraph position="14"> where S is the distinguished predicate variable in ℛ.</Paragraph> <Paragraph position="15"> If we had no conditions on negative formulas in recursion schemes, then we could entertain schemes</Paragraph> <Paragraph position="17"> which, although they would receive an interpretation under our first definition, would give an operator that was not continuous, or even monotonic. We therefore exclude such cases for reasons of smoothness.</Paragraph> <Paragraph position="18"> Next we come to the definition of the language or relation denoted by a formula. A k-ary relation P on Σ* is said to be definable in CLFP iff there is a CLFP formula φ with no free predicate variables such that (u1, ..., uk) ∈ P ⟺ ∃α ∈ 𝒜[[φ]] : (α(x1), ..., α(xk)) = (u1, ..., uk), where x1, ..., xk is the list of free variables of φ arranged in increasing order of subscript. (Notice that the parameter ρ has been omitted, since there are no free predicate variables in φ.) So far, we have not restricted quantification in our formulas, and every r.e. predicate is definable. We need to add one other parameter to the definition of 𝒜, which will limit the range of quantification. This will be an integer n, which will be the length of an input sentence to be recognized. The occurrences of the formula 𝒜[[φ]]ρ will thus be changed everywhere in the above clauses to 𝒜[[φ]]ρn.
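The iterates Φk just described compute the least fixed point from below. Since the displayed formulas of the running example were lost in this transcription, here is a small Python sketch (our illustration, not the paper's) of the same Kleene iteration for a hypothetical recursion scheme S(x) ⇔ ∃y∃z(x = yz ∧ A(y) ∧ B(z)), with A defining a+ and B defining b+, over a universe of strings of bounded length, mirroring the bounded semantics 𝒜[[φ]]ρn:

```python
# Kleene iteration for a recursion scheme (hypothetical example, not the
# paper's): predicates are sets of strings over Sigma, restricted to
# length at most N, mirroring the bounded semantics.
from itertools import product

SIGMA = "ab"
N = 6  # the bound n: only strings of length at most N are considered

UNIVERSE = [""] + ["".join(p) for k in range(1, N + 1)
                   for p in product(SIGMA, repeat=k)]

def step(env):
    """One application of the map [[Phi]]: rebuild every predicate from
    the current approximations.  The scheme is
        S(x) iff exists y,z (x = yz and A(y) and B(z))
        A(y) iff y = 'a' or exists w (y = 'a'w and A(w))
        B(z) iff z = 'b' or exists w (z = 'b'w and B(w))
    """
    A, B = env["A"], env["B"]
    return {
        "A": {y for y in UNIVERSE if y == "a" or (y[:1] == "a" and y[1:] in A)},
        "B": {z for z in UNIVERSE if z == "b" or (z[:1] == "b" and z[1:] in B)},
        "S": {x for x in UNIVERSE
              if any(x[:i] in A and x[i:] in B for i in range(1, len(x)))},
    }

# Start from the bottom element (every predicate empty) and iterate to
# the least fixed point; the universe is finite, so this terminates.
env = {"S": set(), "A": set(), "B": set()}
while True:
    nxt = step(env)
    if nxt == env:
        break
    env = nxt

print("aaabb" in env["S"], "ba" in env["S"])  # -> True False
```

Because the universe is finite, the iteration necessarily stabilizes; this is the bounded analogue of the formula ⋃k F(k)(⊥) above.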
The only change in the substance of the clauses is in the rule for existential and universal quantification.</Paragraph> <Paragraph position="20"> where n = max(|ui|). (To abbreviate the right-hand condition, we write (u1, ..., uk) ⊨ φ.) Our first theorem can now be stated.</Paragraph> <Paragraph position="21"> Theorem 1. A language (unary predicate) is boundedly definable in CLFP iff it is in EXPTIME. We defer the proof to the next section.</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 EXPTIME AND CLFP </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 ALTERNATION </SectionTitle> <Paragraph position="0"> Before proving Theorem 1, we need to discuss the method of proof, both for this result and for the Integer LFP characterization of PTIME in the next section.</Paragraph> <Paragraph position="1"> This material is repeated from the fundamental article of Chandra, Kozen, and Stockmeyer (1981). Their paper should be consulted for the full details of what we state here.</Paragraph> <Paragraph position="2"> An alternating Turing machine can be regarded as a Turing machine with unbounded parallelism. In a given state, and with given tape contents, the machine can spawn a finite number of successor configurations according to its transition rules. These configurations are thought of as separate processes, each of which runs to completion in the same way. A completed process is one which is in a special accepting state with no successors. The results of the spawned processes are reported back to the parent, which combines the results to pass on to its own parent, and so forth. How the parent does this depends on the state of the finite control. These states are classified as being either existential (OR), universal (AND), negating (NOT), or accepting.
If the parent is in an existential state, it reports back the logical OR of the results of its offspring. If it is in a universal state, it reports back the logical AND; if the state is negating, the parent reports the negation of its one offspring. An accepting state generates a logical 1 (TRUE) to be reported back. Thus a nondeterministic TM can be regarded as an alternating TM with purely existential states.</Paragraph> <Paragraph position="3"> An alternating TM is defined as a tuple in a standard way. It has a read-only input tape with a head capable of two-way motion. It also has a fixed number of work tapes. The input tape contains a string u ∈ Σ*, while the work tapes can use a tape alphabet Γ. The transition relation δ is defined as for ordinary nondeterministic TMs. The state set is partitioned as described above into universal, existential, negating, and accepting states. The relation δ is constrained so that existential and universal states have at least one successor, negating states have exactly one successor, and accepting states have no successors. A configuration is then just a tuple describing the current state, positions of the heads, and tape contents, as is familiar. The initial configuration is the one with the machine in its initial state, all the work tapes empty, and the input head at the left end of the input u. The successor relation ⊢ between configurations is defined again as usual.</Paragraph> <Paragraph position="4"> To determine whether or not a configuration is accepting, we proceed as follows. Imagine the configurations that succeed the given one arranged in a tree, with the given configuration at the root. At each node, the immediate descendants of the configuration are the successors given by ⊢. The tree is truncated at a level determined by the length of the input tape (this truncation is not part of the general definition but will suffice for our results.)
The leaf nodes of this tree are labeled with 0 if the configuration at that node is not accepting, and with 1 if the configuration is accepting. The tree is then evaluated according to the description given above. The configuration at the root is accepting iff it is labeled 1 by this method. Thus an input is accepted by</Paragraph> </Section> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> </SectionTitle> <Paragraph position="0"> the machine iff the initial configuration with that input is accepting. In our application, it will always suffice to cut off the tree at level 2^(cn), where n is the length of the input string, and c is a positive constant depending only on the description of the machine.</Paragraph> <Paragraph position="1"> We say that an alternating TM is S(n) space bounded iff in the above tree, for any initial configuration labeling the root, no auxiliary tape length ever exceeds S(n), where n is the length of the input. We are concerned only with the functions S(n) = log n and S(n) = n in this paper. We let ASPACE(S(n)) be the class of languages accepted by space-bounded ATMs in this way. We then have the following theorem (Chandra, Kozen, and Stockmeyer 1981):</Paragraph> <Paragraph position="3"> where DTIME(T(n)) is the class of languages accepted deterministically by ordinary Turing machines within T(n) steps.</Paragraph> <Paragraph position="4"> Our problem in the rest of this section is to show how linear space bounded ATMs and CLFP grammars simulate each other. To facilitate the construction of the next section, it is convenient to add one feature to the definition of alternating Turing machines. Let U be the name of a k-ary relation on Σ*. We allow machines to have oracle states of the form U?(i1, ..., ik), where the ij are tape numbers.
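The tree evaluation just described (OR at existential nodes, AND at universal nodes, negation at negating nodes, 1 at accepting leaves) is easy to state as code; the following Python sketch is our illustration, not the paper's notation:

```python
# Evaluate an alternating computation tree.  A node is either a leaf
# label (0 or 1) or a pair (kind, children) where kind reflects the
# state type of the spawning configuration.
def accepts(node):
    if node in (0, 1):        # leaf: 1 iff the configuration is accepting
        return bool(node)
    kind, children = node
    if kind == "OR":          # existential state: some successor accepts
        return any(accepts(c) for c in children)
    if kind == "AND":         # universal state: every successor accepts
        return all(accepts(c) for c in children)
    if kind == "NOT":         # negating state: exactly one successor
        (child,) = children
        return not accepts(child)
    raise ValueError(kind)

# An OR root with one failing branch and one universal branch that accepts.
tree = ("OR", [0, ("AND", [1, ("NOT", [0])])])
print(accepts(tree))  # -> True
```

A nondeterministic TM corresponds to a tree with only OR nodes, as remarked above.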
If now the predicate U is interpreted by an actual relation on Σ*, then when M executes such an instruction, it will accept or reject according to whether the strings on the specified tapes are in the relation U. We will need such states to simulate recursive invocations of recursion schemes. It is not hard to modify the definition of acceptance for ordinary ATMs to that for oracle ATMs. The language or relation accepted by the ATM will now of course be relative to an assignment ρ of relations to the predicate names U.</Paragraph> <Paragraph position="5"> The next subsections contain our constructions for the CLFP characterizations. Then, in Section 4 we will treat Integer LFP grammars and show how these grammars and logspace bounded ATMs simulate each other.</Paragraph> <Paragraph position="6"> As a consequence of the above lemma, we will then have our main results.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 PROOF OF THEOREM 1 </SectionTitle> <Paragraph position="0"> Our first task is to show that if a language L is (boundedly) CLFP-definable, then it can be recognized by a linear space bounded ATM. The idea is simple.</Paragraph> <Paragraph position="1"> Given an input string, our machine will try to execute the logical description of the grammar. Its states will correspond to the logical structure of the CLFP formula. If that formula is, for example, the logical AND of two subformulas, then the part of our machine for that formula will have an AND state. A recursion scheme will be executed with states corresponding to the predicate variables involved in the recursion, and so forth. To give an explicit construction of an ATM corresponding to a formula φ of CLFP, we need to be precise about the number of work tapes required. This will be the sum of the number of free individual variables of φ, and the number of "declarations" of bound variables in φ.
A "declaration" is either the occurrence of a universal or existential quantifier in φ, or one of the individual variables bound on the left side of a (non-S) clause in a recursion scheme. If that clause defines the predicate R, then the number of variables declared at that point is ar(R) = |BΦ(R)|. We thus define the number β(φ) of declarations of bound variables in φ by induction as follows: 1. β(R(x1, ..., xn)) = 0, 2. β(t1 = t2) = 0,</Paragraph> <Paragraph position="3"> The number γ(φ) counts the maximum number of tapes needed, and is defined to be β(φ) + |Flvar(φ)| + 1.</Paragraph> <Paragraph position="4"> We can now state the inductive lemma which allows the construction of ATMs.</Paragraph> <Paragraph position="5"> Lemma 2. Let φ be a CLFP formula, with |Flvar(φ)| = k, and T : Flvar(φ) → {1, ..., k}. Let m = γ(φ).</Paragraph> <Paragraph position="6"> Then we may construct an m-tape ATM M(φ,T) having the following properties: (i) M has oracle states P? for each free predicate variable of φ, and (ii) For any α : Flvar(φ) → Σ*, and any environment ρ, we have the following. Let n = max{|α(xi)|}. Then M with oracle states for the ρ(P), started with α(x1) on tape T(x1), ..., and α(xk) on tape T(xk), and the other tapes blank, will accept without ever writing more than n symbols on any tape, if and only if (α(x1), ..., α(xk)) ∈ 𝒜[[φ]]ρn.</Paragraph> <Paragraph position="7"> Proof: This lemma formalizes the intuitive idea, stated above, that to calculate the membership of a string x in the language defined by a recursion scheme, it suffices to execute the scheme recursively. The full proof would use the formal semantics of ATMs, which is itself given by least-fixed-point definitions. We have chosen not to give the full proof, because the amount of explanation would be overwhelming relative to the actual content of the proof.
Instead we give a reasonably complete account of the inductive construction involved, and illustrate it with the regular set example of the previous section.</Paragraph> <Paragraph position="8"> To start the induction over formulas φ, suppose that φ is R(x1, ..., xk). Then we may take M to be a machine with m = γ(φ) tapes, with one oracle state P, and the single instruction P?(T(x1), ..., T(xk)).</Paragraph> <Paragraph position="9"> If φ is t1 = t2, then we let M be a simple machine evaluating t1 and t2, using perhaps an extra tape for bookkeeping. It does a letter-by-letter comparison, so that it never has to copy more than the maximum length of any one tape.</Paragraph> <Paragraph position="10"> If φ is ¬ψ, then M(φ) is obtained by adding a negating state before the initial state of M(ψ), and transferring control to that initial state.</Paragraph> <Paragraph position="11"> If φ is ψ1 ∨ ψ2, we construct M1 and M2 by inductive hypothesis. Then M(φ) is constructed by having disjoint instruction sets corresponding to each Mi, prefixed by an OR state which transfers control to either of the two formerly initial states. The free individual variables of the disjunction are those occurring free in either disjunct. Let T be an assignment of tapes to the free variables of the disjunction. Then we construct M1 with a T1 such that T1(x) = T(x), and similarly for M2, where x is a free individual variable. Otherwise, any tapes referenced in M1 are distinct from any tapes referenced in M2. In other words, the machine M has shared storage for the free variables, and private storage for variables bound in either disjunct. The oracle states in the two pieces of code are not made disjoint, however, because a predicate variable is free in the disjunction iff it is free in either disjunct.
It is clear that the number of tapes of M(ψ1 ∨ ψ2) is just γ(ψ1 ∨ ψ2). For the case of φ = ψ1 ∧ ψ2, we make exactly the same construction, only using an AND state as the new initial state.</Paragraph> <Paragraph position="12"> If φ is ∃xψ, and T is a tape assignment for the free variables of φ, then we construct M(ψ) using the extended tape assignment which assigns a new tape k + 1 to the variable x, and otherwise is the same as T. Now M is constructed to go through an initial loop of existential states, which fills tape k + 1 with a string no longer than the maximum length of any string on tapes 1 through k. It then transfers control to the initial state of M(ψ). The same construction is used for the universal quantifier, using an initial loop of universal states.</Paragraph> <Paragraph position="13"> Finally, we need to treat the case of a recursion scheme μSΦ. Suppose that Φ has domain ℛ, and let T be a tape assignment for μSΦ. For each clause CΦ(Q), where Q ∈ ℛ, we construct a machine M(Q) by inductive hypothesis. The global free variables of each M(Q) will have tapes assigned by T. However, we construct the M(Q) all in such a way that the local tape numbers do not overlap the tape numbers for any other M(R).</Paragraph> <Paragraph position="14"> This procedure will give tape numbers to all the variables in the set BΦ(Q). Let this set be {z1, ..., zk} in increasing order. Define TQ(zi) to be the tape assigned to zi in M(Q).</Paragraph> <Paragraph position="15"> The machine M(μSΦ) will consist of the code for the M(Q), arranged as blocks; the initial state of each such block will be labeled Q. In all the blocks, recursive oracle calls to Q? will be replaced by statements transferring control to Q. Thus, consider an oracle call Q?(i1, ..., ik) in any block M(R). Replace this call by code which copies tape i1 to tape TQ(z1), ..., and tape ik to tape TQ(zk).
Insert code that empties all other tapes local to M(Q), and insert a statement "go to Q." This completes the construction, and we now illustrate it with an example. Consider the recursion scheme introduced in the first section.</Paragraph> <Paragraph position="17"> We construct the machine M(S) as follows: tape 1 : x</Paragraph> <Paragraph position="19"> Similarly, we can construct a machine M(T) for the T clause. Then the result of pasting together the two constructions is shown in Figure 1.</Paragraph> <Paragraph position="20"> tape 1 : x tape 2 : y (initially blank) tape 3 : v (initially blank) tape 4 : w (initially blank) S : guess a value of y, such that |y| ≤ |x|, on tape 2; go to (q1 or q2 or q7); q1 : go to (q3 and q4);</Paragraph> </Section> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> </SectionTitle> <Paragraph position="0"> As we remarked, we cannot give a full proof of the correctness of our construction. However, the construction does correspond to the formal semantics of CLFP. In particular, the semantics of recursion corresponds to the iterated schemes Φk. Iterating the scheme k times roughly corresponds to developing the computation tree of the ATM to k levels, and replacing the oracle states at the leaves of the k-level tree with rejecting states corresponds to substituting FALSE into the kth iteration.</Paragraph> <Paragraph position="1"> With these remarks, the proof is complete.</Paragraph> <Paragraph position="2"> Lemma 3. Suppose L is accepted by an S(n) = n bounded ATM. Then there is a CLFP formula φ such that for all u ∈ Σ*, we have u ∈ L ⟺ u ⊨ φ.</Paragraph> <Paragraph position="3"> Proof: We may assume that M is an ATM with one work tape, if we allow M to print symbols in an auxiliary tape alphabet Γ.
By a result in Chandra, Kozen, and Stockmeyer (1981), M has no negating states. We show how to construct a formula φ which has constants ranging over Γ, but which has the property stated in the conclusion of the lemma: for each string x over Σ, M accepts x iff x ⊨ φ. The formula φ will be given as a recursion scheme μSΦ. Each state q of M will become a binary predicate variable q(x,y) in Φ. The meaning of q(u,v), where u and v are specific strings in Γ*, is that M is in state q, scanning the first symbol of v, and that u and v are the portions of the work tape to the left and the right of the head, respectively.</Paragraph> <Paragraph position="4"> We give a perfectly general example to illustrate the construction of Φ. In this example, the tape alphabet Γ is {a,b}. Suppose that q is a universal state of M and that δ(q,a) = {(r,b,right),(s,a,left)}, and δ(q,b) = {(p,b,left),(q,a,right)}. Then Φ(q)(x,y) is the following formula: ⋀σ∈{a,b} ∀w∀t [(x = wσ ∧ y = at → r(xb,t) ∧ s(w,σat)) ∧ (x = wσ ∧ y = bt → p(w,σbt) ∧ q(xa,t))]. The distinguished element of ℛ is q0, the start state of M. Notice that all predicate variables in ℛ occur positively in Φ, and that the search for w and t is limited to strings no longer than the length of the original input to M. If q is an accepting state of M, then we have a clause in Φ of the form q(x,y) ⇔ TRUE, where TRUE is some tautology.</Paragraph> <Paragraph position="5"> Technically speaking, the explicit substitutions r(xb,t) are not allowed in our formulas, but these can be expressed by suitable sentences like (∃z)(z = xb ∧ r(z,t)), as remarked in the first section. The cases for q(x,y) when x and y are null must also be handled separately, because M fails if it tries to leave the original region.</Paragraph> <Paragraph position="6"> Finally, we can obtain a formula over the constant alphabet Σ by a more complicated construction.
If we encode Γ into Σ by a homomorphic mapping, then a machine N can be constructed to simulate M. N will have tape alphabet Σ, but will have a number n of work tapes bounded linearly by the constant involved in the encoding. We now make a formula corresponding to N, but the predicates will have to be 2n-ary, one pair of arguments for each tape of N. With these remarks, the proof of the lemma is complete.</Paragraph> <Paragraph position="7"> Theorem 1 follows immediately from the above lemmas.</Paragraph> </Section> <Section position="8" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 ILFP: GRAMMARS WITH INTEGER INDEXING </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 SYNTAX OF ILFP </SectionTitle> <Paragraph position="0"> Our characterization of the defining power of CLFP relied on the result EXPTIME = ASPACE(n). We also know that PTIME = ASPACE(log n). Is there a similar logical notation that gives a grammatical characterization of PTIME? This section is devoted to giving an affirmative answer to this question. As stated in the introduction, this result is already known (Immerman 1982, Vardi 1982), but the result fits well with the CLFP theorem, and may in the linguistic domain have some real applications other than ours to Head Grammars. To explain the logic, it helps to consider acceptance by a logspace bounded ATM. In this case, the machine has a read-only input tape, which can be accessed by a two-way read head. Writing is strictly disallowed on the input tape, in contrast to the linear space bounded ATMs of the previous section. There is also a number k of work tapes on which computation occurs. Suppose that these work tapes use a binary alphabet. If their size is always less than or equal to ⌈log2 n⌉, then they are always capable of representing the numbers from 0 through n - 1.
We thus think of the contents of the work tapes as indices of specific positions in the read-only input string, though in fact they may not serve this purpose in an arbitrary computation. Since the input is off-line, substrings of the input will not be quantified. Instead, we quantify over the integer subscripts, and the input simply becomes a global parameter appearing in the semantics. Instead of having equations between strings as atomic formulas, we will have equations between integer terms. In order to access the input, we will have, for each symbol a ∈ Σ, an atomic predicate symbol a(i) of one argument, which will be true iff in the given input x, the symbol x(i) at position i is a. (We number the positions from 0 through n - 1.) We allow individual constant symbols 0, 1, and last, which will be interpreted as 0, 1, and n - 1, respectively, when the input has size n. As primitive arithmetic operations we allow addition and subtraction, and multiplication and integer division by 2. All of these operations are interpreted modulo n when the input is given.</Paragraph> <Paragraph position="2"> We need not give the formal definition of ILFP formulas, as it is the same as for CLFP, except that individual variables come from a set {i0, i1, ...}, terms are formed as above from arithmetic combinations of individual variables and constants, and the unary predicates a(i) are atomic formulas.</Paragraph> <Paragraph position="3"> Example 2. Consider the CFG S → aSb | bSa | SS | ab | ba. This is represented in ILFP as follows:</Paragraph> <Paragraph position="5"> (Again, the explicit substitution of terms for variables is not officially allowed, but can be introduced by definition.)
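To make the connection with the CKY algorithm concrete, here is a Python sketch (our illustration; the reading of S(i, j) as "S dominates positions i through j, inclusive" follows the text, but the iteration strategy and the use of the two-letter alphabet are our assumptions) that evaluates the scheme for this grammar bottom-up as a least fixed point:

```python
# Bottom-up evaluation of the ILFP scheme for S -> aSb | bSa | SS | ab | ba.
# S(i, j) holds iff the grammar derives the substring from position i
# through position j (inclusive); over the alphabet {a, b}, the pairs
# ab and ba are exactly those with x[i] != x[j].
def derives(x):
    n = len(x)
    S = set()  # current approximation to the relation S
    while True:
        new = set(S)
        for i in range(n):
            for j in range(i + 1, n):
                if j == i + 1 and x[i] != x[j]:            # ab | ba
                    new.add((i, j))
                if x[i] != x[j] and (i + 1, j - 1) in S:   # aSb | bSa
                    new.add((i, j))
                if any((i, k) in S and (k + 1, j) in S     # SS
                       for k in range(i + 1, j)):
                    new.add((i, j))
        if new == S:   # least fixed point reached
            break
        S = new
    return (0, n - 1) in S  # the assertion S(0, last)

print(derives("abba"), derives("aab"))  # -> True False
```

An actual CKY recognizer would fill the table by increasing span length in a single pass; the naive refixpoint loop above instead mirrors the iterates Φk of the semantics.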
The meaning of the above scheme should be clear.</Paragraph> <Paragraph position="6"> The predicate S(i,j) is intended to mean that node S dominates positions i through j in the input. Thus the assertion S(0,last), with no free variables, will be satisfied by a string x iff x is generated by the given CFG. The relation of this descriptive formalism to the CKY algorithm for context-free recognition should also suggest itself.</Paragraph> <Paragraph position="7"> Our definition of the meaning function ℳ⟦φ⟧ is like that in Section 2, except that the parameter n is replaced by a string x ∈ Σ*. Thus 1. ℳ⟦P(i1, ..., ik)⟧ρx = {α | (α(i1), ..., α(ik)) ∈ ρ(P)}; 2. ℳ⟦a(i)⟧ρx = {α | x(α(i)) = a}; 3. ℳ⟦t1 = t2⟧ρx = {α | t1α = t2α}; 4. ℳ⟦∃i φ⟧ρx = {α | (∃m < |x|)(α(i/m) ∈ ℳ⟦φ⟧ρx)}; 5. Boolean combinations are as before; 6. ℳ⟦SΦ⟧ρx = {α | (∃k)(α ∈ ℳ⟦Φ^k(S)⟧ρx)} The schemes Φ^k are defined for recursion schemes as above.</Paragraph> <Paragraph position="8"> If φ is a formula of ILFP with no free individual or predicate variables, then ℳ⟦φ⟧ρx is either A, the set of all individual assignments, or ∅, independent of ρ, but depending on x. We say that x ⊨ φ iff ℳ⟦φ⟧ρx is all of A. A language L ⊆ Σ* is ILFP-definable iff for some φ in ILFP, L = {x | x ⊨ φ}. Our objective is now Theorem 2. A language is ILFP-definable iff it is in PTIME.</Paragraph> <Paragraph position="9"> The proof appears in the next subsection.</Paragraph> </Section> </Section> <Section position="9" start_page="0" end_page="0" type="metho"> <SectionTitle> 4.2 PROOF OF THEOREM 2 </SectionTitle> <Paragraph position="0"> The idea of our proof is the same as that for Theorem 1, and only a sketch of the proof is necessary. We first restate Lemma 2 for ILFP, using the same definition for</Paragraph> </Section> <Section position="10" start_page="0" end_page="0" type="metho"> <SectionTitle> β and γ </SectionTitle> <Paragraph position="0"> Lemma 4.
Let φ be an ILFP formula, with |FIvar(φ)| = k, and τ : FIvar(φ) → {1, ..., k}. Let m = γ(φ).</Paragraph> <Paragraph position="1"> Then we may construct an m-tape ATM M(φ,τ) having the following properties: (i) M has oracle states P? for each free predicate variable of φ, and (ii) For any x ∈ Σ*, any α mapping FIvar(φ) to natural numbers, and any environment ρ, we have the following: M with oracle states for the ρ(P), started with x on the input tape, binary representations of the integers α(i1) on tape τ(i1), ..., and α(ik) on tape τ(ik), and the other tapes blank, will accept without ever writing a value j > |x| on any tape, if and only if (α(i1), ..., α(ik)) ∈ ℳ⟦φ⟧ρx.</Paragraph> <Paragraph position="2"> Proof: The proof is almost identical to that of Lemma 2.</Paragraph> <Paragraph position="3"> To evaluate equations M may have to use an extra tape, because otherwise the given nonblank tapes would be overwritten by the arithmetic operations. If φ is a(i) (the only case not covered in the proof of Lemma 2), then tape 1 is used as a counter to locate the input head at the position given by the contents of tape 1. Since arithmetic is modulo |x|, the machine never writes too great a value in these cases.</Paragraph> <Paragraph position="4"> The other cases are proved exactly as in the proof of Lemma 2, so this completes the proof.</Paragraph> <Paragraph position="5"> Lemma 5. If L ∈ ASPACE(log n), then L is ILFP-definable. Proof: We may assume that L is accepted by an ATM with p binary work tapes and one input tape. (If the tape alphabet is not binary, encode with a homomorphism and expand the number of tapes as necessary.) We may further assume that the machine M never writes a string longer than ⌊log2(n)⌋ - 1 on any work tape (remember one bit on each tape in finite control if necessary).
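The point of this length bound can be checked numerically: a work-tape string of at most ⌊log2(n)⌋ - 1 binary cells has value at most 2^(⌊log2 n⌋ - 1) - 1, which is strictly less than n. A small sketch of the check (the function name is ours, not the paper's):

```python
import math

def max_tape_value(n):
    """Largest binary value writable on a work tape kept to at most
    floor(log2 n) - 1 cells, per the assumption in the proof."""
    cells = max(0, math.floor(math.log2(n)) - 1)
    return 2 ** cells - 1

# The bound guarantees every tape content, read as binary, is below n.
for n in (2, 3, 10, 64, 1000):
    assert max_tape_value(n) < n
```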
Each work tape, or portion thereof, is thus guaranteed to represent a binary number strictly less than n in value, where n is the length of the input string.</Paragraph> <Paragraph position="6"> We now proceed as in the proof of Lemma 3, but coding the contents of the work tapes as binary numbers. We need a number h, which tells the position of the input head. We also have two numbers l and r, which are the binary values of the tape contents to the left and right of the work tape head (here we describe the case of just one work tape). The number r will actually be the binary value of the reversal of the string to the right of the tape head, because this makes the operation of shifting the head a simple multiplication or division by 2. Since a string may have leading zeroes, we also need to keep two auxiliary numbers ll and rr, which are the actual lengths of the strings to the left and right of the head. For each state q of the ATM we thus have a predicate q(h,l,r,ll,rr) of five integer variables. The reader should have no difficulty in encoding the transition rules of M exactly as in Lemma 3. For example, a test as to whether the scanned symbol on the work tape is 0 or 1 becomes a test of the parity of r, and so on. Finally, it can be seen that the case of p work tapes requires (4p + 1)-ary predicates. This completes the proof of our lemma and thus the theorem.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3 WHICH POLYNOMIAL? </SectionTitle> <Paragraph position="0"> We can get a rough estimate of the degree of the polynomial-time algorithm that will recognize strings in the language defined by an ILFP grammar. We saw in the proof of Lemma 4 that if a scheme φ has γ(φ) = p, then an ATM with p + 1 binary work tapes can be constructed to recognize the associated language.
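Returning for a moment to the tape encoding in the proof of Lemma 5, the reversal trick can be exercised directly. In this sketch (our own illustration; the names are not from the paper), l and r are the binary values of the contents to the left and right of the head, with r holding the reversed right-hand string so that both head moves become a multiplication or division by 2, and ll, rr record the true lengths so leading zeroes are not lost; the scanned cell is the low-order bit of r, so testing it is a parity test, as the text notes.

```python
def move_right(l, r, ll, rr):
    # scanned cell is the low-order bit of r; it joins the left part
    bit = r % 2
    return 2 * l + bit, r // 2, ll + 1, rr - 1

def move_left(l, r, ll, rr):
    # low-order bit of l becomes the newly scanned cell
    bit = l % 2
    return l // 2, 2 * r + bit, ll - 1, rr + 1

def write_bit(l, r, ll, rr, b):
    # overwrite the scanned cell (the parity of r) with bit b
    return l, r - r % 2 + b, ll, rr

# Tape "101" with the head on the leftmost cell: the left part is
# empty, and "101" reversed is "101", i.e. the number 5.
cfg = (0, 5, 0, 3)
assert cfg[1] % 2 == 1                    # scanned symbol is 1
cfg = move_right(*cfg)                    # now scanning the middle 0
assert cfg == (1, 2, 1, 2) and cfg[1] % 2 == 0
assert move_left(*cfg) == (0, 5, 0, 3)    # the two moves are inverses
```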
The number of configurations of each tape is thus log n · 2^(log n + 1) (head positions times possible contents). If there are p + 2 tapes, this gives O(log^(p+2) n · n^(p+2)), or roughly O(n^(p+2)) possible tape configurations once the polylogarithmic factors are dropped from this rough estimate. Multiplying by n for the position of the input head gives O(n^(p+3)) possible ATM configurations. From an analysis of the proof of Lemma 1 in Chandra, Kozen, and Stockmeyer (1981), we can see that the polynomial in our deterministic TM algorithm is bounded by the square of the number of ATM configurations. This leads to an O(n^(2p+6)) recognition algorithm. Since this bound would give an O(n^12) algorithm for context-free language recognition, we conjecture that the general estimate can be improved. In particular, we would like to remove the factor of 2 from 2p.</Paragraph> </Section> </Section> </Paper>