File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/88/c88-1001_metho.xml

Size: 27,958 bytes

Last Modified: 2025-10-06 14:12:05

<?xml version="1.0" standalone="yes"?>
<Paper uid="C88-1001">
  <Title>Feasible Learnability of Formal Grammars and the Theory of Natural Language Acquisition</Title>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Polynomial Learnability
2.1 Formal Modeling of Learning
</SectionTitle>
    <Paragraph position="0"> What constitutes a good model of tile learning behavior? Below we list tlve basic elements that any formal model of learning must con&lt;,. (c.f. \[13\]) 1. Objects to be learned: l,ct us call them ~knacks' for full generality. The question of learnability is asked of a collection of knacks.</Paragraph>
    <Paragraph position="1"> 2. Environment: The way in whidl 'data' are available to tile learner.</Paragraph>
    <Paragraph position="2"> 3. I\[ypotheses: I)escriptious t))r 'knacks', usually CXl)ressed in a certain language.</Paragraph>
    <Paragraph position="3"> 4. /,earners: Ill general functions from data to hypotheses. 5. Criterion of l,earning: \])efines precisely what is meant by  the question; When is a learner said to 'learn' a giwm collection of 'knacks' on the basis of data obtained through the enviromnent ? In most cases 'knacks' can be thought of as subsets of some universe (set) of objects, from which examples are drawn. 1 (Such a set is often called the 'domain' of the learning problem.) The obvions example is the definition of what a language is in the theory of natural language syntax. Syntactically, the English language is nothing but the set of all grammatical sentences, although this is subject to much philosophical controversy. The corresponding mathematical notion of a formal language is one that is fi'ee of such a controversy. A formal language is a subset of the set of all strings in .E* for some alphabet E. Clearly E* is tile domMn. The characterization of a kna&amp; as a subset of a universe is in fact a very general one. For example, a boolean concept of n variables is a subset of the set of all assignments to those n variables, often written 2 '~. Positive examples in this case are assignments to the n variables which 'satisfy' the concept in question.</Paragraph>
    <Paragraph position="4"> When the 'knacks' under consideration can in fact be thought of as subsets of some domain, the overall picture of a learning model looks like the one given in Figure 1.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Polynomial Learnability
</SectionTitle>
      <Paragraph position="0"> Polynomial learnability departs from the classic paradigm of language learning, 'idenitification in the limit', ~ in at least two important aspects, lilt enforces a higher demand oil tile time  nomial, but on the other hand relaxes the criterion of what constitutes a 'correct' grammar by employing an approximate, and probabilistic notion of correctness, or aecraey to be'precise. Furthermore, this notion of correctness is intricately tied to both the time complexity requirement and the way in which the environment presents examples to the learner, Specifically, the environment is assumed to present to the learner examples from the domain with respect to an unknown (to the learner) but fixed probability distribution, and the accuracy of a hypothesis is measured with respect to that same probability distribution.</Paragraph>
      <Paragraph position="1"> This way, the learner is, so to speak, protected from 'bad' presentations of a knack. We now make these ideas precise by specifying the five essential parameters of this learning paradigm. 1. Objects to be learned are languages or subsets of ?2&amp;quot; for some fixed alphabet E. Although we do not specify apriori the language in which to express these grammars a, for each collection of languages Z; of which we ask the learnability, we fix a class of grammars G (such that L(~) = PS where we write L(~) to mean {L(G) I G E ~}) with respect to which we will define the notion of 'complexity' or 'size' of a language. We take the number of bits it takes to write down a grammar under a reasonable 4, fixed encoding scheme to be the size of the grammar. The size of a language is then defined as the size of a minimal grammar for it. (For a language L, we write size(L) for its size.)  2. The environment produces a string in E* with a timeinvariant probability distribution unknown to the learner and pairs it with either 0 or 1 depending on whether the string is in the language in question or not, gives it to the learner. It repeats this process indefinitely.</Paragraph>
      <Paragraph position="2"> 3. The hypotheses axe expressed as grammars. The class of  grammars allowed as hypotheses, say &amp;quot;H, is not necessarily required to generate exactly the class Z; of languages to be learned. In general, when a collection PS can be learned by a learner which only outputs hypotheses from a class 7&amp;quot;/, we say that PS is learnable by Tl, and in particular, when Z; = L(~)) is learnable by ~, the class of representations G is said to be properly learnable. (See \[6\].) 4. Learners passively receive an infinite sequence of positive and negative examples in the manner described above, and aPotentAally any 'l?urning program could be a hypothesis ~By a reasonblc encoding, we mean one which can represent n ditrerent. grannnars using O(log*~) bits.</Paragraph>
      <Paragraph position="4"> hypothesis. In other words, they are functions from finite sequences of positive and negative examples 5 to grammars.</Paragraph>
      <Paragraph position="5"> A learning function is said to polynomially learn a collection of languages just in case it is computable in time polynomial ill the length of the input sample, and for an arbitrary degrees of accuracy e and confidence 5, its output on a sample produced by the environment by the manner described above for any language L in that collection, will be an e-approximation of the unknown language L with confidence probability at least 1 -- a, no matter what the unknown distribution is, as long as the number of strings in the sample exceeds p(e -~, 5 -~, size (L)) for some fixed plynomial p. Here, grammar G is an e-approximation of language L, if the probability distribution over the symmetric difference 6 of L and I,(G) is at most e.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Occam Algorithm
</SectionTitle>
      <Paragraph position="0"> Blumer et al. \[5\] have shown an extremely interesting result revealing a connection between reliable data compression and polynomial learnability. Occam's l~azor is a principle in the philosophy of science which stipulates that a shorter theory is tobe preferred as long as it remains adequate. B\]umel&amp;quot; el; al.</Paragraph>
      <Paragraph position="1"> define a precise version of such a notion in the present context of learning which they call Occam Algorithm, and establishes a relation between the existence of such an algorithm and polynomiM learnability: If there exists a polynomial time algorithm which reliably &amp;quot;compresses&amp;quot; any sample of any language in a given collection to a provably small consistent grammar for it, then such an Mogorithm polynomially learns that collection in the limit. We state this theorem in a slightly weaker form.</Paragraph>
      <Paragraph position="2">  Definition 2.1 Let PS be a language collection with associated represenation ~ with size function &amp;quot;size&amp;quot;. (Define a sequence of subclasses of ~ by 7~n = {G e 7-\[ \] size(G) _&lt; n}.) Then A is an Occar(~ algorithm for PS with range size f(m, ~z) if and only if! VLEPS VS C graph(L) if size(L) = n and \] S I= m then A(S) is consistent with S and A(S)) e 7~I(,~,m ) and .A runs in time polynomial in the length of S.</Paragraph>
      <Paragraph position="3"> Theorem 2.1 (Blumer et al.) If A is an Occam algorithm  for f~ with range size f(n,m) = O(nk~ ~) for some k &gt;_ ; 0 &lt; c~ &lt; 1 then .4 polynomially learns PS in the limit. We give below an intuitive explication of why an 0cesta Algorithm polynomiMly learns in the limit. Suppose A is an Occam Algorithm for PS, and let L ~ l: be the language to be learned, and n its size. Then for an arbitrary sample for L of an arbitrary size, a minimal consistent language for it will never have size larger than size(L) itself. Hence A's output on a sample of size m will always be one of the hypotheses in H\](m,~), whose cardinality is at most 2\](~,n). As the sample size m grows, its effect on the probability that any consistent hypothesis in 7~i(,~,, 0 is accurate will (polynomially) soon dominate that of the growth of the eardinality of the hypothesis class, which is less than linear in the sample size.</Paragraph>
      <Paragraph position="4"> Sin the sequel, we shall call them 'labeled samples' SThe symmetric difference between two sets A and B is (A-B)U(B-A). rFor any langugage L, ~jraph(L) = {(x, O} I x C-: L} U {{a:, I) \] a: ~ L}.</Paragraph>
    </Section>
  </Section>
  <Section position="3" start_page="0" end_page="13" type="metho">
    <SectionTitle>
3 Ranked Node Rewriting Grammars
</SectionTitle>
    <Paragraph position="0"> In this section, we define l, hc class of nrihlly context sensitive grammars under consideration, or Ranked Node Rewriting (\]ram.mars (RNR(~'s). \[{NR(\]'s are based on the underlying ideas of Tree Adjoining Grammars (TArt's) s and are also a specical case of context fi'ee tree grammars \[15\] in which unres~,ricted use of w~rial)les for moving, copying and deleting, is not permitted, in other words each rewriting in this system replaces a &amp;quot;ranked&amp;quot; noclterminal node of say rank j with an &amp;quot;incomplete&amp;quot; tree containing exactly j edges that have no descendants. If we define a hierarchy of languages generated by subclasses of RNRG's having nodes and rules with hounded rank j (RNRLj), then RNRL0 = CFL, and RNRLa :: TAL. 9 We formally define these grammars below.</Paragraph>
    <Paragraph position="1"> Definition 'LI (Preliminaries) 77ze following definitions are  necessar!l Jb',&amp;quot; the ,~equel.</Paragraph>
    <Paragraph position="2"> (i) The set of labeled directed trees over an alphabet E is denoted 7;&gt; (ii) r\['ll.e Ta.'ll.'. of an &amp;quot;incomplete&amp;quot; tree is the number of outgoing edges with no descendents.</Paragraph>
    <Paragraph position="3"> (iii) The rarth oj'a node is the. number of outgoing edges. (iv) The ~u&amp; 4'a symbol is defined if the rank of any node labeled by it is always the same, and equal~ that rank.</Paragraph>
    <Paragraph position="4"> (v) A ranked alphabet is one in which every symbol has a rank. (vi) I,l)r writ,': rank(x) for the rank of a~ything x, if it is defined. Definition 3.2 (Ranked Node Rewriting Grammars) A ronl;ed nodt; re'writing grammar C is a q'uinl,ph' {&gt;',,v, E'e, ~, It,., Re;) where: (i) EN is a ranked nonterminal alphabet.</Paragraph>
    <Paragraph position="5"> (ii) );'r is a germinal alphabet di4oint fi'om F~N. We let ~; = }-;N U 2T.</Paragraph>
    <Paragraph position="6"> (iii) ~ is a distinguished symbol distinct from any member of E, indicating &amp;quot;a'a outgoing edge with no descendent&amp;quot;, m (iv) It; is a finite set of labeled trees over E. We refer ~o I(; as ~he &amp;quot;initial trees&amp;quot; of the grammar.</Paragraph>
    <Paragraph position="7"> (v) Ra is a finite set of rewriting rules: R&lt;~ C {(A,a} I A e Y,'N &amp; a C T~u{.} &amp; rank(A) = rank(re)}. (In the sequel, we write A --. o for rewriting rule {A, ce).) (vO ,'a,,V(c) = ,ha, {,-~,4.(A) I A e EN}.</Paragraph>
    <Paragraph position="8">  We emphasize that the nonterminM vs. terminal distinction above does not coiadde with the internal node vs. frontier node distinction. (See examples 2.1 - 2.3.) tiaving defined the notions of 'rewriting' and 'derivation' in the obvious manner, the tree language of a grammar is then defiimd as the set of trees over the terminal alphabet, whid~ can be derived fi'om the grammar. 11 This is analogous to the way the string language of a rewriting grammar in the Chomsky hierarchy is defined.</Paragraph>
    <Paragraph position="9"> Definition 3.:&amp;quot;1 ('IYee Languages and String Languages) The tree language and string Iang~tagc of a RNRG G, denoted s'\]?ree adjoitdng grammars were introduced a.s a formalism for linguistic description by aoshi et al. \[10\], \[9\]. Various formal and computational properties of TAG's were studied in \[17\]. Its linguistic relevance was demons~rated in \[12\].</Paragraph>
    <Paragraph position="10"> 9This hierar,:hy is different fi'om the hierarchy of &amp;quot;meta-TAL's&amp;quot; invented and studied exl.ensively by Weir in \[20\].</Paragraph>
    <Paragraph position="11"> ldegln context free t.ree grammars iu \[15\], variables are used in place of ~J. 'l'hese variables can then be used in rewriting rules to move, copy, or erase subtrees.. \[t is i;his restriction of avoiding such use of variables Hint keeps RNR,G's within the class of etlicient, ly recognizable rewriting systems called &amp;quot;Linear context fi'ee rewriting systems&amp;quot; (\[18\]).</Paragraph>
    <Paragraph position="12"> II'Phis is how an &amp;quot;obligatory adjunction constraint&amp;quot; in the tree adjoining nunar formalism can be sintulated.</Paragraph>
    <Paragraph position="14"> If we now define a hierarchy of languages generated by sub-classes of RNRG's with bounded ranks, context fi'ee languages ((',FL) and tree adjoining languages (TAt) constitute the first two members of the hierarchy.</Paragraph>
    <Paragraph position="15"> Definition 3.4 l;br each j ~ N RNI~Gj = {GIG C RNRG &amp; rank(G) &lt; J}. l;br each j ~ N, I{NIU, j = {L(C) I O e: antiC;;} Theorem 3.1 I{NI~Lo - CFL ~tn.d l~N I~\[.1 : !I'AL.</Paragraph>
    <Paragraph position="16"> We now giw; some examples of grammars in this laierarchy, J2 which also illustrate the way in which the weak generative capacity of different levels of this hierarchy increases progressively.  Example 3.1. 1), = {3% ~ \[ n. C N} C Gl' , is generated by the following l?~Nl~(_7o 9rammar~ where o' is shown in Figure 2.</Paragraph>
  </Section>
  <Section position="4" start_page="13" end_page="13" type="metho">
    <SectionTitle>
4 K-Local Grammars
</SectionTitle>
    <Paragraph position="0"> q'he notion of qocality' of a grammar we define in this paper is a measure of how much global dependency there is within the grammar. By global dependency within a gramnlar, we. mean the interactions that exist between different rules and nonterminals in the grammar. As it is intuitively clear, allowing unbounded amont of global interaction is a major, though not only, cause of a combinatorial explosion in a search for a right grammar. K-locality limits the amount of such interaction, by tSSimpler trees are represented as term struct.ures, whereas lnore involved trees are shown in the figure. Also note that we rise uppercase letters for nonterminals and lowercase for terminals.</Paragraph>
    <Paragraph position="1"> IaSome linguistic motiwltions of this extension of'lDkG's are argagned for by the author in \[1\].</Paragraph>
    <Paragraph position="2"> bounding the number of different rules that can participate in any slngle derivation.</Paragraph>
    <Paragraph position="3"> Pormally, the notion of &amp;quot;k-locality&amp;quot; of a grammar is defined with respect to a formulation of derivations due originally to Vijay-Shankar, Weir, and 3oshi (\[\[9\]), which is a generalization of the notion of parse trees for CFO's. In their formulation, a derivation is a tree recording the tfistory of rewritings. The root of a derivation tree is labeled with an initial tree, and the rest of the nodes with rewriting rules. Each edge corresponds to a rewriting; the edge from a rule (host rule) to auother rule (applied rule) is labeled with the address of the node in the host l, ree at which the rewriting takes place.</Paragraph>
    <Paragraph position="4"> The degree of locality of a derivation is the number of distinct kinds of rewritings that appear in it. In terms of a derivation tree, the degree of locality is the number of different kinds of edges in it, where two edges are equivalent just in ease the two end nodes are labeled by the same rules, and the edges themselves are labeled by the same node address.</Paragraph>
    <Paragraph position="5"> Definition 4.1 Let 7)(G) denote the set of all derivation trees of G, and let r 6 D(G). Then, the degree of locality oft, written locality(r), is d4ned as follows, locality(r) = card{(p,q,,t) I there is an edge in r from a node labeled with p to another labeled with q, and is itself labeled with 77} The degree of locality of a gramm,~r is the maximum of those of all its derivations.</Paragraph>
    <Paragraph position="6"> Definition 4.2 a RNRG G is called k-local if max{locality(r) \] r e ~(C)} _&lt; k.</Paragraph>
    <Paragraph position="7"> We write k-Local-I~NRO - {(7 I G (5 RNRG and G is k-Local} and k-Local-t2Nl~L = { L(G) I G C k-Local-i~NR(: }, etc..</Paragraph>
    <Paragraph position="8"> Example 4.1 L1 = {a&amp;quot;bna&amp;quot;b '' I n,m C N} ~ /t-Local-RNRLo since all the derivations of G, -({S}, {s,a,b}, ~, {s(S,S)}, {S -+ sea, S,b), S --~ A}) generating Lt have deflree of locality at most 4. l,br example, the derivation for the string a3b3ab has degree of locality 4 as shown in Figure 8.</Paragraph>
    <Paragraph position="9"> Because locality of a derivation is the number of distinct kinds of rewritings, inclusive of the positions at which they takc place, k-locality also puts a bound on the number of nonterminal occurrences in any rule. In fact, had we defined the notion of k-locality by the two conditins: (i) at most k rules take part in any derivation, (if) each rule is k-bounded, t4, the analogous learnability result would follow essentially by the same argument. So, k-locality in effect forces a grammar to be an unbounded union of boundedly simple grammar, with bounded number of rules each boundedly small, with a bounded number of nonterminals.</Paragraph>
    <Paragraph position="10"> This fact is captured formally by the existence of the following normal form with only a polynomial expansion factor.</Paragraph>
    <Paragraph position="11"> Lelnma 4.1 (K-Local Normal Form) For every k-Local-RNRGj G, if we let n = size(G), then there is a RNRGj G' such that  ~. L( C') = r,,( a).</Paragraph>
    <Paragraph position="12"> 2. c' is in k-local normal form, i.c. O' = U{1\]~ I i C -rG,} such that: (a) each lIi has a nonterminal set that is: disjoint from any other IIj.</Paragraph>
    <Paragraph position="13"> (b) each tI~ is k-sire, pie, that is i. each Ili contains exactly i initial tree.</Paragraph>
    <Paragraph position="14">  if. each Hi contains at most k rules.</Paragraph>
    <Paragraph position="15"> iii. each IIi contains at most k nonterminal occurrences. null s. ~i~e(c~&amp;quot;) = o(~+').</Paragraph>
    <Paragraph position="16"> Crucially, the constraint of k-locality on RNRG's is an interesting one because not only each k-local subclass is an exponential class containing infinitely many infinite languages, but also k-local subclasses of the RNRG hierarchy become progressively more complex as we go higher in the hierarchy. In particular, for each j, IlNP~Gj can &amp;quot;count up to&amp;quot; 2(j + 1) and for each k &gt; 2, k-local-RN\[4Gj can also count up to 2(j + 1)) 5 We summarize these properties of k-loeal-RNRL's below.</Paragraph>
    <Paragraph position="17"> Theorem 4.1 Pbr every k E N,  1. Vj E N UkeN k-local-RNRLj = RNRLj.</Paragraph>
    <Paragraph position="18"> ~. Vj C N Vk &gt; 3 k-local-RNRLj+l is incomparable with RNRLp 3. Vj, k ~ N k-local:RNRLj is a p~oper subset of (k+I)loeal-t~NRLj. null 4. Vj Vk &gt; 2 E N k-local-RNRLj contains infinitely many infinite languages.</Paragraph>
    <Paragraph position="19"> hfformal t'roof: 1 is obvious because for each grammar in RNRLj, the degree  of locality o~&amp;quot; the grannnar is finite.</Paragraph>
    <Paragraph position="20"> As for 2, we note that the sequence of the languages (for the first three of which we gave example grammars) L~ = {a~*a~...a~ I u ~ N} are each in 3-1ocal-RNRLI_I but not in RNRLi_2.</Paragraph>
    <Paragraph position="21"> To verii} 3, we give the following sequence of languages Lj,k such that for each j and k, Lj, k is in k-local-RNRLj but not in (k-1)-local-RNRL/. Intuitively this is because k-local-languages can have at most O(k) mutually independent dependencies in a single sentence.</Paragraph>
    <Paragraph position="22"> Example 4.2 For each j, k ~ N, let Lj,k = { ~ '~ 2,~2 2~, al ...a20+1 ) al ...a2(j+l) knk kn~ ... a 1 ...a2(j~t) \]nl,n2,...,nk e N}.</Paragraph>
    <Paragraph position="23"> is obvious because Zoo = Uwe~.Lw where Lt~ = {w&amp;quot; \] n e N} are a subset of 2-1ocal-I~NRL0, and hence is a subset of k:local-RNl~Lj for every j and k &gt;_ 2. PSC/C/ clearly contains inifinitely many infinite languages. \[\]</Paragraph>
  </Section>
  <Section position="5" start_page="13" end_page="13" type="metho">
    <SectionTitle>
5 K-Local Languages Are Learnable
</SectionTitle>
    <Paragraph position="0"> It turns out that each k-loeal subclass of each RNRLj is polynomially lear~lable.</Paragraph>
    <Paragraph position="1"> Theorem 5. t For each j and k, k-local-RNRLj is polynomially Icarnable.</Paragraph>
    <Paragraph position="2"> This theorem can be proved by exhibiting an Occam Algorithm i(c.f, for this class with size which is Subsection 2.3), a range l logarithmic in the sample size, and polynomial in the size of a minimal consistent grammar. We ommit a detailed proof and igiw~ an informal outline of the proof. : 1. By the Normal Form Lemma, for any k-local-RNRG G, there is a language equivalent k-local-RNR.G H in k-local normal form whose size is only polynomially larger than the size of G.</Paragraph>
    <Paragraph position="3"> t~A class of grammars G is said to be able to &amp;quot;count up to&amp;quot; j, just in case {a?a'~...a\] \] n e N} e {L(G) \[ G (~ G} but {ai'a'~...a~+ 1 \[ n e N} C/ {c(G) I a e 6}.</Paragraph>
    <Paragraph position="4"> 2. The number of k-simple grammars with is apriori infinite, but for a given positive sample, the number of such grammars that are 'relevant' to that sample (i.e. which could have been used to derive any of the examples) is polynomially bounded in the length of the sample. This follows essentially by the non-erasure and non-copying properties of RNRG's. (See \[3\] for detail.) 3. Out of the set of k-simple grammars in the normal form thus obtained, the ones that are inconsistent with the negative sample are eliminated. Such a filtering can be seen to be performable in polynomial time, appealing to the result of Vijay-Shankar, Weir and Joshi \[18\] that Linear Context Free Rewriting Systems (LCFRS's) are polynomial time recognizable. That R.NRG's are indeed LCFRS's follow also from the non-erasure and non-copying properties.</Paragraph>
    <Paragraph position="5"> 4. What we have at this stage is a polynomially bounded set of k-simple grammars of varying sizes which are all consistent with the input sample. The 'relevant' part 10 of a minimal consistent grammar in k-local normal form is guaranteed to be a subset of this set of grammars. What an Oceam algorithm needs to do, then, is to find some sub-set of this set of k-simple grammars that &amp;quot;covers&amp;quot; all the points in the positive sample, and has a total size that is provably only polynomially larger than the minimal total size of a subset that covers the positive sample and is less than linear in the sample size.</Paragraph>
    <Paragraph position="6"> 5. We formalize this as a variant of &amp;quot;Set Cover&amp;quot; problem which we call &amp;quot;Weighted Set Cover&amp;quot; (WSC), and prove (in \[2 D the existence of an approximation algorithm with a performance guarantee which suffices to ensure that the output of ,4 will be a basis set consistent with the sample which is provably only polynomially larger than a minimal one, and is less than linear in the sample size. The algorithm runs in time polynomial in the size of a minimal consistent grammar and the sample length.</Paragraph>
  </Section>
  <Section position="6" start_page="13" end_page="13" type="metho">
    <SectionTitle>
6 Discussion: Possible Implications to the Theory of Natural Language Acquisition
</SectionTitle>
    <Paragraph position="0"> to the Theory of Natural Language Acquisition We have shown that a single, nontrivial constraint of 'k-locality' allows a rich class of mildly context sensitive languages, which are argued by some \[9\] to be an upperbound of weak generative capacity that may be needed by a hnguistic formalism, to be learnable. Let us recall that k-locality puts a bound on the amount of global interactions between different parts (rules) of a grammar. Although the most concise discription of natrual language might require almost unbounded amount of such interactions, it is conceivable that the actual grammar that is acquired by humans have a bounded degree of interactions, and thus in some cases may involve some inefficiency and redundancy. To illustrate the nature of inefficiecy introduced by 'forcing' a grammar to be k-loeal, consider the following. The syntactic category of a noun phrase seems to be essentially context independent in the sense that a noun phrase in a subject position and a noun phrase in an object positionare more or less syntactically equivalent. Such a 'generalization' contributes to the 'global' interaction in a grammar. Thus, for a k-local grammar (for some relatively small k) to account for it, it may have to repeat the same set of noun phrase rules for different constructions.</Paragraph>
    <Paragraph position="1"> tC/This ,lotion is to be made precise.</Paragraph>
    <Paragraph position="2"> As is stated in Section 4, for each fixed k, there are clearly a lot of languages (in a given class) which could not be generated by a k-local grammar. However, it is also the case that many languages, for which the most concise grammar is not a k-local grammar, can be generated by a less concise (and thus perhaps less explanatory) grammar, which is k-locah In some sense, this is similar to the well-known distinction of 'competence' and 'performance'. It is conceivable that performance grammars which are actually acquired by humans are in some sense much less efficient and less explanatory than a competence grammar for the same language. After all when the 'projection problem' asks: 'How is it possible for human infants to acquire their native languages...', it does not seem necessary that it be asking the question with respect to 'competence grammars', for what we know is that the set of 'performance grammars' is feasibly learnable. The possibility that we are suggesting here is that 'k-locality ~ is not visible in competence grammars, however, it is implicitly there so that the languages generated by the class of competence grammars, which are not necessarily k-local, are indeed all k-local languages for some fixed 'k'.</Paragraph>
  </Section>
  <Section position="7" start_page="13" end_page="13" type="metho">
    <SectionTitle>
7 Conclusions
</SectionTitle>
    <Paragraph position="0"> We have investigated the use of complexity theory to the evaluation of grammatical systems as linguistic formalisms from the point of view of feasible learnability. In particular, we have demonstrated that a single, natural and non-trivial constraint of &amp;quot;locality&amp;quot; on the grammars allows a rich class of mildly context sensitive languages to be feasibly learnable, in a well-defined complexity theoretic sense. Our work differs from recent works on efficient learning of formal languages, for example by Angluin (\[4\]), in that it uses only examples and no other powerful oracles. We hope to have demonstrated that learning formal -- grammars need not be doomed to be necessarily computationally intractable, and the investigation of alternative formulations of this problem is a worthwhile endeavonr.</Paragraph>
  </Section>
class="xml-element"></Paper>