<?xml version="1.0" standalone="yes"?> <Paper uid="P83-1001"> <Title>CONTEXT-FREENESS AND THE COMPUTER PROCESSING OF HUMAN LANGUAGES</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 2. STRONG GENERATIVE CAPACITY </SectionTitle> <Paragraph position="0"> I now turn to a claim involving strong generative capacity (SGC). In addition to claiming that human languages are non-profligate CFL's, I want to suggest that every human language has a linguistically adequate grammar possessing the Exhaustive Constant Partial Ordering (ECPO) property of Gazdar and Pullum (1981). A grammar has this property if there is a single partial ordering of the nonterminal vocabulary which no right hand side of any rule violates. The ECPO CF-PSG's are a nonempty proper subset of the CF-PSG's. The claim that human languages always have ECPO CF-PSG's is a claim about the strong generative capacity that an appropriate theory of human language should have--one of the first such claims to have been seriously advanced, in fact. It does not affect weak generative capacity; Shieber (1983a) proves that every CFL has an ECPO grammar. It is always possible to construct an ECPO grammar for any CFL if one is willing to pay the price of inventing new nonterminals ad hoc to construct it. The content of the claim lies in the fact that linguists demand independent motivation for the nonterminals they postulate, so that the possibility of creating new ones just to guarantee ECPO-ness is not always a reasonable one.</Paragraph> <Paragraph position="1"> [OPEN PROBLEM: Could there be a non-profligate CFL which had #(N) < #(T) (i.e. nonterminal vocabulary strictly smaller than terminal vocabulary) for at least one of its non-ECPO grammars, but whose ECPO grammars always had #(N) > #(T)?] When the linguist's criteria of evaluation are kept in mind, it is fairly clear what sort of facts in a human language would convince linguists to abandon the ECPO claim. For example, if English had PP - S&quot; order in verb phrases (explain to him that he'll have to leave) but had S&quot; - PP order in adjectives (so that lucky for us we found you had the form lucky we found you for us), the grammar of English would not have the ECPO property. But such facts appear not to turn up in the languages we know about.</Paragraph>
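<Paragraph> The content of the ECPO property can be made concrete with a small sketch in Python (a modern editorial illustration, not anything from the original text; the function ecpo and the toy rule sets are invented for the example). A grammar passes the check just in case the precedence pairs observed across all right hand sides fit inside a single partial ordering, i.e. the relation &quot;X somewhere precedes Y&quot; is acyclic; in particular, no two rules may order the same pair of categories in opposite ways.
    def ecpo(rules):
        """rules: list of (lhs, rhs) pairs, each rhs a tuple of categories."""
        before = set()                       # observed precedence pairs (X, Y)
        for _lhs, rhs in rules:
            for i, x in enumerate(rhs):
                for y in rhs[i + 1:]:
                    if x != y:               # ignore repeated categories
                        before.add((x, y))
        # Kahn-style check that the precedence relation is acyclic:
        cats = {c for pair in before for c in pair}
        while cats:
            minimal = {c for c in cats
                       if not any((d, c) in before for d in cats)}
            if not minimal:
                return False                 # precedence cycle: no single ordering
            cats -= minimal
        return True

    # English as just described: PP before S'' in verb phrases and adjectives alike.
    consistent = [("VP", ("V", "PP", "S''")), ("AP", ("A", "PP", "S''"))]
    print(ecpo(consistent))                  # True: one ordering covers every rule

    # The hypothetical English of the preceding paragraph, with S''-PP order in AP:
    mixed = [("VP", ("V", "PP", "S''")), ("AP", ("A", "S''", "PP"))]
    print(ecpo(mixed))                       # False: PP and S'' ordered both ways
A grammar that passes this check is exactly one whose linear order statements can be factored out of its rules, which is what the ID/LP format discussed next exploits.</Paragraph>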
<Paragraph position="2"> The ECPO claim has interesting consequences relating to patterns of constituent order and how these can be described in a fully general way. If a grammar has the ECPO property, it can be stated in what Gazdar and Pullum call ID/LP format, and this renders numerous significant generalizations elegantly capturable. There are also some potentially interesting implications for parsing, studied by Shieber (1983a), who shows that a modified Earley algorithm can be used to parse ID/LP format grammars directly. One putative challenge to any claim that CF-PSG's can be strongly adequate descriptions for human languages comes from Dutch and has been discussed recently by Bresnan, Kaplan, Peters, and Zaenen (1982). Dutch has constructions like (7) dat Jan Piet Marie zag leren zwemmen that Jan Piet Marie saw teach swim &quot;that Jan saw Piet teach Marie to swim&quot; These seem to involve crossing dependencies over a domain of potentially arbitrary length, a configuration that is syntactically not expressible by a CF-PSG. In the special case where the dependency involves stringwise identity, a language with this sort of structure reduces to something like {xx | x is in V*}, a string matching language. However, analysis reveals that, as Bresnan et al. accept, the actual dependencies in Dutch are not syntactic.</Paragraph> <Paragraph position="3"> Grammaticality of a string like (7) is not in general affected by interchanging the NP's with one another, since it does not matter to the ith verb what the ith NP might be. What is crucial is that (in cases with simple transitive verbs, as above) the ith predicate (verb) takes the interpretation of the (i+1)th noun phrase as its argument.</Paragraph> <Paragraph position="4"> Strictly, this does not bear on the issue of SGC in any way that can be explicated without making reference to semantics. What is really at issue is whether a CF-PSG can assign syntactic structures to sentences of Dutch in a way that supports semantic interpretation.</Paragraph> <Paragraph position="5"> Certain recent work within the framework of generalized phrase structure grammar suggests to me that there is a very strong probability of the answer being yes. One interesting development is to be found in Culy (forthcoming), where it is shown that it is possible for a CFL-inducing syntax in ID/LP format to assign a &quot;flat&quot; constituent structure to strings like Piet Marie zag leren zwemmen ('saw Piet teach Marie to swim'), and assign them the correct semantics.</Paragraph> <Paragraph position="6"> Ivan Sag, in unpublished work, has developed a different account, in which strings like zag leren zwemmen ('saw teach to swim') are treated as compound verbs whose semantics is only satisfied if they are provided with the appropriate number of NP sisters. Whereas Culy has the syntax determine the relative numbers of NP's and verbs, Sag is exploring the assumption that this is unnecessary, since the semantic interpretation procedure can carry this descriptive burden. Under this view too, there is nothing about the syntax of Dutch that makes it non-CF, and there is not necessarily anything in the grammar that makes it non-ECPO.</Paragraph> <Paragraph position="7"> Henry Thompson also discusses the Dutch problem from the GPSG standpoint (in this volume).</Paragraph> <Paragraph position="8"> One other interesting line of work being pursued (at Stanford, like the work of Culy and of Sag) is due to Carl Pollard (Pollard, forthcoming, provides an introduction). Pollard has developed a generalization of context-free grammar which is defined not on trees but on &quot;headed strings&quot;, i.e. strings with a mark indicating that one distinguished element of the string is the &quot;head&quot;, and which combines constituents not only by concatenation but also by &quot;head wrap&quot;. This operation is analogous to Emmon Bach's notion &quot;right (or left) wrap&quot; but not equivalent to it. It involves wrapping a constituent A around a constituent B so that the head of A is to the left (or right) of B and the rest of A is to the right (or left) of B. Pollard has shown that this provides for an elegant syntactic treatment of the Dutch facts. I mention his work because I want to return to make a point about it in the immediately following section.</Paragraph>
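<Paragraph> What head wrap buys can be shown with a toy reconstruction in Python (my own notation and derivation steps, invented purely for illustration; Pollard's actual definitions differ). A headed string is a (left, head, right) triple, and each wrap splices a new verb in immediately before the head of its complement, so the accumulated NP's and the verb cluster interleave in exactly the crossed order of (7), something concatenation alone cannot produce:
    from typing import NamedTuple, Tuple

    class Headed(NamedTuple):
        left: Tuple[str, ...]          # material before the head
        head: str                      # the distinguished element
        right: Tuple[str, ...]         # material after the head
        def string(self):
            return " ".join(self.left + (self.head,) + self.right)

    def prefix_np(np, vp):
        """Plain concatenation: add an NP on the left, keep vp's head."""
        return Headed((np,) + vp.left, vp.head, vp.right)

    def head_wrap(verb, comp):
        """One wrap direction: the new verb becomes the head and lands
        immediately to the left of comp's head; comp wraps around it."""
        return Headed(comp.left, verb, (comp.head,) + comp.right)

    zwemmen = Headed((), "zwemmen", ())
    step1 = head_wrap("leren", prefix_np("Marie", zwemmen))
    print(step1.string())              # Marie leren zwemmen
    step2 = head_wrap("zag", prefix_np("Piet", step1))
    print(step2.string())              # Piet Marie zag leren zwemmen
</Paragraph>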
TIME COMPLEXITY OF RECOGNITION </SectionTitle> <Paragraph position="0"> The time complexity of the recognition problem (TCR) for human languages is like WGC questions in being decried as irrelevant by some linguists, but again, it is hardly one that serious computational approaches can legitimately ignore. Gazdar (1981) has recently reminded the linguistic community of this, and has been answered at great length by Berwick and Weinberg (1982). Gazdar noted that if transformational grammars (TG's) were stripped of all their transformations, they became CFL-inducing, which meant that the series of works showing CFL's to have sub-cubic recognition times became relevant to them. Berwick and Weinberg's paper represents a concerted effort to discredit any such suggestion by insisting that (a) it isn't only the CFL's that have low polynomial recognition time results, and (b) it isn't clear that any asymptotic recognition time results have practical implications for human language use (or for computer modelling of it).</Paragraph> <Paragraph position="1"> Both points should be quite uncontroversial, of course, and it is only by dint of inaccurate attribution that Berwick and Weinberg manage to suggest that Gazdar denies them. However, the two points simply do not add up to a reason for not being concerned with TCR results. Perfectly straightforward considerations of theoretical restrictiveness dictate that if the languages recognizable in polynomial time are a proper subset of those recognizable in exponential time (or whatever), it is desirable to explore the hypothesis that the human languages fall within the former class rather than just the latter.</Paragraph> <Paragraph position="2"> Certainly, it is not just CFL's that have been shown to be efficiently recognizable in deterministic time on a Turing machine. Not only every context-free grammar but also every context-sensitive grammar that can actually be exhibited generates a language that can be recognized in deterministic linear time on a two-tape Turing machine. It is certainly not the case that all the context-sensitive languages are linearly recognizable; it can be shown (in a highly indirect way) that there must be some that are not. But all the examples ever constructed generate linearly recognizable languages. And it is still unknown whether there are CFL's not linearly recognizable.</Paragraph> <Paragraph position="3"> It is therefore not at all necessary that a human language should be a CFL in order to be efficiently recognizable. But the claims about recognizability of CFL's do not stop at saying that by good fortune there happens to be a fast recognition algorithm for each member of the class of CFL's.</Paragraph> <Paragraph position="4"> The claim, rather, is that there is a single, universal algorithm that works for every member of the class and has a low deterministic polynomial time complexity. That is what cannot be said of the context-sensitive languages.</Paragraph>
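<Paragraph> One standard witness for this universal claim is the Cocke-Kasami-Younger recognizer, sketched here in Python (an anachronistic editorial illustration, and the toy grammar's categories are invented, not a serious grammar of English). A single procedure accepts any grammar in Chomsky normal form, to which any CF-PSG can be converted, and any string, in time proportional to the grammar size times the cube of the string length:
    def cky(binary, lexical, start, words):
        """binary: set of rules (A, B, C); lexical: set of rules (A, a)."""
        n = len(words)
        # chart[i][j] holds the categories spanning words[i:j]
        chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
        for i, w in enumerate(words):
            chart[i][i + 1] = {a for (a, t) in lexical if t == w}
        for width in range(2, n + 1):
            for i in range(n - width + 1):
                j = i + width
                for k in range(i + 1, j):            # split point
                    for (a, b, c) in binary:
                        if b in chart[i][k] and c in chart[k][j]:
                            chart[i][j].add(a)
        return start in chart[0][n]

    lexical = {("V", "explain"), ("P", "to"), ("NP", "him"),
               ("C", "that"), ("NP", "he"), ("VP", "leaves")}
    binary = {("PP", "P", "NP"), ("S'", "C", "S"), ("S", "NP", "VP"),
              ("VP", "V", "PP"), ("VP", "VP", "S'")}
    print(cky(binary, lexical, "VP",
              "explain to him that he leaves".split()))   # True
</Paragraph>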
<Paragraph position="5"> Nonetheless, there are well-understood classes of grammars and automata for which it can be said.</Paragraph> <Paragraph position="6"> For example, Pollard, in the course of the work mentioned above, has shown that if one or other of left head wrap and right head wrap is permitted in the theory of generalized context-free grammar, recognizability in deterministic time n^5 is guaranteed, and if both left head wrap and right head wrap are allowed in grammars (with individual grammars free to have either or both), then in the general case the upper bound for recognition time is n^7. These are, while not sub-cubic, still low deterministic polynomial time bounds. Pollard's system contrasts in this regard with the lexical-functional grammar advocated by Bresnan et al., which is currently conjectured to have an NP-complete recognition problem.</Paragraph> <Paragraph position="7"> I remain cautious about welcoming the move that Pollard makes because as yet his non-CFL-inducing syntactic theory does not provide an explanation for the fact that human languages always seem to turn out to be CFL's. It should be pointed out, however, that it is true of every grammatical theory that not every grammar defined as possible is held to be likely to turn up in practice, so it is not inconceivable that the grammars of human languages might fall within the CFL-inducing proper subset of Pollard-style head grammars.</Paragraph> <Paragraph position="8"> Of course, another possibility is that it might turn out that some human language ultimately provides evidence of non-CF-ness, and thus of a need for mechanisms at least as powerful as Pollard's.</Paragraph> <Paragraph position="9"> Bresnan et al. mention at the end of their paper on Dutch a set of potential candidates: the so-called &quot;free word order&quot; or &quot;nonconfigurational&quot; languages, particularly Australian languages like Dyirbal and Walbiri, which can allegedly distribute elements of a phrase at random throughout a sentence in almost any order. I have certain doubts about the interpretation of the empirical material on these languages, but I shall not pursue that here. I want instead to show that, counter to the naive intuition that wild word order would necessarily lead to gross parsing complexity, even rampantly free word order in a language does not necessarily indicate a parsing problem that exhibits itself in TCR terms.</Paragraph> <Paragraph position="10"> Let us call transposition of adjacent terminal symbols scrambling, and let us refer to the closure of a language L under scrambling as the scramble of L. The scramble of a CFL (even a regular one) can be non-CF. For example, the scramble of the regular language (abc)* is non-CF, although (abc)* itself is regular. (Of course, the scramble of a CFL is not always non-CF. The scramble of a*b*c* is {a, b, c}*, and both are regular, hence CF.)</Paragraph>
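<Paragraph> The definition can be made concrete in a few lines of Python (a toy enumeration, invented for illustration): the closure of a single string under adjacent transpositions. Since adjacent transpositions generate every permutation, scrambling a string of (abc)* yields exactly the strings with equal numbers of a's, b's and c's, which is the language given as (8) below.
    from collections import Counter, deque

    def scramble_of(word):
        """All strings reachable from word by transposing adjacent symbols."""
        seen = {word}
        queue = deque([word])
        while queue:
            w = queue.popleft()
            for i in range(len(w) - 1):
                s = w[:i] + w[i + 1] + w[i] + w[i + 2:]   # swap positions i, i+1
                if s not in seen:
                    seen.add(s)
                    queue.append(s)
        return seen

    closure = scramble_of("abcabc")
    print(len(closure))                                            # 90 = 6!/(2!2!2!)
    print(all(Counter(w) == Counter("abcabc") for w in closure))   # True
</Paragraph>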
<Paragraph> Suppose for the sake of discussion that there is a human language that is closed under scrambling (or has an appropriately extractable infinite subset that is). The example just cited, the scramble of (abc)*, is a fairly clear case of the sort of thing that might be modeled in a human language that was closed under scrambling. Imagine, for example, the case of a language in which each transitive clause had a verb (a), a nominative noun phrase (b), and an accusative noun phrase (c), and free word order permitted the a, b, and c from any number of clauses to occur interspersed in any order throughout the sentence. If we denote the number of y's in a string x by Ny(x), we can say that the scramble of (abc)* is (8).</Paragraph> <Paragraph position="11"> (8) {x | x is in {a, b, c}* and Na(x) = Nb(x) = Nc(x)} Attention was first drawn to this sort of language by Bach (1981), and I shall therefore call it a Bach language. What TCR properties does a Bach language have? The one in (8), at least, can be shown to be recognizable in linear time. The proof is rather trivial, since it is just a corollary of a previously known result. Cook (1971) shows that any language that is recognized by a two-way deterministic pushdown stack automaton (2DPDA) is recognizable in linear time on a Turing machine. In the Appendix, I give an informal description of a 2DPDA that will recognize the language in (8). Given this, the proof that (8) is linearly recognizable is trivial.</Paragraph>
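<Paragraph> The 2DPDA route is the general technique; for (8) in particular the linearity is also visible directly, since membership turns only on the three symbol counts. A one-pass sketch in Python (mine, and no substitute for the 2DPDA construction in the Appendix):
    def in_bach_language(s):
        """Recognize {x in {a,b,c}* : Na(x) = Nb(x) = Nc(x)} in one pass."""
        counts = {"a": 0, "b": 0, "c": 0}
        for ch in s:
            if ch not in counts:
                return False              # symbol outside the alphabet
            counts[ch] += 1
        return counts["a"] == counts["b"] == counts["c"]

    print(in_bach_language("acbbca"))     # True: two of each symbol
    print(in_bach_language("aabbc"))      # False: the counts differ
</Paragraph>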
<Paragraph position="12"> Thus even if my WGC and SGC conjectures were falsified by discoveries about free word order languages (which I consider that they have not been), there would still be no ground for tolerating theories of grammar and parsing that fail to impose a linear time bound on recognition. And recent work of Shieber (1983b) shows that there are interesting avenues in natural language parsing to be explored using deterministic context-free parsers that do work in linear time.</Paragraph> <Paragraph position="13"> In the light of the above remarks, some of the points made by Berwick and Weinberg look rather peculiar. For example, Berwick and Weinberg argue at length that things are really so complicated in practical implementations that a cubic bound on recognition time might not make much difference; for short sentences a theory that only guarantees an exponential time bound might do just as well.</Paragraph> <Paragraph position="14"> This is, to begin with, a very odd response to be made by defenders of TG when confronted by a theoretically restrictive claim. If someone made the theoretical claim that some problem had the time complexity of the Travelling Salesman problem, and was met by the response that real-life travelling salesmen do not visit very many cities before returning to head office, I think theoretical computer scientists would have a right to be amused.</Paragraph> <Paragraph position="15"> Likewise, it is funny to see practical implementation considerations brought to bear in defending TG against the phrase structure backlash, when (a) no formalized version of modern TG exists, let alone being available for implementation, and (b) large phrase structure grammars are being implemented on computers and shown to run very fast (see e.g. Slocum 1983, who reports an all-paths, bottom-up parser actually running in linear time using a CF-PSG with 400 rules and 10,000 lexical entries).</Paragraph> <Paragraph position="16"> Berwick and Weinberg seem to imply that data permitting a comparison of CF-PSG with TG are available. This is quite untrue, as far as I know.</Paragraph> <Paragraph position="17"> I therefore find it nothing short of astonishing to find Chomsky (1981, 234), taking a very similar position, affirming that because the size of the grammar is a constant factor in TCR calculations, and possibly a large one, &quot;The real empirical content of existing results ... may well be that grammars are preferred if they are not too complex in their rule structure. If parsability is a factor in language evolution, we would expect it to prefer 'short grammars'--such as transformational grammars based on the projection principle or the binding theory...&quot;</Paragraph> <Paragraph position="18"> TG's based on the &quot;projection principle&quot; and the &quot;binding theory&quot; have yet to be formulated with sufficient explicitness for it to be determined whether they have a rule structure at all, let alone a simple one, and the existence of parsing algorithms for them, of any sort whatever, has not been demonstrated.</Paragraph> <Paragraph position="19"> The real reason to reject a cubic recognition-time guarantee as a goal to be attained by syntactic theory construction is not that the quest is pointless, but rather that it is not nearly ambitious enough a goal. Anyone who settles for a cubic TCR bound may be settling for a theory a lot laxer than it could be. (This accusation would be levellable equally at TG, lexical-functional grammar, Pollard's generalized context-free grammar, and generalized phrase structure grammar as currently conceived.) Closer to what is called for would be a theory that defines human grammars as some proper subset of the ECPO CF-PSG's that generate infinite, non-profligate, linear-time recognizable languages. Just as the description of ALGOL-60 in BNF formalism had a galvanizing effect on theoretical computer science (Ginsburg 1980, 67), precise specification of a theory of this sort might sharpen quite considerably our view of the computational issues involved in natural language processing. And it would simultaneously be of considerable linguistic interest, at least for those who accept that we need a sharper theory of natural language than the vaguely-outlined decorative notations for Turing machines that are so often taken for theories in linguistics.</Paragraph> </Section> </Paper>