File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/69/c69-2001_metho.xml
Size: 12,061 bytes
Last Modified: 2025-10-06 14:11:06
<?xml version="1.0" standalone="yes"?> <Paper uid="C69-2001"> <Title>ON THE PRESERVATION OF CONTEXT-FREE LANGUAGES IN A LEVEL-BASED SYSTEM</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> ON THE PRESERVATION OF CONTEXT-FREE LANGUAGES IN A LEVEL-BASED SYSTEM </SectionTitle> <Paragraph position="0"/> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> In this paper, a recently proposed level-oriented model for the machine analysis and synthesis of natural languages is investigated. Claims concerning the preservation of context-free (CF) languages in such a system are examined and shown to be unjustified.</Paragraph> <Paragraph position="1"> Furthermore, it is shown that even a revised version of the model (incorporating some recent discoveries) will not be CF-preserving. Finally, some theoretical implications of these findings are explored: in particular, claims of greater naturalness and the question of recursivity. * Over the years, and especially during the preparation of this paper, I have had the pleasure of many enlightening discussions with Stanley Peters, for which I am glad to thank him.</Paragraph> <Paragraph position="2"> Mey 1a In this work, we consider a recent model for the automatic synthesis and analysis of natural languages, oriented toward the notion of the linguistic &quot;level&quot;. We examine a claim according to which context-free languages would be preserved in such a system. This claim turns out to be incorrect. Furthermore, we show how even a modified version of the model, incorporating certain recent discoveries, does not preserve the context-free character of these languages. Finally, we explore the theoretical significance of these results; in particular, we examine the supposed advantage of a so-called more &quot;natural&quot; model, and the question of the recursivity of natural languages. 
* Over the years, and especially during the writing of this work, I have benefited from many illuminating conversations with Stanley Peters, for which I wish to express my gratitude to him.</Paragraph> <Paragraph position="3"></Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 1. Level-Oriented Systems in Computational Linguistics </SectionTitle> <Paragraph position="0"> A new model for computational linguistic performance has recently been proposed by the Czechoslovak group of workers at Charles University, Prague, under the direction of P. Sgall (1, 2). This model has significant theoretical implications, since it offers an alternative to transformationally based solutions of the problems encountered in automatic syntactic analysis and, consequently, to the transformational model itself. In particular, the new model claims to compete favorably with the transformational one in generative power and in the structural characterization of sentences.</Paragraph> <Paragraph position="1"> Central to this model is the notion of a &quot;multi-level&quot; or &quot;stratified&quot; grammar. The generation of a sentence at the highest (&quot;deepest&quot;) level of the grammar proceeds by a set of context-free (CF) rules; the output of these rules is then transduced to lower levels by a series of pushdown store automata. The output of the final transduction is some &quot;surface&quot; representation of the sentence to be generated.</Paragraph> <Paragraph position="3"> The whole system described in the preceding section is said to be &quot;weakly equivalent to a CF phrase structure grammar&quot; (1:148, 221). This assertion is claimed to derive from theorems formulated by N. Chomsky and R. J.
Evey (3; 4), which maintain that the output of a pushdown store transducer (pdt) is equivalent to the set of CF languages (1:109; 2:2.2.5).</Paragraph> <Paragraph position="4"> However, the original theorems about this equivalence concerned pushdown store acceptors (pda), not transducers (pdt as defined by Ginsburg (5:102)). For pdt (as used by Sgall and his group), CF-preservation is known not to obtain in general (5:104).</Paragraph> <Paragraph position="5"> The condition under which a pdt will preserve CF languages is stated by Ginsburg in Th. 3.5.1 (5:104):</Paragraph> <Paragraph position="6"> Given a pda M, a pdt S, and L = T(M), S(L) is CF iff M and S are associated, i.e., S is obtained by adding outputs to M, the pda that accepts L.</Paragraph> <Paragraph position="7"> For the case of the Czechoslovak &quot;battery&quot; of transducers, this condition could actually be fulfilled, although the authors never say so explicitly. I am referring to their extra condition on the system, as formulated in &quot;the existence of inverse automata for the single levels of the pushdown part [of the grammar, JM]&quot; (1:146). Hence the imprecision of the Czechoslovak proposal may amount to no more than a matter of incomplete formulation.</Paragraph> <Paragraph position="8"> In a recent article, however, Ginsburg and Rose (6) have shown that the earlier theorem on which the CF-preservation conditions for pdt had been based is false. (And, by the same token, so are Evey's original theorem (4:2.6.6) and an earlier theorem by Ginsburg and Rose (7:3.2).)
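The following minimal Python sketch makes association in Ginsburg's condition more concrete. It rests on an assumed encoding, not Ginsburg's own notation. A deterministic pushdown acceptor is a dictionary from (state, input symbol, stack top) to (next state, string written in place of the top symbol). A transducer is obtained from it by attaching an output string to every move. The helper names add_outputs, underlying_acceptor, associated and translate are hypothetical, and so is the toy machine M, which accepts a^n b^n. Because the transducer S is applied only to words of L = T(M), its outputs form x^n y^n. That is a CF language, consistent with Th. 3.5.1, though the toy case proves nothing in general.

```python
# Minimal sketch under an assumed encoding (not Ginsburg's formal notation):
# a deterministic pushdown acceptor maps (state, input symbol, stack top) to
# (next state, string written in place of the top symbol); the stack is a
# string whose first character is its top.


def add_outputs(pda, outputs):
    """Build a transducer by attaching an output string to every move of pda."""
    return {key: (nxt, push, outputs[key]) for key, (nxt, push) in pda.items()}


def underlying_acceptor(pdt):
    """Forget the outputs of a transducer, recovering its acceptor."""
    return {key: (nxt, push) for key, (nxt, push, _out) in pdt.items()}


def associated(pda, pdt):
    """True when pdt is exactly pda with outputs added to its moves."""
    return underlying_acceptor(pdt) == pda


def translate(pdt, word, start="q0", bottom="Z", final="q1"):
    """Run the transducer on word; return its output if the underlying
    acceptor accepts word (final state, only the bottom marker left),
    else None."""
    state, stack, out = start, bottom, []
    for symbol in word:
        key = (state, symbol, stack[:1])
        if key not in pdt:
            return None  # no move defined: the device blocks
        state, push, emitted = pdt[key]
        stack = push + stack[1:]
        out.append(emitted)
    return "".join(out) if state == final and stack == bottom else None


# Hypothetical toy acceptor M for L = {a^n b^n : n >= 1}; A counts the a's.
M = {
    ("q0", "a", "Z"): ("q0", "AZ"),
    ("q0", "a", "A"): ("q0", "AA"),
    ("q0", "b", "A"): ("q1", ""),
    ("q1", "b", "A"): ("q1", ""),
}
# S: M with output x on every a-move and output y on every b-move.
S = add_outputs(M, {key: "x" if key[1] == "a" else "y" for key in M})

print(associated(M, S))      # True
print(translate(S, "aabb"))  # xxyy
print(translate(S, "aab"))   # None
```

The design choice here is that the transducer shares its moves with M exactly, so associated is simply an equality test once the outputs are forgotten.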
According to the 1968 revised version (6:3.2) of that earlier Ginsburg-Rose theorem, CF-preservation is made dependent upon an additional condition on the pdt: namely, that its output be restricted to those strings that are produced by the device when it ends up in an accepting (or final) state. In all other cases, the language generated by the pdt will not be equivalent to the set of CF languages, but will simply constitute a recursively enumerable set. 3. Practical Consequences for the Prague System For the case under consideration, the new insight referred to in the preceding section has two significant consequences. First, the Czechoslovak group's concern to ensure CF-preservation may well have been in vain, unless the new condition can be incorporated into their system. Otherwise, a device that is practically equivalent to a Turing machine is not very exciting to work with in computational linguistic theory or its implementation. Second, one of the advantages inherent in the use of CF-preserving transducers is the guaranteed existence of a whole bevy of working recognition routines (e.g., the Cocke-Robinson algorithm, the parser developed by Kay, or the predictive analyzer by Kuno). This advantage becomes illusory if the pdt battery produces a recursively enumerable language, i.e., one that cannot be guaranteed to be recognized by a CF recognition routine.</Paragraph> <Paragraph position="9"> If we think of the Czechoslovak system as part of a machine translation proposal, where the route from source language to &quot;interlingua&quot; consists of essentially the same flow (but in the opposite direction) as that from &quot;interlingua&quot; to target language, it appears that the consequences of incorporating the Ginsburg-Rose adjustment are far-reaching.
Such an incorporation could take place in two ways. Either one could check the output of a particular (i-th) device, transducing from a higher (i-th) to a lower ((i-1)th) level, or from a lower (i-th) to a higher ((i+1)th) level, to see whether or not this output corresponds to an accepting state (where &quot;higher&quot; and &quot;lower&quot; are understood to refer to deeper and more superficial structures, respectively); or, alternatively, a built-in checking device could prevent output from being generated unless the transducer reached an accepting state after reading the input string. So far, Sgall and his group have not suggested ways to handle this problem.</Paragraph> <Paragraph position="10"> Quite another matter is that the Prague group's proposal to let the output of the i-th device be a proper subset of the input of the (i+1)th or (i-1)th</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle></SectionTitle> <Paragraph position="0"> transducer, respectively, does not seem to be fruitful, or even feasible. Naturally, the first question that arises is: what about the remaining input, and where does it all come from? It is certainly true, as the authors remark (2:2.2.5), that &quot;the output language of the whole description is not necessarily context-free&quot;. As shown above, it simply never is under the given conditions. Hence the reason given in the rest of the quote is trivial: &quot;since ... it is only a proper subset of the context-free languages of the last pushdown transducer&quot; (ibid.). While it is always possible to take a proper subset of a CF language and obtain a language that is not CF, or maybe not even regular, that clearly is not the point here.
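The following minimal Python sketch illustrates the first of the two ways of incorporating the Ginsburg-Rose adjustment mentioned earlier in this section. In that approach, each device is checked to see whether it ended in an accepting state before its output is passed on. The sketch rests on stated assumptions. Each level is a deterministic transducer given as a hypothetical transition table, and the battery is run from the deepest level towards the surface. The two toy devices built by the hypothetical helper counting_transducer have nothing to do with the actual Prague levels.

```python
# Minimal sketch of a chained battery of transducers with a per-level
# accepting-state check. The toy devices and the table encoding are
# assumptions for illustration; they are not the Prague group's levels.


def counting_transducer(opener, closer, out_open, out_close):
    """Hypothetical device: accepts opener^n closer^n (n >= 1), rewriting
    each opener as out_open and each closer as out_close."""
    return {
        ("q0", opener, "Z"): ("q0", "AZ", out_open),
        ("q0", opener, "A"): ("q0", "AA", out_open),
        ("q0", closer, "A"): ("q1", "", out_close),
        ("q1", closer, "A"): ("q1", "", out_close),
    }


def run(pdt, word, start="q0", bottom="Z", final="q1"):
    """Run one deterministic device. Return (output, accepted); output is
    None if the device blocks because no move is defined."""
    state, stack, out = start, bottom, []
    for symbol in word:
        key = (state, symbol, stack[:1])
        if key not in pdt:
            return None, False
        state, push, emitted = pdt[key]
        stack = push + stack[1:]
        out.append(emitted)
    return "".join(out), state == final and stack == bottom


def cascade(levels, word, check_each_level=True):
    """Feed word through the devices in order (deepest level first), each
    device's output being the next device's input. With the check on, the
    output of a device that did not end in an accepting state is discarded."""
    for pdt in levels:
        word, accepted = run(pdt, word)
        if word is None:
            return None  # a device blocked outright
        if check_each_level and not accepted:
            return None  # accepting-state check failed at this level
    return word


levels = [counting_transducer("a", "b", "x", "y"),
          counting_transducer("x", "y", "1", "0")]

print(cascade(levels, "aabb"))                         # 1100
print(cascade(levels, "aab"))                          # None
print(cascade(levels, "aab", check_each_level=False))  # 110
```

With the check switched off, the battery still emits 110 for the input aab, although neither device ends in an accepting state. With the check on, that output is discarded. In this toy setting a built-in check that suppresses output (the second way) would give the same results, because acceptance is only decided once the whole input has been read.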
The authors intend their output language to be CF, for reasons like the ones mentioned above. As long as it can be shown (as I have done here) that the system is in no case (even weakly) equivalent to a CF grammar, the question of the restrictions on the input to the subsequent transducers is, of course, irrelevant to the CF character of the system as a whole.</Paragraph> <Paragraph position="1"> 4. Some Theoretical Implications The proponents of the Prague model have repeatedly asserted that their system has certain advantages over other models of linguistic performance and, in particular, that their grammar is superior (or at least equivalent) to transformational grammar.</Paragraph> <Paragraph position="2"> The claim that the level-based model is superior to others because of easy CF-recognition has been disproved in the preceding sections. Another claim, that of greater naturalness inherent in the level-oriented model, is also often made (sometimes implicitly, by reference to the model's standing in time-honored linguistic tradition). It should be observed, however, that such a claim does not concern the formal character of any system. As Chomsky has pointed out (in his discussion of Fillmore's case theory (8:14-16)), it is vacuous to discuss different formal systems in terms of which is the more &quot;direct&quot; representation of natural language; unless a formal system is interpreted, it simply cannot be compared to another one for &quot;directness&quot; of expression.</Paragraph> <Paragraph position="3"> But what about transformational grammar itself with respect to sentence recognition?
It has been known for a long time that there is no way of establishing a universal automatic recognition procedure for a transformational grammar that is not in some way restricted. This was precisely the Prague group's motivation for proposing their system as a (superior) alternative to TG. Now that their claims have been shown to be unsubstantiated on theoretical grounds, it may seem a meagre consolation to the Prague group that TG itself is in the same boat, theoretically speaking.</Paragraph> <Paragraph position="4"> In an important recent study, Peters and Ritchie have demonstrated that a context-sensitive (CS) based transformational grammar, unless restricted in some respects, generates a recursively enumerable language, and conversely, for any recursively enumerable language there is a CS-based TG that will generate it (9:4.1).</Paragraph> <Paragraph position="5"> As the authors point out (ibid.:33), it is imperative to find conditions under which a TG will generate only recursive languages. One such condition would be to restrict the base of a TG to be CF; however, this will still not guarantee recursivity (9:4.2). This is precisely the problem that the Czechoslovak workers will have to solve in order to make their system viable and to validate their claims.</Paragraph> <Paragraph position="6"></Paragraph> </Section> </Section> </Paper>