<?xml version="1.0" standalone="yes"?> <Paper uid="P01-1018"> <Title>Constraints on strong generative power</Title> <Section position="2" start_page="0" end_page="6" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> &quot;How much strong generative power can be squeezed out of a formal system without increasing its weak generative power?&quot; This question, posed by Joshi (2000), is important for both linguistic description and natural language processing. The extension of tree adjoining grammar (TAG) to tree-local multicomponent TAG (Joshi, 1987), or the extension of context free grammar (CFG) to tree insertion grammar (Schabes and Waters, 1993) or regular form TAG (Rogers, 1994) can be seen as steps toward answering this question. But this question is difficult to answer with much finality unless we pin its terms down more precisely.</Paragraph> <Paragraph position="1"> First, what is meant by strong generative power? In the standard definition (Chomsky, 1965) a grammar G weakly generates a set of sentences L(G) and strongly generates a set of structural descriptions Σ(G); the strong generative capacity of a formalism F is then {Σ(G) | F provides G}. There is some vagueness in the literature, however, over what structural descriptions are and how they can reasonably be compared across theories (Miller (1999) gives a good synopsis).</Paragraph> <Paragraph position="2"> The approach that Vijay-Shanker et al. (1987) and Weir (1988) take, elaborated on by Becker et al. (1992), is to identify a very general class of formalisms, which they call linear context-free rewriting systems (CFRSs), and define for this class a large space of structural descriptions which serves as a common ground in which the strong generative capacities of these formalisms can be compared.
Similarly, if we want to talk about squeezing strong generative power out of a formal system, we need to do so in the context of some larger space of structural descriptions.</Paragraph> <Paragraph position="3"> Second, why is preservation of weak generative power important? If we interpret this constraint to the letter, it is almost vacuous. For example, the class of all tree adjoining grammars which generate context-free languages includes the grammar shown in Figure 1a (which generates the language {a, b}*). We can also add the tree shown in Figure 1b without increasing the grammar's weak generative capacity; indeed, we can add any trees we please, provided they yield only a's and b's. Intuitively, the constraint of weak context-freeness has little force.</Paragraph> <Paragraph position="4"> This intuition is verified if we consider that weak context-freeness is desirable for computational efficiency. Though a weakly context-free TAG might be recognizable in cubic time (if we know the equivalent CFG), it need not be parsable in cubic time: that is, given a string, computing all its possible structural descriptions will take O(n^6) time in general.
If we are interested in computing structural descriptions from strings, then we need a tighter constraint than preservation of weak generative power.</Paragraph> <Paragraph position="5"> In Section 3 below we examine some restrictions on tree adjoining grammar which are weakly context-free, and observe that their parsers all work in the same way: though given a TAG G, they implicitly parse using a CFG G' which derives the same strings as G, but also their corresponding structural descriptions under G, in such a way that preserves the dynamic-programming structure of the parsing algorithm.</Paragraph> <Paragraph position="6"> Based on this observation, we replace the constraint of preservation of weak generative power with a constraint of simulability: essentially, a grammar G' simulates another grammar G if it generates the same strings that G does, as well as their corresponding structural descriptions under G (see Figure 2).</Paragraph> <Paragraph position="7"> So then, within the class of context-free rewriting systems, how does this constraint of simulability limit strong generative power? In Section 4.1 we define a formalism called multicomponent multifoot TAG (MMTAG) which, when restricted to a regular form, characterizes precisely those CFRSs which are simulable by a CFG. Thus, in the sense we have set forth, this formalism can be said to squeeze as much strong generative power out of CFG as is possible. Finally, we generalize this result to formalisms beyond CFG.</Paragraph> </Section> </Paper>