<?xml version="1.0" standalone="yes"?>
<Paper uid="J95-4003">
  <Title>Modularity and Information Content Classes in Principle-based Parsing</Title>
  <Section position="2" start_page="0" end_page="518" type="abstr">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> In the development of parsers for syntactic analysis, it is standard practice to posit two working levels: the grammar, on the one hand, and the algorithms, which produce the analysis of the sentence by using the grammar as the source of syntactic knowledge, on the other hand. Usually the grammar is derived directly from the work of theoretical linguists. The interest in building a parser that is grounded in a linguistic theory as closely as possible rests on two sets of reasons: first, theories are developed to account for empirical facts about language in a concise way--they seek general, abstract, language-independent explanations for linguistic phenomena; second, current linguistic theories are supposed to be models of humans' knowledge of language.</Paragraph>
    <Paragraph position="1"> Parsers that can use grammars directly are more likely to have wide coverage, and to be valid for many languages; they also constitute the most economical model of the human ability to put knowledge of language to use. Therefore, postulating a direct correspondence between the parser and theories of grammar is, methodologically, the strongest position, and is usually assumed as a starting point of investigation. However, experiments with parsers that are tightly related to linguistic principles have often been a disappointment, largely because these parsers are inefficient.</Paragraph>
    <Paragraph position="2"> Inefficiency is a problem that cannot simply be cast aside. Computationally, it renders the use of linguistic theories impractical, and, empirically, it clashes with the * D6partement de Linguistique G6n6rale, Universit6 de Gen6ve, 2 rue de Candolle, 1204 Gen6ve, Switzerland (~) 1995 Association for Computational Linguistics Computational Linguistics Volume 21, Number 4 observation that humans make use of their knowledge of language very effectively.</Paragraph>
    <Paragraph position="3"> In this paper, I investigate the computational problem related to the tension between building linguistically based parsers and building efficient ones, which, I argue, derives from the particular forms linguistic theories have taken recently. In particular, I explore the issue of what is a good parsing technique to apply to principle-based theories of grammar. I take Government-Binding (GB) theory (Chomsky 1986a,b; Rizzi 1990) to be a suitable illustration of such theories, and also to show in all clarity the problems that might arise. I differ from other investigations on the import of principle-based parsing in not drawing on cognitive issues or psycholinguistic results to justify my assumptions. Indeed, part of the spirit of this work is to explore how far one can go in advocating principle-based parsing, in the absence of motivations given by cognitive modelling.</Paragraph>
    <Section position="1" start_page="0" end_page="517" type="sub_section">
      <SectionTitle>
1.1 The Problem
</SectionTitle>
      <Paragraph position="0"> When generative grammatical theory in the '70s talked about &amp;quot;dative shift,&amp;quot; &amp;quot;topicalization,&amp;quot; &amp;quot;passive,&amp;quot; it meant that each of these constructions was captured in the grammar by a specific rule. Consequently, rules were not only construction-specific, but also language-specific (French, Italian and Spanish, for instance, have no &amp;quot;dative shift&amp;quot;). The conceptual development of the '80s, in many frameworks, consists in having identified the unifying principles of many of these construction-specific rules. For example, according to GB theory, the same set of principles are at work in the &amp;quot;raising&amp;quot; construction, (la) and in passive, (lb). The principles are X theory, the Theta Criterion, and the Case Filter. In both cases, the relation between the underlying position and the surface string is expressed by chains. Chains consist of the word that undergoes  movement and all the positions this word occupies in the course of a derivation. In (1) the chains are (John, t) and (The children, t).</Paragraph>
      <Paragraph position="1"> (1) a. John seems \[ip t to like Bill \] b. The children are loved t by John.</Paragraph>
      <Paragraph position="2">  The advantage of this treatment is that common properties of language, here certain classes of verbs, are expressed by common principles.</Paragraph>
      <Paragraph position="3"> This search for generality is not unique to GB theory. Feature-structure formalisms also use rule schemata to capture similarities among grammar rules. Moreover, reentrancy as a notational device to express common features seeks the same type of representational economy that is expressed by the use of &amp;quot;traces&amp;quot; in GB theory. It is desirable for a syntactic analyser to make use of linguistic theories to obtain, at least in principle, the same empirical coverage as the theory, and to capture the same generalizations. Moreover, a parser that makes direct use of a linguistic theory is more explanatory. A guiding belief for the development of the generative framework is that a theory that can derive its descriptions from the interaction of a small set of general principles is more explanatory than a theory in which descriptive adequacy is obtained by the interaction of a greater number of more particular, specific principles (Chomsky 1965). This is because the former theory is smaller. Thus, each principle can generate a set the encoding of which would require a much larger number of bits than the bits needed to encode the principle itself. The classic example is the use of natural classes of distinctive features in phonology, in order to compact several rules into one. A modular theory that encodes universal principles has obtained a greater degree of * succinctness than a nonmodular theory, and is considered more explanatory. Since it  Paola Merlo Modularity and Information Content Classes is desirable for the parser to maintain the level of explanatory power of the theory, it must maintain its modularity.</Paragraph>
      <Paragraph position="4"> It has also been argued (Berwick 1991) that the current shift from rules to a modular system of principles has computational advantages. Principle-based grammars engender compactness: Given a set of principles, P1, P2,..., Pn, the principles are stored separately and their interaction is computed on-line; the multiplicative interaction of the principles, P1 x P2 x ... x Pn does not need to be stored. Hence, the size of the grammar is the sum of the sizes of its components: IGI = P~ + P2 + &amp;quot;'&amp;quot; 4- Pn. Consequently, a parser based on such a grammar is compact, and, theoretically, easier to debug, maintain and update. 1 In practice, however, designing and implementing faithful and efficient parsers is not a simple matter.</Paragraph>
      <Paragraph position="5"> Defining &amp;quot;faithfulness&amp;quot; to a linguistic theory is not a trivial task, as a direct relation between the grammar and the parser is not the only option (see Bresnan 1978; Berwick and Weinberg 1984; van de Koot 1990, and references therein). In general, it is not necessary for a parser to implement the principles of the grammar directly. Rather, a covering grammar could be used, more suited to the purpose of parsing. However, it is important that such covering be done in such a way that accidental properties of a particular grammar, which would not hold under counterfactual changes, are not used. Otherwise, the covering grammar would not be sufficiently general.</Paragraph>
      <Paragraph position="6"> A faithful implementation is particularly difficult in the GB framework, as GB principles are informally expressed as English statements, and can take a variety of forms. For example, X theory (a condition on graphs), the Case Filter (an output filter on strings), and the 0 criterion (a bijection relation on predicates and arguments) all fall under the label of principles. Attempts have been made to formalize GB principles to a set of axioms (Stabler 1992).</Paragraph>
      <Paragraph position="7"> One possible, extreme interpretation of the direct use of principles is an approach where no grammar compilation is allowed (Abney 1989; Frank 1992; Crocker 1992). 2 This approach is appealing because it reflects, intuitively, the idea of using the grammar as a set of axioms and reduces parsing to a deduction process. This is very much in the spirit of the current shift in linguistic theories from construction-dependent rules to general principles, and it separates quite clearly the grammar from the parsing algorithm.</Paragraph>
      <Paragraph position="8"> However, it is not obvious that this approach is efficient. Partial evaluation and variable substitution can increase performance, but, as usual, a space/time trade-off will ensue. Excess of partial evaluation off-line increases the size of the grammar, which might, in turn, slow down the parse. Experimentation with different kinds of algorithms suggests that some amount of compilation of the principles might be necessary to alleviate the problem of inefficiency, but that too much compilation slows down the parser again.</Paragraph>
      <Paragraph position="9"> 1 Berwick (1982, 403ff.) shows that the size of a cascade of distinct principles (viewed as machines) is the size of its subparts, while if these same principles are collapsed, the size of the entire system grows mulfiplicatively. Modularity corresponds to maximal succinctness when all independent principles are stated separately. Independent principles are, intuitively, principles that can be computed independently of each other, and therefore whose interactions are all possible. Barton et al. (1987) and Berwick (1990) attempt to formalize the concept of independence as separability, assuming that the topology of a principle-based theory like GB can be mapped onto a planar graph. In fact, if independent modules are separable modules, there is little reason to think that GB is modular, as it corresponds to a highly connected graph.</Paragraph>
      <Paragraph position="10"> 2 By compilation, here and below, I mean off-line computation of some general property of the grammar, for example the off-line computation of the interaction of principles, using partial evaluation or variable substitution.</Paragraph>
    </Section>
    <Section position="2" start_page="517" end_page="517" type="sub_section">
      <SectionTitle>
1.2 On-line Computation is Inefficient
</SectionTitle>
      <Paragraph position="0"> Several researchers note that principle-based parsers allowing no grammar precompilation are inefficient. Firstly, Johnson (1989), Stabler (1990), and van de Koot (1991) note that the computation of a multi-level theory without any precompilation might not even terminate. Secondly, experimental results show that an entirely deductive approach is inefficient. Kashket (1991) discusses a principle-based parser, where no grammar precompilation is performed, and which parses English and Warlpiri using a parameterized theory of grammar. The parsing algorithm is a generate-and-test, backtracking regime. Kashket (1991) reports, for instance, that a 5-word sentence in Warlpiri (which can have 5! analyses, given the free word order of the language) can take up to 40 minutes to parse. He concludes that, although no mathematical analysis for the algorithm is available, the complexity appears to increase exponentially with the input size.</Paragraph>
      <Paragraph position="1"> Fong (1991, 123) discusses a parsing algorithm. He shows that an initial version of the parser, where the phrase structure rules were expressed as a DCG and interpreted on-line, spent 80% of the total parsing time building structure. In a later version, where rules were compiled into an LR(1) table, structure-building constituted 20% of the total parsing time. This same parser includes a module for the computation of long distance dependencies, which works by generate-and-test. Fong finds that this parsing approach is also inefficient.</Paragraph>
      <Paragraph position="2"> Dorr (1987) notices similar effects in a parser that uses an algorithm more parallel in spirit (Earley 1970). Dorr notes that a limited amount of precompilation of the principles speeds up the parse, otherwise too many incorrect alternatives are carried along before being eliminated. For example, in her design, X theory and the other principles are coroutined. She finds that precompiling the principles that license empty categories with the phrase structure rules reduces considerably the number of structures that are submitted to the filtering action of the other principles, and thus speeds up the parse.</Paragraph>
      <Paragraph position="3"> In all these cases, the source of inefficiency stems from the principle-based design.</Paragraph>
      <Paragraph position="4"> Because each principle is formulated to be as general as possible, the &amp;quot;logical&amp;quot; abstraction of each principle from the others causes a lot of overgeneration of structure and, consequently, a very large search space.</Paragraph>
    </Section>
    <Section position="3" start_page="517" end_page="518" type="sub_section">
      <SectionTitle>
1.3 Too Much Precompilation is Inefficient
</SectionTitle>
      <Paragraph position="0"> Simple precompilation is not a solution to the inefficiency of principle-based parsing, however. Experimentation with different amounts of precompilation shows that off-line precompilation speeds up parsing only up to a certain point, and that too much precompilation slows down the parser again.</Paragraph>
      <Paragraph position="1"> The logic of why this happens is clear. The complexity of a parsing algorithm is a composite function of the length of the input and the size of the grammar. For the kind of input lengths that are relevant for natural language, the size of the grammar easily becomes the predominant factor. If principles are precompiled in the form of grammar rules, the size of the grammar increases.</Paragraph>
      <Paragraph position="2"> As Tomita (1986) points out, input length does not cause a noticeable increase in running time up to 35 to 40 input tokens. For sentences of this length, grammar size becomes a relevant factor for grammars that contain more than approximately 220 rules, in his algorithm (an LR parser with parallel stacks). Both Dorr (1987) and Tomita (1986) show experimental results confirming that there is a critical point beyond which the parser is slowed down by the increasing size of the grammar. In the Generalized Phrase Structure Grammar (GPSG) formalism (Gazdar et al. 1985), similar experiments have been performed, which confirm this result. Parsers for GPSG are particularly interesting, because they use a formalism that expresses many grammatical generalizations in  Paola Merlo Modularity and Information Content Classes a uniform format. Therefore, GPSG is, in principle, more amenable to being processed by known parsing techniques. Thompson (1982) finds that expanding metarules, rather than computing them on-line, is advantageous, but that instantiating the variables in the expanded rules is not. Phillips and Thompson (1985) also remark that compiling out a grammar of twenty-nine phrase-structure rules and four metarules is equivalent to &amp;quot;several tens of millions of context-free rules.&amp;quot; Phillips (1992) proposes a modification to GPSG that makes it easier to parse, by using propagation rules, but still notes that variables should not be expanded.</Paragraph>
      <Paragraph position="3"> In conclusion, the lesson from experimentation is that parsing done totally on-line is inefficient, but that compilation is not always a solution. A parser that uses linguistic principles directly must fulfill apparently contradictory demands: for the parser to be linguistically valid it must use the grammar directly, while a limited amount of off-line precompilation might make the parser more efficient. 3 In the next section, I propose and discuss a solution to this problem that builds on other approaches and relates the parser to the grammar in a principled way.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>