<?xml version="1.0" standalone="yes"?>
<Paper uid="P91-1016">
  <Title>The Acquisition and Application of Context Sensitive Grammar for English</Title>
  <Section position="2" start_page="0" end_page="122" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Although many papers report natural language processing systems based in part on syntactic analysis, their authors typically do not emphasize the complexity of the parsing and grammar acquisition processes that were involved. The casual reader might suppose that parsing is a well understood, minor aspect in such research. In fact, parsers for natural language are generally very complicated programs with complexity at best of O(n 3) where n is the number of words in a sentence. The grammars they usually use are technically, &amp;quot;augmented context free&amp;quot; where the simplicity of the context-free form is augmented by feature tests, transformations, and occasionally arbitrary programs. The combination of even an efficient parser with such intricate grammars may greatly increase the computational complexity of the system \[Tomita 1985\]. It is extremely difficult to write such grammars and they must frequently be revised to maintain internal consistency when applied to new texts.</Paragraph>
    <Paragraph position="1"> In this paper we present an alternative approach using context-sensitive grammar to enable preference parsing and rapid acquisition of CSG from example parsings of newspaper stories.</Paragraph>
    <Paragraph position="2"> Chomsky\[1957\] defined a hierarchy of grammars including context-free and context-sensitive ones. For natural language a grammar distinguishes terminal, single element constituents such as parts of speech from non-terminals which are phrase-names such as NP, VP, AD-VPH, or SNT 2 signifying multiple constituents.</Paragraph>
    <Paragraph position="3"> 1 This work was partially supported by the Army Research Office under contract DAAG29-84-K-0060.</Paragraph>
    <Paragraph position="4"> ~NounPhrase, VerbPhrase, AdverbialPhrase, Sentence A context-free grammar production is characterized as a rewrite rule where a non-terminal element as a left-side is rewritten as multiple symbols on the right.</Paragraph>
    <Paragraph position="5"> Snt -* NP + VP Such rules may be augmented by constraints to limit their application to relevant contexts.</Paragraph>
    <Paragraph position="7"> To the right of the slash mark, the constraints are applied by an interpretive program and even arbitrary code may be included; in this case the interpreter would recognize that the NP must be animate and there must be agreement in number between the NP and the VP. Since this is such a flexible and expressive approach, its many variations have found much use in application to natural language applications and there is a broad literature on Augmented Phrase Structure Grammar \[Gazdar et. al. 1985\], Unification Grammars of various types \[Shieber 1986\], and Augmented Transition Networks \[Allen, J. 1987, Simmoils 1984\].</Paragraph>
    <Paragraph position="8"> In context-sensitive grammars, the productions are restricted to rewrite rules of the form, uXv ---* uYv where u and v are context strings of terminals or nonterminals, and X is a non-terminal and Y is a non-empty string . That is, the symbol X may be rewritten as as the string Y in the context u-..v. More generally, the right-hand side of a context-sensitive rule must contain at least as many symbols as the left-hand side.</Paragraph>
    <Paragraph position="9"> Excepting Joshi's Tree Adjoining Grammars which are shown to be &amp;quot;mildly context-sensitive,&amp;quot; \[Joshi 1987\] context-sensitive grammars found little or no use among natural language processing (NLP) researchers until the reoccurrance of interest in Neural Network computation. One of the first suggestions of their potential utility came from Sejnowski and Rosenberg's NETtalk \[1988\], where seven-character contexts were largely sufficient to map each character of a printed word into its corresponding phoneme -- where each character actually maps in various contexts into several different phonemes. For accomplishing linguistic case analyses McClelland and Kawamoto \[1986\] and Miikulainen and  Dyer \[1989\] used the entire context of phrases and sentences to map string contexts into case structures. Robert Allen \[1987\] mapped nine-word sentences of English into Spanish translations, and Yu and Simmons \[1990\] accomplished context sensitive translations between English and German. It was apparent that the contexts in which a word occurred provided information to a neural network that was sufficient to select correct word sense and syntactic structure for otherwise ambiguous usages of language. null An explicit use of context-sensitive grammar was developed by Simmons and Yu \[1990\] to solve the problem of accepting indefinitely long, recursively embedded strings of language for training a neural network. However although the resulting neural network was trained as a satisfactory grammar, there was a problem of scaleup. Training the network for even 2000 rules took several days, and it was foreseen that the cost of training for 10-20 thousand rules would be prohibitive. This led us to investigate the hypothesis that storing a context-sensitive grammar in a hash-table and accessing it using a scoring function to select the rule that best matched a sentence context would be a superior approach.</Paragraph>
    <Paragraph position="10"> In this paper we describe a series of experiments in acquiring context-sensitive grammars (CSG) from newspaper stories, and a deterministic parsing system that uses a scoring function to select the best matching context sensitive rules from a hash-table. We have accumulated 4000 rules from 92 sentences and found the resulting CSG to be remarkably accurate in computing exactly the parse structures that were preferred by the linguist who based the grammar on his understanding of the text. We show that the resulting grammar generalizes well to new text and compresses to a fraction of the example training rules.</Paragraph>
  </Section>
class="xml-element"></Paper>