<?xml version="1.0" standalone="yes"?> <Paper uid="C88-2091"> <Title>Why Computational Grammarians Can Be Skeptical About Existing Linguistic Theories</Title> <Section position="1" start_page="0" end_page="448" type="abstr"> <SectionTitle> PANEL </SectionTitle> <Paragraph position="0"> The bottleneck in building a practical natural language processing system lies not in the problems that are often discussed in research papers, but in handling much dirtier expressions, ones that are exceptional for theoreticians but that we encounter frequently. This panel will focus on a problem that has rarely been written about but has been argued informally among researchers who have tried, at least once, to build a practical natural language processing system.</Paragraph> <Paragraph position="1"> Theory is important and valuable for explanation and understanding, but it is essentially a first-order approximation of its target object. As for language, current theories cover just the basic part of language structure. Real language usage is quite different from the basic language structure and its supposed mechanism of interpretation. A natural language processing system must cover real language usage as much as possible. The system model must be designed so that it is clearly understandable with the support of a powerful linguistic theory, and yet can accept the variety of exceptional linguistic phenomena that the theory has difficulty treating. How to design such a system is a major problem in natural language processing, especially for machine translation between languages of different linguistic families. We have to be concerned with both the linguistic and the non-linguistic world. While we have to study these difficult problems, we must not forget the realizability of a useful system from the standpoint of engineering.</Paragraph> <Paragraph position="2"> I received valuable comments from Dr.
Karen Jensen, who cannot participate in our panel and who kindly offered to let me use her comments freely. I cite her comments below.</Paragraph> <Paragraph position="3"> 1. We need to deal with huge amounts of data (number of sentences, paragraphs, etc.). Existing linguistic theories (LTs) play with small amounts of data.</Paragraph> <Paragraph position="4"> 2. The data involve many (and messy) details. LTs are prematurely fond of simplicity. For example, punctuation is very important for processing real text, but LTs have nothing to say about it. (This is actually strange, since punctuation represents -- to some extent -- intonational contours, and these are certainly linguistically significant.) 3. There is no accepted criterion for when to abandon an LT; one can always modify the theory to fit counterexamples. We have fairly clear criteria: if a computational system cannot do its job in real time, then it fails.</Paragraph> <Paragraph position="5"> 4. We need to use complex attribute-value structures, which cannot be manipulated on paper or on a blackboard. &quot;Trees&quot; are only superficially involved. This means we are absolutely committed to computation.</Paragraph> <Paragraph position="6"> LTs have various degrees of commitment.</Paragraph> <Paragraph position="7"> Existing linguistic theories are of limited usefulness to broad-coverage, real-world computational grammars, perhaps largely because existing theorists focus on limited notions of &quot;grammaticality,&quot; rather than on the goal of dealing, in some fashion, with any piece of input text. Therefore, existing theories play the game of ruling out many strings of a language, rather than the game of trying to assign plausible structures to all strings.
We suggest that the proper goal of a working computational grammar is not to accept or reject strings, but to assign the most reasonable structure to every input string, and to comment on it, when necessary. (This goal does not seem to be psychologically implausible for human beings, either.) For years it has seemed theoretically sound to assume that the proper business of a grammar is to describe all of the grammatical structures of its language, and only those structures that are grammatical: The grammar of L will thus be a device that generates all of the grammatical sequences of L and none of the ungrammatical ones. (Chomsky 1957,</Paragraph> <Paragraph position="9"> 5. We are not interested in using the most constrained/restricted formalism. LTs generally are, because of supposed claims about language processing mechanisms. 6. We are interested in uniqueness as much as in generality. LTs usually are not.</Paragraph> <Paragraph position="10"> 7. We are more interested in coverage of the grammar than in completeness of the grammar. LTs generally pursue completeness.</Paragraph> <Paragraph position="11"> 8. We aim for &quot;all,&quot; but not &quot;only&quot; the grammatical constructions of a natural language. Defining ungrammatical structures is, by and large, a futile task (Alexis Manaster-Ramer, Wlodzimierz Zadrozny).</Paragraph> <Paragraph position="12"> 9. Existing LTs give at best a high-level specification of the structure of natural language. Writing a computational grammar is like writing a real program given very abstract specs (Nelson Correa).</Paragraph> <Paragraph position="13"> 10. We are not skeptical of theory, just of existing theories.</Paragraph> <Paragraph position="14"> At first blush, it seems unnecessary to conjure up any justification for this claim. Almost by definition, the proper business of a grammar should be grammaticality. However, it has been notoriously difficult to draw a line between &quot;grammatical&quot; sequences and &quot;ungrammatical&quot; sequences, for any natural human language.
It may even be provably impossible to define precisely the notion of grammaticality for any language. Natural language deals with vague predicates, and might itself be called a vague predicator.</Paragraph> <Paragraph position="15"> This being true, it still seems worthwhile to aim at parsing ALL of the grammatical strings of a language, but parsing ONLY the grammatical strings becomes a dubious enterprise at best. Arguments for doing so reduce either to dogma, or to some general notion of propriety. Arguments against, however, are easy to come by. Leaving theoretical considerations aside for the moment, consider these pragmatic ones: (a) The diachronic argument. The creativity of human use of language is great, and language systems are always changing. A construction that was once unacceptable becomes acceptable over time, and vice versa. Even if a grammar could</Paragraph> </Section> </Paper>