<?xml version="1.0" standalone="yes"?> <Paper uid="C96-2106"> <Title>Modularizing Codescriptive Grammars for Efficient Parsing*</Title> <Section position="2" start_page="0" end_page="628" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Unification-based theories of grammar allow for an integration of different levels of linguistic description in a common framework of typed feature structures. In HPSG this assumption is embodied in the fundamental concept of a sign (Pollard and Sag, 1987; Pollard and Sag, 1994). A sign is a structure incorporating information from all levels of linguistic analysis, such as phonology, syntax, and semantics. This structure specifies interactions between these levels by means of coreferences, indicating the sharing of information, and describes how the levels mutually constrain each other. Such a concept of linguistic description is attractive for several reasons: 1. it supports the use of common formalisms and data structures on all linguistic levels, 2. it provides declarative and reversible interface specifications between these levels, 3. all information is simultaneously available, and 4. no procedural interaction between linguistic modules needs to be set up.</Paragraph> <Paragraph position="1"> *This work was funded by the German Federal Ministry of Education, Science, Research and Technology (BMBF) in the framework of the Verbmobil Project under Grant 01 IV 101 K/1. The responsibility for the content of this study lies with the authors. Similar approaches, especially for the syntax-semantics interface, have been suggested for all major kinds of unification-based theories, such as LFG or CUG. (Halvorsen and Kaplan, 1988) call such approaches codescriptive, in contrast to the approach of description by analysis, which is closely related to sequential architectures where linguistic levels correspond to components operating on the basis of the (complete) analysis results of lower levels.
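To make the notions of coreference and codescription concrete, here is a toy sketch (an illustrative assumption, not the formalism used in this paper): feature structures modeled as nested Python dicts, with coreference modeled as sharing of one mutable object, so that a constraint added on one level is simultaneously visible on the other.

```python
def unify(a, b):
    """Destructively unify two feature structures (dicts or atoms).

    Returns the unified structure, or None on a clash. A toy stand-in:
    real systems use typed structures and union-find for coreferences.
    """
    if a is b:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        for feat, val in b.items():
            if feat in a:
                sub = unify(a[feat], val)
                if sub is None:
                    return None  # feature clash propagates up
                a[feat] = sub
            else:
                a[feat] = val
        return a
    return a if a == b else None

# Coreference: SYN and SEM share the same AGR object, so constraining
# the syntactic level simultaneously constrains the semantic one.
agr = {"num": "sg"}
sign = {"syn": {"agr": agr}, "sem": {"index": agr}}
unify(sign["syn"]["agr"], {"pers": "3"})
print(sign["sem"]["index"])  # {'num': 'sg', 'pers': '3'}
```

The shared `agr` object plays the role of the boxed coreference tags in an HPSG attribute-value matrix.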
In a codescriptive grammar, semantic descriptions are expressed by additional constraints.</Paragraph> <Paragraph position="2"> Though theoretically very attractive, codescription has its price: (i) the grammar is difficult to modularize, because the levels constrain each other mutually, and (ii) there is a computational overhead when parsers use the complete descriptions. Problems of these kinds, already noted by (Shieber, 1985), motivated the research described here. The goal was to develop more flexible ways of using codescriptive grammars than having them applied by a parser with full informational power. The underlying observation is that constraints in such grammars can play different roles: * Genuine constraints, which relate directly to the grammaticality (well-formedness) of the input. Typically, these are the syntactic constraints. * Spurious constraints, which basically build representational structures. These are less concerned with the well-formedness of the input than with output for other components in the overall system. Much of the semantic description is of this kind.</Paragraph> <Paragraph position="3"> If a parser treats all constraints on a par, it cannot distinguish between the structure-building and the filtering constraints. Since unification-based formalisms are monotonic, large structures are built up and have to undergo all the steps of unification, copying, and undoing in the processor. The costs of these operations (in time and space) increase exponentially with the size of the structures.</Paragraph> <Paragraph position="4"> In the VERBMOBIL project, the parser is used within a speech translation system (Wahlster, 1993; Kay, Gawron, and Norvig, 1994).</Paragraph> <Paragraph position="5"> The parser input consists of word lattices of hypotheses from speech recognition.
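As a toy illustration (an assumption for exposition, not the VERBMOBIL implementation), such a lattice can be represented as a set of edges over recognizer time points, and the parser's search space as the enumeration of complete paths through it:

```python
# Toy word lattice: edges are (start_node, end_node, word_hypothesis).
# Competing hypotheses share nodes, so paths multiply combinatorially.
from collections import defaultdict

def lattice_paths(edges, start, final):
    """Yield every word sequence along a path from start to final."""
    succ = defaultdict(list)
    for s, e, w in edges:
        succ[s].append((e, w))
    stack = [(start, [])]
    while stack:
        node, words = stack.pop()
        if node == final:
            yield words
        for nxt, w in succ[node]:
            stack.append((nxt, words + [w]))

# Two competing hypotheses per time slot give 4 candidate paths,
# of which only "ich komme" is a grammatical utterance.
edges = [(0, 1, "ich"), (0, 1, "mich"), (1, 2, "komme"), (1, 2, "Name")]
for path in lattice_paths(edges, 0, 2):
    print(" ".join(path))
```

In a real lattice with hundreds of hypotheses, this path set is far too large to enumerate, which is why the grammatical constraints must prune it during the search rather than afterwards.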
The parser has to identify those paths in the lattice which represent a grammatically acceptable utterance. Parser and recognizer are incremental and run interactively in parallel. Even for short utterances, the lattices can contain several hundreds of word hypotheses, most of which do not combine into grammatical utterances. Parsing these lattices is much more complex than parsing written text.</Paragraph> <Paragraph position="6"> The basic idea presented here is to distribute the labour of evaluating the constraints in the grammar over several processors (i.e., parsers). Important considerations in the design of the system were 1. increasing performance, 2. achieving incremental and interactive behaviour, 3. minimizing the overhead in communication between the processors.</Paragraph> <Paragraph position="7"> We used a mid-size HPSG-style German grammar written in the TDL formalism (Krieger and Schäfer, 1994). The grammar cospecifies syntax and semantics in the attributes SYN and SEM. A simplified example is shown in the lexical entry for the German verb komme (to come) in Fig. 1.</Paragraph> <Paragraph position="8"> In the following section, we start with a top-down view of the architecture. After that, we describe the communication protocol between the parsing processes. Then several options for creating subgrammars from the complete grammar will be discussed. The subgrammars represent the distribution of information across the parsers. Finally, some experimental results will be reported.</Paragraph> </Section> </Paper>