<?xml version="1.0" standalone="yes"?> <Paper uid="P06-1137"> <Title>Sydney, July 2006. © 2006 Association for Computational Linguistics. Highly constrained unification grammars</Title> <Section position="3" start_page="0" end_page="1089" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Unification grammars (UG) (Shieber, 1986; Shieber, 1992; Carpenter, 1992) originated as an extension of context-free grammars, the basic idea being to augment the context-free rules with non-context-free annotations (feature structures) in order to express additional information. They can describe phonological, morphological, syntactic and semantic properties of languages simultaneously and are thus linguistically suitable for modeling natural languages. Several formulations of unification grammars have been proposed, and they are used extensively by computational linguists to describe the structure of a variety of natural languages.</Paragraph> <Paragraph position="1"> Unification grammars are Turing equivalent: determining whether a given string is generated by a given grammar is as hard as deciding whether a Turing machine halts on the empty input (Johnson, 1988). Therefore, the recognition problem for unification grammars is undecidable in the general case. To ensure its decidability, several constraints on unification grammars, commonly known as the off-line parsability (OLP) constraints, have been suggested, such that the recognition problem is decidable for off-line parsable grammars (Jaeger et al., 2005). The idea behind all the OLP definitions is to rule out grammars which license trees in which an unbounded amount of material is generated without expanding the frontier word. This can happen due to two kinds of rules: ε-rules (whose bodies are empty) and unit rules (whose bodies consist of a single element).
However, even for unification grammars with no such rules the recognition problem is NP-hard (Barton et al., 1987).</Paragraph> <Paragraph position="2"> In order for a grammar formalism to make predictions about the structure of natural language, its generative capacity must be constrained. It is now generally accepted that context-free grammars (CFGs) lack the generative power needed for this purpose (Savitch et al., 1987), due to natural language constructions such as reduplication, multiple agreement and crossed agreement. Several linguistic formalisms have been proposed as capable of modeling these phenomena, including Linear Indexed Grammars (LIG) (Gazdar, 1988), Head Grammars (Pollard, 1984), Tree Adjoining Grammars (TAG) (Joshi, 2003) and Combinatory Categorial Grammars (Steedman, 2000). In a seminal work, Vijay-Shanker and Weir (1994) prove that all four formalisms are weakly equivalent. They all generate the class of mildly context-sensitive languages (MCSL), all members of which have recognition algorithms with time complexity O(n^6) (Vijay-Shanker and Weir, 1993; Satta, 1994). As a result of the weak equivalence of four independently developed (and linguistically motivated) extensions of CFG, the class MCSL is considered to be linguistically meaningful, a natural class of languages for characterizing natural languages.</Paragraph> <Paragraph position="3"> Several authors have tried to approximate unification grammars by means of context-free grammars (Rayner et al., 2001; Kiefer and Krieger, 2004) and even finite-state grammars (Pereira and Wright, 1997; Johnson, 1998), but we are not aware of any work which relates unification grammars to the class MCSL. The main objective of this work is to define constraints on UGs which naturally limit their generative capacity.
We define two natural and easily testable syntactic constraints on UGs which ensure that grammars satisfying them generate the context-free and the mildly context-sensitive languages, respectively.</Paragraph> <Paragraph position="4"> The contribution of this result is twofold: (i) from a theoretical point of view, constraining unification grammars to generate exactly the class MCSL results in a grammatical formalism which is, on the one hand, powerful enough for linguists to express linguistic generalizations in, and on the other hand cognitively adequate, in the sense that its generative capacity is constrained; (ii) practically, such a constraint can provide efficient recognition algorithms for the limited class of unification grammars.</Paragraph> <Paragraph position="5"> We define some preliminary notions in section 2 and then show a constrained version of UG which generates the class CFL of context-free languages in section 3. Section 4 presents the main result, namely a restricted version of UG and a mapping of its grammars to LIG, establishing the proposition that such grammars generate exactly the class MCSL. For lack of space, we favor intuitive explanation over rigorous proofs; the full details can be found in Feinstein (2004).</Paragraph> </Section> </Paper>