<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1066">
  <Title>Guaranteeing Parsing Termination of Unification Grammars</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
$\delta : Q \times$ FEATS
</SectionTitle>
    <Paragraph position="0"> [...] $Q$, specifying the arcs and a partial function, $\theta : Q \to$ ATOMS, labelling the sinks, and $\bar{Q}$ is an ordered set of distinguished nodes in $Q$ called roots. $G$ is not necessarily connected, but the [...] sequence $\langle A_1, \ldots, A_n \rangle$ of (not necessarily disjoint) FSs. We use the two views of MRSs interchangeably. The sub-structure of $\sigma = \langle A_1, \ldots, A_n \rangle$, induced by the pair $\langle i, j \rangle$ and denoted $\sigma$ [...] and there exists a total function $h : Q \to Q'$ such that for every root [...]</Paragraph>
    <Paragraph position="2"> Skeletal grammars are a variant of unification grammars which have an explicit context-free backbone/skeleton. These grammars can be viewed as an extension of context-free grammars, where every category is associated with an informative FS. An extended category is a pair $\langle A, c \rangle$ where $A$ is an FS and $c \in$ CATS.</Paragraph>
    <Paragraph position="3"> Definition 2.1. A skeletal grammar (over FEATS, ATOMS and CATS) is a tuple $G = \langle \mathcal{R}, \mathcal{L}, A^s \rangle$ where $\mathcal{R}$ is a finite set of rules, each of which is an MRS of length $n \geq 1$ (with a designated first element, the head of the rule), and a sequence of length $n$ of categories; $\mathcal{L}$ is a lexicon, which associates with every terminal $a$ (over a fixed finite set $\Sigma$ of terminals) a finite set of extended categories $\mathcal{L}(a)$; $A^s$ is the start symbol (an extended category).</Paragraph>
    <Paragraph position="4"> A skeletal form is a pair $\langle \sigma, \vec{c} \rangle$, where $\sigma$ is an [...] which is an MRS of length $n \geq 1$; $\mathcal{L}$ is a lexicon, which associates with every terminal $a$ a finite set of FSs $\mathcal{L}(a)$; $A^s$ is the start symbol (an FS). General unification grammar formalisms do not assume the existence of a context-free backbone.</Paragraph>
    <Paragraph position="5"> Derivations, pre-terminals, languages and derivation trees for general unification grammars are defined similarly to skeletal grammars, ignoring all categories.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Off-line-parsability constraints
</SectionTitle>
    <Paragraph position="0"> It is well known that unification-based grammar formalisms are Turing-equivalent in their generative capacity (Pereira and Warren, 1983; Johnson, 1988, 87-93); determining whether a given string $w$ is generated by a given grammar $G$ is equivalent to deciding whether a Turing machine $M$ halts on an empty input, which is known to be undecidable. Therefore, the recognition problem is undecidable in the general case. However, for grammars that satisfy a certain restriction, called the off-line parsability constraint (OLP), decidability of the recognition problem is guaranteed. In this section we present several variants of the OLP constraint suggested in the literature. Some of the constraints (Pereira and Warren, 1983; Kaplan and Bresnan, 1982; Johnson, 1988; Kuhn, 1999) apply only to skeletal grammars since they use the term category, which is not well defined for general unification grammars. Others (Haas, 1989; Shieber, 1992; Torenvliet and Trautwein, 1995; Wintner and Francez, 1999) are applicable to both skeletal and general unification grammars.</Paragraph>
    <Paragraph position="1"> Some of the constraints impose a restriction on allowable derivation trees, but provide no explicit definition of an OLP grammar. Such a definition can be understood in (at least) two manners: Definition 3.1 (OLP grammar).</Paragraph>
    <Paragraph position="2"> 1. A grammar $G$ is OLP iff for every $w \in L(G)$ every derivation tree for $w$ satisfies the OLP constraint.</Paragraph>
    <Paragraph position="3"> 2. A grammar $G$ is OLP iff for every $w \in L(G)$ there exists a derivation tree which satisfies the OLP constraint.</Paragraph>
    <Paragraph position="4"> We begin the discussion with OLP constraints for skeletal grammars. One of the first definitions was suggested by Pereira and Warren (1983). Their constraint was designed for DCGs (a skeletal unification grammar formalism which assumes an explicit context-free backbone) for guaranteeing termination of general proof procedures of definite clause sets. Rephrased in terms of skeletal grammars, the definition is as follows: Definition 3.2 (Pereira and Warren's OLP for skeletal grammars ($OLP_{PW}$)). A grammar is off-line parsable iff its context-free skeleton is not infinitely ambiguous.</Paragraph>
    <Paragraph position="5"> The context-free skeleton is obtained by ignoring all FSs of the grammar rules and considering only the categories. In Jaeger et al. (2002) we prove that the depth of every derivation tree generated by a grammar whose context-free skeleton is finitely ambiguous is bounded by the number of syntactic categories times the size of its yield; therefore the recognition problem is decidable.</Paragraph>
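    <Paragraph>The skeleton extraction and the depth bound described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical encoding in which a rule is a list of (FS, category) pairs with the head first; the names `skeleton` and `depth_bound` are illustrative and do not come from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: a rule is a list of extended categories,
# each a (feature_structure, category) pair; the first element is the head.
ExtCat = Tuple[Dict[str, object], str]
Rule = List[ExtCat]


def skeleton(rules: List[Rule]) -> List[Tuple[str, Tuple[str, ...]]]:
    """Drop every FS and keep only the categories of each rule."""
    cf_rules = []
    for rule in rules:
        head_cat = rule[0][1]
        body_cats = tuple(cat for _fs, cat in rule[1:])
        cf_rules.append((head_cat, body_cats))
    return cf_rules


def depth_bound(num_categories: int, yield_size: int) -> int:
    """Bound quoted in the text (number of categories times yield size).

    It is only meaningful when the skeleton is finitely ambiguous.
    """
    return num_categories * yield_size


# Toy grammar, invented for illustration.
rules = [
    [({"NUM": "sg"}, "S"), ({"NUM": "sg"}, "NP"), ({"NUM": "sg"}, "VP")],
    [({"NUM": "sg"}, "NP"), ({}, "N")],
]
cf = skeleton(rules)
categories = {c for head, body in cf for c in (head, *body)}
print(cf)
print(depth_bound(len(categories), 3))
```

The sketch does not test whether the skeleton is finitely ambiguous; it only shows which information the skeleton keeps and how the bound is computed once that property is known.</Paragraph>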
    <Paragraph position="6"> Kaplan and Bresnan (1982) suggested a linguistically motivated OLP constraint which refers to valid derivations for the lexical functional grammar formalism (LFG), a skeletal grammar formalism. They impose constraints on two kinds of $\epsilon$'s, optionality and controlled $\epsilon$'s, but as these terms are not formally defined, we use a variant of their constraint, suggested by Johnson (1988, 95-97), eliminating all $\epsilon$'s of any kind.</Paragraph>
    <Paragraph position="7"> Definition 3.3 (Johnson's OLP ($OLP_J$)). A constituent structure satisfies the off-line parsability constraint iff it does not include a non-branching dominance chain in which the same category appears twice and the empty string $\epsilon$ does not appear as a lexical form annotation of any (terminal) node.</Paragraph>
    <Paragraph position="8"> This constraint bounds the depth of any OLP derivation tree by a linear function of the size of its yield, thus ensuring decidability of the recognition problem.</Paragraph>
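    <Paragraph>Johnson's constraint can be checked directly on a given constituent structure. The following is a minimal sketch, assuming a hypothetical `Node` class in which terminal nodes carry a `word` and the empty string is encoded as `""`; neither the class nor the function name comes from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    cat: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None  # set on terminal nodes; "" encodes the empty string


def satisfies_johnson_olp(tree: Node) -> bool:
    """True iff no unary chain repeats a category and no terminal is empty."""

    def visit(node: Node, chain: frozenset) -> bool:
        # `chain` holds the categories of the unary ancestors directly above.
        if node.cat in chain:
            return False
        if node.word is not None:
            return node.word != ""
        if len(node.children) == 1:
            return visit(node.children[0], chain | {node.cat})
        # A branching node ends the chain; each child starts a new one.
        return all(visit(child, frozenset()) for child in node.children)

    return visit(tree, frozenset())


good = Node("S", [Node("NP", word="dogs"), Node("VP", [Node("V", word="bark")])])
bad_chain = Node("NP", [Node("N", [Node("NP", word="dogs")])])
bad_empty = Node("S", [Node("NP", word=""), Node("VP", word="bark")])
print(satisfies_johnson_olp(good), satisfies_johnson_olp(bad_chain), satisfies_johnson_olp(bad_empty))
```

The check is per tree; whether a grammar is OLP still depends on which reading of definition 3.1 is adopted.</Paragraph>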
    <Paragraph position="9"> Johnson's definition is a restriction on allowable c-structures rather than on the grammar itself. We use definition 3.1 for $OLP_J$ grammars and refer only to its second part since it is less restrictive. The next definition is also based on Kaplan and Bresnan's constraint and also deals only with OLP derivations. OLP grammar definitions are according to definition 3.1.</Paragraph>
    <Paragraph position="10"> X-bar theory grammars (Chomsky, 1975) have a strong linguistic justification in describing natural languages. Unfortunately neither Kaplan and Bresnan's nor Johnson's constraints allow such grammars, since they do not allow derivation trees in which the same category appears twice in a non-branching dominance chain. Kuhn (1999) refers to the problem from a linguist's point of view. The purpose of his constraint was to expand the class of grammars which satisfy Kaplan and Bresnan's constraint in order to allow X-bar derivations. Again, since there exists no formal definition of the different kinds of $\epsilon$'s, we assume that $\epsilon$ does not represent a lexical item (no $\epsilon$-rules).</Paragraph>
    <Paragraph position="11"> Definition 3.4 (Kuhn's OLP ($OLP_K$)). A c-structure derivation is valid iff no category appears twice in a non-branching dominance chain with the same f-annotation.</Paragraph>
    <Paragraph position="12"> Kuhn (1999) gives some examples of X-bar theory derivation trees of German and Italian sentences which contain the same category twice in a non-branching dominance chain with a different f-annotation. Therefore they are invalid OLP derivation trees (by both Kaplan and Bresnan's and Johnson's constraints), but they satisfy Kuhn's OLP constraint. According to Kuhn (1999), "The Off-line parsability condition is a restriction on allowable c-structures excluding that for a given string, infinitely many c-structure analyses are possible". In other words, Kuhn assumes that OLP is, in fact, a condition that is intended to guarantee finite ambiguity. Kuhn's definition may allow X-bar derivations, but it does not ensure finite ambiguity. The following grammar is an LFG grammar generating c-structures in which the same category appears twice in a non-branching dominance chain only with a different f-annotation, therefore it satisfies Kuhn's definition of OLP. But the grammar is infinitely ambiguous: [...] Therefore, it is not clear whether the condition guarantees parsing termination or decidability of the recognition problem, and we exclude Kuhn's definition from further analysis.</Paragraph>
    <Paragraph position="13"> The following definitions are applicable to both skeletal and general unification grammars. The first constraint was suggested by Haas (1989). Based on the fact that not every natural unification grammar has an obvious context-free backbone, Haas suggested a constraint for guaranteeing solvability of the parsing problem which is applicable to all unification grammar formalisms.</Paragraph>
    <Paragraph position="14"> Haas' definition of a derivation tree is slightly different from the definition given above (definition 2.5). He allows derivation trees with non-terminals at their leaves; therefore a tree may represent a partial derivation.</Paragraph>
    <Paragraph position="15"> Definition 3.5 (Haas' Depth-boundedness ($DB$)). A unification grammar is depth-bounded iff for every $L > 0$ there is a $D > 0$ such that every parse tree for a sentential form of $L$ symbols has depth less than $D$.</Paragraph>
    <Paragraph position="16"> According to Haas (1989), "a depth-bounded grammar cannot build an unbounded amount of tree structure from a bounded number of symbols".</Paragraph>
    <Paragraph position="17"> Therefore, for each sentential form of length $n$ there exist a finite number of partial derivation trees, guaranteeing parsing termination.</Paragraph>
    <Paragraph position="18"> The $OLP_{PW}$ definition applies only to skeletal grammars, since general unification grammars do not necessarily yield an explicit context-free skeleton. But the definition can be extended to all unification grammar formalisms: Definition 3.6 (Finite ambiguity for unification grammars ($FA$)). A unification grammar $G$ is OLP iff for every string $w$ there exist a finite number of derivation trees.</Paragraph>
    <Paragraph position="19"> Shieber's OLP definition (Shieber, 1992, 79-82) is defined in terms of logical constraint-based grammar formalisms. His constraint is defined in logical terms, such as models and operations on models. We reformulate the definition in terms of FSs.</Paragraph>
    <Paragraph position="20"> Definition 3.7 (Shieber's OLP ($OLP_S$)). A grammar $G$ is off-line parsable iff there exists a finite-ranged function $F$ on FSs such that $F(A) \sqsubseteq A$ for all $A$ and there are no derivation trees admitted by $G$ in which a node $\langle A \rangle$ dominates a node $\langle B \rangle$, both are roots of sub-trees with an identical yield and $F(A) = F(B)$.</Paragraph>
    <Paragraph position="21"> The constraint is intended to bound the depth of every derivation tree by the range of $F$ times the size of its yield. Thus the recognition problem is decidable.</Paragraph>
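    <Paragraph>The tree condition in Shieber's definition can be illustrated on a single derivation tree and a single candidate function. The following is a minimal sketch under stated assumptions: FSs are plain dictionaries, the hypothetical `Node` class and the candidate function `by_cat` are invented for illustration, and the sketch does not search for a suitable function, which the definition quantifies over.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, List, Optional, Tuple

FS = Dict[str, object]


@dataclass
class Node:
    fs: FS
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None  # set on terminal nodes


def tree_yield(node: Node) -> Tuple[str, ...]:
    """Left-to-right sequence of words below `node`."""
    if node.word is not None:
        return (node.word,)
    return tuple(w for child in node.children for w in tree_yield(child))


def violates_condition(tree: Node, F: Callable[[FS], Hashable]) -> bool:
    """True iff some node dominates another with the same yield and F value."""

    def visit(node: Node, seen: frozenset) -> bool:
        key = (F(node.fs), tree_yield(node))
        if key in seen:
            return True
        return any(visit(child, seen | {key}) for child in node.children)

    return visit(tree, frozenset())


def by_cat(fs: FS) -> Hashable:
    # Candidate finite-ranged function: keep only the CAT value.
    return fs.get("CAT")


growing = Node({"CAT": "s", "DEPTH": 0},
               [Node({"CAT": "s", "DEPTH": 1}, [Node({"CAT": "b"}, word="b")])])
flat = Node({"CAT": "s"}, [Node({"CAT": "np"}, word="a"), Node({"CAT": "vp"}, word="b")])
print(violates_condition(growing, by_cat), violates_condition(flat, by_cat))
```

The first tree mimics a unary chain whose FSs keep growing while the projected value stays the same, which is the situation the definition rules out.</Paragraph>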
    <Paragraph position="22"> Johnson's OLP constraint is too restrictive, since it excludes all repetitive unary branching chains and $\epsilon$-rules; furthermore, it is applicable only to skeletal grammars. Therefore, Torenvliet and Trautwein (1995) have suggested a more liberal constraint, which is applicable to all unification grammar formalisms. Definition 3.8 (Honest parsability constraint ($HP$)). A grammar $G$ satisfies the Honest Parsability Constraint (HPC) iff there exists a polynomial $p$ s.t. for each $w \in L(G)$ there exists a derivation with at most $p(|w|)$ steps.</Paragraph>
    <Paragraph position="23"> The definition guarantees that for every string of the grammar's language there exists at least one derivation tree of polynomial depth (in the size of the derived string). Furthermore, the definition allows X-bar theory derivation trees, since a category may appear twice in a non-branching dominance chain as long as the depth of the tree is bounded by a polynomial function of its yield.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 OLP Analysis
</SectionTitle>
    <Paragraph position="0"> In this section we first give some grammar examples and mention their OLP properties, then compare the different variants of OLP definitions using these examples. The examples use a straightforward encoding of lists as FSs, where an empty list is denoted by $\langle \rangle$, and $\langle head \mid tail \rangle$ represents a list whose first item is $head$, followed by $tail$. Figure 1 lists an example unification grammar generating the language $\{b^+\}$ [...]</Paragraph>
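    <Paragraph>The list encoding mentioned above can be sketched with nested dictionaries. This is one possible rendering, assuming hypothetical feature names HEAD and TAIL and a hypothetical atom `elist` for the empty list; the paper's figures may use different names.

```python
from typing import Any, List

EMPTY = "elist"  # hypothetical atom standing for the empty list


def encode(items: List[Any]) -> Any:
    """Build the FS for a list, innermost tail first."""
    fs: Any = EMPTY
    for item in reversed(items):
        fs = {"HEAD": item, "TAIL": fs}
    return fs


def decode(fs: Any) -> List[Any]:
    """Read the items back by following TAIL until the empty list."""
    items = []
    while fs != EMPTY:
        items.append(fs["HEAD"])
        fs = fs["TAIL"]
    return items


two = encode(["t", "t"])
print(two)
print(len(decode(two)))
```

A list used as a counter, such as a depth counter, grows by one HEAD/TAIL layer per step, so an unbounded derivation produces unboundedly large FSs.</Paragraph>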
    <Paragraph position="2"> [...] occurrences of $b$ has exactly one parse tree and its depth is $2N$. Therefore, $G_1$ is $FA$ and $HP$. $G_1$ is neither $DB$ nor $OLP_S$; it may generate arbitrarily deep derivation trees (containing lists of increasing length) whose frontier consists of only one symbol, and thus there exists no finite-ranged function mapping each FS on such a derivation to a finite set of FSs.</Paragraph>
    <Paragraph position="3"> [...] generating the language $\{b^+\}$. There exist infinitely many derivation trees, of arbitrary depths, for the string $b$; therefore, $G_2$ is neither $DB$ nor $FA$ nor [...] occurrences of $b$ has exactly one parse tree. The feature DEPTH represents the current depth of the derivation tree; at each derivation step an item is added to the DEPTH list. The feature TEMP represents the number of derivation steps before generating the next $b$ symbol. Every application of the second rule doubles the length of the TEMP list (with respect to its length after the previous application of the rule). Thus the number of derivation steps for generating each $b$ is always twice the number of steps for generating its predecessor, and for every sentential form of length $N$ the depth of any partial derivation tree is bounded by an exponential function of $N$ (approximately $2^N$). Therefore $G_3$ is $FA$ and $DB$ but neither $OLP_S$ nor [...] Inter-relations among the OLP definitions. Below we compare all the given OLP definitions; such relationships were not investigated in the past. We begin by considering skeletal grammars. Johnson's condition is the only one omitting all $\epsilon$'s, thus none of the others implies $OLP_J$. $OLP_J \Rightarrow HP$: The depth of any $OLP_J$ derivation tree is bounded by a linear function of its yield, therefore for every string there exists a derivation tree of at most polynomial depth, and an $OLP_J$ grammar is $HP$.</Paragraph>
    <Paragraph position="4"> $OLP_J \nRightarrow OLP_{PW}, DB, FA, OLP_S$: The grammar of figure 2 is an $OLP_J$ grammar (viewing CAT as the category) but it does not satisfy the other constraints. $OLP_{PW} \Rightarrow DB, FA, OLP_S, HP$: By Jaeger et al. (2002), the depth of any derivation tree (partial/non-partial) admitted by an $OLP_{PW}$ grammar is bounded by a linear function of the size of its yield, thus an $OLP_{PW}$ grammar satisfies all the other constraints. A grammar satisfying the constraints may still have an infinitely ambiguous context-free backbone.</Paragraph>
    <Paragraph position="5"> We continue the analysis by comparing the definitions which are applicable to general unification grammars.</Paragraph>
    <Paragraph position="6"> $DB \Rightarrow FA$: A $DB$ grammar is also $FA$; it can only generate derivation trees whose depth is bounded by a function of their yield, and there exist only a finite number of derivation trees up to a certain depth. By figure 1, an $FA$ grammar is not necessarily $DB$.</Paragraph>
    <Paragraph position="7"> $DB$ versus $OLP_S$: None of the conditions implies the other. The grammar of figure 3 is $DB$ but not $OLP_S$. A grammar whose language consists of only one word, whose derivation is of constant depth, may still contain a redundant rule generating arbitrarily deep trees whose frontier is of length $1$.</Paragraph>
    <Paragraph position="8"> Thus it is $OLP_S$ but not $DB$.</Paragraph>
    <Paragraph position="9"> $DB$, $FA$ versus $HP$: $DB$ means that the depth of every derivation tree is bounded by some function of its yield. $HP$ means that for every string there exists at least one derivation tree whose depth is polynomial in the size of its yield.</Paragraph>
    <Paragraph position="10"> The grammar of figure 3 is $DB$ and $FA$, but since every derivation tree's depth is exponential in the size of its yield, it is not $HP$. The grammar of figure 2 is $HP$, but since it is infinitely ambiguous, it is neither $FA$ nor $DB$.</Paragraph>
    <Paragraph position="11"> $FA, HP \Leftarrow OLP_S$: The depth of any derivation tree admitted by an $OLP_S$ grammar is bounded by a linear function of its yield. Thus an $OLP_S$ grammar is $FA$ and $HP$. By figure 1, an $FA$ and $HP$ grammar is not necessarily $OLP_S$.</Paragraph>
    <Paragraph position="12"> Figure 4 depicts the inter-relations hierarchy diagram of the OLP definitions, separated for skeletal and general unification grammars. The arrows represent the implications discussed above.</Paragraph>
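    <Paragraph>The implications stated in this section can be collected in a small directed graph and queried by reachability. The sketch below is an illustration only: the abbreviations OLP_J (Johnson), OLP_PW (Pereira and Warren), OLP_S (Shieber), DB (depth-boundedness), FA (finite ambiguity) and HP (honest parsability) follow the definitions above, while the dictionary `IMPLIES` and the function `implied_by` are hypothetical names.

```python
from typing import Dict, Set

# Direct implications as stated in the prose of this section.
IMPLIES: Dict[str, Set[str]] = {
    "OLP_J": {"HP"},
    "OLP_PW": {"DB", "FA", "OLP_S", "HP"},
    "DB": {"FA"},
    "OLP_S": {"FA", "HP"},
    "FA": set(),
    "HP": set(),
}


def implied_by(start: str) -> Set[str]:
    """All constraints reachable from `start` by following implications."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        current = stack.pop()
        for nxt in IMPLIES[current]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


print(sorted(implied_by("OLP_PW")))
print("DB" in implied_by("OLP_S"))
```

Absence of an edge in this graph only records that no implication is claimed; the counterexample grammars in the text are what establish the non-implications.</Paragraph>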
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Undecidability proofs
</SectionTitle>
    <Paragraph position="0"> For the definitions which are applicable only to skeletal grammars it is easy to verify whether a grammar satisfies the constraint. The definitions that apply to arbitrary unification grammars are harder to check. In this section we give sketches of proofs of undecidability of three of the OLP definitions: Finite Ambiguity ($FA$), Depth-Boundedness ($DB$) and Shieber's OLP ($OLP_S$). Theorem 1. Finite ambiguity is undecidable.</Paragraph>
    <Paragraph position="1"> Proof sketch. In order to show that finite ambiguity is undecidable, we use a reduction from the membership problem, which is known to be undecidable (Johnson, 1988). We assume that there exists an algorithm, $A_{fa}$, for deciding $FA$ and show how it can be used to decide whether $w \in L(G)$. Given a string $w$ and a grammar $G$, construct $G'$ by adding the rule $A^s \to A^s$ to $G$'s set of rules. Apply $A_{fa}$ to $G'$; $G'$ is $FA$ on $w$ iff $w \notin L(G)$. If $w \in L(G)$ then $w \in L(G')$, therefore by applying the rule $A^s \to A^s$ arbitrarily many times, there exist an infinite number of derivation trees for $w$ admitted by $G'$. If $w \notin L(G)$ then $w \notin L(G')$, no application of the additional rule would generate any derivation tree for $w$, and $G'$ is finitely ambiguous.</Paragraph>
    <Paragraph position="2"> Since the membership problem is undecidable, it is undecidable whether there exist only a finite number of derivation trees for a string $w$ admitted by $G$. Hence finite ambiguity is undecidable.</Paragraph>
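    <Paragraph>The shape of the reduction in the proof of Theorem 1 can be sketched in code. This is an illustration under strong assumptions: FSs are abstracted to plain symbol names, the decider passed in as `decides_fa_on` is hypothetical (the theorem shows that no such algorithm exists in general), and `toy_fa_oracle` is a stand-in that is valid only for the toy grammar shown.

```python
from typing import Callable, List, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (head, body); FSs abstracted to names
Grammar = Tuple[List[Rule], str]    # (rules, start symbol)


def add_start_loop(grammar: Grammar) -> Grammar:
    """Build G' by adding the rule start -> start."""
    rules, start = grammar
    return (rules + [(start, (start,))], start)


def decide_membership(w: str, grammar: Grammar,
                      decides_fa_on: Callable[[Grammar, str], bool]) -> bool:
    """w is in L(G) iff G' is not finitely ambiguous on w."""
    return not decides_fa_on(add_start_loop(grammar), w)


def toy_fa_oracle(grammar: Grammar, w: str) -> bool:
    # Stand-in for the hypothetical decider: correct only for toy grammars
    # whose other rules all have the form start -> word.
    rules, start = grammar
    has_loop = (start, (start,)) in rules
    derivable = (start, (w,)) in rules
    return not (has_loop and derivable)


g: Grammar = ([("S", ("b",))], "S")
print(decide_membership("b", g, toy_fa_oracle))
print(decide_membership("c", g, toy_fa_oracle))
```

The point of the sketch is the wiring: any decider for finite ambiguity would immediately yield a decider for membership, which contradicts the undecidability of membership.</Paragraph>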
    <Paragraph position="3"> Theorem 2. Depth-boundedness is undecidable.</Paragraph>
    <Paragraph position="4"> Proof sketch. In order to prove undecidability of depth-boundedness, we use a reduction from the Turing machines halting problem, which is known to be undecidable (Hopcroft and Ullman, 1979, 183-185). We assume that there exists an algorithm, $A_{db}$, for deciding $DB$ and show how it can be used to decide whether a Turing machine $M$ terminates on the empty input $\epsilon$.</Paragraph>
    <Paragraph position="5"> Johnson (1988) suggested a transformation from the Turing machines halting problem to unification grammars. The transformation generates a grammar, $G_M$, which consists of unit-rules only, and can generate at most one complete derivation tree. Assume the existence of an algorithm $A_{db}$. Apply $A_{db}$ to $G_M$; if $G_M$ is $DB$ then the grammar generates a complete derivation tree, therefore its language is non-empty and $M$ terminates on the empty input. Otherwise, $L(G_M) = \emptyset$ and $M$ does not terminate on the empty input. Thus, we have decided the Turing machines halting problem.</Paragraph>
    <Paragraph position="6"> Theorem 3. $OLP_S$ is undecidable.</Paragraph>
    <Paragraph position="7"> Proof sketch. In order to prove undecidability of $OLP_S$, we use a combination of the undecidability proofs of $DB$ and $FA$. Given a Turing machine $M$, construct $G_M$ using Johnson's reduction, then construct $G'_M$ by adding $A^s \to A^s$ to $G_M$. Assume the existence of an algorithm $A_S$, deciding $OLP_S$. $G'_M$ is $OLP_S$ iff $M$ does not terminate on the empty input. Thus, by applying $A_S$ to $G'_M$, we have decided the Turing machines halting problem.</Paragraph>
  </Section>
</Paper>