File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/97/p97-1043_metho.xml
Size: 23,820 bytes
Last Modified: 2025-10-06 14:14:37
<?xml version="1.0" standalone="yes"?> <Paper uid="P97-1043"> <Title>The Complexity of Recognition of Linguistically Adequate Dependency Grammars</Title> <Section position="4" start_page="0" end_page="337" type="metho"> <SectionTitle> 2 Versions of Dependency Grammar </SectionTitle> <Paragraph position="0"> The growing interest in the dependency concept (which roughly corresponds to the O-roles of GB, subcategorization in HPSG, and the so-called domain of locality of TAG) again raises the issue whether non-lexical categories are necessary for linguistic analysis. After reviewing several proposals in this section, we argue in the next section that word order -- the description of which is the most prominent difference between PSGs and DGs -- can adequately be described without reference to non-lexical categories.</Paragraph> <Paragraph position="1"> Standard PSG trees are projective, i.e., no branches cross when the terminal nodes are projected onto the input string. In contrast to PSG approaches, DG requires non-projective analyses. As DGs are restricted to lexical nodes, one cannot, e.g., describe the so-called unbounded dependencies without giving up projectivity. First, the categorial approach employing partial constituents (Huck, 1988; Hepple, 1990) is not available, since there are no phrasal categories. Second, the coindexing (Haegeman, 1994) or structure-sharing (Pollard & Sag, 1994) approaches are not available, since there are no empty categories.</Paragraph> <Paragraph position="2"> Consider the extracted NP in &quot;Beans, I know John likes&quot; (cf. also to Fig.1 in Section 3). A projective tree would require &quot;Beans&quot; to be connected to either &quot;I&quot; or &quot;know&quot; - none of which is conceptually directly related to &quot;Beans&quot;. It is &quot;likes&quot; that determines syntactic fea- null tures of &quot;Beans&quot; and which provides a semantic role for it. The only connection between &quot;know&quot; and &quot;Beans&quot; is that the finite verb allows the extraction of &quot;Beans&quot;, thus defining order restrictions for the NP. This has led some DG variants to adopt a general graph structure with multiple heads instead of trees. We will refer to DGs allowing non-projective analyses as discontinuous DGs.</Paragraph> <Paragraph position="3"> Tesni~re (1959) devised a bipartite grammar theory which consists of a dependency component and a translation component (' translation' used in a technical sense denoting a change of category and grammatical function). The dependency component defines four main categories and possible dependencies between them. What is of interest here is that there is no mentioning of order in TesniSre's work. Some practitioneers of DG have allowed word order as a marker for translation, but they do not prohibit non-projective trees.</Paragraph> <Paragraph position="4"> Gaifman (1965) designed his DG entirely analogous to context-free phrase structure grammars. Each word is associated with a category, which functions like the non-terminals in CFG. He then defines the following rule format for dependency grammars: (1) X(Y,,... , Y~, ,, Y~+I,..., Y,,) This rule states that a word of category X governs words of category Y1,... , Yn which occur in the given order.</Paragraph> <Paragraph position="5"> The head (the word of category X) must occur between the i-th and the (i + 1)-th modifier. 
<Paragraph position="6"> Trees are combined through the identification of the root of one tree with a leaf of identical category of another tree. This formalization is restricted to projective trees with a completely specified order of sister nodes. As we have argued above, such a formalization cannot capture semantically motivated dependencies.</Paragraph> <Section position="1" start_page="337" end_page="337" type="sub_section"> <SectionTitle> 2.1 Current Dependency Grammars </SectionTitle> <Paragraph position="0"> Today's DGs differ considerably from Gaifman's conception, and we will very briefly sketch their various order descriptions, showing that DGs generally dissociate dominance and precedence by some mechanism. All variants share, however, the rejection of phrasal nodes (although phrasal features are sometimes allowed) and the introduction of edge labels (to distinguish different dependency relations).</Paragraph> <Paragraph position="1"> Meaning-Text Theory (Mel'čuk, 1988) assumes seven strata of representation. The rules mapping from the unordered dependency trees of surface-syntactic representations onto the annotated lexeme sequences of deep-morphological representations include global ordering rules which allow discontinuities. These rules have not yet been formally specified (Mel'čuk & Pertsov, 1987, p. 187f), but see the proposal by Rambow & Joshi (1994).</Paragraph> <Paragraph position="2"> Word Grammar (Hudson, 1990) is based on general graphs. The ordering of two linked words is specified together with their dependency relation, as in the proposition &quot;object of verb succeeds it&quot;. Extraction is analyzed by establishing an additional dependency, visitor, between the verb and the extractee, which is required to precede the verb, as in &quot;visitor of verb precedes it&quot;. Resulting inconsistencies, e.g., in the case of an extracted object, are not resolved, however.</Paragraph> <Paragraph position="3"> Lexicase (Starosta, 1988; 1992) employs complex feature structures to represent lexical and syntactic entities. Its word order description is much like that of Word Grammar (at least at some level of abstraction) and shares the inconsistency just noted.</Paragraph> <Paragraph position="4"> Dependency Unification Grammar (Hellwig, 1988) defines a tree-like data structure for the representation of syntactic analyses. Using morphosyntactic features with special interpretations, a word defines abstract positions into which its modifiers are mapped. Partial orderings and even discontinuities can thus be described by allowing a modifier to occupy a position defined by some transitive head. The approach cannot properly restrict discontinuities, however.</Paragraph> <Paragraph position="5"> Slot Grammar (McCord, 1990) employs a number of rule types, some of which are exclusively concerned with precedence. So-called head/slot and slot/slot ordering rules describe the precedence in projective trees, referring to arbitrary predicates over heads and modifiers. Extractions (i.e., discontinuities) are handled only by a mechanism built into the parser.</Paragraph> <Paragraph position="6"> This brief overview of current DG flavors shows that various mechanisms (global rules, general graphs, procedural means) are generally employed to lift the restriction to projective trees. Our own approach, presented below, improves on these proposals in that it allows a lexicalized and declarative formulation of precedence constraints.
The necessity of non-projective analyses in DG results from examples like &quot;Beans, I know John likes&quot; and from the restriction to lexical nodes, which prohibits gap-threading and other mechanisms tied to phrasal categories.</Paragraph> </Section> </Section> <Section position="5" start_page="337" end_page="339" type="metho"> <SectionTitle> 3 A Dependency Grammar with Word Order Domains </SectionTitle> <Paragraph position="0"> We now sketch a minimal DG that incorporates only word classes and word order as descriptional dimensions. The separation of dominance and precedence presented here grew out of our work on German, and retains the local flavor of dependency specification while at the same time covering arbitrary discontinuities. It is based on a (modal) logic with a model-theoretic interpretation, which is presented in more detail in (Bröker, 1997).</Paragraph> <Section position="2" start_page="338" end_page="338" type="sub_section"> <SectionTitle> 3.1 Order Specification </SectionTitle> <Paragraph position="0"> Our initial observation is that DG cannot use binary precedence constraints the way PSG does. Since DG analyses are hierarchically flatter, binary precedence constraints result in inconsistencies, as the analyses of Word Grammar and Lexicase illustrate. In PSG, on the other hand, the phrasal hierarchy separates the scopes of precedence restrictions. This effect is achieved in our approach by defining word order domains as sets of words, where precedence restrictions apply only to words within the same domain. Each word defines a sequence of order domains, into which the word and its modifiers are placed.</Paragraph> <Paragraph position="1"> Several restrictions are placed on domains. First, the domain sequence must mirror the precedence of the words included, i.e., words in an earlier domain must precede all words in a subsequent domain. Second, the order domains must be hierarchically ordered by set inclusion, i.e., be projective. Third, a domain (e.g., d1 in Fig. 1) can be constrained to contain at most one partial dependency tree.¹ We will write singleton domains as &quot;_&quot;, while other domains are represented by &quot;—&quot;. The precedence of words within domains is described by binary precedence restrictions, which must be locally satisfied in the domain with which they are associated. Considering Fig. 1 again, a precedence restriction for &quot;likes&quot; to precede its object has no effect, since the two are in different domains. The precedence constraints are formulated as a binary relation &quot;≺&quot; over dependency labels, including the special symbol &quot;self&quot; denoting the head. Discontinuities can easily be characterized, since a word may be contained in any domain of (nearly) any of its transitive heads. If a domain of its direct head contains the modifier, a continuous dependency results. If, however, a modifier is placed in a domain of some transitive head (as &quot;Beans&quot; in Fig. 1), a discontinuity occurs. Bounding effects on discontinuities are described by specifying that certain dependencies may not be crossed.² For the purpose of this paper, we need not formally introduce the bounding condition, though.</Paragraph> <Paragraph position="2"> ¹For details, cf. (Bröker, 1997). ²German data exist that cannot be captured by the (more common) bounding of discontinuities by nodes of a certain category.</Paragraph> <Paragraph position="3"> A sample domain structure is given in Fig. 1, with two domains d1 and d2 associated with the governing verb &quot;know&quot; (solid) and one domain associated with the embedded verb &quot;likes&quot; (dashed). d1 may contain only one partial dependency tree, the extracted phrase; d2 contains the rest of the sentence. Both domains are described by (2), where the domain sequence is represented by &quot;<<&quot;. d2 carries two precedence restrictions, which require that &quot;know&quot; (represented by self) follow the subject (first restriction) and precede the object (second restriction).</Paragraph> <Paragraph position="4"> (2) _{ } << —{ (subject ≺ self), (self ≺ object) }</Paragraph>
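<Paragraph> To illustrate (2), here is a small Python sketch -- ours, with invented names like domain_sequence_ok -- that models order domains as sets of string positions and checks the domain sequence condition and the local precedence restrictions for &quot;Beans, I know John likes&quot;:

# A word order domain is a set of word positions; each head defines an
# ordered sequence of such domains. Precedence restrictions apply only
# to words within the same domain.

def domain_sequence_ok(domains):
    """Words in an earlier domain must precede all words in a later one."""
    for d1, d2 in zip(domains, domains[1:]):
        if d1 and d2 and max(d1) >= min(d2):
            return False
    return True

def precedence_ok(domain, labels, restrictions):
    """Check restrictions (l1, l2), meaning l1 precedes l2, locally.
    labels maps a word position to its dependency label ('self' = head)."""
    for l1, l2 in restrictions:
        pos1 = [p for p in domain if labels.get(p) == l1]
        pos2 = [p for p in domain if labels.get(p) == l2]
        if any(p1 >= p2 for p1 in pos1 for p2 in pos2):
            return False
    return True

# "Beans(0), I(1), know(2), John(3), likes(4)": d1 holds the extracted
# phrase, d2 the rest; both domains are defined by "know". Note that
# "likes"(4) and its object "Beans"(0) share no domain, so no
# restriction between them is ever checked.
d1, d2 = {0}, {1, 2, 3, 4}
labels = {1: "subject", 2: "self", 4: "object"}
print(domain_sequence_ok([d1, d2]))                     # True
print(precedence_ok(d2, labels,
                    [("subject", "self"), ("self", "object")]))  # True
</Paragraph>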
</Section> <Section position="3" start_page="338" end_page="339" type="sub_section"> <SectionTitle> 3.2 Formal Description </SectionTitle> <Paragraph position="0"> The following notation is used in the proof. A lexicon Lex maps words from an alphabet Σ to word classes, which in turn are associated with valencies and domain sequences. The set C of word classes is hierarchically ordered by a subclass relation

(3) isa_C ⊆ C × C

A word w of class c inherits the valencies (and the domain sequence) from c, which are accessed by

(4) w.valencies

A valency (b, d, c) describes a possible dependency relation by specifying a flag b indicating whether the dependency may be discontinuous, the dependency name d (a symbol), and the word class c ∈ C of the modifier. A word h may govern a word m in dependency d if h defines a valency (b, d, c) such that (m isa_C c) holds and m can consistently be inserted into a domain of h (for b = −) or into a domain of a transitive head of h (for b = +). This condition is written as

(5) governs(h, d, m)

A DG is thus characterized by

(6) G = (Lex, C, isa_C, Σ)

The language L(G) includes any sequence of words for which a dependency tree can be constructed such that for each word h governing a word m in dependency d, governs(h, d, m) holds. The modifier of h in dependency d is accessed by

(7) h.mod(d)</Paragraph>
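<Paragraph> The definitions (3)-(7) translate almost directly into code. The following Python sketch is our reading of them -- the class names, the insertable_in_domain_of predicate, and the simplified treatment of domains are assumptions, not the paper's implementation:

from dataclasses import dataclass, field

@dataclass
class WordClass:
    name: str
    parent: "WordClass | None" = None              # isa_C hierarchy
    valencies: list = field(default_factory=list)  # entries (b, d, c)

    def isa(self, other):
        cls = self
        while cls is not None:
            if cls is other:
                return True
            cls = cls.parent
        return False

@dataclass
class Word:
    form: str
    wclass: WordClass
    head: "Word | None" = None
    mods: dict = field(default_factory=dict)  # dependency name, cf. (7)

def transitive_heads(w):
    # The word itself and all its transitive heads.
    while w is not None:
        yield w
        w = w.head

def governs(h, d, m, insertable_in_domain_of):
    """(5): h has a valency (b, d, c) with (m isa_C c), and m fits a domain
    of h (b == '-') or of a transitive head of h (b == '+'). The domain
    consistency test is abstracted into the supplied predicate."""
    for b, name, c in h.wclass.valencies:
        if name == d and m.wclass.isa(c):
            scope = [h] if b == "-" else transitive_heads(h)
            if any(insertable_in_domain_of(g, m) for g in scope):
                return True
    return False

# Minimal usage: a verb class with a continuous "subject" valency.
N = WordClass("N"); V = WordClass("V", valencies=[("-", "subject", N)])
john, likes = Word("John", N), Word("likes", V)
print(governs(likes, "subject", john, lambda g, m: True))  # True
</Paragraph>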
</Section> </Section> <Section position="5" start_page="339" end_page="340" type="metho"> <SectionTitle> 4 The Complexity of DG Recognition </SectionTitle> <Paragraph position="0"> Lombardo & Lesmo (1996, p. 728) convey their hope that increasing the flexibility of their conception of DG will &quot;... imply the restructuring of some parts of the recognizer, with a plausible increment of the complexity&quot;. We will show that adding a little (linguistically required) flexibility might well render recognition NP-complete. To prove this, we will encode the vertex cover problem, which is known to be NP-complete, in a DG.</Paragraph> <Section position="1" start_page="339" end_page="339" type="sub_section"> <SectionTitle> 4.1 Encoding the Vertex Cover Problem in Discontinuous DG </SectionTitle> <Paragraph position="0"> A vertex cover of a finite graph is a subset of its vertices such that (at least) one end point of every edge is a member of that set. The vertex cover problem is to decide whether for a given graph there exists a vertex cover with at most k elements. The problem is known to be NP-complete (Garey & Johnson, 1983, pp. 53-56). Fig. 2 gives a simple example, where {c, d} is a vertex cover.</Paragraph> <Paragraph position="1"> A straightforward encoding of a solution in the DG formalism introduced in Section 3 defines a root word s of class S with k valencies for words of class O. O has |V| subclasses denoting the nodes of the graph. An edge is represented by two linked words (one for each end point), with the governing word corresponding to the node included in the vertex cover. The subordinated word is assigned the class R, while the governing word is assigned the subclass of O denoting the node it represents. The latter word classes define a valency for a word of class R (for the other end point) and a possibly discontinuous valency for another word of the identical class (representing the end point of another edge which is included in the vertex cover). This encoding is summarized in Table 1.</Paragraph> <Paragraph position="2"> The input string contains an initial s and, for each edge, the words representing its end points, e.g., &quot;saccdadbdcb&quot; for our example. If the grammar allows the construction of a complete dependency tree (cf. Fig. 3 for one solution), this encodes a solution of the vertex cover problem.</Paragraph>
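<Paragraph> To make the encoding concrete, here is a small Python sketch (ours, not the paper's) that builds the input string from the edge list we read off &quot;saccdadbdcb&quot; for Fig. 2, and verifies by brute force that {c, d} is the only vertex cover of size k = 2. Which end point of an edge governs the other (i.e., which one enters the cover) is decided by the parser, not by the string:

from itertools import combinations

edges = [("a", "c"), ("c", "d"), ("a", "d"), ("b", "d"), ("c", "b")]

def vc_input_string(edges):
    # Initial s, then the two end-point words of every edge.
    return "s" + "".join(u + v for u, v in edges)

print(vc_input_string(edges))  # saccdadbdcb

vertices = {"a", "b", "c", "d"}
covers = [set(c) for c in combinations(vertices, 2)
          if all(u in c or v in c for u, v in edges)]
print(covers)  # [{'c', 'd'}] (set order may vary)
</Paragraph>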
</Section> <Section position="2" start_page="339" end_page="340" type="sub_section"> <SectionTitle> 4.2 Formal Proof using Continuous DG </SectionTitle> <Paragraph position="0"> The encoding outlined above uses non-projective trees, i.e., crossing dependencies. In anticipation of counter-arguments such as that the presented dependency grammar is just too powerful, we will present the proof using only one feature supplied by most DG formalisms, namely the free order of modifiers with respect to their head. Thus, modifiers must be inserted into an order domain of their head (i.e., no + mark in valencies). This version of the proof uses a slightly more complicated encoding of the vertex cover problem and resembles the proof by Barton (1985).</Paragraph> <Paragraph position="1"> Definition 1 (Measure) Let ||·|| be a measure for the encoded input length of a computational problem. We require that if S is a set or string and k ∈ ℕ, then |S| ≥ k implies ||S|| ≥ ||k||, and that ||(..., x, ...)|| ≥ ||x|| holds for any tuple. ◁</Paragraph> <Paragraph position="2"> Definition 2 (Vertex Cover Problem) A possible instance of the vertex cover problem is a triple (V, E, k), where (V, E) is a finite graph and |V| > k ∈ ℕ. The vertex cover problem is the set VC of all instances (V, E, k) for which there exist a subset V′ ⊆ V and a function f : E → V′ such that |V′| ≤ k and ∀(vm, vn) ∈ E : f((vm, vn)) ∈ {vm, vn}. ◁</Paragraph> <Paragraph position="3"> Definition 3 (DG Recognition Problem) A possible instance of the DG recognition problem is a tuple (G, σ), where G = (Lex, C, isa_C, Σ) is a dependency grammar as defined in Section 3 and σ ∈ Σ⁺. The DG recognition problem DGR consists of all instances (G, σ) such that σ ∈ L(G). ◁</Paragraph> <Paragraph position="4"> For an algorithm to decide the VC problem, consider a data structure representing the vertices of the graph (e.g., a set). We separate the elements of this data structure into the (maximal) vertex cover set and its complement set. Hence, one end point of every edge is assigned to the vertex cover (i.e., it is marked). Since (at most) all |E| edges might share a common vertex, the data structure has to be a multiset which contains |E| copies of each vertex. Thus, marking the |V| − k complement vertices actually requires marking |V| − k times |E| identical vertices. This will leave (k − 1) · |E| unmarked vertices in the input structure. To realize this algorithm through recognition of a dependency grammar, the marking process will be encoded as the filling of appropriate valencies of a word s by the words representing the vertices. Before we prove that this encoding can be generated in polynomial time, we show that:</Paragraph> <Paragraph position="5"> Lemma 1 The DG recognition problem is in the complexity class NP. □</Paragraph> <Paragraph position="6"> Let G = (Lex, C, isa_C, Σ) and σ ∈ Σ⁺. We give a nondeterministic algorithm for deciding whether σ = (s1 ... sn) is in L(G). Let H be an empty set initially:

1. Repeat until |H| = |σ|:
(a) i. For every si ∈ σ, choose a lexicon entry ci ∈ Lex(si).
    ii. From the ci, choose one word as the head h0.
    iii. Let H := {h0} and M := {ci | i ∈ [1, |σ|]} \ H.
(b) Repeat until M = ∅:
    i. Choose a head h ∈ H, a valency (b, d, c) ∈ h.valencies, and a modifier m ∈ M.
    ii. If governs(h, d, m) holds, then establish the dependency relation between h and m, and add m to the set H.
    iii. Remove m from M.

The algorithm obviously is (nondeterministically) polynomial in the length of the input. Given that (G, σ) ∈ DGR, a dependency tree covering the whole input exists, and the algorithm will be able to guess the dependents of every head correctly. If, conversely, the algorithm halts for some input (G, σ), then there necessarily is a dependency tree rooted in h0 completely covering σ. Thus, (G, σ) ∈ DGR. ∎</Paragraph>
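<Paragraph> For concreteness, the following Python sketch is a deterministic, exponential-time rendering of this nondeterministic algorithm, replacing each &quot;choose&quot; by exhaustive search; all names (recognize, attach, and the valencies and governs callbacks) are our own, and domain consistency is abstracted into the governs predicate:

from itertools import product

def recognize(words, lexicon, valencies, governs):
    # Enumerate what the nondeterministic machine would guess:
    # a reading for every word, a root, and the attachment choices.
    for readings in product(*(lexicon[w] for w in words)):
        for root in range(len(readings)):
            H = {root}
            M = set(range(len(readings))) - H
            if attach(H, M, readings, valencies, governs):
                return True
    return False

def attach(H, M, readings, valencies, governs):
    # Mirrors the inner loop: pick a head h in H, a valency of h, and a
    # modifier m in M; on success, m becomes available as a head itself.
    if not M:
        return True
    for m in list(M):
        for h in list(H):
            for val in valencies(readings[h]):
                if governs(readings[h], val, readings[m]):
                    if attach(H | {m}, M - {m}, readings, valencies, governs):
                        return True
    return False
</Paragraph>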
<Paragraph position="7"> Lemma 2 Let (V, E, k) be a possible instance of the vertex cover problem. Then a grammar G(V, E, k) and an input σ(V, E, k) can be constructed in time polynomial in ||(V, E, k)|| such that (V, E, k) ∈ VC ⟺ (G(V, E, k), σ(V, E, k)) ∈ DGR. □</Paragraph> <Paragraph position="8"> For the proof, we first define the encoding and show that it can be constructed in polynomial time. Then we proceed to show that the equivalence claim holds. The set of classes is C =def {S, R, U} ∪ {Hi | i ∈ [1, |E|]} ∪ {Ui, Vi | i ∈ [1, |V|]}. In the isa_C hierarchy, the classes Ui share the superclass U, and the classes Vi the superclass R. Valencies are defined for the classes according to Table 2. Furthermore, we define Σ =def {s} ∪ {vi | i ∈ [1, |V|]}. The lexicon Lex associates words with classes as given in Table 2. We set G(V, E, k) =def (Lex, C, isa_C, Σ) and σ(V, E, k) =def s v1 ... v1 ... v|V| ... v|V|, where each vi occurs |E| times. For an example, cf. Fig. 4, which shows a dependency tree for the instance of the vertex cover problem from Fig. 2; the two dependencies u1 and u2 represent the complement of the vertex cover. It is easily seen³ that ||(G(V, E, k), σ(V, E, k))|| is polynomial in ||V||, ||E||, and k. From |E| ≥ k and Definition 1, it follows that ||(V, E, k)|| ≥ ||E|| ≥ ||k|| ≥ k. Hence, the construction of (G(V, E, k), σ(V, E, k)) can be done in worst-case time polynomial in ||(V, E, k)||.</Paragraph> <Paragraph position="9"> ³The construction requires 2 · |V| + |E| + 3 word classes and |V| + 1 terminals in at most |E| + 2 readings each. S defines |V| + k · |E| − k valencies, Ui defines |E| − 1 valencies. The length of σ is |V| · |E| + 1.</Paragraph> <Paragraph position="10"> [Figure 4: Dependency tree for the vertex cover problem from Fig. 2.]</Paragraph> <Paragraph position="11"> We next show the equivalence of the two problems. Assume (V, E, k) ∈ VC: Then there exist a subset V′ ⊆ V and a function f : E → V′ such that |V′| ≤ k and ∀(vm, vn) ∈ E : f((vm, vn)) ∈ {vm, vn}. A dependency tree for σ(V, E, k) is constructed as follows:

1. For every ei ∈ E, one word f(ei) is assigned class Hi and governed by s in valency hi.
2. For each vi ∈ V \ V′, |E| − 1 words vi are assigned class R and governed by the remaining copy of vi in reading Ui through the valencies r1 to r(|E|−1).
3. The vi in reading Ui are governed by s through the valencies uj (j ∈ [1, |V| − k]).
4. (k − 1) · |E| words remain in σ. These receive reading R and are governed by s in the valencies r′j (j ∈ [1, (k − 1) · |E|]).

The dependency tree rooted in s covers the whole input σ(V, E, k). Since G(V, E, k) does not impose any further restrictions, this implies σ(V, E, k) ∈ L(G(V, E, k)) and, thus, (G(V, E, k), σ(V, E, k)) ∈ DGR.</Paragraph> <Paragraph position="12"> Conversely, assume (G(V, E, k), σ(V, E, k)) ∈ DGR: Then σ(V, E, k) ∈ L(G(V, E, k)) holds, i.e., there exists a dependency tree that covers the whole input. Since s cannot be governed in any valency, it follows that s must be the root. The instance s of S has |E| valencies of class H, (k − 1) · |E| valencies of class R, and |V| − k valencies of class U, whose instances in turn have |E| − 1 valencies of class R. This sums up to |E| · |V| potential dependents, which is exactly the number of terminals in σ besides s. Thus, all valencies are actually filled. We define a subset V₀ ⊆ V by V₀ =def {v ∈ V | ∃i ∈ [1, |V| − k] : v = s.mod(ui)}. The dependents of s in the valencies hi are from the set V \ V₀. We define a function f : E → V \ V₀ by f(ei) =def s.mod(hi) for all ei ∈ E. By construction, f(ei) is an end point of edge ei, i.e., ∀(vm, vn) ∈ E : f((vm, vn)) ∈ {vm, vn}. We define a subset V′ ⊆ V by V′ =def {f(e) | e ∈ E}. Since V′ ⊆ V \ V₀ and |V \ V₀| = |V| − (|V| − k) = k, we have |V′| ≤ k. Together, these properties yield (V, E, k) ∈ VC. ∎</Paragraph> <Paragraph position="13"> Theorem 3 The DG recognition problem is NP-complete. □ The NP-completeness of the DG recognition problem follows directly from Lemmata 1 and 2. ∎</Paragraph>
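<Paragraph position="14"> As a quick arithmetic check of the counting in the converse direction -- our own verification, not part of the paper's proof -- instantiated with the Fig. 2 sizes |V| = 4, |E| = 5, k = 2:

V, E, k = 4, 5, 2  # sizes of the Fig. 2 instance

# s has |E| valencies of class H, (k-1)*|E| of class R, and |V|-k of
# class U; each U word contributes itself plus |E|-1 R valencies.
dependents = E + (k - 1) * E + (V - k) * (1 + (E - 1))
print(dependents == V * E)  # True: |E|*|V| dependents besides s
print(V * E + 1)            # 21, the length of the input string sigma
</Paragraph> </Section> </Section> </Paper>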