<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2066"> <Title>Mildly Non-Projective Dependency Structures</Title> <Section position="4" start_page="507" end_page="508" type="metho"> <SectionTitle> 2 Dependency graphs </SectionTitle> <Paragraph position="0"> For the purposes of this paper, a dependency graph is a directed graph on the set of indices corresponding to the tokens of a sentence. We write [n] to refer to the set of positive integers up to and including n.</Paragraph> <Paragraph position="1"> Definition 1 A dependency graph for a sentence x = w1 … wn is a directed graph G = (V, E), where V = [n] and E ⊆ V × V.</Paragraph> <Paragraph position="2"> Throughout this paper, we use standard terminology and notation from graph theory to talk about dependency graphs. In particular, we refer to the elements of the set V as nodes, and to the elements of the set E as edges. We write i → j to mean that there is an edge from the node i to the node j (i.e., (i, j) ∈ E), and i →* j to mean that the node i dominates the node j, i.e., that there is a (possibly empty) path from i to j. For a given node i, the set of nodes dominated by i is the yield of i. We use the notation π(i) to refer to the projection of i: the yield of i, arranged in ascending order.</Paragraph> <Section position="1" start_page="507" end_page="507" type="sub_section"> <SectionTitle> 2.1 Dependency forests </SectionTitle> <Paragraph position="0"> Most of the literature on dependency grammar and dependency parsing does not allow arbitrary dependency graphs, but imposes certain structural constraints on them. In this paper, we restrict ourselves to dependency graphs that form forests.</Paragraph> <Paragraph position="1"> Definition 2 A dependency forest is a dependency graph with two additional properties: 1. it is acyclic (i.e., if i → j, then not j →* i); 2. each of its nodes has at most one incoming edge (i.e., if i → j, then there is no node k such that k ≠ i and k → 
j).</Paragraph> <Paragraph position="2"> Nodes in a forest without an incoming edge are called roots. A dependency forest with exactly one root is a dependency tree.</Paragraph> <Paragraph position="3"> Figure 1 shows a dependency forest taken from PDT. It has two roots: node 2 (corresponding to the complementizer proto) and node 8 (corresponding to the final punctuation mark).</Paragraph> <Paragraph position="4"> [Figure 1: a dependency forest from the Prague Dependency Treebank.] Some authors extend dependency forests by a special root node with position 0, and add an edge (0, i) for every root node i of the remaining graph (McDonald et al., 2005). This ensures that the extended graph is always a tree. Although such a definition can be useful, we do not follow it here, since it obscures the distinction between projectivity and planarity to be discussed in section 3.</Paragraph> </Section> <Section position="2" start_page="507" end_page="508" type="sub_section"> <SectionTitle> 2.2 Projectivity </SectionTitle> <Paragraph position="0"> In contrast to acyclicity and the indegree constraint, both of which impose restrictions on the dependency relation as such, the projectivity constraint concerns the interaction between the dependency relation and the positions of the nodes in the sentence: it says that the nodes in a subtree of a dependency graph must form an interval, where an interval (with endpoints i and j) is the set [i, j] := {k ∈ V | i ≤ k and k ≤ j}. Definition 3 A dependency graph is projective, if the yields of its nodes are intervals.</Paragraph> <Paragraph position="1"> Since projectivity requires each node to dominate a continuous substring of the sentence, it corresponds to a ban on discontinuous constituents in phrase structure representations.</Paragraph> <Paragraph position="2"> Projectivity is an interesting constraint on dependency structures both from a theoretical and a practical perspective. 
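The interval test in Definition 3 is straightforward to operationalize. The following sketch is our own illustration, not code from the paper; the edge-list representation over nodes 1..n and the function names are assumptions:

```python
def yield_of(node, edges):
    """All nodes dominated by `node` (reflexive-transitive closure of the edges)."""
    result, stack = {node}, [node]
    while stack:
        i = stack.pop()
        for h, d in edges:
            if h == i and d not in result:
                result.add(d)
                stack.append(d)
    return result

def is_projective(n, edges):
    """Definition 3: projective iff every node's yield is an interval."""
    for i in range(1, n + 1):
        y = yield_of(i, edges)
        if len(y) != max(y) - min(y) + 1:  # an interval has no 'missing' positions
            return False
    return True

# A single edge 1 -> 3 over the nodes 1..3 gives yield {1, 3} for node 1,
# which is not an interval, so that graph is non-projective.
```

This directly mirrors the definition: a yield is an interval exactly when its size equals the distance between its smallest and largest elements plus one.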
Dependency grammars that only allow projective structures are closely related to context-free grammars (Gaifman, 1965; Obrębski and Graliński, 2004); among other things, they have the same (weak) expressivity. The projectivity constraint also leads to favourable parsing complexities: chart-based parsing of projective dependency grammars can be done in cubic time (Eisner, 1996); hard-wiring projectivity into a deterministic dependency parser leads to linear-time parsing in the worst case (Nivre, 2003).</Paragraph> </Section> </Section> <Section position="5" start_page="508" end_page="509" type="metho"> <SectionTitle> 3 Relaxations of projectivity </SectionTitle> <Paragraph position="0"> While the restriction to projective analyses has a number of advantages, there is clear evidence that it cannot be maintained for real-world data (Zeman, 2004; Nivre, 2006). For example, the graph in Figure 1 is non-projective: the yield of the node 1 (marked by the dashed rectangles) does not form an interval, because the node 2 is 'missing'. In this section, we present several proposals for structural constraints that relax projectivity, and relate them to each other.</Paragraph> <Section position="1" start_page="508" end_page="508" type="sub_section"> <SectionTitle> 3.1 Planarity and multiplanarity </SectionTitle> <Paragraph position="0"> The notion of planarity appears in work on Link Grammar (Sleator and Temperley, 1993), where it is traced back to Mel'čuk (1988). Informally, a dependency graph is planar, if its edges can be drawn above the sentence without crossing. 
We emphasize the word above, because planarity as it is understood here does not coincide with the standard graph-theoretic concept of the same name, where one would be allowed to also use the area below the sentence to disentangle the edges.</Paragraph> <Paragraph position="1"> Figure 2a shows a dependency graph that is planar but not projective: while there are no crossing edges, the yield of the node 1 (the set {1, 3}) does not form an interval.</Paragraph> <Paragraph position="2"> Using the notation linked(i, j) as an abbreviation for the statement 'there is an edge from i to j, or vice versa', we formalize planarity as follows: Definition 4 A dependency graph is planar, if it does not contain nodes a, b, c, d such that linked(a, c) ∧ linked(b, d) ∧ a < b < c < d. Yli-Jyrä (2003) proposes multiplanarity as a generalization of planarity suitable for modelling dependency analyses, and evaluates it experimentally using data from DDT.</Paragraph> <Paragraph position="3"> Definition 5 A dependency graph G = (V, E) is m-planar, if it can be split into m planar graphs</Paragraph> <Paragraph position="5"> G1 = (V, E1), …, Gm = (V, Em) such that E = E1 ⊎ ⋯ ⊎ Em. The planar graphs Gi are called planes.</Paragraph> <Paragraph position="6"> As an example of a dependency forest that is 2-planar but not planar, consider the graph depicted in Figure 2b. In this graph, the edges (1, 4) and (3, 5) are crossing. Moving either edge to a separate graph partitions the original graph into two planes.</Paragraph> </Section> <Section position="2" start_page="508" end_page="509" type="sub_section"> <SectionTitle> 3.2 Gap degree and well-nestedness </SectionTitle> <Paragraph position="0"> Bodirsky et al. 
(2005) present two structural constraints on dependency graphs that characterize analyses corresponding to derivations in Tree Adjoining Grammar: the gap degree restriction and the well-nestedness constraint.</Paragraph> <Paragraph position="1"> A gap is a discontinuity in the projection of a node in a dependency graph (Plátek et al., 2001).</Paragraph> <Paragraph position="2"> More precisely, let πi be the projection of the node i. Then a gap is a pair (jk, jk+1) of nodes adjacent in πi such that jk+1 − jk > 1.</Paragraph> <Paragraph position="3"> Definition 6 The gap degree of a node i in a dependency graph, gd(i), is the number of gaps in πi. As an example, consider the node labelled i in the dependency graphs in Figure 3. In Graph 3a, the projection of i is an interval, (2, 3, 4), so i has gap degree 0. In Graph 3b, πi = (2, 3, 6) contains a single gap, (3, 6), so the gap degree of i is 1. In the rightmost graph, the gap degree of i is 2, since πi = (2, 4, 6) contains two gaps, (2, 4) and (4, 6).</Paragraph> <Paragraph position="4"> Definition 7 The gap degree of a dependency graph G, gd(G), is the maximum among the gap degrees of its nodes.</Paragraph> <Paragraph position="5"> Thus, the gap degree of the graphs in Figure 3 is 0, 1 and 2, respectively, since the node i has the maximum gap degree in all three cases.</Paragraph> <Paragraph position="6"> The well-nestedness constraint restricts the positioning of disjoint subtrees in a dependency forest. Two subtrees are called disjoint, if neither of their roots dominates the other.</Paragraph> <Paragraph position="7"> Definition 8 Two subtrees T1, T2 interleave, if there are nodes l1, r1 ∈ T1 and l2, r2 ∈ T2 such that l1 < l2 < r1 < r2. A dependency graph is well-nested, if no two of its disjoint subtrees interleave.</Paragraph> <Paragraph position="8"> Both Graph 3a and Graph 3b are well-nested.</Paragraph> <Paragraph position="9"> Graph 3c is not well-nested. 
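The two constraints just defined can be checked directly from a graph's edges. The sketch below is our own reconstruction under the same assumed edge-list representation; the small example graphs are illustrative, not the paper's Figure 3:

```python
def projection(node, edges):
    """pi(i): the yield of `node`, arranged in ascending order."""
    result, stack = {node}, [node]
    while stack:
        i = stack.pop()
        for h, d in edges:
            if h == i and d not in result:
                result.add(d)
                stack.append(d)
    return sorted(result)

def gap_degree(node, edges):
    """gd(i): the number of gaps (adjacent projection members > 1 apart)."""
    p = projection(node, edges)
    return sum(1 for a, b in zip(p, p[1:]) if b - a > 1)

def is_well_nested(n, edges):
    """Definition 8: no two disjoint subtrees may interleave."""
    yields = {i: set(projection(i, edges)) for i in range(1, n + 1)}
    for t1 in range(1, n + 1):
        for t2 in range(1, n + 1):
            if t1 in yields[t2] or t2 in yields[t1]:
                continue  # subtrees are not disjoint
            if any(l1 < l2 < r1 < r2
                   for l1 in yields[t1] for r1 in yields[t1]
                   for l2 in yields[t2] for r2 in yields[t2]):
                return False
    return True

# Mirroring the Graph 3b discussion: a node whose projection is (2, 3, 6)
# has exactly one gap, hence gap degree 1.
```

The brute-force quadruple scan in `is_well_nested` is written for clarity, not efficiency; it checks every pair of disjoint subtrees against the interleaving pattern l1 < l2 < r1 < r2.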
To see this, let T1 be the subtree rooted at the node labelled i, and let T2 be the subtree rooted at j. These subtrees interleave, as T1 contains the nodes 2 and 4, and T2 contains the nodes 3 and 5.</Paragraph> <Paragraph position="10"/> </Section> <Section position="3" start_page="509" end_page="509" type="sub_section"> <SectionTitle> 3.3 Edge degree </SectionTitle> <Paragraph position="0"> The notion of edge degree was introduced by Nivre (2006) in order to allow mildly non-projective structures while maintaining good parsing efficiency in data-driven dependency parsing. Define the span of an edge (i, j) as the interval S((i, j)) := [min(i, j), max(i, j)]. Definition 9 Let G = (V, E) be a dependency forest, let e = (i, j) be an edge in E, and let Ge be the subgraph of G that is induced by the nodes contained in the span of e.</Paragraph> <Paragraph position="1"> • The degree of an edge e ∈ E, ed(e), is the number of connected components c in Ge such that the root of c is not dominated by the head of e.</Paragraph> <Paragraph position="2"> • The edge degree of G, ed(G), is the maximum among the degrees of the edges in G.</Paragraph> <Paragraph position="3"> To illustrate the notion of edge degree, we return to Figure 3. Graph 3a has edge degree 0: the only edge that spans more nodes than its head and its dependent is (1, 5), but the root of the connected component {2, 3, 4} is dominated by 1. 
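Definition 9 can likewise be turned into a small procedure. This is our own sketch, assuming a forest given as an edge list; the test tree used below is an invented example, not one of the paper's figures:

```python
def dominated_by(head, edges):
    """Nodes dominated by `head` (including `head` itself)."""
    result, stack = {head}, [head]
    while stack:
        i = stack.pop()
        for h, d in edges:
            if h == i and d not in result:
                result.add(d)
                stack.append(d)
    return result

def edge_degree(e, edges):
    """ed(e): connected components in the subgraph induced by the span of e
    whose root is not dominated by the head of e (input assumed to be a forest)."""
    i, j = e
    span = set(range(min(i, j), max(i, j) + 1))
    sub = [(h, d) for h, d in edges if h in span and d in span]
    # connected components of the induced subgraph, ignoring edge direction
    components, seen = [], set()
    for start in sorted(span):
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            k = stack.pop()
            for h, d in sub:
                for a, b in ((h, d), (d, h)):
                    if a == k and b not in comp:
                        comp.add(b)
                        stack.append(b)
        seen |= comp
        components.append(comp)
    dom = dominated_by(i, edges)
    degree = 0
    for comp in components:
        # in a forest, each component has exactly one node with no incoming edge
        root = next(k for k in comp if all(d != k for _, d in sub))
        if root not in dom:
            degree += 1
    return degree
```

For instance, in the tree with edges 1→2, 2→3, 2→6, 1→4, 4→5, the edge (2, 6) spans the nodes 2..6; the induced subgraph splits into the components {2, 3, 6} and {4, 5}, and only the root of {4, 5} is not dominated by the head 2, so the edge has degree 1.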
Both Graph 3b and 3c have edge degree 1: the edge (3, 6) in Graph 3b and the edges (2, 4), (3, 5) and (4, 6) in Graph 3c each span a single connected component that is not dominated by the respective head.</Paragraph> </Section> </Section> <Section position="6" start_page="509" end_page="510" type="metho"> <SectionTitle> 3.4 Related work </SectionTitle> <Paragraph position="0"> Apart from proposals for structural constraints relaxing projectivity, there are dependency frameworks that in principle allow unrestricted graphs, but provide mechanisms to control the actually permitted forms of non-projectivity in the grammar.</Paragraph> <Paragraph position="1"> The non-projective dependency grammar of Kahane et al. (1998) is based on an operation on dependency trees called lifting: a 'lift' of a tree T is the new tree that is obtained when one replaces one or more edges (i, k) in T by edges (j, k), where j →* i. (Footnote 2: We use the term edge degree instead of the original simple term degree from Nivre (2006) to mark the distinction from the notion of gap degree.)</Paragraph> <Paragraph position="2"> The exact conditions under which a certain lifting may take place are specified in the rules of the grammar. A dependency tree is acceptable, if it can be lifted to form a projective graph. A similar design is pursued in Topological Dependency Grammar (Duchier and Debusmann, 2001), where a dependency analysis consists of two, mutually constraining graphs: the ID graph represents information about immediate dominance, the LP graph models the topological structure of a sentence. 
As a principle of the grammar, the LP graph is required to be a lift of the ID graph; this lifting can be constrained in the lexicon.</Paragraph> <Section position="1" start_page="509" end_page="510" type="sub_section"> <SectionTitle> 3.5 Discussion </SectionTitle> <Paragraph position="0"> The structural conditions we have presented here naturally fall into two groups: multiplanarity, gap degree and edge degree are parametric constraints with an infinite scale of possible values; planarity and well-nestedness come as binary constraints.</Paragraph> <Paragraph position="1"> We discuss these two groups in turn.</Paragraph> <Paragraph position="2"> Parametric constraints With respect to the graded constraints, we find that multiplanarity is different from both gap degree and edge degree in that it involves a notion of optimization: since every dependency graph is m-planar for some sufficiently large m (put each edge onto a separate plane), the interesting question in the context of multiplanarity is about the minimal values for m that occur in real-world data. But then, one not only needs to show that a dependency graph can be decomposed into m planar graphs, but also that this decomposition is the one with the smallest number of planes among all possible decompositions. Up to now, no tractable algorithm to find the minimal decomposition has been given, so it is not clear how to evaluate the significance of the concept as such.</Paragraph> <Paragraph position="3"> The evaluation presented by Yli-Jyrä (2003) makes use of additional constraints that are sufficient to make the decomposition unique.</Paragraph> <Paragraph position="4"> The fundamental difference between gap degree and edge degree is that the gap degree measures the number of discontinuities within a subtree, while the edge degree measures the number of intervening constituents spanned by a single edge. 
This difference is illustrated by the graphs displayed in Figure 4. Graph 4a has gap degree 2 but edge degree 1: the subtree rooted at node 2 (marked by the solid edges) has two gaps, but each of its edges only spans one connected component not dominated by 2 (marked by the squares). In contrast, Graph 4b has gap degree 1 but edge degree 2: the subtree rooted at node 2 has one gap, but this gap contains two components not dominated by 2.</Paragraph> <Paragraph position="5"> Nivre (2006) shows experimentally that limiting the permissible edge degree to 1 or 2 can reduce the average parsing time for a deterministic algorithm from quadratic to linear, while omitting less than 1% of the structures found in DDT and PDT. It can be expected that constraints on the gap degree would have very similar effects.</Paragraph> <Paragraph position="6"> Binary constraints For the two binary constraints, we find that well-nestedness subsumes planarity: a graph that contains interleaving subtrees cannot be drawn without crossing edges, so every planar graph must also be well-nested. To see that the converse does not hold, consider Graph 3b, which is well-nested, but not planar.</Paragraph> <Paragraph position="7"> Since both planarity and well-nestedness are proper extensions of projectivity, we get the following hierarchy for sets of dependency graphs: projective ⊂ planar ⊂ well-nested ⊂ unrestricted. The planarity constraint appears like a very natural one at first sight, as it expresses the intuition that 'crossing edges are bad', but still allows a limited form of non-projectivity. 
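The crossing-link condition of Definition 4 from section 3.1 also admits a direct check. A minimal sketch (our own, under the same assumed edge-list representation; edge direction is ignored, as in the definition):

```python
def is_planar(edges):
    """Definition 4: planar iff no two links (a, c), (b, d) satisfy a < b < c < d."""
    links = [tuple(sorted(e)) for e in edges]  # treat edges as undirected links
    for a, c in links:
        for b, d in links:
            if a < b < c < d:
                return False
    return True

# The crossing edges (1, 4) and (3, 5) from the Figure 2b discussion violate
# the condition; a single edge (1, 3) is planar (though not projective).
```

Note that this tests planarity in the restricted sense of the paper (drawing only above the sentence), not graph-theoretic planarity.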
However, many authors use planarity in conjunction with a special representation of the root node: either as an artificial node at the sentence boundary, as we mentioned in section 2, or as the target of an infinitely long perpendicular edge coming 'from the outside', as in earlier versions of Word Grammar (Hudson, 2003).</Paragraph> <Paragraph position="8"> In these situations, planarity reduces to projectivity, so nothing is gained.</Paragraph> <Paragraph position="9"> Even in cases where planarity is used without a special representation of the root node, it remains a peculiar concept. When we compare it with the notion of gaps, for example, we find that, in a planar dependency tree, every gap (i, j) must contain the root node r, in the sense that i < r < j: if the gap contained only non-root nodes k, then the two paths from r to k and from i to j would cross. This particular property does not seem to be mirrored in any linguistic prediction.</Paragraph> <Paragraph position="10"> In contrast to planarity, well-nestedness is independent of both gap degree and edge degree in the sense that for every d > 0, there are both well-nested and non-well-nested dependency graphs with gap degree or edge degree d. All projective dependency graphs (d = 0) are trivially well-nested.</Paragraph> <Paragraph position="11"> Well-nestedness also brings computational benefits. 
In particular, chart-based parsers for grammar formalisms in which derivations obey the well-nestedness constraint (such as Tree Adjoining Grammar) are not hampered by the 'crossing configurations' to which Satta (1992) attributes the fact that the universal recognition problem of Linear Context-Free Rewriting Systems is NP-complete.</Paragraph> </Section> </Section> <Section position="7" start_page="510" end_page="511" type="metho"> <SectionTitle> 4 Experimental evaluation </SectionTitle> <Paragraph position="0"> In this section, we present an experimental evaluation of planarity, well-nestedness, gap degree, and edge degree, by examining how large a proportion of the structures found in two dependency treebanks are allowed under different constraints.</Paragraph> <Paragraph position="1"> Assuming that the treebank structures are sampled from naturally occurring structures in natural language, this provides an indirect evaluation of the linguistic adequacy of the different proposals.</Paragraph> <Section position="1" start_page="510" end_page="511" type="sub_section"> <SectionTitle> 4.1 Experimental setup </SectionTitle> <Paragraph position="0"> The experiments are based on data from the Prague Dependency Treebank (PDT) (Hajič et al., 2001) and the Danish Dependency Treebank (DDT) (Kromann, 2003). PDT contains 1.5M words of newspaper text, annotated in three layers according to the theoretical framework of Functional Generative Description (Böhmová et al., 2003). Our experiments concern only the analytical layer, and are based on the dedicated training section of the treebank. DDT comprises 100k words of text selected from the Danish PAROLE corpus, with annotation of primary and secondary dependencies. Only primary dependencies are considered in the experiments, which are based on the entire treebank.</Paragraph> </Section> </Section> </Paper>