<?xml version="1.0" standalone="yes"?>
<Paper uid="J01-1004">
  <Title>D-Tree Substitution Grammars</Title>
  <Section position="3" start_page="91" end_page="102" type="metho">
    <SectionTitle>
2. Definition of DSG
</SectionTitle>
    <Paragraph position="0"> D-trees are the primitive elements of a DSG. D-trees are descriptions of trees, in particular, certain types of expressions in a tree description language such as that of Rogers and Vijay-Shanker (1992). In this section we define tree descriptions and substitution of tree descriptions (Section 2.1) and d-trees (Section 2.2) together with some associated terminology and the graphical representation (Section 2.3). We then define d-tree substitution grammars, along with derivations of d-tree substitution grammars (Section 2.4) and languages generated by these grammars (Section 2.5), and close with an informal discussion of path constraints (Section 2.6).</Paragraph>
    <Section position="1" start_page="91" end_page="94" type="sub_section">
      <SectionTitle>
2.1 Tree Descriptions and Substitution
</SectionTitle>
      <Paragraph position="0"> In the following, we are interested in a tree description language that provides at least the following binary predicate symbols: ◁, ◁*, and ≺. These three predicate symbols are intended to be interpreted as the immediate domination, domination, and precedence relations, respectively. That is, in a tree model, the literal x ◁ y would be interpreted as node (referred to by the variable) x immediately dominates node (referred to by) y, the literal x ◁* y would be interpreted such that x dominates y, and x ≺ y indicates that x is to the left of y. In addition to these predicate symbols, we assume there is a finite set of unary function symbols, such as label, which are to be used to describe node labeling. Finally, we assume the language includes the equality symbol.</Paragraph>
      <Paragraph position="1"> We will now introduce the notion of tree description.</Paragraph>
      <Paragraph position="2"> Definition A tree description is a finite set (conjunction) of positive literals in a tree description language.</Paragraph>
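Since a tree description is just a finite set of positive literals over a handful of relations, it can be represented directly. The following sketch is ours, not the paper's; the class and attribute names (TreeDescription, idom, dom, prec) are illustrative assumptions.

```python
# A minimal sketch of a tree description as a finite set of positive
# literals: idom (immediate domination), dom (domination), prec
# (precedence), plus a node-labeling function.

class TreeDescription:
    def __init__(self):
        self.idom = set()   # pairs (x, y): x immediately dominates y
        self.dom = set()    # pairs (x, y): x dominates y
        self.prec = set()   # pairs (x, y): x precedes y
        self.label = {}     # node -> label symbol

    def vars(self):
        # vars(d): every variable mentioned in any literal of d
        nodes = set(self.label)
        for rel in (self.idom, self.dom, self.prec):
            for x, y in rel:
                nodes.update((x, y))
        return nodes

d = TreeDescription()
d.idom.add(("x1", "x2"))
d.dom.add(("x2", "x3"))
print(sorted(d.vars()))  # ['x1', 'x2', 'x3']
```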
      <Paragraph position="3"> In order to make the presentation more readable, tree descriptions are usually presented graphically rather than as logical expressions. Figure 5 gives two tree descriptions, each presented both graphically and as a logical expression. We introduce
Rambow, Vijay-Shanker, and Weir D-Tree Substitution Grammars</Paragraph>
      <Paragraph position="5"> Figure 6. A tree description (which is also a d-tree) with three components.</Paragraph>
      <Paragraph position="6"> the conventions used in the graphical representations in more detail in Section 2.3. Note that with a functor for each feature, feature structure labels can be specified as required. Although feature structures will be used in the linguistic examples presented in Section 4, for the remainder of this section we will assume that each node is labeled with a symbol by the function label. Furthermore, we assume that these symbols come from two disjoint sets of symbols, the terminal and nonterminal labels. (Note that the examples in this section do not show labels for nodes, but rather their names, while the examples in subsequent sections show the labels.) In the following, we consider a tree description to be satisfiable if it is satisfied by a finite tree model. For our current purposes, we assume that a tree model will be defined as a finite universe (the set of nodes) and will interpret the predicate symbols ◁, ◁*, and ≺ as the immediate domination, domination, and precedence relations, respectively. For more details on the notion of satisfiability and the definition of tree models, see Backofen, Rogers, and Vijay-Shanker (1995), where the axiomatization of their theory is also discussed. 6 We use d ⊨ d′ to indicate that the description d′ logically follows from d, in other words, that d′ is known in d. 7 Given a tree description d, we say x dominates y in d if d ⊨ x ◁* y (similarly for the immediate domination and precedes relations).</Paragraph>
      <Paragraph position="7"> We use vars(d) to denote the set of variables involved in the description d. For convenience, we will also call the variables in vars(d) the nodes of description d. For a tree description d, a node x ∈ vars(d) is a frontier node of d if for all y ∈ vars(d) such that x ≠ y, it is not the case that d ⊨ x ◁* y. Only frontier nodes of the tree description can be labeled with terminals. A frontier node labeled with a nonterminal is called a substitution node.</Paragraph>
      <Paragraph position="8"> A useful notion for tree descriptions is the notion of components. Given a tree description d, consider the binary relation on vars(d) corresponding to the immediate domination relations specified in d; i.e., the relation { (x, y) | x, y ∈ vars(d), d ⊨ x ◁ y }. The transitive, symmetric, reflexive closure of this relation partitions vars(d) into equivalence classes that we call components. For example, the nodes in the tree description in Figure 6 fall into the three components: { x1, x2, x3, x4, x5 }, { y1, y2, y3, y4, y5 }, and { z1, z2, z3, z4, z5 }. In particular, note that y4 and z1 (likewise x3 and z2) are not</Paragraph>
      <Paragraph position="10"> Figure 7. Result of substitution by tree description root.</Paragraph>
      <Paragraph position="11"> in the same component of the description. This is because the reflexive, symmetric, and transitive closure of the immediate domination relation known in the description will not include these pairs of nodes.</Paragraph>
      <Paragraph position="12"> We say that x is the root of a component if it dominates every node in its component, and we say that x is on the frontier of a component if the only node in its component that it dominates is itself. Note that x can be on the frontier of a component of d without being a frontier node of a tree description. For example, in Figure 6, x3 is a frontier of a component but not a frontier of the tree description. In contrast, z3 is both a frontier of a component as well as a frontier of the tree description. We say that x is the root of a tree description if it dominates every node in the tree description. Note that it need not be the case that every tree description has a root. For example, according to the definition of tree descriptions, the description in Figure 6 is a tree description and does not have a root. Although we know that either xl or Yl dominates all nodes in a tree model of the tree description, we don't know which.</Paragraph>
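The components of a description can be computed as the connected components of the undirected graph induced by the known immediate-domination pairs. A sketch follows; the helper function and the particular edge set standing in for Figure 6 are our assumptions, not the paper's.

```python
# Components of a tree description: connected components of the graph
# induced by the known immediate-domination pairs (the transitive,
# symmetric, reflexive closure partitions the variables).
from collections import defaultdict

def components(nodes, idom_pairs):
    adj = defaultdict(set)
    for x, y in idom_pairs:
        adj[x].add(y)
        adj[y].add(x)
    seen, comps = set(), []
    for n in sorted(nodes):
        if n in seen:
            continue
        comp, stack = set(), [n]
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps

# A stand-in for the Figure 6 description: three five-node components
# linked only by domination (not immediate-domination) statements.
nodes = {p + str(i) for p in "xyz" for i in range(1, 6)}
idom = {("x1", "x2"), ("x1", "x4"), ("x2", "x3"), ("x4", "x5"),
        ("y1", "y2"), ("y1", "y4"), ("y2", "y3"), ("y4", "y5"),
        ("z1", "z2"), ("z1", "z4"), ("z2", "z3"), ("z4", "z5")}
print(len(components(nodes, idom)))  # 3
```

Note that y4 and z1 land in different components, since only domination (not immediate domination) is known between them.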
      <Paragraph position="13"> We can now define the substitution operation on tree descriptions that will be used in DSG. We use d1[y/x] to denote the description obtained from d1 by replacing all instances in d1 of the variable x by y.</Paragraph>
      <Paragraph position="14"> Definition Let d1 and d2 be two tree descriptions. Without loss of generality, we assume that vars(d1) ∩ vars(d2) = ∅. Let x ∈ vars(d1) be a root of a component of d1 and y ∈ vars(d2) be a substitution node on the frontier of d2. Let d be the description d1 ∪ d2[x/y]. We say that d is obtained from d1 and d2 by substituting x at y.</Paragraph>
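The definition above amounts to a variable renaming followed by a set union. A minimal sketch, assuming a simple dictionary-of-relations representation of descriptions (both helper names are ours):

```python
# Substitution on tree descriptions: rename the substitution node y of d2
# to the component root x of d1 (i.e., form d2[x/y]), then take the union.

def rename(desc, old, new):
    sub = lambda n: new if n == old else n
    return {rel: {(sub(a), sub(b)) for a, b in pairs}
            for rel, pairs in desc.items()}

def substitute(d1, x, d2, y):
    d2r = rename(d2, y, x)   # d2[x/y]
    return {rel: d1.get(rel, set()) | d2r.get(rel, set())
            for rel in set(d1) | set(d2r)}

d1 = {"idom": {("a", "x")}}   # x: root of a component of d1
d2 = {"idom": {("y", "b")}}   # y: substitution node on the frontier of d2
d = substitute(d1, "x", d2, "y")
print(sorted(d["idom"]))  # [('a', 'x'), ('x', 'b')]
```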
      <Paragraph position="15"> Note that in addition, we may place restrictions on the values of the labeling functions for x and y in the above definition. Typically, for a node labeling function such as label we require label(x) = label(y), and for functions that return feature structures we require unifiability (with the unification being the new value of the feature function).</Paragraph>
      <Paragraph position="17"> Figure 8. Result of substitution by component root.</Paragraph>
      <Paragraph position="18"> Figure 8 shows the result of substituting not the root of the tree description but the root z1 of a component of the tree description on the right of Figure 5 at the substitution node y3 of the tree description on the left of Figure 5.</Paragraph>
    </Section>
    <Section position="2" start_page="94" end_page="95" type="sub_section">
      <SectionTitle>
2.2 D-Trees
</SectionTitle>
      <Paragraph position="0"> D-trees are certain types of tree descriptions: not all tree descriptions are d-trees. In describing syntactic structure, we are interested in two kinds of primitive tree descriptions. The first kind of primitive tree description, which we call parent-child descriptions, involves n + 1 (n ≥ 1) variables, say x, x1, . . ., xn, and in addition to specifying categorial information associated with these variables, specifies tree structure of the form { x ◁ x1, . . ., x ◁ xn, x1 ≺ x2, . . ., xn-1 ≺ xn }. A parent-child description corresponds to a phrase structure rule in a context-free grammar, and by extension, to a phrase structure rule in X-bar theory, to the instantiation of a rule schema in HPSG, or to a c-structure rule in LFG. As in a context-free grammar, in DSG we assume the siblings x1, . . ., xn are totally ordered by precedence. 8 The second kind of primitive description, which we call a domination description, has the form { x ◁* y }, where x and y are variables. In projecting from a lexical item to obtain the elementary objects of a grammar, this underspecified domination statement allows for structures projected from other lexical items to be interspersed during the derivation process.</Paragraph>
      <Paragraph position="1"> Definition A d-tree is a satisfiable description in the smallest class of tree descriptions obtained by closing the primitive tree descriptions under the substitution operation.</Paragraph>
      <Paragraph position="2"> For example, Figure 9 shows how the d-tree in Figure 6 is produced by using six parent-child descriptions and two domination descriptions. The ovals show cases of substitution; the circle represents a case of two successive substitutions. Figure 10 shows a tree description that is not a d-tree: it is not a parent-child description, nor
8 One could, of course, relax this constraint and assume that they are only partially ordered. However, for now, we do not consider such an extension. See Section 4.4 for a discussion.</Paragraph>
      <Paragraph position="3">  Computational Linguistics Volume 27, Number 1</Paragraph>
      <Paragraph position="5"/>
      <Paragraph position="7"> Figure 10. A description that is not a d-tree.</Paragraph>
      <Paragraph position="8"> can it be derived from two domination descriptions by substitution, since substitution can only occur at the frontier nodes.</Paragraph>
      <Paragraph position="9"> A d-tree d is complete if it does not contain any substitution nodes, i.e., all the frontier nodes of the description d are labeled by terminals. Given a d-tree d, we say that a pair of nodes, x and y (variables in vars(d)), are related by an i-edge if d ⊨ x ◁ y. We say that x is an i-parent and y is an i-child. Given a d-tree d, we say that a pair of nodes, x and y, are related by a d-edge if it is known from d that x dominates y, it is not known from d that x immediately dominates y, and there is no other variable in d that is known to be between them. That is, a pair of nodes x and y, x ≠ y, are related by a d-edge if d ⊨ x ◁* y, d ⊭ x ◁ y, and for all z ∈ vars(d), if d ⊨ (x ◁* z ∧ z ◁* y) then z = x or z = y. If x and y are related by a d-edge, then we say that they are d-parent and d-child, respectively. Note that a node in a d-tree (unlike a node in a tree description) cannot be both an i-parent and a d-parent at the same time.</Paragraph>
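The d-edge condition can be checked directly from the definition. A sketch (ours, not the paper's); for this toy example we assume `dom` has already been closed under reflexivity and transitivity:

```python
# Two nodes x, y are related by a d-edge iff x is known to dominate y,
# x is not known to immediately dominate y, and no node is known to lie
# strictly between them.

def is_d_edge(x, y, idom, dom, nodes):
    if x == y or (x, y) not in dom or (x, y) in idom:
        return False
    # no z strictly between x and y
    return all(z in (x, y)
               for z in nodes
               if (x, z) in dom and (z, y) in dom)

nodes = {"a", "b", "c"}
idom = {("b", "c")}
dom = {(n, n) for n in nodes} | {("a", "b"), ("b", "c"), ("a", "c")}
print(is_d_edge("a", "b", idom, dom, nodes))  # True
print(is_d_edge("a", "c", idom, dom, nodes))  # False (b lies between)
```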
    </Section>
    <Section position="3" start_page="95" end_page="96" type="sub_section">
      <SectionTitle>
2.3 Graphical Presentation of a D-Tree
</SectionTitle>
      <Paragraph position="0"> We usually find it more convenient to present d-trees graphically. When presenting a d-tree graphically, i-edges are represented with a solid line, while d-edges are represented with a broken line. All immediate dominance relations are always represented graphically, but only the domination relations corresponding to d-edges are shown explicitly in graphical presentations.</Paragraph>
      <Paragraph position="1"> By definition of d-trees, each component of a d-tree is fully specified with respect to immediate domination. Thus, all immediate domination relations between nodes in a component are indicated by i-edges. Also, by definition, components must be fully specified with respect to precedence. That is, for any two nodes u and v within a component we must know whether u precedes v or vice versa. In fact, all precedence information derives from precedence among siblings (two nodes immediately dominated by a common node). This means that all the precedence in a description can be expressed graphically simply by using the normal left-to-right ordering among siblings.</Paragraph>
      <Paragraph position="2"> Another important restriction on d-trees has to do with how components are related to one another. As we said above, a frontier node of a component of a d-tree can be a d-parent but not an i-parent, and only frontier nodes of a component can serve as d-parents. However, by definition, a frontier node of a d-tree can neither be a d-parent nor an i-parent. Graphically, this restriction can be characterized as follows: edges specifying domination (d-edges) must connect a node on the frontier of a component with a node of another component. Furthermore, nodes on the frontier of a component can have at most one d-child.</Paragraph>
      <Paragraph position="3"> Recall that not every set of positive literals involving ◁, ◁*, and ≺ is a legal d-tree. In particular, we can show that a description is a d-tree if and only if it is logically equivalent to descriptions that, when written graphically, would have the appearance described above.</Paragraph>
    </Section>
    <Section position="4" start_page="96" end_page="96" type="sub_section">
      <SectionTitle>
2.4 D-Tree Substitution Grammars
</SectionTitle>
      <Paragraph position="0"> We can now define d-tree substitution grammars.</Paragraph>
      <Paragraph position="1"> Definition A d-tree substitution grammar (DSG) G is a 4-tuple (VT, VN, T, ds), where VT and VN are pairwise distinct terminal and nonterminal alphabets, respectively, T is a finite set of elementary d-trees such that the functor label assigns each node in each d-tree in T a label in VT U VN and such that only d-tree frontier nodes take labels in VT, and ds is a characterization of the labels that can appear at the root of a derived tree.</Paragraph>
      <Paragraph position="2"> Derivations in DSG are defined as follows. Let G = (VT, VN, T, ds) be a DSG. Furthermore:</Paragraph>
      <Paragraph position="4"> T0 = T, and for i ≥ 0, Ti+1 = Ti ∪ { d | d can be obtained from two d-trees in Ti by substitution at a node x such that label(x) ∈ VN }.</Paragraph>
      <Paragraph position="5"> The d-tree language T(G) generated by G is defined as follows.</Paragraph>
      <Paragraph position="6"> T(G) = { d ∈ Ti | i ≥ 0, d is complete } In a lexicalized DSG, there is at least one terminal node on the frontier of every d-tree; this terminal is (these terminals are) designated the anchor(s) of the d-tree. The remaining frontier nodes (of the description) and all internal nodes are labeled by nonterminals. Nonterminal nodes on the frontier of a description are called substitution nodes because these are nodes at which a substitution must occur (see below). Finally, we say that a d-tree d is sentential if d has a single component and the label of the root of d is compatible with ds.</Paragraph>
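The derivation sets T0, T1, . . . form an increasing closure under substitution. The following toy sketch of such a closure is ours, with strings and concatenation standing in for d-trees and substitution purely for illustration:

```python
# Toy sketch of the closure T0 = T, Ti+1 = Ti plus everything obtainable
# by combining two elements of Ti. Here "d-trees" are strings and the
# combining operation is concatenation, standing in for substitution.

def derive(elementary, combine, steps):
    T = set(elementary)
    for _ in range(steps):
        T |= {combine(d1, d2) for d1 in T for d2 in T}
    return T

combine = lambda a, b: a + b
T = derive({"ab"}, combine, 2)
print(sorted(T))  # ['ab', 'abab', 'ababab', 'abababab']
```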
    </Section>
    <Section position="5" start_page="96" end_page="98" type="sub_section">
      <SectionTitle>
2.5 Reading D-Trees
</SectionTitle>
      <Paragraph position="0"> A description d is a tree if and only if it has a single component (i.e., it does not have any d-edges). Therefore, the process of reading off trees from d-trees can be viewed as a nondeterministic process that involves repeatedly removing d-edges until a d-tree with a single component results.</Paragraph>
      <Paragraph position="1"> In defining the process of removing a d-edge, we require first, that no i-edges be added which are not already specified in the components, and second, that those i-edges that are distinct prior to the process of reading off remain distinct after the removal of the d-edges. This means that each removal of a d-edge results in equating exactly one pair of nodes. These requirements are motivated by the observation that the i-edges represent linguistically determined structures embodied in the elementary d-trees that cannot be created or reduced during a derivation.</Paragraph>
      <Paragraph position="2"> We now define the d-edge removal algorithm. A d-edge represents a domination relation of length zero or more. Given the above requirements, at the end of the composition process, we can, when possible, get a minimal reading of a d-edge to be a domination relation of length zero. Thus, we obtain the following procedure for removing d-edges: Consider a d-edge with a node x as the d-parent and with a d-child y. By definition of d-trees, x is on the frontier of a component. The d-child y can either be a root of a component or not. Let us first consider the case in which y is a root of a component. To remove this d-edge, we equate x with y.9 This gives us the minimal reading that meets the above requirement (that no i-edges are added which are not already specified in the components, and that those i-edges that are distinct prior to the process of reading off remain distinct after the removal of the d-edge). Now we consider the alternate case in which the d-child is not the root of its component. Let z be the root of the component containing y. Now both z and x are known to dominate y and hence in any model of the description, either z will dominate x or vice versa.</Paragraph>
      <Paragraph position="3"> Equating x with y (the two nodes in the d-edge under consideration) has the potential of requiring the collapsing of i-edges (e.g., i-edges between x and its parent and y and its parent in the component including z). As a consequence of our requirement, the only way to remove the d-edge is by equating the nodes x and z. If we equated x with any other node dominated by z (such as y), we would also be collapsing i-edges from two distinct components and equating more than one pair of nodes, contrary to our requirement. The removal of the d-edge by equating x and z can also be viewed as adding a d-edge from x to z (which, as mentioned, is compatible with the given description and does not have the potential for collapsing i-edges). Now since this d-edge is between a frontier of a component and the root of another, it can be removed by equating the two nodes.</Paragraph>
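The d-edge removal step described above can be sketched as follows; the data structures are our assumptions, and `equate` stands in for whatever machinery merges two variables of the description:

```python
# One d-edge removal step: removing the d-edge (x, y) equates x with y
# when y is the root of its component (minimal reading: domination of
# length zero); otherwise x is equated with the root z of y's component,
# so that x comes to sit above the whole component containing y.

def remove_d_edge(x, y, d_edges, component_root, equate):
    z = component_root[y]
    d_edges.discard((x, y))
    if z == y:
        equate(x, y)   # y is a component root: collapse directly
    else:
        equate(x, z)   # equating x with y would collapse i-edges

merged = []
equate = lambda a, b: merged.append((a, b))
root = {"y": "z"}             # y's component is rooted at z, z != y
d_edges = {("x", "y")}
remove_d_edge("x", "y", d_edges, root, equate)
print(merged)  # [('x', 'z')]
```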
      <Paragraph position="4"> Definition A tree t can be read off from a d-tree d iff t is obtained from d by removing the d-edges of d in any order using the d-edge removal algorithm.</Paragraph>
      <Paragraph position="5"> By selecting d-edges for removal in different orders, different trees can be produced. Thus, in general, we can read off several trees from each d-tree in T(G). For example, the d-tree in Figure 6 can produce two trees: one rooted in x1 (if we choose to collapse the edge between y4 and z1 first) and one rooted in y1 (if we choose to collapse the edge between x3 and z2 first). The fact that a d-tree can have several minimal readings can be exploited to underspecify different word orderings (see Section 4.4).
9 This additional equality to obtain the minimal readings is similar to unification of the so-called top and bottom feature structures associated with a node in tree adjoining grammars, which happens at the end of a derivation. In DSG, if the labeling specifications on x and y are incompatible, then the additional equality statement does not lead to any minimal tree model, just as in TAG, a derivation cannot terminate if the top and bottom feature structures associated with a node do not unify.
Thus, while a single d-tree may describe several trees, only some of these trees will be read off in this way. This is because of our assumptions about what is being implicitly stated in a d-tree--for example, our requirement that i-edges can be neither destroyed nor created in a derivation. Assumptions such as these about the implicit content of d-trees constitute a theory of how to read off from d-trees. Variants of the DSG formalism can be defined, which differ with respect to this theory.</Paragraph>
      <Paragraph position="6"> We now define the tree and string languages of a DSG.</Paragraph>
      <Paragraph position="7"> Definition Let G be a DSG. The tree language T(G) generated by G is the set of trees that can be read off from sentential d-trees in T(G).</Paragraph>
      <Paragraph position="8"> Definition The string language generated by G is the set of terminal strings on the frontier of trees in T(G).</Paragraph>
    </Section>
    <Section position="6" start_page="98" end_page="102" type="sub_section">
      <SectionTitle>
2.6 DSG with Path Constraints
</SectionTitle>
      <Paragraph position="0"> In DSG, domination statements are used to express domination paths of arbitrary length. There is no requirement placed on the nodes that appear on such paths. In this section, we informally define an extension to DSG that allows for additional statements constraining the paths.</Paragraph>
      <Paragraph position="1"> Path constraints can be associated with domination statements to constrain which nodes, in terms of their labels, can or cannot appear within a path instantiating a d-edge. 10 Path constraints do not directly constrain the length of the domination path, which still remains underspecified. Path constraints are specified in DSG by associating with domination statements a set of labels that defines which nodes cannot appear within this path. 11 Suppose we have a statement x ◁* y with an associated path constraint set, P; then logically this pair can be understood as x ◁* y ∧ ∀z((z ≠ x ∧ z ≠ y ∧ x ◁* z ∧ z ◁* y) → label(z) ∉ P).</Paragraph>
      <Paragraph position="3"> Note that during the process of derivation involving substitution, the domination statements in the two descriptions being composed continue to exist and do not play any role in the composition operation itself. The domination statements only affect the reading off process. For this reason, we can capture the effect of path constraints by merely defining how they affect the reading off process. Recall that the reading off process is essentially the elimination of d-edges to arrive at a single component d-tree.</Paragraph>
      <Paragraph position="4"> If there is a d-edge between x and y, we consider two situations: is the d-child y the root of a component, or not? When y is the root of a component, then x and y are collapsed. Clearly any path constraint on this d-edge has no effect. However, when y is not the root of a component, and z is the root of the component containing y, then the tree we obtain from the reading off process is one where x dominates z and not where z properly dominates x. That is, in this case, we replace the d-edge between x and y with a d-edge between x and z, which we then eliminate in the reading off process by equating x and z. But in order to replace the d-edge between x and y with a d-edge between x and z, we need to make sure that the path between z and y does not violate the path constraint associated with the d-edge between x and y.</Paragraph>
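Checking a path constraint when re-attaching a d-edge then reduces to inspecting the labels known to lie on the path from z down to y. A sketch (the labels and the constraint set are illustrative assumptions):

```python
# A path constraint P forbids certain labels from appearing within the
# path instantiating a d-edge. Before replacing the d-edge (x, y) with
# (x, z), we verify that the known path from z to y avoids P.

def path_ok(path_labels, forbidden):
    return not any(lab in forbidden for lab in path_labels)

# d-edge (x, y) with P = {"S"}: the known path from z to y passes a VP.
print(path_ok(["VP"], {"S"}))       # True: the d-edge may be re-attached
print(path_ok(["S", "VP"], {"S"}))  # False: the path violates P
```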
      <Paragraph position="5">
3. Properties of the Languages of DSG
It is clear that any context-free language can be generated by DSG (a context-free grammar can simply be reinterpreted as a DSG). It is also easy to show that the weak generative capacity of DSG exceeds that of context-free grammars. Figure 11 shows three d-trees (including two copies of the same d-tree) that generate the non-context-free language { aⁿbⁿcⁿ | n ≥ 1 }. Figure 12 shows the result of performing the first of two substitutions indicated by the arrows (top) and the result of performing both substitutions (bottom). Note that although there are various ways that the domination edges can be collapsed when reading off trees from this d-tree, the order in which we collapse domination edges is constrained by the need to consistently label nodes being equated. This is what gives us the correct order of terminals.</Paragraph>
      <Paragraph position="6"> Figure 13 shows a grammar for the language { w ∈ { a, b, c }* | w has an equal nonzero number of a's, b's, and c's }, which we call Mix. This grammar is very similar to the previous one. The only difference is that node labels are no longer used to constrain word order. Thus the domination edges can be collapsed in any order.</Paragraph>
      <Paragraph position="7"> Both of the previous two examples can be extended to give a grammar for strings containing an equal number of any number of symbols simply by including additional components in the elementary d-trees for each symbol to be counted. Hence, DSG generates not only non-context-free languages but also non-tree adjoining languages, since LTAG cannot generate the language { aⁿbⁿcⁿdⁿeⁿ | n ≥ 1 } (Vijay-Shanker 1987). However, it appears that DSG cannot generate all of the tree adjoining languages, and we conjecture that the classes are therefore incomparable (we offer no proof of this claim in this paper). It does not appear to be possible for DSG to generate the copy language { ww | w ∈ { a, b }* }. Intuitively, this claim can be motivated by the observation that nonterminal labels can be used to control the ordering of a bounded number of terminals (as in Figure 12), but this cannot be done in an unbounded way, as would be required for the copy language (since the label alphabet is finite).</Paragraph>
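Membership in Mix itself is easy to verify directly, which makes the language convenient for testing. A quick sketch (ours, not from the paper):

```python
# Membership test for Mix: an equal, nonzero number of a's, b's, and c's,
# in any order, and no other symbols.
from collections import Counter

def in_mix(w):
    c = Counter(w)
    # set(c) == {"a","b","c"} forces all three to occur (nonzero counts)
    # and rules out any other symbol.
    return set(c) == {"a", "b", "c"} and c["a"] == c["b"] == c["c"]

print(in_mix("abccba"))  # True
print(in_mix("aabbc"))   # False (unequal counts)
print(in_mix(""))        # False (counts must be nonzero)
```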
      <Paragraph position="9"> Figure 12. Counting to three: After substituting one tree (above) and the derived d-tree (below).</Paragraph>
      <Paragraph position="10"> DSG is closely related (and weakly equivalent) to two equivalent string rewriting systems, UVG-DL and {}-LIG (Rambow 1994a, 1994b). In UVG-DL, several context-free rewrite rules are grouped into a set, and dominance links may hold between
Figure 13. A grammar for Mix.</Paragraph>
      <Paragraph position="11"> right-hand-side nonterminals and left-hand-side nonterminals of different rules from the same set. In a derivation, the context-free rules are applied as usual, except that all rules from an instance of a set must be used in the derivation, and at the end of the derivation, the dominance links must correspond to dominance relations in the derivation tree. {}-LIG is a multiset-valued variant of Linear Index Grammar (Gazdar 1988). UVG-DL and {}-LIG, when lexicalized, generate only context-sensitive languages.</Paragraph>
      <Paragraph position="12"> Finally, Vijay-Shanker, Weir, and Rambow (1995), using techniques developed for UVG-DL (Rambow 1994a; Becker and Rambow 1995), show that the languages generated by lexicalized DSG can be recognized in polynomial time. This can be shown with a straightforward extension to the usual bottom-up dynamic programming algorithm for context-free grammars. In the DSG case, the nonterminals in the chart are paired with multisets. The nonterminals are used to verify that the immediate dominance relations (i.e., the parent-child descriptions) hold, just as in the case of CFG. The multisets record the domination descriptions whose lower (dominated) node has been found but whose upper (dominating) node still needs to be found in order for the parse to find a valid derivation of the input string (so-called open domination descriptions). The key to the complexity result is that the size of the multisets is linearly bounded by the length of the input string if the grammar is lexicalized, and the number of multisets of size n is polynomial in n. Furthermore, if the number of open domination descriptions in any chart entry is bounded by some constant independent of the length of the input string (as is plausible for many natural languages including English), the parser performs in cubic time.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="102" end_page="115" type="metho">
    <SectionTitle>
4. Some Linguistic Analyses with DSG
</SectionTitle>
    <Paragraph position="0"> In Section 1, we saw that the extended domain of locality of the elementary structures of DSG--which DSG shares with LTAG--allows us to develop lexicalized grammars in which the elementary structures contain lexical items and the syntactic structure they project. There has been considerable research in the context of LTAG on the issue of how to use the formalism for modeling natural language syntax--we mention as salient examples XTAG-Group (1999), a wide-coverage grammar for English, and Frank (1992, forthcoming), an extensive investigation from the point of view of theoretical syntax. Since DSG shares the same extended domain of locality as LTAG, much of this research carries over to DSG. In this section, we will be presenting linguistic analyses in DSG that follow some of the elementary principles developed in the context of LTAG. We will call these conventions the standard LTAG practices and summarize them here for convenience.</Paragraph>
    <Paragraph position="1"> * Each elementary structure contains a lexical item (which can be multiword) and the syntactic structure it projects.</Paragraph>
      <Paragraph position="2"> * Each elementary structure for a syntactic head contains syntactic positions for its arguments. (In LTAG, this means substitution or foot nodes; in DSG, this means substitution nodes.) * When combining two elementary structures, a syntactic relation between their lexical heads is established. For example, when substituting the elementary structure for lexical item l1 into an argument position of the elementary structure for lexical item l2, then l1 is in fact an argument of l2.</Paragraph>
    <Paragraph position="3"> In Section 1 we also saw that the adjoining operation of LTAG has two properties that appear arbitrary from a tree description perspective. The first property is the recursion requirement, which states that the root and foot of an auxiliary tree must be identically labeled. This requirement embodies the principle that auxiliary trees are seen as factoring recursion. The second property, which we will refer to as the nesting property of adjunction, follows from the fact that the adjoining operation is not symmetrical. All the structural components projected from one lexical item (corresponding to the auxiliary tree used in an adjoining step) are included entirely between two components in the other projected structure. That is, components of only one of the lexically projected structures can get separated in an adjoining step.</Paragraph>
    <Paragraph position="4"> In this section, we examine some of the ramifications of these two constraints by giving a number of linguistic examples for which they appear to preclude the formulation of an attractive analysis. We show that the additional flexibility inherent in the generalized substitution operation is useful in overcoming the problems that arise.</Paragraph>
    <Section position="1" start_page="102" end_page="106" type="sub_section">
      <SectionTitle>
4.1 Factoring of Recursion
</SectionTitle>
      <Paragraph position="0"> We begin by explaining why, in LTAG, the availability of analyses for long-distance dependencies is limited by the recursion requirement. Normally, substitution is used in LTAG to associate a complement to its head, and adjunction is used to associate a modifier. However, adjunction rather than substitution must be used with complements involving long-distance dependencies, e.g., in wh-dependencies and raising
Figure 14. S-analysis for extraction from infinitival complements.</Paragraph>
      <Paragraph position="1"> constructions. Such auxiliary trees are called predicative auxiliary trees. 12 In a predicative auxiliary tree, the foot node should be one of the nonterminal nodes on the frontier that is included due to argument requirements of the lexical anchor of the tree (as determined by its active valency). However, the recursion requirement means that all frontier nonterminal nodes that do not have the same label as the root node must be designated as substitution nodes, which may mean that no well-formed auxiliary tree can be formed.</Paragraph>
      <Paragraph position="2"> Let us consider again the topicalized sentence used as an example in Section 1, repeated here for convenience:  (1) Many of us, John hopes to meet  A possible analysis is shown in Figure 3 in Section 1. We will refer to this analysis as the VP-complement analysis. Note that the individual pieces of the structures projected from lexical items follow standard LTAG practices. Because of the recursion requirement, the tree on the right is not (a description of) an auxiliary tree. To obtain an auxiliary tree in order to give a usual TAG-style account of long-distance dependencies, the complement of the equi-verb (control verb) hopes must be given an S label, which in turn imposes a linguistic analysis using an empty (PRO) subject as shown in Figure 14 (or, at any rate, an analysis in which the infinitival to meet projects to S). The VP-complement analysis has been proposed within different frameworks, and has been adopted as the standard analysis in HPSG (Pollard and Sag 1994). However, because this would require an auxiliary tree rooted in S with a VP foot node, the recursion requirement precludes the adoption of such an analysis in LTAG. We are 12 This term is from Schabes and Shieber (1994). Kroch (1987) calls such trees complement auxiliary trees.  Rainbow, Vijay-Shanker, and Weir D-Tree Substitution Grammars</Paragraph>
      <Paragraph position="4"> to e gave the book  not suggesting that one linguistic analysis is better than another, but instead we point out that the formal mechanism of LTAG precludes the adoption of certain linguistically motivated analyses. Furthermore, this mechanism makes it difficult to express entire grammars originally formulated in other formalisms in LTAG; for example, when compiling a fragment of HPSG into TAG (Kasper et al. 1995). In fact, the compilation produces structures just like those (described) in Figure 3. Kasper et al. (1995) consider the tree on the right of Figure 3 to be an auxiliary tree with the VP sibling of the anchor determined to be the foot node. Technically, the tree on the right of Figure 3 caimot be an auxiliary tree. Kasper et al. (1995) overcome the problem by making the node label a feature (with all nodes having a default label of no significance). This determination of the foot node is independent of the node labels of the frontier nodes. Instead, the foot node is chosen because it shares certain crucial features (other than label!) with the root node. These shared features are extracted from the HPSG rule schema and are used to define the localization of dependencies in the compiled TAG grammar. See Kasper et al. (1995) for details.</Paragraph>
      <Paragraph position="5"> A similar example involves analyses for sentences such as (2), which involve extraction from argument PPs.</Paragraph>
      <Paragraph position="6"> (2) John, Peter gave the book to Figure 15 shows the structures obtained by using the method of Kasper et al. (1995) for compiling an HPSG fragment to TAG-like structures. In contrast to traditional TAG analyses (in which the elementary tree contains the preposition and its PP, with the NP complement of the preposition as a substitution node), the PP argument of the ditransitive verb is not expanded. ~3 Instead the PP tree anchored by the preposition is substituted. However, because of the extraction, DSG's notion of substitution rather than LTAG substitution would need to be used.</Paragraph>
      <Paragraph position="7"> These examples suggest that the method for compiling an HPSG fragment into TAG-like structures discussed in Kasper et al. (1995) can be simplified by compiling HPSG to a DSG-like framework.</Paragraph>
      <Paragraph position="8"> 13 Recall that we are not, in this section, advocating one analysis over another; rather, we are discussing the range of options available to the syntactician working in the TAG framework.  We have shown a number of examples where some, but not all, of the possible linguistic analyses can be expressed in LTAG. It could be claimed that a formal framework limiting the range of possible analyses constitutes a methodological advantage rather than a disadvantage. However, as is well known, there are several other examples in English syntax where the factoring of recursion requirement in fact eliminates all plausible LTAG analyses. The only constraint assumed here is that extraction is localized in elementary trees. One such example in English is extraction out of a &amp;quot;picture-NP&amp;quot; (a noun which takes a prepositional complement from which extraction into the main sentence is possible), as illustrated in the following example: (3) This painting, John bought a copy of Following the standard LTAG practices, we would obtain the structures described in Figure 16. As these descriptions show, the recursion constraint means that adjoining cannot be used to provide this analysis of extraction out of NPs. See Kroch (1989) for various examples of such constructions in English and their treatment using an extension of TAG called multicomponent tree adjoining grammars. (We return to analyses using multicomponent TAG in Section 4.2.) However, we now show that all of these cases can be captured uniformly with generalized substitution (see Figure 17). The node labeled X in fl arises due to the argument requirements of the anchor (the verb) and when X = S, fl is a predicative auxiliary tree in LTAG. The required derived phrase structure in these cases is described by 7. To obtain these trees, it would suffice to simply substitute the component rooted in X of c~ at the node labeled X in ft. 
While in general, such a substitution would not constrain the placement of the upper component of t, because of the labels of the relevant nodes, this substitution will always result in % The use of substitution at argument nodes not only captures the situations where adjoining or multicompo- null Rambow, Vijay-Shanker, and Weir D-Tree Substitution Grammars</Paragraph>
      <Paragraph position="10"> General case of extraction.</Paragraph>
      <Paragraph position="11"> nent adjoining is used for these examples, it also allows the DSG treatment to be uniform, and is applicable even in cases where there is no extraction (e.g., the upper component of a is not present).</Paragraph>
      <Paragraph position="12"> We end this discussion of the nature of foot nodes by addressing the question of how the choice of foot nodes limits illicit extractions. In the TAG approach, the designation of a foot node specifically rules out extraction from any structure that gets attached to any other frontier node (other arguments), or from structures that are adjoined in (adjuncts). However, as has been pointed out before (Abeill6 1991), the choice of foot nodes is not always determined by node labels alone, for example in the presence of sentential subjects or verbs such as ddduire, which can be argued to have two sentential objects. In these cases some additional linguistic criteria are needed in order to designate the foot node. These same linguistic criteria can be used to designate frontier nodes from which extraction is possible; extraction can be regulated through the use of features. We also note that in moving to a multicomponent TAG analysis, an additional regulatory mechanism becomes necessary in any case to avoid extractions out of subjects (and, to a lesser degree, out of adjuncts). We refer the interested reader to Rainbow, Vijay-Shanker, and Weir (1995) and Rambow and Vijay-Shanker (1998) for a fuller discussion.</Paragraph>
    </Section>
    <Section position="2" start_page="106" end_page="110" type="sub_section">
      <SectionTitle>
4.2 Interspersing of Components
</SectionTitle>
      <Paragraph position="0"> We now consider how the nesting constraint of LTAG limits the TAG formalism as a descriptive device for natural language syntax. We contrast this with the case of DSG, which, through the use of domination in describing elementary structures projected from a lexical item, allows for the interleaving of components projected from lexical items during a derivation.</Paragraph>
      <Paragraph position="1"> Consider the raising example introduced in Section 1 repeated here as (4a), along with its nontopicalized version (4b), which indicates a possible original position for  Topicalization out of the clause of a raising verb.</Paragraph>
      <Paragraph position="2"> the topicalized phrase) 4 (4) a. To many of us, John appears to be happy b. John appears to many of us to be happy Following standard LTAG practices of localizing argument structure (even in the presence of topicalization) and the standard LTAG analysis for the raising verb appear, the descriptions shown in Figure 18 could be proposed. Because of the nesting property of adjunction, the interleaving required to obtain the relevant phrase structure for the sentence (4a) cannot be realized using LTAG with the assumed lexical projections (or any other reasonable structures where the topicalized PP and the verb appear are in the same projected structure). In contrast, with these projections, using generalized substitution in DSG (i.e., equating the VP argument node of the verb and the root of the infinitival VP), the only possible derived tree is the desired one.</Paragraph>
      <Paragraph position="3"> We will now consider an example that does not involve a wh-type dependency: (5) Didn't John seem to like the gift? Following the principles laid out in Frank (1992) for constructing the elementary trees of TAG, we would obtain the projections described in Figure 19 (except for the node labels). Note in particular the inclusion of the auxiliary node with the cliticized negation marker in the projection of the raising verb seem. Clearly the TAG operations could never yield the necessary phrase structure given this localization. Once again, the use of generalized substitution in DSG would result in the desired phrase structure.</Paragraph>
      <Paragraph position="4"> An alternative to the treatment in Frank (1992) is implemented in the XTAG grammar for English (XTAG-Group 1999) developed at the University of Pennsylvania. The XTAG grammar does not presuppose the inclusion of the auxiliary in the projection of the main verb. Rather, the auxiliary gets included by separately adjoining a tree 14 Throughout this section, we underline the embedded clause with all of its arguments, such as here, the raised subject.  to like the gift projected from the auxiliary verb. The adjunction of the auxiliary is forced through a linguistically motivated system of features. A treatment such as this is needed to avoid using multicomponent adjoining. In our example, the auxiliary, along with the negation marker, is adjoined into the tree projected by the embedded verb like, which may be considered undesirable since semantically, it is the matrix verb seem that is negated. We take this example to show once more that TAG imposes restrictions on the linguistic analyses that can be expressed in it. Specifically, there are constructions (which do not involve long-distance phenomena) for which one of the most widely developed and comprehensive theories for determining the nature of localization in elementary trees--that of Frank (1992)---calmot be used because of the nature of the TAG operation of adjunction. In contrast, the operations of DSG allow this theory of elementary lexical projections to be used.</Paragraph>
      <Paragraph position="5"> In English, the finite verb appears before the subject only in questions (and in some other contexts such as neg-inversion), but in other languages, this word order is routine, leading to similar problems for an LTAG analysis. In V1 languages such as Welsh, the subject appears in second position after the finite verb in the standard declarative sentence. The raised subject behaves in the same manner as the matrix subject, as observed in Harley and Kulick (1998) and illustrated in (6), from Hen- null John happening be seeing Mary happens to be seeing Mary In German, a V2 language, the finite verb appears in second position in matrix clauses. The first position may be occupied by any constituent (not necessarily the subject). When the subject is not in initial position, it follows the finite verb, both in  simplex sentences and in raising constructions: (7) a. Leider wird es standig regnen unfortunately will itNOM continually rain Unfortunately, it will rain continually b. Oft schien e_~s uns st~indig zu regnen often seemed itNo M USDA w continually to rain Often it seemed to us to rain continually In the German example, a separate adjunction of the tensed verb (as in the XTAG analysis of the English auxiliary) is not a viable analysis at all, since the tensed verb is not an auxiliary but the main (raising) verb of the matrix clause.</Paragraph>
      <Paragraph position="6"> We now return to examples that do not include raising, but only wh-dependencies. (8) a. John slept under the bridge b. Which bridge did John sleep under? Most LTAG analyses would treat the prepositional phrase in (8a) as an adjunct and use an intransitive frame for the verb. However, the related sentence (8b) cannot be analyzed with TAG operations in the same way, because the projected structures from the verb and the preposition would have to be as shown in Figure 20. The interspersing of components from these projections to obtain the desired tree cannot be obtained using adjoining. Clearly, with the appropriate generalized substitutions in DSG, this tree alone will be derived with these lexical projections.</Paragraph>
      <Paragraph position="7"> Related problems arise in languages in which a wh-moved element does not invariably appear in sentence-initial position, as it does in English. For example, in  Rambow, Vijay-Shanker, and Weir D-Tree Substitution Grammars Kashmiri, the wh-element ends up in second position in the presence of a topic. This is the case even if the wh-element comes from the embedded clause and the topic from the matrix clause. (The data is from Bhatt, \[1994\].) (9) a. rameshan kyaa dyutnay tse RameShERc whatNoM gave yOUDA T What did you give Ramesh? b. rameshan kyaai chu baasaan ki me kor ti RameShERc what is belieVeNPERF that IERC do What does Ramesh believe that I did? Another example comes from Rumanian. Rumanian differs from English in that it allows multiple fronted wh-elements in the same clause. Leahu (1998) illustrates this point with the examples in (10) (her (8a) and (11a)); (10a) shows multiple wh-movement in the same clause, while (10b) shows multiple wh-words in one clause that originate from different clauses, resulting again in an interspersed order.</Paragraph>
      <Paragraph position="8"> (10) a. Cinei cuij ti promite o masina tj? who to whom promises a car Who promises a car to whom? b. Cinei pe cinej a zis ti ca a vazut tj? who whom has said that has seen Who has said he has seen whom? The examples discussed in this section show a range of syntactic phenomena in English and in other languages that cannot be analyzed using the operations of TAG. We conclude that complex interspersing is a fairly common phenomenon in natural language. As in the case of factoring of recursion, sometimes we find that the definition of adjunction precludes certain linguistically plausible analyses but allows others; in other cases, TAG does not seem to allow any linguistically plausible analysis at all. However, in each case, we can use standard LTAG practices for projecting structures from lexical items and combine the resulting structures using the generalized substitution operation of DSG to obtain the desired analyses, thus bringing out the underlying similarity of related constructions both within languages and crosslinguistically. null</Paragraph>
    </Section>
    <Section position="3" start_page="110" end_page="113" type="sub_section">
      <SectionTitle>
4.3 Linguistic Use of Path Constraints
</SectionTitle>
      <Paragraph position="0"> In the examples discussed so far, we have not had the need to use path constraints.</Paragraph>
      <Paragraph position="1"> The d-edges seen so far express any domination path. Recall that path constraints can be associated with a d-edge to express certain constraints on what nodes, in terms of their labels, cannot appear within a path instantiating a d-edge.</Paragraph>
      <Paragraph position="2">  Path constraints are needed to rule out ungrammatical super-raising.</Paragraph>
      <Paragraph position="3"> As an example of the use of path constraints, let us consider the well-known case  of &amp;quot;super-raising&amp;quot;: (11) a. It seems wood appears to float b. *Wood seems it appears to float c. Wood seems to appear to float  In (11a), the subject of float, wood, has raised to the appears clause, while the raising verb seem does not trigger raising and has an expletive it as its subject* In (11b), wood has raised further, and appear now has an expletive subject; (11b) is completely ungrammatical. If we make the intermediate raising verb appear nonfinite (and hence without a subject), as in (11c), the sentence is again grammatical. Now consider the DSG analysis for (11a) shown in Figure 21. The d-tree for seem has an S substitution node, since seems takes a finite complement with a subject* Appear,  Rambow, Vijay-Shanker, and Weir D-Tree Substitution Grammars since it is finite, projects to S, but takes a VP complement since its complement, the float clause, is nonfinite and has no overt subject, is We furthermore assume that the raising verbs seem and appear do not select for subjects, but that the expletive subject it is freely available for inclusion in their d-trees, since expletive it is semantically vacuous and merely fulfills syntactic requirements (such as subject-verb agreement), not semantic ones. We substitute the float d-tree into the appear d-tree, and the result into the seem d-tree, as indicated by the solid arrows in Figure 21. Given the reading off process, this derived d-tree can be seen to express two possibilities, depending on where the wood component and the expletive it end up. These two possibilities correspond to (11a) and (11b).</Paragraph>
      <Paragraph position="4"> To exclude the ungrammatical result, we use the path constraints discussed in Section 2.6. Let us make the uncontroversial assumption that as we project from a verb, we will project to a VP before projecting to an S. But we will interpret this notion of projection as also applying to the d-edges between nodes labeled VP: we annotate the d-edge between the VP nodes in the float tree (and in fact in all trees, of course) as having a path constraint that does not allow an S node on this path. This is, after all, what we would expect in claiming that the float tree represents a structure lexically projected from the verb float. 16 Given this additional grammatical expression, after the substitution at the S node of the seems tree, it is no longer possible to read off from the d-tree in Figure 21 a tree whose yield is the ungrammatical (11b). The only possible way of reading off from the derived d-tree yields (11a).</Paragraph>
      <Paragraph position="5"> What is striking is that this particular path constraint disallowing S nodes between VP nodes in structures projected from a verb can be used in other cases as well. In fact, this same path constraint on its own, when applied to the English examples considered so far, predicts the correct arrangement of all components among the two d-trees being combined, regardless of whether the nesting constraint of adjoining must be met (extraction out of clausal or VP complements, extraction from NP or PP complements), or not (extraction from the clause of a raising verb, raising verb with fronted auxiliary, or extraction from an adjunct). For example, in Figure 18, after substituting the to be happy component at the VP node of the appears d-tree, a path constraint on the d-edge between the two VP nodes of the to be happy tree makes it impossible for the to any of us component to intervene, thus leaving the interspersed tree as the only possible result of the reading off process, even if we relaxed the requirement on label equality for the removal of d-edges during the reading off process.</Paragraph>
      <Paragraph position="6"> Note that while the same path constraints apply in all cases, in LTAG, as we have seen, the nesting constraint of adjoining precludes deriving the correct order in some cases, and the use of extensions such as multicomponent adjoining has been suggested. In fact, because there are both situations in which the arrangement of components of the lexically projected structures corresponds to adjoining and situations in which this arrangement is inappropriate, Vijay-Shanker (1992) raises the question of whether the definition of the formalism should limit the arrangement of components of the lexically projected structures, or whether the possible arrangements should be derived from the linguistic theory and from intuitions about the nature of the elementary objects of a grammar. This subsection partially addresses this question and shows 15 The point we are making in this section relies on there being some distinction between the labels of the roots of the appear and float clauses, a linguistically uncontroversial assumption. Here, we use the categorial distinction between S and VP for convenience only; we could also have assumed a difference in feature content.</Paragraph>
      <Paragraph position="7"> 16 Bleam (2000) uses informal path constraints in much the same way in order to restrict Spanish clitic climbing in an LTAG analysis.  Computational Linguistics Volume 27, Number 1 how the path constraint expressing the nature of projection from a lexical item can be used to derive the arrangements of components corresponding to adjoining in some cases as well as predict when the nesting condition of adjoining is too limiting in the others.</Paragraph>
    </Section>
    <Section position="4" start_page="113" end_page="115" type="sub_section">
      <SectionTitle>
4.4 Underspecification of Linear Precedence
</SectionTitle>
      <Paragraph position="0"> In our proposed tree description language, we provide for underspecified dominance but not for underspecified linear precedence. As a consequence, in the graphical representations of d-trees, we assume that sister nodes are always ordered as shown. This may seem arbitrary at first glance, especially since in many linguistic frameworks and theories it is common to specify linear precedence (LP) separately from syntactic structure (GPSG, HPSG, LFG, ID/LP-TAG \[Joshi 1987\] and FO-TAG \[Becker, Joshi, and Rambow 1991\], various dependency-based formalisms, and so on). This separate specification of LP rules allows for underspecified LP rules, which is useful in cases in which word order is not fully fixed.</Paragraph>
      <Paragraph position="1"> In principle, an underspecification of LP could easily be added to DSG without profoundly changing its character or formal properties. The reason we have not done so is that in all cases, the same effect can be achieved using underspecified dominance alone, though at the cost of forcing a linguistic analysis that uses binary branching phrase structure trees rather than n-ary branching ones. We will illustrate the point using examples from German, which allows for scrambling of the arguments.</Paragraph>
      <Paragraph position="2"> Consider the following German examples. 17 (12) a. dat~ die Kinder dem Lehrer das Buch geben that \[the children\]NOM \[the teacher\]DAT \[the book\]Acc give that the children give the teacher the book b. dat~ dem Lehrer die Kinder das Buch geben c. dat~ dem Lehrer das Buch die Kinder geben All orders of the three arguments are possible, resulting in six possible sentences (three of which are shown in (12)). In DSG, we can express this by giving the lexical entry for geben shown in Figure 22. TM The arguments of the verb have no dominance specified among them, so that when using this d-tree (which is of course not yet a tree) in a derivation, we can choose whichever dominance relations we want when we read off a tree at the end of the derivation. As a result, we obtain any ordering of the arguments.</Paragraph>
      <Paragraph position="3"> As mentioned previously, while we can derive any ordering, we cannot, in DSG, obtain a flat VP structure. However, our analysis has an advantage when we consider &amp;quot;long scrambling,&amp;quot; in which arguments from two lexical verbs intersperse. (In German, only certain matrix verbs allow long scrambling.) If we have the subject-control verb versuchen 'to try', the nominative argument is the overt subject of the matrix clause, while the dative and accusative arguments are arguments of the embedded clause.</Paragraph>
      <Paragraph position="4"> Nonetheless, the same six word orders are possible (we again underline the embedded 17 We give embedded clauses starting with the complementizer in order to avoid the problem of V2. For a discussion of V2 in a framework like DSG, see Rambow (1994a) and Rambow and Santorini (1995). 18 We label all projections from the verb (except the immediate preterminal) VP. We assume that relevant levels of projection are distinguished by the feature content of the nodes. This choice has mainly been made in order to allow us to derive verb-second matrix clause order using the same d-trees, which is also why the verb is in a component of its own.</Paragraph>
      <Paragraph position="5">  that \[the children\]NOM \[the teacher\]DAT \[the book\]ncc to give that the children try to give the teacher the book b. daf~ dem Lehrer die Kinder das Buch zu geben versuchen c. daf~ dem Lehrer das Buch die Kinder zu geben versuchen versuchen try We can represent the matrix verb as shown in Figure 23, and a derivation as shown in Figure 24. It is clear that we can still obtain all possible word orders, and that this  DSG derivation for a complex sentence.</Paragraph>
      <Paragraph position="6"> would be impossible using simple LP rules that order sister nodes. 19 (It would also be impossible in LTAG, but see Joshi, Becker, and Rambow \[2000\] for an alternate discussion of long scrambling in LTAG.)</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML