<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1071">
  <Title>Wrapping of Trees</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Hierarchical Decomposition of Strings and Trees
</SectionTitle>
    <Paragraph position="0">Like many approaches to the formalization of natural language syntax, TAG is based on a hierarchical decomposition of strings, represented by ordered trees (Figure 1). These trees are, in essence, graphs representing two relationships: the left-to-right ordering of the structural components of the string, and the relationship between a component and its immediate constituents.</Paragraph>
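    <Paragraph>As a minimal illustration (a sketch under our own assumptions, not code from the paper), an ordered tree can be stored literally as a graph with the two sorts of edges just described. The first sort is immediate dominance (parent to child), and the second is left-to-right precedence between adjacent siblings. The class OrderedTree, its methods, the node names, and the deliberately simplified clause for "Alice likes Bob" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class OrderedTree:
    """An ordered tree stored as a graph with two sorts of edges (illustrative only)."""
    labels: dict[str, str]  # node id -> category or word
    dominance: set[tuple[str, str]] = field(default_factory=set)   # (parent, child)
    precedence: set[tuple[str, str]] = field(default_factory=set)  # (left sibling, right sibling)

    def children(self, node: str) -> list[str]:
        """Children of `node`, ordered by following the precedence edges."""
        kids = {c for (p, c) in self.dominance if p == node}
        # The leftmost child is the only one that no sibling precedes.
        has_left = {right for (left, right) in self.precedence if left in kids}
        current = next((k for k in kids if k not in has_left), None)
        ordered = []
        while current is not None:
            ordered.append(current)
            current = next((right for (left, right) in self.precedence if left == current), None)
        return ordered

    def string_yield(self, node: str) -> list[str]:
        """Leaves below `node`, read off from left to right."""
        kids = self.children(node)
        if not kids:
            return [self.labels[node]]
        return [w for k in kids for w in self.string_yield(k)]


tree = OrderedTree(
    labels={"s": "S", "dp1": "DP", "alice": "Alice", "vp": "VP",
            "v": "V", "likes": "likes", "dp2": "DP", "bob": "Bob"},
    dominance={("s", "dp1"), ("s", "vp"), ("dp1", "alice"),
               ("vp", "v"), ("vp", "dp2"), ("v", "likes"), ("dp2", "bob")},
    precedence={("dp1", "vp"), ("v", "dp2")},
)
print(" ".join(tree.string_yield("s")))  # Alice likes Bob
```

The string yield is recovered from the two relations alone. Dominance says which nodes are grouped together, and precedence says in what order the groups are read.</Paragraph>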
    <Paragraph position="1"> The distinguishing characteristic of TAG is that it identifies an additional hierarchical decomposition of these trees. This shows up, for instance, when a clause that has the form of a wh-question is embedded as an argument within another clause. In the wh-form (as in the right-hand tree of Figure 1), one of the arguments of the verb is fronted as a wh-word, and the inflectional element (does, in this case) precedes the subject. This is generally known in the literature as wh-movement and subj-aux inversion. TAG, however, does not necessarily assume that any actual transformational movement is involved, only that there is a systematic relationship between the wh-form and the canonical configuration. The empty (trace) nodes in the trees mark the positions of the corresponding components in the canonical trees.¹ When such a clause occurs as the argument of a bridge verb (such as think or believe), it is split: the wh-word appears to the left of the matrix clause and the rest of the subordinate clause occurs to the right (Figure 2). Standard TAG accounts analyze this as insertion of the tree for the matrix clause between the upper and lower portions of the tree for the embedded clause, an operation known as tree-adjunction. [Footnote 1: This systematic relationship between the wh-form and the canonical configuration has been a fundamental component of syntactic theories dating back at least to the work of Harris in the 1950s.]</Paragraph>
    <Paragraph position="2"> In effect, the tree for the embedded clause is wrapped around that of the matrix clause. This process may iterate, with adjunction of arbitrarily many instances of bridge-verb trees: Who does Bob believe . . . Carol thinks that Alice likes?</Paragraph>
    <Paragraph position="3"> One of the key advantages of this approach is that the wh-word is introduced into the derivation within the same elementary structure as the verb it is an argument of. Hence these structures are semantically coherent: they express all and only the structural relationships between the elements of a single functional domain (Frank, 2002). The adjoined structures are similarly coherent, and the derivation preserves that coherence at all stages.</Paragraph>
    <Paragraph position="4"> Following Rogers (2003), we will represent this by connecting the adjoined tree to the point at which it adjoins via a third, "tree constituency" relation, as in the right-hand part of Figure 2. This gives us structures that we usually conceptualize as three-dimensional trees, but which can simply be regarded as graphs with three sorts of edges, one for each of the hierarchical relations expressed by the structures. Within this context, tree-adjunction is a process of concatenating these structures, identifying the root of the adjoined structure with the point at which it is adjoined.² The resulting complex structures are formally equivalent to the derivation trees in standard formalizations of TAG. The derived tree is obtained by concatenating the tree yield of the structure, analogously to the way that the string yield of a derivation tree is concatenated to form the derived string of a context-free grammar. Note that in this case it is essential to identify the point in the frontier of each tree component at which the components it dominates will be attached. This point is referred to as the foot of the tree, and the path to it from the root is referred to as the (principal) spine of the tree. Here we have marked the spines by doubling the corresponding edges of the graphs.</Paragraph>
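    <Paragraph>The adjunction step, together with the foot and the spine, can be sketched in Python on ordinary two-dimensional trees. This is an illustration under our own assumptions, not the paper's three-relation encoding. The helper names (adjoin, spine, find_foot), the simplified tree shapes, the spelling of C′ and I′ as C' and I', and the use of ε for the trace are ours. Adjoining the bridge-verb trees at the C′ node of the embedded wh-clause wraps the embedded tree around them, and the operation iterates as in the Who does Bob believe . . . example.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field

EMPTY = "ε"  # label used here for empty (trace) leaves; skipped in the string yield


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)
    foot: bool = False  # True only for the foot node of an auxiliary tree


def words(t: Node) -> list[str]:
    """String yield: leaf labels from left to right, omitting empty leaves."""
    if not t.children:
        return [] if t.label == EMPTY else [t.label]
    return [w for c in t.children for w in words(c)]


def find_foot(t: Node) -> Node | None:
    """Return the foot node of an auxiliary tree, or None if there is none."""
    if t.foot:
        return t
    for c in t.children:
        found = find_foot(c)
        if found is not None:
            return found
    return None


def spine(t: Node) -> list[str]:
    """Labels on the path from the root of an auxiliary tree down to its foot."""
    if t.foot:
        return [t.label]
    for c in t.children:
        path = spine(c)
        if path:
            return [t.label] + path
    return []


def adjoin(host: Node, address: tuple[int, ...], aux: Node) -> Node:
    """Adjoin `aux` at the node of `host` reached by following child indices in `address`.

    The subtree at that node is excised, a copy of `aux` takes its place, and the
    excised material is reattached under the foot of `aux`. Inputs are not modified.
    """
    host, aux = copy.deepcopy(host), copy.deepcopy(aux)
    parent, target = None, host
    for i in address:
        parent, target = target, target.children[i]
    foot = find_foot(aux)
    if foot is None or not (aux.label == foot.label == target.label):
        raise ValueError("root, foot, and adjunction site must share a label")
    foot.children, foot.foot = target.children, False
    if parent is None:
        return aux
    parent.children[address[-1]] = aux
    return host


# Embedded wh-clause "who (that) Alice likes ε" (simplified, hypothetical shapes).
likes = Node("CP", [Node("DP", [Node("who")]),
                    Node("C'", [Node("C", [Node("that")]),
                                Node("IP", [Node("DP", [Node("Alice")]),
                                            Node("I'", [Node("likes"), Node(EMPTY)])])])])
# Bridge-verb auxiliary trees, each rooted in C' with a C' foot.
thinks = Node("C'", [Node("IP", [Node("DP", [Node("Carol")]),
                                 Node("I'", [Node("thinks"), Node("C'", foot=True)])])])
believe = Node("C'", [Node("C", [Node("does")]),
                      Node("IP", [Node("DP", [Node("Bob")]),
                                  Node("I'", [Node("believe"), Node("C'", foot=True)])])])

step1 = adjoin(likes, (1,), thinks)     # wrap the embedded clause around "Carol thinks"
derived = adjoin(step1, (1,), believe)  # adjunction iterates at the new C' node
print(" ".join(words(derived)))     # who does Bob believe Carol thinks that Alice likes
print(" -> ".join(spine(believe)))  # C' -> IP -> I' -> C'
```

Because the excised subtree is reattached at the foot, everything the host had below the adjunction site ends up below the auxiliary tree's spine. This is the wrapping described in the text.</Paragraph>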
    <Paragraph position="5"> Following Rogers (2002), we will treat the subject of the clause as if it were "adjoined" into the rest of the clause at the root of the I′. At this point this is purely for theory-internal reasons: it will allow us to exploit the additional formal power we will shortly bring to bear. It should be noted that this does not represent ordinary adjunction. The subject originates in the same elementary structure as the rest of the clause; that elementary structure is simply somewhat richer than the more standard tree.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Raising Verbs and Subj-Aux Inversion
</SectionTitle>
    <Paragraph position="0"> A problem arises for this account when the matrix verb is a raising verb, such as seems or appears, as in the examples that follow. [Footnote 2: Context-free derivation can be viewed as a similar process of concatenating trees.]</Paragraph>
    <Paragraph position="1"> Alice seems to like Bob. Who does Alice seem to like? Here the matrix clause and the embedded clause share, in some sense, the same subject argument (Figure 3). Raising verbs are further distinguished from control verbs (such as want or promise) by the fact that they may realize their subject as an expletive it: It seems Alice likes Bob.</Paragraph>
    <Paragraph position="2"> Note, in particular, that in each of these cases the inflection is carried by the matrix clause. In order to maintain semantic coherence, we will assume that the subject originates in the elementary structure of the embedded clause. This, then, interprets the raising verb as taking an I′ to an I′, adjoining at the I′ between the subject and the inflectional element of the embedded clause (as in the left-hand side of Figure 3).</Paragraph>
    <Paragraph position="3"> For the declarative form this provides a nesting of the trees similar to that of the bridge verbs: the embedded clause tree is wrapped around that of the matrix clause. For the wh-form, however, the wrapping pattern is more complex. Who and Alice must originate in the same elementary structure as like, while does must originate in the same elementary structure as seem. The trees therefore must factor and be interleaved as shown in the right-hand side of the figure. Such a wrapping pattern is not possible in ordinary TAG, because the sequences of labels occurring along the spines of TAG tree sets must form context-free languages (Weir, 1988). Hence the "center-embedded" wrapping patterns of the bridge verbs and of the declarative form of the raising verbs are possible, but the "cross-serial" pattern of the wh-form of the raising verbs is not.</Paragraph>
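    <Paragraph>Abstractly, the contrast is between nested and crossing dependencies. The Python sketch below is only a schematic illustration. It assumes that each elementary structure contributes one opening and one closing segment to a spine, and the two spine encodings are hypothetical stand-ins for the bridge-verb and wh-raising cases, not readings of the paper's figures. A single last-in-first-out stack, the memory of a pushdown automaton, accepts the nested order but rejects the crossing one. The crossing order is matched instead by first-in-first-out (queue) access.

```python
from collections import deque

Segment = tuple[str, str]  # (structure name, "open" or "close")


def nests(spine: list[Segment]) -> bool:
    """Stack (LIFO) check: each close must match the most recently opened structure."""
    stack: list[str] = []
    for name, kind in spine:
        if kind == "open":
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False
    return not stack


def crosses_serially(spine: list[Segment]) -> bool:
    """Queue (FIFO) check: each close must match the earliest structure still open."""
    queue: deque[str] = deque()
    for name, kind in spine:
        if kind == "open":
            queue.append(name)
        elif not queue or queue.popleft() != name:
            return False
    return not queue


# Schematic spines (hypothetical encodings, not read off the paper's figures).
bridge = [("like", "open"), ("think", "open"), ("think", "close"), ("like", "close")]
raising_wh = [("like", "open"), ("seem", "open"), ("like", "close"), ("seem", "close")]

print(nests(bridge), crosses_serially(bridge))          # True False
print(nests(raising_wh), crosses_serially(raising_wh))  # False True
```

The stack check mirrors why context-free spine languages suffice for center-embedding. The crossing order needs something beyond one stack, which is where TAL spine languages come in.</Paragraph>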
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Higher-order Decomposition
</SectionTitle>
    <Paragraph position="0"> One approach to obtaining the more complicated wrapping pattern that occurs in the wh-form of the raising-verb trees is to move to a formalism in which the spine languages of the derived trees are TALs (the string languages generated by TAGs), which can describe such patterns. One such formalism is the third level of Weir's Control Language Hierarchy (Weir, 1992). It admits sets of derivation trees generated by CFGs, filtered by the requirement that the sequences of labels on the spines occur in some particular TAL.³ The problem with this approach is that it abandons the notion of semantic coherence of the elementary structures.</Paragraph>
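    <Paragraph>The filtering idea can be illustrated with a toy Python sketch. Everything here is a simplifying assumption. Derivation nodes carry a single production label and an index head marking which child continues the spine. The control language is {a^n b^n c^n}, a standard example of a language generated by TAGs but not by any context-free grammar. Weir's actual definition attaches control to the distinguished positions of grammar productions. This code only mirrors the idea of splitting a derivation into spines and filtering on their label strings.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DNode:
    """A node of a toy derivation tree, labeled with the production used there."""
    label: str
    children: list[DNode] = field(default_factory=list)
    head: int | None = None  # index of the child that continues this node's spine


def spines(root: DNode) -> list[str]:
    """Decompose a derivation tree into maximal spines; return each spine's label string."""
    result: list[str] = []
    pending = [root]  # nodes at which a new spine begins
    while pending:
        node = pending.pop()
        labels = []
        while node is not None:
            labels.append(node.label)
            nxt = None
            for i, child in enumerate(node.children):
                if i == node.head:
                    nxt = child
                else:
                    pending.append(child)  # off-spine children start spines of their own
            node = nxt
        result.append("".join(labels))
    return result


def in_anbncn(s: str) -> bool:
    """Membership in {a^n b^n c^n : n >= 0}: a TAL that is not context-free."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n


def licensed(root: DNode, control=in_anbncn) -> bool:
    """Keep a derivation only if every spine's label string is in the control language."""
    return all(control(s) for s in spines(root))


def chain(labels: str) -> DNode:
    """Build a unary derivation whose single spine carries `labels` from top to bottom."""
    node = None
    for lab in reversed(labels):
        node = DNode(lab, [node], 0) if node is not None else DNode(lab)
    return node


tree = DNode("a", [chain("bc"), chain("abc")], head=0)
print(spines(tree))  # ['abc', 'abc']
print(licensed(tree), licensed(chain("aabbcc")), licensed(chain("aabbc")))  # True True False
```

Each candidate derivation is itself context-free. All the extra power comes from the spine filter, which is also why, as the text notes, the elementary structures lose their semantic coherence.</Paragraph>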
    <Paragraph position="1"> It turns out, however, that one can generate exactly the same tree sets by moving to a formalism in which another level of hierarchical decomposition is introduced (Rogers, 2003). This gives structures that employ four hierarchical relations, the fourth being a constituency relation that encodes a hierarchical decomposition of the third-level structures. In this framework, the seem structure can be taken to be inserted between the subject and the rest of the like structure, as shown in Figure 4. Again, spines are marked by doubling the edges.</Paragraph>
    <Paragraph position="2"> [Footnote 3: TAG is equivalent to the second level of this hierarchy, in which the spine languages are context-free.]</Paragraph>
    <Paragraph position="3"> The third-order yield of the corresponding derived structure now wraps the third-order like structure around that of the seem structure. The fragment of like that contains the subject attaches at the third-order "foot" node in the tree yield of the seem structure (the I′), as shown at the bottom of the figure. The center-embedded wrapping pattern of these third-order spines guarantees that the spine languages of the tree yield will be TALs. These admit, in particular, the "cross-serial" pattern needed for the wh-form of raising structures.</Paragraph>
    <Paragraph position="4"> The fourth-order structure has the added benefit of clearly justifying the status of the like structure as a single elementary structure despite the apparent extraction of the subject along the third relation.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Locality Effects
</SectionTitle>
    <Paragraph position="0"> Note that it is the I′ to I′ recursion along the third-order spine of the seem structure that actually does the raising of the subject. One of the consequences of this is that that-trace violations</Paragraph>
    <Paragraph position="2"> cannot occur. If the complementizer originates in the seem structure, it will occur under the C′. If it originates in the like tree, it will occur in a similar position between the CP and the I′. In either case, the complementizer must precede the raised subject in the derived string.</Paragraph>
    <Paragraph position="3"> If we fill the subject position of the seem structure with expletive it, as in Figure 5, the I′ position in the yield of the structure is occupied and we no longer have I′ to I′ recursion. This motivates analyzing these structures as C′ to C′ recursion, similar to bridge verbs, rather than I′ to I′ (Figure 5). More importantly, the presence of the expletive subject in the seem tree rules out super-raising violations. No matter how the seem structure is interpreted, if it is to raise Alice then the Alice structure will have to settle somewhere in its yield. Without extending the seem structure to include the C′ position, none of the possible positions will yield the correct string (and all can be ruled out on simple structural grounds). If the seem structure is extended to include the C′, the raising will be ruled out on the assumption that the structure must then attach at C′.</Paragraph>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6 Subject-Object Asymmetry
</SectionTitle>
    <Paragraph position="0"> Another phenomenon that has proved problematic for standard TAG accounts is extraction from nominals, such as Who did Alice publish a picture of ε?</Paragraph>
    <Paragraph position="1"> Here the wh-word is an argument of the prepositional phrase in the object nominal picture of. Apparently, the tree structure involves wrapping of the picture tree around the publish tree (see Figure 6). The problem, as normally analyzed (Frank, 2002; Kroch, 1989), is that the publish tree does not have the recursive structure normally assumed for auxiliary trees. We will take a somewhat less strict view and rule out the adjunction of the publish tree simply on the grounds that it would involve attaching a structure rooted in C′ (or possibly CP) to a DP node. The usual way around this difficulty has been to assume that the who is introduced in the publish tree, corresponding, presumably, to the as-yet-missing DP. The picture tree is then factored into two components. The first is an isolated DP node, which adjoins at the wh-DP and establishes its connection to the argument trace. The second is the picture DP, which combines at the object position of publish.</Paragraph>
    <Paragraph position="2"> This seems to at least test the spirit of the semantic coherence requirement. If the who is not extraneous in the publish tree then it must be related in some way to the object position. But the identity of who is ultimately not the object of publish (a picture) but rather the object of the embedded preposition (the person the picture is of).</Paragraph>
    <Paragraph position="3"> If we analyze this in terms of a fourth hierarchical relation, we can allow the who to originate in the picture structure, which would now be rooted in CP. This structure could be allowed to attach at the root of the publish structure, on the assumption that the root is a C-node of some sort. Its tree yield would then wrap around that of the publish structure (see Figure 6). Thus we get an account with intact elementary structures that are unquestionably semantically coherent.</Paragraph>
    <Paragraph position="4"> One of the striking characteristics of extraction of this sort is the asymmetry between extraction from the object, which is acceptable, and extraction from the subject, which is not: *Who did a picture of ε illustrate the point?</Paragraph>
    <Paragraph position="5"> In the account under consideration, we might contemplate a similar combination of structures, but in this case the picture DP would somehow have to migrate up to combine at the subject position. We assume that the subject structure is attached to the illustrate tree via the third relation. Under that assumption, this would require the subject structure, in effect, to have two feet, an extension that strictly increases the generative power of the formalism. Alternatively, we might assume that the picture structure attaches in the yield of the illustrate structure, or between the main part of the structure and the subject tree. Either of these, however, would fail to promote the who to the root of the yield structure.</Paragraph>
  </Section>
  <Section position="8" start_page="0" end_page="0" type="metho">
    <SectionTitle>
7 Processing
</SectionTitle>
    <Paragraph position="0"> As with any computationally oriented formalism, the ability to define the correct set of structures is only one aspect of the problem. Just as important is the complexity of processing language relative to that definition. Fortunately, the languages of the Control Language Hierarchy are well understood, and recognition algorithms based on a CKY-style dynamic programming approach are known for each level. The time complexity of the algorithm for the $k$th level, as a function of the length $n$ of the input, is $O(n^{3 \cdot 2^{k-1}})$ (Palis and Shende, 1992). In the case of the fourth-order grammars, which correspond to the third level of the CLH, this gives an upper bound of $O(n^{12})$. Strictly speaking, this is a feasible (polynomial) time complexity. In practice, however, we expect that approaches with better average-case complexity, such as Earley-style algorithms, will be necessary if these grammars are to be parsed directly. But, as we noted in the introduction, grammars of this complexity are not necessarily intended to be used as working grammars. Rather, they are mechanisms for expressing the linguistic theory that serves as the foundation of working grammars of more practical complexity.</Paragraph>
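    <Paragraph>To see how quickly the Palis and Shende bound grows with the level, the following Python sketch simply evaluates the exponent 3·2^(k−1). The function name clh_exponent is ours. Level 1 corresponds to context-free grammars and level 2 to TAG; level 3 is the level of the fourth-order grammars discussed here. The figures are worst-case upper bounds, not measured running times.

```python
def clh_exponent(k: int) -> int:
    """Exponent e in the O(n^e) recognition bound for level k of the Control Language Hierarchy."""
    if k >= 1:
        return 3 * 2 ** (k - 1)
    raise ValueError("levels are numbered from 1")


for level, name in [(1, "context-free"), (2, "TAG"), (3, "fourth-order grammars")]:
    print(f"level {level} ({name}): O(n^{clh_exponent(level)})")
# level 1 (context-free): O(n^3)
# level 2 (TAG): O(n^6)
# level 3 (fourth-order grammars): O(n^12)
```

The exponent doubles at each level, which is why the third level, although polynomial, is unattractive for direct parsing.</Paragraph>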
    <Paragraph position="1"> All of our proposed uses of the higher-order relations involve either combining at a root (without proper embedding) or embedding with a finitely bounded depth of nesting. Consequently, the effects of the higher-dimensional combining operations are expressible using a finite set of features. Hence, the sets of derived trees can be generated by adding finitely many features to ordinary TAGs. The theory entailed by our accounts of these phenomena (as expressed in the sets of derived trees) is therefore expressible in FTAG. Thus, a complete theory of syntax incorporating them would not necessarily be incompatible with implementation within existing TAG-based systems. A longer-term goal is to implement a compilation mechanism that will translate the linguistic theory, stated in terms of the hierarchical relations, directly into grammars stated in terms of the existing TAG-based systems.</Paragraph>
  </Section>
</Paper>