<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1034">
  <Title>A quantitative model of word order and movement in English, Dutch and German complement constructions</Title>
  <Section position="4" start_page="0" end_page="1" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> We propose a quantitative model for expressing word order and movement constraints that enables a simple and uniform treatment of a heterogeneous collection of linear ordering phenomena in English, Dutch and German complement structures. Underlying the scheme are central tenets of the psycholinguistically motivated Performance Grammar (PG) formalism, in particular the assumption that linear order is realized at a late stage of the grammatical encoding process. The model is described here in declarative terms based on typed feature unification. We show that both the within- and between-language variations of the ordering phenomena under scrutiny reduce to differences between a few numerical parameters.</Paragraph>
    <Paragraph position="1"> The paper is organized as follows. In Section 2, we sketch PG's hierarchical structures. Section 3, the kernel of the paper, describes the linearization and movement model. In Section 4, we turn to central word order phenomena in the three target languages. Section 5, finally, contains some conclusions.
2. Hierarchical structure in PG
PG's hierarchical structures consist of unordered trees ('mobiles') composed of elementary building blocks called lexical frames. These are 3-tiered mobiles assembled from branches called segments.</Paragraph>
    <Paragraph position="2"> The top layer of a frame consists of a single phrasal node (the 'root'; e.g. Sentence, Noun Phrase, ADJectival Phrase, Prepositional Phrase), which is connected to one or more functional nodes in the second layer (e.g., SUBJect, HeaD).</Paragraph>
    <Paragraph position="3"> At most one exemplar of a functional node is allowed in the same frame. Every functional node dominates exactly one phrasal node ('foot') in the third layer, except for HD, which immediately dominates a lexical (part-of-speech) node. Each lexical frame is 'anchored' to exactly one lexical item: a lemma (printed below the lexical node serving as the frame's HeaD). A lexical frame encodes the word category (part of speech), subcategorization features, and morphological diacritics (person, gender, case, etc.) of its lexical anchor (cf. the elementary trees of Tree Adjoining Grammar (TAG; e.g. Joshi &amp; Schabes, 1997)).</Paragraph>
    <Paragraph position="4"> Associated with every categorial node (i.e., lexical or phrasal node) is a feature matrix, which includes two types of features: agreement features (not to be discussed here; see Kempen &amp; Harbusch, forthcoming) and topological features. The latter play a central role in the linear ordering mechanism. Typed feature unification of topological features takes place whenever a phrasal foot node of a lexical frame is replaced (substituted for) by a lexical frame. Substitution is PG's sole composition operation. Substitution involves unification of the feature matrices that are associated with the substituted phrasal foot node and the root node of the substituting lexical frame. Substitution gives rise to the derivation tree of a well-formed syntactic structure iff the phrasal foot node of all obligatory segments of each lexical frame successfully unifies with the root of another frame. The tree in Figure 1 is well-formed because the MODifier segments are not obligatory.</Paragraph>
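The well-formedness condition on substitution can be illustrated with a minimal Python sketch. It is not the authors' implementation: unification of feature matrices is reduced to a check that the foot category equals the root category, and the class names, the segment inventory of the verb frames, and the ADVP foot of the MODifier segment are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    function: str                      # functional node, e.g. "SUBJ", "HD", "CMP", "MOD"
    foot: str                          # category of the foot node, e.g. "NP", "S"
    obligatory: bool = True
    filler: Optional["Frame"] = None   # frame substituted at the foot, if any


@dataclass
class Frame:
    root: str                          # phrasal root category, e.g. "S", "NP"
    lemma: str                         # lexical anchor of the frame
    segments: List[Segment] = field(default_factory=list)

    def segment(self, function: str) -> Segment:
        return next(s for s in self.segments if s.function == function)


def substitute(segment: Segment, frame: Frame) -> bool:
    """Substitute frame at the foot; category identity stands in for unification."""
    if segment.foot != frame.root:
        return False
    segment.filler = frame
    return True


def well_formed(frame: Frame) -> bool:
    """True iff every obligatory phrasal foot is filled by a well-formed frame."""
    for seg in frame.segments:
        if seg.function == "HD":
            continue                   # HD dominates a lexical node, not a phrasal foot
        if seg.filler is None:
            if seg.obligatory:
                return False
        elif not well_formed(seg.filler):
            return False
    return True


def verb_frame(lemma: str, complement: str) -> Frame:
    # Hypothetical segment inventory; Figure 1 itself is not reproduced here.
    return Frame("S", lemma, [
        Segment("SUBJ", "NP"),
        Segment("HD", "v"),
        Segment(complement, "S" if complement == "CMP" else "NP"),
        Segment("MOD", "ADVP", obligatory=False),
    ])


know, hate = verb_frame("know", "CMP"), verb_frame("hate", "DOBJ")
substitute(know.segment("SUBJ"), Frame("NP", "we"))
substitute(know.segment("CMP"), hate)
substitute(hate.segment("SUBJ"), Frame("NP", "Dana"))
print(well_formed(know))   # False: the direct object of hate is still open
substitute(hate.segment("DOBJ"), Frame("NP", "Kim"))
print(well_formed(know))   # True: the optional MOD segments may stay unfilled
```

The second call succeeds although both MODifier feet are empty, which mirrors the remark about Figure 1.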
    <Paragraph position="5"> [Figure 1: ... Dana hates (example from Sag &amp; Wasow, 1999). Order of branches is arbitrary. Filled circles denote substitution. (The feature matrices unified as part of the substitution operations are not shown.)]
3. Linear structure in PG
The above-mentioned topological features are associated with the phrasal root nodes of lexical frames. Their value is a feature matrix specifying a 'topology', that is, a one-dimensional array of left-to-right slots. In this paper we will only be concerned with topological features associated with S-nodes. They serve to assign a left-to-right order to the segments (branches) of verb frames (i.e. lexical frames specifying the major constituents of clauses). On the basis of empirical-linguistic arguments (which we cannot discuss here), we propose that S-topologies of English, Dutch and German contain exactly nine slots:</Paragraph>
    <Paragraph position="7"> The slots labeled Fi make up the Forefield (from Ger. Vorfeld); the Mj slots belong to the Midfield (Mittelfeld); the Ek's define the Endfield (Nachfeld; terms adapted from traditional German grammar; cf. Kathol, 2000). Table 1 illustrates which clause constituents select which slot as their 'landing site'.</Paragraph>
    <Paragraph position="8"> Notice, in particular, that the placement conditions refer not only to the grammatical function fulfilled by a constituent but also to its shape. For instance, while the Direct Object takes M3 as its default landing site, it selects F1 if it is a Wh-phrase or carries focus, and M2 if it is a personal pronoun (it). In terms of Figure 1, if Kim carries focus, it may occupy slot F1 of the topology associated with the complement clause headed by hate.</Paragraph>
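The shape-sensitive placement of the Direct Object described above can be sketched as a small Python function. The function name and its boolean parameters are hypothetical, and the precedence of the Wh/focus condition over the pronoun condition is an assumption; the text only lists the three cases.

```python
def direct_object_slot(is_wh: bool = False,
                       has_focus: bool = False,
                       is_personal_pronoun: bool = False) -> str:
    """Return the landing site of a Direct Object in the clause topology."""
    if is_wh or has_focus:
        return "F1"        # Wh-phrases and focused objects go to the Forefield
    if is_personal_pronoun:
        return "M2"        # personal pronouns such as "it"
    return "M3"            # default landing site


print(direct_object_slot())                          # M3
print(direct_object_slot(has_focus=True))            # F1
print(direct_object_slot(is_personal_pronoun=True))  # M2
```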
    <Paragraph position="9">  How is the Direct Object NP Kim 'extracted' from the subordinate clause and 'moved' into the main clause? Movement of phrases between clauses is due to lateral topology sharing. If a sentence contains more than one verb, each of the verb frames concerned instantiates its own topology.</Paragraph>
    <Paragraph position="10"> This applies to verbs of any type, whether main, auxiliary or copula. In such cases, the topologies are allowed to share identically labeled lateral (i.e. left- and/or right-peripheral) slots, conditionally upon several restrictions to be explained shortly.</Paragraph>
    <Paragraph position="11"> After two slots have been shared, they are no longer distinguishable; in fact, they are the same object. In the example of Figure 1, the embedded topology shares its F1 slot with the F1 slot of the matrix clause. This is indicated by the dashed borders of the bottom F1 slot:</Paragraph>
    <Paragraph position="13"> In sentence generation, the overt surface order of a sentence is determined by a Read-out module that traverses the hierarchy of topologies in left-to-right, depth-first manner. Any lexical item it 'sees' in a slot is appended to the output string. E.g., Kim is seen while the Reader scans the matrix topology rather than during its traversal of the embedded topology. See Figure 2 for the ordered tree corresponding to Kim we know Dana hates.
[Figure 2: ... motion (cf. Figure 1). Rectangles represent (part of) the topologies associated with the verb frames.]</Paragraph>
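The interplay of slot sharing and read-out can be sketched in Python as follows. This is an illustrative reading of the description above, not the authors' code: the Topology class, the reduced slot list, and in particular the use of a slot labelled F3 for the Subject are assumptions. A shared slot is modelled as one list object referenced by both topologies, so the Reader meets its content only once, namely while scanning the matrix topology.

```python
class Topology:
    def __init__(self, slot_names):
        # One ordered slot list per clause; dicts keep insertion order.
        self.slots = {name: [] for name in slot_names}

    def share(self, upper, name):
        """Make this topology's slot the very same object as the upper one."""
        self.slots[name] = upper.slots[name]


def read_out(topology, seen=None):
    """Left-to-right, depth-first traversal; shared slots are read only once."""
    seen = set() if seen is None else seen
    output = []
    for fillers in topology.slots.values():
        if id(fillers) in seen:
            continue                   # already scanned in a higher topology
        seen.add(id(fillers))
        for item in fillers:
            if isinstance(item, Topology):
                output.extend(read_out(item, seen))
            else:
                output.append(item)
    return output


SLOTS = ["F1", "F3", "M1", "M3", "E2"]   # reduced, partly hypothetical slot list

matrix, embedded = Topology(SLOTS), Topology(SLOTS)
embedded.share(matrix, "F1")             # lateral sharing of the F1 slot

embedded.slots["F1"].append("Kim")       # focused Direct Object of hate
embedded.slots["F3"].append("Dana")
embedded.slots["M1"].append("hates")
matrix.slots["F3"].append("we")
matrix.slots["M1"].append("know")
matrix.slots["E2"].append(embedded)      # finite complement clause

print(" ".join(read_out(matrix)))        # Kim we know Dana hates
```

Without the call to share, Kim would stay inside the embedded topology and the same traversal would yield the order we know Kim Dana hates.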
    <Paragraph position="14"> The number of lateral slots an embedded topology shares with its upstairs neighbor is determined by the parameters LS (left-peripherally shared area) and RS (right-hand share). The two laterally shared areas are separated by a non-shared central area.</Paragraph>
    <Paragraph position="15"> The latter includes at least the slot occupied by the HeaD of the lexical frame (i.e., the verb) and usually additional slots. The language-specific parameters LS and RS are defined in the lexical entries of complement-taking verbs, and dictate how (part of) the feature structure associated with the foot of S-CMP-S segments gets instantiated. For instance, the lexical entry for know (Figure 1) states that LS=1 if the complement clause is finite and declarative. This causes the two S-nodes of the CoMPlement segment to share one left-peripheral slot, i.e. F1. If the complement happens to be interrogative (as in We know who Dana hates), LS=0, implying that the F1 slots do not share their contents and who cannot 'escape' from its clause.</Paragraph>
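How a lexical entry fixes LS, and what that implies for extraction, can be sketched in Python. The function names are hypothetical, the return value for non-finite complements is an assumed default that the text does not state, and the slot labels are those quoted later in the text for Dutch and German, used here only as a stand-in list that begins with F1.

```python
SLOTS = ["F1", "M1", "M2", "M3", "M4", "M5", "M6", "E1", "E2"]


def know_complement_ls(finite: bool, clause_type: str) -> int:
    """LS value that the entry for know assigns to its complement clause."""
    if finite and clause_type == "declarative":
        return 1
    return 0   # interrogative complements; other cases are an assumed default


def shared_left_slots(slot_names, ls: int):
    """The LS leftmost slots are shared with the upstairs topology."""
    return slot_names[:ls]


def can_escape(slot: str, slot_names, ls: int) -> bool:
    """A filler can leave its clause only through a shared slot."""
    return slot in shared_left_slots(slot_names, ls)


ls_decl = know_complement_ls(True, "declarative")
ls_int = know_complement_ls(True, "interrogative")
print(can_escape("F1", SLOTS, ls_decl))   # True:  Kim we know Dana hates
print(can_escape("F1", SLOTS, ls_int))    # False: who stays in its clause
```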
    <Paragraph position="16"> In the remainder of this Section we present a rule system for lateral topology sharing that is couched in a typed feature logic and uses HPSG terminology.</Paragraph>
    <Paragraph position="17"> The system deals with a broad variety of movement phenomena in English, Dutch and German.</Paragraph>
    <Paragraph position="18"> We define a clausal topology as a list of slot types serving as the value of the topology (&quot;TPL&quot;) feature associated with S-nodes:  The value of a TPL feature may be a disjunctive set of alternative topologies rather than a single topology. See the CMP-S node of Figure 3 for an example.</Paragraph>
    <Paragraph position="19"> As for syntactic parsing, in Harbusch &amp; Kempen (2000) we describe a modified ID/LP parser that can compute all alternative hierarchical PG structures licensed by an input string. We show that such a parser can fill the slots of the topologies associated with any such structure in polynomial time.</Paragraph>
    <Paragraph position="20"> for English, and S [TPL ⟨F1t,M1t,M2t,M3t,M4t,M5t,M6t,E1t,E2t⟩] for Dutch and German. Slot types are defined as attributes that take as value a non-branching list of lemmas or phrases (e.g. SUBJect-NP, CoMPlement-S or HeaD-v). They are initialized with the value empty list, denoted by &quot;⟨⟩&quot; (e.g., [ F1t F1 ⟨⟩]).</Paragraph>
    <Paragraph position="21"> Lists of segments can be combined by the append operation, represented by the symbol &quot;∘&quot;. The expression &quot;L1 ∘ L2&quot; represents the list composed of the members of L1 followed by the members of L2. We assume that L2 is non-empty. If L1 is the empty list, &quot;L1 ∘ L2&quot; evaluates to L2. Slot types may impose constraints on the cardinality (number of members) of the list serving as their value. Cardinality constraints are expressed as subscripts of the value list. E.g., the subscript &quot;c=1&quot; in</Paragraph>
    <Paragraph position="23"> states that the list serving as F1's value should contain exactly one member. Cardinality constraints are checked after all constituents that need a place have been appended.</Paragraph>
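The append operation and the deferred cardinality check can be sketched in Python. The Slot class and its method names are hypothetical; the only properties taken from the text are that L2 is assumed to be non-empty, that appending to the empty list yields L2, and that a cardinality constraint such as c=1 is verified only after all constituents have been placed.

```python
def append(l1, l2):
    """Members of l1 followed by the members of l2; l2 must be non-empty."""
    if not l2:
        raise ValueError("L2 is assumed to be non-empty")
    return list(l1) + list(l2)


class Slot:
    def __init__(self, name, cardinality=None):
        self.name = name
        self.cardinality = cardinality   # e.g. 1 for the subscript c=1
        self.value = []                  # initialized with the empty list

    def add(self, fillers):
        # Placement never checks cardinality; it only appends.
        self.value = append(self.value, fillers)

    def check(self):
        """Deferred check, run after all constituents have been appended."""
        return self.cardinality is None or len(self.value) == self.cardinality


f1 = Slot("F1", cardinality=1)
f1.add(["Kim"])
print(f1.value, f1.check())   # ['Kim'] True
f1.add(["who"])
print(f1.value, f1.check())   # ['Kim', 'who'] False
```

Deferring the check keeps placement order-independent: a slot may be temporarily empty or overfull while constituents are still being allocated.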
    <Paragraph position="24"> Depending on the values of sharing parameters LS and RS, the list can be divided into a left area (comprising zero or more slot types), the central area (which includes at least one slot for the HeaD verb), and the right area (possibly empty). Topology sharing is licensed exclusively in the lateral areas. LS and RS are set to zero by default; this applies to the topologies of main clauses and adverbial subclauses. The root S of a complement clause obtains its sharing parameter values from the foot of the S-CMP-S segment belonging to the lexical frame of its governing verb. For example, the lexical entry for know states that the complement of this verb should be instantiated with LS=1 if the clause type (CTYP) of the complement is declarative. This causes the first member of the topologies associated with the S-nodes to receive a coreference tag (indicated by boxed numbers): If, as in the example of Figure 1, know's complement is indeed declarative, the foot of the complement segment can successfully unify with the root of the hate frame. As a consequence, the F1 slot of the complement clause is the same object as the F1 slot of the main clause, and any fillers will seem to have moved up one level in the clause hierarchy: Filling a slot also involves coreference tags. For example, the HeaDs of English verb frames obtain their position in the local topology by looking up the slot associated with the coreference tag:</Paragraph>
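The division of a topology into a left, a central and a right area can be sketched in Python. The function name is hypothetical. The slot labels are those given in the text for Dutch and German, and the default head slot M1 follows the English example; combining the two is a simplification for illustration, not a claim about any of the three languages.

```python
SLOTS = ["F1", "M1", "M2", "M3", "M4", "M5", "M6", "E1", "E2"]


def split_areas(slots, ls=0, rs=0, head_slot="M1"):
    """Split a slot list into (left, central, right) areas.

    The ls leftmost and rs rightmost slots form the shareable lateral
    areas; the slot holding the HeaD verb must remain in the central area.
    """
    if not (ls >= 0 and rs >= 0):
        raise ValueError("LS and RS must be non-negative")
    cut = len(slots) - rs
    left, central, right = slots[:ls], slots[ls:cut], slots[cut:]
    if head_slot not in central:
        raise ValueError("the head slot must stay in the central area")
    return left, central, right


print(split_areas(SLOTS))               # default LS=0, RS=0: nothing is shared
print(split_areas(SLOTS, ls=1))         # only F1 is shared
print(split_areas(SLOTS, ls=1, rs=1))   # F1 and E2 are shared
```

With the default values the whole topology is central, which corresponds to main clauses and adverbial subclauses.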
    <Paragraph position="26"> The information associated with the foot node of the HeaD segment will now be appended to the current content, if any, of slot M1. The same mechanism serves to allocate the finite complement clause (or rather its root S-node) to slot E2 of the matrix clause:</Paragraph>
  </Section>
</Paper>