<?xml version="1.0" standalone="yes"?> <Paper uid="E93-1002"> <Title>Semantic Encoder Semantic Representation SSP- SCHEMES Scheme Selector SSP- Scheme top-down information Dynamic SSP- Structures</Title> <Section position="2" start_page="0" end_page="3" type="metho"> <SectionTitle> 1 Computational Modelling of </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="3" type="sub_section"> <SectionTitle> Incremental Language Production </SectionTitle> <Paragraph position="0"> This paper sketches some basic features of the SYNPHONICS account of the computational modelling of incremental language production, using the generation of passive sentences as an example. The SYNPHONICS (&quot;Syntactic and Phonological Realization of Incrementally Generated Conceptual Structures&quot;) approach, which subscribes to a cognitive science perspective on language processing, aims at linking psycholinguistic insights into the nature of the human natural language production process with well-established assumptions in theoretical and computational linguistics concerning the representation and processing of grammatical knowledge. Research in psycholinguistics (e.g., Garrett 1988, Levelt 1989) has revealed that the process of converting a preverbal, conceptual content into overt speech is performed by a number of autonomous sub-processes specialized for different tasks within the overall process: the pre-linguistic conceptualization component (the Conceptualizer, in Levelt's terms) plans a content to be expressed and delivers a corresponding conceptual representation to the linguistic formulation component (the Formulator), which in turn selects the appropriate items (lemmas and lexemes) from the system's lexical data base and, guided by the syntactic and phonological specifications of the lexical items, produces abstract syntactic and phonological representations. 
The output of the Formulator is taken up by the articulation component (the Articulator), whose task it is to produce a physical speech signal. These components are considered to be autonomous modules, whose modes of operation are each governed by their own sets of principles and restrictions. Furthermore, the system as a whole is constrained by a unidirectional flow of information, i.e., there is no feedback between sub-processes. 1 Finally, it is widely accepted that human language production proceeds in an incremental, piecemeal fashion (Kempen & Hoenkamp 1987): Rather than having to wait for complete input structures, components are able to process fragmentary input (&quot;increments&quot;). As soon as a particular component has passed the results on to its successor component, it is ready for processing the next input increment. Thus, a given increment is processed sequentially by different components, whereas components may operate in parallel on different increments.</Paragraph> <Paragraph position="1"> Theoretical linguists of various persuasions converge on the idea that a large amount of grammatical information that former theories of grammar handled by extensive rule systems ought to be captured by detailed grammatical specifications of lexical items instead. From this angle, the grammar of a language merely consists of a small set of general licensing principles for structures projected from the lexicon. The present paper subscribes to this view. More specifically, the SYNPHONICS Formulator uses a grammar for German in the mold of Head-driven Phrase Structure Grammar (HPSG; Pollard & Sag 1987, 1992). 
The HPSG-style lexical approach to basic aspects of grammar tallies with the central role that recent psycholinguistic theories of language production assign to the lexicon in the formation of linguistic structures (lexicon-driven generation; e.g., Levelt 1989).</Paragraph> <Paragraph position="2"> In contrast to other approaches to the computational modelling of empirically substantiated features of human language production, such as Kempen & Hoenkamp's (1987) Incremental Procedural Grammar and de Smedt's (1990) Incremental Parallel Formulator, however, the SYNPHONICS process model distinguishes strictly between declarative grammatical knowledge and its procedural application, thus taking the stance of theoretical linguistics and related computational approaches. As in HPSG, the declarative grammatical knowledge of the SYNPHONICS Formulator is represented in a unification-based formalism with sorted feature structures. 1 In contrast, AI models that are concerned with incremental processing (e.g., Reithinger 1992) often make extensive use of feedback at the cost of economy and cognitive adequacy.</Paragraph> <Paragraph position="3"> Unlike deduction-based approaches to natural language generation in computational linguistics (e.g., Shieber et al. 1990), however, the SYNPHONICS approach involves a detailed and transparent process model, with sub-processes being explicitly specified at any point in the overall process. This property serves to make the model adjustable to empirical results about the course of human language production and open to a verification of its claims, namely to aim at the computational modelling of cognitive processes.</Paragraph> <Paragraph position="4"> In order to make the above considerations more concrete, we will discuss the roles of the Conceptualizer and the Formulator in the production of a particular linguistic construction in some more detail in the remainder of this paper. 
The discussion of the principles guiding the production of passive sentences serves to illustrate to what extent the determinants of this construction can be traced to the feedback-free interplay between the Conceptualizer and the Formulator and the constraints specific to the modules involved. We cannot go into the details of the passive here; rather, we will confine the presentation to some quite simple cases. In order to capture the full range of the passive construction across languages, the account presented here needs to be enlarged in parts.</Paragraph> </Section> </Section> <Section position="3" start_page="3" end_page="4" type="metho"> <SectionTitle> 2 The SYNPHONICS Conceptualizer </SectionTitle> <Paragraph position="0"> The conceptual input into the Formulator - in short: CS for &quot;conceptual structure&quot; - is represented in the RefO/RefN format (Habel 1982, 1986a/b; Eschenbach et al. 1989). The basic representational units are Referential Objects (RefOs), which are stored and processed in a net-like structure, a Referential Net (RefN). RefOs are labeled, inter alia, with sortal attributes and property and relation designations. The notion of RefOs comprises the entire range of discourse entities, such as objects, times, and situations (events, processes, states).</Paragraph> <Paragraph position="1"> The input representation reflects certain aspects of the organization of the information which the Conceptualizer delivers to the Formulator. One important dimension of organization is the relative prominence of conceptual units such as particular RefOs. In the incremental process of forming a conceptual representation of the content to be expressed in an utterance, relative prominence can manifest itself in the time course of conceptualization, with more prominent units tending towards earlier conceptualization than less prominent ones. The prominence of a RefO can, for example, be due to its perceptual saliency (cf. 
Flores d'Arcais 1987), its conceptual accessibility (i.e., the ease with which it can be retrieved from memory; cf. Bock & Warren 1985) or its sortal properties (such as animacy, humanness, etc.; cf. Bock et al. 1992). We assume that the Conceptualizer's output representation is a stream - formally: a list - of RefO/RefN fragments; the position on the list indicates the order of conceptualization, which in turn is the order in which these fragments (&quot;increments&quot;) are made available to the Formulator for syntactic and phonological encoding.</Paragraph> <Paragraph position="2"> Furthermore, we assume coherence among conceptual increments. This means, in technical terms of formal representation, that RefOs are linked by certain means, most notably by what we call embedding information.</Paragraph> <Paragraph position="3"> Embedding information is one instance of a RefO's connection with its conceptual environment. As an example, embedding information characterizes a RefO's thematic role in event types and other sorts of situations to varying degrees of specification.</Paragraph> </Section> <Section position="4" start_page="4" end_page="4" type="metho"> <SectionTitle> 3 The SYNPHONICS Formulator </SectionTitle> <Paragraph position="0"> The SYNPHONICS Formulator, which is a formulation component for German sentence production, consists of three sub-components: the semantic encoder, which transforms the conceptual input structure CS into an abstract semantic representation SR (cf. Bierwisch & Schreuder 1992); the syntactic encoder, which, on the basis of SR, selects lexical items and forms an abstract syntactic representation; the phonological encoder, which forms an abstract phonological representation. 2 Figure 1 (next page) shows the internal structure of the SYNPHONICS Formulator.</Paragraph> <Paragraph position="1"> Syntactic structures are constructed incrementally, using two types of SR information. 
At the semantics-syntax interface, the so-called Scheme Selector employs the (possibly underspecified) embedding information associated with RefOs in order to select abstract X-bar schemata in the form of minimally specified HPSG-like feature structures, such as a complementation scheme, which reflects a functor-argument relation, or an adjunction scheme, which reflects a modifier-modified relation. Thereby, the top-down construction of syntactic structure is triggered. At the semantics-lexicon interface, the so-called Lemma Selector uses the sortal attributes and property or relation specifications of RefOs in order to select the appropriate lexical items, whose syntactic specifications serve as the starting point for the bottom-up projection of phrasal structures.</Paragraph> <Paragraph position="2"> Both top-down information and bottom-up information pass through the so-called Functional Inspector, where they are checked for the requirements of functional completeness of lexical items with regard to their semantic demands. These concern, for example, determiners and case-marking prepositions as well as passive auxiliaries.</Paragraph> <Paragraph position="3"> If necessary, the Functional Inspector initiates a further consultation of the lexicon.</Paragraph> <Paragraph position="4"> Each newly formed syntactic structure must be licensed by a set of HPSG-style declarative grammatical principles. In the case of lexical bottom-up information, the principles mainly effect phrasal feature projection (Head Feature Principle, Subcategorization Principle, Semantics Principle). As regards the top-down structures, the principles serve to enrich the structural information specified so far (Immediate Dominance Schemata, Subcategorization Principle, etc.).</Paragraph> <Paragraph position="5"> Next, the so-called Integrator takes the floor, which constructs a dynamic syntactic, phonological and semantic representation underlying the utterance segment currently being generated. 
The construction proceeds incrementally and monotonically; the only operation available to the Integrator is unification of feature structures 3.</Paragraph> <Paragraph position="6"> The procedural execution of integration is guided by a number of heuristics that reflect the need to meet the demands of rapid utterance production. One important heuristic principle crucial to the present topic is the following: &quot;Integrate phonologically filled material as soon as possible into the highest and leftmost position available in the current utterance fragment.&quot; The newly formed increment representation is again subject to the grammatical licensing principles.</Paragraph> </Section> <Section position="5" start_page="4" end_page="6" type="metho"> <SectionTitle> 4 Morphology and Syntax of the Passive </SectionTitle> <Paragraph position="0"> Before we proceed to the application of our process model to the production of passive sentences, we will sketch the basic features of the present SYNPHONICS account of the syntax of the passive.</Paragraph> <Paragraph position="1"> The traditional HPSG-account of the passive (Pollard & Sag 1987) consists in a lexical rule that simply restructures the elements on the SUBCAT list of a verb. The application of the lexical rule to the basic active entry of a verb leads to a revised SUBCAT list in which the formerly highest NP, i.e., the subject, may occupy the lowest oblique position, while the former direct object NP takes the subject position. The initial account has since been modified repeatedly; we simply mention T. 
Kiss' proposal for German, 4 according to which the passive rule is split into two parts, a rule of Subject Demotion and a Subject Condition, which roughly corresponds to a rule of Object Promotion.</Paragraph> <Paragraph position="2"> Rather than merely stipulating lexical rules such as Subject Demotion and Object Promotion, the SYNPHONICS account traces the effects these rules are intended to capture to properties of the argument structures of the passive participle and the passive auxiliary. 5 The morphological operation of passive participle formation gives rise to what might be regarded as an &quot;ergativization&quot; of the verb, i.e., the verb's external argument (in the sense of Williams 1983) is exempt from any syntactic principle that refers to subcategorized-for arguments. (Technically, this is realized by transferring the argument, which is marked by a special externality feature, from the verb's SUBCAT list to a blocked-argument [BLOCKED_ARG] list. 6) The passive auxiliary is treated as an argument-attraction verb (cf. Hinrichs & Nakazawa 1991): It subcategorizes for a passive participle and attracts the arguments that the participle subcategorizes for as its own subcategorized-for arguments. Argument attraction is a mechanism that affects only the argument structure of the governed verb, but does not affect the primary link between semantic roles and arguments. The resulting SUBCAT list of the verbal complex is subject to the relevant grammatical principles, as usual. In the case of the German passive auxiliary werden, which we treat as an ergative raising verb, the blocked external argument of the participle cannot be attracted. 
Rather, the corresponding parameter in the semantics will be existentially bound (if there is no oblique agent phrase). Figure 2 shows the resulting structure of the German participle-auxiliary complex gebissen werd- ('be bitten').</Paragraph> <Paragraph position="3"> 4 as yet unpublished work at IBM Germany, Institute for 5 Note that this approach is intended to capture the formation of the passive in morphologically rich languages such as German and Dutch, where passivization is essentially a morphological process. A different parametrical variation, such as the development of the auxiliary into a syntactic category in English, may lead to a passive construction that requires an analysis in syntactic terms.</Paragraph> <Paragraph position="4"> 6 The term &quot;blocked argument&quot; is borrowed from Haider (1984), who, however, introduced it in a different framework.</Paragraph> <Paragraph position="6"/> <Section position="1" start_page="6" end_page="6" type="sub_section"> <SectionTitle> Auxiliary Complex </SectionTitle> <Paragraph position="0"> On this basis, the effects of Object Promotion follow from the Subcategorization Principle and a structural case theory that replaces the original lexical case theory of Pollard & Sag (1987, 1992; cf. Pollard 1991).</Paragraph> <Paragraph position="1"> Technically, arguments in the lexical entry of a verb are marked by a case-type feature. The Subcategorization Principle handles the arguments of the verbal complex in the usual way. The new Case Principle either realizes the structural case type by a nominative or accusative value (in languages such as German and English), or checks for the instantiation of the values for the lexical case type. Due to our structural case theory, we reject an isomorphic relation between the order of elements on the SUBCAT list and the so-called hierarchy of grammatical functions fixed in the lexicon. Rather, we define grammatical functions, quite GB-like, in structural terms. 
From this angle, the order of elements on the SUBCAT list is nothing but a lexically fixed default prominence order of arguments. If the first argument on the basic SUBCAT list of a verb has been blocked, i.e., relegated to the BLOCKED_ARG list, the first subcategorized-for argument has to be integrated in the highest accessible structural position, where it receives nominative case by means of the Case Principle.</Paragraph> <Paragraph position="2"> Figure 3 shows the structural description of the German passive sentence (daß) Peter gebissen wird ('(that) Peter is bitten'). T is the category of the functional element We note in passing that the theory makes the correct predictions for German impersonal passives, i.e., passives without nominatively marked NPs, such as Hier wird getanzt ['here be (3 sg) danced'], Den Männern wird geholfen ['the men (dat pl) be (3 sg) helped'] and Der Opfer wird gedacht ['the victims (gen pl) be (3 sg) remembered']. Since the passive auxiliary attracts all (non-blocked) argument NPs of the participle, impersonal passives are automatically formed if the participle's SUBCAT list is empty (as in the case of getanzt) or contains argument NPs with lexically marked case only (as in the case of geholfen and gedacht). In the latter case, the argument NPs keep their lexically marked morphological form. Impersonal passives lack subjects simply because the least oblique argument NP cannot be structurally case-marked as nominative.</Paragraph> </Section> </Section> <Section position="6" start_page="6" end_page="9" type="metho"> <SectionTitle> 5 The Production of Passives </SectionTitle> <Paragraph position="0"> We differentiate between two types of stimuli that trigger the production of passive sentences. The first is a stimulus external to the linguistic system; the second is a stimulus internal to the linguistic system. 
The two types exemplify different ways in which the relevant cognitive modules - the Conceptualizer and the Formulator - are synchronized in order to jointly perform the task of producing an utterance.</Paragraph> <Paragraph position="1"> The first case can be traced to a condition concerning the content of the conceptual structure CS that the Conceptualizer delivers to the Formulator. CS may include a situation-type concept (e.g., an event-type concept) that is marked for an agentive thematic role, without at the same time including the corresponding agentive RefO. In terms of its underlying cognitive function, this is an extreme case of what has been described as &quot;agent backgrounding&quot; in the typological literature (e.g., Foley & van Valin 1985 and Keenan 1985). There are various motivations for agent backgrounding; the most notable ones are the following: there is a particularly salient or easily inferable agent (e.g., Frank Rijkaard was sent off for knocking down his opponent); the agent is unknown (e.g., My car has been stolen); the situation-type predicate alone is focused, with a corresponding defocusing of the agent (e.g., German impersonal passives of the sort Heute abend wird hier getanzt ['there will be a dance here tonight'; literally: &quot;tonight is here danced&quot;]). The passive formation device allows the Formulator to follow the Conceptualizer's decision to dispense with the agent RefO. Thus, the two modules' principles of information processing tally with each other.</Paragraph> <Paragraph position="2"> More concretely, we assume that the production process involves the following crucial steps: The Conceptualizer delivers a situation type increment whose agent role remains unspecified (or has as yet not been specified).</Paragraph> <Paragraph position="3"> The Lemma Selector chooses an item that matches the corresponding semantic representation. 
Since this is a situational relation lacking its first argument, the participle form of the lemma, whose category is adjectival, is selected. The Functional Inspector completes the categorial requirements of the situation type increment, which actually calls for a verbal category, by initiating a call to the appropriate auxiliary (i.e., werden). The complex form gives rise to a verbal projection whose external argument appears on the BLOCKED_ARG list and therefore is not subject to the Subcategorization Principle. The corresponding parameter is existentially bound in the semantics. Thus, the verbal projection satisfies the grammatical licensing conditions for constructions with non-subcategorized-for external arguments. The second case can be traced to a processing strategy that the Formulator employs when it has to react to the Conceptualizer's selection of a particular RefO as the most prominent referential CS constituent, especially under the constraints of rapid utterance production. In general, as soon as one process component delivers an informational increment to its successor component, the latter strives to process the increment further without delay. As was claimed above, a certain RefO may be the first increment that the Conceptualizer passes to the Formulator due to its being the most prominent conceptual unit in the CS selected for verbalization. Now, a prominent RefO argument may often be made available to the Formulator although its embedding information, such as information about its thematic role in an event type, is unspecified or at least underdetermined. 
In particular, the RefO may be passed to the Formulator prior to the situation-type concept to which it is an argument.</Paragraph> <Paragraph position="4"> In such cases, the Formulator follows the strategy of assigning the most prominent available position in the current utterance segment - i.e., in general, the structural subject position - to the syntactic phrase that verbalizes the RefO, without waiting for information about the RefO's thematic role. 7 However, if it turns out subsequently that the phrase, due to information about the thematic role of the corresponding RefO available later on, does not show the regular argument properties of subjects, principles guiding the Lemma Selection process force the formation of a passive sentence.</Paragraph> <Paragraph position="5"> In this case, the production process involves the following crucial steps: The RefO increment is passed from the Conceptualizer to the Formulator prior to the situation type increment. Following the above mentioned integration heuristic, the Integrator inserts the phrase corresponding to the RefO into the most prominent syntactic position, where it receives the nominative case by the structural Case Principle. No specific information about the RefO's thematic role has been used so far. At some stage in the process, however, such information must become available to the Formulator. We assume that this occurs when the situation type increment enters the Formulator. Lemma Selection is restricted not only by the corresponding SR, but also by information linked to constituents already represented in the temporary utterance fragment constructed so far. In the present case, the Formulator is equipped with additional embedding information. 
It may turn out that the RefO whose realizing phrase has already been integrated in the most prominent position has the standard properties of an internal argument, for example, because it is the theme in an actional event type. (Technically, the relevant embedding information is available via coindexing of the semantics of the already integrated NP and the theme argument of the situation type increment.) Lemma Selection must take this information into account by choosing a lemma with a theme as the highest subcategorized-for argument (i.e., as the final argument to be projected by the Subcategorization Principle). This is exactly the property of the participle form of the lemma appropriate to the situation type increment in question.</Paragraph> <Paragraph position="6"> 7 Abstracting from time factors, the subject position can, in more general terms, be filled without paying attention to the thematic role the RefO in question holds in a situation. This is essentially suggested by experimental studies using a picture-description task, where the presentation of the depiction of an isolated object accompanies the presentation of the depiction of the entire scene involving the object (see, e.g., Turner & Rommetveit 1968). The additional presentation of the patient object, which raises its prominence in memory, often sets off the test subjects on the passive voice. See also the study reported in Tannenbaum & Williams (1968), in which drawing the speaker's attention to either the agent or the patient in a situation by means of verbal cues, thereby manipulating the speaker's memory access, also affected the choice of verbal voice.</Paragraph> <Paragraph position="7"> From here on, the process of passive sentence formation proceeds as in the first case.</Paragraph> </Section> </Paper>