<?xml version="1.0" standalone="yes"?>
<Paper uid="C88-1038">
  <Title>Sequencing in a Connectionist Model of Language Processing</Title>
  <Section position="3" start_page="0" end_page="186" type="intro">
    <SectionTitle>
2. Connectionism and Language Processing
</SectionTitle>
    <Paragraph position="0"> In recent years there has been increasing interest in cognitive models built on networks of simple processing units which ~espond to the parallel spread of activation through the network (Feldman &amp; Ballard 1982, McClelland, Rumelhart, &amp; the PDP Research Group 1986). In the area of natural language processing, these models, generally referred to as eorurectlonist, have been shown to exhibit interesting properties not shared by more conventional symbolic approaches. In particular, connectionist approaches to language ,analysis (e.g., Cottrell &amp; Small 1983, McClelland &amp; Kawamoto 1986, Waltz &amp; Pollack 1985) are able to model priming effects and the interaction of different knowledge sources in lexical access. There have been only limited attempts to apply connectionist models to language generation (e.g., Dell 1986, Kukich 1986) bot the potential there is also clear. While generation is usually conceived of as a top-down process involving sequential stages, it also involves bottom-up aspects, a good deal of parallelism, and &amp;quot;leaking&amp;quot; between tbe various stages, in addition to the priming effects which are handled well by spreading activation approaches.</Paragraph>
    <Paragraph position="1"> Still, there are significant problems to be surmounted when treating language processing in a connectionist framework. An important one is the representation and utilization of i~ffmmation about the sequencing of constituents. While information about serial order is certainly a key element in parsing, it has been possible in existing connectionist parsing schemes to avoid dealing with this problem because of the limited sets of examples that are treated. Generation is another matter: no sentence can be generated without attention to the ordering of constituents. If connectionism is to succeed as an approach to human language processing, it must be possible to handle this sort of information within the confines imposed by the framework. This paper presents a localized connectionist model of language generation in which sequencing is dealt with in terms of basic features characteristic of these models: spreading activation, firing thresholds, and mutual inhibition. The same sequencing information is also usable during parsing. Most importantly, the approach offers a psychologically plausible account of sequencing in which syntactic and semantic factors interact to yield a particular ordering. The model is implemented in a program called CHIE which has been used to test tbe model's adequacy for a limited set of English and Japanese structures.</Paragraph>
    <Paragraph position="2"> 3. A Framework for Conneetionist Language Processing In this section we give an overview of knowledge representation and processing in the model. The main features of the model are the following:  1. Memory consists of a network of nodes joined by weighted connections. The system's knowledge is embodied entirely in i~5 these connections.</Paragraph>
    <Paragraph position="3"> 2. Conceptsare represented as schemas consisting of subnetworks of the memory.</Paragraph>
    <Paragraph position="4"> 3. The basic units of linguistic knowledge are schematic  subnetworks associating form directly with function. These formfunction mappings comprise an inventory from which selections are made during generation and parsing.</Paragraph>
    <Paragraph position="5"> 4. Formally, the linguistic units are composed of surface-level patterns ranging from phrasal lexical pattems to purely syntactic patterns.</Paragraph>
    <Paragraph position="6"> 5. Processing consists in the parallel spread of activation through the network starting with nodes representing inputs. The amount of activation spreading along a connection depends on the connection's weight and may be either positive (excitatory) or negative (inhibitory). Activation on nodes decays over time.</Paragraph>
    <Paragraph position="7"> 6. Decision making in the model takes the form of competition among sets of mutually inhibiting nodes and the eventual dominance of one over the others.</Paragraph>
    <Paragraph position="8"> 7. Processing is more interactional than modular. Pragmatic, semantic, and syntactic information may be involved simultaneously in the selection of units of linguistic knowledge. The model provides a better account of human language generation than other computational models. In particular, it offers these advantages:  1. Parallelism and competition, which characterize human language generation, are basic features of the model.</Paragraph>
    <Paragraph position="9"> 2. Priming effects are naturally accommodated. Nodes are primed when there is activation remaining on them as a result of recent processing, and priming disappears as activation decays.</Paragraph>
    <Paragraph position="10"> 2. The system exhibits robusmess in that it can find patterns to match conceptual input even when there are no perfect matches.</Paragraph>
    <Paragraph position="11"> 3. The approach allows for a combination of top-down (goal-driven) and bottom-up (context-driven) processing.</Paragraph>
    <Paragraph position="12"> 4. Generation in the model is flexible because spreading activation automatically finds alternate ways of conveying particular concepts.</Paragraph>
    <Paragraph position="13"> 5. Linguistic and non-linguistic knowledge take the form of  tendencies with degrees of associated strength rather than strict rides or constraints.</Paragraph>
    <Paragraph position="14"> The model is described in detail in Gasser (1988).</Paragraph>
    <Section position="1" start_page="0" end_page="186" type="sub_section">
      <SectionTitle>
3.1. Linguistic Memory
</SectionTitle>
      <Paragraph position="0"> Memory in the model is a localized connectionist implementation of a semantic network similar to Fahlman's NETL (1979). In NETL roles (slots), such as ACTOR, COLOR, and SUBJECT, take the form of nodes rather than links, and links are confined to a small primitive set representing in particular the IS-A, HAS-A, and DISTINCTNESS relations. In the present model, semantic network links are replaced by pairs of weighted, directed connections of a single type, one connection for each direction.</Paragraph>
      <Paragraph position="1"> Linguistic knowledge is integrated into the rest of memory.</Paragraph>
      <Paragraph position="2"> 'Ille basic units of linguistic knowledge are generalizations of two types of acts: illocntions and utterances. In this paper we will be mainly concerned with the latter. A generalized utterance (GU) is a schema (implemented as a network fragment) associating a morphosyntactic pattern with a semantic content and possibly contextual factors. GUs include schemas for clauses, noun phrases, adjective phrases, and prepositional phrases. They are arranged in a generalization hierarchy with syntactic structures at its more general end and phrasal lexical entries at its more specific end. Thus lexical entries in the model are just a relatively specific type ofGU. A GU normally has a node representing the whole phrase, one or more nodes representing constituents of the phrase, and one or more nodes representing semantic or pragmatic aspects of the phrase.</Paragraph>
      <Paragraph position="3"> Figure 1 shows how a lexical ennui would be represented in a simplified version of the system which does not incorporate information about sequencing. Nodes are denoted by rectangles and pairs of connections by lines. For convenience schema boundaries are indicated by fuzzy rectangles with rounded comers, but these boundaries have no significance in processing. Node names likewise  are shown for convenience only; they are not accessible to the basic procedures. Names of lexical entries begin with an asterisk. I.owercase names indicate roles, and role names preceded by a colon are abbreviations of longer names. In the figure, for example, &amp;quot;:content&amp;quot; represents the CONTENT of *SEND-MAIL. The lexical entry shown in the figure, *SEND-MAIL, represents clauses with a form of the word send as their main verb, the concept of ABSTRACT-TRANSFER as their CONTENT, and MAIL as the MEANS of the transfer. The schema is represented as a subtype of the general schema for clauses, from which *SEND-MAIL implicitly inherits other iniormation (not shown in the figure).</Paragraph>
      <Paragraph position="4"> Note that the *SEND-MAIL entry includes tile information needed to associate semantic and synaetic roles. For example, there is a connection joining the CONTENT of the SUBJECT 3 constituent with the ACTOR of the CONTENT of the whole clause, that is, the person performing the instance of ABSTRACT-TRANSFER that is being referred to. The other two constituents shown represent the noun phrases referring to the semantic OBJECT and the RECIPIENT of the ABSTRACT-TRANSFER. The former could also be referred to as the &amp;quot;direct object&amp;quot; of the clause. The latter is realized either as an &amp;quot;indirect object&amp;quot;, as in Mary sent John the letter, or a prepositional phrase with to, as in Mary sent the letter to John.</Paragraph>
    </Section>
    <Section position="2" start_page="186" end_page="186" type="sub_section">
      <SectionTitle>
3.2. Processing in General
</SectionTitle>
      <Paragraph position="0"> Each node in the network has at any given time an activation level. When the activation of a node reaches its filing threshold, the node fires and sends activation along all of its output comaections. The firing of a node represents a decision made by the system. For example, the selection of a schema matching an input pattern is represented by the firing of the head node of the schema. Following firing, a node is inhibited for an interval during which its state is unaffected by inputs from other nodes. After this interval has passed, the node retains a small amount of positive activation and can be further activated from other nodes.</Paragraph>
      <Paragraph position="1"> Tim amount of activation spreading from one node to another is proportional to the weight on the connection from the source to the destination node. The weight may be high enough to cause the destination node to fire on the basis of that activation alone. For example, when activation spreads along a cmmection from an instance to a type node, say, from JOHN to HUMAN, we generally want the type node to fire immediately. In most cases, however, activation from more than one source is required for a node to fire. Connection weights may also be negative, in which case the relationship is an inhibitory one because the negative activation spread lessens the likelihood of the destination node's firing.</Paragraph>
      <Paragraph position="2"> To simulate parallelism, the process is broken into time steps.</Paragraph>
      <Paragraph position="3"> During each time step, activation spreads from each firing node to the set of nodes directly connected to it. (In some cases activation may continue to spread beyond this point.) Sometimes we want only one node from a set to fire at a given time. For example, in the generation of a clause, the system should select only one of the set of verb lexical entries. In such cases the members of the set form a network of mutually inhibiting nodes called a winner-take-all (WTA) network (Feldman &amp; Ballard 1982).</Paragraph>
      <Paragraph position="4"> The nodes art; activated through the firing of a source node which is connected to all of the network members. At this time one of the network memher nodes may already have enough activathm to fire. If not, a specified interval is allowed to pass and if none of the members has yet fired, they receive additional activation, which is usually enough to cause one of them to fire. In any case, when one of the nodes fires, it immediately inhibits the others, effectively preventing them from firing for the time being.</Paragraph>
    </Section>
    <Section position="3" start_page="186" end_page="186" type="sub_section">
      <SectionTitle>
3.3. Language Processing
</SectionTitle>
      <Paragraph position="0"> Language processing can be viewed as a series of selections, eacll made or, the basis of a set of factors which make quantitative contributions to the decisions. During sentence generation the items selected include general morphosyntaetic patterns for the sentence and its constituents (e.g., STATEMENT, COULD-YOU-QIJESTION, COUNTABLF.-NP, etc.) and a set of lexical items to fill the slots in these patterns. Dining sentence ~malysis the items selected include word senses, semantic roles to be assigned to referents, and intentions to be attributed to the speaker.</Paragraph>
      <Paragraph position="1"> In the present model the selection process is implemented in terms of 1) the parallel convergence of activation on one or more candidate nodes and 2) the eventual domin,'mce of one of these nodes over the others as a result of mutual inhibition through a WTA network. Consider the case of lexical selection in generation. All lcxical entries, such as *SEND-MAIL above, have a CONTENT role, and it is through this role that entries are selected during generation. Activation converges on the CONTENT role of a lexical entry starting from nodes representing conceptual features of an input. Any number of lexical e\[mies may receive some activation for a given input, but N~.canse the CONTENT roles of entries inhibit each other through a WTA network, only one is selected.</Paragraph>
      <Paragraph position="2"> Input to generation consists of a set of firing nodes representing a goal of the speaker. As activation spreads from the input nodes, it conw:rges on nodes representing a general pattern appropriate for the goal type, for example, the STATEMENT pattern, and a set of patterns apprnpdate for the propositional content of the goal. These include lexical patterns such as *SEND-MAIL and *LETTER as well as gl~unmatical patterns such as PAST-CLAUSE and INDEFINITE-NP.</Paragraph>
      <Paragraph position="3"> While some important aspects of parsing have not yet been implemented in CItlE, the basic mechanism works for parsing as well as for generation. Input consists of firing nodes representing words.</Paragraph>
      <Paragraph position="4"> These are given to the progran~ at intervals of four time steps.</Paragraph>
      <Paragraph position="5"> Activation from the word nodes converges on entries for lexical and syntactic patterns. For definite noun phrases, this leads to the firing of nodes representing referents. Verb entries specify the general proposition types and also provide for temporary &amp;quot;role binding&amp;quot;. Role binding amomtts to the firing in close proximity of a node or set of nodes representing a referent and a node representing its semantic role in the proposition. However, the program, like most other connectionist models, currently has no way of storing these role bindings in long-term memory.</Paragraph>
      <Paragraph position="6"> The model also has a decay mechanism reflecting the importance of recency in processing. The activation level of all nodes decreases at a fixed rate.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>