<?xml version="1.0" standalone="yes"?>
<Paper uid="W90-0103">
  <Title>A Connectionist Treatment of Grammar for Generation: Relying on Emergents</Title>
  <Section position="4" start_page="15" end_page="15" type="metho">
    <SectionTitle>
3 The FIG Approach to Generation
</SectionTitle>
    <Paragraph position="0"> Reduced to bare essentials, a generator's task is to get from concepts (what the speaker wants to express) to words (what he can say). On this view, the key problem in generation is computing the relevance (pertinence) of a particular word, given the concepts to express. Syntactic and other knowledge mediates this computation of relevance.</Paragraph>
    <Paragraph position="1"> Accordingly FIG is based on word choice -- every other consideration is analyzed in terms of how it affects word choice.</Paragraph>
    <Paragraph position="2"> FIG is based on a large semantic network. Words are nodes in the network, the activation they receive represents evidence for their relevance. The basic FIG algorithm is:  1. each node of the input is a source of activation 2. activation flows through the network 3. when the network settles, the most highly activated word is selected and emitted 4. activation levels are updated to represent the new current state 5. steps 2 through 4 repeat until all of the input has been conveyed  Thus FIG is an incremental generator. Its network must be designed so that, when it settles, the node which is most highly activated corresponds to the best next word. This paper discusses only the network structures which encode syntactic knowledge.</Paragraph>
    <Paragraph position="3"> Elsewhere I argue that FIG points the way to accurate and flexible word choice (Ward 1988), producing natural-sounding output for machine translation (Ward 1989c), and modeling the key aspects of the human language production process (Ward 1989a).</Paragraph>
  </Section>
  <Section position="5" start_page="15" end_page="16" type="metho">
    <SectionTitle>
4 Connectionist Syntax: Overview
</SectionTitle>
    <Paragraph position="0"> In FIG constructions and constituents also are represented as nodes in the knowledge network. Their activation levels represent their current relevance. They interact with other nodes by means of activation flow. Any number of constructions can be simultaneously active. This handles part-wise parallelism, competition, and superimposition.</Paragraph>
    <Paragraph position="1"> Syntactic considerations manifest themselves only through their effects on the activation levels of words (directly or indirectly). An utterance is simply the result of successive word choices. FIG does produce grammatical sentences, most of the time, but their 'syntactic structure' is emergent, a side-effect of expressing the meaning. Thus we can say that the syntactic form of utterances is emergent in FIG 2. This point will be illustrated repeatedly in Section 6.</Paragraph>
    <Paragraph position="2"> Mechanisms developed by linguists (and often adopted by generation researchers), such as unification, are not directed to the task of generation (or parsing) so much as to the goal of explaining sentence structure. Accounting for the structure of sentences may be a worthwhile goal for lingnistics, but building syntactic structures is not necessary for language generation, as subsequent sections will show.</Paragraph>
    <Paragraph position="3"> The most common metaphor for generation is that of making choices among alternatives. For example, a generator may choose among words for a concept, among ways to syntactically realize a constituent, and among concepts to bind to a slot. Given this metaphor, organizing choices becomes the key problem in generator design. Attempts to build parallel generators while retaining the notion of explicit choice run up against problems of sequencing the choices or of doing bookkeeping so that the order of choices can vary. This appears to be difficult, judging by the general paucity of published outputs in descriptions of parallel generators. On the other hand, relying on emergents means 2post hoe examination of FIG output might make one think, for example, 'this exhibits the choice of the existential-there construction.' In HG there is indeed an inhibit link between the nodes ex-there and subj-pred, and so when generating the network tends to reach a state where only one of these is highly activated. The most highly activated construction can have a strong effect on word choices, which is why the appearance of syntactic choice arises.</Paragraph>
    <Paragraph position="4">  there are no explicit choices to worry about, and thus there are no problems of ordenng or bookkeeping at all(Ward 1989b).</Paragraph>
    <Paragraph position="5"> In FIG all types of knowledge represented are uniformly in the network, and interact freely at run time. FIG not only allows this kind of interaction among various considerations when generating, it relies on it. It relies on synergy among constructions in the same way that Construction Grammar does. It relies on synergy between semantic and syntactic considerations, as seen below in Section 6.7. It also enables interaction among lexical choices and syntactic considerations. null</Paragraph>
  </Section>
  <Section position="6" start_page="16" end_page="17" type="metho">
    <SectionTitle>
5 Knowledge of Syntax
</SectionTitle>
    <Paragraph position="0"> This section presents FIG's representation of knowledge, first presenting it in a declarative form then showing how that representation maps into network structures.</Paragraph>
    <Paragraph position="1"> Starting with this section I will be largely describing FIGas-implemented, as of May 1990. This is for the sake of concreteness. The theory, however, is intended to apply to parallel generators in general. Moreover, the syntactic knowledge presented in this section is purely illustrative. I do not claim that these represent the facts of English, nor the best way to describe them in a grammar. In particular, many generalizations are not captured. The examples are intended simply to illustrate the representational tools and computational mechanisms available in FIG. Many details are left unexplained for lack of space.</Paragraph>
    <Paragraph position="2"> Figure 1 shows FIG's definition of noun-phr, representing the English noun-phrase construction. This construction has three constituents: np-1, np-2, and np-3. rip-1 and np-3 are obligatory, np-2 is optional. Glossing over the details for the moment, the list at the end of each constituent's definition specifies how to realize the constituent. For example, np-1, np-2, and np-3, should be realized as an article, adjective, and noun, respectively.</Paragraph>
    <Paragraph position="3"> Figure 2 shows the construction for the case frame of the word &amp;quot;go.&amp;quot; First comes go-w, for the word &amp;quot;go,&amp;quot; which is obligatory. Next come (optionally): a verb-particle representing direction (as in &amp;quot;go away&amp;quot; or &amp;quot;go back home&amp;quot; or &amp;quot;go down to the lake&amp;quot;), a prepositional phrase to express the destination, and a propose clause.</Paragraph>
    <Paragraph position="4"> Figure 3 shows the representation of the existential &amp;quot;there&amp;quot; construction, as in &amp;quot;there was a poor cobbler.&amp;quot; The 'inhibit' field indicates that this construction is incompatible with the passive construction and also with subj-pred, the construction responsible for the basic SVO ordering of English.</Paragraph>
    <Paragraph position="5"> Figure 4 shows knowledge about when and where constructions are relevant. Bdetty, constructions are associated with words, with concepts, and with other constructions.</Paragraph>
    <Paragraph position="6"> Constructions are associated with the meanings they can express. For example, ex-there is listed under the concept introductory, representing that this construction is appropriate for introducing some character into the story, and purpose-clause is listed as a way to express the purposer relation.</Paragraph>
    <Paragraph position="7"> Constructions are associated with words. For example go-p is the 'valence' (case frame) of go-w and noun-phr is the 'maximal' of cnoun.</Paragraph>
    <Paragraph position="8"> Constructions are also associated with other constructions. For example, the fourth constituent of go-p subcategodzes for purpose-clause (Figure 2); and there are negative associations among incompatible constructions, for example the 'inhibit' link between ex-there and subj-pred (Figure 3).</Paragraph>
    <Paragraph position="9"> Figure 5 shows a fragment of FIG's network, where the numbers on the links are their weights. This is partially  specified by the knowledge shown in the previous figures.</Paragraph>
    <Paragraph position="10"> The mapping from s-expressions to network structures is not quite trivial. For example, the link from noun to peaehw comes from the statements that peachw has 'subcat' cnoun and that cnoun has 'bigcat' noun. Similarly, the link from peaehw to noun-phr is inherited by peachw from the 'maximals' information on cnoun.</Paragraph>
  </Section>
  <Section position="7" start_page="17" end_page="18" type="metho">
    <SectionTitle>
6 Various Syntactic Phenomena
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="17" end_page="17" type="sub_section">
      <SectionTitle>
6.1 Constituency
</SectionTitle>
      <Paragraph position="0"> The links described above suffice to handle constituency.</Paragraph>
      <Paragraph position="1"> Consider for example the fact that common nouns must be preceded by articles in FIG's subset of English. Suppose that peachw is activated, perhaps because a peache concept is in the input. Activation flows from peachw via nounphr, rip-l, and article to a-w and the-w.</Paragraph>
      <Paragraph position="2"> In this way the relevance of a noun increases the relevancerating of articles. Provided that other activation levels are appropriate, this will cause some article to become the most highly activated word, and thus be selected and emitted. Note that FIG does not first choose to say a noun, then decide to say an article; rather the these 'decisions' emerge as activation levels settle.</Paragraph>
      <Paragraph position="3"> Any node can be mentioned by a constituent, thus constructions can specify: which semantic elements to include (metonymies), what order to mention things in, what function words to choose, and what inflections to use.</Paragraph>
    </Section>
    <Section position="2" start_page="17" end_page="17" type="sub_section">
      <SectionTitle>
6.2 Subcategorization
</SectionTitle>
      <Paragraph position="0"> Consider the problem of specifying where a given concept should appear and what syntactic form it should take.</Paragraph>
      <Paragraph position="1"> In FIG this is handled by simultaneously activating a concept node and a syntactic construction or category node. For example, the third constituent of go-p specifies that 'the direction of the going' be expressed as a 'verbal particle.' Activation will thus flow to an appropriate word node, such as downw, both via the concept filling the directionr slot and via the syntactic category vparticle. Thanks to this sort of activation flow FIG tends to select and emit an appropriate word in an appropriate form (Ward 1988). Government, for example, the way that some verbs govern case markers, is handled in the same way.</Paragraph>
    </Section>
    <Section position="3" start_page="17" end_page="18" type="sub_section">
      <SectionTitle>
6.3 Word Order
</SectionTitle>
      <Paragraph position="0"> In an incremental connectionist generator, at each time the activation level of a word must represent its current relevance. In particular, words which are currently syntactically appropriate must be strongly activated. In FIG the representation of the current syntactic state is distributed across the constructions. There is no central process which plans or manipulates word order; each construction simply operates  independently. More highly activated constructions send out more activation, and so have a greater effect. But in the end, FIG just follows the simple rule, 'select and emit the most highly activated word.' Thus word order is emergent.</Paragraph>
      <Paragraph position="1"> In FIG the current syntactic state is encoded in constructions' activation levels and 'cursors.' The cursor of a construction points to the currently appropriate constituent and ensures that it is relatively highly activated. To be specific, the cursor gives the location of a 'mask' specifying the weights of the links from the construction to constituents.</Paragraph>
      <Paragraph position="2"> The mask specifies a weight of 1.0 for the constituent under the cursor, and for subsequent constituents a weight proportional to their closeness to the cursor. (Subsequent constituents must receive some activation so that there is part-wise parallelism.) (For unordered constructions the weights on all construction-constituent links are the same.) For example, when the cursor of noun-phr points to np1, articles receive a large proportion of the activation of noun-phr. Thus, an article is likely to be the most highly activated word and therefore selected and emitted. After an article is emitted the cursor is advanced to np-2, and so on.</Paragraph>
      <Paragraph position="3"> Advancing cursors is described in Section 6.5.</Paragraph>
      <Paragraph position="4"> In accordance with the intuition that a word is not truly appropriate unless it is both syntactically and semantically appropriate, the activation level for words is given by the product (not the sum) of incoming syntactic and semantic activation, where 'syntactic activation' is activation received from constituents and syntactic categories. The problem with simply summing is that it results in the the network often being in a state where many word-nodes have nearly equal activation, which makes the behavior is oversensitive to minor changes in link weights.</Paragraph>
    </Section>
    <Section position="4" start_page="18" end_page="18" type="sub_section">
      <SectionTitle>
6.4 Optional Constituents
</SectionTitle>
      <Paragraph position="0"> When building a noun-phrase a generator should emit an adjective if semantically appropriate, otherwise it should ignore that option and emit a noun next. FIG does this without additional mechanism.</Paragraph>
      <Paragraph position="1"> To see this, suppose &amp;quot;the&amp;quot; has been emitted and the cursor of noun-pbr is on its second constituent, np-2. As a result adjectives get activation, via rip-2, and so to a lesser extent do nouns via np-3. There are two cases: If the input includes a concept linked (indirectly perhaps) to some adjective, that adjective will receive activation from it. In this case the adjective will receive more syntactic activation than any noun does, and hence have more total activation, so it will be selected next. If the input does not include any concept linked to an adjective, then a noun will have more activation than any adjective (since only the noun receives semantic activation also), and so a noun will be selected next.</Paragraph>
      <Paragraph position="2"> Most generators use some syntax-driven procedure to inspect semantics and decide explicitly whether or not to realize an optional constituent. In FIG, the decision to include or to omit an optional constituent (or adjunct) is emergent -- ff an adjective becomes highly activated it will be chosen, in the usual fashion, otherwise some other word, most likely a noun, will be.</Paragraph>
    </Section>
    <Section position="5" start_page="18" end_page="18" type="sub_section">
      <SectionTitle>
6.5 Updating Constructions
</SectionTitle>
      <Paragraph position="0"> Recall that FIG, after selecting and emitting a word, updates activation levels to represent the new state. There are are several aspects to this.</Paragraph>
      <Paragraph position="1"> The cursors of constructions must advance as constituents are completed. The update mechanism can 'skip over' 'opt constituents, since, for example, ff there are no adjectives, the cursor of noun-phr should not remain stuck forever at the second constituent. More than one construction may be updated after a word is output, for example, emitting a noun may cause updates to both the prep-phr construction and the noun-phr construction.</Paragraph>
      <Paragraph position="2"> Constructions which are 'guiding' the output should be scored as more relevant. Therefore the update process adds activation to those constructions whose cursors have changed and sets temporary lower bounds on their activation levels. Thus, even though FIG does not make any syntactic plans, it tends to form a grammatical continuation of whatever it has already output. After the last constituent of a construction has been completed, the cursor is reset and the lower bound is removed.</Paragraph>
      <Paragraph position="3"> Why is a separate update mechanism necessary? Most generators simply choose a construction and 'execute' it straightforwardly. However, in FIG no construction is ever 'in control.' For example, one construction may be strongly activating a verb, but activation from other constructions may 'interfere,' causing an adverbial, for example, to be interpolated. Therefore constructions need this kind of feed-back on what words have been output.</Paragraph>
    </Section>
    <Section position="6" start_page="18" end_page="18" type="sub_section">
      <SectionTitle>
6.6 No Instantiation or Binding
</SectionTitle>
      <Paragraph position="0"> It is not obvious that notions of instanfiafion, binding, embedding, or recursion are essential for the description of natural language. Nor are mechanisms for these things essential for the generation task, I conjecture. This subsection considers a problem which is usually handled with instantiation and shows how it can be handled more simply without.</Paragraph>
      <Paragraph position="1"> Consider the problem of generating utterances with multiple 'copies,' for example, several noun phrases, or several uses of &amp;quot;a&amp;quot;. Note that FIG as described so far would have problems with this. For example since all words of category cnoun have links to noun-phr, that node might receive more activation than appropriate, in cases when several nouns are active. This could result in over-activation of articles, and thus premature output of &amp;quot;the,&amp;quot; for example.</Paragraph>
      <Paragraph position="2"> In fact FIG uses a special rule for activation received across inherited links: the maximum (not the sum) of these amounts is used. For example, this rule applies to the 'maximal' links from nouns to noun-phr, thus noun-phr effectively 'ignores' all but the most highly activated noun. (This was not shown in Figure 5.)  An earlier version of FIG handled this by actually making copies. For example, it would make a copy of noun-phr for each noun-expressible concept, and bind each copy to the appropriate concept, and to copies of a-w and thew. This worked but it made the program hard to extend.</Paragraph>
      <Paragraph position="3"> In particular, it was hard to choose weights such that the network would behave properly both before and after new nodes were inslantiated and linked in.</Paragraph>
    </Section>
    <Section position="7" start_page="18" end_page="18" type="sub_section">
      <SectionTitle>
6.7 Low-level Coherence
</SectionTitle>
      <Paragraph position="0"> Words must stand in the correct relations to their neighbors. For example, a generator must not produce &amp;quot;the big man went to the mountain&amp;quot; when the input calls for &amp;quot;the man went to the big mountain&amp;quot;. This is the problem of emitting the right adjective at the right time, or, in Other words, only emitting adjectives that stand in an appropriate relation to the head noun.</Paragraph>
      <Paragraph position="1"> Most generators handle this easily with structure-mapping or pointer following. For example, a syntax-directed generator may, whenever building a noun phrase, traverse the 'modified-by' pointer to find the item to turn into an adjective. FIG, however, eschews structure manipulation and pointer following. Like all connectionist approaches, therefore, it is potentially subject to problems with crosstalk.</Paragraph>
      <Paragraph position="2"> The way to avoid this is to ensure that related concepts become highly activated together. In the example, bige should become activated together with mountainc, not together with old-mane. Using a more elaborate terminology, this means that there should be some kind of 'focus of attention' (Chafe 1980), which successively 'lights up' groups of related nodes.</Paragraph>
      <Paragraph position="3"> This condition is met in FIG, thanks to the links among the nodes of the input. For example, if mountaincl is linked by a sizer link to bigel, then bigcl will tend to become highly activated whenever mountaincl is. Thus, when oldmancl is the most highly activated concept-node, bigel will only receive energy from it indirectly (via an inverseagentr link, a locationr link, and a sizer link) and thus will not be activated sufficiently to interfere early in the sentence. null</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="18" end_page="20" type="metho">
    <SectionTitle>
7 Example
</SectionTitle>
    <Paragraph position="0"> This section describes how FIG produces &amp;quot;the old woman went to a stream to wash clothes.&amp;quot; For this example the input is the set of nodes go-el, old-womancl, washelothescl, streamcl, and paste, linked together as shown in Figure 6. (The names of the concepts have been anglicized for the reader's convenience.) (Boxes are drawn around nodes in the input so that they can be easily identified in subsequent diagrams.) Initially each node of the input has 11 units of activation.</Paragraph>
    <Paragraph position="1"> After activation flows, before any word is output, the most highly activated word node is the-w, primarily for the reasons shown in Figure 7. Figure 8 shows the activation levels of selected nodes.</Paragraph>
    <Paragraph position="2"> After &amp;quot;the&amp;quot; is emitted the update mechanism activates noun-phr and advances its cursor to np-2. The most highly activated word becomes old-womanw, largely due to activation from np-3.</Paragraph>
    <Paragraph position="3"> After &amp;quot;old woman&amp;quot; is emitted noun-phr is reset -- that is, the cursor is set back to np-1 and it thereby becomes ready to guide production of another noun phrase. Also, now the cursor on subj-pred advances to sp-2. As a result verbs, in particular go-w, become highly activated.</Paragraph>
    <Paragraph position="4">  go-w is selected. Because pastc has more activation than presentc, infinitivec and so on, go-w is inflected and emitted as &amp;quot;went&amp;quot; (the inflection mechanism is not described in this paper), go-p's cursor advances to its second constituent, thus it activates directional particles, although there is no semantic input to any such word in this case. tolw becomes the most highly activated word, primarily for the reasons shown in Figure 9.</Paragraph>
    <Paragraph position="5"> After &amp;quot;to&amp;quot; is emitted, the cursor of prep-phris advanced. The key path of activation flow is now from the second constituent of prep-phr to noun to streamw to noun-phr to article to a-w. Thus a is selected. The inflection mechanism produces &amp;quot;a&amp;quot; not &amp;quot;an&amp;quot; since consnt-initial is more highly activated than vowel-initial.</Paragraph>
    <Paragraph position="6"> Then the cursor of noun-phr advances and &amp;quot;stream&amp;quot; is emitted. After this the cursor of go-p advances to gp-4.</Paragraph>
    <Paragraph position="7"> From this constituent activation flows to purpose-clause, and in due course &amp;quot;to&amp;quot; and &amp;quot;wash clothes&amp;quot; are emitted. Now that all the nodes of the input are expressed, FIG ends, having produced &amp;quot;the old woman went to a stream to wash clothes.&amp;quot;</Paragraph>
  </Section>
  <Section position="9" start_page="20" end_page="21" type="metho">
    <SectionTitle>
8 About the Implementation
</SectionTitle>
    <Paragraph position="0"> I have used a connectionist model because it is a good way to explore interactivity, parallelism, emergents, not because of fondness for connectionism-for-its-own-sake.</Paragraph>
    <Paragraph position="1"> Thus I have not attempted to develop a distributed connectionist model. Distributed models do have various advantages, such as elegant handling of generalizations and the potential for learning. Yet the current state of PDP technology does not seem up to building an interactive model of a complex task like language generation. I therefore developed FIG as a structured (localis0 connectionist system.</Paragraph>
    <Paragraph position="2"> I have also not attempted to make FIG a 'pure' connectionist model. For example, updating constructions is currently done by a special process that goes in and changes activation levels and moves the cursor. (This process uses the third elements in the constituent descriptions of Figures 1-3, not previously discussed.) FIG could be made more 'pure' by doing this connectionistically, perhaps by adding new nodes with special properties. But this change would not improve FIG's performance, since there seems no need for the update process to interact with the other processes.</Paragraph>
    <Paragraph position="3"> A connectionist model of computation allows parallelism and emergents, but it certainly does not require them. Indeed, other generators built using structured connectionism (Kalita &amp; Shastri 1987; Gasser 1988; Kitano 1989; Stolcke 1989) do not appear to exploit parallelism much, nor do they exhibit emergent properties. For example, Gasser's CHIE relies heavily on winner-take-all subnetworks, which cuts down on the amount of effective parallelism. Also, far from exploiting emergents, CHIE uses 'neuron firings' to model syntactic choices; these happen sequentially and the  exact order and timing of firings seems crucial.</Paragraph>
    <Paragraph position="4"> Currently FIG has about 350 nodes and 1000 links. Before each word choice, activation flows until the network settles down, with cutoff after 9 cycles. This takes about .2 seconds per word on average, simulating parallel activation flow on a Symbolics 3670 (1.6 seconds on a Sun 3/140)~ The correct operation of FIG depends on having correct link weights. I have no theory of weights, indeed rinding appropriate ones is still largely an empirical process. However there are regularities, for example, all 'inhibit' links have weight .7, almost all links from syntactic categories to their members have weight .5, and so on. Many of the weights have a rationale: for example, the link from rip-1 to articles has a relatively high weight because articles get very little activation from other sources. No single weight is meaningful; the way it functions in context is. For example, the exact weight of the link from the first constituent of subj-pred to noun is not crucial, as long as the product of it and the weight on the agentr relation is appropriate.</Paragraph>
    <Paragraph position="5"> FIG's knowledge is, of course, very limited. Adding new concepts, words or constructions is generally straightforward; they can be encoded by analogy to similar nodes, and usually the same link weights suffice. Occasionally new nodes and links interact with other knowledge in the system in unforeseen ways, causing other nodes to get too much or too little activation. In these cases it is necessary to debug the network. Sometimes trial-and-error experimentation is required, but often the acceptable range of weights can be determined by examination. This is a kind of back-propagation by hand; it could doubtless be automated.</Paragraph>
  </Section>
class="xml-element"></Paper>