<?xml version="1.0" standalone="yes"?>
<Paper uid="C80-1026">
  <Title>A PRODUCTION SYSTEM MODEL OF FIRST LANGUAGE ACQUISITION</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> AMBER is a model of first language acquisition that improves its performance through a process of error recovery. The model is implemented in ACTG, an adaptive production system language. AMBER starts with the ability to say only one word at a time, but adds rules for inserting additional words in the correct order, based on comparisons between predicted and observed sentences. These insertion rules may be overly general and lead to errors of commission; in turn, these lead to more conservative rules with additional conditions. AMBER's learning mechanisms account for many of the developments observed in children's speech.</Paragraph>
    <Paragraph position="1"> Introduction The acquisition of language has been a popular topic among researchers in Artificial Intelligence. Impressive language learning programs have been developed by Siklossy \[1\], Hedrick \[2\], Anderson \[3\], Selfridge \[4\], and Berwick \[5\]. The generality and power of these systems vary greatly, but they share one characteristic: none of the programs provide a psychologically plausible model of children's language learning.</Paragraph>
    <Paragraph position="2"> In this paper I describe the beginnings of a more realistic model of first language acquisition. This model is called AMBER, an acronym for Acquisition Model Based on Error Recovery. As its name implies, the model simulates the incremental nature of the child's language learning process. AMBER is concerned with the production component of children's speech, since most of the reliable data relate to production rather than the understanding process.</Paragraph>
    <Paragraph position="3"> Below I summarize the major developments found during this period.</Paragraph>
    <Paragraph position="4"> After this, I present an overview of ACTG, the production system language in which the model is stated. Next I consider some assumptions about the child's linguistic knowledge at various stages during the learning process.</Paragraph>
    <Paragraph position="5"> After considering the initial and final stages of AMBER, I discuss the learning mechanisms leading to the transition process. Finally, I consider the limitations of the model and propose directions for future research.</Paragraph>
    <Paragraph position="6"> The Major Phenomena Children do not learn language in an all-or-none fashion. They begin their linguistic careers uttering one word at a time, and slowly evolve through a number of stages, each containing speech more like that of the adult than the one before. In this section I discuss the features of the three stages which AMBER attempts to explain. I discuss these stages in their order of occurrence, dealing only with the major phenomena in each case.</Paragraph>
    <Paragraph position="7"> The One-Word Stage Around the age of one year, most children begin to produce words in isolation, and continue this strategy for some months. Presumably the child spends much of this period connecting particular words to particular concepts; once this has been done, he can produce these words under the appropriate circumstances. AMBER does not attempt to explain the word-learning process.</Paragraph>
    <Paragraph position="8"> Like Anderson's LAS \[3\], it assumes that links between words and concepts have already been established.</Paragraph>
    <Paragraph position="9"> Bloom \[6\] has examined this period in detail, with an eye to understanding the relation between the one-word stage and those which follow it. Early in this stage, successive one-word utterances seem entirely disconnected; the child randomly comments on anything that happens to be in the environment.</Paragraph>
    <Paragraph position="10"> Later, he begins to name in succession different aspects of the same event or object; words are still separated by noticeable pauses and no regular order can be detected, but conceptual continuity seems present. Moreover, this development occurs only a few months before the child begins to combine words into very simple sentences. AMBER's starting point lies somewhere within this later part of the one-word stage.</Paragraph>
    <Paragraph position="11"> Telegraphic Speech Around the age of 18 months, the child begins to combine words into meaningful sequences. In order-based languages such as English, the child usually follows the adult order.</Paragraph>
    <Paragraph position="12"> Initially only pairs of words are produced, but these are followed by three-word and later by four-word utterances. The simple sentences occurring in this stage consist almost entirely of content words. Brown \[7\] has described speech during this period as telegraphic, since grammatical morphemes such as tense endings and prepositions are absent, as they would be in a telegram.</Paragraph>
    <Paragraph position="13"> Brown has also noted that the majority of two-word utterances express a rather small set of pairwise semantic relations. AMBER assumes a small number of case relations such as agent, action, and possession from which Brown's pairwise relations can be derived. In addition, the child uses a few function words like &quot;there&quot;, &quot;more&quot;, and &quot;all-gone&quot; to express simple forms of nomination, recurrence, and negation.</Paragraph>
    <Paragraph position="14"> AMBER attempts to learn the relative word orders for expressing these recurring relations.</Paragraph>
    <Paragraph position="15"> The Acquisition of Grammatical Morphemes Brown \[7\] has also studied the period from about 24 to 40 months, during which the child masters the grammatical morphemes which were absent during the previous stage. Brown pointed out that these morphemes modulate the major meanings of sentences which are expressed through content words. AMBER reflects this distinction by representing the information expressed by content words and grammatical morphemes in different ways. These morphemes are learned gradually; the time between the initial production of a morpheme and its mastery (i.e., when it is correctly used in all required contexts) may be as long as 16 months. In addition, Brown has examined the order in which 14 English morphemes are acquired, and has found this order to be remarkably consistent across children. For example, present progressive (eating) and plural (dogs) were always learned quite early, while third person singular (eats) and copulas (is, are) took longer. He found that the syntactic and semantic complexities of the morphemes were highly correlated with their order of mastery. Since the current version of AMBER cannot deal with exceptions, I will consider only regular constructions in this paper.</Paragraph>
    <Paragraph position="16"> The ACTG Formalism AMBER is implemented in ACTG, an adaptive production system language.</Paragraph>
    <Paragraph position="17"> Below I present an overview of ACTG, beginning with a discussion of its propositional network. After this I consider the representation of procedures as productions. Finally, I examine ACTG's facilities for changing its own behavior through the creation of new productions.</Paragraph>
    <Paragraph position="18"> The Propositional Network ACTG stores its factual, declarative knowledge in a long-term propositional network. Individual facts are stored as propositions, which may be arbitrary list structures. As we will see in more detail below, AMBER incorporates two main types of propositions. One sort expresses a goal to say a particular word in a certain position. The second type of proposition expresses various kinds of relations, including facts like x possesses y, y is a *ball, and &quot;ball&quot; is the word for *ball (where concepts are preceded by &quot;*&quot; to distinguish them from their associated words).</Paragraph>
    <Paragraph position="19"> At any given time, some subset of the propositional network is active. Many of the active propositions have been recently added to the network by productions. Others, after lying dormant for a time, have been reactivated through their association (i.e., sharing of symbols) with other recently activated facts. AMBER uses this process of spreading activation primarily to retrieve information about the words associated with particular concepts. The level of activation for a proposition naturally decays over time, unless it is offset by other factors.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
The Production System
</SectionTitle>
      <Paragraph position="0"> ACTG represents procedural knowledge as a set of condition-action rules called productions. The conditions and actions of these rules can be quite general, since they may contain variables that match against arbitrary structures. When all the conditions of a production match against some portion of active memory, its actions may be carried out. These may interact with the environment, or add new propositions to the active part of the network.</Paragraph>
      <Paragraph position="1"> Structures matching variables in the conditions remain bound to these variables in the actions. After a production has been applied, the state of memory is reexamined and the system cycles.</Paragraph>
      <Paragraph position="2"> If two or more productions are found to be true, one must be selected in preference to the others. This decision is based on the relative strength of the productions, and on the summed activations of the propositions matched by each. The product of these two numbers is computed, and the production with the highest value is selected.</Paragraph>
      <Paragraph position="3"> Since a single production can match against a set of propositions in different ways, ties may sometimes occur. In such cases, one of the matches is selected at random.</Paragraph>
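      <Paragraph position="4"> As an illustration only, the selection step described above can be sketched in Python. This is not the ACTG implementation: the names Match, score, and select are hypothetical, and the sketch assumes that each candidate match carries a rule strength and the activation levels of the propositions it matched.

```python
import random
from dataclasses import dataclass


@dataclass
class Match:
    """One way a production matched active memory (hypothetical structure)."""
    production: str     # illustrative rule name, not taken from the paper
    strength: float     # strength of the production
    activations: list   # activation levels of the matched propositions


def score(match):
    # Product of rule strength and summed activation, as the text describes.
    return match.strength * sum(match.activations)


def select(matches, rng=random):
    # Keep every match that reaches the highest score, then break ties at random.
    best = max(score(m) for m in matches)
    tied = [m for m in matches if score(m) == best]
    return rng.choice(tied)
```

Given one match with strength 2.0 over activations 0.5 and 0.5, and another with strength 1.0 over activation 0.5, select returns the first, since its score is 2.0 against 0.5. The random choice only matters when several matches share the top score.</Paragraph>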
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
The ACTG Learning Mechanisms
</SectionTitle>
      <Paragraph position="0"> ACTG incorporates a powerful set of mechanisms for modeling learning phenomena. The most basic of these is the designation process, which allows the creation of a new production as one of the actions of an existing rule.</Paragraph>
      <Paragraph position="1"> Variables bound in the conditions of the learning rule are passed to the offspring, making the new rule more specific than its creator. Most of AMBER's learning heuristics rely on the designation process.</Paragraph>
      <Paragraph position="2"> A second mechanism leads to the strengthening of a production each time it is recreated. Since the strength of a rule plays an important role in the selection phase, productions which have been relearned many times will be preferred. On the other hand, the strength of a rule can be decreased if it leads to an error, lowering its chances for selection.</Paragraph>
      <Paragraph position="3"> The discovery of an error also leads to a call on the discrimination process. Here the recent firings of the responsible production are examined. If one or more propositions have been present at successful firings and absent at faulty ones, they are added as extra conditions on a new, more conservative version of the rule. Together with the strengthening and weakening processes, this mechanism gives ACTG the ability to recover from overgeneralizations.</Paragraph>
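      <Paragraph> The discrimination step can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not ACTG's actual procedure: the function name discriminate is hypothetical, each firing is assumed to be recorded as the set of propositions active at that time, and a proposition is taken to discriminate only if it was present at every successful firing and absent from every faulty one.

```python
def discriminate(conditions, good_firings, bad_firings):
    """Return a more conservative condition list, or None if nothing discriminates.

    conditions   -- conditions of the overly general rule (list of tuples)
    good_firings -- one set of active propositions per successful firing
    bad_firings  -- one set of active propositions per faulty firing
    """
    if not good_firings or not bad_firings:
        return None  # both kinds of firing are needed for a comparison
    # Propositions present at every successful firing.
    always_in_good = set.intersection(*map(set, good_firings))
    # Propositions present at one or more faulty firings.
    in_some_bad = set.union(*map(set, bad_firings))
    # Candidates: seen in all good firings, never in a bad one, not already a condition.
    extra = sorted(always_in_good - in_some_bad - set(conditions))
    if not extra:
        return None
    return list(conditions) + extra
```

For instance, if a plural-marking rule fired correctly only when the proposition (vtoken number plural) was active, that proposition would be appended to the conditions of the new rule, while the original rule would remain in memory and compete with it on strength.</Paragraph>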
      <Paragraph position="4"> AMBER's Linguistic Knowledge Learning is the result of an interaction between a set of relatively general techniques for acquiring knowledge and the environment in which they find themselves. In this section I consider AMBER's representation of that environment. After this I examine the procedures the model assumes at the outset, as well as the form of the rules at which it eventually arrives.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Representing Sentences
</SectionTitle>
      <Paragraph position="0"> Before AMBER can learn how to generate legal sentences, it must be exposed to examples of such sentences. One might represent a sentence as a simple list of words in the order they are said. However, though children learn to produce words in the correct order very early on, they also omit many words that an adult would include. For example, the utterance &quot;Daddy ball&quot; omits information about the action being carried out, as well as tense information. AMBER's representation of the sentences it hears reflects this ability to note order in the absence of information about adjacency.</Paragraph>
      <Paragraph position="1"> The model represents the occurrence of each morpheme as a separate proposition, each containing information about the speaker, the word being produced, and the relative order of occurrence. Thus, the fact that Mommy said the sentence &quot;Daddy bounces the ball&quot; would be stored as a set of seven propositions: (said 1 Mommy pause); (said 2 Mommy Daddy); (said 3 Mommy bounce); (said 4 Mommy s); (said 5 Mommy the); (said 6 Mommy ball); and (said 7 Mommy pause). The first and last propositions act as delimiters which mark the beginning and end of the sentence. This representation, combined with ACTG's pattern-matching capability, allows the statement of learning rules which focus on relative word order but ignore adjacency information. The resulting production rules omit words, just as the child does.</Paragraph>
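      <Paragraph position="2"> The representation above can be mimicked in a few lines of Python. This is a sketch for illustration, not part of AMBER: the function names encode and precedes are hypothetical, and propositions are modeled as plain tuples rather than ACTG list structures.

```python
def encode(speaker, morphemes):
    """Build (said position speaker morpheme) tuples, delimited by pauses."""
    words = ["pause"] + list(morphemes) + ["pause"]
    return [("said", i, speaker, w) for i, w in enumerate(words, start=1)]


def precedes(props, first, second):
    """True if `first` occurs somewhere before `second`, adjacent or not."""
    first_positions = [p[1] for p in props if p[3] == first]
    second_positions = [p[1] for p in props if p[3] == second]
    return any(j > i for i in first_positions for j in second_positions)


# The example sentence from the text, already split into morphemes.
props = encode("Mommy", ["Daddy", "bounce", "s", "the", "ball"])
```

Here props holds the seven propositions listed above, and precedes(props, "Daddy", "ball") holds even though three morphemes intervene. A rule conditioned on this kind of test can learn the order of &quot;Daddy&quot; and &quot;ball&quot; while ignoring everything between them.</Paragraph>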
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Representing Meaning
</SectionTitle>
      <Paragraph position="0"> Adults conversing with a child almost invariably discuss recent or ongoing events, so that the child can associate some event with every sentence he hears.</Paragraph>
      <Paragraph position="1"> The language acquisition process does not consist solely of learning to produce or parse legal word combinations; it consists of learning the mapping between meanings and words.</Paragraph>
      <Paragraph position="2"> Accordingly, AMBER is presented not with isolated sentences as its data, but with sentence/meaning pairs.</Paragraph>
      <Paragraph position="3"> AMBER represents the meaning of a sentence as a number of propositions, each incorporating one of a small set of relations. The most prevalent of these is the type relation, which connects tokens to the various concepts of which they are examples. There is no restriction on the number of type relations which may come off a token; thus, the propositions (token-1 type red) and (token-1 type ball) state that the object token-1 is both red and a ball. Events are represented with relations such as agent, action, and object. The propositions (event-1 agent token-2), (event-1 action token-3), and (event-1 object token-4) represent an event with an agent, action, and object whose types have yet to be specified.</Paragraph>
      <Paragraph position="4"> AMBER's representation makes a strong distinction between the main meaning of a sentence as expressed through its content words and the modulations of this meaning as expressed through its grammatical morphemes. The model assumes that a type relation pointing to a particular concept (e.g., *ball) is present for every content word (e.g., ball) found in the associated sentence. Moreover, a word-for relation is assumed present to establish the connection between word and concept. The presence of these two relations tells AMBER when a word contributes to the major meaning of a sentence.</Paragraph>
      <Paragraph position="5"> Modulations on this meaning are represented by a different set of relations, such as number, time-of-action, possession, and so forth. Some of these relations connect tokens to various values, as in (token-1 number singular) and (token-2 time-of-action past). Others, as in (token-4 possesses token-5) and (token-5 in token-6), actually relate tokens.</Paragraph>
      <Paragraph position="6"> AMBER's Initial Performance System AMBER starts with the ability to produce single words in isolation. But even at this stage, the model draws on a set of general heuristics for generating utterances which will still be useful after its learning is complete. AMBER does not say words as soon as they come to mind; first there is an active planning stage during which sequential goals are set.</Paragraph>
      <Paragraph position="7"> The model starts with rules for initializing and ending this planning phase, and for implementing its plans once they are complete (that is, actually saying the words in the planned order). The goals which result from the planning process look very like the data from which AMBER learns. The two-word utterance &quot;Daddy ball&quot; would be represented by the propositions (goal 1 AMBER pause), (goal 2 AMBER Daddy), (goal 3 AMBER ball), and (goal 4 AMBER pause), in which the model is the speaker.</Paragraph>
      <Paragraph position="8"> At the outset, AMBER has only a single rule for inserting such goals in memory; stated in English for the sake of clarity and with its variables underlined, it is: If you have no goals yet, and you see vtoken with type vtype, and vword is the word for vtype, then set up a goal to pause, followed by a goal to say vword, followed by a goal to pause.</Paragraph>
      <Paragraph position="9"> This rule separates the goal utterance from others by initial and final pauses. Thus, even though successive words may describe different aspects of the same event, they will be separated by noticeable gaps just as Bloom observed. Only after additional rules have been formed for inserting sounds between the initial word and the pauses can multi-word utterances begin to occur.</Paragraph>
      <Paragraph position="10"> AMBER at Later Stages On the basis of comparisons between sentences it hears and those it predicts, AMBER creates and modifies rules for saying multiple words at a time. These rules lead to the insertion of new goals between existing ones.</Paragraph>
      <Paragraph position="11"> Thus, they are dependent on the innate rules described above for initializing the goal insertion process and for carrying out goals once they have been set.</Paragraph>
      <Paragraph position="12"> Imagine a situation in which AMBER sees Daddy bouncing a ball. Also suppose that the one-word rule we saw above happens to select &quot;bounce&quot; as the word that should be said. This would lead to three goals: (goal 1 AMBER pause), (goal 2 AMBER bounce), and (goal 3 AMBER pause). After some experience with English, the model will have generated a rule like: If you have a goal to pause, followed by a goal to say vword2, and you have no intermediate goals, and vword2 is the word for vtype2, and vtoken2 is of type vtype2, and vtoken2 is the action of vevent, and vtoken1 is the agent of vevent, and vtoken1 is of type vtype1, and vword1 is the word for vtype1, then insert a goal to say vword1 between the other goals. This rule would add a goal to say the agent &quot;Daddy&quot; after the first pause and before &quot;bounce&quot;, using the proposition (goal 1.5 AMBER Daddy). Similar rules lead to the production of two- and three-word sentences expressing the major relations described by Brown.</Paragraph>
      <Paragraph position="13"> Later, AMBER also acquires rules for inserting grammatical morphemes. Since most grammatical morphemes are adjacent to the word whose meaning they modulate, they are generally inserted directly before or after the content word with which they occur. For example, a rule for regular pluralization might be stated: If you have a goal to say vword, and vword is the word for vtype, and vtoken is of type vtype, and vtoken is the agent of vevent, and the number of vtoken is plural, then insert a goal to say S directly after vword. This rule is specific to the agent of an event, but similar rules could be learned for objects and locations. Some morphemes express a relation between two content words, such as the prepositions &quot;in&quot; and &quot;on&quot; and the morpheme for possession. In these cases, the morpheme is inserted between the two related content words.</Paragraph>
      <Paragraph position="14"> The Acquisition Process For a system to learn from its mistakes, it must be able to compare its own actions to the desired ones, note the differences between them, and modify its behavior accordingly. In this section I describe AMBER's error correction mechanisms. First I examine the model's prediction mechanism and its relation to the goal structures mentioned earlier. Next I discuss AMBER's response to errors of omission, first for content words and then for grammatical morphemes. Finally, I consider errors of commission and the resulting call on the discrimination mechanism.</Paragraph>
      <Paragraph position="15"> The Equivalence of Goals and Predictions AMBER learns by comparing its predictions about what will be said in a given situation to what it actually hears. However, a learning system must do more than improve its ability to predict; it must also improve its ability to perform. AMBER accomplishes this by using the same productions for making predictions and for planning its speech acts. As we saw above, these rules add goal structures such as (goal</Paragraph>
    </Section>
  </Section>
</Paper>