<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1055">
  <Title>A Selectionist Theory of Language Acquisition</Title>
  <Section position="4" start_page="429" end_page="431" type="metho">
    <SectionTitle>
2 A Selectionist Model of Language
Acquisition
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="429" end_page="430" type="sub_section">
      <SectionTitle>
2.1 The Dynamics of Darwinian Evolution
</SectionTitle>
      <Paragraph position="0"> Essential to Darwinian evolution is the concept of variational thinking (Lewontin, 1983). First, differ1 Note that the transformational approach is not restricted to UG-based models; for example, Brill's influential work (1993) is a corpus-based model which successively revises a set of syntactic_rules upon presentation of partially bracketed sentences. Note that however, the state of the learning system at any time is still a single set of rules, that is, a single &amp;quot;grammar&amp;quot;.</Paragraph>
      <Paragraph position="1">  ences among individuals are viewed as &amp;quot;real&amp;quot;, as opposed to deviant from some idealized archetypes, as in pre-Darwinian thinking. Second, such differences result in variance in operative functions among individuals in a population, thus allowing forces of evolution such as natural selection to operate. Evolutionary changes are therefore changes in the distribution of variant individuals in the population. This contrasts with Lamarckian transformational thinking, in which individuals themselves undergo direct changes (transformations) (Lewontin, 1983).</Paragraph>
    </Section>
    <Section position="2" start_page="430" end_page="430" type="sub_section">
      <SectionTitle>
2.2 A population of grammars
</SectionTitle>
      <Paragraph position="0"> Learning, including language acquisition, can be characterized as a sequence of states in which the learner moves from one state to another. Transformational models of language acquisition identify the state of the learner as a single grammar/hypothesis.</Paragraph>
      <Paragraph position="1"> As noted in section 1, this makes difficult to explain the inconsistency in child language and the smoothness of language development.</Paragraph>
      <Paragraph position="2"> We propose that the learner be modeled as a population of &amp;quot;grammars&amp;quot;, the set of all principled language variations made available by the biological endowment of the human language faculty. Each grammar Gi is associated with a weight Pi, 0 &lt;_ Pi &lt;_ 1, and ~pi -~ 1. In a linguistic environment E, the  weight pi(E, t) is a function of E and the time variable t, the time since the onset of language acquisition. We say that Definition: Learning converges if Ve,0 &lt; e &lt; 1,VGi, \[ pi(E,t+ 1) -pi(E,t) \[&lt; e That is, learning converges when the composition  and distribution of the grammar population are stabilized. Particularly, in a monolingual environment ET in which a target grammar T is used, we say that learning converges to T if limt-.cv pT(ET, t) : 1.</Paragraph>
    </Section>
    <Section position="3" start_page="430" end_page="430" type="sub_section">
      <SectionTitle>
2.3 A Learning Algorithm
</SectionTitle>
      <Paragraph position="0"> Write E -~ s to indicate that a sentence s is an utterance in the linguistic environment E. Write s E G if a grammar G can analyze s, which, in a narrow sense, is parsability (Wexler and Culicover, 1980; Berwick, 1985). Suppose that there are altogether N grammars in the population. For simplicity, write Pi for pi(E, t) at time t, and p~ for pi(E, t+ 1) at time t + 1. Learning takes place as follows: The Algorithm: Given an input sentence s, the child with the probability Pi, selects a grammar Gi {,</Paragraph>
      <Paragraph position="2"> Comment: The algorithm is the Linear rewardpenalty (LR-p) scheme (Bush and Mostellar, 1958), one of the earliest and most extensively studied stochastic algorithms in the psychology of learning.</Paragraph>
      <Paragraph position="3"> It is real-time and on-line, and thus reflects the rather limited computational capacity of the child language learner, by avoiding sophisticated data processing and the need for a large memory to store previously seen examples. Many variants and generalizations of this scheme are studied in Atkinson et al. (1965), and their thorough mathematical treatments can be found in Narendra and Thathac!lar (1989).</Paragraph>
      <Paragraph position="4"> The algorithm operates in a selectionist manner: grammars that succeed in analyzing input sentences are rewarded, and those that fail are punished. In addition to the psychological evidence for such a scheme in animal and human learning, there is neurological evidence (Hubel and Wiesel, 1962; Changeux, 1983; Edelman, 1987; inter alia) that the development of neural substrate is guided by the exposure to specific stimulus in the environment in a Darwinian selectionist fashion.</Paragraph>
    </Section>
    <Section position="4" start_page="430" end_page="431" type="sub_section">
      <SectionTitle>
2.4 A Convergence Proof
</SectionTitle>
      <Paragraph position="0"> For simplicity but without loss of generality, assume that there are two grammars (N -- 2), the target grammar T1 and a pretender T2. The results presented here generalize to the N-grammar case; see Narendra and Thathachar (1989).</Paragraph>
      <Paragraph position="1"> Definition: The penalty probability of grammar Ti in a linguistic environment E is ca = Pr(s C/ T~ I E -~ s) In other words, ca represents the probability that the grammar T~ fails to analyze an incoming sentence s and gets punished as a result. Notice that the penalty probability, essentially a fitness measure of individual grammars, is an intrinsic property of a UG-defined grammar relative to a particular linguistic environment E, determined by the distributional patterns of linguistic expressions in E. It is not explicitly computed, as in (Clark, 1992) which uses the Genetic Algorithm (GA). 2 The main result is as follows: Theorem:</Paragraph>
      <Paragraph position="3"> a function of Pl (t) and taking expectations on both 2Claxk's model and the present one share an important feature: the outcome of acquisition is determined by the differential compatibilities of individual grammars. The choice of the GA introduces various psychological and linguistic assumptions that can not be justified; see Dresher (1999) and Yang (1999). Furthermore, no formal proof of convergence is given.</Paragraph>
      <Paragraph position="4">  converges to the target grammar T1, which has a penalty probability of 0, by definition, in a mono-lingual environment. Learning is robust. Suppose that there is a small amount of noise in the input, i.e. sentences such as speaker errors which are not compatible with the target grammar. Then cl &gt; 0.</Paragraph>
      <Paragraph position="5"> If el &lt;&lt; c2, convergence to T1 is still ensured by \[1\]. Consider a non-uniform linguistic environment in which the linguistic evidence does not unambiguously identify any single grammar; an example of this is a population in contact with two languages (grammars), say, T1 and T2. Since Cl &gt; 0 and c2 &gt; 0, \[1\] entails that pl and P2 reach a stable equilibrium at the end of language acquisition; that is, language learners are essentially bi-lingual speakers as a result of language contact. Kroch (1989) and his colleagues have argued convincingly that this is what happened in many cases of diachronic change. In Yang (1999), we have been able to extend the acquisition model to a population of learners, and formalize Kroch's idea of grammar competition over time.</Paragraph>
      <Paragraph position="6">  rectly measure the rate of change in the weight of the target grammar, and compare with developmental findings. Suppose T1 is the target grammar, hence cl = 0. The expected increase of Pl, APl is computed as follows:</Paragraph>
      <Paragraph position="8"> Since P2 = 1 - pl, APl \[3\] is obviously a quadratic function of pl(t). Hence, the growth of Pl will produce the familiar S-shape curve familiar in the psychology of learning. There is evidence for an S-shape pattern in child language development (Clahsen, 1986; Wijnen, 1999; inter alia), which, if true, suggests that a selectionist learning algorithm adopted here might indeed be what the child learner employs.</Paragraph>
    </Section>
    <Section position="5" start_page="431" end_page="431" type="sub_section">
      <SectionTitle>
2.5 Unambiguous Evidence is Unnecessary
</SectionTitle>
      <Paragraph position="0"> One way to ensure convergence is to assume the existence of unambiguous evidence (cf. Fodor, 1998): sentences that are only compatible with the target grammar but not with any other grammar. Unambiguous evidence is, however, not necessary for the proposed model to converge. It follows from the theorem \[1\] that even if no evidence can unambiguously identify the target grammar from its competitors, it is still possible to ensure convergence as long as all competing grammars fail on some proportion of input sentences; i.e. they all have positive penalty probabilities. Consider the acquisition of the target, a German V2 grammar, in a population of grammars  below: 1. German: SVO, OVS, XVSO 2. English: SVO, XSVO 3. Irish: VSO, XVSO 4. Hixkaryana: OVS, XOVS  We have used X to denote non-argument categories such as adverbs, adjuncts, etc., which can quite freely appear in sentence-initial positions. Note that none of the patterns in (1) could conclusively distinguish German from the other three grammars. Thus, no unambiguous evidence appears to exist. However, if SVO, OVS, and XVSO patterns appear in the input data at positive frequencies, the German grammar has a higher overall &amp;quot;fitness value&amp;quot; than other grammars by the virtue of being compatible with all input sentences. As a result, German will eventually eliminate competing grammars.</Paragraph>
    </Section>
    <Section position="6" start_page="431" end_page="431" type="sub_section">
      <SectionTitle>
2.6 Learning in a Parametric Space
</SectionTitle>
      <Paragraph position="0"> Suppose that natural language grammars vary in a parametric space, as cross-linguistic studies suggest. 3 We can then study the dynamical behaviors of grammar classes that are defined in these parametric dimensions. Following (Clark, 1992), we say that a sentence s expresses a parameter c~ if a grammar must have set c~ to some definite value in order to assign a well-formed representation to s. Convergence to the target value of c~ can be ensured by the existence of evidence (s) defined in the sense of parameter expression. The convergence to a single grammar can then be viewed as the intersection of parametric grammar classes, converging in parallel to the target values of their respective parameters.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="431" end_page="433" type="metho">
    <SectionTitle>
3 Some Developmental Predictions
</SectionTitle>
    <Paragraph position="0"> The present model makes two predictions that cannot be made in the standard transformational theories of acquisition: 1. As the target gradually rises to dominance, the child entertains a number of co-existing grammars. This will be reflected in distributional patterns of child language, under the null hypothesis that the grammatical knowledge (in our model, the population of grammars and their respective weights) used in production is that used in analyzing linguistic evidence. For grammatical phenomena that are acquired relatively late, child language consists of the output of more than one grammar.</Paragraph>
    <Paragraph position="1">  2. Other things being equal, the rate of development is determined by the penalty probabili null ties of competing grammars relative to the input data in the linguistic environment \[3\]. In this paper, we present longitudinal evidence concerning the prediction in (2). 4 To evaluate developmental predictions, we must estimate the the penalty probabilities of the competing grammars in a particular linguistic environment. Here we examine the developmental rate of French verb placement, an early acquisition (Pierce, 1992), that of English subject use, a late acquisition (Valian, 1991), that of Dutch V2 parameter, also a late acquisition (Haegeman, 1994).</Paragraph>
    <Paragraph position="2"> Using the idea of parameter expression (section 2.6), we estimate the frequency of sentences that unambiguously identify the target value of a parameter. For example, sentences that contain finite verbs preceding adverb or negation (&amp;quot;Jean voit souvent/pas Marie&amp;quot; ) are unambiguous indication for the \[+\] value of the verb raising parameter. A grammar with the \[-\] value for this parameter is incompatible with such sentences and if probabilistically selected for the learner for grammatical analysis, will be punished as a result. Based on the CHILDES corpus, we estimate that such sentences constitute 8% of all French adult utterances to children. This suggests that unambiguous evidence as 8% of all input data is sufficient for a very early acquisition: in this case, the target value of the verb-raising parameter is correctly set. We therefore have a direct explanation of Brown's (1973) observation that in the acquisition of fixed word order languages such as English, word order errors are &amp;quot;trifingly few&amp;quot;. For example, English children are never to seen to produce word order variations other than SVO, the target grammar, nor do they fail to front Wh-words in question formation. Virtually all English sentences display rigid word order, e.g. verb almost always (immediately) precedes object, which give a very high (perhaps close to 100%, far greater than 8%, which is sufficient for a very early acquisition as in the case of French verb raising) rate of unambiguous evidence, sufficient to drive out other word order grammars very early on.</Paragraph>
    <Paragraph position="3"> Consider then the acquisition of the subject parameter in English, which requires a sentential subject. Languages like Italian, Spanish, and Chinese, on the other hand, have the option of dropping the subject. Therefore, sentences with an overt subject are not necessarily useful in distinguishing English 4In Yang (1999), we show that a child learner, en route to her target grammar, entertains multiple grammars. For example, a significant portion of English child language shows characteristics of a topic-drop optional subject grammar like Chinese, before they learn that subject use in English is obligatory at around the 3rd birthday.</Paragraph>
    <Paragraph position="4"> from optional subject languages. 5 However, there exists a certain type of English sentence that is indicative (Hyams, 1986): There is a man in the room.</Paragraph>
    <Paragraph position="5"> Are there toys on the floor? The subject of these sentences is &amp;quot;there&amp;quot;, a non-referential lexical item that is present for purely structural reasons - to satisfy the requirement in English that the pre-verbal subject position must be filled. Optional subject languages do not have this requirement, and do not have expletive-subject sentences. Expletive sentences therefore express the \[+\] value of the subject parameter. Based on the CHILDES corpus, we estimate that expletive sentences constitute 1% of all English adult utterances to children.</Paragraph>
    <Paragraph position="6"> Note that before the learner eliminates optional subject grammars on the cumulative basis of expletive sentences, she has probabilistic access to multiple grammars. This is fundamentally different from stochastic grammar models, in which the learner has probabilistic access to generative ~ules. A stochastic grammar is not a developmentally adequate model of language acquisition. As discussed in section 1.1, more than 90% of English sentences contain a subject: a stochastic grammar model will overwhehningly bias toward the rule that generates a subject. English children, however, go through long period of subject drop. In the present model, child sub-ject drop is interpreted as the presence of the true optional subject grammar, in co-existence with the obligatory subject grammar.</Paragraph>
    <Paragraph position="7"> Lastly, we consider the setting of the Dutch V2 parameter. As noted in section 2.5, there appears to no unambiguous evidence for the \[+\] value of the V2 parameter: SVO, VSO, and OVS grammars, members of the \[-V2\] class, are each compatible with certain proportions of expressions produced.by the target V2 grammar. However, observe that despite of its compatibility with with some input patterns, an OVS grammar can not survive long in the population of competing grammars. This is because an OVS grammar has an extremely high penalty probability.</Paragraph>
    <Paragraph position="8"> Examination of CHILDES shows that OVS patterns consist of only 1.3% of all input sentences to children, whereas SVO patterns constitute about 65% of all utterances, and XVSO, about 34%. Therefore, only SVO and VSO grammar, members of the \[-V2\] class, are &amp;quot;contenders&amp;quot; alongside the (target) V2 grammar, by the virtue of being compatible with significant portions of input data. But notice that OVS patterns do penalize both SVO and VSO grammars, and are only compatible with the \[+V2\] gram5Notice that this presupposes the child's prior knowledge of and access to both obligatory and optional subject grammars. null  mars. Therefore, OVS patterns are effectively un-ambiguous evidence (among the contenders) for the V2 parameter, which eventually drive SVO and VSO grammars out of the population.</Paragraph>
    <Paragraph position="9"> In the selectioni-st model, the rarity of OVS sentences predicts that the acquisition of the V2 parameter in Dutch is a relatively late phenomenon.</Paragraph>
    <Paragraph position="10"> Furthermore, because the frequency (1.3%) of Dutch OVS sentences is comparable to the frequency (1%) of English expletive sentences, we expect that Dutch V2 grammar is successfully acquired roughly at the same time when English children have adult-level subject use (around age 3; Valian, 1991). Although I am not aware of any report on the timing of the correct setting of the Dutch V2 parameter, there is evidence in the acquisition of German, a similar language, that children are considered to have successfully acquired V2 by the 36-39th month (Clahsen, 1986). Under the model developed here, this is not an coincidence.</Paragraph>
  </Section>
class="xml-element"></Paper>