File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/05/w05-0501_metho.xml
Size: 20,089 bytes
Last Modified: 2025-10-06 14:09:53
<?xml version="1.0" standalone="yes"?> <Paper uid="W05-0501"> <Title>The Input for Syntactic Acquisition: Solutions from Language Change Modeling</Title> <Section position="3" start_page="1" end_page="2" type="metho"> <SectionTitle> 2 The Acquisition Proposals </SectionTitle> <Paragraph position="0"> The first proposal is set in a Principles and Parameters framework (Chomsky 1981), where the adult grammar consists of a specific set of parameter values and acquisition is the process of determining what those values are. An unambiguous trigger (Fodor 1998, Dresher 1999, Lightfoot 1999) is a piece of data from the O-language that unambiguously signals one parameter value over another for a given parameter. Crucially, an unambiguous trigger for one value of a parameter P can be parsed only with that value (and not with the competing value), no matter what other parameter values (A, B, C, ...) might also be affecting the O-language form of the data.</Paragraph> <Paragraph position="1"> Because an unambiguous trigger corresponds to exactly one parameter P and thus can alter the value of P only, this proposal would allow children to bypass the Credit Problem noted by Dresher (1999), which is the problem of deciding which parameters to update given a particular piece of input. In addition, unambiguous triggers allow the learner to bypass the combinatoric explosion problem that could occur when trying to set n parameters. Instead of having to test out 2^n different grammars on the input in the O-language, the child's language acquisition mechanism simply tests out the n parameters separately by looking for unambiguous triggers for these n parameters in the input from the O-language. Thus, this proposal aids the process of acquiring the adult grammar quickly and correctly. 
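As a rough illustration of the combinatoric point above, the following Python sketch contrasts the two search spaces. Only the 2^n versus n counts come from the text; the function names and the placeholder value labels are illustrative assumptions.

```python
from itertools import product


def enumerate_grammars(n):
    """List every grammar over n binary parameters (one tuple per grammar).

    The labels "value1" and "value2" are placeholders, not real parameter values.
    """
    return list(product(("value1", "value2"), repeat=n))


def joint_search_size(n):
    """Number of whole grammars to test when all n parameters are set jointly."""
    return 2 ** n


def separate_search_size(n):
    """Number of parameters to examine when each is set from its own triggers."""
    return n


if __name__ == "__main__":
    for n in (3, 10, 30):
        print(f"n={n}: {joint_search_size(n)} grammars vs "
              f"{separate_search_size(n)} separate parameters")
```

The enumeration is only practical for small n, which is the point: the joint search doubles with each added parameter, while the trigger-based search grows by one.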
A potential pitfall of this proposal is data sparseness: the quantity of data that fits this very specific restriction might be very small for a parameter P, and the child might not see enough of it for it to have an effect.</Paragraph> <Paragraph position="2"> The second proposal is that children only heed data in degree-0 clauses (Lightfoot 1991) when they first begin to set their syntactic parameter values. &quot;Degree&quot; refers to the level of embedding, so a degree-0 clause corresponds to a main clause. The basis for this proposal is that while local grammatical relationships (such as those in degree-0 clauses) provide a lot of information to the learner, degree-0 data tends to be &quot;messier&quot; grammatically - that is, more grammatical processes seem to apply to degree-0 clauses than to degree-1 clauses. The messier status of this data allows the child to converge to a grammar that is not exactly the same as the adult grammar. Thus, this proposal focuses on how to allow small grammatical changes to occur in individuals so that larger changes can happen to the population over time. The cost of combining this proposal with the previous one is that the child is now restricted to learning only from degree-0 unambiguous triggers, thereby compounding the potential data sparseness problem that unambiguous triggers already have.</Paragraph> <Paragraph position="3"> In fact, it may well be necessary to restrict the set of parameters relevant for determining if a trigger is unambiguous to some initial pool in order to get any unambiguous triggers at all. A candidate set for the initial pool of parameters might be derived from a hierarchy of parameters along the lines of the one based on cross-linguistic comparison that is described in Baker (2001, 2005). The exact domain of a degree-0 clause is defined as the main clause and the front of the embedded clause for theory-internal reasons. 
For a more detailed description and explanation, see Lightfoot (1991).</Paragraph> </Section> <Section position="4" start_page="2" end_page="3" type="metho"> <SectionTitle> 3 Old English Change </SectionTitle> <Paragraph position="0"> Allowing language change to occur as it historically did is a mark of &quot;correct&quot; acquisition, especially for change involving syntactic parameters that can only be altered during acquisition - any change that builds up in the population must be due to changes that occur during acquisition. The parameter we use in this work is OV/VO word order, and the change is a shift in Old English from a strongly OV distribution between 1000 and 1150 A.D. to a strongly VO distribution at 1200 A.D. A strongly OV distribution has many utterances with OV order (2). A strongly VO distribution has many utterances with VO order (3).</Paragraph> <Paragraph position="1">
(2) he Gode þancode
    he God thanked
    'He thanked God' (Beowulf, 625)
(3) þa ahof Paulus up his heafod
    then lifted Paul up his head
    'Then Paul lifted his head up' (Blickling Homilies, 187.35)
Because change can occur only during acquisition, the data children are heeding in their input during acquisition have a massive effect on the population's linguistic composition over time. In this work, we explore the possibility that the data children are heeding during acquisition are the degree-0 unambiguous triggers. For Old English, the unambiguous triggers have the form of (4a) and (5a). Examples of unambiguous triggers of each kind are in (4b-c) and (5b-c).</Paragraph> <Paragraph position="2"> The Object is adjacent to either a Verb or a Verb-Marker on the appropriate side - the correct O-language order. In addition to this correct &quot;surface order&quot; in the O-language, an unambiguous trigger must also have an unambiguous derivation to produce this surface order. This means that no other combination of parameters with the alternate word order value could produce the observed surface order. 
For example, a Subject Verb Object utterance could be produced in more than one way because of the Verb-Second (V2) movement parameter, which was also available in Old English (as in modern Dutch and German). With V2 movement, the Verb moves from its &quot;underlying&quot; position to the second position in the sentence. Because of this, a Subject Verb Object utterance can be parsed with either word order (OV or VO) and so cannot unambiguously signal either order. Thus, correct surface order alone does not suffice: only an utterance that has the correct surface order and cannot be generated with the competing word order value is an unambiguous trigger. (We note that this could potentially be very resource-intensive to determine, since all other interfering parameter values (such as V2) must be taken into account. Hence, there is need for some restriction of what parameters must be initially considered to determine if an utterance contains an unambiguous trigger for a given parameter.)</Paragraph> <Paragraph position="3"> Because V2 movement (among other kinds of movement) can move the Verb away from the Object, Verb-Markers can be used to determine the original position of the Verb with respect to the Object. Verb-Markers include particles ('up'), non-finite complements to finite verbs ('shall...perform'), some closed-class adverbials ('never'), and negatives ('not'), as described in Lightfoot (1991).</Paragraph> <Paragraph position="4"> The curious fact about Old English Verb-Markers (unlike their modern Dutch and German counterparts) is that they were unreliable - often they moved away from the Object as well, leaving nothing Verb-like adjacent to the Object. This turned utterances which potentially were unambiguous triggers for either OV or VO order into ambiguous utterances which could not help acquisition. We term this &quot;trigger destruction,&quot; and it has the effect of making the distribution of OV and VO utterances that the child uses during acquisition (the distribution in the degree-0 unambiguous triggers) different from the distribution of the OV and VO utterances in the population. It is this difference that &quot;biases&quot; children away from the distribution in the population, and it is this difference that will cause small grammatical changes to accumulate in the population until the larger change emerges - the shift from being strongly OV to being strongly VO. Thus, the question of what data children heed during acquisition has found a very suitable testing ground in Old English.</Paragraph> </Section> <Section position="5" start_page="3" end_page="5" type="metho"> <SectionTitle> 4 The Model </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="3" end_page="5" type="sub_section"> <SectionTitle> 4.1 The Acquisition Model &amp; Old English Data </SectionTitle> <Paragraph position="0"> The acquisition model in this work is founded on several ideas previously explored in the acquisition modeling and language change literature. First, grammars with opposing parameter values (such as OV and VO order) compete with each other both during acquisition (Clark &amp; Roberts 1993) and within a population over time (Pintzuk 2002, among others). Second, population-level change is the result of a build-up of individual-level &quot;misconvergences&quot; (Niyogi &amp; Berwick 1997, 1996, 1995). Third, individual linguistic behavior can be represented as a probabilistic distribution of multiple grammars. This is the result of multiple grammars competing during acquisition and still existing after acquisition.</Paragraph> <Paragraph position="1"> Multiple grammars in an individual are instantiated as that individual accessing g grammars with probability p_g each (Yang 2003).</Paragraph> <Paragraph position="2"> In our simulation, there are two grammars (g = 2) - one with the OV/VO order set to OV and one with the OV/VO order set to VO. 
In a stable system, one grammar has probability p_g = 1 of being accessed and all unambiguous triggers come from this grammar. In the unstable system for our simulation, both grammars have a probability strictly between 0 and 1 of being accessed. Both grammars leave unambiguous triggers in the input to the child. If the quantity of unambiguous triggers from each grammar is approximately equal, these quantities will effectively cancel each other out (whatever quantity pulls the child towards OV will be counterbalanced by the quantity of triggers pulling the child towards VO). Therefore, the crucial quantity is how many more unambiguous triggers one grammar has than the other, since this is the quantity that will not be cancelled out. This is the advantage a grammar has over another in the input. Table 1 shows the advantage in the degree-0 (D0) clauses and degree-1 (D1) clauses that the OV grammar has at various points in time of Old English.</Paragraph> <Paragraph position="3"> The corpus data show a 1.6% advantage for the OV grammar in the D0 clauses at 1000 A.D., which means that only 16 out of every 1000 sentences in the input are actually doing any work for acquisition (and more specifically, pulling the child towards the OV grammar). The data also show that the D1 advantage is much stronger. However, this does not help our learners for two reasons: a) Based on samples of modern children's input (4K from CHILDES (MacWhinney &amp; Snow 1985) and 4K from young children's stories; for details on this data, see Pearl (2005)), D1 clauses only make up ~16% of modern English children's input. If we assume that the quantity of D1 input to children is approximately the same no matter what time period they live in, then our Old English children also heard D1 data in their input ~16% of the time.</Paragraph> <Paragraph position="4"> b) Our learners can only use D0 data, anyway. This leads to two questions for the restrictions imposed by the acquisition proposals - a question of sufficiency and a question of necessity. 
(A negative OV advantage means the VO grammar has the advantage.)</Paragraph> <Paragraph position="5"> At this point in time, we are unaware of any studies that suggest that the composition of motherese, for example, has altered significantly over time.</Paragraph> <Paragraph position="6"> First, we can simply ask if these restrictions on the data children heed are sufficient to allow the Old English population to shift from OV to VO at the appropriate time. Then, supposing that they are, we can ask if these restrictions are necessary to get the job done - that is, will the population shift correctly even if these restrictions do not hold? We can relax both the restriction to learn only from unambiguous triggers and the restriction to learn only from degree-0 clause data - and then see if the population can still shift to a strongly VO distribution on time.</Paragraph> </Section> <Section position="2" start_page="5" end_page="5" type="sub_section"> <SectionTitle> 4.2 The Acquisition Model: Implementation </SectionTitle> <Paragraph position="0"> The acquisition model itself is based on the idea of a probabilistic access function for binary parameter values (Bock &amp; Kroch 1989) in an individual. For example, if an individual has a function that accesses the VO order value 30% of the time, the utterances generated by that individual would be VO order 30% of the time and OV order 70% of the time. Note that this is the distribution before other parameters such as V2 movement alter the order, so the O-language distribution produced by this speaker is not 30-70.</Paragraph> <Paragraph position="1"> However, the O-language distribution will still have some unambiguous OV triggers and some unambiguous VO triggers, so a child hearing data from this speaker will have to deal with the conflicting values. 
Thus, a child will have a probabilistic access function to account for the OV/VO distribution, and acquisition is the process of setting the VO access probability based on the data heard during the critical period.</Paragraph> <Paragraph position="2"> The VO access value ranges from 0.0 (all OV access) to 1.0 (all VO access). A value of 0.3, for example, would correspond to accessing VO order 30% of the time. A child begins with this value at 0.5, so there is a 50% chance of accessing either OV or VO order.</Paragraph> <Paragraph position="3"> Two mechanisms help summarize the data the child has seen so far without using up computing resources: the Noise Filter and a modified Batch Learner method (Yang 2003). The Noise Filter acts as a buffer that separates &quot;signal&quot; from &quot;noise&quot;. An unambiguous trigger from the minority grammar is much more likely to be construed as &quot;noise&quot; than an unambiguous trigger from the majority grammar. An example use is below with the VO access value set to 0.3 (closer to pure OV than pure VO):</Paragraph> </Section> </Section> <Section position="6" start_page="5" end_page="7" type="metho"> <SectionTitle> (6) Noise Filter Use </SectionTitle> <Paragraph position="0">
probabilistic value of VO access = 0.3
if next unambiguous trigger = VO
    = &quot;noise&quot; with 70% chance and ignored
    = &quot;signal&quot; with 30% chance and heeded
if next unambiguous trigger = OV
    = &quot;noise&quot; with 30% chance and ignored
    = &quot;signal&quot; with 70% chance and heeded
The initial value of VO access is 0.5, so there is no bias for either grammar when determining what is &quot;noise&quot; and what is &quot;signal&quot;. The modified Batch Learner method deals with how many unambiguous triggers it takes to alter the child's current VO access value. The more a grammar is in the majority, the smaller the &quot;batch&quot; of its triggers has to be to alter the VO access value (see Table 2). 
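The Noise Filter example in (6) can be sketched in Python as follows. This is a minimal sketch under one assumption suggested by that example: a trigger is heeded with probability equal to the current access probability of the grammar it comes from. The function name, its signature, and the use of a pseudo-random generator are hypothetical choices made for illustration.

```python
import random


def passes_noise_filter(trigger, vo_access, rng=random):
    """Return True if the trigger is heeded as signal, False if ignored as noise.

    trigger   -- "VO" or "OV", the grammar the unambiguous trigger comes from
    vo_access -- current probabilistic value of VO access, from 0.0 to 1.0
    rng       -- any object with a random() method (hypothetical hook for seeding)
    """
    if trigger not in ("VO", "OV"):
        raise ValueError("trigger must be 'VO' or 'OV'")
    # Assumed rule: chance of being signal = access probability of that grammar.
    p_signal = vo_access if trigger == "VO" else 1.0 - vo_access
    return p_signal > rng.random()


if __name__ == "__main__":
    rng = random.Random(0)
    trials = 10000
    vo_heeded = sum(passes_noise_filter("VO", 0.3, rng) for _ in range(trials))
    ov_heeded = sum(passes_noise_filter("OV", 0.3, rng) for _ in range(trials))
    print("share of VO triggers heeded:", vo_heeded / trials)
    print("share of OV triggers heeded:", ov_heeded / trials)
```

With the VO access value at 0.3, the two printed shares should come out near 0.3 and 0.7 respectively, mirroring the percentages in (6).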
The current VO access value is used to decide whether a grammar is in the majority, and by how much.</Paragraph> <Paragraph position="1"> Table 2: How many unambiguous triggers from each grammar are required, based on what the current VO access value is for the child.</Paragraph> <Paragraph position="2"> Below is an example of the modified Batch Learner method with the VO access value set to 0.3:
(7) Modified Batch Learner method use
probabilistic value of VO access = 0.3
if next unambiguous trigger = VO
    if 4th VO trigger seen, alter value of VO access towards VO
else if next unambiguous trigger = OV
    if 2nd OV trigger seen, alter value of VO access towards OV
The initial value of 0.5 means that neither grammar requires more triggers than the other at the beginning to alter the current value.</Paragraph> <Paragraph position="3"> Both mechanisms rely on the probabilistic value of VO access to reflect the distribution of triggers seen so far. The logic is as follows: in order to get to a value below 0.5 (more towards OV), significantly more unambiguous OV triggers must have been seen; in order to get to a value above 0.5 (more towards VO), significantly more unambiguous VO triggers must have been seen. The individual acquisition algorithm used in the model is below:
Initial value of VO access = 0.5
While in critical period
    Get a piece of input from the linguistic environment created by the rest of the population members.</Paragraph> <Paragraph position="4">
    If input is an unambiguous trigger
        If input passes through Noise Filter
            Increase relevant batch counter
            If counter is at threshold
                Alter current VO access value
Note that the final VO access value after the critical period is over does not have to be 0.0 or 1.0 - it may be a value in between. 
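The individual acquisition algorithm above can be sketched in Python as follows. This is a sketch under stated assumptions, not the model's actual implementation. The batch thresholds are a hypothetical stand-in for Table 2, chosen only so that a VO access value of 0.3 requires 4 VO triggers or 2 OV triggers, as in (7). The step size of 0.05, the counter reset after each update, and the treatment of a finite input list as the critical period are likewise illustrative assumptions.

```python
import random

STEP = 0.05  # hypothetical amount by which one full batch moves the VO access value


def batch_threshold(p_grammar):
    """Hypothetical stand-in for Table 2: triggers needed from a grammar
    whose current access probability is p_grammar (majority needs fewer)."""
    for lower_bound, batch in ((0.8, 1), (0.6, 2), (0.4, 3), (0.2, 4)):
        if p_grammar >= lower_bound:
            return batch
    return 5


def acquire(inputs, rng=None, step=STEP):
    """Run the acquisition loop over a finite input sequence.

    inputs -- iterable of "VO", "OV" (unambiguous triggers) or anything else
              (ambiguous data, which is skipped)
    Returns the final VO access value, between 0.0 and 1.0.
    """
    rng = rng or random.Random()
    vo_access = 0.5                      # initial value of VO access
    counters = {"VO": 0, "OV": 0}        # one batch counter per grammar
    for item in inputs:                  # "while in critical period"
        if item not in counters:
            continue                     # not an unambiguous trigger
        p_grammar = vo_access if item == "VO" else 1.0 - vo_access
        if not p_grammar > rng.random():
            continue                     # Noise Filter: treated as noise
        counters[item] += 1
        if counters[item] >= batch_threshold(p_grammar):
            counters[item] = 0           # assumed reset once the batch is used
            delta = step if item == "VO" else -step
            vo_access = min(1.0, max(0.0, vo_access + delta))
    return vo_access


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical input mix: mostly ambiguous data, slightly more OV triggers.
    data = rng.choices(["OV", "VO", "ambiguous"], weights=[6, 4, 90], k=5000)
    print("final VO access value:", acquire(data, rng=rng))
```

One consequence of these assumptions is that 0.0 and 1.0 act as absorbing values: once a grammar's access probability reaches zero, all of its triggers are filtered out as noise.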
The final value is supposed to reflect the distribution the child has heard, not necessarily to be one of the extreme values.</Paragraph> <Section position="1" start_page="5" end_page="7" type="sub_section"> <SectionTitle> 4.3 Population Model: Implementation </SectionTitle> <Paragraph position="0"> Since individual acquisition drives the linguistic composition of the population, the population algorithm centers around the individual acquisition algorithm:
Population age range = 0 to 60
Initial population size = 1800
Initialize members to starting VO access value
At 1000 A.D. and every 2 years until 1200 A.D.
    Members age 59-60 die; the rest age 2 years
    New members age 0 to 1 created
    New members use individual acquisition algorithm to set their VO access value
(Footnotes to the algorithm: based on estimates from Koenigsberger &amp; Briggs (1987); based on historical corpus data.)</Paragraph> </Section> <Section position="2" start_page="7" end_page="7" type="sub_section"> <SectionTitle> 4.4 Population Values from Historical Data </SectionTitle> <Paragraph position="0"> We use the historical corpus data to initialize the average VO access value in the population at 1000 A.D., calibrate the model between 1000 and 1150 A.D., and determine how strongly VO the distribution has to be by 1200 A.D. However, note that while the VO access value reflects the OV/VO distribution before interference from other parameters causes utterances to become ambiguous, the historical data reflect the distribution after this interference has caused utterances to become ambiguous. Table 3 shows how much of the data from the historical corpus consists of ambiguous utterances.</Paragraph> <Paragraph position="1"> Table 3: How much of the historical corpus is comprised of ambiguous utterances at various points in time.</Paragraph> <Paragraph position="2"> We know that either OV or VO order was used to generate all these ambiguous utterances - so our job is to estimate how many of them were generated with the OV order and how many with the VO order. 
This determines the &quot;underlying&quot; distribution. Once we know this, we can determine what VO access value produced that underlying OV/VO distribution. Following the process detailed in Pearl (2005), we rely on the fact that the D0 distribution is more distorted than the D1 distribution (since the D0 distribution always has more ambiguous triggers). The process itself involves using the difference in distortion between the D0 and D1 distributions to estimate the difference in distortion between the D1 and underlying distributions. Once this is done, we have average VO access values for initialization, calibration, and the target.</Paragraph> <Paragraph position="3"> Average VO access value in the population at various points in time, based on historical corpus data.</Paragraph> <Paragraph position="4"> Thus, to satisfy the historical facts, a population must start with an average VO access value of 0.23 at 1000 A.D., reach an average VO access value of 0.31 between 1000 and 1150 A.D., and reach an average VO access value of 0.75 by 1200 A.D.</Paragraph> </Section> </Section> </Paper>