<?xml version="1.0" standalone="yes"?>
<Paper uid="P97-1054">
  <Title>Co-evolution of Language and of the Language Acquisition Device</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1 Theoretical Background
</SectionTitle>
    <Paragraph position="0"> Grmnnmtical acquisition proceeds on the basis of a partial genotypic specifica.tion of (universal) grmnmar (UG) complemented with a learning procedure elmbling the child to complete this specification appropriately. The parameter setting frainework of Chomsky (1981) claims that learning involves fixing the wdues of a finite set of finite-valued parameters to select a single fully-specified grammar from within the space defined by the genotypic specification of UG. Formal accounts of parameter setting have been developed for small fragments but even in these search spaces contain local maxima and subset-superset relations which may cause a learner to converge to an incorrect grammar (Clark, 1992; Gibson and Wexler, 1994; Niyogi and Berwick, 1995). The solution to these problems involves defining d(,fault, umnarked initial values for (some) parameters and/or ordering the setting of paraineters during learning.</Paragraph>
    <Paragraph position="1"> Bickerton (1984) argues for the Bioprograin Hypothesis a.s an explanation for universal similarities between historically unrelated creoles, and for the rapid increase in gramlnatical complexity accompanying the transition from pidgin to creole languages. Prom the perspective of the parameters framework, the Bioprogram Hypothesis claims that children are endowed genetically with a UG which, by default, specifies the stereotypical core creole grammar, with right-branching syntax and subject-verb-object order, as in Saramaccan. Others working within the parameters framework have proposed unmarked, default parameters (e.g. Lightfoot, 1991), but the Bioprogram Hypothesis can be interpreted as towards one end of a continuum of proposals ranging from all parameters initially unset to all set to default values.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="422" type="metho">
    <SectionTitle>
2 The Language Acquisition Device
</SectionTitle>
    <Paragraph position="0"> A model of the Language Acquisition Device (LAD) incorporates a UG with associated parameters, a parser, and an algorithm for updating initial parameter settings on parse failure during learning.</Paragraph>
    <Section position="1" start_page="0" end_page="420" type="sub_section">
      <SectionTitle>
2.1 The Grammar (set)
</SectionTitle>
      <Paragraph position="0"> Basic categorial grammar (CG) uses one rule of application which combines a functor category (containing a slash) with an argument category to form a derived category (with one less slashed argument category). Grammatical constraints of order and agreement are captured by only allowing directed application to adjacent matching categories. Generalized Categorial Grammar (GCG) extends CG with further rule schemata) The rules of FA, BA, generalized weak permutation (P) and backward and forward colnposition (I?C, BC) are given in Figure 1 (where X, Y and Z are category variables, \[ is a vm'iable over slash and backslash, and ...</Paragraph>
      <Paragraph position="1"> denotes zero or more further flmctor arguments).</Paragraph>
      <Paragraph position="2"> Once pernmtation is included, several semantically l\Y=ood (1993) is a general introduction to Categorial Grammar mid extensions to the basic theory. The most closely related theories to that presented here are those of Steedman (e.g. 1988) and Hoffman (1995).</Paragraph>
      <Paragraph position="3">  equivalent derivations for Kim loves Sandy become available, Figure 2 shows the non-conventional left-branching one. Composition also allows alternative non-conventional semantically equivalent (leftbranching) derivations.</Paragraph>
      <Paragraph position="4"> GCG as presented is inadequate as an account of UG or of any individual grammar. In particular, the definition of atomic categories needs extending to deal with featural variation (e.g. Bouma and van Noord, 1994), and the rule schemata, especially composition and weak permutation, must be restricted in various parametric ways so that overgeneration is prevented for specific languages. Nevertheless, GCG does represent a plausible kernel of UG; Hoffman (1995, 1996) explores the descriptive power of a very similar system, in which generalized weak permutation is not required because functor arguments are interpreted as multisets. She demonstrates that this system can handle (long-distance) scrambling elegantly and generates mildly context-sensitive languages (Joshi et al, 1991).</Paragraph>
      <Paragraph position="5"> The relationship between GCG as a theory of UG (GCUG) and as a the specification of a particular grammar is captured by embedding the theory in a default inheritance hierarchy. This is represented as a lattice of typed default feature structures (TDFSs) representing subsumption and default inheritance relationships (Lascarides et al, 1996; Lascarides and Copestake, 1996). The lattice defines intensionally the set of possible categories and rule schemata via type declarations on nodes. For example, an intransitive verb might be treated as a subtype of verb, inheriting subject directionality by default from a type gendir (for general direction).</Paragraph>
      <Paragraph position="6"> For English, gendir is default right but the node of the (intransitive) functor category, where the directionality of subject arguments is specified, overrides this to left, reflecting the fact that English is predominantly right-branching, though subjects appear to the left of the verb. A transitive verb would inherit structure from the type for intransitive verbs and an extra NP argument with default directionality specified by gendir, and so forth. 2 For the purposes of the evolutionary simulation described in SS3, GC(U)Gs are represented as a sequence of p-settings (where p denotes principles or parameters) based on a flat (ternary) sequential encoding of such default inheritance lattices. The in2Bouma and van Noord (1994) and others demonstrate that CGs can be embedded in a constraint-based representation. Briscoe (1997a,b) gives further details of the encoding of GCG in TDFSs.</Paragraph>
      <Paragraph position="7">  heritance hierarchy provides a partial ordering on parameters, which is exploited in the learning procedure. For example, the atomic categories, N, NP and S are each represented by a parameter encoding the presence/absence or lack of specification (T/F/?) of the category in the (U)G. Since they will be unordered in the lattice their ordering in the sequential coding is arbitrary. However, the ordering of the directional types gendir and subjdir (with values L/R) is significant as the latter is a more specific type. The distinctions between absolute, default or unset specifications also form part of the encoding (A/D/?). Figure 3 shows several equivalent and equally correct sequential encodings of the fragment of the English type system outlined above.</Paragraph>
      <Paragraph position="8"> A set of grammars based on typological distinctions defined by basic constituent order (e.g. Greenberg, 1966; Hawkins, 1994) was constructed as a (partial) GCUG with independently varying binary-valued parameters. The eight basic language families are defined in terms of the unmarked order of verb (V), subject (S) and objects (0) in clauses.</Paragraph>
      <Paragraph position="9"> Languages within families further specify the order of modifiers and specifiers in phrases, the order of adpositions and further phrasal-level ordering parameters. Figure 4 list the language-specific ordering parameters used to define the full set of grammars in (partial) order of generality, and gives examples of settings based on familiar languages such as &amp;quot;English&amp;quot;, &amp;quot;German&amp;quot; and &amp;quot;Japanese&amp;quot;. 3 &amp;quot;English&amp;quot; defines an SVO language, with prepositions in which specifiers, complementizers and some modifiers precede heads of phrases. There are other grammars in the SVO family in which all modifers follow heads, there are postpositions, and so forth. Not all combinations of parameter settings correspond to attested languages and one entire language family (OVS) is unattested. &amp;quot;Japanese&amp;quot; is an SOV language with 3Throughout double quotes around language names are used as convenient mnemonics for familiar combinations of parameters. Since not all aspects of these actual languages are represented in the grammars, conclusions about actual languages must be made with care.</Paragraph>
      <Paragraph position="10"> postpositions in which specifiers and modifiers follow heads. There are other languages in the SOV family with less consistent left-branching syntax in which specifiers and/or modifiers precede phrasal heads, some of which are attested. &amp;quot;German&amp;quot; is a more complex SOV language in which the parameter verb-second (v2) ensures that the surface order in main clauses is usually SVO. 4 There are 20 p-settings which determine the rule schemata available, the atomic category set, and so forth. In all, this CGUG defines just under 300 grammars. Not all of the resulting languages are (stringset) distinct and some are proper subsets of other languages. &amp;quot;English&amp;quot; without the rule of permutation results in a stringset-identical language, but the grammar assigns different derivations to some strings, though the associated logical forms are identical. &amp;quot;English&amp;quot; without composition results in a subset language. Some combinations of p-settings result in 'impossible' grammars (or UGs). Others yield equivalent grammars, for example, different combinations of default settings (for types and their subtypes) can define an identical category set.</Paragraph>
      <Paragraph position="11"> The grammars defined generate (usually infinite) stringsets of lexical syntactic categories. These strings are sentence types since each is equivalent to a finite set of grammatical sentences formed by selecting a lexical instance of each lexicai category.</Paragraph>
      <Paragraph position="12"> Languages are represented as a finite subset of sentence types generated by the associated grammar.</Paragraph>
      <Paragraph position="13"> These represent a sample of degree-1 learning triggers for the language (e.g. Lightfoot, 1991). Subset languages are represented by 3-9 sentence types and 'full' languages by 12 sentence types. The constructions exemplified by each sentence type and their length are equivalent across all the languages defined by the grammar set, but the sequences of lexical categories can differ. For example, two SOV language renditions of The man who Bill likes gave Fred a 4Representation of the vl/v2 parameter(s) in terms of a type constraint determining allowable functor categories is discussed in more detail in Briscoe (1997b).  present, one with premodifying and the other post-modifying relative clauses, both with a relative pronoun at the right boundary of the relative clause, are shown below with the differing category highlighted.</Paragraph>
      <Paragraph position="14"> Bill likes who the-man a-present Fred gave</Paragraph>
    </Section>
    <Section position="2" start_page="420" end_page="421" type="sub_section">
      <SectionTitle>
2.2 The Parser
</SectionTitle>
      <Paragraph position="0"> The parser is a deterministic, bounded-context stack-based shift-reduce algorithm. The parser operates on two data structures, an input buffer or queue, and a stack or push down store. The algorithm for the parser working with a GCG which includes application, composition and permutation is given in Figure 5. This algorithm finds the most left-branching derivation for a sentence type because Reduce is ordered before Shift. The category sequences representing the sentence types in the data for the entire language set are designed to be unambiguous relative to thi s 'greedy', deterministic algorithm, so it will always assign the appropriate logical form to each sentence type. However, there are frequently alternative less left-branching derivations of the same logical form.</Paragraph>
      <Paragraph position="1"> The parser is augmented with an algorithm which computes working memory load during an analysis (e.g. Baddeley, 1992). Limitations of working memory are modelled in the parser by associating a cost with each stack cell occupied during each step of a derivation, and recency and depth of processing effects are modelled by resetting this cost each time a reduction occurs: the working memory load (WML) algorithm is given in Figure 6. Figure 7 gives the right-branching derivation for Kim loves Sandy, found by the parser utilising a grammar without permutation. The WML at each step is shown for this derivation. The overall WML (16) is higher than for the left-branching derivation (9).</Paragraph>
      <Paragraph position="2"> The WML algorithm ranks sentence types, and  1. The Reduce Step: if the top 2 cells of the stack are occupied, then try a) Application, if match, then apply and goto 1), else b), b) Combination, if match then apply and goto 1), else c), c) Permutation, if match then apply and goto 1), else goto 2) 2. The Shift Step: if the first cell of the Input Buffer is occupied, then pop it and move it onto the Stack together with its associated lexical syntactic category and goto 1), else goto 3) 3. The Halt Step: if only the top cell of the Stack is occupied by a constituent of category S, then return Success, else return Fail  The Match and Apply operation: if a binary rule schema matches the categories of the top 2 cells of the Stack, then they are popped from the Stack and the new category formed by applying the rule schema is pushed onto the Stack.</Paragraph>
      <Paragraph position="3"> The Permutation operation: each time step lc) is visited during the Reduce step, permutation is applied to one of the categories in the top 2 cells of the Stack until all possible permutations of the 2 categories have been tried using the binary rules. The number of possible permutation operations is finite and bounded by the maximum number of arguments of any functor category in the grammar.</Paragraph>
      <Paragraph position="4">  1. Assign any new Stack entry in the top cell (introduced by Shift or Reduce) a WML value of 0 2. Increment every Stack cell's WML value by 1 3. Push the sum of the WML values of each Stack  cell onto the WML-record When the parser halts, return the sum of the WML-record gives the total WML for a derivation  thus indirectly languages, by parsing each sentence type from the exemplifying data with the associated grammar and then taking the mean of the WML obtained for these sentence types. &amp;quot;English&amp;quot; with Permutation has a lower mean WML than &amp;quot;English&amp;quot; without Permutation, though they are stringset-identical, whilst a hypothetical mixture of &amp;quot;Japanese&amp;quot; SOV clausal order with &amp;quot;English&amp;quot; phrasal syntax has a mean WML which is 25% worse than that for &amp;quot;English&amp;quot;. The WML algorithm is in accord with existing (psycholinguisticallymotivated) theories of parsing complexity (e.g. Gibson, 1991; Hawkins, 1994; Rambow and Joshi, 1994).</Paragraph>
    </Section>
    <Section position="3" start_page="421" end_page="422" type="sub_section">
      <SectionTitle>
2.3 The Parameter Setting Algorithm
</SectionTitle>
      <Paragraph position="0"> The parameter setting algorithm is an extension of Gibson and Wexler's (1994) Trigger Learning Algorithm (TLA) to take account of the inheritance-based partial ordering and the role of memory in learning. The TLA is error-driven - parameter settings are altered in constrained ways when a learner cannot parse trigger input. Trigger input is defined as primary linguistic data which, because of its structure or context of use, is determinately unparsable with the correct interpretation (e.g. Lightfoot, 1991). In this model, the issue of ambiguity and triggers does not arise because all sentence types are treated as triggers represented by p-setting schemata. The TLA is memoryless in the sense that a history of parameter (re)settings is not maintained, in principle, allowing the learner to revisit previous hypotheses. This is what allows Niyogi and Berwick (1995) to formalize parameter setting as a Markov process. However, as Brent (1996) argues, the psychological plausibility of this algorithm is doubtful - there is no evidence that children (randomly) move between neighbouring grammars along paths that revisit previous hypotheses. Therefore, each parameter can only be reset once during the learning process. Each step for a learner can be defined in terms of three functions: P-SETTING, GRAMMAR and PARSER, as: PARSERi(GRAMMAR/(P-SETTING/(Sentence j))) A p-setting defines a grammar which in turn defines a parser (where the subscripts indicate theoutput of each function given the previous trigger). A parameter is updated on parse failure and, if this results in a parse, the new setting is retained. The algorithm is summarized in Figure 8. Working memory grows through childhood (e.g. Baddeley, 1992), and this may assist learning by ensuring that trigger sentences gradually increase in complexity through the acquisition period (e.g. Elman, 1993) by forcing the learner to ignore more complex potential triggers that occur early in the learning process. The WML of a sentence type can be used to determine whether it can function as a trigger at a particular stage in learning.</Paragraph>
      <Paragraph position="1">  Reset the first (most general) default or unset parameter in a left-to-right search of the p-set according to the following table: Input: D 1 D0 ? ? \] Output: R 0 R 1 ? 1/0 (random) I (where 1</Paragraph>
      <Paragraph position="3"/>
    </Section>
  </Section>
  <Section position="5" start_page="422" end_page="423" type="metho">
    <SectionTitle>
3 The Simulation Model
</SectionTitle>
    <Paragraph position="0"> The computational simulation supports the evolution of a population of Language Agents (LAgts), similar to Holland's (1993) Echo agents. LAgts generate and parse sentences compatible with their current p-setting. They participate in linguistic interactions which are successful if their p-settings are compatible. The relative fitness of a LAgt is a function of the proportion of its linguistic interactions which have been successful, the expressivity of the language(s) spoken, and, optionally, of the mean WML for parsing during a cycle of interactions. An interaction cycle consists of a prespecified number of individual random interactions between LAgts, with generating and parsing agents also selected randomly. LAgts which have a history of mutually successful interaction and high fitness can 'reproduce'.</Paragraph>
    <Paragraph position="1"> A LAgt can 'live' for up to ten interaction cycles, but may 'die' earlier if its fitness is relatively low. It is possible for a population to become extinct (for example, if all the initial LAgts go through ten interaction cycles without any successful interaction occurring), and successful populations tend to grow at a modest rate (to ensure a reasonable proportion of adult speakers is always present). LAgts learn during a critical period from ages 1-3 and reproduce from 4-10, parsing and/or generating any language learnt throughout their life.</Paragraph>
    <Paragraph position="2"> During learning a LAgt can reset genuine param- null 1. Generate cost: 1 (GC) 2. Parse cost: ! (PC) 3. Generate subset language cost: 1 (GSC) 4. Parse failure cost: 1 (PF) 5. Parse memory cost: WML(st) 6. Interaction success benefit: 1 (SI) 7. Fitness(WML): SI GC * GC+PC X GC+GSC X 8. Fitness(-~WML): sI cc GC+PC X CC.-\[-GSC  allows the deterministic recovery of the initial setting. Fitness-based reproduction ensures that successful and somewhat compatible p-settings are preserved in the population and randomly sampled in the search for better versions of universal grammar, including better initial settings of genuine parameters. Thus, although the learning algorithm per se is fixed, a range of alternative learning procedures can be explored based on the definition of the inital set of parameters and their initial settings. Figure 9 summarizes crucial options in the simulation giving the values used in the experiments reported in SS4 and Figure 10 shows the fitness functions.</Paragraph>
  </Section>
</Paper>