<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-0711">
  <Title>Learnability: A Self-contained Tutorial for</Title>
  <Section position="4" start_page="61" end_page="61" type="metho">
    <SectionTitle>
2 The Structural Triggers Learner
</SectionTitle>
    <Paragraph position="0"> One recent model of human syntax acquisition, The Structural Triggers PSearner (STPS) (Fodor, 1998), employs the human parsing mechanism to determine if an input is parametrically ambiguous. Parameter values are viewed as bits of tree structure (treelets). When the learner's current grammar is insufficient to parse the current input sentence, the treelets may be utilized during the parsing process in the same way as any natural language grammar would be applied; no unusual parsing activity is necessary. The treelets are adopted as part of the learner's current grammar hypothesis when: 1) they are required for a successful parse of the current input sentence and 2) the sentence is unambiguous. The STL thus learns only from fully unambiguous sentences. 4 3Of course, the extent to which such unambiguous sentences exist in the domain of human languages is an empirical issue. This is an important open research question which is the focus of a recent research endeavor here at CUNY. Our approach involves tagging a large, cross-linguistic set of child-directed sentences, drawn from the CHILDES database, with each sentence's parametric signature. By cross-tabulating the shared parameter values against different languages, the study should shed some light as to the shape of ambiguity in input samples typically encountered by children.</Paragraph>
    <Paragraph position="1"> 4This is actually the strategy employed by just one of several different STL variants, some of which are designed to manage domains in which unambiguous sen- null new parameter values.</Paragraph>
    <Paragraph position="2"> See Figure 1.</Paragraph>
  </Section>
  <Section position="5" start_page="61" end_page="64" type="metho">
    <SectionTitle>
3 The Feasibility of the STL
</SectionTitle>
    <Paragraph position="0"> The number of input sentences consumed by the STL before convergence on the target grammar can be derived from a relatively straightforward Markov analysis. Importantly, the formulation most useful to analyze performance does not require states which represent the grammars of the parameter space (contra Niyogi and Berwick (1996)). Instead, each state of the system depicts the number of parameters that have been set, t, and the state transitions represent the probability that the STL will adopt some number of new parameter values, w, on the basis of the current state and whatever usable parametric information is revealed by the current input sentence. See Figure 2.</Paragraph>
    <Paragraph position="1"> The following factors (described in detail below) determine the transition probabilities:</Paragraph>
    <Paragraph position="3"> * the effective expression rate (e I) Not all parameters are relevant parameters.</Paragraph>
    <Paragraph position="4"> Irrelevant parameters control properties of phenomena not present in the target language, such as clitic order in a language without clitics. For tences are rare or nonexistent.</Paragraph>
    <Paragraph position="5">  forming in a parameter space of three parameters. Nodes represent the current number of parameters that have been correctly set. Arcs indicate a change in the number that are correctly set. In this diagram, after each input is consumed, 0, 1 or 2 new parameters may be set. Once the learner enters state 3, it has converged on the target.</Paragraph>
    <Paragraph position="6"> our purposes, the number of relevant parameters, r, is the total number of parameters that need to be set in order to license all and only the sentences of the target language.</Paragraph>
    <Paragraph position="7"> Of the parameters relevant to the target language as a whole, only some will be relevant to any given sentence. A sentence expresses those parameters for which a specific value is required in order to build a parse tree, i.e. those parameters which are essential to the sentence's structural description. For instance, if a sentence does not have a relative clause, it will not express parameters that concern only relative clauses; if it is a declarative sentence, it won't express the properties peculiar to questions; and so on. The expression rate, e, for a language, is the average number of parameters expressed by its input sentences. Suppose that each sentence, on average, is ambiguous with respect to a of the parameters it expresses. The effective expression rate, e ~, is the mean proportion of expressed parameters that are expressed unambiguously (i.e. e' = (e - a)/e). It will also be useful to consider a' = (1 - e~).</Paragraph>
    <Section position="1" start_page="62" end_page="63" type="sub_section">
      <SectionTitle>
3.1 Derivation of a Transition
Probability Function
</SectionTitle>
      <Paragraph position="0"> To present the derivation of the probability that the system will change from an arbitrary state St to state St+w, (0 &lt; w &lt; e) it is useful to set ambiguity aside for a moment. In order to set all r parameters, the STL has to encounter enough batches of e parameter values, possibly overlapping with each other, to make up the full set of r parameter values that have to be established.</Paragraph>
      <Paragraph position="1"> Let H(wlt, r,e) be the probability that an arbitrary input sentence expresses w new (i.e. as yet unset) parameters, out of the e parameters expressed, given that the learner has already set t parameters (correctly), for a domain in which there are r total parameters that need to be set.</Paragraph>
      <Paragraph position="2"> This is a specification of the hypergeometric distribution and is given in Equation 1.</Paragraph>
      <Paragraph position="4"> Now, to deal with ambiguity, the effective rate of expression, e t, is brought into play. Recall that e ~ is the proportion of expressed parameters that are expressed unambiguously. It follows that the probability that any single parameter is expressed unambiguously is also e ~ and the probability that all of the expressed, but as yet unset parameters are expressed unambiguously is e ~w. That is, the probability that an input is effectively unambiguous and hence usable for learning is equal to e ~w.</Paragraph>
      <Paragraph position="6"> (2) Equation (2) can be used to calculate the probability of any possible transition of the Markov system that models STL performance. One method to determine the number of sentences expected to be consumed by the STL is to sum the number of sentences consumed in each state. Let E(Si) represent the expected number of sentences that will be consumed in state Si. E is given by the following recurrence relation: 5</Paragraph>
      <Paragraph position="8"> The expected total is simply:</Paragraph>
      <Paragraph position="10"> which is equal to the expected number to be consumed before any parameters have been set  by the waiting-STL before convergence. Fixed rate of expression.</Paragraph>
      <Paragraph position="11"> (= E(So)) plus the number expected to be consumed after the first successful learning event (at which point the learner will be in state Se) summed with the number of sentences expected to be consumed in every other state up to the state just before the target is attained (St-l). Etot can be tractably calculated using dynamic programming.</Paragraph>
    </Section>
    <Section position="2" start_page="63" end_page="63" type="sub_section">
      <SectionTitle>
3.2 Some Results
</SectionTitle>
      <Paragraph position="0"> Table 1 presents numerical results derived by fixing different values of r, e, and e ~. In order to make assessments of performance across different situations in terms of increasing rates of ambiguity, a percentage measure of ambiguity, a ~, is employed which is directly derived from er: a ~ = 1 - e ~, and is presented in Table 1 as a percent (the proportion is multiplied by 100).</Paragraph>
      <Paragraph position="1"> Notice that the number of parameters to be set (r) has relatively little effect on convergence time. What dominates learning speed is ambiguity and expression rates. When a ~ and e are both high, the STL is consuming an unreasonable number of input sentences. However, the problem is not intrinsic to the STL model of acquisition. Rather, the problem is due to a too rigid restriction present in the current formulation of the input sample. By relaxing the restriction, the expected performance of the STL improves dramatically. But first, it is informative to discuss why the framework, as presented so far, leads to the prediction that the STL will consume an extremely large number of sentences at rates of ambiguity and expression approaching natural language.</Paragraph>
      <Paragraph position="2"> By far the greatest amount of damage inflicted by ambiguity occurs at the very earliest stages of learning. This is because before any learning takes place, the STL must wait for the occurrence of a sentence that is fully unambiguous. Such sentences are bound to be extremely rare if the expression rate and the degree of ambiguity is high. For instance, a sentence with 20 out of 20 parameters unambiguous will virtually never occur if parameters are ambiguous on average 99% of the time (the probability would be (1/100)2deg).</Paragraph>
      <Paragraph position="3"> After learning gets underway, STL performance improves tremendously; the generally damaging effect of ambiguity is mitigated. Every successful learning event decreases the number of parameters still to be set. Hence, the expression rate of unset parameters decreases as learning proceeds. And to be usable by the STL, the only parameters that need to be expressed unambiguously are those that have not yet been set. For example, if 19 parameters have already been set and e = r = 20 as in the example above, the probability of encountering a usable sentence in the case that parameters are ambiguous on average 99% of the time and the input sample consists of sentences expressing 20 parameters, is only (1/100) 1 = 1/100. This can be derived by plugging into Equation (2): w = 1, t = 19, e ---- 20, and r = 20 which is equal to: H(1119, 20, 20)(1/100) 1 = (1)(1/100).</Paragraph>
      <Paragraph position="4"> Clearly~ the probability of seeing usable inputs increases rapidly as the number of parameters that are set increases. All that is needed, therefore, is to get parameter setting started, so that the learner can be quickly be pulled down into more comfortable regions of parametric expression. Once parameter setting is underway, the STL is extremely efficient.</Paragraph>
    </Section>
    <Section position="3" start_page="63" end_page="64" type="sub_section">
      <SectionTitle>
3.3 Distributed Expression Rate
</SectionTitle>
      <Paragraph position="0"> So far e has been conveniently taken to be fixed across all sentences of the target language. In which case, when e = 10, the learner will have to wait for a sentence with exactly 10 unambiguously expressed parameters in order to get started on learning, and as discussed above, it can be expected that this will be a very long wait. However, if one takes the value of e to be uniformly distributed (rather than fixed) then the learner will encounter some sentences which  express fewer than 10 parameters, and which are correspondingly more likely to be fully un-ambiguous and hence usable for learning.</Paragraph>
      <Paragraph position="1"> In fact, any distribution of e can be incorporated into the framework presented so far. Let Di(x) denote the probability distribution of expression of the input sample. That is, the probability that an arbitrarily chosen sentence from the input sample I expresses x parameters. For example, if Di imposes a uniform distribution, then DI(x) = 1/emax where every sentence expresses at least 1 parameter and emax is the maximum number of parameters expressed by any sentence. Given Di, a new transition probability P'(St '+ St+w) = P'(wlt, r, emax, e') can be formulated as: C/maz P'(w\[t,r,e .... et)-- - ~ Di(i)P(wlt,r,i,e' ) (5) where P is defined in (2) above and emaz represents the maximum number of parameters that a sentence may express instead of a fixed number for all sentences.</Paragraph>
      <Paragraph position="2"> To see why Equation (5) is valid, consider that to set w new parameters at least w must be expressed in the current input sentence. Also usable, are sentences that express more parameters (w + I, w + 2, w + 3,..., emax). Thus, the probability of setting w new parameters is simply the sum of the probabilities that a sentence expressing a number of parameters, i, from w to emax, is encountered by the STL (= Di(i)), times the probability that the STL can set w additional parameters given that i are expressed. By replacing P with P' in Equation 3 and modifying the derivation of the base case, 6 the total expected number of sentences that will be consumed by the STL given a distribution of expression Di(x) can be calculated.</Paragraph>
      <Paragraph position="3"> Table 2 presents numerical results derived by fixing r and a' and allowing e to vary uniformly from 0 to emax. As in Table 1, a percentage measure of ambiguity, a', is employed.</Paragraph>
      <Paragraph position="4"> The results displayed in the table indicate a striking decrease in the number of sentences that the that the STL can be expected to consume compared to those obtained with a fixed expression rate in place. As a rough comparison, with the ambiguity rate (a') at 80%: when</Paragraph>
      <Paragraph position="6"> by the STL before convergence. Uniformly distributed rate of expression.</Paragraph>
      <Paragraph position="7"> e varies uniformly from 0 to 10, the STL requires 430 sentences (from Table 2); when e is fixed at 5, the number of sentences required is 3,466 (from Table 1).</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>