<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3203">
  <Title>Learning Quantity Insensitive Stress Systems via Local Inference</Title>
  <Section position="3" start_page="24" end_page="25" type="metho">
    <SectionTitle>
3 The Neighborhood Learner
</SectionTitle>
    <Paragraph position="0"> In this section, I present the basic unsupervised batch learner, called the Neighborhood Learner, which learns 29 of the 33 patterns. In the next section, I introduce one modification to this learner which results in perfect accuracy.</Paragraph>
    <Paragraph position="1"> The basic version of the learner operates in two stages: prefix tree construction and state-merging, cf. Angluin (1982). These two stages find smaller descriptions of the observed data; in particular state-merging may lead to generalization (see below).</Paragraph>
    <Paragraph position="2"> A prefix tree is constructed as follows. Set the ini- null sidered in order. If [?]t [?] Q, (c,a,t) [?] d then set c = t. Otherwise, add a new state n to Q and a new arc (c,a,n) to d. A new arc is therefore created on every symbol in the first word. The last state for a word is added to F. The process is repeated for each word. The prefix tree for Pintupi words from Table  The second stage of the learner is state-merging, a process which reduces the number of states in the machine. A key concept in state merging is that when two states are merged into a single state, their transitions are preserved. Specifically, if states p and q merge, then a merged state pq is added to the machine, and p and q are removed. For every arc that left p (or q) to a state r, there is now an arc from pq going to r. Likewise, for every arc from a state r to p (or q), there is now an arc from r to pq.</Paragraph>
    <Paragraph position="3"> The post-merged machine accepts every word that the pre-merged machine accepts, and possibly more.</Paragraph>
    <Paragraph position="4"> For example, if there is a path between two states which become merged, a loop is formed.</Paragraph>
    <Paragraph position="5">  What remains to be explained is the criteria the learner uses to determine whether two states in the prefix tree merge. The Neighborhood Learner merges two states iff they have the same neighborhood, guaranteeing that the resulting grammar is neighborhood-distinct.</Paragraph>
    <Paragraph position="6"> The intuition is that the prefix tree provides a structured representation of the input and has recorded information about different environments, which are represented in the tree as states. Learning is a process which identifies actually different environments as 'the same'-- here states are 'the same' iff their local features, i.e their neighborhoods, are the same. For example, suppose states p and q in the prefix tree are both final or both nonfinal, and they share the same incoming symbol set and outgoing symbol set. In the learner's eyes they are then 'the same', and will be merged.</Paragraph>
    <Paragraph position="7"> The merging criteria partitions the states of the Pintupi prefix tree into five groups. States 3,5 and 7 are merged; states 2,4,6 are merged, and states 8,9,10,12 are merged. Merging of states halts when no two nodes have the same neighborhood- thus, the resulting machine is neighborhood-distinct. The result for Pintupi is shown in Figure 6.</Paragraph>
    <Paragraph position="8">  The machine in Figure 6 is equivalent to the one in Figure 1- they accept exactly the same language.</Paragraph>
    <Paragraph position="9">  I.e. neighborhood merging of the prefix tree in Figure 5 generalizes from the data exactly as desired.</Paragraph>
    <Section position="1" start_page="25" end_page="25" type="sub_section">
      <SectionTitle>
3.1 Results of Neighborhood Learning
</SectionTitle>
      <Paragraph position="0"> The Neighborhood Learner successfully learns 29 of the 33 language types (see appendix). These are exactly the 29 canonically neighborhood-distinct languages. This suggests the following claim, which has not been proven.</Paragraph>
      <Paragraph position="1">  This can be verified by checking to see if the minimized versions of the two machines are isomorphic.</Paragraph>
      <Paragraph position="2">  The proof is made difficult by the fact that the acceptor returned by the Neighborhood Learner is not necessarily the  (5) Conjecture: The Neighborhood Learner identifies the class of canonically neighborhood-distinct languages.</Paragraph>
      <Paragraph position="3">  In SS4, I discuss why the learner fails where it does, and introduce a modification which results in perfect accuracy.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="25" end_page="26" type="metho">
    <SectionTitle>
4 Reversing the Prefix Tree
</SectionTitle>
    <Paragraph position="0"> This section examines the four cases where neighborhood learning failed and modifies the learning algorithm, resulting in perfect accuracy. The goal is to restrict generalization because in every case where learning failed, the learner overgeneralized by merging more states than it should have. Thus, the resulting grammars recognize multiple words with n syllables.</Paragraph>
    <Paragraph position="1"> The dual stress pattern of Lower Sorbian places stress initially and, in words of four or more syllables, on the penult (see #9 Table 1). The prefix tree built from these words is shown in Figure 7.</Paragraph>
    <Paragraph position="2">  Here the Neighborhood Learner fails because it merges states 2 and 3. The resulting grammar incorrectly accepts words of the form 20 [?] .</Paragraph>
    <Paragraph position="3"> The proposed solution follows from the observation that if the prefix tree were constructed in reverse (reading each word from right to left) then the corresponding states in this structure would not have the same neighborhoods, and thus not be merged. A reverse prefix tree is constructed like a forward prefix tree, the only difference being that the order of symbols in each word is reversed. When neighborhood learning is applied to this structure and the resulting machine reversed again, the correct grammar is obtained, shown in Figure 4.</Paragraph>
    <Paragraph position="4"> How is the learner to know whether to construct the prefix tree normally or in reverse? It simply does both and intersects the results. Intersection of two canonical acceptor.</Paragraph>
    <Paragraph position="5">  languages is an operation which returns a language consisting of the words common to both. Similarly, machine intersection returns an acceptor which recognizes just those words that both machines recognize. This strategy is thus conservative: the learner keeps only the most robust generalizations, which are the ones it 'finds' in both the forward and reverse prefix trees.</Paragraph>
    <Paragraph position="6"> This new learner is called the Forward Backward Neighborhood (FBN) Learner and it succeeds with all the patterns (see appendix).</Paragraph>
    <Paragraph position="7"> Interestingly, the additional languages the FBN Learner can acquire are ones that, under foot-based analyses like those in Hayes (1995), require feet to be built from the right word edge. For example, Lower Sorbian has a binary trochee aligned to the right word edge; Indonesian iteratively builds binary trochaic feet from the right word edge; Cayuvava iteratively builds anapests from the right word edge. Thus structuring the input in reverse appears akin to a footing procedure which proceeds from the right word boundary.</Paragraph>
  </Section>
  <Section position="5" start_page="26" end_page="27" type="metho">
    <SectionTitle>
5 Predictions of Neighborhood Learning
</SectionTitle>
    <Paragraph position="0"> In this section, let us examine some of the predictions that are made by neighborhood learning. In particular, let us consider the kinds of languages that the Neighborhood Learner can and cannot learn and compare them with the attested typology.</Paragraph>
    <Section position="1" start_page="26" end_page="26" type="sub_section">
      <SectionTitle>
5.1 Binary and Ternary Stress Patterns
</SectionTitle>
      <Paragraph position="0"> Neighborhood learning suggests an explanation of the fact that the stress rhythms found in natural language are binary or ternary and not higher nary, and of the fact that stress falls within a three-syllable window of the word edge: perhaps only systems with these properties are learnable. This is because the neighborhood learner cannot distinguish between sequences of the same symbol with length greater than two.</Paragraph>
      <Paragraph position="1"> As an example, consider the quaternary (and higher n-ary) stress pattern 2(0001)  I follow Hopcroft et al (2001) in our notation of regular expressions with one substitution- we use  |instead of + to indicate disjunction.</Paragraph>
      <Paragraph position="2"> Similarly, neighborhood learning cannot distinguish a form like 02000 from 020000, so a system that places stress on the pre-antepenult (e.g. 02000, 002000, 0002000) is not learnable. With samples from the pre-antepenultimate language</Paragraph>
    </Section>
    <Section position="2" start_page="26" end_page="27" type="sub_section">
      <SectionTitle>
5.2 Minimal Word Conditions
</SectionTitle>
      <Paragraph position="0"> A subtle prediction made by neighborhood-learning is that a QI stress language with a pattern like the one exemplified by Hopi (shown in Figure 8) cannot have a minimal word condition banning monosyllables. This is because if there were no monosyllables in this language, then state 4 in Figure 8 would have the same neighborhood as state 2 (as in Figure 9).</Paragraph>
      <Paragraph position="1">  not allowing monosyllables.</Paragraph>
      <Paragraph position="2"> Since such a grammar recognizes a nonneighborhood-distinct language it cannot be learned by the Neighborhood Learner.</Paragraph>
      <Paragraph position="3"> As it happens, Hopi is a QS language which prohibits light, but permits heavy, monosyllables. Since I have abstracted away from the internal structure of the syllable in this paper, this prediction is not disconfirmed by the known typology: there are in fact no QI Hopi-like stress patterns in Gordon's (2002) typology which ban all monosyllables; i.e there are no QI patterns like the one in Figure 9.</Paragraph>
      <Paragraph position="4"> Some QI languages do have a minimal word condition banning all monosyllables. To our knowledge these are Cavine~na and Cayuvava (see Table 1), Mohawk (which places stress on the penult  like Nahuatl), and Diyari, Mohwak, Pitta Pitta and Wangkumara (all which assign stress like Pintupi) (Hayes, 1995). The Forward Backward Neighborhood Learner learns all of these patterns successfully irrespective of whether the patterns (and corresponding input samples) permit monosyllables, predicting that such patterns do not correlate with a prohibition on monosyllables (see appendix).</Paragraph>
      <Paragraph position="5"> Other QI languages prohibit light monosyllables.</Paragraph>
      <Paragraph position="6"> Diegue~no, for example, places stress finally like Atayal (see Table 1), but only allows heavy monosyllables. This is an issue to attend to in future research when trying to extend the learning algorithm to QS patterns, when the syllable type (light/heavy) is included in the representational scheme.</Paragraph>
    </Section>
    <Section position="3" start_page="27" end_page="27" type="sub_section">
      <SectionTitle>
5.3 Restrictiveness and Other Approaches
</SectionTitle>
      <Paragraph position="0"> There are languages that can be learned by neighborhood learning that phonologists do not consider to be natural. For example, the Neighborhood Learner learns a pattern in which words with an odd number of syllables bear initial stress but words with an even number of syllables bear stress on all odd syllables.</Paragraph>
      <Paragraph position="1"> However, the grammar for this language differs from all of the attested systems in that it has two loops but is slender (cf. Estonian which has two loops but is not slender). Thus this case suggests a further formal restriction to the class of possible stress systems.</Paragraph>
      <Paragraph position="2"> More serious challenges of unattestable, but Neighborhood Learner-able, patterns exist; e.g.</Paragraph>
      <Paragraph position="3"> 21*. In other words, it does not follow from neighborhood-distinctness that languages with stress must have stressless syllables. Nor does the notion that every word must bear some stress somewhere (i.e. Culminativity- see Hayes (1995)).</Paragraph>
      <Paragraph position="4"> However, despite the existence of learnable pathological languages, this approach is not unrestricted. The class of languages to be learned is finite--as in the Optimality-theoretic and Principles and Parameters frameworks--and is a proper subset of the regular languages. Future research will seek additional properties to better approximate the class of QI stress systems that can be exploited by inductive learning.</Paragraph>
      <Paragraph position="5"> This approach offers more insight into QI stress systems than earlier learning models. Optimality-theoretic learning models (e.g. (Tesar, 1998)) and models set in the Principles and Parameters framework (e.g. (Dresher and Kaye, 1990)) make no use of any property of the class of patterns to be learned beyond its finiteness. Also, our learner is much simpler than these other models, which require a large set of a priori switches and cues or constraints.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>