<?xml version="1.0" standalone="yes"?> <Paper uid="W02-0607"> <Title>Modeling English Past Tense Intuitions with Minimal Generalization</Title> <Section position="6" start_page="6" end_page="8" type="concl"> <SectionTitle> 4 Testing the Model </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="6" end_page="7" type="sub_section"> <SectionTitle> 4.1 Training </SectionTitle> <Paragraph position="0"> Before a model can be tested, it must be trained on a representative learning set. For our studies of English past tenses, we used a corpus of 4253 verbs, consisting of all the verbs that had a frequency of 10 or greater in the English portion of the CELEX database (Burnage 1991). We trained our model to predict the past tense form from the present stem.</Paragraph> <Paragraph position="1"> The model, implemented in Java, accomplished its task fairly rapidly, learning the English past tense pattern in about 20 minutes on a 450 MHz PC. Most of this learning time was spent expanding and refining the more detailed rules; the broad generalizations governing the system were in place after only a few dozen words had been examined.</Paragraph> </Section> <Section position="2" start_page="7" end_page="7" type="sub_section"> <SectionTitle> 4.2 Corpus testing </SectionTitle> <Paragraph position="0"> As a first test of our model's performance, we divided the training data randomly into ten parts, and used the model to predict past tenses for each part based on the remaining nine tenths. For virtually every verb, the first choice of our model was the regular past tense, in its phonologically correct form: [-t], [-d], or [-əd], depending on the last segment of the stem. 
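To make the allomorph choice concrete, here is a minimal Python sketch (ours, not the authors' Java implementation; the single-character segment symbols and the toy voiceless-obstruent set are assumptions for illustration):

```python
# Minimal sketch (not the authors' implementation) of choosing the regular
# past allomorph from the stem-final segment.
VOICELESS = {"p", "k", "f", "s", "S", "T"}  # S/T: SAMPA-style esh and theta

def regular_past_allomorph(final_segment):
    """Return the predicted regular past suffix for a stem-final segment."""
    if final_segment in ("t", "d"):
        return "@d"  # syllabic [-@d] after alveolar stops (want -> wanted)
    if final_segment in VOICELESS:
        return "t"   # [-t] after other voiceless segments (miss -> missed)
    return "d"       # [-d] after voiced segments and vowels (hug -> hugged)

assert regular_past_allomorph("s") == "t"
assert regular_past_allomorph("g") == "d"
assert regular_past_allomorph("t") == "@d"
```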
We consider this preference to be appropriate, given that English past tenses are on the whole a highly regular system; human speakers output irregulars only because they have memorized them.</Paragraph> </Section> <Section position="3" start_page="7" end_page="8" type="sub_section"> <SectionTitle> 4.3 Testing on novel forms </SectionTitle> <Paragraph position="0"> In our opinion, the most important criterion for a model like ours is the ability to deal with novel, made-up stems in the same way that people do.</Paragraph> <Paragraph position="1"> Novel stems access the native speaker's generative ability, abstracting away from whatever behavior results from memorization of existing verbs.</Paragraph> <Paragraph position="2"> To begin, we have found that the model correctly inflects unusual words like Prasada and Pinker's (1993) ploamph and smairg; i.e. as [ploʊmft] and [smɛrgd]. The model can do this because it learns highly general rules that encompass these unusual items. Moreover, when confronted with the non-English sound [x] in to out-Bach [aʊtbax], our model correctly predicts [aʊtbaxt].</Paragraph> <Paragraph position="3"> The model is able to do this because it can generalize using features, and thus can learn a rule that covers [x] based on phonetically similar segments like [f] and [k].</Paragraph> <Paragraph position="4"> On a more systematic level, we have explored the behavior of the model with a carefully chosen set of made-up verbs, which were rated both by our model and by groups of native speakers. (Source code is available at http://www.linguistics.ucla.edu/people/hayes/rulesvsanalogy/.) We carried out two experiments, which are described in detail in Albright and Hayes (2001).</Paragraph> <Paragraph position="5"> In our first experiment, we asked participants to complete a sentence by using the past tense of a made-up verb that had been modeled in previous sentences. 
For example, participants filled in the blank in the frame &quot;The chance to rife would be very exciting. My friend Sam ___ once, and he loved it.&quot; Typically, they would volunteer rifed, or occasionally rofe or some other irregular form. In the second experiment, participants were given a number of choices, and rated each on a scale from 1 (worst) to 7 (best).</Paragraph> <Paragraph position="6"> In selecting verbs to use in the experiments, we tried to find a set of verbs for which our learning model would make a wide range of different predictions. We began with a constructed corpus of phonologically-ordinary monosyllables (i.e. combinations of common onsets and rhymes), and used the model to predict past tenses for each. Based on these predictions, we selected four kinds of verbs, which according to the model:
I. should sound especially good as regular, but not as irregular
II. should sound especially good as (some kind of) irregular, but not as regular
III. should sound good both as regular and as some kind of irregular
IV. should not sound especially good either as regular or as any kind of irregular
Here are examples of all four categories.</Paragraph> <Paragraph position="7">
I. Blafe is expected to sound particularly good as a regular (because it falls within the scope of the high-confidence voiceless-fricative rule), but not as an irregular.
II. Spling is expected to sound especially good as an irregular (splung), because it fits a high-reliability [ɪ] - [ʌ] rule, but it is not predicted to be especially good as a regular.
III. Bize is predicted to sound good as both a regular and an irregular, since it falls into a highly reliable context for regulars (final fricatives) and also falls into a highly reliable context for the [aɪ] - [oʊ] change (before a coronal obstruent).
IV. Gude is not covered by any especially reliable rules for either regulars or irregulars. 
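The four-way classification above can be made concrete with a small Python sketch (illustrative only; the scores and the 0.5 cutoff are our assumptions, not the model's actual selection criterion):

```python
# Illustrative sketch of sorting candidate made-up verbs into categories I-IV
# from hypothetical model scores. The cutoff is an assumption for illustration.
def categorize(reg_score, irreg_score, cutoff=0.5):
    good_reg = reg_score >= cutoff
    good_irreg = irreg_score >= cutoff
    if good_reg and not good_irreg:
        return "I"    # good as regular only (e.g. blafe)
    if good_irreg and not good_reg:
        return "II"   # good as irregular only (e.g. spling)
    if good_reg and good_irreg:
        return "III"  # good as both (e.g. bize)
    return "IV"       # good as neither (e.g. gude)

assert categorize(0.9, 0.1) == "I"
assert categorize(0.2, 0.8) == "II"
```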
The full set of verbs is given in the Appendix.</Paragraph> <Paragraph position="8"> When we tested these four categories of made-up verbs, we found that our participants gave them ratings that corresponded fairly closely to the predictions of our model. Not only did participants strongly prefer regulars, as we would expect, but there was also a good match of model to data within the categories I-IV defined above. The following graphs show this for both regulars and irregulars (all responses are rescaled to the same scale). The graphs show data for participant ratings; similar results were obtained when we counted how frequently the various past tenses were volunteered when the participants were asked to fill in the blanks themselves.</Paragraph> <Paragraph position="9"> As a more stringent test, we can examine not just mean values, but word-by-word predictions. A measure of this is the correlation between the model's predictions and the experimental results.</Paragraph> <Paragraph position="10"> The correlations are carried out separately for regulars and irregulars, since an overall correlation only establishes that the model knows that it should rate regulars highly.</Paragraph> <Paragraph position="11"> In summary, our experiments validate a number of the model's predictions. First, participants prefer regulars over irregulars. Second, their intuitions are gradient, ranging continuously over the scale.</Paragraph> <Paragraph position="12"> Third, participants favor only those irregular forms that fall within a context characteristic of existing irregular verbs, like -ing ~ -ung. Finally, and most surprisingly, the participants followed the predictions of our model in favoring regular forms that can be derived by rules with high reliability.</Paragraph> <Paragraph position="13"> We conclude that our model captures a number of subtle but important patterns in the preferences of human speakers for past tense formation of novel verbs. 
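The word-by-word test mentioned above amounts to correlating model scores with mean participant ratings. A self-contained Pearson correlation sketch (with made-up numbers, not the experimental data) is:

```python
# Sketch of the word-by-word evaluation: Pearson correlation between model
# confidence scores and mean participant ratings. All numbers are invented
# for illustration; the paper's correlations are computed separately for
# regulars and irregulars.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

model_scores = [0.92, 0.85, 0.60, 0.41]  # hypothetical model confidences
ratings = [6.5, 6.1, 4.8, 3.9]           # hypothetical mean 1-7 ratings
assert pearson(model_scores, ratings) > 0.9
```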
Some of these preferences (e.g., the special preference for voiceless-fricative regulars) are not predicted by traditional linguistic analyses. We have obtained similar results in other languages (Albright 1999, to appear). We suspect that there may be many generalizations in morphology that are apprehended by native speakers but have been missed by traditional analysis. The use of machine learning may be useful in detecting such generalizations.</Paragraph> <Paragraph position="14"> 5 How the model could be improved</Paragraph> </Section> <Section position="4" start_page="8" end_page="8" type="sub_section"> <SectionTitle> 5.1 Phonological representations </SectionTitle> <Paragraph position="0"> Our model uses a very simple kind of phonological representation, from Chomsky and Halle (1968), and a very simple schema for rules ((5)). While this works well in systems that involve only local phonological generalizations, more complex systems are likely to require better representations if the correct generalizations are to be discovered.</Paragraph> <Paragraph position="1"> For example, the notion &quot;closest vowel&quot; is needed to characterize vowel harmony (e.g. Hungarian könyv-nAk - könyv-nek 'book-dative'). Our model cannot ignore the consonants that intervene between vowels, so it would not do well in learning this kind of rule. (Statistical testing reported in Albright and Hayes (2001) indicates that the effect on regulars cannot be attributed (entirely) to a &quot;trade-off&quot; effect with irregulars; i.e. splinged does not sound bad just because splung sounds good. In fact, the observable trade-off effects are equally strong in both directions: some irregular forms sound worse because they also fall into a strong context for regulars.)</Paragraph> <Paragraph position="2"> Our model also lacks any notion of syllables or syllable weight. 
Thus it could not learn the generalization that all polysyllabic English verb stems are regular (Pinker and Prince 1988); nor could it learn the distribution of the Latin abstract noun suffixes [-ia] and [-ieːs], which depends on the weight of the stem-final syllable ([graː.ti.a] 'favor', [kle.men.ti.a] 'mercy' vs. [maː.te.ri.eːs] 'matter'; Mester 1994). Lastly, the model lacks any notion of foot structure. Thus, it could not learn the distribution of the Yidiny locative suffixes [-la] and [-V] (prelengthening), which is arranged so that the output will have an even number of syllables, that is, an integral number of disyllabic stress feet ([ˈgabu][ˈdyula] 'clay-loc.' vs. [uˈna][gaˈraː] 'whale-loc.'; Dixon 1977).</Paragraph> <Paragraph position="5"> Phonological theory provides some of the means to solve these problems: theories of long-distance rules (e.g. Archangeli and Pulleyblank 1987), of syllable weight (McCarthy 1979), and of foot structure (Hayes 1982). We anticipate that incorporating such mechanisms would permit these phenomena to be learned by our system.</Paragraph> <Paragraph position="6"> At the same time, however, we must consider the possibility that introducing new structures may expand the hypothesis space so much that it cannot be searched effectively by minimal generalization. 
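The minimal generalization step itself can be illustrated with a toy Python sketch (the three-feature matrix is our simplification, not the Chomsky and Halle (1968) feature set): intersecting the feature values of [f] and [k] yields a context that also covers [x].

```python
# Toy sketch of minimal generalization over feature bundles: the shared
# context of two segments is the set of feature values they agree on, which
# is how a rule learned from [f] and [k] can extend to cover novel [x].
FEATURES = {  # simplified feature matrix, assumed for illustration
    "f": {"voice": "-", "cont": "+", "son": "-"},
    "k": {"voice": "-", "cont": "-", "son": "-"},
    "x": {"voice": "-", "cont": "+", "son": "-"},
}

def minimal_generalization(seg1, seg2):
    """Keep only the feature values the two segments agree on."""
    a, b = FEATURES[seg1], FEATURES[seg2]
    return {f: v for f, v in a.items() if b.get(f) == v}

def matches(seg, context):
    return all(FEATURES[seg].get(f) == v for f, v in context.items())

ctx = minimal_generalization("f", "k")  # agree on voice and sonorancy
assert matches("x", ctx)                # [x] falls under the generalized rule
```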
Thus, where there are alternative phonological theories available, they should be assessed for whether they permit the right generalizations to be found without excessively expanding search time.</Paragraph> <Paragraph position="7"> It may also be possible to cut back on search time by using better algorithms for searching the hypothesis space.</Paragraph> </Section> <Section position="5" start_page="8" end_page="8" type="sub_section"> <SectionTitle> 5.2 Multiple changes </SectionTitle> <Paragraph position="0"> A number of morphological processes involve multiple changes, as in the German past participle geschleppt 'dragged', derived from schlepp- using both prefixation and suffixation. Our model (specifically, our method for detecting affixes) cannot characterize such cases as involving two simple changes, and would treat the relation as arbitrary.</Paragraph> <Paragraph position="1"> Two methods that might help here would be (a) to use some form of string-edit distance (Kruskal 1983), weighted by phonetic similarity, to determine that -schlepp- is the string shared by the two forms; (b) to adopt some method of morpheme discovery (e.g. Baroni 2000; Goldsmith 2001; Neuvel, to appear; Schone and Jurafsky 2001; Baroni et al. 2002) and use its results to favor rules that prefix ge- and suffix -t.</Paragraph> <Paragraph position="2"> Summarizing, we anticipate that improvements in the model could result from better phonological representations, better methods of search, and more sophisticated forms of string matching.</Paragraph> <Paragraph position="3"> Appendix: Made-Up Verbs Used in the Experiments
I. Expected to be especially good as regular: blafe [bleɪf], bredge [brɛdʒ], chool [tʃul], dape [deɪp], gezz [gɛz], nace [neɪs], spack [spæk], stire [staɪr], tesh [tɛʃ], wiss [wɪs]
II. Expected to be especially good as irregular: blig [blɪg], chake [tʃeɪk], drit [drɪt], fleep [flip], gleed [glid], glit [glɪt], plim [plɪm], queed [kwid], scride [skraɪd], spling [splɪŋ], teep [tip]
III. 
Expected to be good both as regular and as irregular: bize [baɪz], dize [daɪz], drice [draɪs], flidge [flɪdʒ], fro [froʊ], gare [gɛr], glip [glɪp], rife [raɪf], stin [stɪn], stip [stɪp]
IV. Not expected to be especially good either as regular or as irregular: gude [gud], nold [noʊld], nung [nʌŋ], pank [pæŋk], preak [prik], rask [ræsk], shilk [ʃɪlk], tark [tɑrk], trisk [trɪsk], tunk [tʌŋk]</Paragraph> </Section> </Section> </Paper>