<?xml version="1.0" standalone="yes"?>
<Paper uid="C69-4901">
<Title>PART ONE - LEXICOSTATISTICS</Title>
<Section position="1" start_page="0" end_page="18" type="abstr">
<SectionTitle> PART ONE - LEXICOSTATISTICS </SectionTitle>
<Paragraph position="0"> The Swadesh theory of lexicostatistics (1950, 1952, 1955) provided the first quantitative comparison of related languages based on a well-defined model of language change. The stochastic nature of this model was poorly understood by linguists, in the main, and many have rejected the theory in the course of a protracted and confused controversy. Meanwhile field linguists, especially those working with language groups of unknown history, have accepted lexicostatistics and have found it to be an efficient, valid and reliable technique.</Paragraph>
<Paragraph position="1"> The Swadesh theory. There are serious oversimplifications of reality implicit in lexicostatistics, and it is these, rather than the stochastic aspects, which are limitations of the theory. Swadesh hypothesized, in effect, that (i) it is possible to discover a set of basic, universal and non-cultural meanings, and he constructed a list of about 200 such meanings; (ii) in every natural language, at a given time, there is a unique lexical representation (word) corresponding to each of these meanings; but (iii) over short time intervals, the word representing any meaning runs a small but constant risk of being replaced by a different (non-cognate) word; and (iv) the replacement, or non-replacement, of the lexical representation of a meaning occurs independently of that of any other meaning, and independently over different periods of time. To formalize (i) and (ii), we must postulate the existence, for each natural language, at all points $t$ in time, of a lexicon, represented by a finite abstract set $L_t$. A well-defined equivalence relation corresponding to cognation partitions the elements of $\bigcup_{t \in T} L_t$ ($T$ a real interval) into equivalence classes.
If $k \in L_s$, $l \in L_t$ ($t, s \in T$) are cognate, we write $\delta(k,l) = 1$. Otherwise $\delta(k,l) = 0$.</Paragraph>
<Paragraph position="2"> Further, we must postulate the existence of a finite abstract set $M$ (corresponding to the universal set of meanings), and a procedure for defining, for any $t$, and any $L_t$, a unique map from $M$ into $L_t$. This map, written $M \xrightarrow{t} L_t$, specifies that for each $m \in M$ there is an $l \in L_t$ such that $m \xrightarrow{t} l$ ($l$ means $m$).</Paragraph>
<Paragraph position="3"> Hypotheses (iii) and (iv) imply that the changes over time in the image of the map $M \xrightarrow{t} L_t$ have a certain stochastic aspect. This can be modelled by the probability statement</Paragraph>
<Paragraph position="5"> a universal constant, $t>s$, and $\epsilon_{t-s} \to 0$ as $t \to s$; and two independence conditions; let $m_i \xrightarrow{s} k_i$ for $i = 1, 2, \ldots, |M|$ ($|M|$ the number of elements in $M$),</Paragraph>
<Paragraph position="7"> variables.</Paragraph>
<Paragraph position="8"> This model has a number of immediate properties which form the central thesis of the Swadesh theory. These are presented here as Theorems 1, 2 and 3. For simplicity we will assume that at any time $t$, at most one word in $L_t$ can belong to any one cognation equivalence class. This simplifies notation and proofs, although the assumption may be relaxed without substantively affecting this development. One further type of assumption is required to ensure a degree of randomness in the choice of replacing word during lexical replacement. To prove Theorem 1 as stated below, we require $\exists C > 1$ such that for all $m$, $t>s$, $\delta(k,l) = 0$.</Paragraph>
<Paragraph position="9"> a finite number of disjoint intervals,</Paragraph>
<Paragraph position="11"> Let $N(t-s)$ be the number of changes (with respect to the cognation relation) of the mapping $m \xrightarrow{t} l$ in the interval $(s,t]$.</Paragraph>
<Paragraph position="12"> Then $N(t-s) = 0$ is just the event that a Poisson process remains at zero on the interval $(s,t]$ (see, e.g.
Parzen, 1960, p. 252), and However,</Paragraph>
<Paragraph position="14"> which are independent daughter languages of the same parent language (which are said to split at time $s$) if $L'_s = L&quot;_s$ and if $m \xrightarrow{s} k$ (in both languages), $m \xrightarrow{t} l'$ in the first language, $m \xrightarrow{t} l&quot;$ in the second language, then $\delta(k,l')$ and $\delta(k,l&quot;)$ are independent random variables. Theorem 2. Let $L'_t$, $L&quot;_t$, $t>s$ be as above. Then</Paragraph>
<Paragraph position="16"> Assuming $m \xrightarrow{s} k$ in both languages, By transitivity of the equivalence relation represented by $\delta$, the first term on the right is</Paragraph>
<Paragraph position="18"> Since we have fixed $m \xrightarrow{s} k$. The summation contains at most</Paragraph>
<Paragraph position="20"> terms which are not annihilated by $\delta(l',l&quot;)$ and so the total is</Paragraph>
<Paragraph position="22"> This completes the proof of the first statement of the theorem.</Paragraph>
<Paragraph position="23"> The proof of the second parallels the analogous result in the previous theorem.</Paragraph>
<Paragraph position="24"> In natural languages, $|L_t|$ is several thousands and $1/|L_t|$ is negligible compared to the exponential term, except for very high values of $t$ (where the theory has little applicability). In the next theorem, the results of Theorems 1 and 2 are utilized, neglecting the error terms of the form $O(1/|L_t|)$.</Paragraph>
<Paragraph position="25"> Under certain, more specific restrictions on $P[m \xrightarrow{t} l \mid m \xrightarrow{s} k]$, an exact form of the error term attached to the exponential laws (here formulated as Theorems 1 and 2) has been solved for.</Paragraph>
<Paragraph position="26"> Theorem 3. Insofar as we may approximate the results of Theorems 1 and 2</Paragraph>
<Paragraph position="28"> is the maximum likelihood estimator (MLE) of $\lambda$ in the first formula above, and if $\lambda$ is known, $\widehat{t-s} = -\frac{1}{\lambda} \log\left( \frac{1}{|M|} \sum_{m \in M} \delta(k_m, l_m) \right)$</Paragraph>
<Paragraph position="29"> is the MLE of $t-s$.</Paragraph>
<Paragraph position="30"> In the case of two independent daughter languages (Thm. 2),</Paragraph>
<Paragraph position="32"> It suffices to find the MLE of $\lambda$, the other cases being analogous.</Paragraph>
<Paragraph position="33"> Consider binomial trials with parameter $p = e^{-\lambda(t-s)}$. $\delta(k,l) = 1$ is the equivalent of a success in one such trial. $\sum_{m \in M} \delta(k_m, l_m) = r$ is the equivalent of $r$ successes in $|M|$ trials. The likelihood function of $\lambda$ in such a case is</Paragraph>
<Paragraph position="35"> and the same process yields $\widehat{t-s}$. Let $r = \sum_{m \in M} \delta(k_m, l_m)$ as in Theorem 3. Swadesh (1950) derived a methodology to utilize the three results</Paragraph>
<Paragraph position="37"> as follows. He first selected his list of meanings which he considered basic to all languages. He then compared Old English with Modern English ($t-s \approx 1000$ years), i.e. he compared the words in each language corresponding to the basic meanings. The etymology of words in those languages being fairly well known, he was able to decide when a pair of words corresponding to the same meaning were cognate (i.e. one was historically derived from the other, or both were derived from a common root, by a series of phonological alterations, each of which affected only a part of the word in question). This immediately led to $\hat{\lambda} \approx 2 \times 10^{-4}$. Using the estimate which he obtained as a constant, he dated the relative times of separation or &quot;split&quot; of various Salish (western North American Indian) languages from a common parent with the estimator $\widehat{t-s}$. After the work of Lees (1953), $\lambda$ was considered to be a universal constant; the estimators of $t-s$ could then be used to estimate absolute dates of split and to date a collection of texts from a dead language.</Paragraph>
<Paragraph position="38"> Criticisms of the theory. Criticisms of lexicostatistics fall into two classes.
In the first class are protests based on or resulting from the stochastic nature of the model and/or the stochastic nature of the phenomena of lexical loss and replacement. The second class of criticisms refers to particular assumptions in the model, and I will discuss these in the next section.</Paragraph>
<Paragraph position="39"> Bergsland and Vogt (1962) presented four cases where the estimates $\widehat{t-s}$ (or $\hat{\lambda}$) are not accurate (three too low and one too high), and rejected the Swadesh theory on this basis. In statistical terms, the authors constructed a sample consisting entirely of outliers and rejected an hypothesis without even considering the distribution of the test statistic. Fodor (1962) took the same approach to &quot;disprove&quot; lexicostatistics. Chretien (1962) calculated and published pages of ordinary binomial functions to prove, in essence, that $\widehat{t-s}$ is a random variable and hence not &quot;an acceptable mathematical formulation&quot; of the Swadesh theory. This basic misunderstanding of the nature of statistical estimation is characteristic not only of critics of lexicostatistics, but also of many of its practitioners. A more important criticism has been expounded, at great length, by Fodor (1965) and, more clearly, by Teeter (1963).</Paragraph>
<Paragraph position="40"> Quoting from the latter: &quot;Lexical similarities and dissimilarities do not come about in any one simple way, and any mechanical method of counting lexical similarities cannot separate those due to chance, universals, diffusion, and common origin. Lexical change is the result of many factors, and all are scrambled together in the final result.&quot; (p.~l) This diversity of causes of lexical and semantic change has received detailed study by linguists and semanticists; see, for example, Bloomfield (1933) p.392 ff., Ullman (1957) p.183 ff. Quoting from Lees (1953): &quot;The reasons for morpheme decay, i.e.
for changes in vocabulary, have been classified by many authors; they include such processes as word tabu, phonemic confusion of etymologically distinct items close in meaning, change in material culture with loss of obsolete terms, rise of witty terms or slang, adoption of prestige forms from a superstratum language, and various gradual semantic shifts, such as specialization, generalization, and pejoration.&quot; (p. 114) And it is just this diversity and the difficulty of &quot;unscrambling&quot; which, contrary to Teeter and to Fodor, justifies a stochastic model incorporating retention parameters. Consider, for comparison, the problem of constructing a model for the behaviour of gases. We have an enclosed volume containing a large number of particles of finite dimension, undergoing rapid motion. We can assume everything is perfectly deterministic, all the particles obeying Newton's three laws of motion, and all collisions perfectly elastic. The position of any particle at any time can, theoretically, be calculated precisely if we know the initial state of the system and the time elapsed. Practically speaking, of course, this would be impossibly tedious, boring and pointless, there being so many particles, any two of which may collide, plus the walls, plus gravitational or electrical charge attractions and repulsions to consider. What is possible, interesting, and of great value (witness the fields of kinetic theory and statistical mechanics, dating from the work of men such as Maxwell, Boltzmann, and Einstein) is to consider the nature of each particle as a random process involving appropriate parameters and to consider the statistical behaviour of the model thus constructed. It is complexity and great difficulty of prediction which make a statistical model workable. In the same way, Fodor and others have inadvertently justified the proposition that some sort of stochastic process might be an appropriate model for lexical change phenomena.
The question remains, what process? The Swadesh theory provides at least a first approximation to the correct answer. Problems with Swadesh's model. Before discussing details of the model, it is appropriate to present the results of an early (1953) lexicostatistic investigation of R. Lees. He chose thirteen language pairs, each pair consisting of an historic language and a modern descendant. The particular choice of pairs presumably stemmed from availability and not from any sampling technique. He translated each word in Swadesh's 215-word list (1950) into the 26 languages. After counting the number, $r$, of cognates between each language pair, he used (in effect) $\hat{\lambda} = -\log(r/|M|)/(t-s)$, where $|M| \le 215$ according to the number of indeterminate cognations and uncertainties of translation. To get an estimate of a &quot;universal&quot; $\lambda$, he combined the individual estimates in</Paragraph>
<Paragraph position="42"> Using $p = e^{-\lambda t}$ as the parameter in the binomial experiment he calculated, for each language pair,</Paragraph>
<Paragraph position="44"> which should be approximately the square of a standard normal random variable, if the assumptions of the theory are true. Since an estimate of $\lambda$ is used in calculating $p$, the sum of the squared variables should be $\chi^2_{12}$-distributed. But $\sum \chi^2 =$ [?]9.5, significant at the 1% level, suggesting rejection of the theory.</Paragraph>
<Paragraph position="45"> Lees, however, suggested four reasons for not rejecting on the basis of the $\chi^2$ test: the large values for $|M|$ and $r$, uncertainty in $t$, possible inappropriateness of the $\chi^2$ test, and the error in estimating $\lambda$. The first and third of those are not valid statistically, and the fourth is a source of very little of the excess $\chi^2$. The variability in the time parameter can be incorporated into the $\chi^2$ calculation. This only reduces $\chi^2$ to 25.9 - 27.5 depending on the variation assumed in $t$.
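As a minimal sketch of the test just described, and not a reproduction of Lees' own computation, the following Python fragment assumes that each cognate count r is binomial with |M| trials and success probability p = exp(-λt), and that the per-pair statistic is the squared standardized deviation (r - |M|p)² / (|M|p(1 - p)). That normalization, the function names and the sample counts are all assumptions made for illustration.

```python
import math


def estimate_lambda(r, m, t):
    """Per-pair estimate of the replacement rate: -log(r / m) / t."""
    return -math.log(r / m) / t


def pair_chi_square(r, m, lam, t):
    """Squared standardized deviation of the cognate count r for one pair.

    Assumes r is Binomial(m, p) with p = exp(-lam * t); dividing by the
    binomial variance m * p * (1 - p) is an assumption of this sketch.
    """
    p = math.exp(-lam * t)
    expected = m * p
    variance = m * p * (1.0 - p)
    return (r - expected) ** 2 / variance


def total_chi_square(pairs, lam):
    """Sum the per-pair statistics; pairs is a list of (r, m, t) tuples."""
    return sum(pair_chi_square(r, m, lam, t) for r, m, t in pairs)


# Hypothetical counts, not Lees' data: (cognates r, list size m, years t).
example_pairs = [(160, 200, 1000.0), (150, 210, 1500.0), (172, 205, 900.0)]

for r, m, t in example_pairs:
    print("per-pair rate estimate:", estimate_lambda(r, m, t))
print("summed statistic:", total_chi_square(example_pairs, lam=2e-4))
```

The summed value is the quantity that would be referred to a χ² distribution; when λ is itself estimated from the same pairs, one degree of freedom is lost.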
Lees' results, then, indicate strongly that the theory is an inadequate model for the phenomena.</Paragraph>
<Paragraph position="46"> We turn now to the second class of criticisms of the Swadesh model, those that involve objections, evaluations or improvements related to the generalizations and simplification of reality inherent in lexicostatistic theory. The listing of assumptions earlier in this chapter will serve as a framework for classifying this latter class of criticisms.</Paragraph>
<Paragraph position="47"> (i) There are no universal sets of meanings, it being difficult to specify most meanings without recourse to particular natural languages. No list of meanings yet devised is completely satisfactory for sufficiently diverse languages; Hoijer (1956), O'Grady (1960), Cohen (196~), Levin (1964), Trager (1966).</Paragraph>
<Paragraph position="48"> (ii) The existence of synonymy proves the non-uniqueness of the meaning map $M \xrightarrow{t} L_t$; and no known methods of eliciting words for given meanings are completely and reliably reproducible, from speaker to speaker or even from occasion to occasion for a single speaker; Gudschinsky (1960). The existence of general and specific terms for a single entity provides a further complication.</Paragraph>
<Paragraph position="49"> (iii) If the parameter $\lambda$ can be said to exist at all, it is constant neither from language to language; Bergsland and Vogt (1962), Fodor (1962), from meaning to meaning; Swadesh (1955), Androyev (1962), Ellegard (1962), and especially Dyen (1964), van der Merwe (1966), Dyen, James and Cole (1967), nor even from time interval to time interval for the same meaning; Swadesh (1962).</Paragraph>
<Paragraph position="50"> Judgements about cognation are unreliable, especially with respect to languages which are separated by large $t-s$ and whose history is mostly unknown; Fairbanks (1955), Teeter (1963), Lunt (1964).
An analysis of this latter problem is beyond the scope of this study.</Paragraph>
<Paragraph position="51"> (iv) Lexical loss and replacement do not occur independently for different meanings, neither are current and future trends entirely independent of what has happened in the past, especially in languages which have possessed an orthography for some time. This has been noted especially in connection with the independence assumption of Theorem 2, as in the interval immediately after a split we might expect parallel (to some extent, at least) evolution of the two daughter languages; Lees (1953), Hymes (1960), Teeter (1963). Also in this connection, independence of evolution does not strictly hold where borrowings, loan-translations and imitations of other types are frequent occurrences.</Paragraph>
<Paragraph position="52"> Towards a new theory. A number of authors have attempted to deal with one or more of these problems. Swadesh (1952) discarded more than half of the meanings in his original list. For choosing among synonyms, Gudschinsky (19~) proposed a random selection, Hymes (1960) suggested a procedure which would select cognate forms whenever they were available, Satterthwaite (1960) and Dyen (1960) pointed out that it would be more reasonable to choose the word which is most frequently used for the meaning in question.</Paragraph>
<Paragraph position="53"> Little could be done about the central postulate or result of the theory, that $\lambda$ is a constant, until the work of Dyen became well known. Dyen, on the basis of comparisons of a large number of Malayopolynesian languages, was able to segregate meanings into groups on the basis of their individual $\lambda$'s. A discussion of the mathematical implications of this ($p = e^{-\lambda_i(t-s)}$ for meaning $m_i$ leads to $E(r/|M|) = \frac{1}{|M|} \sum_{i=1}^{|M|} e^{-\lambda_i(t-s)}$) was published by van der Merwe (1966).
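The sum of exponentials can be illustrated numerically. The following Python sketch, under the assumption that each meaning is retained independently with its own rate, compares the expected cognate fraction for a set of per-meaning rates with the single-exponential value computed from their mean. The rates and function names are hypothetical and chosen only for illustration.

```python
import math


def expected_retention(rates, elapsed):
    """Expected cognate fraction E(r/|M|) when meaning i has its own rate.

    Implements (1/|M|) * sum_i exp(-rate_i * elapsed).
    """
    return sum(math.exp(-lam * elapsed) for lam in rates) / len(rates)


def single_rate_retention(rate, elapsed):
    """Expected cognate fraction under one common rate: exp(-rate * elapsed)."""
    return math.exp(-rate * elapsed)


# Hypothetical per-meaning rates (per year), for illustration only.
rates = [0.5e-4, 1.0e-4, 2.0e-4, 4.5e-4]
mean_rate = sum(rates) / len(rates)

for years in (500.0, 1000.0, 3000.0):
    mixed = expected_retention(rates, years)
    single = single_rate_retention(mean_rate, years)
    print(years, round(mixed, 4), round(single, 4))
```

Because the exponential function is convex, the mixed value is never below the single-rate value at the mean rate, and the gap widens as the elapsed time grows.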
Meanwhile, Dyen (1964) had statistically demonstrated that meanings with high $\lambda$ in the Malayopolynesian languages tend to have high $\lambda$ in the Indoeuropean languages and vice versa. This was the first new type of lexicostatistic result since the work of Lees. Later (1967) this work was refined so that Dyen et al. were able to estimate a separate $\lambda$ for each meaning on a 196-word list of the Swadesh type.</Paragraph>
<Paragraph position="54"> On the problem of independence, Swadesh pointed out that interaction between languages because of contact would bias estimates of $t-s$ downward. Hattori (1953) suggested and Hymes (1960) discussed the formula $E(r/|M|) = e^{-1.4\lambda(t-s)}$ as a way of taking into account parallel evolution and the effect of those meanings with lower $\lambda$ than the rest of the list. The latter effect is, however, properly described by using a sum of exponentials and, for the former, it is unreasonable to expect a constant multiplier (1.4) to express the dependence of two languages over all time. It is clear that the multiplier of $-\lambda(t-s)$ should be near zero when $t$ is close to $s$ and should approach 2 as $t$ gets very large. This was noted by Gleason (1960) who rightly suggested that for all sufficiently large $t$, estimates of $t-s$ could be corrected by adding a small positive constant.</Paragraph>
<Paragraph position="55"> One further suggestion that has been made by many authors and implemented by some, e.g.
Hirsch (19~), Hattori (1957), is to attempt to construct a larger set $M$ to provide a better (i.e.</Paragraph>
<Paragraph position="56"> lower variance) estimate of time intervals.</Paragraph>
<Paragraph position="57"> The primary purpose of this paper will be to develop a formal theory of word-meaning relationship, applicable to lexical and semantic change, which incorporates most of the criticisms levelled against the Swadesh theory. Relationship to linguistic theories. This theory is unique in that it provides a link between two previously unrelated linguistic theories, that of generative grammar, and the conventional descriptive semantics. Elsewhere (1969) we show how stochastic models, like our theory of word meaning behaviour, and Labov's (1967,1968) frequency approach to optional grammatical rules, can be derived by imposing probabilistic structure on formal grammars. On the other hand, the major phenomena and problems of descriptive and historical semantics can be elegantly formalized in terms of this same model.</Paragraph>
</Section>
</Paper>