<?xml version="1.0" standalone="yes"?> <Paper uid="C69-0901"> <Title>AUTOMATIC SIMULATION OF HISTORICAL CHANGE</Title> <Section position="2" start_page="4" end_page="14" type="ackno"> <SectionTitle> 5 ~ </SectionTitle> <Paragraph position="0"> For example, in Modern Russian the first person singular present of the verb &quot;to be able&quot; is /mogu/ and the second person singular is /možiš/. The first singular of &quot;to read&quot; is /čitaju/ and the second singular is /čitajiš/. From these and other sets of verbs we would conclude that the first singular ending is /u/ and that of the second singular is /iš/. Therefore, the stem morpheme alternants must be {mog ~ mož} and {čitaj ~ čitaj}. The first of these two sets exhibits a morphophonemic alternation of g/ž. This same alternation in the first person singular occurs in other sets of verbs when the stem ends in a velar. From this (and other corroborative forms) we postulate that an earlier stage of Russian had one form for this verb stem, namely /mog/, and that before the vowel /i/ /g/ later became /ž/.</Paragraph> <Paragraph position="1"> The proposal in this paper is to reverse the bottom-to-top model of the comparative method and that of internal reconstruction into a top-to-bottom generative model where the input forms are reconstructed lexical items and the rules are the set of postulated sound changes for the language. But there are two major difficulties in reversing the older models. One, the documented changes have often been incorrectly or incompletely stated and, two, the relative chronology of various rules has not been adequately described. 
We hope to show how the computer can be used at least to test the accuracy of the rules and secondly to test or, hopefully, to help discover the relative chronology of the rules exhibited by their ordering.</Paragraph> <Paragraph position="2"> The model proposed here is one where the proto language is a set of reconstructed forms (chosen, for example, from a standard reference work). The rules describing the phonological changes in that language are then described and ordered. As the program operates on these forms, the output from each rule represents a synchronic stage in the development of that language. As final output one hopes to get the modern language. If any of the output is incorrect, then it is assumed to be from one of four possible sources: an incorrectly formulated rule (including analogical formation), a non-existent rule, an incorrectly ordered rule, or an incorrectly reconstructed form. Being able to differentiate which of these is the actual cause for the incorrect output is simplest only in the case where all of the output was the result of the application of only one rule.</Paragraph> <Paragraph position="3"> 2.0 A sketch of the phonological ~ of Russian. The rules which were tested were an abridged version of a set presented by the author in a recent paper. The rules attempt to account for certain aspects of the development of the phonological system of Contemporary Standard Russian from a late form of Proto-Indo-European. These rules were: 1 Kantor, Marvin and R.N. Smith, &quot;A sketch of the major developments in Russian historical phonology&quot; (to appear). The original formulation was in terms of distinctive features; however, for this programmatic study a segmental notation has been used for ease of statement, etc.</Paragraph> <Paragraph position="4"> 3.0 Description of test. Approximately five hundred reconstructed Proto-Indo-European forms were chosen from Walde and Pokorny (1932). 
These were punched onto cards along with their English glosses. A separate Russian gloss was typed onto a print-out of the PIE forms. The transcription of Walde and Pokorny for the PIE lexical items was maintained as closely as possible, including such notations as subscript e and o. The only criterion imposed on choice of words was that they be as long as possible, so as to have a variety of environments.</Paragraph> <Paragraph position="5"> The program was written in SNOBOL4 for the CDC 6400. Each rule set was numbered so as to coincide with the set of rules listed in section 2, with a zero appended to each rule number so as to allow for later insertions. Changing a rule consists at the moment of simple removal and replacement of cards. The history of a word or set of words can be gotten by allowing it to be processed with accompanying output generated by each rule set.</Paragraph> <Paragraph position="6"> Similarly, the lexicon for a particular stage can be generated by allowing the input to be processed up through the rule covering that stage and, if wanted, suppressing output from intermediate stages. With the availability of larger storage capacity the output from each stage can be generated once and stored in such a way that it can be referenced simply and thereby eliminate regeneration of input forms when the need for a rule change arises.</Paragraph> <Paragraph position="7"> Frequency counters will be added in order to measure the functional load of a rule, at least in terms of dictionary frequency. How this can be incorporated meaningfully into a theory of language change is not clear at this time.</Paragraph> <Paragraph position="8"> The effect of borrowing can be simulated by introduction of lexical items just prior to a specific stage. There are too many variables involved in this case and the predictions have been poor. 
The effects of loss of original PIE are even more obvious but will require much further study.</Paragraph> <Paragraph position="9"> 4.0 Discussion of results. The program is obviously language dependent but the basic conception is of general applicability.</Paragraph> <Paragraph position="10"> The set of rules described in section 2.0 has been programmed and has successfully predicted the Modern Russian form from the PIE input in many cases including the following (PIE : Mod. Russian): *bhel~ h- : boloz-; *medhi- : mež-; *aloNg- : slug-; *ang~hi- : už; *~orm- : sram; *~omb h- : zub; *g~rffg h- : griz-. The number of forms of PIE which were not related to Modern Russian was much greater. The main reason is probably loss (again assuming a uniform parent PIE as the sole source of the lexicon). The differences between generated and actual Modern Russian could be accounted for in a few instances, for example *apsa did not become Mod. R 'osina' in part because there are no explicit rules in the program for the simplification of consonant clusters. But other incorrect outputs cannot be accounted for in many instances by any easily accessible, documented rule. For example, there is no rule to handle the two disparate outputs from similar input forms with respect to the initial cluster 'sp-' in the forms (PIE : Mod. Russian) *sperg- : pr'ad-; *sple~ h- : selez-. Similarly, there is a rule eu>u, for example, to account for *leut- becoming l~ud-, and others, but there are at least two exceptions to this rule:</Paragraph> <Paragraph position="12"> Whether the original rule has too general an environmental condition or whether u was generated and later underwent some other changes, heretofore unpostulated, is unknown. It is possible that these forms should not have been considered as correspondences.</Paragraph> <Paragraph position="13"> No case of an error in rule ordering has been found yet for this sample.</Paragraph> <Paragraph position="14"> Examining the output is at times a forbidding task. 
It may help to simplify the discovery of causes of error by generating the output of each rule in the form of a KWIC index where the occurrences of each phone would be grouped together and the environments then made quite clear. When input and output are then compared one might find more easily why a rule was not applied or why it should be generalized, etc.</Paragraph> <Paragraph position="15"> 5.0 Conclusions. This paper is necessarily meant to be only a preliminary progress report and as such has raised many other questions in addition to its concrete results. Some of these questions are very basic. In particular, since many of the output forms could not be accounted for, should one really attempt to generate forms of a modern language from reconstructed lexical items if the rules used are not those postulated during the process of reconstruction, since the former should be a record of the latter? Given, therefore, a set of reconstructed forms and a separate set of rules, it becomes very difficult to account for the source of errors in the output. Also, the assumption of a uniform, single-stage proto language may require many restrictions.</Paragraph> <Paragraph position="16"> The computer can under ideal conditions be successfully used in testing hypothesized changes in the history of a language, given certain simplifying assumptions. It can be expected to operate best when the rules and reconstructed proto forms are established by the same investigator, working within the bounds established by a general theory of historical change.</Paragraph> </Section></Paper>
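<!--
Editorial sketch, not part of the original paper. The top-to-bottom model described in the text (ordered sound-change rules applied to reconstructed forms, with each rule's output treated as a synchronic stage) might be illustrated roughly as follows. The original program was written in SNOBOL4; Python is swapped in here purely for illustration. The two rules are drastically simplified versions of changes mentioned in the text, and the names RULES, derive and lexicon_at, as well as the input forms, are hypothetical.

```python
import re

# Hypothetical toy rule set: (rule number, label, pattern, replacement).
# Numbers end in zero, echoing the paper's scheme of leaving room for
# later insertions between existing rules.
RULES = [
    (10, "eu > u", r"eu", "u"),
    (20, "g > ž before i", r"g(?=i)", "ž"),
]


def derive(form, rules=RULES):
    """Apply the rules in numeric order; return the history of the form.

    Each history entry is (rule number, form after that rule); entry 0
    holds the unchanged input form.
    """
    history = [(0, form)]
    for number, _label, pattern, replacement in sorted(rules):
        form = re.sub(pattern, replacement, form)
        history.append((number, form))
    return history


def lexicon_at(forms, stage, rules=RULES):
    """Return the lexicon as of a given stage, without intermediate output."""
    active = [rule for rule in rules if rule[0] <= stage]
    return [derive(form, active)[-1][1] for form in forms]


if __name__ == "__main__":
    for word in ["leut", "mogiš"]:
        print(word, derive(word))
    print(lexicon_at(["leut", "mogiš"], stage=10))
```

Under these assumptions, derive("mogiš") records the form after every rule, which mirrors the paper's idea of tracing the history of a word, while lexicon_at stops at a chosen rule number to give the lexicon of one stage.
-->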