<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-1106">
  <Title>E A General Computational Model for Word-Form</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Types of phonological alternations in Czech
</SectionTitle>
    <Paragraph position="0"> We will deal with three types of phonological alternations: palatalization, assimilation and epenthesis. Palatalization occurs mainly in declension and partly also in conjugation. Assimilation occurs mainly in conjugation. Epenthesis occurs both in declension and in conjugation.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Epenthesis
</SectionTitle>
      <Paragraph position="0"> An epenthetic e occurs in a group of consonants before a zero ending (Ø-ending). The final group of consonants can consist of a suffix (e.g. -k or -b) and a part of the stem; in this case the epenthesis is obligatory (e.g. kousek × kousku 'piece', malba × maleb 'painting'). When the group is morphologically inseparable, the application of epenthesis depends on whether the group of consonants is phonetically admissible at the end of a word. In loan words, the epenthetic e may occur if the final group of consonants resembles a Czech suffix (e.g. korek × korku 'cork', but alba × alb 'alb'). In declension, two situations can occur: * The base form contains an epenthetic e; the rule has to remove it if the form has a non-Ø ending, e.g. chlapec 'boy', chlapci dative/locative sg or nominative pl.</Paragraph>
      <Paragraph position="1"> * The base form has a non-Ø ending; the rule has to insert an epenthetic e if the ending is Ø, e.g. chodba 'corridor', chodeb genitive pl.</Paragraph>
      <Paragraph position="2"> In conjugation, an epenthetic e occurs in the past participle, masculine sg, of the verb jít 'to go' (and its prefixed derivations): šel 'he-gone', šla 'she-gone', šlo 'it-gone'. The rule has to insert an epenthetic e if the form has a Ø-ending.</Paragraph>
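The two declension situations and the conjugation case can be illustrated with a minimal Python sketch. It is not the two-level rule set described later; the function names and the argument `base_ending_len` are hypothetical, and the caller is assumed to know already that epenthesis applies to the word in question.

```python
def attach_with_e_deletion(base: str, ending: str) -> str:
    """Attach `ending` to a base form whose last vowel is an epenthetic e.

    Assumption: the caller has already decided that the e is epenthetic.
    """
    if ending and len(base) >= 2 and base[-2] == "e":
        # Non-zero ending: drop the e in front of the final consonant.
        return base[:-2] + base[-1] + ending
    return base + ending


def attach_with_e_insertion(base: str, base_ending_len: int, ending: str) -> str:
    """Strip the ending of the base form, then attach `ending`.

    With a zero ending, an e is inserted before the last stem consonant.
    """
    stem = base[: len(base) - base_ending_len]
    if ending == "" and len(stem) >= 2:
        return stem[:-1] + "e" + stem[-1]
    return stem + ending


print(attach_with_e_deletion("chlapec", "i"))    # chlapci
print(attach_with_e_insertion("chodba", 1, ""))  # chodeb
print(attach_with_e_insertion("šl", 0, ""))      # šel
```

Both functions leave the decision about whether a word takes epenthesis to the caller, which mirrors the dependence on suffix status and phonetic admissibility described above.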
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Palatalization and assimilation
</SectionTitle>
      <Paragraph position="0"> Palatalization or assimilation at a morpheme boundary occurs when an ending/suffix starts with a soft vowel. The alternations differ for different types of consonants. The types of consonants and vowels are as follows:
* hard consonants: d, (g,) h, ch, k, n, r, t
* soft consonants: c, č, ď, j, ň, ř, š, ť, ž
* neutral consonants: b, l, m, p, s, v, z
* hard vowels: a, á, e, é, o, ó, u, ů, y, ý and the diphthong ou
* soft vowels: ě, i, í
The vowel ó cannot occur in an ending/suffix, so it is of no interest here. I also will not discuss what happens with the 'foreign' consonants f, q, w and x; they would be treated as v, k, v and s, respectively. The only borrowing from foreign languages that I included in the lists above is g. This sound existed in Old Slavonic, but in Czech it changed into h. However, when new words with g were later adopted from other languages, this sound behaved phonologically like h (e.g. hloh, hlozích, from Common Slavonic glog 'hawthorn', and katalog, katalozích 'catalog').</Paragraph>
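The classification above can be written down as a small lookup, sketched here in Python. The diacritics are restored on the assumption that the classes follow the standard Czech consonant inventory, and all names (`consonant_class`, `FOREIGN_AS`, and so on) are hypothetical. The digraph ch is treated as one consonant.

```python
HARD = {"d", "g", "h", "ch", "k", "n", "r", "t"}
SOFT = {"c", "č", "ď", "j", "ň", "ř", "š", "ť", "ž"}
NEUTRAL = {"b", "l", "m", "p", "s", "v", "z"}
SOFT_VOWELS = {"ě", "i", "í"}
# 'Foreign' consonants behave like a native counterpart.
FOREIGN_AS = {"f": "v", "q": "k", "w": "v", "x": "s"}


def final_consonant(stem: str) -> str:
    """Return the last consonant of the stem, reading ch as one letter."""
    lowered = stem.lower()
    if lowered.endswith("ch"):
        return "ch"
    last = lowered[-1:]
    return FOREIGN_AS.get(last, last)


def consonant_class(stem: str) -> str:
    """Classify the stem-final consonant as hard, soft, neutral or other."""
    consonant = final_consonant(stem)
    if consonant in HARD:
        return "hard"
    if consonant in SOFT:
        return "soft"
    if consonant in NEUTRAL:
        return "neutral"
    return "other"


def starts_with_soft_vowel(ending: str) -> bool:
    """True if the ending/suffix can trigger palatalization or assimilation."""
    return ending[:1] in SOFT_VOWELS


print(consonant_class("tlak"), starts_with_soft_vowel("ích"))  # hard True
```

The class of the stem-final consonant together with the first vowel of the ending is the information that the alternations in the rest of this section depend on.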
      <Paragraph position="1"> The phonological alternations are reflected in writing, with one exception: if the consonants d, n and t are followed by a soft vowel, they are palatalized, but the spelling is not changed:
spelling dě, di: phonology /ďe/, /ďi/
spelling ně, ni: phonology /ňe/, /ňi/
spelling tě, ti: phonology /ťe/, /ťi/
In other cases the spelling reflects the phonology. In the following text I will use { } for the morpho-phonological level, / / for the phonological level and no brackets for the orthographical level. Where the orthography and phonology are the same I will only use the orthographical level. Let us look at the possible types of alternation of consonants: * Soft consonant and ě: the soft consonant is not changed; the soft ě is changed to e.</Paragraph>
      <Paragraph position="2"> {číčě} → číče 'pussycat' dative sg * Soft or neutral consonant and i/í: no alternations occur.</Paragraph>
      <Paragraph position="3"> {číči} → číči 'pussycat' genitive sg * Hard consonant and a soft vowel: the alternations differ depending on when and how the soft vowel originated.</Paragraph>
      <Paragraph position="4"> Assimilation: - {kj} → č: tlak 'pressure' → tlačen 'pressed' - {hj} → ž: mnoho 'much, many' → množení 'mul-</Paragraph>
      <Paragraph position="6"> It is not easy to find an example of this sort of alternation, as g only occurs in loan words that do not use the old types of derivation. In colloquial speech it would perhaps be possible to create the following form: pedagog 'teacher' → pedagožení 'working as a teacher'. - {dj} → z: sladit 'to sweeten' → slazení 'sweetening'. This sort of alternation is no longer productive; in newer words palatalization applies: sladit 'to tune up' → sladění 'tuning up'. In some cases both variants are possible, or the different variants exist in different dialects: the eastern (Moravian) dialects tend to keep this phonological alternation, while the western (Bohemian) dialects have often abandoned it. - {tje} → ce: platit 'to pay' → placení 'paying'. This alternation is also no longer productive. The newest word I found that shows this sort of phonological alternation is fotit 'to take a photo' → focení 'taking a photo'.</Paragraph>
      <Paragraph position="7"> Palatalization: During the historical development of the language several sorts of palatalization occurred: the first and second Slavonic palatalization and further Czech palatalizations. - {kě/ki} → če/či (1st palat.): matka 'mother' → matčin possessive adjective - {kě/ki} → ce/ci (2nd palat.)  ductive any more. In newer derivations {sje} → se (e.g. kosit 'to mow' → kosení 'mowing').</Paragraph>
      <Paragraph position="8"> - {zje} → že: kazit 'to spoil' → {kazjení} → kažení 'spoiling'. This type of assimilation is also no longer productive. In newer derivations {zje} → ze (e.g. řetězit 'to concatenate' → řetězení 'concatenating'). Palatalization: With b, m, p and v no alternation occurs ({vrbě} 'willow' dative/locative sg → vrbě). - {sě} → se: vosa 'wasp' → {vosě} → vose dative/locative sg - {zě} → ze: koza 'goat' → {kozě} → koze dative/locative sg. Both palatalization and assimilation yield the same result: - {lje} → le: školit 'to school' → {školjení} → školení 'schooling' - {lě} → le: škola 'school' → {školě} → škole dative/locative sg
rules in the Czech lexicon
As the Czech lexicon should serve practical applications, I did not try to solve all the problems that occur in Czech phonology. I concentrated on the alternations that occur in declension and regular conjugation, and on the most productive derivations. The remaining alternations occurring in conjugation are handled by inserting several verb stems into the lexicon. The list of alternations and other changes covered by the rules:</Paragraph>
      <Paragraph position="10"> For the Czech lexicon I used the software tools for two-level morphology developed at Xerox (Karttunen and Beesley, 1992; Karttunen, 1993). The lexical forms are created by attaching the proper ending/suffix to the base form in a separate program. To help the two-level rules find where they should operate, I also marked morpheme boundaries with special markers. These markers have two further functions: * They carry the information about the length of the ending (or suffix and ending) of the base form, i.e. how many characters should be removed before attaching the ending.</Paragraph>
      <Paragraph position="11"> * They bear the information about the kind of alternation.</Paragraph>
      <Paragraph position="12"> Besides the markers for morpheme boundaries I also use markers for an epenthetic e. As I said before, e is inserted before the last consonant of a final consonant group if the last consonant is a suffix, or if the consonant group is not phonetically admissible. However, as I generally deal neither with derivation nor with phonetics, I am not able to recognize what is a suffix and what is phonetically admissible. That is why I need these special markers.</Paragraph>
      <Paragraph position="13"> Another auxiliary marker is used for marking the suffix -ík, which needs special treatment in the derivation of feminine nouns and their possessive adjectives. The long vowel í must be shortened in the derivation, and the final k must be palatalized even if a Ø-ending follows. I need a special marker, as -ík- allows two realizations of both sounds in the same contexts: Two realizations of í: úředník 'clerk' → úřednice 'she-clerk', but rybník 'pond' → rybníce locative sg. Two realizations of k: úředník × úřednic (genitive pl of the derived</Paragraph>
      <Paragraph position="15"> In the previous section, I described all possible alternations concerning single consonants.</Paragraph>
      <Paragraph position="16"> When I work with the paradigms or with the derivations, it is necessary to specify the kind of alternation for all consonants that can occur at the boundary. For this purpose I introduced four types of markers: ^1P: 1st palatalization for g, h and k, or the only possible (or no) palatalization for other consonants. I also use this marker for the palatalization c → č in the vocative sg of the paradigm chlapec. The final c is in fact a palatalized k, so there is even a linguistic motivation for this.</Paragraph>
      <Paragraph position="17"> ^2P: 2nd palatalization for g, h and k, or the only possible (or no) palatalization for other consonants.</Paragraph>
      <Paragraph position="19"> These markers are followed by a number that denotes how many characters of the base form should be removed before attaching the ending/suffix. Thus there are markers ^1P0, ^2P0, ^1P1, etc. The markers starting with ^N only denote the length of the ending of the base form, and instead of using ^N0 I attach the suffix/ending directly to the base form. Fortunately, nearly all paradigms and derivations cause at most one type of alternation, so it is possible to use one marker for the whole paradigm.</Paragraph>
      <Paragraph position="20"> The markers for an epenthetic e are ^E1 (for an e that should be deleted) and ^E2 (for an e that should be inserted). The marker for the suffix -ík in derivations is ^IK.</Paragraph>
      <Paragraph position="21"> Here are some examples of lexical items and the rules that transduce them to the surface form:
(1) doktorka^1P1in^2P0ých
The base form is doktorka 'she-doctor'. The marker ^1P1 denotes that the possible alternation at this morpheme boundary is (first) palatalization and that the length of the ending of the base form is 1 (that is, a must be removed from the word form and the possible alternation concerns k). The marker ^2P0 means that the derived possessive adjective has a Ø-ending and that the possible alternation at this morpheme boundary is palatalization. If we rewrite this string as a sequence of morphemes we get the following string: doktork-in-ých. The sound k in front of i is palatalized, so the correct final form is doktorčiných, which is the genitive plural of the possessive adjective derived from the word doktorka.</Paragraph>
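The step from the lexical string to the sequence of morphemes can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `split_morphemes` is hypothetical, the marker spelling follows the reading doktorka^1P1in^2P0ých of example (1), and only the marker types ^1P, ^2P and ^N followed by one digit are recognized.

```python
import re

# Boundary marker: kind of alternation plus number of characters to remove.
MARKER = re.compile(r"\^(1P|2P|N)(\d)")


def split_morphemes(lexical: str) -> list:
    """Split a lexical string at its boundary markers.

    The digit after each marker says how many characters are cut from
    the end of the preceding morpheme before the next one is attached.
    """
    # With two capture groups, split() yields
    # [text, kind, digit, text, kind, digit, text, ...].
    parts = MARKER.split(lexical)
    morphemes = [parts[0]]
    for i in range(1, len(parts), 3):
        cut = int(parts[i + 1])
        if cut:
            morphemes[-1] = morphemes[-1][:-cut]
        morphemes.append(parts[i + 2])
    return morphemes


print("-".join(split_morphemes("doktorka^1P1in^2P0ých")))  # doktork-in-ých
```

The sketch deliberately ignores the kind of alternation; it only shows how the length information carried by the markers determines the morpheme sequence.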
      <Paragraph position="22"> Let us look now at the two-level rules that transduce the lexical string to the surface string. We need four rules in this example: two for deleting the markers, one for deleting the ending -a, and one for palatalization. The rules for deleting auxiliary markers are very simple, as these markers should be deleted in any context. The rules can be included in the definition of the alphabet of symbols:</Paragraph>
      <Paragraph position="24"> This notation means that the auxiliary markers are always realized as zeros on the surface level.</Paragraph>
      <Paragraph position="25"> The rule for deleting the ending -a looks as follows: &quot;Deletion of the ending -a&quot;</Paragraph>
      <Paragraph position="27"> The first line of the rule describes the context of a one-letter nominal ending a, and the second line describes the context of an infinitive suffix with the ending -at or -ovat.</Paragraph>
      <Paragraph position="28"> The rule for palatalization k → č looks as follows:
&quot;First palatalization k -&gt; č&quot;
k:č &lt;=&gt; _ (%^IK:) [ a: | ě: ] %^1P1: i ;
NonCCS: _ (End) %^1P1: ě: ;
The first line describes two possible cases: either the derivation of a possessive adjective from a feminine noun (doktorka → doktorčin), or the derivation of a possessive adjective from a feminine noun derived from a masculine that ends with -ík (úředník → (úřednice →) úředničin). The second context describes a comparative of an adjective, or a comparative of an adverb derived from that adjective (hořký → hořčejší, hořčeji). The set NonCCS contains all characters except c, č and s, and it is defined in a special section. This context condition is introduced because the consonant groups ck, čk and sk have a different 1st palatalization.</Paragraph>
      <Paragraph position="29"> The label End denotes any character that can occur in an ending and that is removed from the base form.</Paragraph>
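The effect of the two contexts can be imitated procedurally. The Python sketch below is a simplification, not a translation of the two-level rule: the function name `first_palatalization` is hypothetical, only a single ^1P marker is handled, the optional ^IK marker is ignored, and the rewriting of ě to e after the new soft consonant is folded into the same function.

```python
import re

MARKER_1P = re.compile(r"\^1P(\d)")
# Complement of the set NonCCS: after these, k is left to other rules.
EXCLUDED_BEFORE_K = {"c", "č", "s"}


def first_palatalization(lexical: str) -> str:
    """Realize k as č at a ^1P boundary (simplified sketch)."""
    match = MARKER_1P.search(lexical)
    if match is None:
        return lexical
    cut = int(match.group(1))
    base = lexical[: match.start()]
    suffix = lexical[match.end():]
    # Remove the ending of the base form (the characters matched by End).
    stem = base[: len(base) - cut]
    if stem.endswith("k"):
        before = stem[-2:-1]
        if suffix.startswith("i"):
            # First context: possessive adjective in -in.
            stem = stem[:-1] + "č"
        elif suffix.startswith("ě") and before not in EXCLUDED_BEFORE_K:
            # Second context: comparative; ě is written e after č.
            stem = stem[:-1] + "č"
            suffix = "e" + suffix[1:]
    return stem + suffix


print(first_palatalization("doktorka^1P1in"))  # doktorčin
print(first_palatalization("hořký^1P1ější"))   # hořčejší
```

A stem in -sk, -ck or -čk followed by ě passes through unchanged here, because those groups need a different first palatalization and are therefore excluded from this context.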
      <Paragraph position="30"> (2) korek^2P0^E1em
The base form of this word form is korek 'cork'; the marker ^2P0 means that the possible alternation is (second) palatalization and that the length of the ending of the base form is 0. The marker ^E1 means that the base form contains an epenthetic e, and em is the ending of the instrumental singular. The correct final form is korkem. The rule for deleting an (epenthetic) e follows:  The first line describes the context for deletion of the suffix -ec in derivations of the type vědec 'scientist' → vědkyně 'she-scientist'. The second context is the context of the ending -e or the suffix -ce. This suffix must be removed in derivations of the type soudce 'judge' → soudkyně 'she-judge'. The third context is the context of an epenthetic e that is present in the base form and must be removed from a form with a non-Ø ending. The sets Cons and Vowel contain all consonants and all vowels, respectively.</Paragraph>
      <Paragraph position="31"> The fourth line describes the context for deletion of the infinitive ending -et.</Paragraph>
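The third context, an epenthetic e marked in the base form and dropped before a non-zero ending, can be sketched in Python. The function name `realize_e1` and the vowel inventory are assumptions made for this illustration, the marker spelling follows the reading korek^2P0^E1em of example (2), and boundary markers are assumed to carry the length 0, as they do in that example.

```python
import re

VOWELS = set("aeiouyáéěíóúůý")
BOUNDARY = re.compile(r"\^(?:1P|2P|N)\d")


def is_consonant(char: str) -> bool:
    return char.isalpha() and char not in VOWELS


def realize_e1(lexical: str) -> str:
    """Delete a marked epenthetic e when a non-zero ending follows."""
    head, marker, ending = lexical.partition("^E1")
    if not marker:
        return lexical
    # Auxiliary markers are realized as zero on the surface.
    base = BOUNDARY.sub("", head)
    if (
        ending
        and len(base) >= 3
        and base[-2] == "e"
        and is_consonant(base[-1])
        and is_consonant(base[-3])
    ):
        base = base[:-2] + base[-1]
    return base + ending


print(realize_e1("korek^2P0^E1em"))  # korkem
print(realize_e1("korek^2P0^E1"))    # korek
```

With a zero ending the e stays, so the base form korek comes out unchanged.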
      <Paragraph position="32"> The whole program contains 35 rules. Some of the rules concern morphology rather than phonology, namely the rules that remove endings or suffixes. One rule is purely technical; it is one of the two rules for the alternation ch → š, as c and h must be treated separately (though ch is considered one letter in the Czech alphabet). Six rules are forced by the Czech spelling rules (e.g. rules for treating /ď/, /ť/ and /ň/ in various contexts, or a rule for rewriting y → i after soft consonants). 18 rules deal with the actual phonological alternations, and they cover the whole productive phonological system of the Czech language. The lexicon using these rules was tested on a newspaper text containing 2,978,320 word forms, with the result that more than 96% of the forms were analyzed.</Paragraph>
    </Section>
  </Section>
</Paper>