<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-2022">
  <Title>Cross-linguistic phoneme correspondences</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Metaphonemes, archiphonemes and keysymbols
</SectionTitle>
    <Paragraph position="0"> The phonemic principle, which has been with us since the end of the nineteenth century, proposes that sets of similar sounds, which can be distinguished by the phonological context in which they occur, can be grouped together to form a single abstract phoneme. The distinct sounds or phones have been defined as allophones of the phoneme, one and only one allophone being permitted to appear in any particular phonological context. The metaphoneme principle states that sets of distinct phonemes that appear in different (but related) languages may be grouped together in a similar way as an abstract metaphoneme, where the conditioning factor is the language in question rather than the phonological context. This is the simplest case, but we also allow phonological conditioning to play a part in the definition of metaphonemes. For example, we may want to say that where English has /s/, German has /S/ if it is in the onset and appears immediately before a /t/, but has /s/ otherwise.</Paragraph>
    <Paragraph position="1"> Archiphonemes (Trubetzkoy, 1939) are used to generalise over phonemes within a language to represent cases where neutralisations arise in certain contexts. For example, for stops that immediately follow /s/ in English, there is no voicing distinction ('skin', for example, cannot be contrasted with 'sgin')5. Trubetzkoy proposed that in such cases we use a different symbol to denote the underspecified or neutralised sound. Similarly, morphophonemes (or systematic phonemes) have been proposed by  between the voiced and voiceless forms, with minimal actual voicing, but no aspiration that is usually associated with voiceless stops.</Paragraph>
    <Paragraph position="2"> generative grammarians (Chomsky, 1964) to represent situations where distinctions are neutralised in certain morphological contexts. For example, the voicing of the final consonant of the stem in 'knife' and 'knives' is determined entirely by the presence or absence of the plural suffix.</Paragraph>
    <Paragraph position="3"> Although there is a superficial similarity between archi- and morphophonemes and metaphonemes, there are a number of crucial differences. We should note first that both archi- and morphophonemes were introduced as an answer to a problem that we do not actually face - namely the problems of violation of the phonemic principle. It is only if one needs to insist on biuniqueness, invariance and linearity that a solution to the potential problem is needed. In the overall approach to phonology and morphology advocated in the present work these are simply not necessary. We allow lexical entries (or definitions of lexical classes) to specify phonological and morphophonological alternations without being restricted to the phonemic principle. Thus, a phoneme in a language can be a realisation of more than one metaphoneme. The other most obvious difference is that archiphonemes are defined only within a single language, whereas metaphonemes are defined across languages. In terms of the overall theory of morphology, phonology and the lexicon into which metaphonemes were designed to fit, the generalisations represented by metaphonemes come at a different level from archiphonemes.</Paragraph>
    <Paragraph position="4"> The keysymbols proposed by Fitt (2001) are much closer to our metaphonemes. The most obvious difference here is that metaphonemes range over languages, while keysymbols are defined across different accents of a single language. However, this apparently significant difference is only sustainable if we maintain that there is a solid definition of what is a language and what is a dialect (or accent). We would maintain that the type of lexicon which represents related languages according to a hierarchical definition of their similarities can be extended very simply to represent distinct dialects of a single language in exactly the same way. However, there are practical differences in the way Fitt's keysymbols and our metaphonemes are employed.</Paragraph>
    <Paragraph position="5"> Fitt assumes a text-to-speech application in which the same words are to be pronounced, but in different accents. We assume a more general lexicon system, in which we may want to represent differences in whole dialects, not just accents, so that not only the pronunciation will be different. Fitt's system allows the definition of a single lexicon which outputs ambiguous strings, including keysymbols, to a speech synthesiser which interprets the keysymbols and disambiguates the pronunciation to get that desired. In the case of metaphonemes, we anticipate a lexical structure which allows lexical entries to be ambiguous as to their pronunciation, but the output of the lexicon as a whole is unambiguous, the metaphonemes being expanded out to their realisation in the different languages (or dialects) as part of the output process from the lexicon.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 The metaphonemes of Dutch, English and German
</SectionTitle>
    <Paragraph position="0"> In order to define the metaphonemes, we constructed a database of around 800 cognate words from the three languages. The database began with orthographic forms, to which we automatically added the phonological forms from CELEX. We then slightly massaged the database so that leading or trailing schwa syllables were ignored and for most cases just the core root was left for each language. Finally, we analysed the forms into syllabic structures and collated the onsets, peaks and codas for each language.</Paragraph>
    <Paragraph position="1"> With this information we did two things: first we looked at the absolute correspondences, for clusters and for single consonants, and their frequencies. That is, we considered each grouping of correspondences, such as:8</Paragraph>
    <Paragraph position="3"> This gave us both some idea of the likely correspondences and some suggestions as to how phonological context might affect them. We did this for onsets only, codas only and for the two combined.</Paragraph>
    <Paragraph position="4">  and could be applied equally to databases of other cognate languages (e.g. French, Italian, Spanish). Indeed, it would also be possible to construct a database that included for English the cognates from other languages (e.g. French). There will inevitably be gaps in the cognate mappings for any set of languages, a database that maps some English words to one language and other words to another language would be just as acceptable as the database we have worked with to date.</Paragraph>
    <Paragraph position="5"> 8Note that we use the ordering English, Dutch, German throughout.</Paragraph>
    <Paragraph position="6"> Secondly, we extracted all of the individual consonant correspondences. This had to be done semimanually as we wanted to ensure that, in cases such as sk+sx+S the correspondences came out as s+s+S, and k+x+0 (e.g. 'school', 'school', 'Schule'). From this we derived a set of tables9 which give, for each consonant in each language, the consonants it can correspond to in the other two languages and how often it does so in our cognate database.</Paragraph>
    <Paragraph position="7"> As we expected, there were many cases where the consonants in question were almost always the same across the languages (e.g. m+m+m). Also as we expected, the most interesting areas were where one or more languages have different phonological constraints (e.g. /St/ in German onsets vs /st/ in Dutch and English onsets) or where one or more languages have a phoneme that the other(s) do not (e.g. /pf/ in German, /G/ in Dutch).</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Analysis of results
</SectionTitle>
      <Paragraph position="0"> The tables themselves give us a great deal of information, but the whole story can only be gleaned from both sets of data taken together. Let us now consider in detail one small area of the analysis, that covering the consonants /s/, /S/ and /z/. The sounds are obviously related phonologically. /s/, /S/ and /z/ are the only sibilants that occur in all three languages. Figure 1 shows the relevant tables for these sounds starting from English. Just looking at these tables tells us that for /S/ in English, there is just a single metaphoneme worth defining, namely a1 SsS a1 , i.e. English /S/ maps to Dutch /s/ and German /S/. The table for /s/, however, shows us rather more interesting things. For English /s/, Dutch has two clear possibilities, /s/ or /z/, while German has three, /S/, /z/ or /s/. To determine how these are related we need to look at the original correspondence database so that we can see if there are any patterns for the possible correspondences. The relevant entries10 from the first data set for onset only are:  We can see that in German, whereas /st/ appears in coda position (corresponding strongly with /st/ in both Dutch and English), in onset position /St/ appears corresponding with /st/ in Dutch and English.</Paragraph>
      <Paragraph position="1"> Indeed, /s/ followed by a consonant in English and Dutch onsets tends to correspond to /S/ followed by that consonant in German onsets. We could speculate on many possible implications of the clustering of consonants, but in the majority of cases, the absolute correspondences across the languages are so strong that we gain very little by considering phonological context. However, this is clearly a case where phonological context is useful. The tables themselves suggest six possible metaphonemes for English /s/: a1 ssS a1 , a1 ssz a1 , a1 sss a1 , a1 szS a1 , a1 szz a1 and a1 szs a1 . The third of these we can eliminate as it is simply the default case where all languages have the same segment. From the data above, we can see that the metaphoneme a1 ssS a1 is likely to be a very useful one, as it occurs in many onset clusters.</Paragraph>
      <Paragraph position="2"> The data above, however, allow us to say even more. When we look at the distribution of the /s/ and /S/ in German, it is evident that a metaphoneme that specified that English and Dutch both have /s/ in all contexts while German has /s/ in the coda and /S/ in the onset would capture a much wider generalisation, and cover 74 of the 131 English /s/ cases. This then leaves us with the alternations that involve /z/ in Dutch and German. We therefore propose a metaphoneme a1 szz a1 , which is clearly evidenced by the 31 cases of this simple correspondence for onsets above. However, looking more closely again, we can see that this correspondence does not occur at all in the coda, where English /s/ (on its own) corresponds to /s/ in both Dutch and German. This is clearly a result of final consonant devoicing in these two languages, and can be captured by making the metaphoneme defined above phonologically conditioned. Thus, English /s/ corresponds to /z/ in the other two languages in the onset, and to /s/ in the coda.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Implications of the results
</SectionTitle>
    <Paragraph position="0"> The intended application of metaphonemes is hierarchically organised multilingual lexicons that permit the sharing of information at all levels (Cahill and Gazdar, 1999), potentially useable for speech recognition or synthesis. The use of metaphonemes allows us to greatly increase the amount of sharing of phonological information across related languages in such a multilingual lexicon. As Tiberius and Cahill (2000) described, using metaphonemes for the vowels alone increased the amount of phonological definitions that could be shared by around 25%. While the use of consonant metaphonemes does not lead to such significant increases in sharing, we estimate that the combined figure rises to around 40%.</Paragraph>
    <Paragraph position="1"> Introducing metaphonemes may also be beneficial with respect to the robustness of NLP systems. Knowledge about cross-linguistic commonalities can help to provide grounds for making 'intelligent guesses' when lexical items for a particular language are not present. For example, consider the lexical entry for English 'plough'. We hypothesise a metaphoneme a1 pppf a1 (/p/ in English and Dutch, /pf/ in German) as well as a1 aUu:u: a1 (/u:/ in Dutch and German, /aU/ in English) and a1 0xg a1 (/0/ in English, /x/ in Dutch and /g/ in German)11. If we know that the English word 'plough' has the form /plaU/ and that the corresponding Dutch word 'ploeg' has the form /plu:x/, we may predict that the German form would be /pflu:g/. In fact, the German 'Pflug' has the form /pflu:k/, due to the pervasive final consonant devoicing. Thus we can see that in such a case, metaphonemes may help us to predict a form, although the result will not necessarily be fully correct. This example also illustrates the usefulness of phonological conditioning, as we would surely want ultimately to define all consonant correspondences in German and Dutch to take account of the final consonant devoicing process.</Paragraph>
    <Paragraph position="2"> Another potential use for metaphonemes is in the field of second language learning, where the typical errors made by learners of a language may be determined by unconscious use of corresponding sounds from their own language.</Paragraph>
    <Paragraph position="3"> As well as giving a good indication of possible candidate metaphonemes, the analysis we performed also gave us other information about the three languages which is potentially of interest to historical linguists. The analysis we did involved matching the corresponding segments in forms which are originally from identical roots.</Paragraph>
    <Paragraph position="4"> Thus we might expect that the data can give us clues about how the languages have changed and diverged. For example, a zero in a possible consonant position in one language suggests that that language has lost a segment where (at least one of) the other languages still have one. Looking at which segments are found in such positions gives us a clue as to which segments are most likely to be lost in language change (at least in these languages). In11All of these metaphonemes are predictable from the full correspondence tables.</Paragraph>
    <Paragraph position="5"> deed, it transpires that the highest ranked segments in these positions are, as one would expect, mostly approximants, liquids and glides (/r/, /w/, /l/ etc.).</Paragraph>
    <Paragraph position="6"> Also interesting is that of the stop consonants, the most likely to be lost in all three languages are the velar consonants /k/ and /g/. Another interesting result from this examination is that Dutch is apparently less likely to have zeros than German, while English is much more likely to have zeros than either of the other two languages. (41 for Dutch compared to 147 in German and 268 in English).</Paragraph>
  </Section>
class="xml-element"></Paper>