<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2171">
  <Title>Incorporating Metaphonemes in a Multilingual Lexicon</Title>
  <Section position="3" start_page="1126" end_page="1127" type="metho">
    <SectionTitle>
2 A Metaphoneme Inventory
</SectionTitle>
    <Paragraph position="0"> In this section we describe how a phoneme inventory can be defined for a group of languages in which language-specific phonemes function as "allophones" of newly defined metaphonemes. We will restrict ourselves to the vowel phonemes of Dutch, English, and German. If we know, for example, that words which are realised with an /{/ in English are usually realised with an /A/ in Dutch and an /a/ in German (as in hand /h{nd/ versus /hAnt/ versus /hant/, cat /k{t/ versus /kAt/ versus /kats@/, etc.), we might be able to generalise over these three language-specific phonemes and introduce a metaphoneme, e.g. |{Aa|, which captures this generalisation. To give an impression of the distribution of the different vowel phonemes across Dutch, English, and German, their vowel charts (König and van der Auwera 1994; Wells 1989) were merged into one big vowel chart containing all the vowel phonemes of these three languages. The resulting chart is given in figure 1. This figure shows which vowel phonemes are realised in which language (e.g. /{/ occurs in English, but not in Dutch and German), but it does not tell us anything about cross-linguistic phoneme correspondences. Knowing that Dutch and German both have a phoneme /o:/ does not mean that these two phonemes are cross-linguistically non-distinctive. [Footnote fragment: ... vowel quality: [high], [back], and [round]. The rounded vowels are /y, y:, Y, Y, 2:, 2:, 9, O, O, O, O:, o:, o:, u, u:, tr:, U, U/.]</Paragraph>
    <Paragraph position="1"> To find cross-linguistic phoneme correspondences, we followed O'Connor's (1973) strategy for establishing phoneme correspondences between different accents, identifying phonemes of one accent with those of another: "How are we to decide whether to equate phoneme X with phoneme A or with phoneme D? We can do so only on the basis of the words in which they occur: if X and A both occur in a large number of words common to both accents we link them together as representing the same point on the pattern, if, on the other hand, X shares more words with D than with A, we link X and D. [...] Even so, if X and D occur in a very similar word-set and X and A do not, then it is much more revealing to equate X and D than X and A." (O'Connor 1973, p. 186) We extended O'Connor's strategy and applied it to a group of (closely) related languages sharing a common word stock - in our case a subset of the West Germanic languages sharing words with a common Germanic origin. We compiled a list of 800 (mono- and disyllabic) Germanic cognates, looked up the transcriptions in the CELEX database (Baayen et al. 1995), and then mapped words containing a particular vowel in one language onto their cognates in the other two languages to see how this particular vowel was realised in the other two languages. This process was repeated for all the vowels, for all three languages.</Paragraph>
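
The counting step described above can be illustrated with a minimal sketch. It assumes a toy cognate list built only from the examples quoted in this section (hand, cat, meal, deep); the names COGNATES and correspondences are hypothetical, and the study itself drew on CELEX transcriptions rather than a hand-written list.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each cognate set maps a language code to the vowel
# of its transcription. Only examples quoted in the text are used here.
COGNATES = [
    {"en": "{", "nl": "A", "de": "a"},      # hand
    {"en": "{", "nl": "A", "de": "a"},      # cat
    {"en": "i:", "nl": "a:", "de": "a:"},   # meal
    {"en": "i:", "nl": "i:", "de": "i:"},   # deep
]


def correspondences(cognates, source):
    """For each vowel of `source`, count the vowels of its cognates per language."""
    table = defaultdict(lambda: defaultdict(Counter))
    for entry in cognates:
        if source not in entry:
            continue  # no transcription found for this language
        for lang, vowel in entry.items():
            if lang != source:
                table[entry[source]][lang][vowel] += 1
    return table


table = correspondences(COGNATES, "en")
print(dict(table["{"]["nl"]))
print(dict(table["i:"]["de"]))
```

Running the same function with "nl" or "de" as the source gives the view from the other languages, which, as noted later in the section, need not mirror the English view.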
    <Paragraph position="2"> A few examples of the results we obtained for English vowels are included below.</Paragraph>
    <Paragraph position="3"> As can be seen from these examples, there is some variation in the closeness of the correspondences. The vowel set /{/ - /A/ - /a/, as we anticipated at the outset, does turn out to be a valid correspondence. The set associated with English /i:/, on the other hand, is less clear-cut, as there are several possible corresponding vowel phonemes in the other two languages. [Footnote: The remaining correspondence tables are available at http://www.itri.bton.ac.uk/~Carole.Tiberius/mphon.html] [Footnote: Note that the total number of words is not always exactly the same in all three languages. This is because for some words the corresponding phonemic transcription was not found.] [Table caption fragment: ... in meal /mi:l/ vs /ma:l/ vs /ma:l/ and deep /di:p/ vs /di:p/ vs /ti:f/.]</Paragraph>
    <Paragraph position="4"> If we consider the correspondences from the starting point of one of the other languages, the results are slightly different. For instance, English /A:/ corresponds strongly to Dutch /A/, but Dutch /A/ corresponds almost equally to English /{/ and /A:/. Further investigation is required to ascertain how many of these cases can be further generalised by recourse to phonological or phonotactic properties of the words in question. Currently the mapping from metaphoneme to (language-specific) phoneme requires reference only to the language. [Table caption fragment: ... hand (hand) and hart (heart).]</Paragraph>
    <Paragraph position="5"> For a more sophisticated analysis, phonological and phonotactic information would need to be considered as well. However, even at the present level of analysis, the metaphoneme principle can be helpful in the multilingual lexical structure proposed, as we now discuss.</Paragraph>
  </Section>
  <Section position="4" start_page="1127" end_page="1129" type="metho">
    <SectionTitle>
3 The multilingual inheritance lexicon
</SectionTitle>
    <Paragraph position="0"> In this section, we will explore the sharing of phonological information in the lexical entries of a multilingual inheritance-based lexicon. We focus on phonology rather than orthography because phonology is nearer to primary language use (i.e. spoken language), can be used as input for hyphenation rules and spelling correction, and is essential as the level of symbolic representation for speech synthesis (MULTILEX 1993).</Paragraph>
    <Paragraph position="1"> We will take the multilingual architecture of PolyLex as our starting point. First, we will describe the PolyLex architecture. Then, we will show how phonological information can be shared in the lexical entries.</Paragraph>
    <Paragraph position="2"> PolyLex defines a multilingual inheritance-based lexicon for Dutch, English and German. It is implemented in DATR, an inheritance-based lexical knowledge representation formalism (Evans and Gazdar 1996). The rationale of inheritance-based lexicons requires information to be pushed as far up the hierarchy as it can go, generalising as much as possible. In a multilingual lexicon, this means that information which is common to several languages is stated at higher points in the hierarchy than that which is unique to just one of the languages. In addition, PolyLex makes use of orthogonal multiple inheritance, which allows a node in the hierarchy to inherit different kinds of information (e.g. semantics, morphology, phonology, syntax) from different parent nodes. In this paper, we are just interested in the phonological hierarchy.</Paragraph>
    <Paragraph position="3"> PolyLex assumes a contemporary phonological framework in which all lexical entries are defined as having a phonological structure consisting of a sequence of structured syllables, a syllable consisting of an onset (the initial consonant cluster, which might be split up into onset 1, onset 2, etc.) and a rhyme. The rhyme consists of a peak (the vowel) and a coda (the final consonant cluster, which might be split up into coda 1, coda 2, etc.). This structure is defined at the top of the hierarchy, and applies by default to all words. Only the relevant values for onset, peak, and coda have to be defined at the individual lexical entries (see Cahill and Gazdar 1997). Following PolyLex we will concentrate on a segmental phonemic representation. An example of the lexical entry gram, as it would be represented in PolyLex, is shown in figure 2.</Paragraph>
    <Paragraph position="4"> The multilingual phonological entry for gram is defined by sharing identical segments occurring in the majority of the language-specific entries (/gr{m/ - /xrAm/ - /gram/). That is, onset 1 is /g/, onset 2 is /r/, and coda is /m/.</Paragraph>
    <Paragraph position="5"> English and German can inherit all the information from the common part except for the value of their peak, which is /{/ and /a/ respectively. In Dutch, the value of the peak has to be specified as being /A/, and we will also have to override the value for the first onset to get [xrAm].</Paragraph>
    <Paragraph position="6"> This example misses the generalisation that the [...] phonologically non-distinctive. For each lexical entry where English uses /{/, Dutch /A/, and German /a/, the value for peak has to be specified in the language-specific parts. By using the metaphoneme |{Aa| instead, this information needs to be specified only once. The resulting multilingual phonemic representation for gram is given in figure 3.</Paragraph>
    <Paragraph position="7"> All the information has now been pushed up as far as it can go, capturing as many generalisations as possible. The information that |{Aa| results in an /{/ in English, an /A/ in Dutch, and an /a/ in German is specified only at the top level. The language-specific boxes are almost empty, except for the value of the first onset in Dutch. The reason for this is that as yet we have only defined cross-linguistic phoneme correspondences for vowels, not for consonants. We do, however, suspect that the Dutch /x/ is phonologically non-distinctive from the German and English /g/. Further research defining cross-linguistic phoneme correspondences for consonants will have to confirm this.</Paragraph>
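
The entry for gram with a metaphoneme in its peak can be mimicked in a few lines of Python. This is only a sketch of the idea of default inheritance with overrides: PolyLex itself is implemented in DATR, and every name below (METAPHONEMES, COMMON, OVERRIDES, realise) is hypothetical. The metaphoneme is written here as |{Aa|.

```python
# Top level: each metaphoneme is spelled out once, per language.
METAPHONEMES = {
    "|{Aa|": {"en": "{", "nl": "A", "de": "a"},
}

# Common (multilingual) entry for "gram", inherited by default.
COMMON = {"gram": {"onset1": "g", "onset2": "r", "peak": "|{Aa|", "coda": "m"}}

# Language-specific parts hold only what overrides the common entry.
OVERRIDES = {"nl": {"gram": {"onset1": "x"}}}

SLOTS = ("onset1", "onset2", "peak", "coda")


def realise(word, lang):
    """Return the phonemic form of `word` in `lang` using default inheritance."""
    entry = dict(COMMON[word])                           # inherit the defaults
    entry.update(OVERRIDES.get(lang, {}).get(word, {}))  # apply any overrides
    segments = []
    for slot in SLOTS:
        value = entry.get(slot, "")
        # A metaphoneme is resolved by reference to the language alone.
        if value in METAPHONEMES:
            value = METAPHONEMES[value][lang]
        segments.append(value)
    return "".join(segments)


for lang in ("en", "nl", "de"):
    print(lang, realise("gram", lang))
```

Under these assumptions the three calls yield gr{m, xrAm and gram, and the only language-specific statement left is the Dutch first onset.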
    <Paragraph position="8"> It is a fundamental feature of this account that the inherited information is only default information which can be overridden. Thus, it is not required that metaphoneme correspondences are complete and we may choose to use a metaphoneme even if one of the languages uses a different vowel in some words. The definitions can be overridden in exactly the same way as the onset definition in Dutch in the example above. So if we consider the vowel correspondences in table 1, we can see that of the 35 words which have cognates in all three languages, 27 can be defined as having the metaphoneme |{Aa| in the common lexical entry (those for which both English and Dutch have the corresponding vowels). Five of these will require a separate vowel defined for German, while the remainder will need separate vowel definitions for all three languages.</Paragraph>
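
The saving implied by these figures can be made concrete with a rough count. The counting scheme is an assumption made for illustration, not something stated in the text: one peak definition per word per language without a metaphoneme, and, with a metaphoneme, one common peak definition per covered word plus one per override. "The remainder" is read as the 35 - 27 words outside the metaphoneme set, and the single top-level definition of the metaphoneme itself is not counted.

```python
TOTAL_WORDS = 35        # words with cognates in all three languages (from the text)
SHARED = 27             # words whose common entry can carry the metaphoneme
GERMAN_OVERRIDES = 5    # shared words that still need a German-specific vowel
LANGUAGES = 3

# Words outside the metaphoneme set: the vowel is stated for every language.
remainder = TOTAL_WORDS - SHARED

# Without a metaphoneme, every word states its vowel once per language.
without_meta = TOTAL_WORDS * LANGUAGES

# With a metaphoneme: one common definition per shared word, plus overrides,
# plus full language-specific definitions for the remainder.
with_meta = SHARED + GERMAN_OVERRIDES + remainder * LANGUAGES

print(remainder, without_meta, with_meta)
```

Under this reading, 56 peak definitions replace 105, even though the correspondence does not hold for every word.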
    <Paragraph position="9"> Given this, we can see that economy of representation can be achieved even in cases where the vowel correspondences are far from conclusive.</Paragraph>
    <Paragraph position="10"> Even if only half or fewer of the Dutch words, for example, have the same vowel in cognates for which the English words have the same vowel, this still means that those words can be defined without the need for the language-specific vowel to be defined.</Paragraph>
    <Paragraph position="11"> Another feature of the metaphoneme principle that differentiates it from the phonemic principle is that there is no requirement for biuniqueness.</Paragraph>
    <Paragraph position="12"> A phoneme in a language can be a realisation of more than one metaphoneme. This means that we can define a metaphoneme |{Aa| as well as another, |A:Aa|. Each of these will then be used in different common lexical entries. This can be used as an alternative to phonological/phonotactic conditioning or in addition to it, for just those cases where there is more than one correspondence but no obvious phonological/phonotactic conditioning for the decision between phonemes.</Paragraph>
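
The absence of biuniqueness can be shown by inverting a small metaphoneme table. This is a sketch with hypothetical names (METAPHONEMES, sources); the realisations of the second metaphoneme are assumed from its label, i.e. English /A:/, Dutch /A/, German /a/.

```python
from collections import defaultdict

# Two metaphonemes, written |{Aa| and |A:Aa|, with assumed realisations.
METAPHONEMES = {
    "|{Aa|": {"en": "{", "nl": "A", "de": "a"},
    "|A:Aa|": {"en": "A:", "nl": "A", "de": "a"},
}


def sources(lang):
    """Map each phoneme of `lang` to the set of metaphonemes it can realise."""
    inverse = defaultdict(set)
    for meta, realisations in METAPHONEMES.items():
        inverse[realisations[lang]].add(meta)
    return dict(inverse)


print(sources("nl"))
print(sources("en"))
```

Dutch /A/ maps back to both metaphonemes, whereas each English vowel maps back to exactly one, so the choice of metaphoneme has to be recorded in the common lexical entry rather than recovered from a language-specific form.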
  </Section>
</Paper>