File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/97/a97-1009_intro.xml
Size: 7,082 bytes
Last Modified: 2025-10-06 14:06:08
<?xml version="1.0" standalone="yes"?> <Paper uid="A97-1009"> <Title>Name pronunciation in German text-to-speech synthesis</Title> <Section position="2" start_page="0" end_page="50" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The correct pronunciation of names is one of the biggest challenges for text-to-speech (TTS) conversion systems. At the same time, many current or envisioned applications, such as reverse directory systems, automated operator services, catalog ordering or navigation systems, to name just a few, crucially depend upon an accurate and intelligible pronunciation of names. Besides these specific applications, any kind of well-formed text input to a general-purpose TTS system is extremely likely to contain names, and the system has to be well equipped to process these names. This requirement was the main motivation to develop a name analysis and pronunciation component for the German version of the Bell Labs multilingual text-to-speech system (GerTTS) (Möbius et al., 1996).</Paragraph> <Paragraph position="1"> Names are conventionally categorized into personal names (first and surnames), geographical names (place, city and street names), and brand names (organization, company and product names).</Paragraph> <Paragraph position="2"> In this paper, we concentrate on street names because they encompass interesting aspects of geographical as well as of personal names. Linguistic descriptions and criteria as well as statistical considerations, in the sense of frequency distributions derived from a large database, were used in the construction of the name analysis component. The system was implemented in the framework of finite-state transducer (FST) technology (see (Sproat, 1992) for a discussion focussing on morphology). 
For evaluation purposes, we compared the performance of the general-purpose text analysis and the name-specific systems on training and test materials.</Paragraph> <Paragraph position="3"> As of now, we have neither attempted to determine the etymological or ethnic origin of names, nor have we addressed the problem of detecting names in arbitrary text. However, due to the integration of the name component into the general text analysis system of GerTTS, the latter problem has a reasonable solution.</Paragraph> <Paragraph position="4"> 2 Some problems in name analysis What makes name pronunciation difficult, or special, in comparison to words that are considered as regular entries in the lexicon of a given language? Various reasons are given in the research literature (Carlson, Granström, and Lindström, 1989; Macchi and Spiegel, 1990; Vitale, 1991; van Coile, Leys, and Mortier, 1992; Coker, Church, and Liberman, 1990; Belhoula, 1993): * Names can be of very diverse etymological origin and can surface in another language without undergoing the slow linguistic process of assimilation to the phonological system of the new language.</Paragraph> <Paragraph position="5"> * The number of distinct names tends to be very large: For English, a typical unabridged collegiate dictionary lists about 250,000 word types, whereas a list of surnames compiled from an address database contains 1.5 million types (72 million tokens) (Coker, Church, and Liberman, 1990). 
It is reasonable to assume similar ratios for German, although no precise numbers are currently available.</Paragraph> <Paragraph position="6"> * There is no exhaustive list of names; and in German and some related Germanic languages, street names in particular are usually constructed like compounds (Rheinstraße, Kennedyallee), which makes decomposition both practical and necessary.</Paragraph> <Paragraph position="7"> * Name pronunciation is known to be idiosyncratic; there are many pronunciations contradicting common phonological patterns, as well as alternative pronunciations for certain grapheme strings.</Paragraph> <Paragraph position="8"> * In many languages, general-purpose grapheme-to-phoneme rules are to a significant extent inappropriate for names (Macchi and Spiegel, 1990; Vitale, 1991).</Paragraph> <Paragraph position="9"> * Names are not as amenable as regular words to morphological processes, such as word formation and derivation, or to morphological decomposition. That does not render such an approach unfeasible, though, as we show in this paper.</Paragraph> <Paragraph position="10"> * The large number of different names together with a restricted morphological structure leads to a coverage problem: It is known that a relatively small number of high-frequency words can cover a high percentage of word tokens in arbitrary text; the ratio is far less favorable for names (Carlson, Granström, and Lindström, 1989; van Coile, Leys, and Mortier, 1992).</Paragraph> <Paragraph position="11"> We will now illustrate some of the idiosyncrasies and peculiarities of names that the analysis has to cope with. Let us first consider morphological issues. Some German street names can be morphologically and lexically analyzed, such as Kurfürst-en-damm ('electoral prince dam'), Kirche-n-weg ('church path'). 
Many, however, are not decomposable, such as Henmerich ('?') or Rimparstraße ('?Rimpar street'), at least not beyond obvious and unproblematic components (Straße, Weg, Platz, etc.).</Paragraph> <Paragraph position="12"> Even more serious problems arise on the phonological level. As indicated above, general-purpose pronunciation rules often do not apply to names.</Paragraph> <Paragraph position="13"> For instance, the grapheme &lt;e&gt; in an open stressed syllable is usually pronounced [e:]; however, in many first names (Stefan, Melanie) it is pronounced [e]. Or consider the word-final grapheme string &lt;ie&gt; in Batterie [batər'i:] 'battery', Materie [mat'e:riə] 'matter', and the name Rosemarie [r'o:zəmari:]. And word-final &lt;us&gt;: Mus [m'u:s] 'mush, jam' vs. Erasmus [er'asmus]. A more special and yet typical example: In regular German words the morpheme-initial substring &lt;chem&gt; as in chemisch is pronounced [çe:m], whereas in the name of the city Chemnitz it is pronounced [kɛm].</Paragraph> <Paragraph position="14"> Generally speaking, nothing ensures correct pronunciation better than a direct hit in a pronunciation dictionary. However, for the reasons detailed above this approach is not feasible for names. In short, we are not dealing with a memory or storage problem but with the requirement to analyze unseen orthographic strings at least approximately correctly. We therefore decided to use weighted finite-state transducer machinery, which is the technological framework for the text analysis components of the Bell Labs multilingual TTS system. FST technology enables the dynamic combination and recombination of lexical and morphological substrings, which cannot be achieved by a static pronunciation dictionary. We will now describe the procedure of collecting lexically or morphologically meaningful graphemic substrings that are used productively in name formation.</Paragraph> </Section> </Paper>