<?xml version="1.0" standalone="yes"?>
<Paper uid="J00-2003">
  <Title>A Multistrategy Approach to Improving Pronunciation by Analogy</Title>
  <Section position="2" start_page="0" end_page="196" type="abstr">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> Text-to-phoneme conversion is a problem of some practical importance. Possibly the major application is speech synthesis from text, where we need to convert the text input (i.e., letter string) to something much closer to a representation of the corresponding sound sequence (e.g., phoneme string). A further important application is speech recognition, where we may wish to add a new word (specified by its spelling) to the vocabulary of a recognition system. This requires that the system has some idea of the &quot;ideal&quot; pronunciation--or phonemic baseform (Lucassen and Mercer 1984)--of the word. Also, in recognition we need to perform the inverse mapping, i.e., conversion from phonemes to text. Perhaps the techniques employed for the forward mapping can also be applied &quot;in reverse&quot; for phoneme-to-text conversion.</Paragraph>
    <Paragraph position="1"> Yet another reason for being interested in the problem of automatic phonemization is that (literate) humans are able to read aloud, so that systems that can pronounce print serve as models of human cognitive performance. [Footnotes: * Image, Speech and Intelligent Systems (ISIS) Research Group, Department of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, UK. E-mail: yrn@ecs.soton.ac.uk. † Image, Speech and Intelligent Systems (ISIS) Research Group, Department of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, UK. E-mail: rid@ecs.soton.ac.uk. © 2000 Association for Computational Linguistics. Computational Linguistics, Volume 26, Number 2.]</Paragraph>
    <Paragraph position="2"> Modern text-to-speech (TTS) systems use lookup in a large dictionary or lexicon (we use the terms interchangeably) as the primary strategy to determine the pronunciation of input words. However, it is not possible to list exhaustively all the words of a language, so a secondary or backup strategy is required for the automatic phonemization of words not in the system dictionary. The latter are mostly (but not exclusively) proper names, acronyms, and neologisms. At this stage of our work, we concentrate on English and assume that any such missing words are dictionary-like with respect to their spelling and pronunciation, as will probably be the case for many neologisms.</Paragraph>
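    The two-stage strategy described above (dictionary lookup first, a backup phonemizer for missing words) can be sketched roughly as follows. This is only an illustrative sketch, not the system discussed in this paper: the names LEXICON, naive_backup, and pronounce are hypothetical, the toy lexicon reuses transcriptions that appear as examples in this introduction, and the backup function is a deliberately naive placeholder.

```python
from typing import Callable, Dict, List

# Hypothetical toy lexicon; the transcriptions follow the examples given in
# this introduction (six, mad, made) and are for illustration only.
LEXICON: Dict[str, List[str]] = {
    "six": ["s", "I", "k", "s"],
    "mad": ["m", "a", "d"],
    "made": ["m", "eI", "d"],
}


def naive_backup(word: str) -> List[str]:
    """Placeholder backup strategy: emit one symbol per letter.

    This is NOT a real phonemizer; it only marks where a rule-based or
    analogy-based method would be plugged in.
    """
    return list(word)


def pronounce(
    word: str,
    lexicon: Dict[str, List[str]] = LEXICON,
    backup: Callable[[str], List[str]] = naive_backup,
) -> List[str]:
    """Primary strategy: dictionary lookup. Secondary strategy: backup."""
    key = word.lower()
    if key in lexicon:
        return lexicon[key]
    return backup(key)


print(pronounce("six"))   # found in the lexicon
print(pronounce("blog"))  # missing word, handled by the backup strategy
```

    Passing the backup as a function keeps the lookup logic independent of whichever method is used for words not in the dictionary.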
    <Paragraph position="3"> Even if the missing words are dictionary-like, automatic determination of pronunciation is a hard problem for languages like English and French (van den Bosch et al. 1994).</Paragraph>
    <Paragraph position="4"> In fact, English is notorious for the lack of regularity in its spelling-to-sound correspondence. That is, it has a deep orthography (Coltheart 1978; Liberman et al. 1980; Sampson 1985) as opposed to the shallow orthography of, for example, Serbo-Croatian (Turvey, Feldman, and Lukatela 1984). To a large extent, this reflects the many complex historical influences on the spelling system (Venezky 1965; Scragg 1975; Carney 1994).</Paragraph>
    <Paragraph position="5"> Indeed, Abercrombie (1981, 209) describes English orthography as &quot;one of the least successful applications of the Roman alphabet.&quot; We use 26 letters in English orthography yet about 45-55 phonemes in specifying pronunciation. It follows that the relation between letters and phonemes cannot be simply one-to-one. For instance, the letter c is pronounced /s/ in cider but /k/ in cat. On the other hand, the /k/ sound of kitten is written with a letter k. Nor is this lack of invariance between letters and phonemes the only problem. There is no strict correspondence between the number of letters and the number of phonemes in English words. Letter combinations (ch, gh, ll, ea) frequently act as a functional spelling unit (Coltheart 1984)--or grapheme--signaling a single phoneme. Thus, the combination ough is pronounced /Af/ in enough, while ph is pronounced as the single phoneme /f/ in phase. However, ph in uphill is pronounced as two phonemes, /ph/. Usually, there are fewer phonemes than letters but there are exceptions, e.g., (six, /sIks/). Pronunciation can depend upon word class (e.g., convict, subject). English also has noncontiguous markings (Wijk 1966; Venezky 1970) as, for instance, when the letter e is added to (mad, /mad/) to make (made, /meId/), also spelled maid! The final e is not sounded; rather it indicates that the vowel is lengthened or diphthongized. Such markings can be quite complex, or long-range, as when the suffix y is added to photograph or telegraph to yield photography or telegraphy, respectively. As a final comment, although not considered further here, English contains many proper nouns (place names, surnames) that display idiosyncratic pronunciations, and loan words from other languages that conform to a different set of (partial) regularities. These further complicate the problem.</Paragraph>
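    To make the grapheme problem concrete, the sketch below segments a spelling into units by greedy longest match against the multi-letter combinations mentioned above. The greedy procedure, the name greedy_segment, and the small GRAPHEMES set are assumptions made purely for illustration; they are not a method proposed in this paper. The point is that such a context-free rule handles phase and enough but wrongly merges p and h in uphill.

```python
from typing import List

# Multi-letter spelling units mentioned in the text above; this is an
# illustrative set, not an exhaustive grapheme inventory for English.
GRAPHEMES = {"ough", "ch", "gh", "ll", "ea", "ph"}
MAX_LEN = max(len(g) for g in GRAPHEMES)


def greedy_segment(word: str) -> List[str]:
    """Split a word into units, always preferring the longest known grapheme."""
    units: List[str] = []
    i = 0
    while i != len(word):
        # Try the longest candidate first and fall back to a single letter.
        for size in range(MAX_LEN, 0, -1):
            chunk = word[i:i + size]
            if size == 1 or chunk in GRAPHEMES:
                units.append(chunk)
                i += len(chunk)
                break
    return units


print(greedy_segment("phase"))   # ph is kept together, as desired
print(greedy_segment("enough"))  # ough is kept together, as desired
print(greedy_segment("uphill"))  # ph is merged here too, which is wrong
```

    A segmenter that ignores context cannot tell the two uses of ph apart, which is one reason simple letter-by-letter or unit-by-unit rules struggle with English.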
    <Paragraph position="6"> This paper is concerned with an analogical approach to letter-to-sound conversion and related string rewriting problems. Specifically, we aim to improve the performance of pronunciation by analogy (PbA) by information fusion, an approach to automated reasoning that seeks to utilize multiple sources of information in reaching a decision--in this case, a decision about the pronunciation of a word. The remainder of this paper is organized as follows: In the next section, we contrast traditional rule-based and more modern data-driven approaches (e.g., analogical reasoning) to language processing tasks, such as text-to-phoneme conversion. In Section 3, we describe the original (PRONOUNCE) PbA system of Dedina and Nusbaum (1986) in some detail as this forms the basis for the later work. Section 4 reviews our own work in this area.</Paragraph>
    <Paragraph position="7"> Next, in Section 5, we make some motivating remarks about information fusion and its use in computational linguistics in general. In Section 6, we present in some detail the multistrategy (or fusion) approach to PbA that enables us to obtain clear performance improvements, as described in Section 7. Finally, conclusions are drawn and directions for future studies are proposed in Section 8. [Running head: Marchand and Damper, Improving Pronunciation by Analogy.]</Paragraph>
  </Section>
</Paper>