File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/00/p00-1029_intro.xml
Size: 3,636 bytes
Last Modified: 2025-10-06 14:00:53
<?xml version="1.0" standalone="yes"?> <Paper uid="P00-1029"> <Title>Inducing Probabilistic Syllable Classes Using Multivariate Clustering</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> In this paper we present an approach to unsupervised learning and automatic detection of syllable structure. The primary goal of the paper is to demonstrate the application of EM-based clustering to multivariate data.</Paragraph> <Paragraph position="1"> The suitability of this approach is exemplified by the induction of 3- and 5-dimensional probabilistic syllable classes. A secondary goal is to outline a novel approach to the conversion of graphemes to phonemes (g2p) which uses a context-free grammar (cfg) to generate all sequences of phonemes corresponding to a given orthographic input word and then ranks the hypotheses according to the probabilistic information coded in the syllable classes.</Paragraph> <Paragraph position="2"> Our approach builds on two resources. The first resource is a cfg for g2p conversion that was constructed manually by a linguistic expert (Müller, 2000). The grammar describes how words are composed of syllables and how syllables consist of parts that are conventionally called onset, nucleus and coda, which in turn are composed of phonemes and corresponding graphemes. The second resource consists of a multivariate clustering algorithm that is used to reveal syllable structure hidden in unannotated training data. In a first step, we collect syllables by going through a large text corpus, looking up the words and their syllabifications in a pronunciation dictionary and counting the occurrence frequencies of the syllable types. Probabilistic syllable classes are then computed by applying maximum likelihood estimation from incomplete data via the EM algorithm. 
Two-dimensional EM-based clustering has been applied to tasks in syntax (Rooth et al., 1999), but so far this approach has not been used to derive models of higher dimensionality and, to the best of our knowledge, this is the first time that it is being applied to speech. Accordingly, we have trained 3- and 5-dimensional models for English and German syllable structure.</Paragraph> <Paragraph position="3"> The obtained models of syllable structure were evaluated in three ways. Firstly, the 3-dimensional models were subjected to a pseudo-disambiguation task, the result of which shows that the onset is the most variable part of the syllable. Secondly, the resulting syllable classes were qualitatively evaluated from a phonological and phonotactic point of view. Thirdly, a 5-dimensional syllable model for German was tested in a g2p conversion task. The results compare well with the best currently available data-driven approaches to g2p conversion (e.g., Damper et al., 1999) and suggest that syllable structure represents valuable information for pronunciation systems. Such systems are critical components in text-to-speech (TTS) conversion systems, and they are also increasingly used to generate pronunciation variants in automatic speech recognition.</Paragraph> <Paragraph position="8"> The rest of the paper is organized as follows. In Section 2 we introduce the multivariate clustering algorithm. In Section 3 we present four experiments based on 3- and 5-dimensional data for German and English.</Paragraph> <Paragraph position="9"> Section 4 is dedicated to evaluation and in Section 5 we discuss our results.</Paragraph> </Section></Paper>