<?xml version="1.0" standalone="yes"?>
<Paper uid="H89-2034">
  <Title>Speaker Adaptation Using Multiple Reference Speakers</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> We have previously reported our work on speaker adaptation for large-vocabulary continuous speech recognition using a probabilistic spectral mapping [5]. In that work we transformed well-trained phonetic hidden Markov models of a single reference speaker so that they were appropriate for a new (target) speaker. This method reduced the recognition error rate by about a factor of five relative to a cross-speaker model (trained on one speaker, tested on another). However, the resulting error rate was still 2 to 3 times that obtained with a speaker-dependent model for the target speakers.</Paragraph>
    <Paragraph position="1"> In recent years several researchers have demonstrated speaker-independent recognition using essentially the same recognition algorithms used for speaker-dependent recognition, but with a model derived by simply pooling the training speech of over 100 speakers as if it were all produced by one speaker. For these systems, the error rate is again 2 to 3 times that of speaker-dependent models. This shows that there is value in simple pooling of data from many speakers. The logical extension of these two results would be to use the pooled speaker-independent model as a reference model for speaker adaptation. However, we know that pooled training yields a model that has very broad (less discriminating) distributions compared to those produced by speaker-dependent training. Since the adaptation procedures that we have investigated also smooth the original model, we expect that a straightforward application of them to a pooled speaker-independent model will fail to yield improvements due to excessive smoothing.</Paragraph>
    <Paragraph position="2"> The approach we propose here consists of three steps: 1) To reduce the smearing of the model distributions, we estimate and apply a deterministic spectral transformation to each reference speaker so that their speech parameters lie in a single common space.</Paragraph>
    <Paragraph position="3"> 2) We then treat all the transformed speech as if it came from one speaker for training the reference HMM.</Paragraph>
    <Paragraph position="4"> 3) Finally, we estimate and apply our usual probabilistic spectrum transformation to the pooled reference HMM to model a new target speaker.</Paragraph>
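The three steps above can be sketched in a few lines of NumPy. This is only an illustrative toy under stated assumptions, not the system described in this paper: the deterministic spectral transformation is assumed to be a least-squares affine map fitted on frame pairs that are already time-aligned, the reference HMM is replaced by a single discrete codeword distribution, and the probabilistic mapping is assumed to be a smoothed co-occurrence matrix P(target codeword | reference codeword). All function names, sizes, and data here are hypothetical.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine map taking src frames onto aligned dst frames."""
    src_aug = np.hstack([src, np.ones((len(src), 1))])
    coef, *_ = np.linalg.lstsq(src_aug, dst, rcond=None)
    return coef  # shape (dim + 1, dim)

def apply_affine(coef, frames):
    """Apply a map returned by fit_affine to a block of frames."""
    return np.hstack([frames, np.ones((len(frames), 1))]) @ coef

def quantize(frames, codebook):
    """Index of the nearest codeword (Euclidean distance) for every frame."""
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)

def codeword_pdf(labels, size):
    """Relative frequency of each codeword: a toy stand-in for an HMM output PDF."""
    counts = np.bincount(labels, minlength=size).astype(float)
    return counts / counts.sum()

def fit_mapping(ref_labels, tgt_labels, size, floor=1e-3):
    """Row-stochastic matrix P(target codeword | reference codeword), floored for smoothing."""
    counts = np.full((size, size), floor)
    np.add.at(counts, (ref_labels, tgt_labels), 1.0)
    return counts / counts.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
dim, n_frames, size = 4, 200, 8
common = rng.normal(size=(n_frames, dim))  # synthetic frames in the common space
codebook = common[:size].copy()            # hypothetical fixed codebook

# Step 1: map each synthetic reference speaker into the common space.
pooled = []
for _ in range(3):
    distortion = np.eye(dim) + 0.1 * rng.normal(size=(dim, dim))
    speaker = common @ distortion + rng.normal(scale=0.5, size=dim)
    coef = fit_affine(speaker, common)
    pooled.append(apply_affine(coef, speaker))

# Step 2: pool the normalized speech as if it came from one speaker.
pooled = np.vstack(pooled)
ref_pdf = codeword_pdf(quantize(pooled, codebook), size)

# Step 3: apply a probabilistic codeword mapping to model a new target speaker.
target = 0.8 * common + 0.3  # synthetic target speaker, aligned frame by frame
mapping = fit_mapping(quantize(common, codebook), quantize(target, codebook), size)
target_pdf = ref_pdf @ mapping
```

Because each synthetic reference speaker is an exact affine distortion of the common-space frames, step 1 recovers those frames almost exactly, so pooling in step 2 does not broaden the distribution. With real speech the normalization would be only approximate, and some residual broadening would remain for step 3 to absorb.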
    <Paragraph position="5"> In the next section, we describe our basic speaker-adaptation system in terms of its two primary speaker-transformation strategies: speech normalization and PDF mapping. Section 3 contains experimental results that establish our current performance for a single-reference-speaker system and introduce preliminary evidence in support of our proposal for using multiple reference speakers.</Paragraph>
  </Section>
</Paper>