
<?xml version="1.0" standalone="yes"?>
<Paper uid="H92-1056">
  <Title>REDUCED CHANNEL DEPENDENCE FOR SPEECH RECOGNITION</Title>
  <Section position="4" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2. INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> A number of techniques have been developed to compensate for the effects that varying microphones and channels have on the acoustic signal. Erell and Weintraub \[4, 5\] have used additive corrections in the filter-bank log energy or cepstral domains based on equalizing the long-term average of the observed filter-bank log energy or cepstral vector to that of the training data. The techniques developed by Rose and Paul \[6\] and Acero \[7\] used an iterative technique for estimating the cepstral bias vector that will maximize the likelihood of the input utterance. Nadas et al. \[8\] used an adaptive linear transformation applied to the input representation, where the adaptation uses the VQ distortion vector with respect to a predefined codebook. VanCompernolle \[10\] scaled the filter-bank log energies to a specified range using running histograms, and Rohlicek \[9\] experimented with a number of histogram-based compensation metrics based on equalizing different aspects of the probability distribution.
One important limitation of the above approaches is that they rely on a speech/nonspeech detector. Each of the above approaches computes spectral properties of the input speech sentence and subsequently compensates for the statistical differences with certain properties of the training data. If the input acoustic signal is not segmented by sentence (e.g., an open microphone with no push-to-talk button) and there are long periods of silence, the above approaches would not be able to operate without some type of reliable automatic speech-input/sentence-detection mechanism. An automatic sentence-detection mechanism would have considerable difficulty in reliably computing the average speech spectrum if there were many other nonspeech sounds in the environment.</Paragraph>
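The first family of techniques above, additive correction by equalizing the long-term cepstral average of the input to that of the training data, can be sketched as follows. This is a minimal illustration of the idea, not any cited author's implementation; the function and variable names are ours:

```python
import numpy as np

def cepstral_mean_compensation(cepstra, train_mean):
    """Additive channel compensation in the cepstral domain.

    cepstra:    (num_frames, num_coeffs) array of observed cepstral vectors
    train_mean: (num_coeffs,) long-term average cepstral vector of the
                training data
    """
    # A stationary linear channel adds a roughly constant offset to every
    # cepstral frame, so shifting the observed long-term average onto the
    # training average removes the channel term.
    observed_mean = cepstra.mean(axis=0)
    return cepstra - observed_mean + train_mean
```

After compensation the long-term average of the output equals the training-data average by construction, which is exactly why the approach depends on averaging over speech only: long stretches of silence or nonspeech sounds would pull the observed mean away from the speech spectrum.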
    <Paragraph position="1"> A second class of techniques was developed around auditory models (Lyon \[11\]; Cohen \[12\]; Seneff \[13\]; Ghitza \[14\]). These techniques use automatic gain control and other auditory-type modeling mechanisms to output a spectral vector that has been adapted based on the acoustic history. A potential limitation of this approach is that many of these techniques are very computationally intensive.</Paragraph>
  </Section>
</Paper>