<?xml version="1.0" standalone="yes"?>
<Paper uid="H92-1054">
  <Title>SESSION 8B: ROBUST SPEECH PROCESSING</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
SESSION 8B: ROBUST SPEECH PROCESSING
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
ABSTRACT
</SectionTitle>
    <Paragraph position="0"> Four papers are briefly reviewed.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1. The Papers
</SectionTitle>
    <Paragraph position="0"> This session consists of two types of papers. The first two, &quot;Multiple approaches to robust speech recognition&quot; and &quot;Reduced channel dependence for speech recognition&quot;, present computational methods for minimizing the acoustic and speaker differences in particular recognizers. The third paper, &quot;Experimental results for base-line speech recognition performance ...&quot;, presents preliminary experiments in using an array of microphones for acoustic focusing, while the last, &quot;Phonetic classification on wide-band and telephone quality speech&quot;, presents a baseline phonetic recognition result for telephone TIMIT.</Paragraph>
    <Paragraph position="1"> In the first paper, the Carnegie Mellon gang define several algorithms for jointly compensating for noise and linear filtering in incoming data. Codeword Dependent Cepstral Normalization was found to be advantageous when training with one microphone and testing with another. It was also helpful when used with data from a microphone array. Results were less clear when the algorithm was applied to an auditory front end, but work is continuing.</Paragraph>
    <Paragraph position="2"> The SRI paper introduced a long-term filtering algorithm to adjust for acoustic differences between training and test conditions. The best results were found using highpass filtering on channel energies in conjunction with simple noise removal. It was interesting to note that, even after these algorithms were applied, simultaneous recordings through different microphones were still quite different.</Paragraph>
    <Paragraph position="3"> The Brown paper reports early results on a microphone beam-steering array. They report a series of interesting problems, some solved (microphone mounting), and some not (ceiling reflections). The search for an effective array continues.</Paragraph>
    <Paragraph position="4"> Finally, the NYNEX paper reports on comparative phonetic recognition of TIMIT vs. NTIMIT. The telephone version of TIMIT appears to induce 1.3 times as many errors as TIMIT, with a frequency distribution of errors that is expected from the inherent power of the underlying phonemes. This work is offered as a benchmark against which to measure future systems.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="273" type="metho">
    <SectionTitle>
2. Discussion
</SectionTitle>
    <Paragraph position="0"> Discussion was congenial and to the point. More work in this area will appear in future meetings.</Paragraph>
  </Section>
</Paper>