<?xml version="1.0" standalone="yes"?> <Paper uid="N03-2031"> <Title>Auditory-based Acoustic Distinctive Features and Spectral Cues for Robust Automatic Speech Recognition in Low-SNR Car Environments</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> In general, the performance of existing speech recognition systems, whose designs are predicated on relatively noise-free conditions, degrades rapidly under highly adverse conditions. A recognizer can still perform well in very noisy background conditions if the exact testing condition is used to provide the training material from which the reference patterns of the vocabulary are obtained, but this is rarely practical. The approaches that have been studied for achieving noise robustness fall into two fundamentally different categories. The first preprocesses the corrupted input speech signal prior to pattern matching in order to enhance the SNR. The second modifies the pattern matching itself to account for the effects of noise. For more details see (O'Shaughnessy, 2000).</Paragraph> <Paragraph position="1"> In previous work, we introduced an auditory-based multi-stream paradigm for ASR (Tolba et al., 2002).</Paragraph> <Paragraph position="2"> Within this multi-stream paradigm, we merge different sources of information about the speech signal that could be lost when only the MFCCs are used to recognize uttered speech. Our experiments showed that using some auditory-based features and formant cues via a multi-stream paradigm improves recognition performance.
This shows that the MFCCs lose some information relevant to the recognition process, despite the popularity of these coefficients in current ASR systems. In our experiments, we used a three-stream feature vector. The first stream consists of the classical MFCCs and their first derivatives, whereas the second stream consists of acoustic cues derived from studies of hearing phenomena. Finally, the magnitudes of the main resonances of the spectrum of the speech signal were used as the elements of the third stream.</Paragraph> <Paragraph position="3"> In this paper, we extend our work presented in (Tolba et al., 2002) to evaluate the robustness of the proposed features (the acoustic distinctive cues and the spectral cues) using a multi-stream paradigm for ASR in noisy car environments. As mentioned above, the first stream consists of the MFCCs and their first derivatives, whereas the second stream consists of acoustic cues computed from an auditory-based analysis applied to the speech signal, modeled using the Caelen Model (Caelen, 1985). Finally, the magnitudes of the main peaks of the spectrum of the speech signal, obtained through an LPC analysis, were used as the elements of the third stream.</Paragraph> <Paragraph position="4"> The outline of this paper is as follows. In Section 2, an overview of the auditory Caelen Model is given. Next, in Section 3, we briefly describe the statistical framework of the multi-stream paradigm. Then, in Section 4, we proceed with the evaluation of the proposed approach for ASR. Finally, in Section 5 we conclude and discuss our results.</Paragraph> </Section> </Paper>