<?xml version="1.0" standalone="yes"?>
<Paper uid="N03-2031">
  <Title>Auditory-based Acoustic Distinctive Features and Spectral Cues for Robust Automatic Speech Recognition in Low-SNR Car Environments</Title>
  <Section position="5" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
4 Experiments &amp; Results
</SectionTitle>
    <Paragraph position="0"> In the following experiments the TIMIT database was used. The TIMIT corpus contains broadband recordings of a total of 6300 sentences: each of 630 speakers from 8 major dialect regions of the United States reads 10 phonetically rich sentences. To simulate a noisy environment, car noise was added artificially to the clean speech. Throughout all experiments, the HTK-based speech recognition platform described in (Cambridge University Speech Group, 1997) was used. The toolkit was designed to support continuous-density HMMs with any number of states and mixture components.</Paragraph>
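    <Paragraph>
The noise-mixing step can be illustrated with a minimal Python sketch, assuming the usual definition of SNR as the ratio of average speech power to average noise power over the whole utterance. The function names `mix_at_snr` and `measured_snr_db` are hypothetical, and the source does not state how the car noise was actually scaled, so this is one plausible procedure rather than the authors' method.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it.

    Assumption: SNR is measured over the whole utterance (not per frame).
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    # Tile or trim the noise recording so it covers the whole utterance.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Required noise power follows from P_speech / P_noise = 10 ** (snr_db / 10).
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    gain = np.sqrt(target_noise_power / noise_power)
    return speech + gain * noise


def measured_snr_db(speech, noisy):
    """Recover the SNR in dB from the clean signal and its noisy version."""
    speech = np.asarray(speech, dtype=float)
    residual = np.asarray(noisy, dtype=float) - speech
    return 10.0 * np.log10(np.mean(speech ** 2) / np.mean(residual ** 2))
```

Calling `mix_at_snr(clean, car_noise, 0.0)` would give a mixture in which speech and noise carry equal average power; negative values make the noise dominant.
    </Paragraph>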
    <Paragraph position="1"> In order to evaluate the use of the proposed features for ASR in noisy car environments, we repeated the experiments performed in our previous study (Tolba et al., 2002) using the subsets dr1 &amp; dr2 of a noisy version of the TIMIT database at SNR values ranging from 16 dB to -4 dB. In all our experiments, 12 MFCCs were calculated on a 30-msec Hamming window advanced by 10 msec each frame. The normalized log energy was also computed and appended to the 12 MFCCs to form a 13-dimensional (static) vector. This static vector was then expanded to produce a 26-dimensional (static+dynamic) vector. The latter was further extended by adding the seven acoustic distinctive cues, which were computed from the Caelen model analysis.</Paragraph>
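    <Paragraph>
As a rough illustration of the front end described above, the following Python sketch frames a signal with a 30-ms Hamming window advanced by 10 ms and appends first-order regression (delta) coefficients to a 13-dimensional static vector. It assumes TIMIT's 16 kHz sampling rate and a delta window of 2 frames with the HTK-style regression formula; the helper names are hypothetical, and the MFCC and energy computation itself is left out.

```python
import numpy as np

SAMPLE_RATE = 16000                      # TIMIT sampling rate in Hz
FRAME_LEN = SAMPLE_RATE * 30 // 1000     # 30 ms -> 480 samples
FRAME_SHIFT = SAMPLE_RATE * 10 // 1000   # 10 ms -> 160 samples


def frame_signal(signal, frame_len=FRAME_LEN, frame_shift=FRAME_SHIFT):
    """Cut `signal` into overlapping frames and apply a Hamming window to each."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    # Row t of the index matrix selects frame_len samples starting at t * frame_shift.
    idx = np.arange(frame_len)[None, :] + frame_shift * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)


def add_deltas(static, window=2):
    """Append regression-based delta coefficients to a (frames, dims) matrix.

    Assumption: delta_t = sum_k k * (c[t+k] - c[t-k]) / (2 * sum_k k^2),
    with edge frames repeated at the utterance boundaries.
    """
    static = np.asarray(static, dtype=float)
    n_frames = static.shape[0]
    padded = np.pad(static, ((window, window), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, window + 1))
    delta = np.zeros_like(static)
    for k in range(1, window + 1):
        ahead = padded[window + k: window + k + n_frames]
        behind = padded[window - k: window - k + n_frames]
        delta += k * (ahead - behind)
    return np.hstack([static, delta / denom])
```

For a one-second utterance this yields 98 frames of 480 samples each; stacking a (98, 13) static matrix with its deltas gives a (98, 26) matrix, to which the seven acoustic cues would then be appended.
    </Paragraph>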
    <Paragraph position="2"> This was followed by the computation of the main spectral peak magnitudes, which were added to the MFCCs and the acoustic cues to form a 37-dimensional vector upon which the hidden Markov models (HMMs) that model the speech subword units were trained. [Table caption fragment, opening words lost in extraction: ... MFCCEDP- and MFCCEDEP-based HTK ASR systems to the baseline HTK using (a) 2-mixture, (b) 4-mixture and (c) 8-mixture triphone models and the dr1 &amp; dr2 subsets of the TIMIT database when contaminated by additive car noise for different values of SNR.]</Paragraph>
    <Paragraph position="3"> The main spectral peak magnitudes were computed based on an LPC analysis using 12 poles followed by a peak-picking algorithm. The proposed recognizer uses a triphone Gaussian-mixture HMM system. Three different sets of experiments were carried out on the noisy version of the TIMIT database. In the first set, we tested our recognizer using a 30-dimensional feature vector (MFCCEDP), in which we combined the magnitudes of the main spectral peaks with the classical MFCCs and their first derivatives to form two streams that were used to perform the recognition process. We found that using these two streams improves word recognition accuracy compared with the classical MFCCEDA feature vector (Table 1). These tests were repeated using a two-stream feature vector in which we combined the acoustic distinctive cues with the classical MFCCs and their first derivatives (MFCCEDE). Again, using these two streams, an improvement in word recognition accuracy was obtained when we tested our recognizer using N-mixture Gaussian HMMs with triphone models for different values of SNR (Table 1). We repeated these tests using the proposed features, which combine the MFCCs with the acoustic distinctive cues and the formant frequencies to form a three-stream feature vector (MFCCEDEP). Again, using these combined features, an improvement in word recognition accuracy was obtained (Table 1).</Paragraph>
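    <Paragraph>
The 12-pole LPC analysis followed by peak picking can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes autocorrelation-method LPC solved with the Levinson-Durbin recursion, a 512-point FFT of the LPC polynomial, a 16 kHz sampling rate, and a frame with non-zero energy. The choice of four peaks per frame is inferred from the reported dimensions (30 = 26 + 4), and the function names are hypothetical.

```python
import numpy as np


def lpc(frame, order=12):
    """Return LPC coefficients a[0..order] (a[0] = 1) and the residual energy."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i of the Levinson-Durbin recursion.
        acc = r[i] + np.dot(a[1:i], r[i - 1: 0: -1])
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def main_spectral_peaks(frame, order=12, n_peaks=4, n_fft=512, sample_rate=16000):
    """Pick the strongest local maxima of the LPC spectral envelope.

    Returns (frequencies in Hz, magnitudes in dB), sorted by frequency.
    Fewer than n_peaks values are returned if the envelope has fewer maxima.
    """
    a, err = lpc(frame, order)
    # All-pole envelope: gain / |A(e^{jw})| evaluated on the FFT grid.
    envelope_db = 20.0 * np.log10(np.sqrt(err) / np.abs(np.fft.rfft(a, n_fft)))
    mid, left, right = envelope_db[1:-1], envelope_db[:-2], envelope_db[2:]
    peak_bins = np.flatnonzero(np.logical_and(mid > left, mid >= right)) + 1
    # Keep the n_peaks strongest maxima, then restore ascending frequency order.
    order_by_height = np.argsort(envelope_db[peak_bins])[::-1][:n_peaks]
    chosen = np.sort(peak_bins[order_by_height])
    return chosen * sample_rate / n_fft, envelope_db[chosen]
```

In a pipeline of this kind, the returned magnitudes for each frame would form the extra stream that is placed alongside the 26-dimensional MFCC vector; frames with fewer maxima than expected would need padding to keep the dimension fixed.
    </Paragraph>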
  </Section>
</Paper>