<?xml version="1.0" standalone="yes"?>
<Paper uid="H92-1058">
  <Title>PHONETIC CLASSIFICATION ON WIDE-BAND AND TELEPHONE QUALITY SPEECH</Title>
  <Section position="7" start_page="292" end_page="294" type="evalu">
    <SectionTitle>
6. RESULTS
</SectionTitle>
    <Paragraph position="0"> The eigenvectors are ordered according to the amount of variance they account for in the original feature space; we can therefore plot the percentage of the total variance of the original data that is accounted for by the principal components as the number of principal components increases.</Paragraph>
    <Paragraph position="1"> Figure 2 plots the number of principal components in the system against the percentage of the total variance accounted for by those components. In N-TIMIT, the spectrum above 3400 Hz carries little information (because of the bandpass characteristics of the telephone network), so the features that represent this region have small variance. Consequently, fewer principal components are needed to account for the variability of the N-TIMIT features. This is visible in Figure 2, where the N-TIMIT curve lies above the TIMIT curve: for the same number of eigenvectors, a larger percentage of the variance is accounted for in N-TIMIT than in TIMIT.</Paragraph>
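A curve like the one in Figure 2 can be computed from the eigenvalues of the feature covariance matrix. The following is a minimal sketch, assuming the features are stored as a NumPy array with one row per token; the helper name `cumulative_variance_explained` and the synthetic data are hypothetical and are not the TIMIT or N-TIMIT features. Attenuating two of six features is only a stand-in for the weak energy above 3400 Hz in telephone speech; the attenuated set's curve rises faster, mirroring the relationship between the N-TIMIT and TIMIT curves.

```python
import numpy as np


def cumulative_variance_explained(features):
    """Cumulative percentage of total variance captured by the first k
    principal components, for k = 1..d (hypothetical helper)."""
    X = np.asarray(features, dtype=float)
    cov = np.cov(X, rowvar=False)            # d x d covariance (np.cov centers the data)
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigenvalues, largest first
    eigvals = np.clip(eigvals, 0.0, None)    # remove tiny negative round-off
    return 100.0 * np.cumsum(eigvals) / eigvals.sum()


# Synthetic stand-in data (NOT TIMIT/N-TIMIT): six independent features.
rng = np.random.default_rng(0)
wide = rng.normal(size=(2000, 6))
narrow = wide.copy()
narrow[:, 4:] *= 0.05                        # attenuate two "high-frequency" features

wide_curve = cumulative_variance_explained(wide)
narrow_curve = cumulative_variance_explained(narrow)
print(np.round(wide_curve, 1))
print(np.round(narrow_curve, 1))
```

Working from eigenvalues rather than a full decomposition is enough here, because only the variance accounted for by each component is needed, not the components themselves.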
    <Paragraph position="2"> Figure 3 plots the TIMIT and N-TIMIT error rates on the cross-validation set as the number of principal components varies. Notably, once the top 10 principal components are included, the ratio of the N-TIMIT error rate to the TIMIT error rate has a mean of 1.3 with a standard deviation of only 0.019.</Paragraph>
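The ratio statistic above is a simple computation over two error curves. This sketch assumes each curve is a list indexed by the number of principal components (k = 1, 2, ...) and uses the sample standard deviation, which is an assumption since the text does not say which estimator was used. The function name `error_ratio_stats` and the toy curves are hypothetical and are not data from this study.

```python
import statistics


def error_ratio_stats(timit_err, ntimit_err, first_k=10):
    """Mean and sample standard deviation of the N-TIMIT / TIMIT error-rate
    ratio over all points with at least `first_k` principal components.
    Element i of each list is the error rate with k = i + 1 components."""
    ratios = [
        n / t
        for k, (t, n) in enumerate(zip(timit_err, ntimit_err), start=1)
        if k >= first_k
    ]
    return statistics.mean(ratios), statistics.stdev(ratios)


# Toy error curves (percent), invented for illustration only.
timit = [50.0, 45.0, 40.0, 35.0]
ntimit = [60.0, 56.0, 52.0, 45.5]
mean_ratio, sd_ratio = error_ratio_stats(timit, ntimit, first_k=2)
print(f"mean ratio {mean_ratio:.3f}, sd {sd_ratio:.3f}")
```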
    <Paragraph position="3"> With 10 principal components, the error rates on the cross-validation set are 39.6% for TIMIT and 48.1% for N-TIMIT; they fall to minima of 25.8% and 34.1%, respectively. The number of principal components giving the best classification performance on the cross-validation set was 58 for the TIMIT classifier and 65 for the N-TIMIT classifier. However, the improvements in classification accuracy are very small once approximately 35 principal components have been included.</Paragraph>
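Choosing the number of principal components from a cross-validation curve amounts to taking the argmin of the error. The second helper below is a hypothetical rule, not one reported here, that picks the smallest number of components whose error lies within a fixed tolerance of the minimum; it reflects the observation that gains flatten out after about 35 components. Both function names and the toy curve are invented for illustration.

```python
def best_num_components(cv_error):
    """Number of components (1-based) with the lowest cross-validation error."""
    return min(range(len(cv_error)), key=cv_error.__getitem__) + 1


def smallest_k_within(cv_error, tolerance):
    """Hypothetical rule: the smallest number of components whose error is
    within `tolerance` percentage points of the best error."""
    floor = min(cv_error)
    for k, err in enumerate(cv_error, start=1):
        if floor + tolerance >= err:
            return k
    return len(cv_error)  # not reached: the minimum itself always qualifies


# Toy cross-validation curve (percent error for k = 1..7), invented for illustration.
curve = [40.0, 32.0, 28.0, 26.6, 26.2, 26.0, 26.1]
print(best_num_components(curve))        # 6
print(smallest_k_within(curve, 0.7))     # 4
```

A tolerance rule trades a small amount of accuracy for a smaller model; with a tolerance of zero it reduces to the argmin.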
    <Paragraph position="4"> Figure: ... error rates for TIMIT and N-TIMIT classifiers on the cross-validation set.</Paragraph>
    <Paragraph position="5"> Two procedures for ranking the principal components were compared: the first ranked them by the variance they account for, and the second by their discriminative power. No difference in classification accuracy was found between the two procedures. This finding agrees with Brown [8], whose system performed the same when a large number of principal components was used as when discriminant analysis was used.</Paragraph>
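The two ranking procedures can be sketched as follows. The text does not specify how discriminative power was measured; this sketch assumes a Fisher ratio (between-class scatter over within-class scatter of each projected component), and all function names are hypothetical. In the synthetic two-class data, the highest-variance component carries no class information, so the two rankings disagree on which component comes first.

```python
import numpy as np


def principal_axes(X):
    """Eigenvectors of the covariance matrix, columns sorted by decreasing variance."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order]


def fisher_ratio(z, labels):
    """Between-class over within-class scatter of a 1-D projection
    (an assumed measure of discriminative power)."""
    grand_mean = z.mean()
    between = within = 0.0
    for c in np.unique(labels):
        zc = z[labels == c]
        between += len(zc) * (zc.mean() - grand_mean) ** 2
        within += np.sum((zc - zc.mean()) ** 2)
    return between / within


def rank_components(X, labels):
    """Return component indices ranked by variance and by Fisher ratio."""
    V = principal_axes(X)
    Z = (X - X.mean(axis=0)) @ V                # project tokens onto the components
    by_variance = list(range(V.shape[1]))       # columns are already variance-ordered
    scores = [fisher_ratio(Z[:, j], labels) for j in range(V.shape[1])]
    by_discrimination = [int(j) for j in np.argsort(scores)[::-1]]
    return by_variance, by_discrimination


# Synthetic two-class data: feature 0 is high-variance noise shared by both
# classes; feature 1 has low variance but separates the classes.
rng = np.random.default_rng(1)
labels = np.repeat([0, 1], 500)
X = np.column_stack([
    rng.normal(0.0, 10.0, labels.size),
    np.where(labels == 1, 1.0, -1.0) + rng.normal(0.0, 0.3, labels.size),
])
by_variance, by_discrimination = rank_components(X, labels)
print(by_variance, by_discrimination)
```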
    <Paragraph position="6"> The first-choice accuracies of the TIMIT and N-TIMIT classifiers on the test set are 74.8% and 66.5%, respectively. The error rates of the two classifiers on the test set appear in Table 1. As on the cross-validation set, the telephone network increases the phonetic classification error rate on the test set by a factor of about 1.3. To determine whether TIMIT and N-TIMIT classification accuracies differ significantly, a McNemar test of symmetry was conducted.</Paragraph>
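As a quick check of the factor of 1.3, assuming the test-set error rates are the complements of the first-choice accuracies; the ratio of the cross-validation minima is included for comparison:

```latex
\begin{gathered}
E_{\text{TIMIT}} = 100\% - 74.8\% = 25.2\%, \qquad
E_{\text{N-TIMIT}} = 100\% - 66.5\% = 33.5\%, \\
\frac{E_{\text{N-TIMIT}}}{E_{\text{TIMIT}}} = \frac{33.5}{25.2} \approx 1.33
\quad \text{(test set)}, \qquad
\frac{34.1}{25.8} \approx 1.32 \quad \text{(cross-validation minima)}
\end{gathered}
```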
    <Paragraph position="7"> The McNemar test revealed a significant difference between TIMIT and N-TIMIT classifier performance (p &lt; 0.01).</Paragraph>
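A McNemar test compares two classifiers on the same items using only the discordant pairs: tokens that one classifier gets right and the other gets wrong. This sketch assumes each TIMIT test token is paired with its telephone-quality N-TIMIT counterpart and applies the common continuity correction; whether the original analysis used that correction is not stated. The p-value uses the identity that a chi-squared variable with one degree of freedom exceeds x with probability erfc(sqrt(x/2)). The function name `mcnemar` and the counts in the example are hypothetical toy values.

```python
import math


def mcnemar(correct_a, correct_b, continuity=True):
    """McNemar test on paired outcomes (True = token classified correctly).

    Only discordant pairs matter: tokens right under A but wrong under B,
    and vice versa. Returns (chi-squared statistic, p-value) with 1 df.
    """
    pairs = list(zip(correct_a, correct_b))
    only_a = sum(1 for a_ok, b_ok in pairs if a_ok and not b_ok)
    only_b = sum(1 for a_ok, b_ok in pairs if b_ok and not a_ok)
    if only_a + only_b == 0:
        return 0.0, 1.0                       # no discordant pairs, no evidence
    diff = abs(only_a - only_b) - (1 if continuity else 0)
    stat = max(diff, 0) ** 2 / (only_a + only_b)
    p_value = math.erfc(math.sqrt(stat / 2))  # P(chi2 with 1 df exceeds stat)
    return stat, p_value


# Toy paired outcomes for 100 tokens (not results from this study):
# 60 correct in both, 20 only in A, 4 only in B, 16 wrong in both.
timit_ok = [True] * 60 + [True] * 20 + [False] * 4 + [False] * 16
ntimit_ok = [True] * 60 + [False] * 20 + [True] * 4 + [False] * 16
stat, p = mcnemar(timit_ok, ntimit_ok)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

Because tokens that both classifiers get right (or both get wrong) do not enter the statistic, the test is sensitive to paired disagreements rather than to overall accuracy alone.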
    <Paragraph position="8"> A McNemar test of symmetry was also conducted separately on each of the 39 phonemes to determine which phonemes accounted for the significant difference. This analysis revealed a significant effect of database on 13 of the 39 phonemes (p &lt; 0.01). These phonemes are shown in Table 2. For each phoneme, the percentage of N-TIMIT tokens correctly classified was subtracted from the percentage of TIMIT tokens correctly classified, and the results are presented in decreasing order.</Paragraph>
    <Paragraph position="9"> For example, the accuracy on the phoneme /f/ is 29 percentage points higher on TIMIT than on N-TIMIT. Many of these errors are predictable from the acoustic characteristics of the segments and their sensitivity to band-pass filtering or noise. Figure 1 shows spectrograms of the same utterance in TIMIT and N-TIMIT. This utterance was chosen because it highlights several of the phonemes that are classified significantly differently in TIMIT and N-TIMIT, and many of the classification errors can be explained from the spectrogram. The frication for /s/, for example, is a visible and salient cue in the TIMIT utterance but is nearly non-existent in the telephone-quality N-TIMIT version.</Paragraph>
  </Section>
</Paper>