<?xml version="1.0" standalone="yes"?>
<Paper uid="H92-1035">
  <Title>Improving State-of-the-Art Continuous Speech Recognition Systems Using the N-Best Paradigm with Neural Networks</Title>
  <Section position="8" start_page="182" end_page="182" type="evalu">
    <SectionTitle>
EXPERIMENTAL CONDITIONS AND
RESULTS
</SectionTitle>
    <Paragraph position="0"> Experiments to test the performance of the hybrid SNN/HMM system were performed on the Speaker Independent (SI) portion of the DARPA 1000-word Resource Management speech corpus, using the standard word-pair grammar (perplexity 60). The training set consisted of utterances from 109 speakers: 2830 utterances from male speakers and 1160 utterances from female speakers. The February '89 test set was used for development of the system, and the October '89 test set was used for the final independent test.</Paragraph>
    <Paragraph position="1"> In our initial experiments, we used the February '89 development set. Table 1 shows the word error rates when we rescored the N=20 N-best lists at the various stages of development of the SNN. It should be noted that these figures do not reflect the unaided performance of the SNN in recognition, since the N-best lists were generated by an HMM system, but instead illustrate the effectiveness of the respective improvements. The original 1-layer SNN was trained using the 1-best training algorithm and the MSE criterion; it gave an error rate of 13.7%. The incorporation of the duration term and the adoption of the log-error training criterion both resulted in some improvement, bringing the error rate to 11.6%.</Paragraph>
    <Paragraph position="2"> When we used the N-best training (which used the SNN produced by the 1-best training as an initial estimate), the error rate dropped to 9.0%, confirming our belief that the N-best training is more effective than the 1-best training in the N-best rescoring paradigm. This final condition was then used to generate the SNN score to examine the behavior of the hybrid SNN/HMM system.</Paragraph>
    <Paragraph position="3"> Table 2 shows the results of combining the HMM and SNN scores in the re-ordering of the N-Best list. Taking the top answer of the N-best list (as produced by the HMM system) gave an error rate of 3.5% on the February '89 development test set. Upon re-ordering the N=20 list on the basis of the SNN score alone, the error rate was 9.0%. However, upon combining the HMM and SNN scores, the error rate decreased over that of the HMM alone. The error rate decreased as the value of N used in the N-best list was increased. For N=2, the error decreased to 3.3%, then to 2.9% for N=4, and finally to 2.8% for N=20.</Paragraph>
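The re-ordering described above can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: the linear combination weight alpha, the function name, and all scores are hypothetical assumptions, since the paper does not specify here how the HMM and SNN scores are combined.

```python
# Hypothetical sketch of N-best rescoring by combining HMM and SNN scores.
# alpha and the example log scores are illustrative, not the paper's values.

def rerank_nbest(hypotheses, hmm_scores, snn_scores, alpha=0.5):
    """Re-order an N-best list by a weighted combination of per-hypothesis scores."""
    combined = [alpha * h + (1.0 - alpha) * s
                for h, s in zip(hmm_scores, snn_scores)]
    # Higher combined score means a better hypothesis.
    order = sorted(range(len(hypotheses)),
                   key=lambda i: combined[i], reverse=True)
    return [hypotheses[i] for i in order]

# Toy usage: three hypotheses with made-up log scores.
hyps = ["show all ships", "show all chips", "show tall ships"]
reranked = rerank_nbest(hyps, [-10.0, -12.0, -11.0], [-9.0, -8.0, -12.0])
print(reranked)
```

Increasing N simply lengthens the lists passed in, giving the combined score more candidates to choose among, which matches the observed improvement from N=2 to N=20.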
    <Paragraph position="4"> Based upon the results on the February '89 development set, we rescored the 20-best lists generated from the October '89 test set with the hybrid system. This independent test yielded an even larger improvement, reducing the error rate from 3.8% in the HMM-based system to 3.0% in the SNN/HMM system. This represents a 20% reduction in error rate.</Paragraph>
    <Paragraph position="5"> Given that the HMM system used in our experiments represented the state of the art in CSR, the hybrid SNN/HMM system has now established a new state of the art.</Paragraph>
  </Section>
</Paper>