<?xml version="1.0" standalone="yes"?> <Paper uid="H89-1016"> <Title>RECENT PROGRESS IN THE SPHINX SPEECH RECOGNITION SYSTEM</Title> <Section position="10" start_page="128" end_page="129" type="concl"> <SectionTitle> 10. Results </SectionTitle> <Paragraph position="0"> The SPHINX System was tested on 150 sentences from 15 speakers. These sentences were the official DARPA test data for evaluations in March and October 1987. The word accuracies for various versions of SPHINX with the word-pair grammar (perplexity 60) and the null grammar (perplexity 997) are shown in Table 1.</Paragraph> <Paragraph position="1"> Word accuracy is defined as the percent of words correct minus the percent of insertions.</Paragraph> <Paragraph position="2"> The first improvement was obtained by adding additional feature sets and codebooks. Next, we found duration modeling to be helpful when no grammar was used. Modeling function words and generalized triphones both led to substantial improvements. We also found that generalized triphones outperformed triphones, while saving 60% memory*. The improvements from function-phrase dependent modeling encouraged us to implement between-word triphone models. This led to substantial improvements with no increase in the number of models. Finally, we showed the effectiveness of our extension of the corrective training algorithm to speaker-independent continuous speech.</Paragraph> <Paragraph position="3"> Since the above experiments were repeatedly run on the same set of test data, it is important to verify that SPHINX is capable of achieving comparable levels of performance on new test data. Recently, SPHINX was evaluated on two new sets of test data (June 1988 evaluation and February 1989 evaluation). With no grammar, recognition accuracies of 78.1% and 76.4% were obtained on these two test sets. With the word-pair grammar, the accuracies were 95.7% and 93.9%. 
*More detailed descriptions and results on contextual modeling can be found in [2] or [3].</Paragraph> <Paragraph position="4"> 11. Conclusion This paper has presented an up-to-date description of the SPHINX Speech Recognition System. We have described a number of recent improvements, including function-phrase modeling, between-word coarticulation modeling, and corrective and reinforcement training. Through these techniques we demonstrated that accurate large-vocabulary speaker-independent continuous speech recognition is feasible. We report recognition accuracies of 82% and 96% with grammars of perplexity 997 and 60. The results degraded somewhat on new test data, but remain highly accurate. These results were made possible by three important factors: (1) ample training data, (2) a powerful learning paradigm, and (3) knowledge-guided detailed models.</Paragraph> <Paragraph position="5"> Encouraged by these results, we will continue in the current SPHINX framework, and direct our future efforts to improving each of these three areas. We feel that work in each of the three directions will lead to substantial progress, and hope that our future work will contribute to the next generation of accurate, robust, and versatile speech recognition systems.</Paragraph> </Section></Paper>