<?xml version="1.0" standalone="yes"?>
<Paper uid="H94-1077">
<Title>A LARGE-VOCABULARY CONTINUOUS SPEECH RECOGNITION ALGORITHM AND ITS APPLICATION TO A MULTI-MODAL TELEPHONE DIRECTORY ASSISTANCE SYSTEM</Title>
<Section position="3" start_page="0" end_page="0" type="intro">
<SectionTitle>1. INTRODUCTION</SectionTitle>
<Paragraph position="0">One of the main problems in very-large-vocabulary continuous speech recognition is how to reduce the search space accurately and efficiently without pruning the correct candidate. Our speech recognition system is based on the HMM-LR algorithm [1], which uses a generalized LR parser [2] as the language model and hidden Markov models (HMMs) as phoneme models. Applying this algorithm to large-vocabulary continuous speech requires: (1) accurate scoring of phoneme sequences, (2) reduction of the trellis calculation, and (3) efficient pruning of phoneme sequence candidates. For the first requirement, several speech recognition algorithms that calculate the backward trellis likelihood from the end of the utterance, as well as the forward trellis likelihood, have been proposed [3][4]. We also use forward and backward trellis likelihoods for accurate scoring. For the second requirement, we use an adjusting window, which selects only the probable part of the trellis according to the predicted phoneme. For the third requirement, we use an algorithm that merges candidates having the same allophonic phoneme sequence and the same context-free grammar state [5]. In addition, candidates are also merged at the meaning level [6].</Paragraph>
<Paragraph position="1">Speech HMMs are sensitive to incoming noise, and this often results in a large decrease in recognition performance. One solution is to train HMMs on noisy speech to obtain the corresponding optimum HMMs. For large-vocabulary continuous speech recognition, however, the computational load of this solution becomes too high, because all the HMMs need to be re-trained each time the characteristics of the background noise (such as its level) change. Taking inspiration from HMM decomposition [7], we proposed an HMM-composition technique to easily adapt a speech recognition system based on clean-speech HMMs to background noise [8]. This technique is similar to that of Nolasco Flores et al. [9], which was investigated independently.</Paragraph>
<Paragraph position="2">Providing access to directory information via spoken names and addresses is an interesting and useful application of large-vocabulary continuous speech recognition technology in telecommunication networks. Although many systems based on recognizing spoken spelled names are being investigated, it is unreasonable to expect users to correctly spell the names of the persons whose telephone numbers they want. In addition, several sets of letters have similar pronunciations, such as the English E-rhyming letters, and the pronunciation of a spelled name is often unstable, since spelling out another person's name is not a familiar task. It is therefore not easy to correctly recognize alphabetically spelled names, and a more successful approach may be to recognize naturally spoken names, even if the machine has to recognize hundreds of thousands of names. We applied our speech recognition technology to a directory assistance system that recognizes names and addresses continuously spoken in Japanese. The system was evaluated from the human-machine interface point of view.</Paragraph>
</Section>
</Paper>
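<!--
A minimal sketch, in Python, of two ideas described in the first paragraph of
this introduction: scoring a partial hypothesis with both a forward and a
backward trellis likelihood, and merging candidates that share the same
allophonic phoneme sequence and the same generalized LR parser state. The data
structures, function names, and the simple beam pruning are illustrative
assumptions, not the system's actual implementation.

from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Candidate:
    phonemes: Tuple[str, ...]   # allophonic phoneme sequence decoded so far
    lr_state: int               # current state of the generalized LR parser
    forward_ll: float           # forward trellis log-likelihood up to frame t
    backward_ll: float          # backward trellis log-likelihood from frame t onward

    def score(self) -> float:
        # Combined score: the decoded part (forward) plus an estimate of the
        # remaining part of the utterance (backward).
        return self.forward_ll + self.backward_ll


def merge_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the best-scoring candidate per (phoneme sequence, LR state)."""
    best: Dict[Tuple[Tuple[str, ...], int], Candidate] = {}
    for c in candidates:
        key = (c.phonemes, c.lr_state)
        if key not in best or c.score() > best[key].score():
            best[key] = c
    return list(best.values())


def prune(candidates: List[Candidate], beam: float) -> List[Candidate]:
    """Beam pruning: drop candidates scoring too far below the current best."""
    if not candidates:
        return []
    top = max(c.score() for c in candidates)
    return [c for c in candidates if c.score() >= top - beam]
-->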
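<!--
A minimal sketch, in Python, of the HMM composition idea described in the
second paragraph: a clean speech state distribution is combined with a noise
distribution in the linear spectral domain, so that noise adaptation does not
require retraining the speech HMMs. Only the mean of a log spectral
(filterbank) Gaussian is composed here; variances, cepstral transforms, and
gain estimation are omitted, and the function names are assumptions for
illustration rather than the authors' formulation.

from typing import Dict

import numpy as np


def compose_logspec_mean(speech_mean: np.ndarray,
                         noise_mean: np.ndarray,
                         gain: float = 1.0) -> np.ndarray:
    """Compose a clean speech log spectral mean with a noise log spectral mean.

    The means are mapped to the linear spectral domain, added (with a gain
    factor on the speech), and mapped back to the log spectral domain.
    """
    speech_lin = np.exp(speech_mean)
    noise_lin = np.exp(noise_mean)
    return np.log(gain * speech_lin + noise_lin)


def compose_model(speech_state_means: Dict[str, np.ndarray],
                  noise_mean: np.ndarray,
                  gain: float = 1.0) -> Dict[str, np.ndarray]:
    """Compose every state of a clean speech HMM set with the same noise model."""
    return {state: compose_logspec_mean(mean, noise_mean, gain)
            for state, mean in speech_state_means.items()}
-->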