<?xml version="1.0" standalone="yes"?> <Paper uid="H93-1013"> <Title>SESSION 3: CONTINUOUS SPEECH RECOGNITION*</Title> <Section position="1" start_page="0" end_page="68" type="abstr"> <SectionTitle> SESSION 3: CONTINUOUS SPEECH RECOGNITION* </SectionTitle>
<Paragraph position="0"> The papers in this session focus on techniques for, and applications of, large-vocabulary continuous speech recognition. The technique-oriented papers discuss channel compensation, fast search, acoustic modeling, and adaptive language modeling. The applications-oriented papers discuss methods for using recognizers for language identification, speaker identification, speaker-sex identification, and keyword spotting.</Paragraph>
<Paragraph position="1"> In &quot;Efficient Cepstral Normalization for Robust Speech Recognition,&quot; Liu et al. discuss several preprocessors for channel (including microphone) compensation. Some of these techniques address only channel equalization, while others also account for additive noise. The authors obtained their best unknown-microphone performance using a technique that accounts for both equalization and additive noise.</Paragraph>
<Paragraph position="2"> In &quot;Comparative Experiments on Large Vocabulary Speech Recognition,&quot; Schwartz et al. describe several aspects of the BBN recognition system. They briefly describe their use of forward-backward N-best search.</Paragraph>
<Paragraph position="3"> They also found a number of small modeling improvements that add up to a significant total improvement in performance. Finally, they describe their results on channel compensation, which are not completely in agreement with the results of the previous paper.</Paragraph>
<Paragraph position="4"> &quot;An Overview of the SPHINX-II Speech Recognition System&quot; by Huang et al. describes the CMU SPHINX-II recognition system.
It describes their feature set, their use of tied-mixture (semicontinuous) pdfs, their state-clustered phone models (senones), and their search strategy. It also describes a technique for combining the acoustic and language-model probabilities that does not assume statistical independence between the two information sources.</Paragraph>
<Paragraph position="5"> Murveit et al. describe the search strategy used in the SRI recognizer in &quot;Progressive-Search Algorithms for Large Vocabulary Speech Recognition.&quot; This progressive search strategy performs the search several times, initially using inexpensive coarse models and then progressively more detailed and expensive models on each iteration. Information from each iteration is used to produce a smaller word network to constrain the search space of the next iteration. (*This work was sponsored by the Advanced Research Projects Agency. The views expressed are those of the author and do not reflect the official policy or position of the U.S. Government.)</Paragraph>
<Paragraph position="6"> In &quot;Search Algorithms for Software-Only Real-Time Recognition with Very Large Vocabularies,&quot; Nguyen et al. describe the techniques used at BBN to achieve real-time recognition on a 20K-word task. The techniques center on a very fast approximate forward search.</Paragraph>
<Paragraph position="7"> Information saved from this forward search is then used to constrain a backward A* search. This backward search is inherently fast and can provide an N-best sentence list for more detailed reevaluation.</Paragraph>
<Paragraph position="8"> Gauvain and Lamel, in &quot;Identification of Non-Linguistic Speech Features,&quot; apply a phonetic recognizer to several other purposes. Using multiple phone sets running independently in parallel, they use the output likelihoods to identify speaker sex, speaker identity, and language.
In each case the phone sets are matched to the aspect to be identified.</Paragraph>
<Paragraph position="9"> &quot;On the Use of Tied-Mixture Distributions&quot; by Kimball and Ostendorf discusses tied Gaussian-mixture pdfs, which have been shown to yield good recognition performance in standard HMM recognizers at a number of sites. The authors discuss the application of tied mixtures to their stochastic segment recognition models and show improved performance over a non-mixture-based system.</Paragraph>
<Paragraph position="10"> In &quot;Adaptive Language Modeling Using the Maximum Entropy Principle,&quot; Lau et al. describe a new method for recognition-time adaptation of the language model based upon the recent past. The technique uses &quot;trigger&quot; words that signal an increased probability for other words in the near future. They report a greater reduction in perplexity than that obtained with a &quot;caching&quot; adaptive language model.</Paragraph>
<Paragraph position="11"> Weintraub describes the application of a large-vocabulary recognizer to a keyword-spotting task. He shows significantly improved performance over the traditional technique of searching for only the keywords against a background of unknown words.</Paragraph>
<Paragraph position="12"> Peskin et al., in &quot;Topic and Speaker Identification via Large Vocabulary Continuous Speech Recognition,&quot; describe the use of the Dragon large-vocabulary recognizer to perform both topic and speaker identification. The technique described here uses a topic- and speaker-independent recognizer to produce a word sequence.</Paragraph>
<Paragraph position="13"> This word sequence can then be economically rescored using topic-dependent language models for topic identification or speaker-dependent acoustic models for speaker identification. The authors report good performance on both tasks.</Paragraph> </Section> </Paper>