<?xml version="1.0" standalone="yes"?> <Paper uid="H89-2044"> <Title>ACOUSTICAL PRE-PROCESSING FOR ROBUST SPEECH RECOGNITION</Title> <Section position="3" start_page="0" end_page="311" type="intro"> <SectionTitle> INTRODUCTION </SectionTitle> <Paragraph position="0"> The acceptability of any voice interface depends on its ease of use. Although users in some application domains will accept the headset-mounted microphones commonly used with current speech recognition systems, many other applications require a desk microphone or a wall-mounted microphone. Using microphones other than the &quot;close-talking&quot; headset generally degrades the performance of spoken-language systems. Even a relatively &quot;quiet&quot; office environment can be expected to contribute a significant amount of additive noise from fans, door slams, and competing conversations, as well as reverberation arising from surface reflections within the room. Applications such as inspection or inventory on a factory floor, or an outdoor automatic banking machine, demand an even greater degree of environmental robustness. Our goal has been to develop practical spoken-language systems for real-world environments that are robust with respect to changes in acoustical ambience and microphone type, as well as with respect to speaker and dialect.</Paragraph> <Paragraph position="1"> Although a number of techniques have been proposed to improve the quality of degraded speech, researchers have only recently begun to evaluate speech-enhancement techniques in terms of the improvement in recognition accuracy they provide for speech-recognition systems operating in natural environments. We are incorporating into our system a combination of techniques that come into play at different levels of the system, including pre-processing of the acoustical waveform, the development of physiologically and psychophysically motivated peripheral processing models (i.e. 
&quot;ear models&quot;), adaptive multimicrophone array processing, and dynamic adaptation to new speakers and environments by modifying the parameters used to represent the speech sounds. In this talk we will focus only on acoustical pre-processing. There are many sources of acoustical distortion that can degrade the accuracy of speech-recognition systems. Obstacles to robustness include, for example, additive noise from machinery and from competing talkers, reverberation from surface reflections in a room, and spectral shaping by microphones and by the vocal tracts of individual speakers.</Paragraph> <Paragraph position="2"> These sources of distortion cluster into two complementary classes: additive noise (as in the first two examples) and distortions resulting from the convolution of the speech signal with an unknown linear system (as in the remaining three).</Paragraph> <Paragraph position="3"> In the classical speech-enhancement literature, two complementary techniques have been proposed to cope with these problems: spectral subtraction and spectral normalization. In spectral subtraction, one estimates the amount of background noise present during non-speech intervals and subtracts the estimated spectral density of the noise from the incoming signal (e.g. Boll, 1979; Berouti et al., 1979). In spectral normalization (sometimes referred to as &quot;blind deconvolution&quot;), one estimates the average spectrum when speech is present and applies a multiplicative normalization factor with respect to a reference spectrum (e.g. Stockham et al., 1975). While these procedures were once thought to be of limited practical benefit, based on the results of experiments on human perception of speech, recent applications of them to automatic speech-recognition systems have been more encouraging (e.g. 
Porter and Boll, 1984; Van Compernolle, 1987).</Paragraph> <Paragraph position="4"> In this report we will review the database used to evaluate efficient implementations of spectral subtraction and normalization in the cepstral domain, discuss the results of baseline studies of recognition performance, describe the effectiveness of the spectral subtraction and normalization algorithms, and discuss the motivations for some of our work in progress.</Paragraph> </Section> </Paper>
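The two classical techniques described above can be illustrated concretely. The following is a minimal NumPy sketch, not taken from the paper: the per-frame magnitude-spectrum representation, the subtraction floor, and the choice of reference spectrum are all illustrative assumptions, and practical systems (e.g. Berouti et al.'s oversubtraction) add refinements omitted here.

```python
import numpy as np

def spectral_subtraction(frames_mag, noise_frames_mag, floor=0.01):
    """Subtract an average noise spectrum, estimated during non-speech
    intervals, from each frame of the incoming signal.
    frames_mag: (n_frames, n_bins) magnitude spectra of the signal.
    noise_frames_mag: magnitude spectra of known non-speech frames."""
    noise_est = noise_frames_mag.mean(axis=0)   # estimated noise spectrum
    cleaned = frames_mag - noise_est            # subtract per frequency bin
    # Rectify: bins driven negative are clamped to a small spectral floor
    return np.maximum(cleaned, floor * noise_est)

def spectral_normalization(frames_mag, reference_mag, eps=1e-10):
    """Blind deconvolution: apply a multiplicative factor per frequency bin
    so the long-term average spectrum matches a reference spectrum."""
    avg = frames_mag.mean(axis=0)               # average spectrum over speech
    gain = reference_mag / (avg + eps)          # multiplicative normalization
    return frames_mag * gain
```

Note that the multiplicative normalization becomes a simple additive offset in the log-spectral (and hence cepstral) domain, which is one motivation for the cepstral implementations evaluated in the report.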