<?xml version="1.0" standalone="yes"?>
<Paper uid="H89-2049">
  <Title>SPEECH RECOGNITION IN PARALLEL</Title>
  <Section position="2" start_page="0" end_page="354" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> Concomitantly with recent advances in speech coding, recognition and production, parallel computer systems are now commonplace, delivering raw computing power measured in hundreds of MIPS and Megaflops. It seems inevitable that within the next decade or so, gigaflop parallel processors will be achievable at modest cost. Indeed, gigaflops per cubic foot is now becoming a standard of measure for parallel computers.</Paragraph>
    <Paragraph position="1"> Now that affordable massively parallel computing is approaching reality, it is conceivable that raw computing power can bring gigaflops to bear on the speech recognition problem. Experiments can now be undertaken in a more timely manner and systems can be organized, perhaps with improved recognition accuracies. It is our present thesis that parallel computing can improve speech recognizers not only in response time but in error rate as well. How? We are investigating parallel algorithms that combine a number of concurrent and independent acoustic processors and speech recognizers that may, we conjecture, synergistically deliver better overall recognition performance than any component in isolation. Simply stated, our approach is to utilize a number of concurrent and competing recognizers in the aggregate, utilizing much more information from the speech signal than has been attempted before in one individual recognition system. Thus, rather than committing a single recognition system to model and recognize phones and words from a particular set of information processed from the raw acoustic signal, we conjecture that utilizing much more of the information available from the signal may effect better overall recognition performance. We aim to compose multiple independently-executing recognizers into one recognition system through trained, weighted voting schemes.</Paragraph>
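The weighted voting idea described above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the recognizer names, the per-phone scores, and the hand-set weights are hypothetical, and in a trained scheme the weights would be estimated from data rather than fixed by hand.

```python
from collections import defaultdict


def weighted_vote(hypotheses, weights):
    """Combine per-recognizer phone scores into one weighted decision.

    hypotheses: {recognizer_name: {phone_label: score}}
    weights:    {recognizer_name: weight}  (hypothetical, hand-set here)
    Returns the winning label and the combined score of every label.
    """
    totals = defaultdict(float)
    for name, scores in hypotheses.items():
        for label, score in scores.items():
            # Each recognizer's opinion counts in proportion to its weight.
            totals[label] += weights[name] * score
    best = max(totals, key=totals.get)
    return best, dict(totals)


# Three hypothetical recognizers, each scoring the same two candidate phones.
hypotheses = {
    "spectral": {"b": 0.6, "p": 0.4},
    "cepstral": {"b": 0.3, "p": 0.7},
    "formant": {"b": 0.8, "p": 0.2},
}
weights = {"spectral": 0.5, "cepstral": 0.2, "formant": 0.3}

best, totals = weighted_vote(hypotheses, weights)
print(best, totals)
```

In this toy case one recognizer prefers "p", yet the aggregate still selects "b" because the two more heavily weighted recognizers agree; this is the kind of effect the multiple recognizer conjecture relies on.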
    <Paragraph position="2"> Our sights are aimed at the lower level of the recognition process: better phone recognition leading to improved word recognition. Clearly, the general problem of speech recognition requires &quot;higher order&quot; information to completely recognize sequences of words and sentences and their semantics. Such knowledge-based approaches serve to bias the recognition process in favor of &quot;sensible&quot; word utterances and to disambiguate word utterances by reducing the size of the search space of possible words at each point of an utterance through language, pragmatic and semantic constraints. We suspect, however, that such higher order information can be rendered more effective if the lower level recognizer can achieve near perfect recognition of the phones in the utterance initially. That is, if the ordering of likelihoods of candidate phones and, ultimately, words is wrong, or worse, if the wrong set of candidates is posited, no amount of higher-level knowledge will allow the recognition of an utterance with high reliability, unless a language model over-constrains the problem. (Clearly, as a consequence, utterances governed by low perplexity grammars are easier to recognize than ones with higher perplexity grammars.) The first aim of our work will be to produce more accurate phone, and thus word, recognition. Isolated word recognition stands the most to gain initially from our approach. However, if successful, the approach will allow higher levels of knowledge to more effectively do the job of word sequence recognition and understanding in connected and continuous recognition tasks.</Paragraph>
    <Paragraph position="3"> We plan to test the interaction of higher level knowledge with lower level recognition by incorporating a syntactic parser also implemented using parallel algorithms and hardware. In later stages of our work we shall incorporate higher level constraints from syntax, and possibly semantics, through the addition of a separate independently-executing recognizer that will vote on the likelihood of the next word based on syntactic and semantic expectations. We will use a functional unification grammar [Kay 79, McKeown &amp; Paris 87, Elhadad 89] for this task because it is suited for representing complex interaction between multiple, often conflicting, constraints, it will easily lend itself to parallelization, and it will allow for later extensions to speech synthesis quite easily. This higher level recognizer will be incorporated into the speech system later in our proposed research plan. In the initial phase of our work, we will focus on the adaptation of the grammar for interpretation (we have been using it solely for generation) and on the parallelization of the unification process.</Paragraph>
    <Paragraph position="4"> This approach can certainly be pursued with a serial computing system simply by expanding the set of features extracted from the speech signal to include much more information from the acoustic signal in the first place and executing each recognition task sequentially in turn. However, the computation required for a serial approach grows in proportion to the number of features extracted and the number of recognizers. The problems of realizing realtime performance, as well as reducing the overhead computational cost of model training, thus become exacerbated.</Paragraph>
    <Paragraph position="5"> Processing each recognizer in parallel with the others, however, results in a system no slower than the slowest individual recognizer. Furthermore, if each recognizer itself can be executed as a parallel activity, realtime performance might be achievable with the current generation of hardware. This approach, therefore, calls for computing structures with many processing elements, typically called massively parallel computers.</Paragraph>
    <Paragraph position="6"> Our aim, however, is to design massively parallel computing structures that are economical; minimum parallel computing requirements may be met by simple parallel computing structures that scale economically.</Paragraph>
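The serial and parallel cost argument above can be made concrete with a small arithmetic sketch. The per-recognizer times are made-up numbers, and the model deliberately ignores communication and vote-combination overhead, which a real parallel system would also have to pay.

```python
def serial_latency(times):
    """Recognizers run one after another: the costs add up."""
    return sum(times)


def parallel_latency(times):
    """Recognizers run concurrently on separate processors:
    the system waits only for the slowest one (overhead ignored)."""
    return max(times)


# Hypothetical per-utterance processing times, in seconds, for four recognizers.
times = [0.8, 1.2, 0.5, 1.0]

serial = serial_latency(times)
parallel = parallel_latency(times)
print(serial, parallel)
```

Under these assumptions, adding a fifth recognizer raises the serial figure by its full cost, but changes the parallel figure only if the new recognizer is slower than all the others.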
    <Paragraph position="7"> Another approach we are pursuing is to explore recent advances in dynamic programming algorithms for sequence matching tasks that have been shown to reduce the time of conventional serial algorithms by as much as an order of magnitude (see [Eppstein 89a] for example). Hence, realtime speech recognition performance may be approachable with faster serial pattern matching algorithms. Furthermore, if we find the means of efficiently parallelizing these newer serial algorithms, realtime recognition might be directly achieved.</Paragraph>
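For orientation, the kind of conventional serial dynamic programming that such advances aim to speed up can be sketched as follows. This is the textbook quadratic-time edit distance between two symbol sequences, offered only as an assumed baseline; it is not the algorithm of [Eppstein 89a], and the phone symbols are hypothetical.

```python
def edit_distance(a, b):
    """Classic dynamic programming edit distance between sequences a and b.

    Runs in time proportional to len(a) * len(b) and keeps only two rows
    of the table in memory. Insertions, deletions and substitutions each
    cost 1.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            substitution = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,                 # delete x from a
                curr[j - 1] + 1,             # insert y into a
                prev[j - 1] + substitution,  # match or substitute
            ))
        prev = curr
    return prev[-1]


# Hypothetical phone sequences: a stored template and a recognized utterance.
template = ["k", "ae", "t"]
observed = ["k", "ah", "t"]
print(edit_distance(template, observed))
```

Every cell of the table depends on three neighbours computed just before it, which is why the inner loop is inherently sequential along a row and why both faster serial formulations and parallel evaluation orders are of interest.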
    <Paragraph position="8"> By way of summary, we attack the problems of realtime speech recognition with improved recognition accuracy by:
 * utilizing much more information from the raw acoustic signal,
 * composing multiple recognition systems into one aggregate recognizer through trained, weighted voting schemes,
 * using higher level syntactic, and possibly semantic, constraints for speech recognition through the incorporation of an additional recognizer using natural language approaches,
 * exploiting recent algorithmic advances in dynamic programming,
 * parallelizing the dynamic programming algorithms and multiple recognizer paradigm,
 * and demonstrating these features on a speech recognition system running on economical and scalable parallel hardware.</Paragraph>
    <Paragraph position="9"> In our ongoing work, we have investigated and implemented parallel computing systems to run pattern recognition algorithms at high speeds. The research devoted to the multiple recognizer paradigm is in its initial stages of inquiry. Presently, we are completing experimental parallel hardware, with operational software systems to be ported from an earlier version of the machine, to conduct a set of experiments. This work will be outlined later.</Paragraph>
  </Section>
</Paper>