<?xml version="1.0" standalone="yes"?>
<Paper uid="H93-1018">
  <Title>Search Algorithms for Software-Only Real-Time Recognition with Very Large Vocabularies</Title>
  <Section position="2" start_page="0" end_page="91" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> The statistical approach to speech recognition requires that we compare the incoming speech signal to our models of speech and choose, as the recognized sentence, the word string with the highest probability given our acoustic models of speech and our statistical models of language.</Paragraph>
    <Paragraph position="1"> The required computation is fairly large. When we realized that we also needed to include a model of understanding, our estimate of the computational requirement grew further, because we assumed that all of the knowledge sources in the speech recognition search had to be tightly coupled.</Paragraph>
    <Paragraph position="2"> Over the years DARPA has funded major programs in special-purpose VLSI and parallel computing environments specifically for speech recognition, because it was taken for granted that this was the only way that real-time speech recognition would be possible. However, these directions became major efforts in themselves. Using a small number of processors in parallel was easy, but efficient use of a large number of processors required a careful redesign of the recognition algorithms. By the time high efficiency was obtained, there were often faster uniprocessors available.</Paragraph>
    <Paragraph position="3"> Design of special-purpose VLSI obviously requires considerable effort. Often by the time the design is completed, the algorithms implemented are obsolete and much faster general purpose processors are available in workstations. The result is that neither of these approaches has resulted in real-time recognition with vocabularies of 1,000 words or more. Another approach to the speech recognition search problem is to reduce the computation needed by changing the search algorithm. For example, IBM has developed a flexible stack-based search algorithm and several fast match algorithms that reduce the search space by quickly eliminating a large fraction of the possible words at each point in the search.</Paragraph>
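The fast-match idea mentioned above can be sketched as a coarse pre-filter: an inexpensive per-word score selects a shortlist of candidate words at each point in the search, and only those survivors receive the full detailed match. A minimal illustrative sketch in Python (the scoring function is a hypothetical stand-in, not IBM's actual algorithm):

```python
def fast_match(vocabulary, coarse_score, keep_fraction=0.1):
    """Return the shortlist of words that survive the cheap pre-filter.

    coarse_score maps a word to an inexpensive goodness estimate for
    the current point in the search; only the top keep_fraction of the
    vocabulary is passed on to the detailed (expensive) match.
    """
    # Rank the whole vocabulary with the cheap score (descending).
    ranked = sorted(vocabulary, key=coarse_score, reverse=True)
    # Keep only a small fraction; the detailed match never sees the rest.
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]

# Toy usage: a 10-word vocabulary scored by a stand-in function.
vocab = [f"word{i}" for i in range(10)]
shortlist = fast_match(vocab, coarse_score=lambda w: -int(w[4:]),
                       keep_fraction=0.2)
```

The point of the design is that the coarse score must be cheap enough to apply to the entire vocabulary, while being accurate enough that the correct word rarely falls outside the shortlist.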
    <Paragraph position="4"> In 1989 we, at BBN [1], and others [2, 3] developed the N-best Paradigm, in which we use a powerful but inexpensive model for speech to find the top N sentence hypotheses for an utterance, and then we rescore each of these hypotheses with more complex models. The result was that the huge search space described by the complex models could be avoided, since the space was constrained to the list of N hypotheses. Even so, an exact algorithm for the N-best sentence hypotheses required about 100 times more computation than the simple Viterbi search for the most likely sentence.</Paragraph>
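The two-pass structure of the N-best Paradigm can be sketched as follows; the scoring functions here are illustrative stand-ins, not the actual acoustic or language models from the paper:

```python
def nbest_rescore(hypotheses, cheap_score, expensive_score, n=10):
    """Keep the N best hypotheses under the cheap model, then
    re-rank that short list with the expensive model."""
    # First pass: rank all hypotheses with the inexpensive model.
    shortlist = sorted(hypotheses, key=cheap_score, reverse=True)[:n]
    # Second pass: the costly model only sees N candidates,
    # not the full search space.
    return max(shortlist, key=expensive_score)

# Toy usage: score word strings by length (cheap stand-in) and by
# vowel count (expensive stand-in).
hyps = ["show flights", "show me flights", "show me the flights"]
best = nbest_rescore(hyps,
                     cheap_score=len,
                     expensive_score=lambda s: sum(c in "aeiou" for c in s),
                     n=2)
```

The complex models never touch hypotheses outside the shortlist, which is why the huge search space they describe can be avoided.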
    <Paragraph position="5"> In 1990 we realized that we could make faster advances in the algorithms using off-the-shelf hardware than by using special hardware. Since then we have gained orders of magnitude in speed in a short time by changing the search algorithms in some fundamental ways, without the need for any additional or special hardware beyond a workstation. This has resulted in a major paradigm shift: we no longer think in terms of special-purpose hardware, and we take it for granted that recognition of any size problem will be possible with a software-only solution.</Paragraph>
    <Paragraph position="6"> There are several obvious advantages to software-based recognizers: greater flexibility, lower cost, and the opportunity for large gains in speed due to clever search algorithms.</Paragraph>
    <Paragraph position="7"> 1. Since the algorithms are in a constant state of flux, any special-purpose hardware is obsolete before it is finished.</Paragraph>
    <Paragraph position="8"> 2. Software-only systems are key to making the technology broadly usable.</Paragraph>
    <Paragraph position="9"> - Many people will simply not purchase extra hardware.</Paragraph>
    <Paragraph position="10"> - Integration is much easier.</Paragraph>
    <Paragraph position="11"> - The systems are more flexible.</Paragraph>
    <Paragraph position="12"> 3. For those people who already have workstations, software is obviously less expensive.</Paragraph>
    <Paragraph position="13"> 4. Most importantly, it is possible to obtain much larger gains in speed due to clever search algorithms than from faster hardware.</Paragraph>
    <Paragraph position="14"> We have previously demonstrated real-time software-only recognition for the ATIS task with over 1,000 words. More recently, we have developed new search algorithms that perform recognition of 20,000 words with fully-connected bigram and trigram statistical grammars in strict real-time with little loss in recognition accuracy relative to research levels. First, we will very briefly review some of the search algorithms that we have developed. Then we will explain how the Forward-Backward Search can be used to achieve real-time 20,000-word continuous speech recognition.</Paragraph>
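The pruning idea behind the Forward-Backward Search, which the body of the paper explains in detail, can be sketched as: a cheap forward pass saves the best partial-path score reaching each time frame, and the detailed backward pass discards any hypothesis whose backward score, combined with the saved forward score at its start frame, falls too far below the best full-utterance estimate. A hedged Python sketch with illustrative log-domain scores (not the paper's actual implementation):

```python
def forward_backward_prune(forward_best, backward_hyps, beam=10.0):
    """Prune backward-pass hypotheses using saved forward scores.

    forward_best : dict mapping a time frame t to the best forward
                   log-score of any partial path reaching t.
    backward_hyps: list of (t, score) pairs, each the backward
                   log-score of a hypothesis spanning frames t..T.
    beam         : log-domain beam width.
    """
    # Estimate each hypothesis's full-utterance score by splicing it
    # onto the best forward partial path at its start frame.
    scored = [(forward_best[t] + s, (t, s)) for t, s in backward_hyps]
    best = max(estimate for estimate, _ in scored)
    # Keep only hypotheses within the beam of the best estimate.
    return [hyp for estimate, hyp in scored if estimate >= best - beam]

# Toy log-scores: forward_best indexed by frame.
forward_best = {0: 0.0, 5: -12.0, 10: -25.0}
backward_hyps = [(5, -30.0), (10, -15.0), (5, -55.0)]
kept = forward_backward_prune(forward_best, backward_hyps, beam=10.0)
```

Because the forward scores give a tight estimate of how each partial hypothesis can be completed, the beam can be kept very narrow without losing the correct path.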
  </Section>
</Paper>