<?xml version="1.0" standalone="yes"?>
<Paper uid="H92-1035">
  <Title>Improving State-of-the-Art Continuous Speech Recognition Systems Using the N-Best Paradigm with Neural Networks</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> In February 1991, we introduced at the DARPA Speech and Natural Language Workshop the concept of a Segmental Neural Net (SNN) for phonetic modeling in continuous speech recognition \[1\]. The SNN was introduced to overcome some of the well-known limitations of hidden Markov models (HMM) which now represent the state of the art in continuous speech recognition (CSR). Two such limitations are (i) the conditional-independence assumption, which prevents a HMM from taking full advantage of the correlation that exists among the frames of a phonetic segment, and (ii) the awkwardness with which segmental features (such as duration) can be incorporated into HMM systems. We developed the concept of SNN specifically to overcome the two HMM limitations just mentioned for phonetic modeling in speech. However, neural nets are known to require a large amount of computation, especially for training. Also, there is no known efficient search technique for finding the best scoring segmentation with neural nets in continuous speech.</Paragraph>
    <Paragraph position="1"> Therefore, we have developed a hybrid SNN/HMM system that is designed to take full advantage of the good properties of both methods: the phonetic modeling properties of SNNs and the good computational properties of HMMs.</Paragraph>
    <Paragraph position="2"> The two methods are integrated through the use of the N-best paradigm, which was developed in conjunction with the BYBLOS system at BBN \[7,6\].</Paragraph>
    <Paragraph position="3"> A year ago, we presented very preliminary results using our hybrid system on the speaker-dependent portion of the</Paragraph>
  </Section>
class="xml-element"></Paper>