<?xml version="1.0" standalone="yes"?>
<Paper uid="H89-1027">
  <Title>THE MIT SUMMIT SPEECH RECOGNITION SYSTEM: A PROGRESS REPORT*</Title>
  <Section position="2" start_page="0" end_page="179" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> For slightly over a year, we have focused our research effort on the development of a phonetically based spoken language understanding system called SUMMIT. Our approach rests on the belief that advanced human/machine communication systems must build on our understanding of the human communication process. Despite the recent development of some highly accurate speech recognition systems, the performance of such systems typically falls far short of human capabilities. We are placing heavy emphasis on designing systems that can make use of the knowledge about human communication gained over the past four decades, in the hope that such systems will one day perform at a level approaching that of humans.</Paragraph>
    <Paragraph position="1"> We base the design of our system on the premise that robust speech recognition depends on our ability to extract the linguistic information from the speech signal and to discard its extra-linguistic aspects. Like others before us, we have chosen phonemes and related descriptors, such as distinctive features and syllables, as the units that relate words in the lexicon to the speech signal. However, several aspects collectively distinguish our approach from those pursued by others. First, we believe that many of the acoustic cues for phonetic contrast are encoded at specific times in the speech signal; therefore, acoustic landmarks must be explicitly established in the signal in order to fully utilize these acoustic attributes. Second, unlike previous attempts to utilize speech knowledge explicitly through heuristic means, we seek to embed the available speech knowledge in a formal framework in which powerful mathematical tools can be used to optimize its use. Third, the system must have a stochastic component to cope with our present state of ignorance about the human communication process and with the variability inherent throughout it. We believe that speech-specific knowledge will enable us to build more sophisticated stochastic models than those currently being attempted, and to reduce the amount of training data necessary for high performance. Finally, the ultimate goal of our research is to understand the spoken message and subsequently to accomplish a task based on this understanding. To achieve this goal, we must fully integrate the speech recognition part of the problem with natural language processing so that higher-level linguistic constraints can be utilized.</Paragraph>
    <Paragraph position="2"> *This research was supported by DARPA under Contract N00039-85-C-0254, monitored through the Naval Electronic Systems Command.</Paragraph>
    <Paragraph position="3"> This paper describes the parts of our system that deal with acoustic segmentation, phonetic classification, and lexical access, and documents the system's current performance on the DARPA Resource Management task [1].</Paragraph>
  </Section>
</Paper>