<?xml version="1.0" standalone="yes"?>
<Paper uid="H89-2028">
  <Title>Advisory Committee:</Title>
  <Section position="2" start_page="0" end_page="204" type="intro">
    <SectionTitle>
1 INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> This is a proposed specification for an interface between an acoustic matcher, such as a Hidden Markov Model (HMM) Continuous Speech Recognizer (CSR), and a grammatical component such as a natural language parser (NLP). Its purpose is to allow independently developed CSR and NLP systems to be interconnected by a well-specified and well-structured interface. It can also be used to provide a simulated SLS environment for developing a CSR or NLP by providing an interface to a simulator of the other component. After initial independent component development has been completed, the interface specification will guarantee that the real components can be interconnected for operation or joint development. It might also be used for NLP evaluation testing by providing a common (simulated) acoustic recognizer to use in conjunction with the NLPs under test.</Paragraph>
    <Paragraph position="1"> The fundamental purpose of this specification is to define a common interface between the two components so that independent sites can join their modules together. It is hoped that sites which can produce both components internally will consider this specification on its own merits and for the potential value of being able to interface to modules developed at other sites.</Paragraph>
    <Paragraph position="2">  This specification provides for two modes of operation: integrated and decoupled. In the integrated mode, both the CSR and the NLP contribute to the search control. If (or when) the CSR and NLP technologies are sufficiently mature, this will probably be the preferred mode. The decoupled mode allows the CSR component to output a list of possible sentences with acoustic match likelihoods.</Paragraph>
    <Paragraph position="3"> The NLP can then process this list as it sees fit. Since information flow in the decoupled mode is strictly feed-forward, no NL information is available to help constrain the search in the CSR component.</Paragraph>
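As an illustration of the decoupled mode, the following sketch (in Python; the function names, the toy NLP scorer, and all scores are invented stand-ins, not part of the specification) rescores a CSR N-best list by adding a grammatical log-probability to each acoustic log-likelihood and re-ranking:

```python
# Hypothetical decoupled-mode rescoring: the CSR has emitted an N-best
# list of (sentence, acoustic log-likelihood) pairs; the NLP adds its
# own grammatical log-probability, and the list is re-sorted best-first.
def rescore_nbest(nbest, nlp_logprob, acoustic_weight=1.0, grammar_weight=1.0):
    """Combine acoustic and grammatical scores and sort best-first."""
    rescored = [
        (acoustic_weight * ac + grammar_weight * nlp_logprob(sent), sent)
        for sent, ac in nbest
    ]
    rescored.sort(reverse=True)
    return [sent for score, sent in rescored]

# Toy NLP scorer: a small per-word penalty (purely illustrative).
def toy_nlp(sent):
    return -0.5 * len(sent.split())

nbest = [("show me flights", -12.0), ("show me the flights", -11.0)]
print(rescore_nbest(nbest, toy_nlp))
# -> ['show me the flights', 'show me flights']
```

Because the flow is strictly feed-forward, the NLP here only reorders finished hypotheses; it cannot prune the CSR's search.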
    <Paragraph position="4"> The specification contains overall control architecture and interface definitions. The resulting system consists of a combined stack-controller/CSR (SC-CSR) and NLP interconnected by UNIX pipes. Simulators for each component will be provided to allow sites which are developing only one of the components to work within the context of a full SLS system and to allow sites which are developing both components to perform independent development of both modules if they so wish.</Paragraph>
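The pipe interconnection can be sketched as follows. The child process here is a trivial stand-in for an NLP component, and the line-oriented request/reply protocol is an assumption made for illustration, not the message format defined by the specification:

```python
import subprocess
import sys

# Stand-in "NLP" child: reads one left sentence fragment per line and
# replies with a toy grammatical log-probability per line.
child_code = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print(-0.5 * len(line.split()))\n"
    "    sys.stdout.flush()\n"
)

# Connect the two processes by UNIX pipes, as the specification requires.
nlp = subprocess.Popen([sys.executable, "-c", child_code],
                       stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                       text=True, bufsize=1)

nlp.stdin.write("show me the flights\n")
nlp.stdin.flush()
score = float(nlp.stdout.readline())
nlp.stdin.close()
nlp.wait()
print(score)  # -> -2.0
```

The same pattern would let a site substitute a simulator for either component without changing the other side of the pipe.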
    <Paragraph position="5"> The basic algorithmic constraints required by this interface are fairly mild: the interprocess interface uses UNIX pipes, and both the CSR and NLP components operate left-to-right on their respective input data. (However, the decoupled mode allows the NLP to use non-left-to-right strategies such as island-driven parsing. The decoupled mode may increase the CPU requirements of the overall system.) The original idea and the definition of this interface are the work of D. Paul. An Advisory Committee of both NL and CSR people has reviewed the proposal from both viewpoints. The committee members are:  The comments of these committee members have been very useful to the author. However, membership on the committee does not imply agreement with all provisions of this specification. A draft has been distributed to all sites in the DARPA SLS program for comment before its presentation at the October 1989 meeting.</Paragraph>
    <Section position="1" start_page="203" end_page="204" type="sub_section">
      <SectionTitle>
1.1 The Basic System Concept
</SectionTitle>
      <Paragraph position="0"> The basic concept requires three parts:  1. A stack controller (similar to the IBM stack decoder). The &amp;quot;stack&amp;quot; is a sorted list of partial theories.</Paragraph>
      <Paragraph position="1"> 2. A CSR capable of evaluating the probability of the acoustic data for a given left sentence fragment.</Paragraph>
      <Paragraph position="2"> 3. An NLP capable of evaluating the probability of a given left sentence fragment. The basic system operation is: 1. The stack controller starts with a null theory.</Paragraph>
      <Paragraph position="3"> 2. Take the most probable partial theory (left sentence fragment) off the stack.</Paragraph>
      <Paragraph position="5"> 3. If this theory consumes all acoustic data and is a full sentence, this is the recognized sentence.</Paragraph>
      <Paragraph position="6"> Terminate. (If more than one hypothesized sentence is desired, continue until a sufficient number of sentences are output. This is Top-N mode; see Sec. 2.5.) 4. For each possible succeeding word, add the word to the theory, ask the CSR for the acoustic probability, ask the NLP for the grammatical probability, and insert the new theory into the stack at a position determined by a combination of the probabilities. (&amp;quot;Fast matches&amp;quot; can be used to limit the number of succeeding words in order to reduce the search space.) Note: In general, the CSR probabilities are distributions over time.</Paragraph>
      <Paragraph position="7">  The above is an implementation of a &amp;quot;uniform&amp;quot; \[2\] search, which will find the correct (most probable) answer far too slowly to be practical. A more efficient version is outlined below.</Paragraph>
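The stack-controlled uniform search described above can be sketched as follows. The two-word vocabulary, the acoustic and grammatical scorers, and the termination test are toy stand-ins for the real CSR and NLP components, invented purely for illustration:

```python
import heapq

VOCAB = ["show", "flights"]
TARGET = ("show", "flights")  # the sentence the toy CSR "heard"

def csr_logprob(theory):
    # Toy acoustic match: reward prefixes of the target sentence.
    return 0.0 if theory == TARGET[:len(theory)] else -100.0

def nlp_logprob(theory):
    # Toy grammar: a small per-word penalty.
    return -0.1 * len(theory)

def is_full_sentence(theory):
    # Toy stand-in for "consumes all acoustic data and is a full sentence".
    return len(theory) == len(TARGET)

def stack_decode():
    # The "stack" is a sorted list of partial theories; heapq is a
    # min-heap, so combined log-probabilities are negated.
    stack = [(0.0, ())]                  # 1. start with the null theory
    while stack:
        neg_score, theory = heapq.heappop(stack)   # 2. pop the best theory
        if is_full_sentence(theory):
            return theory                # 3. done: the recognized sentence
        for word in VOCAB:               # 4. extend with each successor word
            new = theory + (word,)
            score = csr_logprob(new) + nlp_logprob(new)
            heapq.heappush(stack, (-score, new))
    return None

print(stack_decode())  # -> ('show', 'flights')
```

In a real system the CSR probabilities are distributions over time rather than scalars, and a fast match would restrict the successor-word loop; neither refinement is shown here.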
    </Section>
  </Section>
</Paper>