<?xml version="1.0" standalone="yes"?>
<Paper uid="H93-1084">
  <Title>SPOKEN-LANGUAGE RESEARCH AT CARNEGIE MELLON</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
SPOKEN-LANGUAGE RESEARCH AT CARNEGIE MELLON
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
PROJECT GOALS
</SectionTitle>
    <Paragraph position="0"> The goal of speech research at Carnegie Mellon continues to be the development of spoken language systems that effectively integrate speech processing into the human-computer interface in a way that facilitates the use of computers in the performance of practical tasks. Research in spoken language is currently focused in the following areas: * Improved speech recognition technologies: Extending the useful vocabulary of SPHINX-II by use of better phonetic and linguistic models and better search techniques, providing for rapid configuration for new tasks.</Paragraph>
    <Paragraph position="1"> * Fluent human/machine interfaces: Developing tools that allow users to easily communicate with computers by voice and understanding the role of voice in the computer interface. * Understanding spontaneous spoken language: Developing flexible recognition and parsing strategies to cope with phenomena peculiar to the lexical and grammatical structure of spontaneous spoken language; investigating methods of integrating speech recognition and natural language understanding; and developing automatic training procedures for these grammars.</Paragraph>
    <Paragraph position="2"> * Acoustical and environmental robustness: Developing procedures to enable good recognition in office environments with desktop microphones and a useful level of recognition in more severe environments.</Paragraph>
    <Paragraph position="3"> * Rapid integration of speech technology: Developing an approach that will enable application developers and end users to incorporate speech recognition into their applications quickly and easily, as well as the dynamic modification of grammars and vocabularies.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
RECENT RESULTS
</SectionTitle>
    <Paragraph position="0"> * SPHINX-II has been extended with a multi-pass search algorithm that incorporates two passes of beam search and a final A-star pass that can apply long-distance language models as well as produce alternative hypotheses.</Paragraph>
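The final A-star pass can be illustrated with a minimal sketch (this is not the SPHINX-II implementation; the toy lattice, scores, and function names below are invented for illustration): A-star search over a word lattice of negative log-probability scores, using the exact cost-to-go as an admissible heuristic, so that complete hypotheses pop off the queue in order of total score and yield an N-best list of alternatives.

```python
import heapq

# Hypothetical word lattice: node -> list of (next_node, word, score).
# Scores are negative log-probabilities, so lower is better.
LATTICE = {
    "s": [("a", "show", 1.0), ("b", "shore", 2.5)],
    "a": [("c", "flights", 1.2), ("c", "lights", 2.0)],
    "b": [("c", "flights", 1.1)],
    "c": [("e", "please", 0.5)],
    "e": [],
}

def best_completion_cost(lattice, goal):
    """Exact cost-to-go from each node to the goal (an admissible
    A-star heuristic), computed by iterating a backward relaxation
    until no estimate improves (the lattice is a small DAG)."""
    h = {goal: 0.0}
    changed = True
    while changed:
        changed = False
        for node, edges in lattice.items():
            for nxt, _, cost in edges:
                if nxt in h and cost + h[nxt] < h.get(node, float("inf")):
                    h[node] = cost + h[nxt]
                    changed = True
    return h

def astar_nbest(lattice, start, goal, n=3):
    """Pop the n lowest-cost complete paths (hypotheses) in order."""
    h = best_completion_cost(lattice, goal)
    # Each frontier entry: (g + h estimate, cost so far, node, words).
    frontier = [(h[start], 0.0, start, [])]
    results = []
    while frontier and len(results) < n:
        _, g, node, words = heapq.heappop(frontier)
        if node == goal:
            results.append((g, words))
            continue
        for nxt, word, cost in lattice[node]:
            g2 = g + cost
            heapq.heappush(
                frontier,
                (g2 + h.get(nxt, float("inf")), g2, nxt, words + [word]),
            )
    return results
```

Because the heuristic here is exact, hypotheses are expanded in true-score order; a long-distance language model could adjust the edge scores `g2` as each path is extended, which is the point of deferring such models to this pass.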
    <Paragraph position="1"> * Joint training of acoustic models and language models is currently being explored in the context of the Unified Stochastic Engine (USE).</Paragraph>
    <Paragraph position="2"> * A framework for long-distance language modeling was  developed, in collaboration with IBM researchers. A pilot system using this model yielded significant reduction in perplexity over the trigram model.</Paragraph>
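The perplexity comparison above can be grounded with a small sketch (the interpolation weight and probability values are hypothetical, and linear interpolation is only one simple way a long-distance estimate might be combined with a trigram): perplexity is the exponential of the average negative log-probability the model assigns to each test word, so a model that assigns higher probabilities scores a lower perplexity.

```python
import math

def perplexity(word_probs):
    """Perplexity = exp of the average negative log-probability
    assigned to each word of the test text."""
    n = len(word_probs)
    return math.exp(-sum(math.log(p) for p in word_probs) / n)

def interpolate(p_trigram, p_cache, lam=0.8):
    """Linearly interpolate a trigram estimate with a long-distance
    (e.g. cache or trigger) estimate -- one simple combination scheme."""
    return lam * p_trigram + (1 - lam) * p_cache
```

For instance, if the long-distance component raises the probability of recently seen or triggered words, the interpolated probabilities exceed the trigram's on those words, and the measured perplexity drops accordingly.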
    <Paragraph position="3"> * Developed improved recognition, grammar coverage and context handling that reduced SLS errors for the ATIS Benchmark by 67%. We also improved the robustness and user feedback in our live ATIS demo.</Paragraph>
    <Paragraph position="4"> * Developed and evaluated two methods for more tightly integrating speech recognition and natural language understanding, producing error reductions of 20% compared to the loosely-coupled system.</Paragraph>
    <Paragraph position="5"> * Added automatic detection capability for out-of-vocabulary words and phrases. New words are now entered instantly into the phone dialer application given only their spelling. * Acoustical pre-processing algorithms for environmental robustness were extended to the CSR domain and made more efficient.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="390" type="metho">
    <SectionTitle>
PLANS FOR THE COMING YEAR
</SectionTitle>
    <Paragraph position="0"> * Use our existing language modeling framework to model long-distance dependence on words and word combinations. These new models will allow the recognizer to take advantage of improved linguistic knowledge at the earliest possible stage.</Paragraph>
    <Paragraph position="1"> * Implement confidence measures for large-vocabulary SLS systems, for new-word detection and greater accuracy.</Paragraph>
    <Paragraph position="2"> * Continue to explore issues associated with very large vocabulary (100,000-word) recognition systems.</Paragraph>
    <Paragraph position="3"> * Continue to develop methods for the automatic acquisition of the natural language information used by an SLS system.</Paragraph>
    <Paragraph position="4"> * Improve user interaction in the ATIS system, including clarification and mixed initiative dialogs, speech output and form-based displays.</Paragraph>
    <Paragraph position="5"> * Begin to develop a new SLS application, such as a telephone-based form filling application.</Paragraph>
    <Paragraph position="6"> * Provide grammar switching and instantaneous new word addition for the general SPHINX-II decoder.</Paragraph>
    <Paragraph position="7"> * Develop and test a 100,000-word pronunciation lexicon that will be available in the public domain.</Paragraph>
    <Paragraph position="8"> * Continue to improve our cepstrum-based environmental compensation procedures.</Paragraph>
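The compensation procedures referred to above are considerably more elaborate than this, but the core idea of working in the cepstral domain can be shown with a minimal sketch of cepstral mean normalization (the function name and toy frames are invented for illustration): a stationary channel or microphone difference appears as an additive constant on the cepstral coefficients, so subtracting each coefficient's mean over the utterance cancels much of that distortion.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean over the utterance from
    every frame. A fixed linear channel multiplies the spectrum,
    which becomes an additive offset in the (log-spectral) cepstral
    domain; removing the mean removes that offset."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]
```

After normalization each coefficient averages to zero over the utterance, so two recordings of the same speech through different stationary channels map to nearly the same feature sequence.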
    <Paragraph position="9"> * Demonstrate more robust microphone-array techniques.</Paragraph>
    <Paragraph position="10"> * Extend our work on environmental robustness to long-distance telephone lines.</Paragraph>
    <Paragraph position="11"> * Continue to enhance our spoken language interfaces, by introducing speech response capabilities and facilities for user customizing. Continue to investigate the appropriate use of speech in multi-modal interfaces.</Paragraph>
  </Section>
</Paper>