<?xml version="1.0" standalone="yes"?>
<Paper uid="H92-1100">
  <Title>Segment-Based Acoustic Models with Multi-level Search Algorithms for Continuous Speech Recognition</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
PROJECT GOALS
</SectionTitle>
    <Paragraph position="0"> The goal of this project is to develop improved acoustic models for speaker-independent recognition of continuous speech, together with efficient search algorithms appropriate for use with these models. The current work on acoustic modelling is focussed on stochastic, segment-based models that capture the time correlation of a sequence of observations (feature vectors) that correspond to a phoneme. Since the use of segment models is computationally complex, we are investigating multi-level, iterative algorithms to achieve a more efficient search. Furthermore, these algorithms will provide a formalism for incorporating higher-order information. This research is jointly sponsored by DARPA and NSF.</Paragraph>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
RECENT RESULTS
</SectionTitle>
    <Paragraph position="0"> * Developed methods for robust context modeling for the stochastic segment model (SSM) using tied covariance distributions, and investigated different tying regions using clustering techniques. On the RM Oct 89 test set, these improvements reduced the SSM word error rate by nearly a factor of two (from 9.1% to 4.8%), and the current combined BBN-HMM/BU-SSM system achieves 3.3% word error.</Paragraph>
    <Paragraph position="1"> * Determined that, within segments, linear models of the cepstra have predictive power similar to that of non-linear models, and explored different models of the statistical dependence of cepstral coefficients in the context of a dynamical system (DS) model.</Paragraph>
    <Paragraph position="2"> * Evaluated the dynamical system model in phoneme recognition (as opposed to classification in previous work) using the split-and-merge search algorithm.</Paragraph>
    <Paragraph position="3"> The DS model outperforms the independent-frame model on the TIMIT corpus.</Paragraph>
    <Paragraph position="4"> * Reformulated the recognition problem as a classification and segmentation scoring problem, which allows more general types of classifiers and non-traditional feature analysis. Demonstrated that, for equivalent feature sets and context-independent models, the new formulation and the conventional formulation give similar results.</Paragraph>
    <Paragraph position="5"> * Investigated duration models conditioned on speaking rate and pre-pausal location, and improved performance by giving duration more weight, namely by including the duration probabilities as a separate term in the N-best score combination.</Paragraph>
    <Paragraph position="6"> * Analyzed the behavior of recognition error over the weight space for HMM and SSM scores in the N-best rescoring paradigm. Addressed the problem of local optima with a grid-based search, determined that the relative weights for the HMM and SSM scores are similar, and discovered a significant mismatch problem between training and test data.</Paragraph>
    <Paragraph position="7"> * Extended Bayesian techniques for speaker adaptation and evaluated them on the RM word recognition task, achieving a 16% reduction in error with simple mean adaptation techniques using 3 minutes of speech. Covariance adaptation techniques appear to require more speakers to train the priors.</Paragraph>
    <Paragraph position="8"> * Developed a multi-level stochastic model of speech that can take advantage of multi-rate signal analysis; an evaluation of the two-level case with cepstral features shows improved performance over a single-level model.</Paragraph>
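    <Paragraph position="9"> The N-best score combination with a separate duration term and the grid-based weight search can be sketched as follows. This is a minimal illustration under stated assumptions, not the BU implementation: the hypothetical Hypothesis container, the log-linear form of the combination, fixing the HMM weight at 1, and the toy scores are all assumptions made for the example. For every point on the weight grid, each N-best list is rescored, its top hypothesis is kept, and the word errors are summed; the weights with the fewest total errors are returned.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Hypothesis:
    # Hypothetical container for one N-best entry (not from the source).
    hmm: float   # HMM log score
    ssm: float   # SSM log score
    dur: float   # duration log probability, kept as a separate term
    errors: int  # word errors against the reference transcription


def combined_score(h, w_ssm, w_dur):
    # Assumed log-linear combination; the HMM weight is fixed at 1 because
    # only the ratios of the weights change the ranking of hypotheses.
    return h.hmm + w_ssm * h.ssm + w_dur * h.dur


def total_errors(nbest_lists, w_ssm, w_dur):
    # Rescore each N-best list, keep its top hypothesis, and sum the errors.
    return sum(
        max(nbest, key=lambda h: combined_score(h, w_ssm, w_dur)).errors
        for nbest in nbest_lists
    )


def grid_search(nbest_lists, ssm_grid, dur_grid):
    # Evaluate every grid point instead of following a local search, since
    # the error surface is piecewise constant and has many local optima.
    return min(
        itertools.product(ssm_grid, dur_grid),
        key=lambda w: total_errors(nbest_lists, *w),
    )


# Toy example: two utterances with two hypotheses each (invented values).
nbest_lists = [
    [Hypothesis(hmm=-100.0, ssm=-120.0, dur=-5.0, errors=2),
     Hypothesis(hmm=-101.0, ssm=-110.0, dur=-6.0, errors=0)],
    [Hypothesis(hmm=-200.0, ssm=-210.0, dur=-4.0, errors=0),
     Hypothesis(hmm=-199.0, ssm=-215.0, dur=-9.0, errors=1)],
]
grid = [0.0, 0.5, 1.0, 1.5, 2.0]
best = grid_search(nbest_lists, grid, grid)
print(best)  # (0.5, 0.0)
```

The exhaustive search costs one full rescoring pass per grid point, so its cost grows with the product of the grid sizes; that is the price paid for not getting trapped in a local optimum.</Paragraph>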
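    <Paragraph position="10"> Simple Bayesian mean adaptation can be illustrated with the common MAP estimate of a Gaussian mean under a conjugate prior, mu_hat = (tau * mu_0 + sum_t x_t) / (tau + N), where mu_0 is the speaker-independent mean, x_1, ..., x_N are the adaptation frames assigned to the Gaussian, and tau weights the prior. The source does not give its exact formulation; this formula, the hypothetical function map_mean, and the example values are assumptions. With little adaptation data the estimate stays near the prior, and with more data it moves toward the new speaker's sample mean.

```python
def map_mean(prior_mean, tau, frames):
    """MAP estimate of a Gaussian mean (hypothetical helper, assumed formula).

    prior_mean: speaker-independent mean vector (list of floats)
    tau: prior weight, assumed positive; acts like a count of prior frames
    frames: adaptation feature vectors assigned to this Gaussian
    """
    n = len(frames)
    dims = len(prior_mean)
    # Per-dimension sum of the adaptation frames (0 when there are none).
    sums = [sum(frame[d] for frame in frames) for d in range(dims)]
    # Interpolate between the prior mean and the data according to tau and n.
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dims)]


prior = [0.0, 1.0]
adapted = map_mean(prior, tau=2.0, frames=[[1.0, 1.0], [3.0, 4.0]])
print(adapted)  # [1.0, 1.75]
unchanged = map_mean(prior, tau=2.0, frames=[])
print(unchanged)  # [0.0, 1.0]
```
</Paragraph>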
  </Section>
  <Section position="3" start_page="0" end_page="467" type="metho">
    <SectionTitle>
PLANS FOR THE COMING YEAR
</SectionTitle>
    <Paragraph position="0"> The plans for the coming year reflect the fact that this grant ends in summer 1992.</Paragraph>
    <Paragraph position="1"> * Continue work in the classification and segmentation scoring paradigm: demonstrate improvements associated with novel models and/or features, and extend the probabilistic framework to allow context-dependent models.</Paragraph>
    <Paragraph position="2"> * Extend context modeling through further exploration of clustering, and extend it to the recently developed DS and multi-level model variations.</Paragraph>
    <Paragraph position="3"> * Implement different auditory-based signal processing algorithms, and evaluate their usefulness for recognition through a series of experiments on the TIMIT corpus.</Paragraph>
  </Section>
</Paper>