<?xml version="1.0" standalone="yes"?> <Paper uid="H93-1083"> <Title>Segment-Based Acoustic Models for Continuous Speech Recognition</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> PROJECT GOALS </SectionTitle> <Paragraph position="0"> The goal of this project is to develop improved acoustic models for speaker-independent recognition of continuous speech, together with efficient search algorithms appropriate for use with these models. The current work on acoustic modeling is focused on stochastic, segment-based models that capture the time correlation of a sequence of observations (feature vectors) that correspond to a phoneme, hierarchical stochastic models that capture higher-level intra-utterance correlation, and multi-pass search algorithms for implementing these more complex models. This research has been jointly sponsored by DARPA and NSF under NSF grant IRI-8902124 and by DARPA and ONR under ONR grant N00014-92-J-1778.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> RECENT RESULTS </SectionTitle> <Paragraph position="0"> * Implemented different auditory-based signal processing algorithms and evaluated their use in recognition on the TIMIT corpus, finding no performance gains relative to cepstral parameters, probably due to the non-Gaussian nature of auditory features.</Paragraph> <Paragraph position="1"> * Improved the score combination technique for N-Best rescoring by normalizing scores by sentence length, obtaining more robust weights that alleviate problems associated with test set mismatch.</Paragraph> <Paragraph position="2"> * Further investigated agglomerative and divisive clustering methods for estimating robust context-dependent models, and introduced a new clustering criterion based on a likelihood ratio test; obtained a slight improvement in performance with an associated factor-of-two reduction in storage costs.</Paragraph> <Paragraph 
position="3"> * Extended the classification and segmentation scoring formalism to handle context-dependent models without requiring the assumption of independence of features between phone segments (using maximum entropy methods); evaluated different segmentation scores, with results suggesting more work is needed in this area.</Paragraph> <Paragraph position="4"> * Evaluated a new distribution mapping, which led to an 8% reduction in error on the development test set but no improvement on other test sets.</Paragraph> <Paragraph position="5"> * Investigated the use of different phone sets and probabilistic multiple-pronunciation networks; no improvements were obtained on the RM corpus, though there may be gains in another domain.</Paragraph> <Paragraph position="6"> * Extended the two-level segment/microsegment formalism to application in word recognition using context-dependent models; evaluated the trade-offs associated with modeling trajectories vs. (non-tied) microsegment mixtures, finding that mixtures are more useful for context-independent modeling but representation of a trajectory is more useful for context-dependent modeling.</Paragraph> <Paragraph position="7"> * Investigated the use of tied mixtures at the frame level (as opposed to the microsegment level), evaluating different covariance assumptions and training conditions; developed new, faster mixture training algorithms; and achieved a 20% reduction in word error over our previous best results on the Resource Management task. 
Current SSM performance rates are 3.6% word error on the Oct89 test set and 7.3% word error on the Sep92 test set.</Paragraph> </Section> <Section position="3" start_page="0" end_page="389" type="metho"> <SectionTitle> PLANS FOR THE COMING YEAR </SectionTitle> <Paragraph position="0"> * Continue work in the classification and segmentation scoring paradigm; demonstrate improvements associated with novel models and/or features.</Paragraph> <Paragraph position="1"> * Port the BU recognition system to the Wall Street Journal (WSJ) task, 5000-word vocabulary.</Paragraph> <Paragraph position="2"> * Develop a stochastic formalism for modeling intra-utterance dependencies assuming a hierarchical structure.</Paragraph> </Section> </Paper>