<?xml version="1.0" standalone="yes"?> <Paper uid="H94-1092"> <Title>Segment-Based Acoustic Models for Continuous Speech Recognition</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> PROJECT GOALS </SectionTitle> <Paragraph position="0"> The goal of this project is to develop improved statistical models for speaker-independent recognition of continuous speech, together with efficient search algorithms appropriate for use with these models. The current work on acoustic modeling is focused on: stochastic, segment-based models that capture the time correlation of a sequence of observations (feature vectors) that correspond to a phoneme; hierarchical stochastic models that capture higher-level intra-utterance correlation; and multi-pass search algorithms for implementing these more complex models. In addition, we have extended the effort on models of high-order statistical dependence to language modeling. This research has been jointly sponsored by ARPA and NSF under NSF grant IRI-8902124 and by ARPA and ONR under ONR grant N00014-92-J-1778.</Paragraph> </Section> <Section position="2" start_page="0" end_page="449" type="metho"> <SectionTitle> RECENT RESULTS </SectionTitle> <Paragraph position="0"> Recent results on this project are summarized below, with the names of the students primarily responsible for the work indicated in parentheses.</Paragraph> <Paragraph position="1"> * Ported the BU recognition system to the Wall Street Journal (WSJ) task and Switchboard task, obtaining results similar to those of HMM systems on those tasks. Also implemented several software changes to handle large vocabularies, to allow for larger N-best lists by using more efficient score caching, and to accommodate the full amount of training data available. (F. Richardson, S.</Paragraph> <Paragraph position="2"> Tibrewal, A.
Kannan) * Continued investigation of mixture distribution modeling at both the segment and frame levels, shifting our focus primarily to &quot;untied&quot; segmental mixture systems. We have established baseline results and investigated various parameter allocation choices for these models in experiments on the Resource Management task.</Paragraph> <Paragraph position="3"> For context-independent models, combining segmental and frame-level mixtures is found to improve performance over uni-modal and tied-mixture systems.</Paragraph> <Paragraph position="4"> Further work on initialization is needed for estimating context-dependent models. (O. Kimball) * Implemented a new duration model that uses speaking-rate adapted parameters.</Paragraph> <Paragraph position="5"> * Developed a sentence-level mixture n-gram language model to handle topic-related language dynamics, and evaluated recognition performance with this model on the 5k WSJ task in the N-best rescoring framework, obtaining a slight improvement over standard trigrams. (R. Iyer) * Developed the theoretical framework for an automatic mapping of distributions to arbitrary subsets of a variable-length segment feature matrix, as an alternative to the linear-time frame mapping currently used in the SSM.</Paragraph> <Paragraph position="6"> * Developed the theoretical framework for a hierarchical model of intra-utterance observation correlation.</Paragraph> <Paragraph position="7"> * Developed a new algorithm for fast search of a word lattice for multi-pass recognition scoring. (F.
Richardson)</Paragraph> </Section> <Section position="3" start_page="449" end_page="449" type="metho"> <SectionTitle> PLANS FOR THE COMING YEAR </SectionTitle> <Paragraph position="0"> Implement and evaluate the hierarchical stochastic model of intra-utterance dependencies, first in TIMIT classification and later in the WSJ system if initial experiments are successful.</Paragraph> <Paragraph position="1"> Investigate unsupervised adaptation in the WSJ task domain. Investigate algorithms to improve recognition accuracy for telephone speech.</Paragraph> <Paragraph position="2"> Assess accuracy/speed trade-offs for different lattice search algorithms for the WSJ task.</Paragraph> <Paragraph position="3"> Extend work in mixture language modeling to capture more language dynamics and/or task domain change through adaptation.</Paragraph> </Section> </Paper>