File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/91/h91-1049_intro.xml
Size: 1,848 bytes
Last Modified: 2025-10-06 14:05:02
<?xml version="1.0" standalone="yes"?> <Paper uid="H91-1049"> <Title>A DYNAMICAL SYSTEM APPROACH TO CONTINUOUS SPEECH RECOGNITION</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> INTRODUCTION </SectionTitle> <Paragraph position="0"> A new direction in statistical speech recognition is to move from frame-based models, such as Hidden Markov Models (HMMs), to segment-based models that provide a better framework for modeling the dynamics of the speech production mechanism. The Stochastic Segment Model (SSM) is a joint model for a sequence of observations, allowing explicit modeling of time correlation. In the original SSM, a phoneme was modeled as a sequence of feature vectors that obeyed a multivariate Gaussian distribution. The variable length of an observed phoneme was handled either by modeling a fixed-length transformation of the observations [6] or by assuming the observation was a partially observed sample of a trajectory represented by a fixed-length model [7]. In the first case, the maximum likelihood estimates of the parameters can be obtained directly; in the second case, the Estimate-Maximize (EM) algorithm [2] may be required.</Paragraph> <Paragraph position="1"> Unfortunately, the joint Gaussian model suffers from estimation problems, given the number of acoustic features and the analysis-frame rate that modern continuous speech recognizers use. Therefore, a more constrained assumption about the correlation structure must be made. In previous work [3], we chose to constrain the model to a time-inhomogeneous Gauss-Markov process. Under the Gauss-Markov assumption, we were able to model the time correlation of the first few cepstral coefficients well, but the performance decreased when a larger number of features were</Paragraph> </Section> </Paper>