<?xml version="1.0" standalone="yes"?>
<Paper uid="H92-1036">
  <Title>MAP Estimation of Continuous Density HMM : Theory and Applications</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> Estimation of hidden Marknv model (HMM) is usually obtained by the method of maximum likelihood (ML) \[1, 10, 6\] assuming that the size of the training data is large enough to provide robust estimates. This paper investigates maximum a posteriori (MAP) estimate of continuous density hidden Markov models (CDHMM).</Paragraph>
    <Paragraph position="1"> The MAP estimate can be seen as a Bayes estimate of the vector parameter when the loss function is not specified \[2\]. This estimation technique provides a way of incorporatimg prior information in the training process, which is particularly useful to deal with problems posed by sparse training data for which the ML approach gives inaccurate estimates. This approach can be applied to two classes of estimation problems, namely, parameter smoothing and model adaptation, both related to the problem of sparse training data.</Paragraph>
    <Paragraph position="2"> In the following the sample x = (zl, ...,z,~) is a given set of n observations, where zl, ..., z n are either independent and identically distributed (i.i.d.), or are drawn from a probabilistic function of a Markov chain.</Paragraph>
    <Paragraph position="3"> The difference between MAP and ML estimation lies in the assumption of an appropriate prior disliibution of the parameters to be estimated. If 0, assumed to be a random vector taking values in the space O, is the parameter vector to be estimated from the sample x with probability density function (p.d.f.) f(.lO), and if g is the prior p.d.f, of 0, then the MAP estimate, 0~p, is defined as the mode of the posterior p.d.f, of 0, i.e.</Paragraph>
    <Paragraph position="4"> Oma, = argmoax f(xlO)g(O) (I) If 9 is assumed to be fixed but unknown, then there is no knowledge about 8, which is equivalent to assuming a non-informative improper prior, i,e. g(8) ----constant. Equation (1) then reduces to the familiar ML formulation.</Paragraph>
    <Paragraph position="5"> Given the MAP formulation two problems remain: the choice of the prior distribution family and the evaluation of the maximum a ~This work was done while Jean-Luc Gauvain was on leave from the Speech Communication Group at LIMSI/CNRS, Orsay, France.</Paragraph>
    <Paragraph position="6"> posteriori. These two problems are closely related, since the appropilate choice of the prior distribution can greatly simplify the MAP estimation. Like for ML estimation, MAP estimation is relatively easy if the famay ofp.d.f.'s {f(-10), 0 ~ O} possesses a sufficient statistic of fixed dimension t(x). In this case, the natural solution is to choose the prior density in a conjugate family, {k(.ko), ~o E ~}, which includes the kernel density of f(. lO), i.e. Vx t(x) e ~b \[4, 2\]. The MAP estimation is then reduced to the evaluation of the mode of k(Ol~o' ) = k(Oko)k(Olt(x)), a problem almost identical to the ML estimation problem. However, among the families of interest, only exponential families have a sufficient statistic of fixed dimension \[7\]. When there is no sufficient statistic of fixed dimension, MAP estimation, like ML estimation, is a much more difficult problem because the posterior density is not expressible in terms of a fixed number of parameters and cannot be maximized easily. For both finite mixture density and hidden Markov model, the lack of a sufficient statistic of fixed dimension is due to the underlying hidden process, i.e. a multinomial model for the mixture and a Markov chain for an HMM. In these cases ML estimates are usually obtained by using the expectation-maximization (EM) algorithm \[3, I, 13\].</Paragraph>
    <Paragraph position="7"> This algorithm exploits the fact that the complete-data likelihood can be simpler to maximize than the likelihood of the incomplete data, as in the case where the complete-data model has sufficient statistics of fixed dimension. As noted by Dempster et al. \[3\], the EM algorithm can also be applied to MAP estimation. In the next two sections the formulations of this algorithm for MAP estimation of Gaussian mixture and CDHMM with Gaussian mixture observation densities are derived.</Paragraph>
  </Section>
class="xml-element"></Paper>