<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-4021">
<Title>Feature-based Pronunciation Modeling for Speech Recognition</Title>
<Section position="6" start_page="0" end_page="0" type="concl">
<SectionTitle> 5 Discussion </SectionTitle>
<Paragraph position="0"> We have motivated our pronunciation model as part of an overall strategy of feature-based speech recognition.</Paragraph>
<Paragraph position="1"> One way in which this model could fit into a complete recognizer is, as mentioned above, by adding a variable A representing the acoustic observations, with the S_j as its parents. The modeling of p(A | S_1, ..., S_M) (where M is the number of features) is a significant problem in its own right. Alternatively, as this study suggests, there may be some benefit to this type of model even if the acoustic model is phone-based. One possible setup would be to use a phonetic recognizer to produce a phone lattice, then convert the phones into features and proceed as in our Switchboard experiments.</Paragraph>
<Paragraph position="2"> Thus far we have not trained the variable distributions. With the exception of the sync variables, these can be trained from feature transcriptions (i.e., S_j observations) using the Expectation-Maximization (EM) algorithm (Dempster et al., 1977). In the absence of actual feature transcriptions, they can be approximated by converting detailed phonetic transcriptions, as we have done in our decoding experiments above. The sync distributions cannot be trained via EM, since they are always observed with value 1. They can either be treated as experimental parameters or trained discriminatively. We are currently working on a new formulation in which the synchronization constraints can be trained via EM.</Paragraph>
<Paragraph position="3"> In addition, we are currently investigating extensions to the model, including context-dependent feature substitutions.
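The training point above can be sketched in a few lines; this is a hypothetical illustration, not the paper's implementation, and the feature names, values, and toy transcription data are invented. With fully observed feature transcriptions, the EM update for a discrete substitution distribution reduces to normalized counts, whereas a sync variable that is always observed with value 1 yields only the degenerate estimate 1 and so carries no training signal:

```python
from collections import Counter, defaultdict

def estimate_cpt(pairs):
    """ML estimate of p(surface | underlying) from aligned
    (underlying, surface) feature observations.  When the features are
    fully observed, the EM update reduces to normalized counts."""
    counts = defaultdict(Counter)
    for underlying, surface in pairs:
        counts[underlying][surface] += 1
    return {u: {s: n / sum(c.values()) for s, n in c.items()}
            for u, c in counts.items()}

# Invented lip-opening feature transcriptions: the underlying
# (dictionary) value paired with the realized surface value.
pairs = [("closed", "closed"), ("closed", "narrow"),
         ("closed", "closed"), ("wide", "wide")]
cpt = estimate_cpt(pairs)
# p(narrow | closed) = 1/3 reflects an observed reduction.

# A sync variable, by contrast, is always observed with value 1, so the
# same count-based update collapses to p(sync=1) = 1 for every parent
# configuration: there is nothing for EM to learn.
sync_obs = [1, 1, 1, 1]
p_sync = sync_obs.count(1) / len(sync_obs)  # 1.0
```

This is why the sync distributions must instead be set by hand or trained discriminatively, as the text notes.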
We also plan to extend this study to a larger data set and to multi-word utterances.</Paragraph>
</Section>
</Paper>