<?xml version="1.0" standalone="yes"?> <Paper uid="H92-1038"> <Title>Recognition Using Classification and Segmentation Scoring*</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1. INTRODUCTION </SectionTitle> <Paragraph position="0"> Although hidden-Markov-model (HMM) based speech recognition systems have achieved very high performance, it may be possible to improve on their performance by addressing the known deficits of the HMM.</Paragraph> <Paragraph position="1"> Perhaps the most obvious weaknesses of the model are the reliance on frame-based feature extraction and the assumption of conditional independence of these features given an underlying state sequence. The assumption of independence disagrees with what is known of the actual speech signal, and when this framework is accepted, it is difficult to incorporate potentially useful measurements made across an entire segment of speech. Much of the linguistic knowledge of acoustic-phonetic properties of speech is most naturally expressed in such segmental measurements, and the inability to use such measurements may represent a significant loss in potential performance. In an attempt to address this issue, a number of models have been proposed that use segmental features as the basis of recognition. Although these models allow the use of segmental measurements, they have not yet achieved significant performance gains over HMMs because of difficulties associated with modeling a variable-length observation with segmental features. *This research was jointly funded by NSF and DARPA under NSF grant number IRI-8902124.</Paragraph> <Paragraph position="2"> Many of these models represent the segmental characteristics as a fixed-dimensional vector of features derived from the variable-length observation sequence.
Although such features may work quite well for classification of individual units, such as phonemes or syllables, it is less obvious how to use fixed-length features to score a sequence of these units when the number and location of the units are not known. For example, simply taking the product of independent phoneme classification probabilities using fixed-length measurements is inadequate. If this is done, the total number of observations used for an utterance is F x N, where F is the fixed number of features per segment and N is the number of phonemes in the hypothesized sentence. As a result, the scores for hypotheses with different numbers of phonemes will effectively be computed over probability spaces of different dimensions and, as such, will not be comparable. In particular, long segments will have lower costs per frame than short segments. In this paper, we address the segment modeling problem using an approach that decomposes the recognition process into a segment classification problem and a segmentation scoring problem. The explicit use of a classification component allows the direct use of segmental measures as well as a variety of classification techniques that are not readily accommodated with other formulations. The segmentation score component effectively normalizes the scores of sequences of different lengths, making them comparable.</Paragraph> </Section> </Paper>