File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/92/a92-1005_intro.xml
Size: 4,446 bytes
Last Modified: 2025-10-06 14:05:06
<?xml version="1.0" standalone="yes"?> <Paper uid="A92-1005"> <Title>Real-time linguistic analysis for continuous speech understanding*</Title> <Section position="3" start_page="0" end_page="33" type="intro"> <SectionTitle> 2 Recognition and understanding activities </SectionTitle> <Paragraph position="0"> Speech understanding requires the use of different pieces of knowledge. Consequently, it is not obvious a priori what type of architecture will give the best results.</Paragraph> <Paragraph position="1"> Homogeneous, knowledge-based architectures date back to the late 1970s \[Erman et al. 1980\] and spurred interesting research work in the subsequent years. However, unified approaches have a weakness: they have difficulty coping with problems of a different nature through specific, focused techniques. A division may be traced between lower-level processing of speech, mostly based on acoustical knowledge, and upper-level processing, mostly based on natural language knowledge.</Paragraph> <Paragraph position="2"> Therefore, a two-level architecture has been developed based on this idea \[Fissore et al. 1988\]. The first stage, called the recognition stage (Fig. 1a), hypothesizes a set of words throughout the utterance and feeds the second stage, or understanding stage, which completes the recognition activity by finding the most plausible word sequence and by understanding its meaning. In this way each level can focus on its own basic problems and develop specific techniques, while still maintaining the advantage of integration.</Paragraph> <Paragraph position="3"> Most of the approaches based on this idea (e.g.</Paragraph> <Paragraph position="4"> \[Hayes et al. 1986\]) are characterized by the use of knowledge engineering techniques at both levels, while our recognition stage is based on a probabilistic technique, hidden Markov models (HMMs). 
The most recent research indicates that, as far as word recognition is concerned, HMMs give the best results \[Lee 1990, Fissore et al. 1989\].</Paragraph> <Paragraph position="5"> The set of word hypotheses produced by the recognition stage is called a lattice (Fig. 1b). Every word hypothesis is characterized by the starting and ending points of the utterance portion in which it has been spotted, and by its score, which expresses its acoustic likelihood, i.e. a measure of the probability that the word was uttered in that position. Many more hypotheses than the actually uttered words are present in the lattice (there are about 30 times as many word hypotheses as there are words), and they overlap one another.</Paragraph> <Paragraph position="6"> The aim of the understanding stage is then twofold: on the one hand it has to complete the recognition task by extracting the correct word sequence from the lattice; on the other it has to understand the meaning of that sequence.</Paragraph> <Paragraph position="7"> In practice these two activities are performed simultaneously. The correct word sequence extracted by the understanding stage may be fed back to the recognizer (Fig. 1a) for a post-processing phase called feedback verification, described below, aimed at increasing the understanding accuracy.</Paragraph> <Paragraph position="8"> The problem of analyzing lattices is considered from the natural language perspective: the goal is to develop techniques to process typed input and to extend them in order to process a &quot;corrupted&quot; form of input such as a lattice. The result of the understanding stage, called the solution, is a sequence of word hypotheses spanning the whole utterance such that 1) the sentence is syntactically correct and meaningful according to the linguistic knowledge of the understanding stage, and 2) it has the best acoustical score among all of the possible sequences that satisfy point 1). 
The great problem is that the search for a solution cannot be made exhaustively: since the lattice contains many incorrect word hypotheses, there would be far too many admissible word combinations to examine. In addition, there is the risk of incorrect understanding due to the possible selection of even a single incorrect word hypothesis. Coping with these problems requires careful design of linguistic knowledge representation methods and analysis control strategies, in order to improve both efficiency and the reliability of correct</Paragraph> <Paragraph position="9"> understanding.</Paragraph> </Section> </Paper>
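The definition of a solution given above (a sequence of word hypotheses spanning the whole utterance, linguistically acceptable, with the best acoustic score) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the paper: the `WordHypothesis` fields, integer frame indices, scores treated as log-likelihoods that add up along a sequence, the rule that consecutive hypotheses must touch exactly, and the `is_admissible` callback standing in for the linguistic knowledge of the understanding stage. The search is deliberately exhaustive, which is exactly what the text says is not feasible on real lattices, so this shows the definition only and not the paper's control strategy.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class WordHypothesis:
    word: str
    start: int    # first frame of the utterance portion (assumed unit)
    end: int      # frame just after the portion
    score: float  # assumed acoustic log-likelihood, higher is better


def spanning_sequences(
    lattice: Sequence[WordHypothesis],
    utterance_end: int,
    position: int = 0,
    prefix: Tuple[WordHypothesis, ...] = (),
) -> Iterator[Tuple[WordHypothesis, ...]]:
    """Yield every chain of touching hypotheses covering [0, utterance_end)."""
    if position == utterance_end:
        yield prefix
        return
    for hyp in lattice:
        # Assumption: the next word starts exactly where the previous one ends.
        if hyp.start == position and hyp.end > position:
            yield from spanning_sequences(
                lattice, utterance_end, hyp.end, prefix + (hyp,)
            )


def best_solution(
    lattice: Sequence[WordHypothesis],
    utterance_end: int,
    is_admissible: Callable[[List[str]], bool],
) -> Optional[Tuple[WordHypothesis, ...]]:
    """Return the admissible spanning sequence with the highest total score."""
    best, best_score = None, float("-inf")
    for seq in spanning_sequences(lattice, utterance_end):
        if not is_admissible([h.word for h in seq]):
            continue  # point 1: rejected by the (hypothetical) linguistic check
        total = sum(h.score for h in seq)
        if total > best_score:  # point 2: best acoustic score among survivors
            best, best_score = seq, total
    return best


# Toy lattice: two competing paths over a 9-frame utterance.
lattice = [
    WordHypothesis("show", 0, 3, -1.0),
    WordHypothesis("so", 0, 2, -0.5),
    WordHypothesis("me", 3, 5, -1.0),
    WordHypothesis("knee", 2, 5, -0.4),
    WordHypothesis("flights", 5, 9, -2.0),
]
toy_grammar = {("show", "me", "flights")}
solution = best_solution(lattice, 9, lambda words: tuple(words) in toy_grammar)
```

In this toy lattice the acoustically best path is not the linguistically acceptable one, so the linguistic check changes which sequence is returned. A practical system would interleave the two constraints during the search instead of enumerating every path first.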