<?xml version="1.0" standalone="yes"?> <Paper uid="A92-1005"> <Title>Real-time linguistic analysis for continuous speech understanding*</Title> <Section position="6" start_page="37" end_page="37" type="evalu"> <SectionTitle> 6 Experimental results </SectionTitle> <Paragraph position="0"> To evaluate the performance of a speech understanding system, it is necessary to define some metrics.</Paragraph> <Paragraph position="1"> Unfortunately, metrics are still far from standardized in this field. Let us briefly describe the measures used in our evaluation and shown in Table 5. Understood refers to the percentage of correctly understood sentences. We consider a sentence understood if the word sequence selected by the parser and refined by the feedback verification procedure (if applied) is equal to the uttered sentence or differs from it only in short function words that are not essential for understanding.</Paragraph> <Paragraph position="2"> The failure rate is the percentage of sentences for which the parser obtained no result within the imposed real-time constraints. The misunderstood case arises when the selected solution is not the uttered one.</Paragraph> <Paragraph position="3"> Note that failures and misunderstandings do not have the same effect: in the case of failure, the system is aware that it has not understood the question, and in a dialogue system the failure can trigger a recovery action.</Paragraph> <Paragraph position="4"> The parser is implemented in C and presently runs on a Sun SparcStation 1.</Paragraph> <Paragraph position="5"> Experiments were performed starting from 600 lattices produced by the recognition system from 600 different sentences uttered by 10 speakers and pertaining to voice access to E-mail messages. The recognizer [Fissore et al. 1989] employs 305 context-dependent units, each of which is represented by a 3-state discrete density HMM. 
HMMs are trained with 8800 sentences uttered by 110 speakers. The speech signal, recorded from a PABX, is low-pass filtered at kHz and sampled at 16 kHz. Features, computed for every 10 ms frame, include 12 cepstrum and 12 delta-cepstrum coefficients, plus energy and delta-energy.</Paragraph> <Paragraph position="6"> [...] configurations, each evaluated with the feedback verification procedure deactivated (no ver.) or activated (verify). The first configuration is the baseline, in which the lattice is analyzed as described in the above sections. In the second configuration, we add to the lattice the best-scored sequence of words initially found by the recognizer as a side-effect of its analysis. This sequence, though rarely correct, takes inter-word coarticulation better into account and hence may contribute to the overall accuracy. In both configurations the maximum processing time is 5 seconds.</Paragraph> </Section></Paper>
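<!--
Editorial sketch, not part of the original paper. The three measures defined
in Section 6 (understood, failure, misunderstood) can be illustrated with a
short scoring routine. Everything named here is an assumption made for
illustration: the FUNCTION_WORDS set is a hypothetical English stand-in for
the "short function words that are not essential for understanding", a
selected sequence of None stands for "no result within the real-time limit",
and comparing the two sequences after dropping function words is only one
possible reading of the paper's tolerance rule.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical list of non-essential function words (assumption, not from the paper).
FUNCTION_WORDS = frozenset({"a", "an", "the", "of", "to"})

# (uttered words, words selected by the parser or None on timeout)
Pair = Tuple[Sequence[str], Optional[Sequence[str]]]


def content_words(words: Sequence[str]) -> List[str]:
    """Lower-case the words and drop the assumed function words."""
    return [w.lower() for w in words if w.lower() not in FUNCTION_WORDS]


def classify(uttered: Sequence[str], selected: Optional[Sequence[str]]) -> str:
    """Label one sentence as understood, failure, or misunderstood."""
    if selected is None:
        # The parser produced nothing within the time limit.
        return "failure"
    if content_words(selected) == content_words(uttered):
        # Identical, or differing only in the assumed function words.
        return "understood"
    return "misunderstood"


def score(pairs: Sequence[Pair]) -> Dict[str, float]:
    """Return the percentage of sentences falling into each category."""
    if not pairs:
        raise ValueError("at least one sentence is required")
    counts = {"understood": 0, "failure": 0, "misunderstood": 0}
    for uttered, selected in pairs:
        counts[classify(uttered, selected)] += 1
    return {label: 100.0 * n / len(pairs) for label, n in counts.items()}
```

Keeping failure separate from misunderstood mirrors the remark in the text
that a failure is detectable by the system and can trigger a recovery action,
whereas a misunderstanding cannot.
-->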