<?xml version="1.0" standalone="yes"?> <Paper uid="H91-1029"> <Title>COLLECTION AND ANALYSIS OF DATA FROM REAL USERS: IMPLICATIONS FOR SPEECH RECOGNITION/UNDERSTANDING SYSTEMS</Title> <Section position="3" start_page="0" end_page="164" type="intro"> <SectionTitle> INTRODUCTION </SectionTitle> <Paragraph position="0"> Speech recognition/understanding systems will ultimately establish their usefulness by working well under real application conditions. Success in the field will depend not only on the technology itself but also on the behavior of real users. Real user behavior can be characterized in terms of 1. what people say and 2. how they say it.</Paragraph> <Paragraph position="2"> What people say: Real user compliance Until the advent of a high-performance, continuous-speech, unconstrained-vocabulary/grammar, interactive speech understanding system, users must constrain their spoken interactions with speech recognition/understanding systems. Constraints may require speaking words in isolation, conforming to a limited vocabulary or grammar, restricting queries to a particular knowledge domain, etc. The users' willingness to comply with instructions specifying these constraints will determine the success of the technology. If users are willing and able to confine themselves to one of two words (e.g., yes or no), a two-word speech recognition system may succeed. If users are non-compliant (e.g., say the target words embedded in phrases, say synonyms of the target words, reject the service as a result of the constraining instructions), the technology will fail in the field despite high-accuracy laboratory performance.</Paragraph> <Paragraph position="3"> How compliant are real users? The answer may be application-specific, dependent on particulars such as 1. frequency of repeat usage of the system, 2. motivation of the users, 3. cost of an error, 4. nature of the constraint, etc. 
It would be useful to understand the factors that predict compliance, and to know whether generalizations can be made across applications. In addition, it would be useful to have a better understanding of how to maximize user compliance.</Paragraph> <Paragraph position="4"> Moreover, there is value in analyzing non-compliant behavior.</Paragraph> <Paragraph position="5"> To the extent that non-compliance takes the form of choosing synonyms of the target words, the recognizer's vocabulary must be expanded. If non-compliance takes the form of embedding the target word in a phrase, word spotting or continuous speech recognition is required. If non-compliance is manifested by the user consistently wandering outside the knowledge domain of the speech recognition/understanding system, better instructions may be required. Data from real users should provide researchers and developers with the information necessary to both specify and develop the technology required for successful deployment of speech recognition/understanding systems.</Paragraph> <Paragraph position="6"> How people speak: Real user speech It seems intuitively obvious that to maximize the probability of successfully automating an application with speech recognition, a recognizer should be trained and tested on real user speech. This requires the collection of data from casual users interacting with an automated or pseudo-automated system, thereby producing spontaneous goal-directed speech under application conditions.</Paragraph> <Paragraph position="7"> These databases can be difficult and expensive to collect and so it is not surprising that speech recognition systems are most typically trained and tested on speech data collected under laboratory conditions. Laboratory databases can be gathered relatively quickly and inexpensively by recording speech produced by cooperative volunteers who are aware that they are participating in a data collection exercise. 
But these databases typically have relatively few talkers and speech that is recited rather than spontaneously produced. Potential differences between real user and laboratory speech databases would be of little interest if speech recognition systems were performing as well in field applications as they are in the laboratory. However, there is data to suggest that this is not the case; systems performing well in the laboratory often achieve significantly poorer results when confronted with real user data [1,2].</Paragraph> <Paragraph position="8"> A number of features that differentiate real user from laboratory database collection procedures may have an impact on the performance of speech recognition systems. One that has received specific attention in the literature is that of spontaneously-produced vs. read speech. Jelinek et al. [3] compared the performance of a speech recognition system when tested on pre-recorded, read and spontaneous speech produced by five talkers. Results indicate decreasing performance for the three sets of test material (98.0%, 96.9% and 94.3% correct, respectively). Rudnicky et al. [4] evaluated their speech recognition system on both read and spontaneous speech produced by four talkers and found that performance was roughly equal for the two data sets (94.0% vs. 94.9% correct, respectively). It is important to note, however, that the spontaneous speech used for this comparison was &quot;live clean speech,&quot; defined as &quot;only those utterances that both contain no interjected material (e.g., audible non-speech) and that are grammatical&quot;. Degradation in performance was indeed seen when the test set included all of the &quot;live speech&quot; (92.7%). Zue et al. [5] also evaluated their speech recognition system on read and spontaneous speech samples. Word and sentence accuracy were similar for the two data sets. 
For each of these studies, 'real user' speech samples were recorded under wideband application-like conditions. For at least two of the studies ([4], [5]), the 'real users' were apparently aware that they were participating in an experiment.</Paragraph> <Paragraph position="10"> For telephone speech, it has not been possible to collect databases that are matched with respect to speakers, probably because the anonymity of the users of telephone services makes it difficult to obtain read versions of spontaneously-produced speech from the same set of talkers. Therefore, there is little published data on the effects of read vs. spontaneous speech on the performance of recognition systems for telephone applications. Differences in speakers notwithstanding, there is recent data to suggest that recognition performance can be significantly poorer when testing on real user telephone speech as compared to tests using telephone speech collected under laboratory conditions ([1], [2]).</Paragraph> <Paragraph position="11"> In summary, laboratory and real user behavior can be characterized along at least two important dimensions: compliance and speech characteristics. To gain a better understanding of how to improve the field performance of speech recognition/understanding systems, we have been collecting and analyzing both laboratory and real user data. The goal of this paper is to summarize our work in the analysis of 1. real user compliance for telephone applications and 2. laboratory vs. real user speech data for the development of speech recognition/understanding systems.</Paragraph> </Section> </Paper>