File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/92/h92-1074_intro.xml
Size: 5,839 bytes
Last Modified: 2025-10-06 14:05:16
<?xml version="1.0" standalone="yes"?> <Paper uid="H92-1074"> <Title>CSR Corpus Development</Title> <Section position="3" start_page="364" end_page="366" type="intro"> <SectionTitle> (&quot;EVALUATION TEST&quot;). </SectionTitle>

<Paragraph position="0"> The CSR February 1992 dry run evaluation

The recommended baseline performance evaluations were defined by selection of training data set(s), testing data set(s), recognition conditions (vocabulary and language model), and scoring conditions. In the course of discussion on these issues it became clear that consensus was not possible on the definition of a single set of evaluation conditions. This disagreement was in addition to the distinct differences between speaker-dependent (SD) and speaker-independent (SI) evaluation data and conditions. Some committee members felt that there should be no constraint on training material, to allow as much freedom as possible to improve performance through training data. Others believed strongly that calibration of performance improvement was paramount and that all sites should therefore be required to use a single baseline set of training data. In the end, the committee was able only to identify a number of different training and test conditions as &quot;recommended&quot; alternatives for a baseline evaluation.</Paragraph>

<Paragraph position="1"> For training, the recommended SI training corpus comprised 7240 utterances from 84 speakers. The recommended SD training corpus comprised the 600 training sentences for each of the 12 SD speakers. For the large-data speaker-dependent (LSD) training condition, the recommended training corpus comprised the 2400 training sentences for each of the 3 LSD speakers.</Paragraph>

<Paragraph position="2"> For testing, there were a total of 1200 SI test utterances and 1120 SD test utterances. These data comprised, similarly and separately for SI and SD recognition, approximately 400 sentences constrained to a 5000-word vocabulary, 400 sentences unconstrained by vocabulary, 200 sentences of spontaneous dictation, and the same 200 sentences as read later from a prompting text.</Paragraph>

<Paragraph position="3"> The vocabulary and language models used for the above-defined test sets were either unspecified (for the spontaneous and read versions of the spontaneous dictation) or were the 5000-word vocabulary and bigram grammar supplied by Doug Paul from an analysis of the preprocessed WSJ corpus. (Actually, two different sets of bigram model probabilities were used, one modeling verbalized punctuation and one modeling nonverbalized punctuation. These two were applied to the verbalized and nonverbalized punctuation portions of the test sets, respectively.) Given the rather massive computational challenge of training and testing in such a new recognition domain, with a larger vocabulary and a greater amount of test data, not all of the test material was processed by all of the sites performing evaluation. Also, because of the variety of training and evaluation conditions, few results were produced that could be compared across sites. Two test sets, however, were evaluated by more than a single site: two sites produced results on the SD 5000-word VP test set (Dragon and Lincoln), and three sites produced results on the SI 5000-word VP test set (CMU, Lincoln, and SRI). These results are given in a companion paper on &quot;CSR Pilot Corpus Performance Evaluation&quot; by David Pallett.</Paragraph>
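For concreteness, the following is a minimal sketch of the kind of closed-vocabulary bigram grammar described above, with one model estimated from verbalized-punctuation text and one from nonverbalized-punctuation text. The function name, token markers, and toy sentences are illustrative assumptions only, not the actual WSJ preprocessing or the bigram model distributed by Doug Paul.

    from collections import Counter

    def train_bigram(sentences, vocab):
        # Maximum-likelihood bigram estimates P(w2 | w1) over a closed
        # vocabulary. Out-of-vocabulary words map to UNK; sentence
        # boundaries are marked with BOS/EOS tokens (standing in for
        # the conventional <s> and </s> symbols).
        history = Counter()
        bigrams = Counter()
        for sent in sentences:
            words = ["BOS"] + [w if w in vocab else "UNK" for w in sent] + ["EOS"]
            for w1, w2 in zip(words, words[1:]):
                history[w1] += 1
                bigrams[(w1, w2)] += 1
        return {(w1, w2): c / history[w1] for (w1, w2), c in bigrams.items()}

    # Two variants, as in the evaluation: one estimated from text with
    # verbalized punctuation (punctuation spoken as words) and one from
    # text with the punctuation removed. The sentences are toy data.
    vp_text = [["the", "index", "rose", ",COMMA", "analysts", "said", ".PERIOD"]]
    nvp_text = [["the", "index", "rose", "analysts", "said"]]
    vocab = {"the", "index", "rose", "analysts", "said", ",COMMA", ".PERIOD"}

    vp_model = train_bigram(vp_text, vocab)
    nvp_model = train_bigram(nvp_text, vocab)
    print(vp_model[("rose", ",COMMA")])   # 1.0 on this toy data

A real 5000-word model would also require smoothing, since the maximum-likelihood estimate above assigns zero probability to unseen bigrams.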
<Paragraph position="4"> Future CSR corpus effort and issues

Several issues have been identified that bear on the CSR corpus and on potential changes in the design of the corpus:

* Verbalized punctuation. There is a significant argument for discontinuing verbalized punctuation, for several reasons: it doubles the number of language models and test sets, and thus the number of evaluation conditions; it is artificial in the sense that it is statistically unlike normal dictation; it is more difficult for many subjects to read; and it seems superfluous to the development of the underlying speech recognition technology.</Paragraph>

<Paragraph position="5"> * Preprocessed prompting text. There is an argument for prompting the user with the natural, unpreprocessed text from the WSJ rather than with the preprocessed word strings produced by the text preprocessor.</Paragraph>

<Paragraph position="6"> The reason is that the word strings do not represent the actual statistics of natural speech (see the companion paper by Phillips et al. entitled &quot;Collection and Analyses of WSJ-CSR Data at MIT&quot;).</Paragraph>

<Paragraph position="7"> * Spontaneous speech. There is an argument that the current paradigm for collecting spontaneous speech is not adequately refined to represent those aspects of spontaneous speech that are important in actual usage, and that spontaneous speech should therefore remain in an experimental and developmental mode during the next CSR corpus phase.</Paragraph>

<Paragraph position="8"> * Adaptation. Speaker adaptation and adaptation to the acoustical environment have emerged as a major interest. It is clear that adaptive systems must be accommodated in the next phase of the CSR corpus.</Paragraph>

<Paragraph position="9"> * CSR corpus development effort. It is acknowledged that the CSR corpus development effort is a key activity in the support and direction of CSR research, and that this effort therefore requires program continuity and should not be treated as an occasional production demand that can easily be started and stopped.</Paragraph>

<Paragraph position="10"> These issues are currently under debate in the CCCC, and the next installment of the CSR corpus, to be called the CSR corpus, phase two, will no doubt reflect a continued distillation of opinion on these issues.</Paragraph> </Section> </Paper>