File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/92/h92-1076_intro.xml
Size: 1,298 bytes
Last Modified: 2025-10-06 14:05:19
<?xml version="1.0" standalone="yes"?> <Paper uid="H92-1076"> <Title>SPONTANEOUS SPEECH COLLECTION FOR THE CSR CORPUS</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2. INTRODUCTION </SectionTitle> <Paragraph position="0"> The CSR (Continuous Speech Recognition) Corpus collection can be considered the successor to the Resource Management (RM) corpus [1]. It focuses on the further development of speech recognition technology toward larger or open vocabularies and toward speaker and task independence, and it is moving toward spontaneous speech. The default task in the pilot collection has been dictation of newspaper articles as if for the Wall Street Journal (WSJ).</Paragraph> <Paragraph position="1"> Thus, the largest part of the effort to collect a pilot version of the CSR corpus has been recording people reading selected short passages from the WSJ itself. The pilot CSR corpus was designed, however, so that a significant portion of the material would be spontaneous, and a subset of the speakers who read WSJ texts were also asked to dictate spontaneous articles in the WSJ style.</Paragraph> <Paragraph position="2"> This paper describes the methods used in the collection of the spontaneous portion of the CSR corpus.</Paragraph> </Section> </Paper>