<?xml version="1.0" standalone="yes"?> <Paper uid="H91-1071"> <Title>Collection of Spontaneous Speech for the ATIS Domain and Comparative Analyses of Data Collected at MIT and TI</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> INTRODUCTION </SectionTitle> <Paragraph position="0"> ATIS, or Air Travel Information Service, is the designated common task of the DARPA Spoken Language Systems (SLS) Program [8]. As part of our development of a spoken language system in this domain, we have recently begun a small-scale effort to collect spontaneous speech data. This effort is motivated partly by our desire to contribute to the data collection efforts already underway elsewhere [4,2,1], so that more data can be made available to the community sooner for system development, training, and evaluation. In addition, we were interested in exploring various alternatives for the data collection procedure itself. We believe that, as a community, we do not yet fully understand how goal-directed spontaneous speech should best be collected. This is not surprising, since we have little experience in this area. Nevertheless, data collection is an important area of research for the SLS Program, since the type of data that we collect will directly affect the capabilities of the systems that we develop and the evaluations that we can perform. Therefore, we thought it would be appropriate to experiment with different aspects of this process. There is evidence that even very small changes in the procedure, such as the instructions given to the subject, can drastically alter the nature of the data collected [1].</Paragraph> <Paragraph position="1"> The paper is organized as follows. We will first discuss some methodological considerations that led to the particular collection procedure that we adopted. We will then briefly describe the procedure itself.
This will be followed by comparative analyses of a subset of the data that we have collected against data collected at Texas Instruments (TI). Finally, we will discuss the implications of our findings.</Paragraph> </Section> </Paper>