<?xml version="1.0" standalone="yes"?>
<Paper uid="H91-1097">
  <Title>CORPUS COLLECTION FOR ATIS</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
CORPUS COLLECTION FOR ATIS
Jared Bernstein
SRI International
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
PROJECT GOALS
</SectionTitle>
    <Paragraph position="0"> The project goal is to collect and deliver a corpus of speech data that supports DARPA SLS system development. As of February 1991, SRI has set up a hardware and software environment for the collection of spoken interactions with a simulated Air Travel Information System (ATIS), established a data collection procedure, collected and distributed prototype data, and evaluated the prototype data with feedback from the SLS system developers. Having implemented revisions in the environment and procedures, SRI has begun collecting and distributing a corpus of data for ATIS SLS development.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
RECENT RESULTS
</SectionTitle>
    <Paragraph position="0"> Completed a plan for the interface to the relational database, the collection of the prototype and production data, and the subject environment.</Paragraph>
    <Paragraph position="1"> Collected 10 prototype subject sessions, prepared speech and auxiliary files, and shipped data to NIST for distribution to interested SLS developers. Interacted with NIST and with SLS sites to refine certain aspects of the data collection environment and procedures.</Paragraph>
    <Paragraph position="2"> Modified and augmented the tools used in data collection and file preparation; e.g., automated parts of the transcription task and the derivation of additional auxiliary files, and augmented wizard tools to accelerate database responses.</Paragraph>
    <Paragraph position="3"> Provided yield and cost estimates for revised transcription protocols and for extended categorization of utterances.</Paragraph>
    <Paragraph position="4"> Shipped 35 subject sessions to NIST. Recorded and transcribed sessions, generated auxiliary files, prepared session logs and categorized utterances; checked and prepared material for shipment to NIST.</Paragraph>
    <Paragraph position="5"> Shipped 32 more subject sessions to NIST. Categorized, prepared auxiliary files for, checked, and shipped 32 sessions previously recorded and transcribed in summer 1990 under SRI's ATIS SLS contract.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="423" type="metho">
    <SectionTitle>
PLANS FOR THE COMING YEAR
</SectionTitle>
    <Paragraph position="0"> * Resume and accelerate data collection in the ATIS domain.</Paragraph>
    <Paragraph position="1"> * Document systems and procedures in preparation for export of the wizard data collection system.</Paragraph>
    <Paragraph position="2"> * Work with NIST and the DARPA community to define and implement new speech corpus collections.</Paragraph>
  </Section>
</Paper>