<?xml version="1.0" standalone="yes"?>
<Paper uid="H92-1117">
<Title>ANNOTATION OF ATIS DATA</Title>
<Section position="1" start_page="0" end_page="0" type="metho">
<SectionTitle> ANNOTATION OF ATIS DATA </SectionTitle>
<Paragraph position="0"/>
</Section>
<Section position="2" start_page="0" end_page="0" type="metho">
<SectionTitle> PROJECT GOALS </SectionTitle>
<Paragraph position="0"> The performance of spoken language systems on utterances from the ATIS domain is evaluated by comparing system-produced responses with hand-crafted (and hand-verified) standard responses to the same utterances. The objective of SRI's annotation project is to provide SLS system developers with the correct responses to human utterances produced during experimental sessions with ATIS-domain interactive systems. These correct responses are then used in system training and evaluation.</Paragraph>
</Section>
<Section position="3" start_page="0" end_page="0" type="metho">
<SectionTitle> RECENT RESULTS </SectionTitle>
<Paragraph position="0"> In a previous project, SRI devised a set of procedures for transcribing and annotating utterances collected during interactions between human subjects and a simulated voice-input computer system that provided information on air travel, i.e., utterances in the ATIS domain. During the spring of 1991, it was decided to expand the collection of these human-machine interactions so that most DARPA speech and natural language sites would be collecting this type of data. However, SRI was to remain the only site providing the 'standard' answers.</Paragraph>
<Paragraph position="1"> At the start of the project, a basic set of principles for interpreting the meaning of ATIS utterances was agreed upon by the DARPA community and documented in a network-accessible file known as the Principles of Interpretation. The initial annotation procedures used at SRI at that time were documented in a net note dated July 12, 1991.</Paragraph>
<Paragraph position="2"> During the earlier project, SRI had installed software that produced answer files in the format required by NIST. The essential component of the software used by the annotators was and is NLParse, a menu-driven program developed by Texas Instruments that converts English-like sentences into database queries expressed in SQL.</Paragraph>
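<Paragraph> As a rough illustration of the kind of mapping NLParse performs (a hypothetical sketch only: the request, table names, and column names below are invented for exposition and are not drawn from the actual corpus or the real ATIS database schema), a controlled-English restatement of a request such as "List the flights from Boston to Denver" might correspond to an SQL query along these lines:

    SELECT flight.flight_id
    FROM flight, airport_service o, airport_service d, city oc, city dc
    WHERE flight.from_airport = o.airport_code
      AND o.city_code = oc.city_code
      AND oc.city_name = 'BOSTON'
      AND flight.to_airport = d.airport_code
      AND d.city_code = dc.city_code
      AND dc.city_name = 'DENVER'

In the workflow described above, annotators work through NLParse's menus rather than writing such SQL directly, and the surrounding software packages the query results into answer files in the format required by NIST.</Paragraph>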
<Paragraph position="3"> As standard responses were generated for use in system training, some aspects of the Principles of Interpretation were changed. This process has continued throughout the project. In July, SRI worked with NIST to establish a committee of representatives from each data collection site to modify the Principles of Interpretation document as needed. The SRI annotators have worked closely with this committee, contributing knowledge of the data corpus gained in the annotation process.</Paragraph>
<Paragraph position="4"> Software used in the production of the standard response files was modified and expanded. SRI modified NLParse itself to accommodate changes in the software environment and new testing rules that limited the size of legal answers. Also, a few high-level programs were written to drive and monitor the results of the various low-level routines that had been used in SRI's previous project. This consolidation eliminated the need for annotators to monitor the process at each stage, thus eliminating opportunities for human error.</Paragraph>
<Paragraph position="5"> A dry-run system evaluation was held in October 1991, for which SRI produced the standard responses. The dry run offered an opportunity to measure the real accuracy of a sample of 'standard' responses in the annotated data. In the dry-run test, about 6% of the annotations were incomplete or inappropriate in some way, some due to human error and some to software error. In an effort to improve data quality, SRI revised its human checking procedures and added new checking programs to the software involved in the production of the answer files. It had originally been hoped that human double-checking could be decreased after an initial period of annotation, but based on the adjudication of the dry-run data, 100% double-checking has continued.</Paragraph>
<Paragraph position="6"> Since June 1991, SRI has produced classification and response files for 8,000 utterances of training data and nearly 1,000 utterances of test data. Currently, we annotate about 190 utterances per annotator-week.</Paragraph>
<Paragraph position="7"> For the first official system evaluations in February 1992, SRI again worked with NIST to produce the standard response files.</Paragraph>
</Section>
<Section position="4" start_page="0" end_page="484" type="metho">
<SectionTitle> PLANS FOR THE COMING YEAR </SectionTitle>
<Paragraph position="0"> In the next year, SRI will work with NIST and the DARPA community to develop and implement more efficient evaluation procedures for SLS systems.</Paragraph>
</Section>
</Paper>