<?xml version="1.0" standalone="yes"?>
<Paper uid="H94-1053">
<Title>STATISTICAL LANGUAGE PROCESSING USING HIDDEN UNDERSTANDING MODELS</Title>
<Section position="9" start_page="281" end_page="281" type="evalu">
<SectionTitle> 6 EXPERIMENTAL RESULTS </SectionTitle>
<Paragraph position="0"> We have implemented a hidden understanding system and performed a variety of experiments. In addition, we participated in the 1993 ARPA ATIS NL evaluation.</Paragraph>
<Paragraph position="1"> One experiment involved a 1000-sentence ATIS corpus, annotated according to a simple, specialized sublanguage model. To annotate the training data, we used a bootstrapping process in which only the first 100 sentences were annotated entirely by hand. Thereafter, we worked in cycles consisting of: 1. Running the training program on all available annotated data.</Paragraph>
<Paragraph position="2"> 2. Running the understanding component to annotate new sentences.</Paragraph>
<Paragraph position="3"> 3. Hand-correcting the new annotations.</Paragraph>
<Paragraph position="4"> Annotating in this way, we found that a single annotator could produce 200 sentences per day. We then extracted the first 100 sentences as a test set and trained the system on the remaining 900 sentences. The results were as follows: * 61% matched exactly.</Paragraph>
<Paragraph position="5"> * 21% had the correct meaning but did not match exactly. * 28% had the wrong meaning.</Paragraph>
<Paragraph position="6"> Another experiment involved a 6000-sentence ATIS corpus, annotated according to a more sophisticated meaning model. In this experiment, the Delphi system produced the annotations automatically by printing its own internal representation of each sentence, converted into a more readable form. We then held out 300 sentences as a test set and trained the system on the remaining 5700.
The results were as follows: For the ARPA evaluation, we coupled our hidden understanding system to the discourse and backend components of the Delphi system. Using the entire 6000-sentence corpus described above as training data, the system achieved a simple error rate of 23% on the ATIS NL evaluation. Examining these errors, we concluded that nearly half were due to simple programming problems, especially in the interface between Delphi and the hidden understanding system; in fact, that interface was still incomplete at the time of the evaluation.</Paragraph>
</Section>
</Paper>
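<!--
The bootstrapped annotation cycle described in Section 6 (hand-annotate a seed set, then repeatedly train, let the system annotate new sentences, and hand-correct its output) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: bootstrap_annotate, train, hand_correct, the batch_size parameter, and the toy upper-casing task are hypothetical stand-ins for the hidden understanding model, the human annotator, and the ATIS data. Only the control flow follows the text.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]            # (sentence, annotation)
Model = Callable[[str], str]      # maps a sentence to a proposed annotation


def bootstrap_annotate(
    sentences: List[str],
    seed_annotations: List[str],
    train: Callable[[List[Pair]], Model],
    hand_correct: Callable[[str, str], str],
    batch_size: int,
) -> List[Pair]:
    """Annotate `sentences` by bootstrapping from a hand-annotated seed.

    The first len(seed_annotations) sentences are taken as annotated by hand.
    Each remaining batch goes through one cycle:
      1. train on all annotated data collected so far,
      2. let the trained model propose annotations for the batch,
      3. hand-correct the proposals and add them to the annotated data.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    n_seed = len(seed_annotations)
    annotated: List[Pair] = list(zip(sentences[:n_seed], seed_annotations))
    start = n_seed
    while start < len(sentences):
        model = train(annotated)                        # step 1
        batch = sentences[start:start + batch_size]
        proposals = [model(s) for s in batch]           # step 2
        annotated += [(s, hand_correct(s, p))           # step 3
                      for s, p in zip(batch, proposals)]
        start += len(batch)
    return annotated


# Hypothetical toy stand-ins: the "annotation" of a sentence is its
# upper-cased form, and the "model" only upper-cases words it has seen.
def toy_train(pairs: List[Pair]) -> Model:
    vocab = {w for s, _ in pairs for w in s.split()}
    return lambda s: " ".join(w.upper() if w in vocab else w for w in s.split())


corrections: List[str] = []      # sentences whose proposal needed fixing


def toy_correct(sentence: str, proposal: str) -> str:
    gold = sentence.upper()      # the simulated annotator knows the answer
    if proposal != gold:
        corrections.append(sentence)
    return gold


sentences = ["show flights", "show fares", "list flights", "list fares"]
result = bootstrap_annotate(sentences, ["SHOW FLIGHTS"], toy_train,
                            toy_correct, batch_size=1)
print(corrections)  # ['show fares', 'list flights']
```

In this toy run the proposals for the first two batches need correcting but the third does not, illustrating (under these assumptions only) how the system's proposals can improve as the annotated set grows, so the annotator's work shifts from writing annotations to checking them.
-->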