<?xml version="1.0" standalone="yes"?> <Paper uid="J86-1002"> <Title>THE CORRECTION OF ILL-FORMED INPUT USING HISTORY-BASED EXPECTATION WITH APPLICATIONS TO SPEECH UNDERSTANDING</Title> <Section position="11" start_page="4257" end_page="4257" type="evalu"> <SectionTitle> 6 EXPERIMENTAL RESULTS </SectionTitle> <Paragraph position="0"> An experiment was run using VNLCE to test its error correction capabilities in different situations. These situations were simulated by having the test subjects perform certain tasks on the system that resulted in different dialogue structures, or schemas. The four tests made on VNLCE in this experiment are considered representative of the schemas that can be produced by different dialogues in different situations.</Paragraph> <Paragraph position="1"> All possible dialogue schemas actually produce a continuum of patterns, from totally-ordered to totally-unordered. The tests described below are simply points on this continuum.</Paragraph> <Paragraph position="2"> I) Totally-Ordered Schema This type of schema occurs whenever the system has at most two sentences at a time in its expected sentence set and one of these always has a probability rating over 80%.</Paragraph> <Section position="1" start_page="4257" end_page="4257" type="sub_section"> <SectionTitle> II) Partially-Ordered Schema </SectionTitle> <Paragraph position="0"> In this case, there is a general order to the sentences being spoken, but the expected sentence set does not usually contain just one highly probable sentence at a time; rather, it contains several with varying degrees of probability.</Paragraph> <Paragraph position="1"> III) Totally-Unordered Schema This occurs when there is no over-all order to the sentences being spoken. Essentially any sentence in the expected dialogue has a probability of being spoken next.</Paragraph> <Paragraph position="2"> IV) Totally-Ordered Schema with Arguments This test is an example of a totally-ordered schema, but the system does not know exactly what will be said all the time because one or more of the expected sentences contain an argument.</Paragraph> <Paragraph position="3"> Each of the four tests was run on three different test subjects to acquire data concerning how fast a user speaks, what types of errors are produced by the voice recognizer, and how well the expectation system acquires and uses the expected dialogue to help error correct the input.</Paragraph> <Paragraph position="4"> To begin the experiment session, the subject trained the voice recognizer, an NEC DP-200, on a specific vocabulary of 49 different words in connected speech mode. The DP-200 can handle only 150 word slots in connected speech mode, so a 49-word vocabulary allowed for some repetitive training. The subject was then given a brief tutorial that led him/her through a few features of the VNLCE system and gave him/her some practice in talking to the NEC device. This training session usually took a total of about 45 minutes. The subject was then given one or more of the test sheets representing the problems to be solved. The number was based on the amount of time that the subject was willing to donate to the effort.</Paragraph> <Paragraph position="5"> Each test dialogue had a similar over-all structure in that it required a certain amount of repetition, thus creating a loop structure in the expected dialogue. In all tests except test II, the subject was provided with the specific sequence of sentences to be spoken. This guaranteed that the desired level of repetition was actually achieved. How much repetition there was in each dialogue depended on the expected dialogue schema being imitated.
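As an illustrative aside, not drawn from the original system, the schema distinctions above can be sketched as a small classifier over a probability-weighted expected sentence set. The function name and the near-uniformity threshold below are hypothetical; only the totally-ordered condition (at most two candidates, one rated over 80%) comes from the text.

```python
# Hypothetical sketch of an expected sentence set: a mapping from each
# candidate next sentence to its probability of being spoken next.

def classify_schema(expected):
    """Classify one snapshot of the expected sentence set.

    Totally-ordered: at most two candidates, one rated over 80% (per the text).
    Totally-unordered: essentially uniform probabilities (threshold is ours).
    Partially-ordered: anything in between.
    """
    probs = sorted(expected.values(), reverse=True)
    if not probs:
        return "empty"
    if len(probs) in (1, 2) and probs[0] > 0.80:
        return "totally-ordered"
    # Near-uniform: no candidate stands out from the rest.
    if 0.05 > probs[0] - probs[-1]:
        return "totally-unordered"
    return "partially-ordered"

print(classify_schema({"print row one": 0.9, "stop": 0.1}))  # prints "totally-ordered"
```

A near-uniform four-sentence set, as in test III below, would come back "totally-unordered"; a set with several candidates at varying probabilities, as in test II, would come back "partially-ordered".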
In test I, which was done to demonstrate a totally-ordered schema, the test subject had to repeat an identical sequence of six sentences nine times in a row, except for the seventh time, when four new sentences were inserted into the loop. A sample schema can be seen in Figure 7. In test II, the user had much more freedom, since its purpose was to demonstrate a partially-ordered schema. Here the subject had to solve six sets of simultaneous linear equations with two equations and two unknowns each. Test III was designed to show how error correction works when the dialogue seems random, creating a totally-unordered schema. To create such an environment, the user was asked to repeat four sentences in random order eight times. An example expected dialogue schema that resulted from this test is shown in Figure 9. In the last test, test IV, the subject was asked to repeat a sequence of four sentences six times, each time through changing the value of the row number spoken. This demonstrates the argument creation facility in a totally-ordered dialogue schema. The expected dialogue generated from this test appears in Figure 10.</Paragraph> <Paragraph position="6"> Each test has associated with it three charts indicating the results. The first represents the average sentence error and correction rates, the second shows the average word error and correction rates, and the third illustrates the subject's average rate of speech, in words per second, while doing the experiment. The charts indicating the average error and correction rates of the four tests reflect the loop structure of the dialogues. Each chart is a series of bar graphs, each bar graph representing the average error and correction rates over the sentences spoken by the subjects in a particular loop of the dialogue. The highest point on each of these bars represents the raw error rate of the voice recognizer.
The different markings within the bars themselves represent the percentage of the errors that were corrected by a particular facility of the expectation system. The horizontal design associated with &quot;loosening&quot; indicates the percentage of the errors that were corrected by the flexible parsing techniques: features such as the synophones and the parser commands SKIPWORD, EXTRAWS, and LOSTWS. The vertical design associated with expectation indicates the percentage of the errors that were corrected by use of the expected sentence set alone. The blank area indicates the percentage of the errors that were corrected by using both of the above facilities. Finally, the dot design shows the percentage of the errors that were not corrected. Thus, for example, in the top chart in Figure 11, the eighth loop of the dialogue had an 85% sentence error rate from the voice recognizer. Of those errors, 6% were corrected using the facilities associated with loosening the search, while 25% were corrected by using only expectation. Another 63% were corrected using features from both categories. Only 6% could not be corrected.</Paragraph> <Paragraph position="7"> Test I, using a totally-ordered dialogue schema, was done to show how well the expectation system can correct errorful input when it can predict exactly what will be said next. As can be seen from the graphs in Figure 11, as the ability to predict what will be said next increases, so does the ability to error correct. In loop seven of the dialogue, we deliberately had each user add four extra sentences between the fourth and fifth sentences of the loop. This was done to show that the expectation system had not become a complete automaton, but that it was still capable of dealing with unexpected input.
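As an aside, the percentage bookkeeping in the Figure 11 example quoted above can be replayed with a short calculation. This is a sketch we add for illustration; the function name and data layout are ours, while the percentages are those quoted from the text.

```python
# Replays the quoted Figure 11 example: an 85% raw sentence error rate,
# with each error attributed to exactly one correction outcome.

def correction_breakdown(raw_error_rate, shares):
    """Turn per-error shares into fractions of all sentences spoken."""
    assert round(sum(shares.values()), 6) == 1.0, "shares must cover all errors"
    return {name: raw_error_rate * share for name, share in shares.items()}

rates = correction_breakdown(0.85, {
    "loosening only": 0.06,
    "expectation only": 0.25,
    "both facilities": 0.63,
    "uncorrected": 0.06,
})
# For instance, the fraction of all spoken sentences left uncorrected
# is 0.85 * 0.06, i.e. about 5.1% of everything spoken.
print(round(rates["uncorrected"], 3))  # prints 0.051
```

The check on the shares reflects the chart design: every error falls into exactly one of the four bar regions, so the per-error percentages must sum to 100%.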
However, as can be seen from these graphs (Figure 11), the expectation system's error correcting power decreases in that particular loop of the dialogue, since at certain points there is no expectation to help it. Test II, creating a partially-ordered dialogue schema, was done to show how the expectation acquisition algorithm dealt with dialogues containing some pattern and to see how well error correction could work when expectation was not perfect. The results are shown in Figure 12. Test III demonstrates the error correction capabilities of the system when expectation knows only that one of a group of sentences will be said next. It produces a totally-unordered dialogue schema. The results of the system's error correction capabilities in such a situation appear in Figure 13.</Paragraph> <Paragraph position="8"> Test IV uses a totally-ordered dialogue schema, but with a variation from test I. Each sentence sooner or later contains an argument, so that the system does not know everything about the sentence that will be said next. The data given in Figure 14 show the error correction rates for this dialogue. They clearly show how error correction failures increase until after the third loop, when argument creation begins and the system no longer error corrects incorrectly.</Paragraph> <Paragraph position="9"> Figure 15 shows the graphs of the average speech rate of the speakers for each of the four tests. Like the other eight graphs, these graphs reflect the loop structure of the dialogues. As can be seen, the speakers tended to increase their speech rate as they talked to the system.</Paragraph> <Paragraph position="10"> This behavior was hoped for: as the speech rate increased, so did the error rate of the speech recognizer, thus placing more of a burden on the error correcting abilities of the expectation system.
Note that, in all eight graphs in Figures 11 through 14, the word and sentence error rates from the voice recognizer generally increased with the progress through the dialogue. This is due to the increased rate of speech. However, the actual failure rate of VNLCE did not increase by the same amount; the extra errors were corrected by the expectation system.</Paragraph> <Paragraph position="11"> Figure 16 gives a summary of the average error and correction rates for each test and overall.</Paragraph> </Section> </Section> </Paper>