<?xml version="1.0" standalone="yes"?>
<Paper uid="H89-1017">
  <Title>THE MINDS SYSTEM: USING CONTEXT AND DIALOG TO ENHANCE SPEECH RECOGNITION</Title>
  <Section position="12" start_page="134" end_page="134" type="evalu">
    <SectionTitle>
EVALUATION
</SectionTitle>
    <Paragraph position="0"> To test whether our layered predictions both reduce the search space and improve speech recognition performance, we used an independent test set: neither the system nor its developers had seen any of the test utterances before. The test set also contained no clarification dialogs. Ten speakers (8 male, 2 female) who had not been used to train the recognizer each read 20 sentences drawn from three test scenarios provided by the Navy, adapted to be consistent with the CMU database. The recorded utterances were then run through the SPHINX recognition system under two conditions:
* using the system grammar (all legal sentences);
* using the grammar from the successful prediction layer merged with all unsuccessful layers (the &quot;layered predictions&quot; condition).
The results are shown in Table 1. The system performed significantly better with the layered predictions: the error rate decreased by a factor of five. Perhaps more important, however, is the nature of the errors. In the &quot;layered predictions&quot; condition, 89 percent of the insertions and deletions involved the word &quot;the&quot;, and 67 percent of the substitutions were &quot;his&quot; recognized in place of &quot;its&quot;. Moreover, none of the errors in this condition produced an incorrect database query. Because our database and the Navy's database had the same fields and were both implemented in Informix™, we could directly check the accuracy of the SQL queries the system sent to Informix. Semantic accuracy, defined as generating a correct database query, was therefore 100% in the &quot;layered predictions&quot; condition. Finally, we measured false alarms, in which the recognizer outputs a word sequence that is accepted by a prediction layer that does not contain a correct parse of the spoken input. For the 30 utterances that could not be parsed at the most specific prediction layer, there were no false alarms.</Paragraph>
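The error analysis above counts insertions, deletions, and substitutions, which presupposes aligning each recognized word string against the sentence the speaker actually read. The section does not say which scoring procedure produced Table 1, so the Python sketch below assumes the standard unit-cost minimum-edit-distance (Levenshtein) alignment commonly used for word error rate. It keeps the individual edits so that error words (such as "the", or "his" recognized for "its") can be tallied as in the analysis. The function names `align` and `score` and the example sentences are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Alignment:
    """Word-level edits that turn a reference sentence into a recognizer hypothesis."""
    ref_words: int = 0
    substitutions: list = field(default_factory=list)  # (spoken word, recognized word)
    deletions: list = field(default_factory=list)      # spoken words the recognizer dropped
    insertions: list = field(default_factory=list)     # extra words the recognizer produced

    def errors(self) -> int:
        return len(self.substitutions) + len(self.deletions) + len(self.insertions)


def align(reference: str, hypothesis: str) -> Alignment:
    """Unit-cost minimum-edit-distance (Levenshtein) alignment of two word strings."""
    ref, hyp = reference.split(), hypothesis.split()
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between the first i spoken and first j recognized words
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i
    for j in range(m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Trace back from the end, preferring match/substitution, then deletion, then insertion.
    result = Alignment(ref_words=n)
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] != hyp[j - 1]:
                result.substitutions.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            result.deletions.append(ref[i - 1])
            i -= 1
        else:
            result.insertions.append(hyp[j - 1])
            j -= 1
    return result


def score(pairs):
    """Word error rate over (reference, hypothesis) pairs, plus tallies of the error words."""
    alignments = [align(ref, hyp) for ref, hyp in pairs]
    total_ref = sum(a.ref_words for a in alignments)
    total_err = sum(a.errors() for a in alignments)
    wer = total_err / total_ref if total_ref else 0.0
    ins_del = Counter(w for a in alignments for w in a.insertions + a.deletions)
    subs = Counter(p for a in alignments for p in a.substitutions)
    return wer, ins_del, subs


pairs = [
    ("what is its position", "what is his position"),         # one substitution
    ("show the ships in the area", "show ships in the area"),  # one deletion
]
wer, ins_del, subs = score(pairs)
print(wer)                   # 0.2  (2 errors / 10 spoken words)
print(ins_del["the"])        # 1
print(subs[("its", "his")])  # 1
```

Under this kind of scoring, a factor-of-five reduction means the error rate with the system grammar was five times the error rate with the layered predictions. Ties in the traceback are broken in favor of matches and substitutions, so two scoring tools can split the same total number of errors differently among substitutions, deletions, and insertions.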
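The false-alarm measure can be made concrete with a deliberately simplified model of the prediction layers. As an assumption for illustration only, each layer here is a finite set of acceptable sentences ordered from most to least specific, rather than the grammars the system actually uses. A hypothesis is accepted at the first layer whose grammar, merged with all more specific (unsuccessful) layers, contains it, and a false alarm is counted when that merged grammar lacks the sentence actually spoken. The names `accepting_layer`, `is_false_alarm`, and `false_alarm_rate` and the example sentences are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple

# Hypothetical simplification: each prediction layer is a finite set of acceptable
# sentences, ordered from most specific (index 0) to least specific.
Layers = Sequence[Set[str]]


def accepting_layer(hypothesis: str, layers: Layers) -> Optional[int]:
    """Index of the first layer whose merged grammar accepts the hypothesis, else None."""
    merged: Set[str] = set()
    for k, layer in enumerate(layers):
        merged |= layer  # this layer merged with all more specific, unsuccessful layers
        if hypothesis in merged:
            return k
    return None  # rejected by every layer


def is_false_alarm(reference: str, hypothesis: str, layers: Layers) -> bool:
    """True if the hypothesis was accepted by a merged grammar that lacks the spoken sentence."""
    k = accepting_layer(hypothesis, layers)
    if k is None:
        return False  # nothing was accepted, so there is nothing to count as a false alarm
    merged = set().union(*layers[: k + 1])
    return reference not in merged


def false_alarm_rate(pairs: List[Tuple[str, str]], layers: Layers) -> float:
    """False-alarm rate over utterances whose spoken sentence is not in the most specific layer."""
    hard = [(ref, hyp) for ref, hyp in pairs if ref not in layers[0]]
    if not hard:
        return 0.0
    return sum(is_false_alarm(ref, hyp, layers) for ref, hyp in hard) / len(hard)


layers = [
    {"what is its speed"},     # most specific predictions
    {"what is its position"},
    {"what is his position"},  # least specific
]
pairs = [
    ("what is its position", "what is its position"),  # correct, accepted at layer 1
    ("list all ships", "what is his position"),        # accepted at layer 2 without the spoken sentence
]
print(accepting_layer("what is its position", layers))                        # 1
print(is_false_alarm("what is its position", "what is his position", layers))  # False
print(false_alarm_rate(pairs, layers))                                         # 0.5
```

Under this definition, a misrecognition accepted by a merged grammar that does contain the spoken sentence (the second `print` call) counts as a recognition error but not as a false alarm.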
  </Section>
</Paper>