File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/92/h92-1067_evalu.xml

Size: 2,453 bytes

Last Modified: 2025-10-06 14:00:10

<?xml version="1.0" standalone="yes"?>
<Paper uid="H92-1067">
  <Title>An A* algorithm for very large vocabulary continuous speech recognition</Title>
  <Section position="5" start_page="335" end_page="335" type="evalu">
    <SectionTitle>
5. Experimental Results
</SectionTitle>
    <Paragraph position="0"> Our first experimental results, obtained from two books on tape (analog recordings), appear in Table 1. The first book, &quot;White Fang&quot; by Jack London, was recorded by a male speaker; the second, &quot;Washington Square&quot; by Henry James, was recorded by a female speaker.</Paragraph>
    <Paragraph position="1"> The second and third columns of this table give the training and test set sizes in words; the fourth column gives the accuracy, calculated as $\mathrm{Accuracy} = \frac{N - \left(S + \frac{1}{2}(D + I)\right)}{N}$, where $N$ is the size of the test set and $S$, $D$ and $I$ are the numbers of substitutions, deletions and insertions.</Paragraph>
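The accuracy formula can be checked with a short Python sketch. The function name `word_accuracy` is hypothetical, and any counts passed to it below are made up for illustration; they are not the figures reported in Table 1.

```python
def word_accuracy(n: int, substitutions: int, deletions: int, insertions: int) -> float:
    """Return (N - (S + (D + I) / 2)) / N for a test set of n words.

    Substitutions count as full errors; deletions and insertions count as half errors.
    """
    if not n > 0:
        raise ValueError("test set size N must be positive")
    errors = substitutions + 0.5 * (deletions + insertions)
    return (n - errors) / n
```

For example, with N = 1,000 and hypothetical counts S = 50, D = 20 and I = 10, the error term is 50 + 0.5 * 30 = 65, giving an accuracy of 0.935. Note that the measure can become negative when insertions are very frequent.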
    <Paragraph position="2"> These experiments were run using 41 phonemic mixture HMMs for each speaker, a 60,000-word dictionary edited to include all of the words in both books (1.5% of the words in each book had to be added), and a trigram language model trained on 60,000,000 words of newspaper text. No attempt was made to tailor the language model to the task domains. The test set perplexity was 1,743 for &quot;White Fang&quot; and 749 for &quot;Washington Square&quot;. These perplexities can be reduced to 576 and 347, respectively, by smoothing the language model statistics with word frequencies collected from the training set, but we did not take advantage of this in our experiments. The CPU time required to run the &quot;Washington Square&quot; experiment was 120 times real time on an HP 720 workstation. We had to use a block advance of only 10 frames (1 frame = 10 ms) to keep the stack size within reasonable bounds. The parameter A was set at 140 frames. The stack was implemented as a heap, and the maximum number of stack entries was set to 60,000; when this figure is reached, the stack is cut back to 30,000 entries. The number of theories passed from one block to the next was 3,000. For &quot;White Fang&quot;, a larger stack was needed to prevent search errors, and the execution time was longer. We will have to run the recognizer on several more speakers before attempting to optimize these parameters.</Paragraph>
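The stack policy described above (a heap holding at most 60,000 entries, cut back to 30,000 when full) can be sketched in Python with the standard `heapq` module. This is a minimal illustration under our own assumptions, not the authors' implementation: each theory is assumed to carry a single scalar score, higher scores are assumed to be better, and cutting back is assumed to keep the highest-scoring entries. The class name `BoundedTheoryStack` and its methods are hypothetical.

```python
import heapq
from typing import Any, List, Tuple


class BoundedTheoryStack:
    """Hypothetical sketch: a heap of scored partial theories with a cut-back rule.

    The best-scoring theory is popped first. When the number of entries
    reaches max_entries, only the cut_back_to best entries are kept.
    """

    def __init__(self, max_entries: int = 60_000, cut_back_to: int = 30_000) -> None:
        if not max_entries > cut_back_to > 0:
            raise ValueError("need max_entries > cut_back_to > 0")
        self.max_entries = max_entries
        self.cut_back_to = cut_back_to
        # Entries are (negated score, insertion counter, theory).
        self._heap: List[Tuple[float, int, Any]] = []
        self._counter = 0  # unique tie-breaker, so theories are never compared

    def __len__(self) -> int:
        return len(self._heap)

    def push(self, score: float, theory: Any) -> None:
        # heapq is a min-heap, so the score is negated to pop the best theory first.
        heapq.heappush(self._heap, (-score, self._counter, theory))
        self._counter += 1
        if len(self._heap) >= self.max_entries:
            self._cut_back()

    def pop(self) -> Tuple[float, Any]:
        # Remove and return (score, theory) for the best-scoring entry.
        neg_score, _, theory = heapq.heappop(self._heap)
        return -neg_score, theory

    def _cut_back(self) -> None:
        # Keep the cut_back_to entries with the smallest negated score (best score).
        # nsmallest returns a sorted list, which already satisfies the heap property.
        self._heap = heapq.nsmallest(self.cut_back_to, self._heap)
```

Cutting back in one step, instead of discarding one entry on every push once the limit is reached, amortizes the pruning: with the figures in the text, the `nsmallest` call runs at most once per 30,000 insertions.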
  </Section>
</Paper>