<?xml version="1.0" standalone="yes"?>
<Paper uid="H89-2026">
  <Title>A STACK DECODER FOR CONTINUOUS SPEECH RECOGNITION</Title>
  <Section position="10" start_page="196" end_page="197" type="evalu">
    <SectionTitle>
PERFORMANCE AND TESTBED
</SectionTitle>
    <Paragraph position="0"> As the stack decoder is still in an intermediate stage of development, significant performance results are not yet available. In this section we describe preliminary results, with the caveat that they should not be taken as an indication of the future performance of the stack decoder, but rather as evidence that the acoustic models we have described capture coarticulation reasonably well and have a chance of performing well on more difficult tasks.</Paragraph>
    <Paragraph position="1"> The first test of the stack decoder on speech data used a vocabulary consisting of the ten digits. The models for phonemes in context were trained on speech from one speaker. Two lists of 100 seven-digit sequences were constructed with the property that, in each list, every digit appeared with every possible left and right context. A second speaker provided an utterance of every sequence on these lists. The 100 utterances from the first list were used to adapt the PICs to the second speaker's voice; the 100 utterances from the second list were used as test data. Decoding produced two replacement errors.</Paragraph>
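The coverage property of these digit lists can be stated precisely: every ordered (left, right) digit pair must occur adjacently somewhere in the list. A minimal sketch of a checker for that property, assuming adjacency is the intended notion of context (the function name and interface are illustrative, not from the paper):

```python
from itertools import product

def covers_all_contexts(sequences, alphabet="0123456789"):
    """Check that every ordered (left, right) pair of symbols from the
    alphabet occurs adjacently somewhere in the given sequences, i.e.
    that each digit is observed with every possible left and right
    neighbour."""
    seen = set()
    for seq in sequences:
        # Collect all adjacent symbol pairs in this sequence.
        seen.update(zip(seq, seq[1:]))
    return seen >= set(product(alphabet, repeat=2))
```

With ten digits there are 100 ordered pairs to cover, and 100 seven-digit sequences supply 600 adjacent pairs, so such lists are easy to construct.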
    <Paragraph position="2"> A second test was based on 100 frequent words appearing in a large (several million words) corpus of radiology reports. Sentences were constructed from these words by considering the most common 8-grams in the corpus. Models for phonemes in context were built, and utterances of the sentences were collected from the speaker who provided the training data for the PICs.</Paragraph>
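Selecting the most common 8-grams from a corpus is a straightforward counting task. A minimal sketch, assuming whitespace-tokenized text (the function name and parameters are illustrative, not from the paper):

```python
from collections import Counter

def top_ngrams(tokens, n=8, k=10):
    """Count word n-grams in a token stream and return the k most
    common ones as (ngram, count) pairs."""
    grams = Counter(tuple(tokens[i:i + n])
                    for i in range(len(tokens) - n + 1))
    return grams.most_common(k)
```

Over a several-million-word corpus this single-pass count is cheap, and the resulting frequent 8-grams can be strung together into test sentences.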
    <Paragraph position="3"> So far the stack decoder has not performed well on this task. On a sample of 10 of the sentences, the correct transcription failed to appear on the choice list in five instances, even though, when the CTE was applied to the thresholded transcriptions, the scores would have placed them at or near the top of the list. In the other five instances, the correct transcription appeared first three times, second once, and third once. Over the ten sentences there were one insertion error, seven replacement errors, and two deletion errors out of the 85 words in the sentences. We hope that installation of the language model, together with implementation of a superior PTE, will diminish the thresholding problem.</Paragraph>
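The paper does not state how its insertion, replacement, and deletion counts were obtained; a standard Levenshtein alignment between reference and hypothesis word sequences, as commonly used in speech recognition scoring, would tally them like this (a sketch, not the authors' procedure):

```python
def word_errors(ref, hyp):
    """Align a reference and a hypothesis word sequence by dynamic
    programming and return (substitutions, insertions, deletions)."""
    R, H = len(ref), len(hyp)
    # d[i][j] = minimum edit distance between ref[:i] and hyp[:j].
    d = [[0] * (H + 1) for _ in range(R + 1)]
    for i in range(1, R + 1):
        d[i][0] = i
    for j in range(1, H + 1):
        d[0][j] = j
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    # Backtrace through the table to classify each error.
    i, j = R, H
    subs = ins = dels = 0
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            subs += ref[i - 1] != hyp[j - 1]
            i, j = i - 1, j - 1
        elif j > 0 and d[i][j] == d[i][j - 1] + 1:
            ins += 1   # extra word in the hypothesis
            j -= 1
        else:
            dels += 1  # reference word missing from the hypothesis
            i -= 1
    return subs, ins, dels
```

Summing the three counts and dividing by the reference length (here, 85 words) gives the conventional word error rate.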
    <Paragraph position="4"> In the future, the testbed for our continuous speech recognition algorithms will be a 1000-word vocabulary and a language model based on the radiology corpus.</Paragraph>
  </Section>
</Paper>