<?xml version="1.0" standalone="yes"?>
<Paper uid="H89-1020">
  <Title>GOATS TO SHEEP: CAN RECOGNITION RATE BE IMPROVED FOR POOR TANGORA SPEAKERS?</Title>
  <Section position="3" start_page="0" end_page="145" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> There is a great deal of variability in the accuracy with which users of large vocabulary automatic speech recognition (hereafter ASR) systems are recognized. In a typical finding, Brown, Vosburgh &amp; Canetti (in preparation) reported that recognition error for a group of first-time users of a 20,000-word ASR system varied from 2.0% to 14%. Two conclusions may be drawn from such results. First, the technology is good enough to produce a high degree of recognition accuracy for some speakers. Second, there are some speakers who encounter severe problems and, for them, the technology is probably not usable. This research was motivated by the latter group. It is concerned with whether recognition accuracy can be improved through behavioral means for speakers who are initially poorly recognized by an ASR system.</Paragraph>
    <Paragraph position="1"> Can a user modify his or her speaking style in ways which will be acceptable to the user and will result in a significant improvement in recognition, thereby making ASR systems more widely usable? IBM's experimental TANGORA system, implemented on the Personal Computer AT, was used in this investigation. This system functions in real-time and has the capacity to recognize 20,000 words.</Paragraph>
    <Paragraph position="2"> TANGORA is an isolated word system; this requires that users pause briefly between words. Further, it is a speaker-dependent system and must be &quot;trained&quot; to the user's voice. Such a system is most accurate when it has a description or model of the acoustic characteristics of a user's voice. This speaker model is generated by TANGORA from a sample (1200-2400 words) of the user's speech, collected during a &quot;training session.&quot; A description of the TANGORA system can be found in Averbuch et al. (1986).</Paragraph>
    <Paragraph position="3"> This investigation had four general goals. The first was to investigate recognition performance for a group of new users during their first month of experience with the TANGORA system. The focus was on determining the rate and amount of improvement, if any, in recognition accuracy. It is an important but unanswered question whether poor ASR speakers can improve substantially with experience. The second goal was to determine whether re-training TANGORA after users have had experience speaking in isolated-word mode is a useful strategy for improving recognition performance. One might expect that experience with an ASR system leads users to modify their speaking style. Consequently, use of an up-to-date speaker model which reflects these changes might result in improved recognition accuracy. The third goal of this study was to identify those aspects of a user's speaking style which resulted in errors by the TANGORA system. A description of these problems would serve as the basis for suggestions to the user on how to modify his or her speaking style in order to produce more accurate recognition performance.</Paragraph>
    <Paragraph position="4"> The final goal was to characterize speakers who are recognized accurately by TANGORA.</Paragraph>
  </Section>
</Paper>