<?xml version="1.0" standalone="yes"?>
<Paper uid="H92-1088">
  <Title>TOWARDS USING PROSODY IN SPEECH RECOGNITION/UNDERSTANDING SYSTEMS: DIFFERENCES BETWEEN READ AND SPONTANEOUS SPEECH</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2. INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> This work addresses two related questions. One is whether spontaneous goal-directed utterances collected from real users in a particular application domain exhibit reliable prosodic patterns that could be exploited by recognition algorithms. We focus on to-be-recognized words that are spoken within longer utterances, in order to investigate whether these embedded words have particular prosodic characteristics that could help a recognizer to locate them.</Paragraph>
    <Paragraph position="1"> One of the original motivations for this study was our observation, from informal listening to our corpus, that such embedded words bear nuclear pitch accents. If this is a consistent pattern, it would mean that (1) they are louder, longer and more clearly articulated than they would be without nuclear accents, and (2) they bear characteristic fundamental frequency movements.</Paragraph>
    <Paragraph position="2"> Corpora of spontaneous goal-directed speech from real users are not readily obtainable, and so it is common practice to record speech read out by volunteers in order to develop, train and test recognition algorithms. To the extent that the prosody of read speech differs from that of spontaneous goal-directed speech, such &quot;laboratory&quot; corpora may obscure or misrepresent any reliable prosodic properties found in spontaneous &quot;real user&quot; speech. Consequently, the second question investigated in this work is whether such patterns can also be found in recordings of speech read out by volunteers. We are interested in prosodic differences between read and spontaneous speech because of their relevance both to speech recognition and to increasing the naturalness of synthetic speech.</Paragraph>
    <Paragraph position="3"> It is worth pointing out a methodological issue at this stage: the prosody used when people are reading can of course differ dramatically from that used in spontaneous communication. The speech databases that we know of vary widely in how much effort was made to ensure that the prosody of the speech realistically reflects the speech that recognizers have to deal with in real-world applications. In this experiment we chose to do everything we could to encourage our volunteers to use realistic spontaneous-sounding prosody.</Paragraph>
    <Paragraph position="4"> Most existing speech corpora used in the recognition field have been collected with less emphasis on realistic prosody. We therefore believe that the read speech in this experiment is as similar as possible to spontaneously produced utterances. The degree of prosodic similarity that we report between read and spontaneous speech represents a &quot;best case&quot;.</Paragraph>
  </Section>
</Paper>