<?xml version="1.0" standalone="yes"?>
<Paper uid="H92-1088">
  <Title>TOWARDS USING PROSODY IN SPEECH RECOGNITION/UNDERSTANDING SYSTEMS: DIFFERENCES BETWEEN READ AND SPONTANEOUS SPEECH</Title>
  <Section position="6" start_page="438" end_page="439" type="concl">
    <SectionTitle>
6. CONCLUSION
</SectionTitle>
    <Paragraph position="0"> Read speech differs from spontaneous speech in several important ways: (i) although the tunes on focussed words are selected from the same inventory in both read and spontaneous speech, the prior probabilities of the tunes differ greatly: spontaneous speech predominantly contains rises, whereas read speech predominantly contains falls; (ii) pauses in read speech are shorter than in spontaneous speech, and they are predominantly located at structurally predictable positions (grammatical boundaries), whereas in spontaneous speech this generalization hardly holds at all; and (iii) read speech tends not to contain filled pauses. These differences argue that algorithms developed to exploit this information will need to be developed and trained on spontaneous speech from real users, rather than on read speech alone.</Paragraph>
    <Paragraph position="1"> These results are encouraging for locating embedded targets in speech recognition tasks: they show that when users respond to a query from an automated system, they mark the embedded information-bearing words with an acoustically salient nuclear pitch accent and often precede and/or follow them with a pause.</Paragraph>
    <Paragraph position="2"> For speech synthesis in the context of spoken language systems, these results suggest that listeners will be better able to understand and interpret synthesized utterances if the focussed information they contain (i) bears a nuclear tune and (ii) is preceded by some lengthening of the immediately preceding material, and perhaps even by a short inserted pause. Further investigations will address the prediction of the tonal makeup of these patterns.</Paragraph>
  </Section>
</Paper>