<?xml version="1.0" standalone="yes"?>
<Paper uid="H93-1064">
  <Title>ON CUSTOMIZING PROSODY IN SPEECH SYNTHESIS: NAMES AND ADDRESSES AS A CASE IN POINT</Title>
  <Section position="6" start_page="320" end_page="321" type="evalu">
    <SectionTitle>
4.4. Results
</SectionTitle>
    <Paragraph position="0"> So far, results have been analyzed for 17 listeners. Summing over all transcriptions, the maximum possible transcription score for each synthesizer was 5032. The per-word error rate for items spoken with the synthesizer's default prosody was 14.6%; with the domain-specific prosody it was only 6.4%. Thus listeners could transcribe the vowels and consonants significantly more accurately, even though the vowels and consonants are pronounced by exactly the same segmental rules in both cases. The only difference is the prosody. Transcription scores do not reflect how much effort listeners expended to achieve their transcription accuracy. One measure of that effort is the number of repeats they requested. Listeners needed on average 2.6 repeats per listing with the default prosody, but only 1.1 repeats per listing with the domain-specific prosody. Interestingly, in a prior transcription test with a human voice saying a superset of the listings used in this experiment, listeners needed 1.2 repeats per listing (Sara Basson, personal communication).</Paragraph>
    <Paragraph position="1"> On the &quot;ease of understanding&quot; scale, the default prosody scored 1.8 (standard deviation = 0.8), while domain-specific prosody scored 3.3 (standard deviation = 0.8). Thus listeners' subjective perceptions matched their objective transcription results: they were aware that the version with domain-specific prosody was easier to understand, though clearly it was not effortless.</Paragraph>
    <Paragraph position="2"> On the &quot;naturalness&quot; scale, the default prosody scored 1.9 (standard deviation = 0.9) and domain-specific prosody scored 2.9 (standard deviation = 0.8). Though statistically significant, this difference is smaller than on the previous scale. Altering just the pitch and duration made the speech sound somewhat more natural, but it is still a long way from sounding &quot;extremely natural&quot;. On the preference ratings, all of the listeners so far preferred the speech versions with domain-specific prosody.</Paragraph>
  </Section>
</Paper>