<?xml version="1.0" standalone="yes"?>
<Paper uid="H93-1065">
  <Title>QUANTITATIVE MODELING OF SEGMENTAL DURATION</Title>
  <Section position="7" start_page="325" end_page="326" type="evalu">
    <SectionTitle>
4. RESULTS
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="325" end_page="326" type="sub_section">
      <SectionTitle>
4.1. Statistical fit
</SectionTitle>
      <Paragraph position="0"> Forty-two sums-of-products models were constructed, one for each &quot;leaf&quot; of the category tree. Overall, 619 parameters were estimated (32 for vowels, 196 for intervocalic consonants, and 391 for non-intervocalic consonants). On average, each parameter was based on eight data points.</Paragraph>
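      <Paragraph> As an illustration of what one such model computes, the sketch below evaluates a sums-of-products expression: the predicted duration is a sum of terms, and each term is a product of per-factor parameters selected by the feature vector. The factor names, levels, and parameter values are hypothetical and are not taken from the estimates reported here.

```python
from math import prod

# Hypothetical parameter tables: one dict per term, mapping a factor name
# to a table of parameters keyed by factor level. All values are
# illustrative only; they are not estimates from the paper.
MODEL = [
    # Term 1: a stress scale multiplied by a phrasal-position value (ms).
    {
        "stress": {"stressed": 1.2, "unstressed": 0.9},
        "position": {"final": 130.0, "nonfinal": 100.0},
    },
    # Term 2: a single-factor term, acting as an additive offset (ms).
    {
        "next": {"voiced": 25.0, "voiceless": 0.0},
    },
]


def predict_duration(model, features):
    """Sum over terms of the product of the parameters picked per factor."""
    return sum(
        prod(table[features[factor]] for factor, table in term.items())
        for term in model
    )


def count_parameters(model):
    """Count table entries: one per factor level per term."""
    return sum(len(table) for term in model for table in term.values())


features = {"stress": "stressed", "position": "final", "next": "voiced"}
duration_ms = predict_duration(MODEL, features)
n_params = count_parameters(MODEL)
```

 A purely additive model and a purely multiplicative model are both special cases of this form (all single-factor terms, or a single term containing every factor).</Paragraph>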
      <Paragraph position="1"> The overall correlation (over all 41,588 segments) between observed and predicted durations was 0.93 (0.90, 0.90, and 0.87 when computed separately for vowels, intervocalic consonants, and non-intervocalic consonants, respectively).</Paragraph>
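      <Paragraph> For readers who want to reproduce this kind of fit statistic on their own data, the following minimal sketch computes a Pearson product-moment correlation between observed and predicted durations. It assumes that the reported correlations are Pearson correlations; the duration values are hypothetical.

```python
from math import sqrt


def pearson_r(observed, predicted):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("sequences must be non-empty and of equal length")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_o * var_p)


# Hypothetical segment durations in milliseconds.
observed_ms = [62.0, 85.0, 110.0, 143.0, 170.0]
predicted_ms = [70.0, 80.0, 115.0, 135.0, 175.0]
r = pearson_r(observed_ms, predicted_ms)
```

 Computing the statistic once over the pooled data and once within each segment class corresponds to the overall and per-class figures given above.</Paragraph>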
      <Paragraph position="2"> When we computed average durations for each feature vector in two equal-sized subsets of the data base, and estimated parameters for the sums-of-products model for vowels separately on each subset, the durations predicted from the two parameter sets correlated 0.987. Similarly, when we estimated parameters from data obtained on a second (female) speaker, male durations (feature vector means) were predicted with a correlation of 0.96.</Paragraph>
      <Paragraph position="3"> In addition to these correlational findings, we also found that the key interactions were mimicked closely by the predicted durations (e.g., see Figs. 14-16 in [12]).</Paragraph>
    </Section>
    <Section position="2" type="sub_section">
      <SectionTitle>
4.2. Text-to-speech synthesizer evaluation
</SectionTitle>
      <Paragraph position="0"> A new duration module for the AT&amp;T Bell Laboratories text-to-speech synthesizer was written based on the 42 sums-of-products models and their parameter estimates. We then compared the durations generated by the new module with those generated by the old module in a subjective listening experiment using naive listeners (see [20] for details). The old module consists of a list of several hundred duration rules similar to, but somewhat simpler than, the Klatt rules [5]. In the experiment, a listener heard two versions of the same sentence, selected the preferred version, and indicated strength of choice on a 1-6 scale (where 1 denotes complete indifference and 6 the strongest possible preference). All listeners preferred the new version. Across listeners, the new version was preferred on 73 percent of the presentations (80 percent for strength ratings of three or more). On only one of the 200 sentences was there a statistically significant majority of listeners preferring the old version; on 81 percent of the sentences listeners preferred the new version, and on 60 percent they did so with a statistically significant majority.</Paragraph>
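      <Paragraph> The text does not state which test was used to decide whether a per-sentence majority was statistically significant. One common choice for paired preference data is a one-sided binomial sign test against chance (probability 0.5), sketched below as an assumption rather than as the procedure actually used. The listener counts and the 0.05 threshold are hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, trials, p=0.5):
    """P(X >= successes) for X distributed as Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p ** k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def significant_majority(prefer_new, listeners, alpha=0.05):
    """One-sided sign test: is the preference for one version beyond chance?"""
    return alpha > binomial_upper_tail(prefer_new, listeners)


# Hypothetical sentence: 15 of 20 listeners prefer the new version.
p_value = binomial_upper_tail(15, 20)
is_significant = significant_majority(15, 20)
```

 Under this assumed test, a sentence can show a majority for one version without that majority being significant, which is consistent with the gap between the 81 percent and 60 percent figures.</Paragraph>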
    </Section>
  </Section>
</Paper>