<?xml version="1.0" standalone="yes"?>
<Paper uid="H93-1064">
  <Title>ON CUSTOMIZING PROSODY IN SPEECH SYNTHESIS: NAMES AND ADDRESSES AS A CASE IN POINT</Title>
  <Section position="7" start_page="321" end_page="321" type="concl">
    <SectionTitle>
5. CONCLUSION
</SectionTitle>
    <Paragraph position="0"> Although this evaluation is preliminary, it suggests that even in such simple material as names and addresses, domain-specific prosody can make a clear improvement to synthetic speech quality. The transcription error rate was more than halved, the number of repetitions was more than halved, the speech was rated as more natural and easier to understand, and it was preferred by listeners. This result encourages further research on methods for capitalizing on application constraints to improve prosody. The principles in the literature for customizing the prosody will generalize to other domains where the structure of the material and discourse purpose can be inferred.</Paragraph>
    <Paragraph position="1"> The second conclusion is that, at least in this domain, although domain-specific rules can improve synthetic prosody over that in domain-independent rules, the domain-specific customization can be severely limited if the synthesizer does not make the right prosodic controls available. In an ideal world, the markers that are embedded in the text would specify exactly how the text is to be spoken.</Paragraph>
    <Paragraph position="2"> In reality, however, they specify at best an approximation.</Paragraph>
    <Paragraph position="3"> This exercise is constrained by the controls made available by that synthesizer. Some manipulations that are needed for this type of customization are not available, and some of the controls that are available interact in mutually detrimental ways. Consequently, to the extent that the application-specific prosody did indeed improve synthesis quality, this is all the more supporting evidence both for the importance of generating domain-relevant prosody on the one hand, and for NOT doing it with such an improper prosodic model on the other.</Paragraph>
    <Paragraph position="4"> The immediate next steps in this work are to evaluate the perceptual impact of the above rules more systematically, both in transcription tests and with the quantitative measures of acceptance by real users that are already being used in the field trial. In addition, we are currently developing a set of rules to customize the prosody in a spoken language system for remote financial transactions, combining text-specific rules of the type evaluated in this work with rules that will use the discourse history to dynamically derive information about topics, discourse functions of replies, and given versus new information.</Paragraph>
    <Paragraph position="5"> The development and evaluation of this work furthers our understanding of (i) how to use prosody to clarify names and addresses in particular, and other texts in general; (ii) prosody's importance in a real application context, rather than in laboratory-generated unrepresentative sentences; (iii) one way to incorporate user-modelling of speaking rate into speech synthesis (speakers should not ignore their listeners); and (iv) what prosodic controls a synthesizer should make available.</Paragraph>
  </Section>
</Paper>