<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2125"> <Title>Sydney, July 2006. ©2006 Association for Computational Linguistics. An HMM-Based Approach to Automatic Phrasing for Mandarin Text-to-Speech Synthesis</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Owing to limited vital capacity and to contextual information, breaks or pauses are always an important ingredient of human speech.</Paragraph> <Paragraph position="1"> They play a major role in signaling structural boundaries. Similarly, in text-to-speech (TTS) synthesis, assigning breaks is crucial to naturalness and intelligibility, particularly in long sentences.</Paragraph> <Paragraph position="2"> The challenge in achieving naturalness in TTS synthesis stems mainly from prosody generation.</Paragraph> <Paragraph position="3"> Generally speaking, prosody covers phrasing, loudness, duration and intonation. Among these prosodic features, phrasing divides utterances into meaningful chunks of information at so-called hierarchic breaks. In most cases, however, prosodic phrasing has no unique solution, and different phrasings can lead a listener to perceive different meanings.</Paragraph> <Paragraph position="4"> Given its importance, recent TTS research has focused on the automatic prediction of prosodic phrases based on part-of-speech (POS) features or syntactic structure (Black and Taylor, 1994; Klatt, 1987; Wightman, 1992; Hirschberg, 1996; Wang, 1995; Taylor and Black, 1998).</Paragraph> <Paragraph position="5"> In our understanding, POS is a grammar-based structure that can be extracted from text. There is no explicit relationship between POS and prosodic structure. In Mandarin speech synthesis, at least, we cannot derive the prosodic structure directly from the POS sequence.
By contrast, a word carries rich information related to phonetic features. For example, in Mandarin, a word can reveal many phonetic features, such as pronunciation, number of syllables, stress pattern, tone and, where present, light tone and retroflexion. We therefore explore the role of the word in predicting prosodic phrases and propose a word-based statistical method for prosodic phrase grouping. This method uses a Hidden Markov Model (HMM) for both training and prediction.</Paragraph> </Section> </Paper>