File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/00/c00-2156_intro.xml
Size: 2,943 bytes
Last Modified: 2025-10-06 14:00:53
<?xml version="1.0" standalone="yes"?> <Paper uid="C00-2156"> <Title>Decision-Tree based Error Correction for Statistical Phrase Break Prediction in Korean *</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> During the past few years, there has been a great deal of interest in high quality text-to-speech (TTS) systems (van Santen et al., 1997). One of the essential problems in developing high quality TTS systems is to predict phrase breaks from texts. Phrase breaks are especially essential for subsequent processing in the TTS systems such as grapheme-to-phoneme conversion and prosodic feature generation. Moreover, graphemes in the phrase-break boundaries are not phonologically changed and should be pronounced as their original corresponding phonemes.</Paragraph> <Paragraph position="1"> There have been two approaches to predict phrase breaks (Taylor and Black, 1998). The first uses some sort of syntactic information to predict prosodic boundaries based on the fact that syntactic structure and prosodic structure are co-related. This method needs a reliable parser and syntax-to-prosody module. These modules are usually implemented in rule-driven methods; consequently, they are difficult to write, modify, maintain and adapt to new domains and languages. In addition, a greater use of syntactic information will require more computation for finding a more detailed syntactic parse. Considering these shortcomings, the second approach uses some probabilistic methods on the crude POS sequence of the text, and this method will be further developed in this paper. * This paper was supported by the University Research Program of the Ministry of Information &amp; Communication in South Korea through the IITA (1998.7-2000.6).</Paragraph> <Paragraph position="2"> However, the
probabilistic method alone usually suffers from performance degradation due to inherent data sparseness problems.</Paragraph> <Paragraph position="3"> So we adopted decision tree-based error correction to overcome these training data limitations. Decision tree induction is the most widely used learning method. Especially in natural language and speech processing, decision tree learning has been applied to many problems including stress acquisition from texts, grapheme to phoneme conversion and prosodic phrase modeling (Daelemans et al., 1994) (van Santen et al., 1997) (Lee and Oh, 1999).</Paragraph> <Paragraph position="4"> In the next section, linguistic features of Korean relevant to phrase break prediction are described. Section 3 presents the probabilistic phrase break prediction method and the tree-based error correction method. Section 4 shows experimental results to demonstrate the performance of the method and section 5 draws some conclusions.</Paragraph> </Section> </Paper>