File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/00/c00-2120_metho.xml
Size: 10,726 bytes
Last Modified: 2025-10-06 14:07:16
<?xml version="1.0" standalone="yes"?> <Paper uid="C00-2120"> <Title>Matching a tone-based and tune-based approach to English intonation for concept-to-speech generation</Title> <Section position="4" start_page="829" end_page="832" type="metho"> <SectionTitle> 2 Intonation Annotation </SectionTitle> <Paragraph position="0"> The majority of text-to-speech systems that allow for the manipulation of an input string so as to control intonation employ the ToBI system (Silverman et al., 1996), which is based on the autosegmental-metrical approach originally set up by Pierrehumbert (1980) to describe American English intonation. Versions of ToBI for other languages have been developed, e.g., Grice et al. (1996) for German, and are also widely used in computational contexts. One major theoretical difference between the ToBI approach and the British School approaches, such as the one advocated by SFG, is that in the latter there is a built-in focus on the relation between intonation and meaning. In SFG, intonation contours are distinguished according to their differential meanings, i.e., they label pitch movements that are commonly interpreted by the speakers of (British) English as having quite different pragmatic purport (cf. Teich et al. (1997)).</Paragraph> <Paragraph position="1"> This is what makes the SFG approach attractive in the context of concept-to-speech generation, in which it is crucial to be able to represent criteria for selecting an intonation contour appropriate in a given context. ToBI, on the other hand, is a phonetic-phonological annotation scheme for intonation. Since it is widely used, there exist numerous tools supporting analysis with a high degree of analytical rigor. 
It seems therefore doubly significant to combine the two approaches in an attempt to achieve high-quality synthesized speech output.</Paragraph> <Paragraph position="2"> While clearly some fundamental theoretical differences exist between the ToBI and SFG approaches, more technically there is a basic commonality. Any annotation scheme for intonation must establish three principal constructs for the representation of intonation: the units of intonation, a set of categories that describe the pitch movement occurring in that unit, and a set of labels that mark the nuclear stress on which the pitch movement is realised.</Paragraph> <Paragraph position="3"> In the remainder of this section we briefly describe how these constructs are realised in ToBI (Sec. 2.1) and in SFG (Sec. 2.2) and sketch the major differences between them.</Paragraph> <Section position="1" start_page="829" end_page="831" type="sub_section"> <SectionTitle> 2.1 ToBI </SectionTitle> <Paragraph position="0"> There are two tiers to the ToBI analysis, the tonal analysis and the analysis of the strength of the word boundaries, which is referred to as the &quot;break index&quot;. The ToBI tones are either high (H) or low (L). The break index gives the strength of a word's association with the following word, where 0 is the strongest perceived conjoining and 4 is the most disjoint (Beckman &amp; Ayers, 1997). In our analysis (Sec. 3), we only consider the tonal part of ToBI.</Paragraph> <Paragraph position="1"> The ToBI intonational phonology model aligns a tune with the words of an utterance (cf. Harrington &amp; Cassidy (1999)), where some of these words are accented. The words of an utterance are grouped into phrases. There are two types of phrases, intonational and intermediate phrases. 
Utterances always consist of one or more intonational phrases, which in turn consist of one or more intermediate phrases.</Paragraph> <Paragraph position="2"> The break between two intonational phrases is greater than between two intermediate phrases, the break index being 4 in the former case and 3 or 2 in the latter.</Paragraph> <Paragraph position="3"> Words that have prominence in a phrase or utterance are accented (sentence level stress).</Paragraph> <Paragraph position="4"> Unlike lexical stress, which is usually fixed, sentence level stress is variable. When a word carries sentence level stress, a pitch accent is associated with the syllable of primary stress.</Paragraph> <Paragraph position="5"> Pitch accents are denoted by *. The most common pitch accent is an H*, which is usually realised as a pitch peak near the vowel in the primary stressed syllable. It is also possible to have pitch accents which are a combination of a pitch movement towards and including a peak or trough. One such bitonal accent is L+H*, which moves from a low in pitch towards a high.</Paragraph> <Paragraph position="6"> Intermediate and intonational phrases carry edge tones. Intermediate phrases carry phrase tones, indicated by -. The phrase tone L- is low pitch following the final pitch accent of a phrase. The phrase tone H- represents high pitch following the last pitch accent. The tone associated with an intonation phrase is a boundary tone and is indicated by %. The boundary tone H% represents a final rise and the L% boundary tone is typically interpreted as the absence of a final rise (cf. Ladd (1996)).</Paragraph> <Paragraph position="7"> Every intermediate phrase must have at least one pitch accent. By definition, the last accented word in any intermediate phrase is always the nuclear accented word, and it is usually perceived as more prominent than any other accented word. The utterance (a) in Fig. 
1 is produced by an H* L-L% combination and typically interpreted as a neutral declarative. The second utterance (b) has an H* L* H-H% combination (yes/no question). The final example (c) illustrates a complex utterance, made up of more than one intonation phrase.</Paragraph> </Section> <Section position="2" start_page="831" end_page="831" type="sub_section"> <SectionTitle> 2.2 SFG </SectionTitle> <Paragraph position="0"> According to SFG the unit to which intonation is attributed is the tone group. A tone group consists of feet, and feet consist of syllables. A tone group carries a tune or tone, which can be falling (tone 1), rising (tone 2), level (tone 3), falling-rising (tone 4), or rising-falling (tone 5).</Paragraph> <Paragraph position="1"> Fig. 2 gives these five options with their approximate pragmatic meanings. The examples in Fig. 3 show how tone is annotated in SFG: the number gives the kind of tone, the double slashes mark the tone group boundaries and the single slashes mark feet. Also, there may be combinations of different tones in one utterance, e.g., tone 4 followed by tone 1 (example (c) in Fig. 3).</Paragraph> <Paragraph position="2"> Each tone group contains an element which carries the nuclear stress, called Tonic. In the default case, the Tonic is placed on the last lexical element in the tone group (unmarked nuclear stress). In marked cases, the Tonic can be placed on other elements in the tone group.</Paragraph> <Paragraph position="3"> For an example of the former see (b) in Fig. 3 (Tonic denoted by underlining); an example of the latter is (a) in Fig. 3.</Paragraph> <Paragraph position="4"> The Tonic represents the nuclear stress and is part of the tonic segment of the tone group.</Paragraph> <Paragraph position="5"> If the Tonic does not fall on the first syllable of the tone group, there is an element preceding it, called the pretonic segment. 
It carries a so-called Pretonic stress (see (b) in Fig. 3).</Paragraph> </Section> <Section position="3" start_page="831" end_page="832" type="sub_section"> <SectionTitle> 2.3 Preliminary comparison </SectionTitle> <Paragraph position="0"> On a technical level, the major differences we can observe between the ToBI and SFG annotation schemata of intonation are the following.</Paragraph> <Paragraph position="1"> Units. While there is a rough correspondence between the intonation phrase/intermediate phrase in ToBI and the tone group in SFG (cf. Harrington &amp; Cassidy (1999)), in ToBI the unit of the foot is not acknowledged.</Paragraph> <Paragraph position="2"> Pitch movement. While in ToBI the primitives of description of pitch movement are distinct highs (H) and lows (L), where a particular pitch movement is described by a sequence of highs and/or lows in the pitch, in SFG the primitive of description is the tune, i.e., a relative concept, such as a rising, falling or level tune. Nuclear stress. While in ToBI the nuclear stress is marked by the last starred tone in the sequence of tones and is thus only implicitly indicated in the annotation, SFG marks nuclear stress explicitly by marking up the Tonic.[1] While there is a basic match in terms of accounting for the pitch movement and we can thus expect to be able to recast ToBI tone sequences as SFG tones, we may encounter some problems due to the non-acknowledgement of the unit of foot in ToBI on the one hand, and due to ToBI marking up pitch accents other than the nuclear stress, on the other hand.</Paragraph> <Paragraph position="3"> [1] Cf. Sec. 2.1, however: the nuclear stress in ToBI is by definition the last starred tone.</Paragraph> </Section> </Section> <Section position="5" start_page="832" end_page="832" type="metho"> <SectionTitle> 3 Method </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="832" end_page="832" type="sub_section"> <SectionTitle> 3.1 The Corpus </SectionTitle> <Paragraph position="0"> The corpus was obtained from the recorded data which comes with Halliday (1970). We investigated tones 1, 2, and 4, and tone sequences 1 &amp; 1, 1 &amp; 2, 2 &amp; 1, 2 &amp; 2, 1 &amp; 4, and 4 &amp; 1. A total of 290 utterances were analysed (= 1700 words of text, approx. 350 tone groups). The utterances ranged from mono- and polysyllabic words to sentences. The utterances varied in tone, number of feet, the position of the Tonic, and whether there were silent beats in the tone group. Also, some of the utterances had a pretonic segment, others did not.</Paragraph> </Section> <Section position="2" start_page="832" end_page="832" type="sub_section"> <SectionTitle> 3.2 Labelling </SectionTitle> <Paragraph position="0"> The labelling of the data according to SFG criteria was obtained from Halliday (1970). The labelling of the data using ToBI was done by a trained acoustic phonetician.[2] The existing recording was digitised at 20 kHz as 16-bit samples, and stored on a Unix machine. The pitch tracks were calculated using ESPS WAVES+.</Paragraph> <Paragraph position="1"> The labelling of the data was done in EMU (Cassidy &amp; Harrington, 1996). All the intonational and intermediate phrases were marked, as were the pitch accents, phrasal and boundary tones.</Paragraph> </Section> </Section> </Paper>