<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1094">
  <Title>Integrating Linguistic and Performance-Based Constraints for Assigning Phrase Breaks</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Normal spoken language is not delivered in an uninterrupted monotone; prosodic cues such as pauses or boundary tones greatly help the listener to understand an utterance. Most text-to-speech systems use statistical models to find the appropriate locations for prosodic phrase breaks. In this work we use insights gained from the linguistics literature to develop a computational model which assigns prosodic structure to unrestricted text.</Paragraph>
    <Paragraph position="1"> We start by briefly reviewing the relationship between syntactic and prosodic structure. Figure 1 shows an example of the right-branching syntactic structure that is standardly assigned to English sentences. Figure 2 shows a much flatter tree which corresponds to widely accepted views of the same sentence's prosodic structure. [Figures 1 and 2, example sentence: He would tease a little girl who didn't like big dogs.] According to the latter, the Utterance level is partitioned into intonational (I-) phrases,[1] which in turn are partitioned into phonological (φ-) phrases. (We ignore lower levels of representation such as prosodic words and syllables for the purposes of this paper.) In their investigation of the syntax-prosody mapping, Nespor and Vogel (1986) define φ-phrases as consisting of a lexical head (e.g., a verb, noun or adjective) together with all the material on its non-recursive side up until the next head.[2] Footnote 1: Intonational phrases are phonologically defined as units which are associated with a characteristic intonational contour; in particular, an I-phrase is marked by the presence of a major pitch accent. The boundary of an I-phrase is canonically manifested as a perceptible pause, accompanied by a local fall or rise in F0 (fundamental frequency); it can also be marked by constituent-final syllable lengthening, and stronger articulation of constituent-initial consonants. Footnote 2: Here, 'nonrecursive' is intended to cover modifiers and determiners as opposed to complements. It is also required that the 'next head' referred to in the definition be outside the maximal projection of the head which forms the basis of the φ-phrase.</Paragraph>
    <Paragraph position="2"> In the example of Figures 1 and 2, tease, little, girl, like, big and dogs are lexical heads. These heads, barring the adjectives, are bundled with the material to their left. The adjectives are included in the same φ-phrases as the nouns they modify because they are still inside the maximal projection (NP) of the nouns.</Paragraph>
    <Paragraph position="3"> The level of φ-phrases can fairly easily be derived from syntax. However, the same is not true of I-phrases. According to the strict layer hypothesis (Selkirk, 1984), an intonational phrase must consist of complete φ-phrases. But syntax does not determine how many φ-phrases go to make up an I-phrase. To illustrate this point, consider (1) and (2), discussed by Gee and Grosjean (1983), where '|' is used to indicate I-phrase boundaries. Both phrasings are acceptable.</Paragraph>
    <Paragraph position="4">  (1) By making his plan known | he brought out | the objections of everyone. | (2) By making his plan known | he brought out the objections of everyone. |  Nevertheless, the φ-structure provides a strong constraint on the location of breaks between I-phrases, since an I-phrase boundary can never interrupt a φ-phrase.</Paragraph>
    <Paragraph position="5"> Although φ-structure has been used by others to assign prosodic structure algorithmically (Gee and Grosjean, 1983; Bachenko and Fitzpatrick, 1990), there is no generally accepted method for bundling φ-phrases into I-phrases. The main consensus is that I-phrases have "a more or less uniform 'average' length" (Nespor and Vogel, 1986, p.194). In a similar vein, Gee and Grosjean (1983) observe that utterances tend to be split into two or three I-phrases of roughly equal length.</Paragraph>
    <Paragraph position="6"> Gee and Grosjean (1983) (and subsequently, Bachenko and Fitzpatrick (1990)) construct I-phrases by comparing the length of the prosodic constituents on both the left-hand side and the right-hand side of the utterance's main verb (or the φ-phrase containing the verb), and grouping the verb with the shorter neighbouring constituent. They give little consideration to the grouping of constituents which are not adjacent to the verb. This limitation in their model seems innocuous when dealing with the rather artificially 'well-behaved' set of sentences in their sample. (This 14-sentence corpus, also used by Bachenko and Fitzpatrick, contains only sentences of 11-13 words in length and does not scale up to unrestricted text.) However, to be useful in a realistic TTS system, our model should run robustly on unrestricted text and not rely, as Bachenko and Fitzpatrick's model does, on correct parser output. Consequently, we need to adopt a different strategy.</Paragraph>
  </Section>
</Paper>