<?xml version="1.0" standalone="yes"?> <Paper uid="H92-1089"> <Title>Intonational Features of Local and Global Discourse Structure</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> THE EMPIRICAL STUDY </SectionTitle> <Paragraph position="0"> The corpus used in the empirical study consists of three AP news stories, which had been recorded by a professional newscaster from texts available to us. The texts averaged about 450 words in length and the recordings averaged about three and one-half minutes. In this paper, we present our findings for one of these stories (approximately 550 words and four minutes long), as labeled by seven labelers.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1. Discourse Segmentation </SectionTitle> <Paragraph position="0"> We developed a set of labeling instructions based on G&S for guiding labelers in segmenting the news stories and identifying various local structural elements.</Paragraph> <Paragraph position="1"> Seven labelers participated in the study. Four (Group T) worked from the text alone. Three others (Group S) labeled from the recording and the text; they were allowed to replay passages as many times as they wished. All of the labelers provided segmentations of one story; three members of Group T and two of Group S also labeled local phenomena for this story. Figure 1 illustrates a sample labeling for this text by one member of Group T. (Note that labelers were allowed to segment according to any division of the text they preferred, although most used the orthographic sentence as their unit of analysis for global structure. 
The schema presented in Figure 1 identifies only global structure.) At the global level, we asked labelers to identify segment beginnings and endings and to specify which other segment (if any) the segment was embedded in. In Figure 1, the segments for one labeler are indicated by bracketings of the text; hierarchical relationships among segments are indicated by tabbing. Any unit of analysis (phrase or utterance) in the global segmentations can be described by one of five categories: segment initial sister (SIS), segment initial embedded (SIE), segment final (SF), segment medial immediately following an SF utterance, i.e., a pop (SMP), or segment medial not following a pop (SM). In Figure 1, the first phrases of (a) and (c) illustrate SIS utterances; those of (b), (d), and (e) represent SIEs; SF examples are found at the end of (b), (e), and (f); the first phrase of (f) represents an SMP; and all other phrases within the segments (not identified schematically for reasons of space) would represent SM units. Differences among utterances in categories SIE and SIS will not be discussed in this paper; we will refer to them together as segment beginnings (SBEG). Our instructions to labelers for labeling at the global level were cast in terms of the meaning and purpose of the text, because G&S stipulates that intentions are the basic root of discourse segmentation. At the local level, we examined five types of constituents: parentheticals, direct quotations and their tags, indirect reported speech, and speaker attributions for reported speech. We asked both Group T and Group S labelers to mark parentheticals, since these are not always disambiguated orthographically. In addition, we asked Group S to mark direct quotations. Tags and speaker attributions were identified independently by the authors from the text. 4.2. Intonational Features of Discourse Structure To identify intonational features in the read speech, we labeled the speech for accentuation and phrasing, according to Pierrehumbert's [18] theory of English intonation, using WAVES speech analysis software [19]. 
We then calculated values for pitch range, as indicated (indirectly) by the fundamental frequency (f0) maximum for the vowel of accented syllables in the phrase (footnote 1); amount of f0 change between phrases, f0(phrase[i])/f0(phrase[i+1]); amplitude, measured within the vowel of the syllable containing the phrase's f0 peak; difference in intensity from prior phrase, measured in decibels (db); contour type; speaking rate, measured in syllables per second (sps); and pausal duration between phrases. We used as our primary unit of analysis Pierrehumbert's phrasal category of intermediate phrase. Each of these features was then examined as a potential predictor of discourse structure (footnote 2). We compared individual and consensus labelings (i.e., those on which every member of a group agreed) from Group T with those from Group S for direct quotations, tags, indirect reported speech and attributions, parentheticals, and the segment boundaries SBEG, SF, and SMP. Here, we discuss only quotations, parentheticals and segment boundaries.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5. RESULTS AND ANALYSIS 5.1. Discourse Segmentation </SectionTitle> <Paragraph position="0"> We found that discourses can indeed be segmented dependably using our instructions. While no two segmentations were identical, we found no statistically significant difference among six of our seven labelers for labelings of SBEG phrases (using Cochran's Q). For SF phrases, the seven labelers fell into two groups with no significant difference among members of each group; we hypothesize that each group settled upon a distinct but plausible interpretation of the text's structure. 
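The agreement test named above, Cochran's Q, can be sketched in a few lines. This is an illustration only, not the authors' analysis script: it assumes the labelings are coded as a binary matrix with one row per phrase and one column per labeler (1 if the labeler marked the phrase as SBEG), and the function name cochran_q and the toy matrix are hypothetical. Q is referred to a chi-square distribution with k-1 degrees of freedom, where k is the number of labelers.

```python
def cochran_q(rows):
    """Return (Q, df) for a binary matrix: one row per phrase, one column per labeler."""
    k = len(rows[0])                      # number of labelers
    col_totals = [sum(r[j] for r in rows) for j in range(k)]
    row_totals = [sum(r) for r in rows]
    total = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - total * total)
    denominator = k * total - sum(r * r for r in row_totals)
    if denominator == 0:
        # every phrase received a unanimous label, so there is nothing to test
        return 0.0, k - 1
    return numerator / denominator, k - 1


# Hypothetical toy data: 4 phrases, 3 labelers (1 = marked as a segment beginning).
labels = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
q, df = cochran_q(labels)
print(q, df)
```

A small Q relative to the chi-square critical value for df degrees of freedom corresponds to the "no statistically significant difference" outcome described above.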
While we had hypothesized that we might find fewer differences among members of Group S than among Group T labelers, this hypothesis was not in fact borne out. Consistency across Group S labelers was no greater than consistency among all members of the two groups, for labelings of either SBEG or SF. Many of the utterances on which labelers disagreed fell into two categories: (1) utterances that might have initiated (or by themselves formed) small separate segments and were thus classified as SM by some labelers and SIE by others; (2) utterances classified as beginnings by some labelers and SMP by others. In the latter case, all of the labelers agreed that there was a discourse break of some kind, but they disagreed about the relationship of the utterance in question to the immediately (linearly) preceding segment; in the following section we provide an analysis of some utterances that fell into this class.</Paragraph> <Paragraph position="1"> Footnote 1: Results presented here are based on measurement of f0 maxima for each phrase within the vowel of the syllable containing the phrase's f0 peak. Results from a more conservative measurement at the vowel's amplitude maximum were similar. Footnote 2: In results presented below we have either controlled for phrasal position or performed ANOVAs with both phrasal position and the intonational variable in question as factors, with statistically significant results in each case for the latter.</Paragraph> <Paragraph position="2"> jet with one hundred Americans aboard lost a nine foot piece of its tail today while trying to set a speed record on a world circling journey, but landed safely in Sydney. b. [William F. Buckley, Jr. and his wife were on board, CBS News reported. The author and commentator had helped organize the trip, which cost each passenger thirty nine thousand dollars.] c. [A British Airways spokesman said part of the rudder disintegrated while the supersonic jet was flying at forty thousand feet at about fifteen hundred miles an hour nearly twice the speed of sound from Christchurch, New Zealand. &quot;It experienced a shudder while over the Tasman Sea that was thought to have been air turbulence,&quot; said Stanton. d. [He said the pilot was unaware of any problem until he was alerted by the control tower at Sydney's Kingsford Smith International Airport.</Paragraph> <Paragraph position="3"> e. [However, at least one passenger on the one thousand mile flight, which lasted one hour and twenty five minutes, said the plane had shuddered and passengers were tense.] f. &quot;It was a normal landing, there was no emergency,&quot; Stanton said. &quot;The pilot, Capt. David Leney, was told by the control tower that a piece of the tail was missing.&quot; ]] ...]</Paragraph> 
<Paragraph position="4"> 5.2. Intonational Correlates of Discourse Structure Results for our first text are summarized in Table 1. A '+' indicates the row's discourse structural element is characterized by higher values for the column's intonational feature; '-' indicates that the structural element is characterized by relatively low values for the intonational feature. For example, '+' in the 'Pitch Range' column for direct quotations indicates that these phrases are generally higher in range than other phrases. As shown in Table 1, quoted phrases for Group T were, in general, uttered in a higher pitch range and with less increase in intensity than other phrases; quote-final phrases were produced with a pronounced drop in intensity compared with other sentence-final phrases. Quoted and non-quoted phrases in non-sentence-initial position differed significantly in pitch range (means of 256 Hz vs. 230 Hz; tstat=1.85; df=79; p<.035). Quoted and non-quoted phrases also differed in amount of change from prior phrase in db (1.92 db vs. 5.13 db; tstat=1.71; df=24; p<.05) and between quoted utterance-final and other utterance-final phrases (-5.65 db vs. 1.47 db; tstat=2.87; df=4; p<.025).</Paragraph> <Paragraph position="5"> Comparing these findings with the intonational features of quotations Group S had identified, we found that similar differences in pitch range existed between quoted phrases identified from speech and other phrases, but no significant difference in intensity.</Paragraph> 
<Paragraph position="6"> For parentheticals identified by Group T, we also found significant effects for range (195 Hz vs. 258 Hz; tstat=3.6?; df=106; p<.001), for percent change over prior phrase (81% vs. 107%; tstat=2.29; df=105; p<.02), and for intensity (-3.08 db vs. .024 db; tstat=2.04; df=106; p<.025). Our speaker uttered parenthetical phrases in a low pitch range, dropping both pitch and intensity markedly from preceding phrases. Group S's parentheticals were even lower in range (166 Hz vs. 256 Hz; tstat=3.38; df=106; p<.001) than those identified by Group T and exhibited an even more pronounced decrease in pitch (70% vs. 106%; tstat=2.09; df=105; p<.02) and in intensity (-5.10 db vs. 0.13 db; tstat=2.09; df=105; p<0.02). They also were uttered significantly more rapidly than other phrases (6.05 sps vs. 5.06 sps; tstat=1.94; df=106; p<0.03).</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Subs Pause Rate </SectionTitle> <Paragraph position="0"> We take as evidence that these intonational features were used by Group S to identify local structure, the fact that Group S quotations are in general marked more reliably by differences in pitch range than Group T quotations, and that Group S parentheticals are in general marked by larger differences in range and db change than Group T's. We obtained similar results for tags, indirect reported speech, and attributions for such speech. 
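The two change-from-prior-phrase measures reported above for local structure (pitch as a percentage of the prior phrase, intensity as a db difference) can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: we assume each phrase is summarized by one f0 maximum in Hz and one intensity value in db, and that 'percent change over prior phrase' means the current value expressed as a percentage of the prior one. The names Phrase and change_from_prior and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Phrase:
    f0_max_hz: float      # pitch range proxy: f0 maximum in the phrase
    intensity_db: float   # intensity measured at the phrase's f0 peak


def change_from_prior(phrases):
    """For each phrase after the first, return (pitch percent of prior, db difference)."""
    changes = []
    for prior, current in zip(phrases, phrases[1:]):
        pitch_percent = 100.0 * current.f0_max_hz / prior.f0_max_hz
        db_difference = current.intensity_db - prior.intensity_db
        changes.append((pitch_percent, db_difference))
    return changes


# Hypothetical values: the second phrase behaves like a parenthetical,
# dropping in both pitch and intensity relative to the phrase before it.
phrases = [Phrase(250.0, 60.0), Phrase(200.0, 56.0), Phrase(260.0, 61.0)]
for pitch_percent, db_difference in change_from_prior(phrases):
    print(round(pitch_percent), db_difference)
```

Under this reading, a value below 100 percent together with a negative db difference matches the pattern described for parentheticals.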
For global structure, we again found much similarity between intonational features correlated with Group T-identified discourse elements and those correlated with discourse features identified by Group S, with one notable exception which we discuss below.</Paragraph> <Paragraph position="1"> However, for global structures we did not find that the intonational features Group S apparently found salient exhibited more pronounced differences over other phrases than Group T-related features. Intonational features of phrases labeled SF and SMP by Group T are virtually identical to those for phrases labeled by Group S. For both Group T and Group S SF, we find a single intonational correlate, subsequent pause (for T: 1329 msec. vs. 740 msec.; tstat=2.22; df=24; p<0.02; for S: 1386 msec. vs. 555 msec.; tstat=3.38; df=17; p<0.002). SF phrases identified by Group S are followed by only slightly longer average pauses than those identified by Group T, but the ratio of segment-ending pauses to pauses following other sentence-final phrases is greater for Group S.</Paragraph> <Paragraph position="2"> For SMP, there is a significant effect for pitch range (340 Hz vs. 296 Hz; tstat=2.08; df=24; p<0.025) and for preceding pause (1329 msec. vs. 603 msec.; tstat=2.66; df=15; p<0.01) for Group T. For Group S we see significant effects for the same factors, pitch range (337 Hz vs. 295 Hz; tstat=2.17; df=24; p<0.02) and preceding pause (1386 msec. vs. 698 msec.; tstat=3.04; df=24; p<0.005). These findings support similar results in [2, 1, 3]. For SBEG, however, while pitch range, amplitude, rate and subsequent pause are significantly correlated with phrases identified by Group T, only preceding and subsequent pause variation distinguishes phrases identified as SBEG by Group S. In light of our findings for other global discourse structure elements, we were puzzled at the disparity between our groups with respect to SBEG. 
We were also puzzled that intonational features such as pitch range, which previous production and perception studies had found highly correlated with discourse structure, had no effect on Group S judgments of SBEG. One explanation might be found by examining a superordinate category for SBEG. Recall that phrases of the categories SBEG (SIE+SIS) and SMP share the property of not being part of the same discourse segment as their preceding phrase; this more general class (SBEG+SMP) encompasses shifts to a different segment, some initiating new segments (SBEG) and others returning to an embedding one (SMP). As we noted above (Section 5.1), labelers often agreed on broader aspects of structure while disagreeing over finer-grained details of the segmentation.</Paragraph> <Paragraph position="3"> In fact, the intonational features characterizing phrases of this more general category for Groups T and S are indeed consistent with the pattern we saw for intonational characteristics of SF and SMP (footnote 3).</Paragraph> <Paragraph position="4"> Footnote 3: Note that SBEG significantly outnumber SMP phrases when we collapse these categories, so our results do not arise from the latter category dominating the former.</Paragraph> <Paragraph position="5"> For SBEG+SMP identified by Group T, there are significant effects only for pitch range (336 Hz vs. 294 Hz; tstat=2.41; df=25; p<0.02) and subsequent pause (25 msec. vs. 169 msec.; tstat=2.00; df=25; p<0.03).</Paragraph> <Paragraph position="6"> These same intonational features are also significantly correlated with SBEG+SMP identified by Group S (pitch range: 325 Hz vs. 295 Hz; tstat=1.77; df=25; p<0.05; subsequent pause: 30 msec. vs. 183 msec.; tstat=2.34; df=25; p<0.02), as is preceding pause (1215 msec. vs. 659 msec.; tstat=2.82; df=24; p<0.005). 
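The comparisons above are each reported as a t statistic with its degrees of freedom. The source does not state which form of the t-test was used; the sketch below assumes an independent two-sample test with pooled variance (df = n1 + n2 - 2) purely for illustration. The function name pooled_t and the pause durations are hypothetical, not data from the study.

```python
import math
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Return (t, df) for an independent two-sample t-test with pooled variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # weighted average of the two sample variances
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    std_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_error, df


# Hypothetical preceding-pause durations in msec for two sets of phrases.
shift_pauses = [1200, 1300, 1100, 1250]
other_pauses = [600, 700, 650, 720]
t, df = pooled_t(shift_pauses, other_pauses)
print(round(t, 2), df)
```

The resulting t would then be compared against the t distribution with df degrees of freedom to obtain a p value of the kind quoted in the text.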
Thus this more general category identifying 'segmentation shifts' yields a comparison of intonational features for Group T and Group S phrases which is consistent with our other findings for global structure, as well as with previous studies of intonation and 'topic shift'.</Paragraph> </Section> </Section> </Paper>