<?xml version="1.0" standalone="yes"?>
<Paper uid="P96-1038">
  <Title>A Prosodic Analysis of Discourse Segments in Direction-Giving Monologues</Title>
  <Section position="4" start_page="0" end_page="286" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> 1 Also called DISCOURSE MARKERS or DISCOURSE PARTICLES, these are items such as now, first, and by the way, which explicitly mark discourse structure.</Paragraph>
    <Paragraph position="1"> [...] identified as an indicator of topic structure, with longer pauses marking major topic shifts (Lehiste, 1979; Brown, Currie, and Kenworthy, 1980; Avesani and Vayra, 1988; Passonneau and Litman, 1993); Woodbury (1987), however, found no such correlation in his data. Amplitude was also found to increase at the start of a new topic and decrease at the end (Brown, Currie, and Kenworthy, 1980). Swerts and colleagues (1992) found that melody and duration can pre-signal the end of a discourse unit, in addition to marking the discourse-unit-final utterance itself. Speaking rate has also been found to correlate with structural variation: in several studies (Lehiste, 1980; Brubaker, 1972; Butterworth, 1975), segment-initial utterances exhibited slower rates and segment-final utterances faster rates. Swerts and Ostendorf (1995), however, report negative rate results. In general, these studies have lacked an independently motivated notion of discourse structure. With few exceptions, they rely on intuitive analyses of topic structure; operational definitions of discourse-level properties (e.g., interpreting paragraph breaks as discourse segment boundaries); or 'theory-neutral' discourse segmentations, where subjects are simply instructed to mark changes in topic. Recent studies have focused on the question of whether discourse structure itself can be empirically determined in a reliable manner, a prerequisite to investigating linguistic cues to its existence. An intention-based theory of discourse was used in (Hirschberg and Grosz, 1992; Grosz and Hirschberg, 1992) to identify intonational correlates of discourse structure in news stories read by a professional speaker. Discourse structural elements were determined by experts in the Grosz and Sidner (1986) theory of discourse structure, based on either text alone or text and speech. This study revealed strong correlations of aspects of pitch range, amplitude, and timing with features of global and local structure for both segmentation methods. Passonneau and Litman (to appear) analyzed correlations of pause, as well as cue phrases and referential relations, with discourse structure; their segmenters were asked to identify speakers' communicative &quot;actions&quot;. The present study addresses issues of speaking style and segmentation method while exploring, in more detail than previous studies, the prosodic parameters that characterize initial, medial, and final utterances in a discourse segment.</Paragraph>
  </Section>
  <Section position="5" start_page="286" end_page="287" type="metho">
    <SectionTitle>
3 Methods
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="286" end_page="286" type="sub_section">
      <SectionTitle>
3.1 The Boston Directions Corpus
</SectionTitle>
      <Paragraph position="0"> The current investigation of discourse and intonation is based on analysis of a corpus of spontaneous and read speech, the Boston Directions Corpus. 2 [...] collected in collaboration with Barbara Grosz.</Paragraph>
      <Paragraph position="1"> This corpus comprises elicited monologues produced by multiple non-professional speakers, who were given written instructions to perform a series of nine increasingly complex direction-giving tasks. Speakers first explained simple routes such as getting from one station to another on the subway, and progressed gradually to the most complex task of planning a round-trip journey from Harvard Square to several Boston tourist sights. Thus, the tasks were designed to require increasing levels of planning complexity. The speakers were provided with various maps, and could write notes to themselves as well as trace routes on the maps. For the duration of the experiment, the speakers were in face-to-face contact with a silent partner (a confederate) who traced on her map the routes described by the speakers.</Paragraph>
      <Paragraph position="2"> The speech was subsequently orthographically transcribed, with false starts and other speech errors repaired or omitted; subjects returned several weeks after their first recording to read aloud from transcriptions of their own directions.</Paragraph>
    </Section>
    <Section position="2" start_page="286" end_page="286" type="sub_section">
      <SectionTitle>
3.2 Acoustic-Prosodic Analysis
</SectionTitle>
      <Paragraph position="0"> For this paper, the spontaneous and read recordings for one male speaker were acoustically analyzed; fundamental frequency and energy were calculated using Entropic speech analysis software. The prosodic transcription, a more abstract representation of the intonational prominences, phrasing, and melodic contours, was obtained by hand-labeling. We employed the ToBI standard for prosodic transcription (Pitrelli, Beckman, and Hirschberg, 1994), which is based upon Pierrehumbert's theory of American English intonation (Pierrehumbert, 1980). The ToBI transcription provided us with a breakdown of the speech sample into minor or INTERMEDIATE PHRASES (Pierrehumbert, 1980; Beckman and Pierrehumbert, 1986). This level of prosodic phrase served as our primary unit of analysis for measuring both speech and discourse properties. The portion of the corpus we report on consists of 494 and 552 intermediate phrases for read and spontaneous speech, respectively.</Paragraph>
    </Section>
    <Section position="3" start_page="286" end_page="287" type="sub_section">
      <SectionTitle>
3.3 Discourse Segmentation
</SectionTitle>
      <Paragraph position="0"> In our research, the Grosz and Sidner (1986) theory of discourse structure, hereafter G&amp;S, provides a foundation for segmenting discourses into constituent parts. According to this model, at least three components of discourse structure must be distinguished. The utterances composing the discourse divide into segments that may be embedded relative to one another. These segments and the embedding relationships between them form the LINGUISTIC STRUCTURE. The embedding relationships reflect changes in the ATTENTIONAL STATE, the dynamic record of the entities and attributes that are salient during a particular part of the discourse.</Paragraph>
      <Paragraph position="1"> Changes in linguistic structure, and hence attentional state, depend on the discourse's INTENTIONAL STRUCTURE; this structure comprises the intentions or DISCOURSE SEGMENT PURPOSES (DSPs) underlying the discourse and relations between DSPs.</Paragraph>
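      The embedding of segments and their purposes described above can be pictured as a small tree. The following Python sketch is only an illustration and is not taken from the paper: the class name Segment, its fields, the function embedding_depth, and the example purposes are all hypothetical, and indexing segments by intermediate-phrase position is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Segment:
    """One discourse segment (hypothetical representation)."""
    purpose: str        # the discourse segment purpose (DSP), as free text
    first_phrase: int   # index of the first phrase in the segment (inclusive)
    last_phrase: int    # index of the last phrase in the segment (inclusive)
    children: List["Segment"] = field(default_factory=list)  # embedded segments


def embedding_depth(root: Segment) -> Dict[int, int]:
    """Map each phrase index to the depth of the innermost segment containing it."""
    depths: Dict[int, int] = {}

    def visit(segment: Segment, depth: int) -> None:
        # Record this segment's depth for every phrase it spans ...
        for index in range(segment.first_phrase, segment.last_phrase + 1):
            depths[index] = depth
        # ... then let embedded segments overwrite the phrases they cover.
        for child in segment.children:
            visit(child, depth + 1)

    visit(root, 0)
    return depths


# Invented example: a six-phrase monologue with one embedded segment.
example = Segment(
    purpose="explain the whole route",
    first_phrase=0,
    last_phrase=5,
    children=[Segment(purpose="describe the subway leg", first_phrase=1, last_phrase=3)],
)
example_depths = embedding_depth(example)
```

      In this sketch the nested Segment objects stand in for the linguistic structure and the purpose strings for the DSPs; the attentional state is not modeled.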
      <Paragraph position="2"> Two methods of discourse segmentation were employed by subjects who had expertise in the G&amp;S theory. Following Hirschberg and Grosz (1992), three subjects labeled from text alone (group T) and three labeled from text and speech (group S). Other than this difference in input modality, all subjects received identical written instructions. The text for each task was presented with line breaks corresponding to intermediate phrase boundaries (i.e., ToBI BREAK INDICES of level 3 or higher; Pitrelli, Beckman, and Hirschberg, 1994). In the instructions, subjects were essentially asked to analyze the linguistic and intentional structures by segmenting the discourse, identifying the DSPs, and specifying the hierarchical relationships among segments.</Paragraph>
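      The line-break rule above (a new line after every break index of level 3 or higher) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the study: the function name, the representation of the transcription as (word, break index) pairs, and the example words and indices are all invented for the example.

```python
from typing import List, Tuple


def split_into_intermediate_phrases(
    words: List[Tuple[str, int]], min_break: int = 3
) -> List[List[str]]:
    """Group words into phrases, closing a phrase after any word whose
    following break index is at least min_break."""
    phrases: List[List[str]] = []
    current: List[str] = []
    for word, break_index in words:
        current.append(word)
        if break_index >= min_break:
            phrases.append(current)
            current = []
    if current:
        # Keep trailing words even if the final break index is missing or low.
        phrases.append(current)
    return phrases


# Invented example: each pair is (word, break index after that word).
labeled_words = [
    ("take", 1), ("the", 1), ("red", 1), ("line", 3),
    ("to", 1), ("park", 1), ("street", 4),
]
phrases = split_into_intermediate_phrases(labeled_words)
lines_for_labelers = [" ".join(phrase) for phrase in phrases]
```

      Each element of lines_for_labelers would correspond to one line of the text shown to a labeler under this assumed representation.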
    </Section>
  </Section>
</Paper>