File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/92/h92-1089_intro.xml
Size: 8,673 bytes
Last Modified: 2025-10-06 14:05:16
<?xml version="1.0" standalone="yes"?> <Paper uid="H92-1089"> <Title>Intonational Features of Local and Global Discourse Structure</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> INTRODUCTION 3. SCOPE OF THE STUDY </SectionTitle> <Paragraph position="0"> Although computational theories of discourse make different claims about the basis of discourse structure - e.g. coherence relations [9, 10, 11, 12], syntactic features [13], intentions [8] - all agree that utterances in a discourse group together into segments and that the determination of discourse meaning depends crucially on identifying the ways segments fit together.</Paragraph> <Paragraph position="1"> However, discourse segment boundaries do not always align with paragraph boundaries or other orthographic markers in text. Moreover, there have been no systematic studies of human labeling of discourse segmentation. As a result, attempts to apply theories of discourse structure have sometimes been frustrated by apparent ambiguities in the structure of a single discourse. Thus, one goal of our study was to identify similarities and differences among labelers in the segmentation of discourses from text and speech. We wanted to (1) determine whether a set of instructions could be devised that would lead to consistency in segmentation across different labelers and different texts; (2) test the hypothesis that spoken language is less ambiguous than text with respect to discourse segment structure; and (3) identify intonational features that were strongly correlated with discourse structure elements. We did not, of course, expect all labelings to be identical. Just as a sentence may have multiple parses, a discourse may have several plausible segmentations. The goal of this part of our study was to determine the extent to which segmentations done by different people varied, to identify those characteristics of a text that occasioned structural ambiguity, and to develop methods for comparing segmentations. The hypothesis that discourse structure is signalled by variation in intonational features such as pitch range, timing, and amplitude has been examined in studies such as [1, 2, 3, 4, 5, 6, 7]. However, as Brown and her colleagues note [2, p. 27]: &quot;... 
until an independent theory of topic-structure is formulated, much of our argument in this area is in danger of circularity.&quot; In this paper we examine the relationship between discourse structure and variation in intonational features using just such an independent model of discourse structure, that proposed by Grosz and Sidner [8] (G&amp;S). We present results of an empirical study comparing intonational features of read text with elements of both the local and global structure of discourse. Our study has immediate application to the generation of appropriate intonational features for synthetic speech, and future applicability to the recognition of discourse structure in speech recognition tasks. Our corpus consisted of AP news stories recorded by a professional speaker. The intonational features we considered included pitch range, contour, timing, and amplitude. The discourse structural elements we examined at the local level included parentheticals, quotations, tags, and indirect reported speech; at the global level, we studied discourse segmentation - the division of a discourse into constituents that provide the basis for determining discourse meaning. The discourses were labeled by two groups: one group labeled from text; the other group ... Variation in pitch range has often been seen as conveying 'topic structure' in discourse. Brown et al. [2] found that subjects typically started new topics relatively high in their pitch range and finished topics by compressing their range; they hypothesized that internal structure within a topic was similarly marked. 
Silverman [3] found that manipulation of pitch range alone, or in conjunction with pausal duration between utterances, enabled subjects to reliably disambiguate potentially ambiguous topic structures. Avesani and Vayra [6] also found variation in range in productions by a professional speaker that appears to correlate with topic structure, and Ayers [7] found that pitch range appears to correlate more closely with hierarchical topic structure in read speech than in spontaneous speech. Duration of pause between utterances or phrases has also been identified as an indicator of topic structure by [2, 1, 6], with longer pauses marking major topic shifts; [4], however, found no such correlation in his data.</Paragraph> <Paragraph position="2"> Amplitude was also found by [2] to increase at the start of a new topic and decrease at the end. Speaking rate has also been investigated [14] as a correlate of structural variation. Our second goal was to examine the conjecture that speech provides information that enables a listener to identify one of several possible analyses of a discourse as that which a speaker intends to communicate. In their model, G&amp;S propose that discourse be understood in terms of the purposes that underlie it. They argue that three distinct components play a role in discourse structure: the utterances composing the discourse divide into segments forming the LINGUISTIC STRUCTURE; this structure derives from a combination of the INTENTIONAL STRUCTURE, which is a structure of the purposes or intentions underlying the discourse, and the ATTENTIONAL STATE, which represents the entities and attributes that are salient during a particular portion of the discourse. Discourses are analyzed as hierarchies of discourse segments. 
Each segment has an underlying purpose intended by the speaker/writer to be recognized by the listener/reader, the DISCOURSE SEGMENT PURPOSE (DSP).</Paragraph> <Paragraph position="3"> Each DSP contributes to the overall DISCOURSE PURPOSE (DP) of the discourse. For example, a discourse might have as its DP the intention that the listener be informed that there was a plane accident, and individual segments forming that discourse might have as their DSPs intentions that the listener be informed that the plane lost a piece of its tail (an intention contributing information about the accident) and that the passengers were upset (an intention contributing information about the effect of this event). DSPs may in turn be represented as hierarchies of intentions. DSPs a and b may be related to one another in two ways: a DOMINATES b if the DSP of a is partially fulfilled by the DSP of b (equivalently, b CONTRIBUTES TO a). Segment a SATISFACTION-PRECEDES b if the DSP of a must be achieved in order for the DSP of b to be successful. According to this model, part of understanding a discourse is reconstructing the DP, DSPs and relations among them. We expected differences between the segmentations provided by labelers who labeled solely from text and those who labeled from speech. We also hoped to discover independent, albeit indirect, evidence from intonational variation for the existence of segment boundaries, as well as to provide information about the ways in which intonational features might signal discourse segmentation. In addition to investigating relationships between discourse structure and intonation at the global level, our study examined several local discourse-structural elements. For spoken language, the determination of discourse structural units at the local level (e.g. identifying parenthetical constituents and quotations) may crucially affect meaning. 
For example, the sentence &quot;The government claims the defendants knew that William Parkin a private consultant hired by Teledyne Electronics was paying bribes to Stuart Berlin the Navy official&quot; may, depending upon how it is uttered, be interpreted to mean that (a) the government claims that the defendants knew X (simple complement); (b) the government claims X, but the defendants knew, X (right-node-raising); or, (c) the defendants knew that the government claims that X (parenthetical) - where X = 'that William Parkin a private consultant hired by Teledyne Electronics was paying bribes to Stuart Berlin the Navy official'. Because these locally distinct units are often marked orthographically in text, it is presumably easier for readers to agree upon them than on the identification of segment boundaries. Thus, looking for intonational features associated with these local structures minimizes the potential for interlabeler disagreement. As a result, these structures may provide less equivocal evidence of how speakers use intonational features to convey information about discourse structure.</Paragraph> </Section> </Paper>