<?xml version="1.0" standalone="yes"?> <Paper uid="N04-4035"> <Title>Prosody-based Topic Segmentation for Mandarin Broadcast News</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Natural spoken discourse is composed of a sequence of utterances, not independently generated or randomly strung together, but rather organized according to basic structural principles. This structure in turn guides the interpretation of individual utterances and the discourse as a whole. Formal written discourse signals a hierarchical, tree-based discourse structure explicitly by the division of the text into chapters, sections, paragraphs, and sentences. This structure, in turn, identifies domains for interpretation; many systems for anaphora resolution rely on some notion of locality (Grosz and Sidner, 1986).</Paragraph> <Paragraph position="1"> Similarly, this structure represents topical organization, and thus would be useful in information retrieval to select documents where the primary sections are on-topic, and, for summarization, to select information covering the different aspects of the topic.</Paragraph> <Paragraph position="2"> Unfortunately, spoken discourse does not include the orthographic conventions that signal structural organization in written discourse. Instead, one must infer the hierarchical structure of spoken discourse from other cues.</Paragraph> <Paragraph position="3"> Prior research (Nakatani et al., 1995; Swerts, 1997) has shown that human labelers can more sharply, consistently, and confidently identify discourse structure in a word-level transcription when the original audio recording is available than they can on the basis of the transcribed text alone. This finding indicates that substantial additional information about the structure of the discourse is encoded in the acoustic-prosodic features of the utterance. 
Given the often errorful transcriptions available for large speech corpora, we choose to focus here on fully exploiting the prosodic cues to discourse structure present in the original speech, rather than on the lexical cues or term frequencies of the transcription.</Paragraph> <Paragraph position="4"> In the current set of experiments, we concentrate on sequential segmentation of news broadcasts into individual stories. While a richer hierarchical segmentation is ultimately desirable, sequential story segmentation provides a natural starting point. This level of segmentation can also be most reliably performed by human labelers and thus can be considered most robust, and segmented data sets are publicly available.</Paragraph> <Paragraph position="5"> Furthermore, we apply prosody-based segmentation to Mandarin Chinese. Not only is the use of prosodic cues for topic segmentation much less well studied in general than the use of text cues, but the use of prosodic cues has also been largely limited to English and other European languages.</Paragraph> </Section> </Paper>