<?xml version="1.0" standalone="yes"?> <Paper uid="P03-1071"> <Title>Discourse Segmentation of Multi-Party Conversation</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Topic segmentation aims to automatically divide text documents, audio recordings, or video segments into topically related units. While extensive research has targeted the topic segmentation of written texts and spoken monologues, few studies have addressed the segmentation of conversations with many participants (e.g., meetings). In this paper, we present an algorithm for segmenting meeting transcripts. This study uses recorded meetings with typically six to eight participants, whose informal style includes ungrammatical sentences and overlapping speech. These meetings generally do not have pre-set agendas, and the topics discussed in the same meeting may or may not be related.</Paragraph> <Paragraph position="1"> The meeting segmenter comprises two components. The first exploits word distribution to identify homogeneous, topically cohesive units. The second analyzes conversational features of meeting transcripts that are indicative of topic shifts, such as silences, overlaps, and speaker changes. We show that integrating features from both components with a probabilistic classifier (induced with c4.5rules) markedly improves performance.</Paragraph> <Paragraph position="2"> In Section 2, we review previous approaches to segmenting spoken and written documents. In Section 3, we describe the corpus of recorded meetings to be segmented and the annotation of its discourse structure. In Section 4, we present our text-based segmentation component, which relies mainly on lexical cohesion, particularly term repetition, to detect topic boundaries.
We evaluate this segmentation against other lexical cohesion segmentation programs and show that its performance is state-of-the-art. In the next section, we describe conversational features, such as silences and speaker changes, as well as other features like cue phrases. We present a machine learning approach for integrating these conversational features with the text-based segmentation module. Experimental results show a marked improvement in meeting segmentation when both sets of features are incorporated. We close with discussion and conclusions.</Paragraph> </Section></Paper>