<?xml version="1.0" standalone="yes"?> <Paper uid="N03-3009"> <Title>Spoken and Written News Story Segmentation using Lexical Chains</Title> <Section position="7" start_page="0" end_page="0" type="concl"> <SectionTitle> 5 Conclusions </SectionTitle> <Paragraph position="0"> In this paper we have presented an approach based on lexical chaining to the coarse-grained segmentation of CNN news transcripts and concatenated Reuters newswire articles.</Paragraph> <Paragraph position="1"> We have shown that the performance of our SeLeCT system exceeds that of the TextTiling and C99 systems when detecting topic shifts in CNN transcripts. However, a similar experiment on Reuters news stories showed that the C99 system outperformed all other systems on this written news collection. Overall, the lower CNN segmentation results were attributed to the information loss caused by prosodic and paralinguistic characteristics of speech and to grammatical differences between written and spoken modes of expression. Further experiments showed that limiting the input of all the segmentation systems to nouns, adjectives, and nominalized verbs and adjectives significantly reduced the effect of these grammatical differences on CNN segmentation performance. Further improvements in SeLeCT's performance were achieved by using referential and conjunctive relationships as additional evidence of cohesion in the boundary detection step. In future experiments we plan to compare SeLeCT's performance on written and spoken news texts with that of two recently proposed systems, U00 (Utiyama 2001) and CWM (Choi 2001), which have marginally outperformed the C99 algorithm on Choi's (2000) test corpus.</Paragraph> </Section> </Paper>