<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-2032">
  <Title>Story Segmentation of Broadcast News in English, Mandarin and Arabic</Title>
  <Section position="6" start_page="126" end_page="127" type="evalu">
    <SectionTitle>
5 Results and Discussion
</SectionTitle>
    <Paragraph position="0"> We report the results of our system on English, Mandarin and Arabic in Table 5. All results use show-specific modeling, which consistently improved our results across all metrics, reducing errors by between 10% and 30%. In these tables, we report the F-measure of identifying the precise location of a story boundary as well as three metrics designed specifically for this type of segmentation task: the pk metric (Beeferman et al., 1999), WindowDiff (Pevzner and Hearst, 2002) and Cseg (Pseg = 0.3) (Doddington, 1998). All three are derived from the pk metric (Beeferman et al., 1999), and for all, lower values imply better performance. For each of these three metrics we let k = 5, as prescribed in (Beeferman et al., 1999).</Paragraph>
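    The two window-based metrics named above can be illustrated with a minimal Python sketch. It is not the evaluation code used for the reported results. It assumes each segmentation is a list of 0/1 flags, one per unit, where 1 marks a story boundary after that unit, and it uses one common boundary-indicator formulation of pk. The function names, the helper _window_count and the choice of unit are hypothetical, and Cseg is omitted.

```python
def _window_count(ref, hyp, k):
    """Validate the inputs and return the number of width-k windows."""
    if len(ref) != len(hyp):
        raise ValueError("segmentations must have equal length")
    if k > len(ref) or not k >= 1:
        raise ValueError("k must be between 1 and the sequence length")
    return len(ref) - k + 1


def pk(ref, hyp, k=5):
    """Fraction of windows where ref and hyp disagree on whether the
    window contains any boundary at all."""
    n = _window_count(ref, hyp, k)
    errors = sum(
        (sum(ref[i:i + k]) > 0) != (sum(hyp[i:i + k]) > 0)
        for i in range(n)
    )
    return errors / n


def window_diff(ref, hyp, k=5):
    """Fraction of windows where ref and hyp contain a different
    number of boundaries (Pevzner and Hearst, 2002)."""
    n = _window_count(ref, hyp, k)
    errors = sum(
        sum(ref[i:i + k]) != sum(hyp[i:i + k])
        for i in range(n)
    )
    return errors / n


# Toy example: the hypothesis inserts one spurious boundary.
reference = [0, 1, 0, 0, 0, 0]
hypothesis = [0, 1, 1, 0, 0, 0]
print(pk(reference, hypothesis, k=3))           # 0.25
print(window_diff(reference, hypothesis, k=3))  # 0.75
```

    In this toy case the spurious boundary is penalised more heavily by WindowDiff than by pk, because WindowDiff compares boundary counts per window rather than only their presence. For both metrics, lower values are better.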
    <Paragraph position="1"> In every system, the best performing results are achieved by including all features from the lexical, acoustic and speaker-dependent feature sets. Across all languages, our precision and false alarm rates are better than recall and miss rates. We believe that inserting erroneous story boundaries will lead to more serious downstream errors in anaphora resolution and summarization than a boundary omission will. Therefore, high precision is more important than high recall for a helpful story segmentation system. In the English and Mandarin systems, the lexical and acoustic feature sets perform similarly, and combine to yield improved results. However, on the Arabic data, the acoustic feature set performs quite poorly, suggesting that the use of vocal cues to topic transitions may be fundamentally different in Arabic. Moreover, these differences are not simply differences of degree or direction. Rather, the acoustic indicators of topic shifts in English and Mandarin are, simply, not discriminative when applied to Arabic. This difference may be due to the style of Arabic newscasts or to the language itself. Across configurations, we find that the inclusion of features derived from automatic speaker identification (feature set S), errorful as it is, significantly improves performance. This improvement is particularly pronounced on the Mandarin material; in China News Radio broadcasts, story boundaries are very strongly correlated with speaker transitions.</Paragraph>
    <Paragraph position="2"> It is difficult to determine how well our system performs against state-of-the-art story segmentation.</Paragraph>
    <Paragraph position="3"> There are no comparable results for the TDT-4 corpus. On the English TDT-2 corpus, (Shriberg et al., 2000) report a Cseg score of 0.1438. While our score of 0.0670 is less than half that, we hesitate to conclude that our system is significantly better than this system; since the (Shriberg et al., 2000) results are based on a word-level segmentation, the discrepancy may be influenced by the disparate datasets as well as the performance of the two systems. On CNN and Reuters stories from the TDT-1 corpus, (Stokes, 2003) report a Pk score of 0.25 and a WD score of 0.253.</Paragraph>
    <Paragraph position="4"> Our Pk score is better than this on TDT-4, while our WD score is worse. (Chaisorn et al., 2003) report an F-measure of 0.532 using only audio-based features on the TRECVID 2003 corpus, which is higher than that of our system; however, their evaluation allows correct boundaries to fall within 5 seconds of reference boundaries. (Franz et al., 2000) present a system which achieves Cseg scores of 0.067 on Mandarin BN and 0.081 on English audio in TDT-3. This suggests that their system may be better than ours on Mandarin, and worse on English, although we trained and tested on different corpora. Finally, we are unaware of any reported story segmentation results on Arabic BN.</Paragraph>
  </Section>
</Paper>