<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1033">
<Title>Using collocations for topic segmentation and link detection</Title>
<Section position="8" start_page="0" end_page="0" type="evalu">
<SectionTitle> 5 Experiments </SectionTitle>
<Paragraph position="0"/>
<Section position="1" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 5.1 Topic segmentation </SectionTitle>
<Paragraph position="0"> To evaluate TOPICOLL on segmentation, we applied it to the &quot;classical&quot; task of discovering boundaries between concatenated texts. TOPICOLL was adapted to align boundaries with sentence ends. We used the probabilistic error metric Pk proposed in (Beeferman et al., 1999) to measure segmentation accuracy (see footnote 4). Recall and precision were also computed for the Le Monde corpus to compare TOPICOLL with older systems (see footnote 5). In this case, a match between a boundary found by TOPICOLL and a document break was accepted if the boundary was no farther than 9 plain words from the break.</Paragraph>
<Paragraph position="1"> Footnote 4: Pk evaluates the probability that a randomly chosen pair of words, separated by k words, is wrongly classified, i.e. the two words are found in the same segment by TOPICOLL while they are actually in different ones (miss of a document break), or they are found in different segments by TOPICOLL while they are actually in the same one (false alarm).</Paragraph>
<Paragraph position="2"> Footnote 5: Precision is given by Nt / Nb and recall by Nt / D, where D is the number of document breaks, Nb the number of boundaries found by TOPICOLL and Nt the number of found boundaries that are document breaks.</Paragraph>
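<Paragraph> For reference, these measures can be written compactly as follows. This is an editorial rendering consistent with footnotes 4 and 5 rather than the authors' exact notation: N denotes the number of words in the evaluated text, the indicators delta_ref(i, j) and delta_hyp(i, j) equal 1 when words i and j fall in the same reference (respectively hypothesized) segment and 0 otherwise, and k is commonly set to half the average reference segment length.
\[
P_k \;=\; \frac{1}{N-k}\sum_{i=1}^{N-k}\mathbf{1}\big[\,\delta_{\mathrm{ref}}(i,\,i+k)\neq\delta_{\mathrm{hyp}}(i,\,i+k)\,\big],
\qquad
P=\frac{N_t}{N_b},\quad R=\frac{N_t}{D},\quad F_1=\frac{2PR}{P+R}.
\]
</Paragraph>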
<Paragraph position="3"> The evaluation corpus for French was made up of 49 texts from the Le Monde newspaper, 133 words long on average. Results in Tables 1 and 2 are average values computed from 10 different sequences of these texts. The baseline procedure consisted of randomly choosing a fixed number of sentence ends as boundaries; its results in Tables 1 and 2 are average values over 1,000 draws.</Paragraph>
<Paragraph position="4"> TOPICOLL1 is the system described in section 4.</Paragraph>
<Paragraph position="5"> TOPICOLL2 is the same system without its link detection part. The results of these two variants show that the search for links between segments does not significantly degrade TOPICOLL's segmentation capabilities. TOPICOLL3 is a version of TOPICOLL that relies only on word recurrence.</Paragraph>
<Paragraph position="6"> SEGCOHLEX and SEGAPSITH are the systems described in (Ferret, 1998) and (Ferret and Grau, 2000). TextTiling is our implementation of Hearst's algorithm with its standard parameters.</Paragraph>
<Paragraph position="7"> First, Table 1 shows that TOPICOLL is more accurate when it uses both word recurrence and collocations. Furthermore, it shows that TOPICOLL gets better results than a system that relies only on a collocation network, such as SEGCOHLEX. It also gets better results than a system such as TextTiling, which is based on word recurrence and, like TOPICOLL, works with a local context. Thus, Table 1 confirms the observation reported in (Jobbins and Evett, 1998) that using collocations together with word recurrence is a worthwhile approach for text segmentation.</Paragraph>
<Paragraph position="8"> Moreover, TOPICOLL is more accurate than a system such as SEGAPSITH that depends on topic representations. Its accuracy is also slightly higher than the one reported in (Bigi et al., 1998) for a system that uses topic representations in a probabilistic way: 0.75 precision, 0.80 recall and 0.77 f1-measure, also obtained on a corpus of Le Monde articles.</Paragraph>
<Paragraph position="9"> For English, we used the artificial corpus built by Choi (Choi, 2000) for comparing several segmentation systems. This corpus is made up of 700 samples defined as follows: &quot;A sample is a concatenation of ten text segments. A segment is the first n sentences of a randomly selected document from the Brown corpus&quot;. Each column of Table 3 corresponds to an interval of values for n.</Paragraph>
<Paragraph position="10"> The first seven lines of Table 3 result from Choi's experiments (Choi, 2001). The baseline is a procedure that partitions a document into 10 segments of equal length. CWM is described in (Choi, 2001), U00 in (Utiyama and Isahara, 2001), C99 in (Choi, 2000), DotPlot in (Reynar, 1998) and Segmenter in (Kan et al., 1998).</Paragraph>
<Paragraph position="11"> Table 3 first confirms that the link detection part of TOPICOLL does not degrade its segmentation capabilities. It also shows that TOPICOLL's results on this corpus are significantly lower than its results on the Le Monde corpus. This is partially due to our collocation network for English: its density, i.e. the ratio between its number of collocations and the size of its vocabulary, is 30% lower than the density of the network for French, which certainly has a significant effect. Table 3 also shows that TOPICOLL gets worse results than systems such as CWM, U00, C99 or DotPlot. This can be explained by the fact that TOPICOLL works only with a local context, whereas these systems rely on the whole text they process. As a consequence, they have a global view of texts, but they are also more costly than TOPICOLL from an algorithmic viewpoint. Moreover, link detection makes TOPICOLL functionally richer than they are.</Paragraph>
</Section>
<Section position="2" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 5.2 Global evaluation </SectionTitle>
<Paragraph position="0"> The global evaluation of a system such as TOPICOLL faces a problem: a reference for link detection is relative to a reference for segmentation. Hence, mapping it onto the segments delimited by the system under evaluation is not straightforward. To bypass this problem, we chose an approach close to the one adopted in TDT for the link detection task: we evaluated the probability of an error in classifying each couple of positions in a text as being part of the same topic (Cpsame) or belonging to different topics (Cpdiff). A miss occurs when a couple is classified as being about different topics while its positions are actually about the same topic; a false alarm corresponds to the complementary case.</Paragraph>
<Paragraph position="1"> As the number of Cpdiff couples is generally much larger than the number of Cpsame couples, we randomly selected a number of Cpdiff couples equal to the number of Cpsame couples in order to have a large range of possible values. Table 4 shows the results of TOPICOLL for this measure and compares them to a baseline procedure that randomly sets a fixed number of boundaries and a fixed number of links between the delimited segments. This measure is a first proposal that should certainly be improved, especially to balance misses and false alarms more soundly.</Paragraph>
</Section>
</Section>
</Paper>