<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1623"> <Title>Inducing Temporal Graphs</Title> <Section position="10" start_page="194" end_page="195" type="evalu"> <SectionTitle> 7 Results </SectionTitle> <Paragraph position="0"> We evaluate temporal segmentation using leave-one-out cross-validation on our corpus of 60 summaries. The segmentation algorithm achieves an F-measure of 83%, with a recall of 78% and a precision of 89%.</Paragraph> <Paragraph position="1"> Footnote 8: It took approximately one hour to build a TDAG for each segmented document.</Paragraph> <Paragraph position="2"> To evaluate segment ordering, we employ leave-one-out cross-validation on 30 annotated TDAGs that together contain 13,088 edges in their transitive closure. In addition to the three global inference algorithms, we include a majority baseline that classifies all edges as forward, yielding a chronological order.</Paragraph> <Paragraph position="3"> Our results for ordering the manually annotated temporal segments are shown in Table 1. All inference methods outperform the baseline, and their performance is consistent with the complexity of the inference mechanism. As expected, the ILP strategy, which supports exact global inference, achieves the best performance (84.3%).</Paragraph> <Paragraph position="4"> An additional point of comparison is the accuracy of the pairwise classification, prior to the application of global inference. The accuracy of the local ordering is 81.6%, which is lower than that of ILP. The superior performance of ILP demonstrates that accurate global inference can further refine local predictions. Surprisingly, the local classifier yields a higher accuracy than the two other inference strategies. 
Note, however, that the local ordering procedure is not guaranteed to produce a consistent TDAG, and thus the local classifier cannot be used on its own to produce a valid TDAG.</Paragraph> <Paragraph position="5"> Table 2 shows the ordering results at the clausal level. The four-way classification is computed using both manually and automatically generated segments. Pairs of clauses that belong to the same segment stand in the equal relation; otherwise, they have the same ordering relation as the segments to which they belong.</Paragraph> <Paragraph position="6"> On the clausal level, the difference between the performance of ILP and BF is blurred. When evaluated on manually constructed segments, ILP outperforms BF by less than 1%. This unexpected result can be explained by the skewed distribution of edge types: the two hardest edge types to classify (see Table 3), backward and null, account for only 7.4% of all edges at the clause level.</Paragraph> <Paragraph position="7"> When evaluated on automatically segmented text, ILP performs slightly worse than BF. We hypothesize that this result can be explained by the difference between training and testing conditions for the pairwise classifier: the classifier is trained on manually computed segments and is tested on automatically computed ones, which negatively affects the accuracy on the test set. While all the strategies are negatively influenced by this discrepancy, ILP is particularly vulnerable as it relies on the score values for inference.</Paragraph> <Paragraph position="8"> In contrast, BF only considers the rank between the scores, which may be less affected by noise.</Paragraph> <Paragraph position="9"> We advocate a two-stage approach for temporal analysis: we first identify segments and then order them. 
A simpler alternative is to directly perform a four-way classification at the clausal level using the union of features employed in our two-stage process. The accuracy of this approach, however, is low -- it achieves only 74%, most likely due to the sparsity of clause-level representation for four-way classification. This result demonstrates the benefits of a coarse representation and a two-stage approach for temporal analysis.</Paragraph> </Section> </Paper>
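<!--
Editorial sketch (not part of the original paper). Two computations described in the Results section can be illustrated in a few lines of Python. The first assumes that the reported F-measure is the balanced harmonic mean of precision and recall; under that assumption, a precision of 89% and a recall of 78% do reproduce the reported 83%. The second shows how a clause pair could inherit its relation from the segments it belongs to, as described for Table 2. The function names, the dictionary layouts, and the label strings are hypothetical and chosen only for illustration; the paper does not specify an implementation.

```python
from typing import Dict, Hashable, Tuple

# Hypothetical label strings for the four-way classification
# (forward, backward, null, and equal are the relations named in the text).
FORWARD, BACKWARD, NULL, EQUAL = "forward", "backward", "null", "equal"


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (harmonic mean); assumed, not stated, in the paper."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def clause_relation(
    clause_a: Hashable,
    clause_b: Hashable,
    segment_of: Dict[Hashable, int],
    segment_relation: Dict[Tuple[int, int], str],
) -> str:
    """Project a segment-level relation onto a pair of clauses.

    Clauses in the same segment are 'equal'; otherwise the pair takes the
    relation of its two segments. The lookup assumes segment_relation holds
    an entry for the ordered pair (segment of a, segment of b).
    """
    seg_a = segment_of[clause_a]
    seg_b = segment_of[clause_b]
    if seg_a == seg_b:
        return EQUAL
    return segment_relation[(seg_a, seg_b)]


# Toy data (invented): three clauses, two segments, one ordered segment pair.
example_segment_of = {"c1": 0, "c2": 0, "c3": 1}
example_segment_relation = {(0, 1): FORWARD}

print(round(f_measure(0.89, 0.78), 2))
print(clause_relation("c1", "c2", example_segment_of, example_segment_relation))
print(clause_relation("c1", "c3", example_segment_of, example_segment_relation))
```

Keeping the segment relation in a dictionary keyed by ordered segment pairs means the clause-level label is a pure lookup, so any error at the clause level comes either from segmentation or from segment ordering, which matches the two-stage framing of the section.
-->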