<?xml version="1.0" standalone="yes"?> <Paper uid="H05-1032"> <Title>Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pages 249-256, Vancouver, October 2005. (c) 2005 Association for Computational Linguistics. Bayesian Learning in Text Summarization</Title> <Section position="5" start_page="252" end_page="255" type="evalu"> <SectionTitle> 4 Results and Discussion </SectionTitle>
<Paragraph position="0"> Tables 2 through 4 show how the Bayesian summarizer performs on G1K3, G2K3, and G3K3. The tables list results as precision at compression rates (r) of interest (0 < r < 1). The figures indicate performance averaged over leave-one-out cross-validation folds: one text is held out for testing, the rest are used for training, and the procedure is repeated for each text in the data.</Paragraph>
<Paragraph position="1"> Since we have 25 texts for each data set, this amounts to a 25-fold cross-validation. Precision is defined as the ratio of hits (positive sentences) to the number of sentences retrieved, i.e., the top r-percent of sentences in the text.6 (Footnote 6: sentences with at least three votes in their favor are marked positive; that is, for each sentence marked positive, at least three people are in favor of including it in a summary.) In each table, figures to the left of the vertical line indicate the performance of summarizers with BIC/MC, and those to the right that of summarizers without them. Parenthetical figures like '(5K)' and '(20K)' indicate the number of iterations: thus BIC(5K) refers to a summarizer based on C4.5/BIC with scores averaged over 5,000 runs. (A sketch of the precision computation appears below, after the discussion of figures 4 and 5.)</Paragraph>
<Paragraph position="2"> BSE denotes a reference summarizer based on a regular C4.5, which involves no resampling of the training data.</Paragraph>
<Paragraph position="3"> LEAD refers to a summarizer which works by selecting sentences from the top of the text. It is generally considered a hard-to-beat approach in the summarization literature.</Paragraph>
<Paragraph position="4"> Table 4 shows results for G3K3 (a news story domain). There we find a significant improvement in the performance of C4.5, whether it operates with BIC or MC. The effect is clearly visible across the whole range of compression rates, and more so at smaller rates.</Paragraph>
<Paragraph position="5"> Table 3 demonstrates that the Bayesian approach is also effective for G2K3 (an editorial domain), outperforming both BSE and LEAD by a large margin.</Paragraph>
<Paragraph position="6"> Similarly, we find that our approach comfortably beats LEAD on G1K3 (a column domain). Note the dashes for BSE: they mean we obtained no meaningful results, because we were unable to rank sentences based on BSE's predictions. To see how this happens, consider the decision tree BSE builds for G1K3, shown in figure 4: it consists of a single leaf.7 Thus, whatever sentence we feed to the tree, it returns the same membership probability, 65/411. This makes a BSE-based summarizer useless, as it reduces to generating a summary by picking a portion of the text at random.8 Figure 5 shows what happens with the Bayesian model (MC) on the same data: there we see a tree of considerable complexity, with 24 leaves and 18 split nodes.</Paragraph>
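To make the ranking failure concrete, here is a minimal Python sketch (our illustration, not the paper's code) of what happens when sentences are scored with the single-leaf tree described above: every sentence receives the same membership probability, so sorting by score carries no information.

    # Illustration only: a single-leaf tree assigns every sentence the same
    # membership probability (65/411 in the G1K3 case discussed above).
    def single_leaf_score(sentence):
        return 65 / 411

    def summarize(sentences, score, r):
        """Select the top r-percent of sentences by descending score."""
        k = max(1, round(r * len(sentences)))
        ranked = sorted(sentences, key=score, reverse=True)
        return ranked[:k]

    sentences = ["s1", "s2", "s3", "s4", "s5"]
    print(summarize(sentences, single_leaf_score, 0.4))
    # Python's sort is stable, so with all scores tied the "ranking" is just
    # the input order; truncating it to r-percent amounts to picking an
    # arbitrary fixed portion of the text, exactly as noted above.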
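And here, as promised above, is a minimal sketch of the evaluation protocol: leave-one-out cross-validation with precision measured at compression rate r. The data layout and the train/score callables are assumptions made for illustration; the paper's actual learner is C4.5-based.

    # Hypothetical data layout: each text is a list of (sentence, is_positive)
    # pairs; `train` builds a model from texts and `score` rates a sentence.
    def precision_at_r(texts, train, score, r):
        """Leave-one-out precision at compression rate r, averaged over folds."""
        fold_precisions = []
        for i, held_out in enumerate(texts):
            model = train(texts[:i] + texts[i + 1:])   # train on all but one text
            k = max(1, round(r * len(held_out)))       # retrieve top r-percent
            ranked = sorted(held_out,
                            key=lambda pair: score(model, pair[0]),
                            reverse=True)
            hits = sum(1 for _, positive in ranked[:k] if positive)
            fold_precisions.append(hits / k)           # hits / sentences retrieved
        return sum(fold_precisions) / len(fold_precisions)

With 25 texts per data set, this is exactly the 25-fold setup described at the start of the section.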
<Paragraph position="7"> Let us now turn to the issues with λ. As we might recall, λ influences the shape of a Dirichlet distribution: a large value of λ gives the distribution less variance and hence a sharper peak around the expectation. This means that increasing λ makes samples drawn from the distribution more likely to lie close to the expectation; as a consequence, the MC model behaves more like the BIC model, which is based on MAP estimates. That this is indeed the case is demonstrated by table 5, which gives results for the MC model on G1K3 through G3K3 at λ = 1. We see that MC behaves less like BIC at λ = 1 than at λ = 5 (tables 2 through 4).</Paragraph>
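The claim about λ is easy to check numerically. The following numpy sketch assumes an illustrative base measure and the parameterization alpha = λ × base (not necessarily the paper's exact prior); it draws Dirichlet samples at increasing λ and measures their spread around the fixed expectation.

    import numpy as np

    rng = np.random.default_rng(0)
    base = np.array([2.0, 3.0, 5.0])      # assumed base measure (illustrative)
    mean = base / base.sum()              # expectation; independent of lambda

    for lam in (1, 5, 50):
        # Scaling the Dirichlet parameter by lambda leaves the expectation
        # unchanged but shrinks the variance as lambda grows.
        samples = rng.dirichlet(lam * base, size=5000)
        spread = np.abs(samples - mean).mean()
        print(f"lambda = {lam:2d}: mean |sample - expectation| = {spread:.4f}")

    # The printed spread shrinks as lambda grows: with a sharply peaked
    # Dirichlet, sampled parameters barely differ from the expectation, so
    # averaging model scores over such samples (the MC model) approaches
    # scoring once at a point estimate (the MAP-based BIC model).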