<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1202">
  <Title>Using Gene Expression Programming to Construct Sentence Ranking Functions for Text Summarization</Title>
  <Section position="5" start_page="0" end_page="2" type="evalu">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"> We randomly selected 60 documents from the CMPLG corpus for our experiments. The only restriction was that each document must come with an abstract, which serves as the objective summary. Of these 60 documents, 50 are used for training and the remaining 10 for testing. The function set from which the GEP evolves sentence ranking functions consists of +, -, *, /, power, sqrt, exp, log, min, and max, together with the constants 1, 2, 3, 5, and 7. The length of the chromosome is 128. The other GEP control parameters are set as follows: population, 256; probability of crossover, 0.5; probability of mutation, 0.2; probability of rotation, 0.2; generations, 10,000-50,000 (in five runs). Our system produced a five-sentence extractive summary for each of the testing documents and calculated the similarity between the produced summary and the abstract that accompanies the document.</Paragraph>
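The setup described above can be collected in a small configuration object. The following is only a minimal sketch: the names GEPConfig and split_documents, the field names, and the fixed random seed are illustrative assumptions and do not come from the paper. The numeric values are the ones stated in the text, and the generation budget is kept as a range because the per-run values are not given.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class GEPConfig:
    """Control parameters as stated in the text (field names are hypothetical)."""
    chromosome_length: int = 128
    population_size: int = 256
    crossover_prob: float = 0.5
    mutation_prob: float = 0.2
    rotation_prob: float = 0.2
    min_generations: int = 10_000  # lower end of the stated range
    max_generations: int = 50_000  # upper end of the stated range
    num_runs: int = 5
    functions: Tuple[str, ...] = (
        "+", "-", "*", "/", "power", "sqrt", "exp", "log", "min", "max",
    )
    constants: Tuple[int, ...] = (1, 2, 3, 5, 7)


def split_documents(
    doc_ids: Sequence[str],
    n_train: int = 50,
    n_test: int = 10,
    seed: int = 0,  # assumption: the paper does not mention a seed
) -> Tuple[List[str], List[str]]:
    """Randomly pick n_train + n_test documents and split them.

    doc_ids is assumed to contain only documents that have an abstract.
    """
    rng = random.Random(seed)
    chosen = rng.sample(list(doc_ids), n_train + n_test)
    return chosen[:n_train], chosen[n_train:]
```

Calling split_documents on a list of eligible document identifiers returns two disjoint lists of 50 training and 10 testing identifiers; passing a different seed gives a different random selection.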
    <Paragraph position="1"> Ideally, we would like to compare our system with other summarizers. However, because no other summarization system was available to perform the same task, we designed three baseline methods for performance comparison, namely lead-based, randomly-selected, and random-lead-based; such baselines were also adopted by (Brandow et al. 1995; Zechner 1996; Radev et al. 2003).</Paragraph>
    <Paragraph position="2"> The CMPLG corpus is composed of 183 documents from the Computation and Language (cmp-lg) collection, which has been marked up in XML. The documents are scientific papers that appeared in conferences sponsored by the Association for Computational Linguistics (ACL).</Paragraph>
    <Paragraph position="3"> The baseline methods are detailed as follows: o The lead-based method selects the first sentence of each of the first five paragraphs as the summary of each testing document.</Paragraph>
    <Paragraph position="4"> o The randomly-selected method chooses five sentences from a document at random to compose a summary.</Paragraph>
    <Paragraph position="5"> o The random-lead-based method chooses five sentences among the first sentences from all paragraphs in the document at random.</Paragraph>
    <Paragraph position="6"> We performed the random selection 1,000 times and calculated the average similarity over the testing documents for each of the two random-based methods.</Paragraph>
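The three baselines and the repeated random selection can be sketched as follows. This is a hedged illustration, not the authors' code: a document is assumed to be a list of paragraphs, each a list of sentence strings, and all function names are hypothetical. The similarity measure is not specified in this section, so word_overlap (a Jaccard overlap on lowercased words) is only a stand-in for whatever measure the system actually uses.

```python
import random
from typing import Callable, List

Document = List[List[str]]  # assumption: paragraphs, each a list of sentences

SUMMARY_SIZE = 5   # five-sentence summaries, as stated in the text
NUM_TRIALS = 1000  # repetitions of the random selection, as stated in the text


def lead_sentences(doc: Document) -> List[str]:
    """First sentence of every non-empty paragraph."""
    return [para[0] for para in doc if para]


def lead_based(doc: Document, k: int = SUMMARY_SIZE) -> List[str]:
    """First sentences of the first k paragraphs."""
    return lead_sentences(doc)[:k]


def randomly_selected(doc: Document, rng: random.Random,
                      k: int = SUMMARY_SIZE) -> List[str]:
    """k sentences drawn at random from the whole document."""
    sentences = [sent for para in doc for sent in para]
    return rng.sample(sentences, min(k, len(sentences)))


def random_lead_based(doc: Document, rng: random.Random,
                      k: int = SUMMARY_SIZE) -> List[str]:
    """k sentences drawn at random from the paragraph-initial sentences."""
    leads = lead_sentences(doc)
    return rng.sample(leads, min(k, len(leads)))


def word_overlap(summary: List[str], abstract: str) -> float:
    """Placeholder similarity (Jaccard on words); NOT the paper's measure."""
    summary_words = set(" ".join(summary).lower().split())
    abstract_words = set(abstract.lower().split())
    union = summary_words.union(abstract_words)
    if not union:
        return 0.0
    return len(summary_words.intersection(abstract_words)) / len(union)


def average_similarity(
    docs: List[Document],
    abstracts: List[str],
    baseline: Callable[[Document, random.Random], List[str]],
    trials: int = NUM_TRIALS,
    seed: int = 0,  # assumption: fixed seed for repeatability
) -> float:
    """Mean similarity over the documents, averaged over repeated trials.

    Intended for the two random baselines; docs must be non-empty.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        scores = [word_overlap(baseline(doc, rng), abstract)
                  for doc, abstract in zip(docs, abstracts)]
        total += sum(scores) / len(scores)
    return total / trials
```

The lead-based baseline is deterministic, so it needs only a single evaluation per document, whereas randomly_selected and random_lead_based are passed to average_similarity to smooth out the variance of individual draws.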
    <Paragraph position="7"> The experimental results, plotted in Figure 2, show that our system outperforms all three baseline methods.</Paragraph>
  </Section>
</Paper>