<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1202">
  <Title>Using Gene Expression Programming to Construct Sentence Ranking Functions for Text Summarization</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 System Architecture
</SectionTitle>
    <Paragraph position="0"> Like earlier extractive systems (Edmundson 1969; Lin 1999; Kupiec et al. 1995; Brandow 1995; Zechner 1996), our system composes a summary from the highest-ranked sentences in a document; in addition, we embed a machine learning mechanism that learns the ranking function. The system architecture is shown in Figure 1, where the GEP module is highlighted. In the training stage, each training document is preprocessed into a set of sentence feature vectors and then passed to the GEP module. GEP runs for m generations, and each generation contains a population of p candidate sentence scoring functions, encoded as GEP chromosomes. Every candidate scoring function is applied to the sentence feature vectors of every training document and produces a score for each sentence. All sentences in the same training document are then ranked by score, and the n top-scoring sentences are selected as an extract.</Paragraph>
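    <Paragraph> The ranking-and-selection step described above can be sketched as follows. This is a minimal illustration, not the system's implementation: the names select_extract and example_score, the two-feature vectors, and the choice to return the selected sentences in document order are all assumptions made for the example.

```python
# Hypothetical names throughout; the paper does not specify this interface.

def select_extract(feature_vectors, score_fn, n):
    """Return indices of the n highest-scoring sentences, in document order."""
    scores = [score_fn(vec) for vec in feature_vectors]
    # Rank sentence indices by descending score (the sort is stable for ties).
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # ASSUMPTION: keep the top n and restore document order for readability.
    return sorted(ranked[:n])


def example_score(vec):
    # Stand-in for a GEP-evolved function: a fixed weighted sum of two features.
    position, keyword_hits = vec
    return 0.5 * position + 1.0 * keyword_hits


# One toy document with four sentences, each described by two made-up features.
features = [(1.0, 0.0), (0.5, 3.0), (0.2, 1.0), (0.1, 2.0)]
extract = select_extract(features, example_score, n=2)
print(extract)
```

    During training, score_fn would be the function decoded from one chromosome, and select_extract would be called once per training document for each candidate.</Paragraph>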
    <Paragraph position="1"> The next step is to measure how similar the extract is to the objective summary. As discussed by McLellan et al. (2001), Goldstein et al. (1999), and McKeown et al. (2001), evaluating the quality of a summary often requires human judges.</Paragraph>
    <Paragraph position="2"> Human evaluation is impractical inside a machine learning procedure, where every candidate function must be evaluated in every generation. We therefore approximate it with the cosine similarity measure, commonly used in Information Retrieval to calculate the relevance between two documents, and use it to compute the similarity between an extract and the objective summary. We compute the similarity between each extract and its objective summary, and feed the results into the Fitness Calculation module to obtain a fitness measure for the candidate sentence ranking function under consideration:</Paragraph>
    <Paragraph position="4"> is its objective summary.</Paragraph>
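    <Paragraph> The similarity and fitness computation can be sketched as follows. The fitness formula itself is not present in this copy of the text, so the sketch assumes that fitness is the mean cosine similarity between each extract and its objective summary; that averaging, the whitespace tokenization, the raw term-frequency weighting, and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between raw term-frequency vectors of two texts."""
    # ASSUMPTION: lowercase whitespace tokens with unweighted term counts.
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    dot = sum(count * tf_b[term] for term, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def fitness(extracts, objective_summaries):
    """ASSUMPTION: fitness is the mean similarity over the training documents."""
    sims = [cosine_similarity(e, o) for e, o in zip(extracts, objective_summaries)]
    return sum(sims) / len(sims)


# Two toy training documents: one extract matches its summary, one does not.
score = fitness(["gene expression", "gene expression"],
                ["gene expression", "text summary"])
print(score)
```

    A candidate whose extracts share more vocabulary with the objective summaries receives a higher value under this sketch, which is the property the Fitness Calculation module needs.</Paragraph>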
    <Paragraph position="5"> After the fitness value of every chromosome in the current generation has been computed, the genetic operators are applied to the GEP population to produce the next generation. Once the specified number of generations has been reached, the best chromosome found is returned as an optimal sentence ranking function for the training set and can then be applied to a test document to produce an extractive summary.</Paragraph>
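    <Paragraph> The generational loop can be sketched as a skeleton. This is not GEP itself: chromosomes here are plain weight lists, the single mutation operator stands in for the full set of GEP genetic operators, and keeping the better half as parents and carrying the best chromosome over unchanged are assumptions. The names evolve, new_weights, mutate and toy_fitness are hypothetical.

```python
import random


def evolve(fitness_fn, new_chromosome, vary, p, m, seed=0):
    """Run m generations over a population of p chromosomes; return the best."""
    rng = random.Random(seed)
    population = [new_chromosome(rng) for _ in range(p)]
    best = max(population, key=fitness_fn)
    for _ in range(m):
        # ASSUMPTION: the better half of the population serves as parents.
        parents = sorted(population, key=fitness_fn, reverse=True)[: max(1, p // 2)]
        # ASSUMPTION: the best chromosome is carried over unchanged.
        population = [best] + [vary(rng.choice(parents), rng) for _ in range(p - 1)]
        best = max(population, key=fitness_fn)
    return best


def new_weights(rng):
    # Toy chromosome: two feature weights drawn uniformly at random.
    return [rng.uniform(-1.0, 1.0) for _ in range(2)]


def mutate(weights, rng):
    # Stand-in for the GEP operators: perturb one randomly chosen weight.
    child = list(weights)
    child[rng.randrange(len(child))] += rng.gauss(0.0, 0.1)
    return child


def toy_fitness(weights):
    # Peaks at [0.3, 0.7]; stands in for the similarity-based fitness.
    return -((weights[0] - 0.3) ** 2 + (weights[1] - 0.7) ** 2)


best = evolve(toy_fitness, new_weights, mutate, p=20, m=30)
print(best, toy_fitness(best))
```

    Because the best chromosome always survives in this skeleton, the best fitness never decreases from one generation to the next; in the real system fitness_fn would be the similarity-based measure computed over the training set.</Paragraph>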
  </Section>
</Paper>