<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-3020">
  <Title>Graph-based Ranking Algorithms for Sentence Extraction, Applied to Text Summarization</Title>
  <Section position="6" start_page="0" end_page="54" type="evalu">
    <SectionTitle>
4 Evaluation
</SectionTitle>
    <Paragraph position="0"> The TextRank sentence extraction algorithm is evaluated in the context of a single-document summarization task, using 567 news articles provided during the Document Understanding Evaluations 2002 (DUC, 2002). For each article, TextRank generates a 100-word summary, the same task undertaken by the other systems participating in this single-document summarization task.</Paragraph>
    <Paragraph position="1"> For evaluation, we use the ROUGE evaluation toolkit, a method based on n-gram statistics that was found to be highly correlated with human evaluations (Lin and Hovy, 2003a). Two manually produced reference summaries are provided and used in the evaluation process (footnote 4).</Paragraph>
    <Paragraph position="2"> 2 In single documents, sentences with highly similar content are rarely, if ever, encountered, so sentence redundancy does not have a significant impact on the summarization of individual texts. This may not be the case, however, in multi-document summarization, where a redundancy removal technique, such as a maximum threshold imposed on sentence similarity, needs to be implemented.</Paragraph>
    <Paragraph position="3">  ROUGE, which was found to have the highest correlation with human judgments, at a confidence level of 95%. Only the first 100 words in each summary are considered.</Paragraph>
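    <Paragraph> As a rough illustration of an n-gram recall score computed against several reference summaries, with only the first 100 words of the candidate considered, a minimal sketch follows. It is not the ROUGE toolkit: the function names, the whitespace tokenization, and the lowercasing are assumptions made for the example, and the toolkit's own preprocessing options are not reproduced.</Paragraph>
    <Paragraph><![CDATA[
```python
from collections import Counter


def truncate_words(text, limit=100):
    """Keep only the first `limit` whitespace-separated tokens, lowercased."""
    return text.lower().split()[:limit]


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=1, limit=100):
    """N-gram recall of a candidate summary against several references.

    Matches are clipped per reference and summed over all references,
    then divided by the total number of reference n-grams.
    Tokenization here is an assumption (plain whitespace split).
    """
    cand = ngrams(truncate_words(candidate, limit), n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        total += sum(ref_counts.values())
        matched += sum(min(count, cand[gram]) for gram, count in ref_counts.items())
    return matched / total if total else 0.0
```
]]></Paragraph>
    <Paragraph> With two references, as in this evaluation, a call would look like rouge_n_recall(summary, [reference_a, reference_b], n=1), where the three strings are supplied by the caller.</Paragraph>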
    <Paragraph position="4"> 3: BC-HurricaneGilbert, 09-11 339 4: BC-Hurricane Gilbert, 0348 5: Hurricane Gilbert heads toward Dominican Coast 6: By Ruddy Gonzalez 7: Associated Press Writer 8: Santo Domingo, Dominican Republic (AP)</Paragraph>
    <Paragraph position="5"> 9: Hurricane Gilbert swept toward the Dominican Republic Sunday, and the Civil Defense alerted its heavily populated south coast to prepare for high winds, heavy rains, and high seas.</Paragraph>
    <Paragraph position="6"> 10: The storm was approaching from the southeast with sustained winds of 75 mph gusting to 92 mph.</Paragraph>
    <Paragraph position="7"> 11: &quot;There is no need for alarm,&quot; Civil Defense Director Eugenio Cabral said in a television alert shortly after midnight Saturday.</Paragraph>
    <Paragraph position="8"> 12: Cabral said residents of the province of Barahona should closely follow Gilbert's movement. 13: An estimated 100,000 people live in the province, including 70,000 in the city of Barahona, about 125 miles west of Santo Domingo.</Paragraph>
    <Paragraph position="9"> 14: Tropical storm Gilbert formed in the eastern Caribbean and strengthened into a hurricane Saturday night.</Paragraph>
    <Paragraph position="10"> 15: The National Hurricane Center in Miami reported its position at 2 a.m. Sunday at latitude 16.1 north, longitude 67.5 west, about 140 miles south of Ponce, Puerto Rico, and 200 miles southeast of Santo Domingo.</Paragraph>
    <Paragraph position="11"> 16: The National Weather Service in San Juan, Puerto Rico, said Gilbert was moving westward at 15 mph with a &quot;broad area of cloudiness and heavy weather&quot; rotating around the center of the storm.</Paragraph>
    <Paragraph position="12"> 17: The weather service issued a flash flood watch for Puerto Rico and the Virgin Islands until at least 6 p.m. Sunday. 18: Strong winds associated with the Gilbert brought coastal flooding, strong southeast winds, and up to 12 feet to Puerto Rico's south coast.</Paragraph>
    <Paragraph position="13"> 19: There were no reports on casualties. 20: San Juan, on the north coast, had heavy rains and gusts Saturday, but they subsided during the night.</Paragraph>
    <Paragraph position="14"> 21: On Saturday, Hurricane Florence was downgraded to a tropical storm, and its remnants pushed inland from the U.S. Gulf Coast.</Paragraph>
    <Paragraph position="15"> 22: Residents returned home, happy to find little damage from 90 mph winds and sheets of rain. 23: Florence, the sixth named storm of the 1988 Atlantic storm season, was the second hurricane. 24: The first, Debby, reached minimal hurricane strength briefly before hitting the Mexican coast last month.</Paragraph>
    <Paragraph position="16"> Figure 1: Example sentences, numbered 3 to 24, from a newspaper article.</Paragraph>
    <Paragraph position="17"> We evaluate the summaries produced by TextRank using each of the three graph-based ranking algorithms described in Section 2. Table 1 shows the results obtained with each algorithm, when using graphs that are: (a) undirected, (b) directed forward, or (c) directed backward.</Paragraph>
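    <Paragraph> The three graph variants and a PageRank-style score over weighted edges can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated here: the similarity matrix is supplied by the caller, "forward" is taken to mean edges from a sentence to later sentences, and the damping factor of 0.85 and the fixed iteration count are illustrative defaults. The names build_edges and weighted_pagerank are hypothetical.</Paragraph>
    <Paragraph><![CDATA[
```python
def build_edges(similarity, mode="undirected"):
    """Turn a symmetric sentence-similarity matrix into weighted edges.

    Returns a dict mapping (source, target) to weight. Sentences are indexed
    in text order; "forward" is assumed to point from earlier to later ones.
    """
    if mode not in ("undirected", "forward", "backward"):
        raise ValueError("unknown mode: " + mode)
    n = len(similarity)
    edges = {}
    for i in range(n):
        for j in range(i + 1, n):
            weight = similarity[i][j]
            if weight <= 0:
                continue  # no edge between sentences with no similarity
            if mode in ("undirected", "forward"):
                edges[(i, j)] = weight
            if mode in ("undirected", "backward"):
                edges[(j, i)] = weight
    return edges


def weighted_pagerank(n, edges, damping=0.85, iterations=50):
    """Iterate a weighted PageRank-style score over n vertices.

    Each vertex passes its score to its successors in proportion to the
    edge weight divided by its total outgoing weight.
    """
    out_weight = [0.0] * n
    for (src, _dst), weight in edges.items():
        out_weight[src] += weight
    scores = [1.0] * n
    for _ in range(iterations):
        new_scores = [1.0 - damping] * n
        for (src, dst), weight in edges.items():
            new_scores[dst] += damping * weight / out_weight[src] * scores[src]
        scores = new_scores
    return scores
```
]]></Paragraph>
    <Paragraph> An undirected graph is represented here as a pair of opposite directed edges, so the same scoring routine serves all three variants.</Paragraph>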
    <Paragraph position="18"> For a comparative evaluation, Table 2 shows the results obtained on this data set by the five top-performing systems (out of 15) that participated in the single-document summarization task at DUC 2002 (DUC, 2002).</Paragraph>
    <Paragraph position="19"> It also lists the baseline performance, computed for 100-word summaries generated by taking the first sentences in each article.</Paragraph>
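    <Paragraph> A baseline of this kind can be sketched in a few lines. The text does not say how the 100-word boundary is handled, so cutting the last sentence at the word limit is an assumption, as are the function name lead_baseline and the whitespace word count.</Paragraph>
    <Paragraph><![CDATA[
```python
def lead_baseline(sentences, word_limit=100):
    """Concatenate the leading sentences and keep the first `word_limit` words.

    `sentences` is the article as a list of strings in text order.
    Truncating mid-sentence at the limit is an assumption of this sketch.
    """
    words = []
    for sentence in sentences:
        if len(words) >= word_limit:
            break
        words.extend(sentence.split())
    return " ".join(words[:word_limit])
```
]]></Paragraph>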
    <Paragraph position="20"> Table 1: Results for TextRank sentence extraction. Graph-based ranking algorithms: HITS, Positional Function, PageRank. Graphs: undirected, directed forward, directed backward.</Paragraph>
    <Paragraph position="21"> Table 2: Results for top 5 (out of 15) DUC 2002 systems, and baseline.</Paragraph>
    <Paragraph position="22"> Discussion. The TextRank approach to sentence extraction succeeds in identifying the most important sentences in a text based on information exclusively drawn from the text itself. Unlike supervised systems, which attempt to learn what makes a good summary by training on collections of summaries built for other articles, TextRank is fully unsupervised, and relies only on the given text to derive an extractive summary.</Paragraph>
    <Paragraph position="23"> Among all algorithms, HITS_A and PageRank provide the best performance, on par with the best-performing system from DUC 2002 (footnote 5). This shows that graph-based ranking algorithms, previously found successful in Web link analysis, can be turned into a state-of-the-art tool for sentence extraction when applied to graphs extracted from texts.</Paragraph>
    <Paragraph position="24"> Notice that TextRank goes beyond the sentence &quot;connectivity&quot; in a text. For instance, sentence 15 in the example provided in Figure 1 would not be identified as &quot;important&quot; based on the number of connections it has with other vertices in the graph (footnote 6), but it is identified as &quot;important&quot; by TextRank (and by humans, according to the reference summaries for this text).</Paragraph>
    <Paragraph position="25"> Another important advantage of TextRank is that it gives a ranking over all sentences in a text, which means that it can be easily adapted to extracting very short summaries, or longer, more detailed summaries consisting of more than 100 words.</Paragraph>
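    <Paragraph> Because every sentence receives a score, a summary of any length can be cut from the same ranking. The sketch below is one simple way to do so and is not taken from the system evaluated here: it takes sentences in decreasing score order, skips any sentence that would exceed the word budget, and returns the selection in text order. The name extract_summary and the skip-and-continue policy are assumptions.</Paragraph>
    <Paragraph><![CDATA[
```python
def extract_summary(sentences, scores, word_budget=100):
    """Pick the highest-scoring sentences that fit the budget, in text order.

    `sentences` and `scores` are parallel lists. A sentence that would push
    the summary past `word_budget` words is skipped (an assumed policy).
    """
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = []
    used = 0
    for i in ranked:
        length = len(sentences[i].split())
        if used + length > word_budget:
            continue
        chosen.append(i)
        used += length
    return [sentences[i] for i in sorted(chosen)]
```
]]></Paragraph>
    <Paragraph> Changing word_budget alone yields a shorter or longer summary without recomputing the scores.</Paragraph>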
  </Section>
</Paper>