File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/00/p00-1041_intro.xml
Size: 2,168 bytes
Last Modified: 2025-10-06 14:00:55
<?xml version="1.0" standalone="yes"?> <Paper uid="P00-1041"> <Title>Headline Generation Based on Statistical Translation</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Generating effective summaries requires the ability to select, evaluate, order and aggregate items of information according to their relevance to a particular subject or for a particular purpose.</Paragraph> <Paragraph position="1"> Most previous work on summarization has focused on extractive summarization: selecting text spans - either complete sentences or paragraphs - from the original document. These extracts are then arranged in a linear order (usually the same order as in the original document) to form a summary document. [Author footnote: Vibhu Mittal is now at Xerox PARC, 3333 Coyote Hill Road, Palo Alto, CA 94304, USA. e-mail: vmittal@parc.xerox.com; Michael Witbrock's initial work on this system was performed whilst at Just Research.]</Paragraph> <Paragraph position="2"> There are several possible drawbacks to this approach, one of which is the focus of this paper: the inability to generate coherent summaries shorter than the smallest text spans being considered - usually a sentence, and sometimes a paragraph. This can be a problem because, in many situations, a short, headline-style indicative summary is desired. Since, in many cases, the most important information in the document is scattered across multiple sentences, this is a problem for extractive summarization; worse, sentences ranked best for summary selection often tend to be even longer than the average sentence in the document.</Paragraph> <Paragraph position="3"> This paper describes an alternative approach to summarization capable of generating summaries shorter than a sentence, some examples of which are given in Figure 1. It does so by building statistical models for content selection and surface realization. 
This paper reviews the framework, discusses some of the pros and cons of this approach using examples from our corpus of news wire stories, and presents an initial evaluation.</Paragraph> </Section> </Paper>