<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1202"> <Title>Using Thematic Information in Statistical Headline Generation</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Ours is an age in which many documents are archived electronically and are available whenever needed. In the midst of this plethora of information, the successful completion of a research task is affected by how easily users can identify the electronic documents that satisfy their information needs.</Paragraph> <Paragraph position="1"> To do so, a researcher often relies on generated summaries that reflect the contents of the original document.</Paragraph> <Paragraph position="2"> The primary focus of this paper is the problem of single sentence summarisation. Instead of identifying and extracting the most important sentence, we generate a new sentence from scratch. The resulting sentence summary may not occur verbatim in the source document but may instead be a paraphrase combining key words and phrases from the text.</Paragraph> <Paragraph position="3"> As a precursor to single sentence summarisation, we first explore the particular case of headline generation in the news domain, specifically English news. Although headlines are often constructed to be sensationalist, we regard headline generation as an approximation to single sentence summarisation, given that a corpus of single sentence summaries does not exist.</Paragraph> <Paragraph position="4"> Our system re-uses words from the news article to generate a single sentence summary that resembles a headline. This is done by selecting and then appending words from the source article. This approach has been explored by a number of researchers (e.g., see Witbrock and Mittal, 1999; Jin and Hauptmann, 2002), and we describe their work further in our overview of related work. 
In existing approaches, a word is selected on the basis of two criteria: how well it acts as a summary word, and how grammatical it is given the summary words that have already been chosen.</Paragraph> <Paragraph position="5"> The purpose of this paper is to present work that investigates the use of Singular Value Decomposition (SVD) as a means of determining whether a word is a good candidate for inclusion in the headline.</Paragraph> <Paragraph position="6"> To introduce the notion of using SVD for single sentence summarisation, we examine the simplest summarisation scenario.</Paragraph> <Paragraph position="7"> Thus, we are presently concerned only with single document summarisation. In addition, we limit the focus of our discussion to the generation of generic summaries.</Paragraph> <Paragraph position="8"> In the remainder of this paper, we motivate our use of SVD by describing difficulties in generating headlines in Section 2. In Section 3, as further motivation for our approach, we illustrate how words can be used out of context, resulting in factually incorrect statements.</Paragraph> <Paragraph position="9"> Section 4 provides an overview of related work. In Section 5, we give a detailed description of how we generate the sentence summary statistically and how we use SVD to guide the generation process. In Section 6, we present the experimental design used to evaluate our approach, along with the results and corresponding discussion. Finally, in Section 7, we present our conclusions and future work.</Paragraph> </Section> </Paper>