<?xml version="1.0" standalone="yes"?> <Paper uid="E06-3003"> <Title>An Approach to Summarizing Short Stories</Title> <Section position="2" start_page="0" end_page="55" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> In recent years, research on automatic text summarization has experienced an upsurge. A multitude of different techniques has been applied to this end, some of the more notable being (Marcu, 1997; Mani et al., 1998; Teufel and Moens, 2002; Elhadad et al., 2005), to name just a few. These researchers worked on various text genres: scientific and popular scientific articles (Marcu, 1997; Mani et al., 1998), texts in computational linguistics (Teufel and Moens, 2002), and medical texts (Elhadad et al., 2002). All these genres are examples of texts characterized by rigid structure, a relative abundance of surface markers and straightforwardness. Relatively few attempts have been made at summarizing less structured genres, among them dialogue and speech summarization (Zechner, 2002; Koumpis et al., 2001).</Paragraph> <Paragraph position="1"> The issue of summarizing fiction has remained largely untouched since a few very thorough earlier works (Charniak, 1972; Lehnert, 1982).</Paragraph> <Paragraph position="2"> The work presented here seeks to fill this gap. The ultimate objective of the project is to produce indicative summaries of short works of fiction that help a potential reader decide whether she would be interested in reading a particular story.</Paragraph> <Paragraph position="3"> To this end, revealing the plot was deemed unnecessary and even undesirable. 
Instead, the current approach relies on the following assumption: when a reader is presented with an extracted summary outlining the general setting of a story (such as time, place and who it is about), she will have enough information to decide how interested she would be in reading the story. For example, a fragment of such a summary, produced by an annotator for the story The Cost of Kindness by Jerome K. Jerome, is presented in Figure 1.</Paragraph> <Paragraph position="4"> The plot, which is a tale of how one local family decides to bid a warm farewell to Rev. Cracklethorpe and causes the vicar to change his mind and remain in town, is omitted.</Paragraph> <Paragraph position="5"> The data used in the experiments consisted of 23 short stories, all written in the 19th and early 20th centuries by mainstream authors such as Katherine Mansfield, Anton Chekhov, O. Henry, Guy de Maupassant and others (13 authors in total).</Paragraph> <Paragraph position="6"> The genre can be vaguely termed social fiction, with the exception of a few fairy tales. This vagueness with respect to genre was deliberate, as the author wished to avoid producing a system that relies on cues specific to a particular genre. The average length of a story in the corpus is 3,333 tokens (approximately 4.5 letter-sized pages) and the target compression rate is 6%.</Paragraph> <Paragraph position="7"> In order to separate the background of a story from events, this project relies heavily on the notion of aspect (the term is explained in Section 3.1). Each clause of every sentence is described in terms of aspect-related features. This representation is then used to select salient descriptive sentences and to leave out those which describe events.</Paragraph> <Paragraph position="8"> The organization of the paper follows the overall architecture of the system. 
Section 2 provides a generalized overview of the pre-processing stage of the project, during which pronominal and nominal anaphoric references (the term is explained in Section 2) were resolved and main characters were identified. Section 3 briefly reviews the concept of aspect, gives an overview of the system and provides the linguistic motivation behind it. Section 4 describes the classification procedures (machine learning and manual rule creation) used to distinguish between descriptive elements of a story and passages that describe events. It also reports results. Section 5 draws some conclusions and outlines possible directions in which this work may evolve.</Paragraph> </Section> </Paper>