File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/02/j02-4006_abstr.xml
Size: 3,607 bytes
Last Modified: 2025-10-06 13:42:23
<?xml version="1.0" standalone="yes"?> <Paper uid="J02-4006"> <Title>Using Hidden Markov Modeling to Decompose Human-Written Summaries</Title> <Section position="2" start_page="0" end_page="528" type="abstr"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> We define a problem referred to as summary sentence decomposition. The goal of a decomposition program is to determine the relations between phrases in a summary and phrases in the corresponding original document. Our analysis of a set of human-written summaries has indicated that professional summarizers often rely on cutting and pasting text from the original document to produce summaries. Unlike most current automatic summarizers, however, which extract sentences or paragraphs without any modification, professional summarizers edit the extracted text using a number of revision operations.</Paragraph> <Paragraph position="1"> Decomposition of human-written summaries involves analyzing a summary sentence to determine how it is constructed by humans. Specifically, we define the summary sentence decomposition problem as follows: Given a human-written summary sentence, a decomposition program needs to answer three questions: (1) Is this summary sentence constructed by reusing the text in the original document? (2) If so, what phrases in the sentence come from the original document? and (3) From where in the document do the phrases come? Here, the term phrase refers to any sentence component that is cut from the original document and reused in the summary. A phrase can be at any granularity, from a single word to a complicated verb phrase to a complete sentence.</Paragraph> <Paragraph position="2"> There are two primary benefits of solving the summary sentence decomposition problem. First, decomposition can lead to better text generation techniques in summarization. 
Most domain-independent summarizers rely on simple extraction to produce summaries, even though extracted sentences can be incoherent, redundant, or misleading. By decomposing human-written sentences, we can deduce how summary sentences are constructed by humans. By learning how humans use revision operations to edit extracted sentences, we can develop automatic programs to simulate these revision operations and build a better text generation system for summarization. Second, the decomposition result also provides large corpora for extraction-based summarizers. By aligning summary sentences with original-document sentences, we can automatically annotate the most important sentences in an input document. By doing this automatically, we can afford to mark content importance for a large set of documents, thereby providing valuable training and testing data sets for extraction-based summarizers.</Paragraph> <Paragraph position="3"> 600 Mountain Avenue, Murray Hill, NJ 07974. E-mail: hjing@research.bell-labs.com. The work reported here was completed while the author attended Columbia University. Computational Linguistics Volume 28, Number 4</Paragraph> <Paragraph position="4"> We propose a hidden Markov model solution to the summary sentence decomposition problem. In the next section, we show by example the revision operations used by professional summarizers. In Section 3, we present our solution to the decomposition problem by first mathematically formulating the decomposition problem and then presenting the hidden Markov model. In Section 4, we present three evaluation experiments and their results. Section 5 describes applications, and Section 6 discusses related work.</Paragraph> </Section> </Paper>