File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/00/w00-0405_metho.xml
Size: 27,342 bytes
Last Modified: 2025-10-06 14:07:19
<?xml version="1.0" standalone="yes"?> <Paper uid="W00-0405"> <Title>Multi-Document Summarization By Sentence Extraction</Title> <Section position="3" start_page="0" end_page="40" type="metho"> <SectionTitle> 2. A group of articles may contain a temporal dimen- </SectionTitle> <Paragraph position="0"> sion, typical in a stream of news reports about an unfolding event. Here, later information may override earlier, more tentative or incomplete accounts.</Paragraph> <Paragraph position="1"> 3. The compression ratio (i.e., the size of the summary with respect to the size of the document set) will typically be much smaller for collections of dozens or hundreds of topically related documents than for single-document summaries. The SUMMAC evaluation (TIPSTER, 1998a) tested 10% compression summaries, but in our work summarizing 200-document clusters, we find that compression to the 1% or 0.1% level is required. Summarization becomes significantly more difficult when compression demands increase.</Paragraph> <Paragraph position="2"> 4. The co-reference problem in summarization presents even greater challenges for multi-document than for single-document summarization (Baldwin and Morton, 1998).</Paragraph> <Paragraph position="3"> This paper discusses an approach to multi-document summarization that builds on previous work in single-document summarization by using additional available information about the document set as a whole and the relationships between the documents, as well as about individual documents.</Paragraph> </Section> <Section position="4" start_page="40" end_page="41" type="metho"> <SectionTitle> 2 Background and Related Work </SectionTitle> <Paragraph position="0"> Generating an effective summary requires the summarizer to select, evaluate, order and aggregate items of information according to their relevance to a particular subject or purpose. These tasks can either be approximated by IR techniques or done in greater depth with fuller natural language processing. Most previous work in summarization has attempted to deal with these issues by focusing on a related, but simpler, problem. With text-span deletion the system attempts to delete &quot;less important&quot; spans of text from the original document; the text that remains is deemed a summary. Work on automated document summarization by text-span extraction dates back at least to work at IBM in the fifties (Luhn, 1958). Most of the work in sentence extraction applied statistical techniques (frequency analysis, variance analysis, etc.) to linguistic units such as tokens, names, anaphora, etc. More recently, other approaches have investigated the utility of discourse structure (Marcu, 1997), the combination of information extraction and language generation (Klavans and Shaw, 1995; McKeown et al., 1995), and the use of machine learning to find patterns in text (Teufel and Moens, 1997; Barzilay and Elhadad, 1997; Strzalkowski et al., 1998).</Paragraph> <Paragraph position="1"> Some of these approaches to single-document summarization have been extended to deal with multi-document summarization (Mani and Bloedorn, 1997; Goldstein and Carbonell, 1998; TIPSTER, 1998b; Radev and McKeown, 1998; Mani and Bloedorn, 1999; McKeown et al., 1999; Stein et al., 1999).
These include comparing templates filled in by extracting information - using specialized, domain-specific knowledge sources - from the document, and then generating natural language summaries from the templates (Radev and McKeown, 1998); comparing named-entities - extracted using specialized lists - between documents and selecting the most relevant section (TIPSTER, 1998b); finding co-reference chains in the document set to identify common sections of interest (TIPSTER, 1998b); or building activation networks of related lexical items (identity mappings, synonyms, hypernyms, etc.) to extract text spans from the document set (Mani and Bloedorn, 1997). Another system (Stein et al., 1999) creates a multi-document summary from multiple single-document summaries, an approach that can be sub-optimal in some cases because the final multi-document summary is generated from the individual summaries rather than from the complete documents, particularly if the single-document summaries contain much overlapping information. The Columbia University system (McKeown et al., 1999) creates a multi-document summary using machine learning and statistical techniques to identify similar sections and language generation to reformulate the summary.</Paragraph> <Paragraph position="2"> The focus of our approach is a multi-document system that can quickly summarize large clusters of similar documents (on the order of thousands) while providing the key relevant information or pointers to such information. Our system (1) uses primarily domain-independent techniques, based mainly on fast, statistical processing, (2) explicitly deals with the issue of reducing redundancy without eliminating potentially relevant information, and (3) contains parameterized modules, so that different genres or corpus characteristics can be taken into account easily.</Paragraph> </Section> <Section position="5" start_page="41" end_page="42" type="metho"> <SectionTitle> 3 Requirements for Multi-Document Summarization </SectionTitle> <Paragraph position="0"> There are two types of situations in which multi-document summarization would be useful: (1) the user is faced with a collection of dissimilar documents and wishes to assess the information landscape contained in the collection, or (2) there is a collection of topically related documents, extracted from a larger, more diverse collection as the result of a query, or forming a topically cohesive cluster. In the first case, if the collection is large enough, it only makes sense to first cluster and categorize the documents (Yang et al., 1999), and then sample from, or summarize, each cohesive cluster. Hence, a &quot;summary&quot; would consist of a visualization of the information landscape, where features could be clusters or summaries thereof. In the second case, it is possible to build a synthetic textual summary containing the main point(s) of the topic, augmented with non-redundant background information and/or query-relevant elaborations. This is the focus of the work reported here, including the necessity of eliminating redundancy among the information content of multiple related documents.</Paragraph> <Paragraph position="1"> Users' information-seeking needs and goals vary tremendously.
When a group of three people each created a multi-document summary of 10 articles about the Microsoft trial from a given day, one summary focused on the details presented in court, one on an overall gist of the day's events, and the third on a high-level view of the goals and outcome of the trial. Thus, an ideal multi-document summarizer would be able to address these different levels of detail, which is difficult without natural language understanding. An interface for the summarization system needs to permit the user to enter information-seeking goals, via a query, a background interest profile and/or a relevance feedback mechanism.</Paragraph> <Paragraph position="2"> Following is a list of requirements for multi-document summarization: * clustering: The ability to cluster similar documents and passages to find related information.</Paragraph> <Paragraph position="3"> * coverage: The ability to find and extract the main points across documents.</Paragraph> <Paragraph position="4"> * anti-redundancy: The ability to minimize redundancy between passages in the summary.</Paragraph> <Paragraph position="5"> * summary cohesion criteria: The ability to combine text passages in a useful manner for the reader (see the sketch after this list). This may include: - document ordering: All text segments of the highest ranking document, then all segments from the next highest ranking document, etc.</Paragraph> <Paragraph position="6"> - news-story principle (rank ordering): Present the most relevant and diverse information first so that the reader gets the maximal information content even if they stop reading the summary.</Paragraph> <Paragraph position="7"> - topic-cohesion: Group the passages together by topic clustering using passage similarity criteria and present the information by the cluster centroid passage rank.</Paragraph> <Paragraph position="8"> - time line ordering: Text passages ordered based on the occurrence of events in time.</Paragraph> <Paragraph position="9"> * coherence: Summaries generated should be readable and relevant to the user.</Paragraph> <Paragraph position="10"> * context: Include sufficient context so that the summary is understandable to the reader.</Paragraph> <Paragraph position="11"> * identification of source inconsistencies: Articles often have errors (such as a billion reported as a million, etc.); multi-document summarization must be able to recognize and report source inconsistencies.</Paragraph> <Paragraph position="12"> * summary updates: A new multi-document summary must take into account previous summaries in generating new summaries. In such cases, the system needs to be able to track and categorize events.</Paragraph> <Paragraph position="13"> * effective user interfaces: - Attributability: The user needs to be able to easily access the source of a given passage.</Paragraph> <Paragraph position="14"> This could be the single document summary.</Paragraph> <Paragraph position="15"> - Relationship: The user needs to be able to view passages related to the text passage shown, which can highlight source inconsistencies.</Paragraph> <Paragraph position="16"> - Source Selection: The user needs to be able to select or eliminate various sources. For example, the user may want to eliminate information from some less reliable foreign news reporting sources.</Paragraph> <Paragraph position="17"> - Context: The user needs to be able to zoom in on the context surrounding the chosen passages. - Redirection: The user should be able to highlight certain parts of the synthetic summary and give a command to the system indicating that these parts are to be weighted heavily and that other parts are to be given a lesser weight.</Paragraph>
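To make the summary-cohesion orderings above concrete, here is a minimal illustrative sketch of how already-selected passages could be reordered under each criterion. The Passage fields and function names are assumptions introduced for this example; they are not part of the system described in this paper.

```python
# Illustrative sketch (not from the paper): the four summary-cohesion
# orderings listed above, applied to passages already selected by the
# summarizer. The Passage fields are assumed attributes for this example.
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class Passage:
    text: str
    doc_rank: int        # rank of the source document
    mmr_rank: int        # rank assigned by the relevance/novelty scorer
    cluster_id: int      # topic cluster the passage belongs to
    cluster_rank: int    # rank of the cluster's centroid passage
    timestamp: datetime  # publication time of the source document

def document_ordering(passages: List[Passage]) -> List[Passage]:
    # All passages of the highest-ranking document first, then the next, etc.
    return sorted(passages, key=lambda p: (p.doc_rank, p.mmr_rank))

def rank_ordering(passages: List[Passage]) -> List[Passage]:
    # News-story principle: most relevant and diverse information first.
    return sorted(passages, key=lambda p: p.mmr_rank)

def topic_cohesion_ordering(passages: List[Passage]) -> List[Passage]:
    # Group by topic cluster; clusters ordered by their centroid passage rank.
    return sorted(passages, key=lambda p: (p.cluster_rank, p.cluster_id, p.mmr_rank))

def time_line_ordering(passages: List[Passage]) -> List[Passage]:
    # Order passages by event time, approximated here by document time stamps.
    return sorted(passages, key=lambda p: p.timestamp)
```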
<Paragraph position="18"> 4 Types of Multi-Document Summarizers In the previous section we discussed the requirements for a multi-document summarization system. Depending on the user's information-seeking goals, the user may want to create summaries that contain primarily the common portions of the documents (their intersection) or an overview of the entire cluster of documents (a sampling of the space that the documents span). A user may also want to have a highly readable summary, an overview of pointers (sentences or word lists) to further information, or a combination of the two. Following is a list of various methods of creating multi-document summaries by extraction: 1. Summary from Common Sections of Documents: Find the important relevant parts that the cluster of documents has in common (their intersection) and use that as a summary.</Paragraph> <Paragraph position="19"> 2. Summary from Common Sections and Unique Sections of Documents: Find the important relevant parts that the cluster of documents has in common and the relevant parts that are unique, and use that as a summary.</Paragraph> <Paragraph position="20"> 3. Centroid Document Summary: Create a single-document summary from the centroid document in the cluster (a minimal sketch of centroid selection appears at the end of this section).</Paragraph> <Paragraph position="21"> 4. Centroid Document plus Outliers Summary: Create a single-document summary from the centroid document in the cluster and add some representation from outlier documents (passages or keyword extraction) to provide a fuller coverage of the document set. 5. Latest Document plus Outliers Summary: Create a single-document summary from the latest time-stamped document in the cluster (most recent information) and add some representation of outlier documents to provide a fuller coverage of the document set.</Paragraph> <Paragraph position="22"> 6. Summary from Common Sections and Unique Sections of Documents with Time Weighting Factor: Find the important relevant parts that the cluster of documents has in common and the relevant parts that are unique, weight all the information by the time sequence of the documents in which it appears, and use the result as a summary. This allows the more recent, often updated, information to be more likely to be included in the summary.</Paragraph> <Paragraph position="23"> There are also much more complicated types of summary extracts which involve natural language processing and/or understanding. These types of summaries include: (1) differing points of view within the document collection, (2) updates of information within the document collection, (3) updates of information from the document collection with respect to an already provided summary, (4) the development of an event or subtopic of an event (e.g., death tolls) over time, and (5) a comparative development of an event.</Paragraph> <Paragraph position="26"> Naturally, an ideal multi-document summary would include a natural language generation component to create cohesive, readable summaries (Radev and McKeown, 1998; McKeown et al., 1999). Our current focus is on the extraction of the relevant passages.</Paragraph> </Section>
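As a rough illustration of the centroid-based strategies above, the following sketch picks the centroid document of a cluster as the one closest (by cosine similarity) to the mean TF-IDF vector of the cluster. The paper does not specify how the centroid document is computed, so this particular choice, and the use of scikit-learn in place of the SMART engine, are assumptions.

```python
# Illustrative sketch (an assumption, not the paper's specified procedure):
# choose the document whose TF-IDF vector is closest to the cluster mean.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def centroid_document(documents):
    """Return the index of the document closest to the cluster's mean TF-IDF vector."""
    vectorizer = TfidfVectorizer(stop_words="english")
    doc_vectors = vectorizer.fit_transform(documents)        # one row per document
    centroid = np.asarray(doc_vectors.mean(axis=0))          # dense (1, vocab) mean vector
    sims = cosine_similarity(doc_vectors, centroid).ravel()  # similarity of each doc to the mean
    return int(sims.argmax())

# Example usage (hypothetical input):
# docs = ["text of article one ...", "text of article two ...", "text of article three ..."]
# print(centroid_document(docs))
```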
<Section position="6" start_page="42" end_page="43" type="metho"> <SectionTitle> 5 System Design </SectionTitle> <Paragraph position="0"> In the previous sections we discussed the requirements for and types of multi-document summarization systems.</Paragraph> <Paragraph position="1"> This section discusses our current implementation of a multi-document summarization system, which is designed to produce summaries that emphasize &quot;relevant novelty.&quot; Relevant novelty is a metric for minimizing redundancy and maximizing both relevance and diversity.</Paragraph> <Paragraph position="2"> A first approximation to measuring relevant novelty is to measure relevance and novelty independently and provide a linear combination as the metric. We call this linear combination &quot;marginal relevance&quot;, i.e., a text passage has high marginal relevance if it is both relevant to the query and useful for a summary, while having minimal similarity to previously selected passages. Using this metric one can maximize marginal relevance in retrieval and summarization, hence we label our method &quot;maximal marginal relevance&quot; (MMR) (Carbonell and Goldstein, 1998).</Paragraph> <Paragraph position="3"> The Maximal Marginal Relevance Multi-Document (MMR-MD) metric is defined in Figure 1. Sim1 and Sim2 cover some of the properties that we discussed in Section 3. For Sim1, the first term is the cosine similarity metric for query and document. The second term computes a coverage score for the passage based on whether the passage is in one or more clusters and the size of those clusters.</Paragraph> <Paragraph position="4"> The third term reflects the information content of the passage by taking into account both statistical and linguistic features for summary inclusion (such as query expansion, position of the passage in the document, and presence/absence of named-entities in the passage). The final term indicates the temporal sequence of the document in the collection, allowing more recent information to have higher weights.</Paragraph> <Paragraph position="5"> For Sim2, the first term uses the cosine similarity metric to compute the similarity between the passage and previously selected passages. (This helps the system to minimize the possibility of including passages similar to ones already selected.) The second term penalizes passages that are part of clusters from which other passages have already been chosen. The third term penalizes documents from which passages have already been selected; however, the penalty is inversely proportional to document length, to allow the possibility of longer documents contributing more passages. These latter two terms allow for a fuller coverage of the clusters and documents.</Paragraph> <Paragraph position="6"> Given the above definition, MMR-MD incrementally computes the standard relevance-ranked list - plus some additional scoring factors - when the parameter λ = 1, and computes a maximal diversity ranking among the passages in the documents when λ = 0. For intermediate values of λ in the interval [0,1], a linear combination of both criteria is optimized. In order to sample the information space in the general vicinity of the query, small values of λ can be used; to focus on multiple, potentially overlapping or reinforcing relevant passages, λ can be set to a value closer to 1. We found that a particularly effective search strategy for document retrieval is to start with a small λ (e.g., λ = 0.3) in order to understand the information space in the region of the query, and then to focus on the most important parts using a reformulated query (possibly via relevance feedback) and a larger value of λ (e.g., λ = 0.7) (Carbonell and Goldstein, 1998).</Paragraph>
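Since Figure 1 itself is not reproduced here, the following LaTeX sketch shows the general MMR criterion from the cited work (Carbonell and Goldstein, 1998), of which MMR-MD is the multi-document extension; in MMR-MD, Sim1 and Sim2 are replaced by the multi-term relevance and anti-redundancy metrics described above. This is a reconstruction for orientation, not the exact MMR-MD formula.

```latex
% General MMR criterion (from Carbonell and Goldstein, 1998), written here over
% passages P_ij to match this paper's notation; MMR-MD expands Sim_1 and Sim_2
% with the coverage, content, temporal and anti-redundancy terms described in the text.
\[
\mathrm{MMR} \;\stackrel{\mathrm{def}}{=}\;
\arg\max_{P_{ij} \in R \setminus S}
\Big[ \lambda \, \mathrm{Sim}_1(P_{ij}, Q)
      \;-\; (1-\lambda) \max_{P_{vw} \in S} \mathrm{Sim}_2(P_{ij}, P_{vw}) \Big]
\]
```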
<Paragraph position="7"> Our multi-document summarizer works as follows (a minimal sketch appears at the end of this section): * Segment the documents into passages, and index them using inverted indices (as used by the IR engine). Passages may be phrases, sentences, n-sentence chunks, or paragraphs.</Paragraph> <Paragraph position="8"> * Identify the passages relevant to the query using cosine similarity, with a threshold below which the passages are discarded.</Paragraph> <Paragraph position="9"> * Apply the MMR-MD metric as defined above. Depending on the desired length of the summary, select a number of passages to compute passage redundancy using the cosine similarity metric, and use the passage similarity scoring as a method of clustering passages. Users can select the number of passages or the amount of compression.</Paragraph> <Paragraph position="10"> * Reassemble the selected passages into a summary document using one of the summary-cohesion criteria (see Section 3).</Paragraph> <Paragraph position="11"> The results reported in this paper are based on the use of the SMART search engine (Buckley, 1985) to compute cosine similarities (with a SMART weighting of lnn for both queries and passages), with stopwords eliminated from the indexed data and stemming turned on.</Paragraph> </Section>
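The following is a minimal, illustrative sketch of the pipeline above: segment into passages, score against the query, greedily select with an MMR-style relevance/novelty trade-off, and reassemble. It substitutes scikit-learn TF-IDF cosine similarity for the SMART engine's lnn weighting and collapses the multi-term Sim1/Sim2 of MMR-MD into plain cosine terms; all function and parameter names are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch only: a simplified MMR-style passage selector standing in
# for the MMR-MD system described above. SMART lnn weighting is replaced by
# scikit-learn TF-IDF, and Sim1/Sim2 are reduced to plain cosine terms.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def mmr_summary(passages, query, n_select=10, lam=0.3, threshold=0.05):
    """Greedily select passages that are relevant to the query but not redundant."""
    vectorizer = TfidfVectorizer(stop_words="english")
    vectors = vectorizer.fit_transform(passages + [query])
    passage_vecs, query_vec = vectors[:-1], vectors[-1]

    relevance = cosine_similarity(passage_vecs, query_vec).ravel()  # Sim1 stand-in
    candidates = [i for i, r in enumerate(relevance) if r >= threshold]
    pairwise = cosine_similarity(passage_vecs)                      # Sim2 stand-in

    selected = []
    while candidates and len(selected) < n_select:
        def mmr_score(i):
            redundancy = max(pairwise[i][j] for j in selected) if selected else 0.0
            return lam * relevance[i] - (1.0 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)

    # Reassemble using rank ordering (the news-story principle from Section 3).
    return [passages[i] for i in selected]
```

With lam = 1.0 this reduces to a pure relevance ranking, and with lam = 0.0 to a maximal-diversity selection, mirroring the behavior of λ described above.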
<Section position="7" start_page="43" end_page="45" type="metho"> <SectionTitle> 6 Discussion </SectionTitle> <Paragraph position="0"> The TIPSTER evaluation corpus provided several sets of topical clusters to which we applied MMR-MD summarization. As an example, consider a set of 200 apartheid-related newswire documents from the Associated Press and the Wall Street Journal, spanning the period from 1988 to 1992. We used the TIPSTER-provided topic description as the query. These 200 documents were on average 31 sentences in length, with a total of 6115 sentences. We used the sentence as our summary unit.</Paragraph> <Paragraph position="1"> Generating a summary 10 sentences long resulted in a sentence compression ratio of 0.2% and a character compression of 0.3%, approximately two orders of magnitude smaller than the compression ratios used in single-document summarization. The results of summarizing this document set with λ set to 1 (effectively query relevance, but no MMR-MD) and λ set to 0.3 (both query relevance and MMR-MD anti-redundancy) are shown in Figures 2 and 3 respectively. The summary in Figure 2 clearly illustrates the need for reducing redundancy and maximizing novel information.</Paragraph> <Paragraph position="3"> [Figure 1: Definition of MMR-MD (legend)] Sim1 is the similarity metric for relevance ranking; Sim2 is the anti-redundancy metric; D is a document collection; P is the passages from the documents in that collection (e.g., P_ij is passage j from document D_i); Q is a query or user profile; R = IR(D, P, Q, θ), i.e., the ranked list of passages from documents retrieved by an IR system, given D, P, Q and a relevance threshold θ, below which it will not retrieve passages (θ can be degree of match or number of passages); S is the subset of passages in R already selected; R\S is the set difference, i.e., the set of as yet unselected passages in R; C is the set of passage clusters for the set of documents; C_vw is the subset of clusters of C that contains passage P_vw; C_i is the subset of clusters that contain passages from document D_i; |k| is the number of passages in the individual cluster k; |C_vw ∩ C_ij| is the number of clusters in the intersection of C_vw and C_ij; w_i are weights for the terms, which can be optimized; W is a word in the passage P_ij; type is a particular type of word, e.g., city name; |D_i| is the length of document i.</Paragraph> <Paragraph position="6"> Consider, for instance, the summary shown in Figure 2 (reproduced in part below).</Paragraph> <Paragraph position="7"> [Figure 2 (excerpt): summary generated with λ = 1] 1. WSJ910204-0176:1 CAPE TOWN, South Africa - President F.W. de Klerk's proposal to repeal the major pillars of apartheid drew a generally positive response from black leaders, but African National Congress leader Nelson Mandela called on the international community to continue economic sanctions against South Africa until the government takes further steps.</Paragraph> <Paragraph position="8"> 6. AP880823-0069:17 The ANC is the main guerrilla group fighting to overthrow the South African government and end apartheid, the system of racial segregation in which South Africa's black majority has no vote in national affairs.</Paragraph> <Paragraph position="9"> 7. AP880803-0158:26 South Africa says the ANC, the main black group fighting to overthrow South Africa's white-led government, has seven major military bases in Angola, and it wants those bases closed down. 8. AP880613-0126:15 The ANC is fighting to topple the South African government and its policy of apartheid, under which the nation's 26 million blacks have no voice in national affairs and the 5 million whites control the economy and dominate government.</Paragraph> <Paragraph position="10"> 9. AP880212-0060:13 The African National Congress is the main rebel movement fighting South Africa's white-led government and SWAPO is a black guerrilla group fighting for independence for Namibia, which is administered by South Africa.</Paragraph>
<Paragraph position="11"> 10. WSJ870129-0051:1 Secretary of State George Shultz, in a meeting with Oliver Tambo, head of the African National Congress, voiced concerns about Soviet influence on the black South African group and the ANC's use of violence in the struggle against apartheid.</Paragraph> <Paragraph position="12"> The fact that the ANC is fighting to overthrow the government is mentioned seven times (sentences #2-#4, #6-#9), which constitutes 70% of the sentences in the summary. Furthermore, sentence #3 is an exact duplicate of sentence #2, and sentence #7 is almost identical to sentence #4. In contrast, the summary in Figure 3, generated using MMR-MD with λ set to 0.3, shows significant improvements in eliminating redundancy. The fact that the ANC is fighting to overthrow the government is mentioned only twice (sentences #3, #7), and one of these sentences contains additional information. The new summary retained only three of the sentences from the earlier summary.</Paragraph> <Paragraph position="13"> Counting clearly distinct propositions in both cases yields a 60% greater information content for the MMR-MD case, though both summaries are equivalent in length.</Paragraph> <Paragraph position="14"> When these 200 documents were added to a set of 4 other topics of 200 documents each, yielding a document set with 1000 documents, the query-relevant multi-document summarization system produced exactly the same results. We are currently working on constructing data sets for experimental evaluations of multi-document summarization. In order to construct these data sets, we attempted to categorize users' information-seeking goals for multi-document summarization (see Section 3). As can be seen in Figure 2, the standard IR technique of using a query to extract relevant passages is no longer sufficient for multi-document summarization due to redundancy. In addition, query-relevant extractions cannot capture temporal sequencing. The data sets will allow us to measure the effects of these, and other, features on multi-document summarization quality.</Paragraph> <Paragraph position="15"> Specifically, we are constructing sets of 10 documents, which either contain a snapshot of an event from multiple sources or the unfolding of an event over time.</Paragraph> <Paragraph position="16"> [Figure 3 (excerpt): summary generated with λ = 0.3] 1. WSJ870129-0051:1 Secretary of State George Shultz, in a meeting with Oliver Tambo, head of the African National Congress, voiced concerns about Soviet influence on the black South African group and the ANC's use of violence in the struggle against apartheid. ... The ANC wants a simple one-man, one-vote majority rule system, while the government claims that will lead to black domination and insists on constitutional protection of the rights of minorities, including the whites. 7. WSJ900807-0037:1 JOHANNESBURG, South Africa - The African National Congress suspended its 30-year armed struggle against the white minority government, clearing the way for the start of negotiations over a new constitution based on black-white power sharing.</Paragraph> <Paragraph position="17"> 8. WSJ900924-0119:20 The African National Congress, South Africa's main black liberation group, forged its sanctions strategy as a means of pressuring the government to abandon white-minority rule. 9. WSJ910702-0053:36 At a meeting in South Africa this week, the African National Congress, the major black group, is expected to take a tough line against the white-run government. 10. WSJ910204-0176:1 CAPE TOWN, South Africa - President F.W.
de Klerk's proposal to repeal the major pillars of apartheid drew a generally positive response from black leaders, but African National Congress leader Nelson Mandela called on the international community to continue economic sanctions against South Africa until the government takes further steps.</Paragraph> <Paragraph position="18"> From these sets we are performing two types of experiments. In the first, we are examining how users put sentences into pre-defined clusters and how they create sentence-based multi-document summaries. The result will also serve as a gold standard for system-generated summaries - do our systems pick the same summary sentences as humans, and are they picking sentences from the same clusters as humans? The second type of experiment is designed to determine how users perceive the output summary quality. In this experiment, users are asked to rate the output sentences from the summarizer as good, okay or bad. For the okay or bad sentences, they are asked to provide a summary sentence from the document set that is &quot;better&quot;, i.e., that makes a better set of sentences to represent the information content of the document set. We are comparing our proposed summarizer #6 in Section 4 to summarizer #1, the common portions of the document sets with no anti-redundancy, and to summarizer #3, a single-document summary of a centroid document using our single-document summarizer (Goldstein et al., 1999).</Paragraph> </Section> </Paper>