<?xml version="1.0" standalone="yes"?> <Paper uid="J05-3002"> <Title>Sentence Fusion for Multidocument News Summarization</Title> <Section position="7" start_page="318" end_page="320" type="evalu"> <SectionTitle> 5. Related Work </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="318" end_page="320" type="sub_section"> <SectionTitle> 5.1 Text-to-Text Generation </SectionTitle> <Paragraph position="0"> Unlike traditional concept-to-text generation approaches, text-to-text generation methods take text as input and transform it into a new text satisfying some constraints (e.g., length or level of sophistication). In addition to sentence fusion, compression algorithms (Chandrasekar, Doran, and Bangalore 1996; Grefenstette 1998; Mani, Gates, and Bloedorn 1999; Knight and Marcu 2002; Jing and McKeown 2000; Riezler et al. 2003) and methods for expansion of a multiparallel corpus (Pang, Knight, and Marcu 2003) are other instances of such methods.</Paragraph> <Paragraph position="1"> Compression methods have been developed for single-document summarization; they aim to reduce a sentence by eliminating constituents that are not crucial for understanding the sentence and not salient enough to include in the summary.</Paragraph> <Paragraph position="2"> These approaches are based on the observation that the &quot;importance&quot; of a sentence constituent can often be determined from shallow features, such as its syntactic role and the words it contains. For example, in many cases a relative clause that is peripheral to the central point of the document can be removed from a sentence without significantly distorting its meaning. While earlier approaches to text compression were based on symbolic reduction rules (Grefenstette 1998; Mani, Gates, and Bloedorn 1999), more recent approaches use an aligned corpus of documents and their human-written summaries to determine which constituents can be reduced (Knight and Marcu 2002; Jing and McKeown 2000; Riezler et al. 2003). The summary sentences, which have been manually compressed, are aligned with the original sentences from which they were drawn.</Paragraph> <Paragraph position="3"> Table 10. An example of incorrect reference selection. Subscripts in the generated sentence indicate the theme sentence from which the words were extracted. #1: The segments will revive the &quot;Point-Counterpoint&quot; segments popular until they stopped airing in 1979, but will instead be called &quot;Clinton/Dole&quot; one week and &quot;Dole/Clinton&quot; the next week.</Paragraph> <Paragraph position="4"> Knight and Marcu (2002) treat reduction as a translation process using a noisy-channel model (Brown et al. 1993). In this model, a short (compressed) string is treated as a source, and additions to this string are considered to be noise. The probability of a source string s is computed by combining a standard probabilistic context-free grammar score, derived from the grammar rules that yielded the tree s, with a word-bigram score computed over the leaves of the tree. The stochastic channel model creates a large tree t from a smaller tree s by choosing an extension template for each node based on the labels of the node and its children. In the decoding stage, the system searches for the short string s that maximizes P(s|t), which, by Bayes' rule, is equivalent (for fixed t) to maximizing P(s) × P(t|s).</Paragraph>
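<Paragraph> To make this decoding criterion concrete, the following is a minimal word-level sketch of noisy-channel compression, not Knight and Marcu's tree-based implementation: the source model is reduced to a bigram language model, the channel model charges a fixed probability per inserted word, and all probabilities (BIGRAM, P_INSERT) are invented for illustration.

import math
from itertools import combinations

# Invented bigram probabilities for the source model P(s);
# "BOS" marks the beginning of a sentence.
BIGRAM = {
    ("BOS", "the"): 0.4, ("the", "court"): 0.3, ("court", "ruled"): 0.8,
    ("the", "federal"): 0.2, ("federal", "court"): 0.6,
}
P_INSERT = 0.45  # toy channel model: probability of one inserted word

def source_log_prob(words):
    # log P(s) under the bigram model; unseen bigrams get a small floor.
    return sum(math.log(BIGRAM.get(pair, 1e-4))
               for pair in zip(["BOS"] + words, words))

def channel_log_prob(long_sent, short_sent):
    # log P(t|s): every word of t absent from s counts as one insertion.
    return (len(long_sent) - len(short_sent)) * math.log(P_INSERT)

def decode(long_sent):
    # Search for the compression s maximizing P(s|t); since t is fixed,
    # this is the same as maximizing P(s) * P(t|s).
    candidates = [list(sub)
                  for r in range(1, len(long_sent) + 1)
                  for sub in combinations(long_sent, r)]
    return max(candidates,
               key=lambda s: source_log_prob(s) + channel_log_prob(long_sent, s))

print(decode(["the", "federal", "court", "ruled"]))
# -> ['the', 'court', 'ruled']: dropping "federal" costs one channel
#    insertion but yields a more probable source string.
</Paragraph>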
<Paragraph position="5"> While this approach exploits only syntactic and lexical information, Jing and McKeown (2000) also rely on cohesion information, derived from word distribution in a text: Phrases that are linked to a local context are retained, while phrases that have no such links are dropped. Another difference between these two methods is the extensive use of knowledge resources in the latter. For example, a lexicon is used to identify which components of the sentence are obligatory for it to remain grammatically correct. The corpus in this approach is used to estimate the degree to which a fragment is extraneous and can be omitted from a summary. A phrase is removed only if it is not grammatically obligatory, is not linked to a local context, and has a reasonable probability of being removed by humans. In addition to reducing the original sentences, Jing and McKeown (2000) use a number of manually compiled rules to aggregate reduced sentences; for example, reduced clauses might be conjoined with and.</Paragraph> <Paragraph position="6"> Sentence fusion exhibits similarities with compression algorithms in the way it copes with the lack of semantic data in the generation process, relying on shallow analysis of the input and statistics derived from a corpus. Clearly, the difference in the nature of the two tasks and in the type of input they expect (a single sentence versus multiple sentences) dictates the use of different methods. Having multiple sentences in the input poses new challenges, such as the need for sentence comparison, but at the same time it opens up new possibilities for generation. While the output of existing compression algorithms is always a substring of the original sentence, sentence fusion may generate a new sentence which is not a substring of any of the input sentences. This is achieved by arranging fragments of several input sentences into one sentence.</Paragraph> <Paragraph position="7"> The only other text-to-text generation approach able to produce new utterances is that of Pang, Knight, and Marcu (2003). Their method operates over multiple English translations of the same foreign sentence and is intended to generate novel paraphrases of the input sentences. Like sentence fusion, their method aligns parse trees of the input sentences and then uses a language model to linearize the derived lattice. The main difference between the two methods is in the type of alignment: Our algorithm performs local alignment, while the algorithm of Pang, Knight, and Marcu (2003) performs global alignment. The differences in alignment are caused by differences in input: Pang, Knight, and Marcu's method expects semantically equivalent sentences, while our algorithm operates over sentences with only partial meaning overlap. The presence of deletions and insertions in the input sentences makes alignment of comparable trees a new and particularly significant challenge.</Paragraph>
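<Paragraph> As an illustration of language-model-driven linearization, the following is a minimal sketch of selecting the highest-probability path through a word lattice with a bigram model and memoized search. Both the lattice and the bigram probabilities are invented for this example; it is a simplified stand-in for the linearization step described above, not the implementation used by either system.

import math

# Word lattice as adjacency lists: state -> list of (word, next_state).
LATTICE = {
    0: [("the", 1)],
    1: [("president", 2), ("leader", 2)],
    2: [("resigned", 3), ("stepped", 4)],
    4: [("down", 3)],
    3: [],  # final state
}
# Invented bigram probabilities; "BOS" marks the beginning of a sentence.
BIGRAM = {
    ("BOS", "the"): 0.9, ("the", "president"): 0.6, ("the", "leader"): 0.3,
    ("president", "resigned"): 0.5, ("president", "stepped"): 0.2,
    ("leader", "resigned"): 0.4, ("leader", "stepped"): 0.1,
    ("stepped", "down"): 0.9,
}

def best_path(state, prev, memo):
    # Highest log-probability word sequence from `state` to the final
    # state, given that `prev` was the previously emitted word.
    if not LATTICE[state]:
        return 0.0, []
    key = (state, prev)
    if key not in memo:
        scored = []
        for word, nxt in LATTICE[state]:
            step = math.log(BIGRAM.get((prev, word), 1e-6))
            rest_logp, rest_words = best_path(nxt, word, memo)
            scored.append((step + rest_logp, [word] + rest_words))
        memo[key] = max(scored, key=lambda pair: pair[0])
    return memo[key]

logp, words = best_path(0, "BOS", {})
print(" ".join(words))  # "the president resigned" (path probability 0.27)
</Paragraph>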
</Section> <Section position="2" start_page="320" end_page="320" type="sub_section"> <SectionTitle> 5.2 Computation of an Agreement Tree </SectionTitle> <Paragraph position="0"> The alignment method described in Section 3 falls into a class of tree comparison algorithms extensively studied in theoretical computer science (Sankoff 1975; Finden and Gordon 1985; Amir and Keselman 1994; Farach, Przytycka, and Thorup 1995) and widely applied in many areas of computer science, primarily computational biology (Gusfield 1997). These algorithms aim to find an overlap subtree that captures structural commonality across a set of related trees. A typical tree similarity measure considers the proximity between input trees at both the node and the edge levels.</Paragraph> <Paragraph position="1"> In addition, some algorithms constrain the topology of the resulting alignment based on domain-specific knowledge. These constraints not only narrow the search space but also increase the robustness of the algorithm in the presence of a weak similarity function.</Paragraph> <Paragraph position="2"> In the NLP context, this class of algorithms has previously been used in example-based machine translation, in which the goal is to find an optimal alignment between the source and the target sentences (Meyers, Yangarber, and Grishman 1996). The algorithm operates over pairs of parallel sentences, where each sentence is represented by a structure-sharing forest of plausible syntactic trees. The similarity function is driven by lexical mapping between tree nodes and is derived from a bilingual dictionary. The search procedure is greedy and is subject to a number of constraints needed for alignment of parallel sentences.</Paragraph> <Paragraph position="3"> This algorithm has several features in common with our method: It operates over syntactic dependency representations and employs recursive computation to find an optimal solution. However, our method differs in two key respects. First, our algorithm looks for local regions of high similarity in nonparallel data, rather than for the full alignment expected in the case of parallel trees. The change in optimization criteria introduces differences in the similarity measure (specifically, the relaxation of certain constraints) and in the search procedure, which in our work uses dynamic programming.</Paragraph> <Paragraph position="4"> Second, our method is an instance of multisequence alignment, in contrast to the pairwise alignment described in Meyers, Yangarber, and Grishman (1996). Combining evidence from multiple trees is an essential step of our algorithm; pairwise comparison of nonparallel trees may not provide enough information about their underlying correspondences. In fact, multisequence alignment has previously been shown to increase comparison accuracy in other NLP tasks (Barzilay and Lee 2002; Bangalore, Murdock, and Riccardi 2002; Lacatusu, Maiorano, and Harabagiu 2004); unlike our work, these approaches operate on strings rather than trees, and, with the exception of Lacatusu, Maiorano, and Harabagiu (2004), they apply alignment to parallel data rather than comparable texts.</Paragraph>
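<Paragraph> To make the local-versus-global distinction concrete, the following is a minimal sketch of Smith-Waterman-style local alignment over flat sequences of node labels, with invented scoring weights. It is an intuition aid only: our actual algorithm recurses over dependency trees and combines evidence from multiple sentences rather than a single pair.

def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    # Best local (Smith-Waterman) alignment score of sequences a and b.
    # Flooring cells at 0 lets an alignment start and end anywhere,
    # which is what distinguishes local from global alignment.
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,                          # restart the alignment
                              score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,      # skip a word of a
                              score[i][j - 1] + gap)      # skip a word of b
            best = max(best, score[i][j])
    return best

# Two sentences with only partial meaning overlap: the shared region
# scores highly even though a global alignment would be penalized for
# the non-overlapping material at either end.
s1 = "the court overturned the earlier ruling on appeal".split()
s2 = "lawyers said the court overturned the earlier ruling".split()
print(local_alignment_score(s1, s2))  # 12
</Paragraph> </Section> </Section> </Paper>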