<?xml version="1.0" standalone="yes"?> <Paper uid="J05-3002"> <Title>Sentence Fusion for Multidocument News Summarization</Title> <Section position="8" start_page="320" end_page="323" type="concl"> <SectionTitle> 6. Conclusions and Future Work </SectionTitle> <Paragraph position="0"> In this article, we have presented sentence fusion, a novel method for text-to-text generation which, given a set of similar sentences, produces a new sentence containing the information common to most sentences. Unlike traditional generation methods, sentence fusion does not require an elaborate semantic representation of the input but instead relies on the shallow linguistic representation automatically derived from the input documents and on knowledge acquired from a large text corpus. Generation is performed by reusing and altering phrases from the input sentences.</Paragraph> <Paragraph position="1"> As the evaluation described in Section 4 shows, our method accurately identifies common information and in most cases generates a well-formed fusion sentence. Our algorithm outperforms the shortest-sentence baseline in terms of content selection, without a significant drop in grammaticality. We also show that augmenting the fusion process with paraphrasing knowledge improves the output on both measures.</Paragraph> <Paragraph position="2"> However, there is still a gap between the performance of our system and human performance.</Paragraph> <Paragraph position="3"> An important goal for future work on sentence fusion is to increase the flexibility of content selection and realization. We believe that the process of aligning theme sentences can be greatly improved by having the system learn the similarity function, instead of using manually assigned weights. An interesting question is how such a similarity function can be induced in an unsupervised fashion. 
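To make the idea concrete, the hand-weighted similarity described above can be sketched as a weighted combination of simple overlap features; learning the weights from data is the proposed future-work step. This is a minimal, hypothetical illustration, not the paper's actual similarity function: the feature set, weight values, and example sentences are all invented for exposition.

```python
# Hypothetical sketch: theme-sentence similarity as a weighted combination
# of simple features. The weights here are placeholders standing in for
# manually assigned (or, in future work, learned) values.

def features(s1, s2):
    """Compute overlap features between two tokenized sentences."""
    w1, w2 = set(s1), set(s2)
    overlap = len(w1 & w2) / max(len(w1 | w2), 1)          # Jaccard word overlap
    len_ratio = min(len(s1), len(s2)) / max(len(s1), len(s2), 1)
    return [overlap, len_ratio]

def similarity(s1, s2, weights):
    """Weighted sum of features; the weights would ideally be learned."""
    return sum(w * f for w, f in zip(weights, features(s1, s2)))

weights = [0.8, 0.2]  # illustrative values, not learned
s_a = "the storm hit the coast on friday".split()
s_b = "the powerful storm hit the gulf coast friday".split()
score = similarity(s_a, s_b, weights)
print(score)
```

An unsupervised learner would replace the fixed `weights` with parameters estimated from aligned versus non-aligned sentence pairs drawn from the theme clusters themselves.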
In addition, we can improve the flexibility of the fusion algorithm by using a more powerful language model. Recent research (Daume et al. 2002) has shown that syntax-based language models are more suitable for language generation tasks; the study of such models is a promising direction to explore.</Paragraph> <Paragraph position="4"> An important feature of the sentence fusion algorithm is its ability to generate multiple verbalizations of a given fusion lattice. In our implementation, this property is utilized only to produce grammatical texts in the changed syntactic context, but it can also be used to increase the coherence of the text at the discourse level by taking context into account. In our current system, each sentence is generated in isolation, independently of what is said before and what will be said after. Clear evidence of the limitation of this approach is found in the selection of referring expressions. For example, all summary sentences may contain the full description of a named entity (e.g., President of Columbia University Lee Bollinger), while the use of shorter descriptions such as Bollinger or anaphoric expressions in some summary sentences would increase the summary's readability (Schiffman, Nenkova, and McKeown 2002; Nenkova and McKeown 2003). These constraints can be incorporated into the sentence fusion algorithm, since our alignment-based representation of themes often contains several alternative descriptions of the same object.</Paragraph> <Paragraph position="5"> Beyond the problem of referring-expression generation, we found that by selecting appropriate paraphrases of each summary sentence, we can significantly improve the coherence of an output summary. An important research direction for future work is to develop a probabilistic text model that can capture properties of well-formed texts, just as a language model captures properties of sentence grammaticality. 
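The referring-expression improvement discussed above can be sketched as a simple first-mention rule: keep the full description at an entity's first mention and substitute a shorter alternative afterwards. This is a hedged illustration only; the function name and the rule itself are invented here, and in the actual system the alternative descriptions would come from the alignment-based theme representation rather than being supplied by hand.

```python
# Hypothetical sketch of discourse-level rewriting: full description at
# first mention, a shorter alternative at subsequent mentions. The entity
# descriptions are illustrative; a real system would draw the alternatives
# from the aligned theme sentences.

def rewrite_mentions(sentences, full, short):
    """Replace the full description with the short one after first mention."""
    seen = False
    out = []
    for s in sentences:
        if full in s:
            if seen:
                s = s.replace(full, short)
            seen = True
        out.append(s)
    return out

summary = [
    "President of Columbia University Lee Bollinger spoke on Monday.",
    "President of Columbia University Lee Bollinger declined further comment.",
]
print(rewrite_mentions(
    summary,
    "President of Columbia University Lee Bollinger",
    "Bollinger",
))
```

A probabilistic text model of the kind proposed above would generalize this rule, scoring whole candidate sentence sequences for coherence instead of applying a fixed first-mention heuristic.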
Ideally, such a model would be able to discriminate between cohesive, fluent texts and ill-formed texts, guiding the selection of sentence paraphrases to achieve an optimal sentence sequence.</Paragraph> </Section> </Paper>