<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-1612">
  <Title>Explorations in Sentence Fusion</Title>
  <Section position="7" start_page="5" end_page="5" type="concl">
    <SectionTitle>
5 Discussion and Future work
</SectionTitle>
    <Paragraph position="0"> In this paper we have described our ongoing work on sentence fusion for Dutch. Our starting point was the sentence fusion model proposed by [Barzilay et al., 1999; Barzilay, 2003], in which dependency analyses of pairs of sentences are first aligned, after which the aligned parts (representing the common information) are fused. The resulting fused dependency tree is subsequently transferred into natural language. Our new contributions are primarily in two areas. First, we carried out an explicit evaluation of the alignment, both human and automatic, whereas [Barzilay, 2003] only evaluates the output of the complete sentence fusion process. We found that annotators can reliably align phrases and assign relation labels to them, and that good results can be achieved with automatic alignment, certainly above an informed baseline, albeit still below human performance. Second, Barzilay and co-workers developed their sentence fusion model in the context of multi-document summarization, but arguably the approach could also be useful for applications such as question answering or information extraction. This seems to call for a more refined version of sentence fusion, which has consequences for alignment, merging and realization. We have therefore introduced five different types of semantic relations between strings, namely equals, restates, specifies, generalizes and intersects. This increases the expressiveness of the representation, and supports generating restatements, generalizations and specifications. Finally, we described and evaluated our first, promising results on sentence realization based on these refined alignments.</Paragraph>
    <Paragraph position="1"> Similar work is described in [Pang et al., 2003], who present a syntax-based algorithm that builds word lattices from parallel translations, which can be used to generate new paraphrases. Their alignment algorithm is less refined, and there is only one type of alignment and hence of output (only restatements), but their mapping of aligned trees to a word lattice (or FSA) seems worthwhile to explore in combination with the approach we have proposed here.</Paragraph>
    <Paragraph position="2"> One of the issues that remains to be addressed in future work is the effect of parsing errors. Such errors were not manually corrected; during manual alignment, however, we sometimes found that substrings could not be properly aligned because the parser failed to identify them as syntactic constituents. The repercussions of this for generation should be investigated by comparing the results obtained here with alignments on perfect parses. Furthermore, our work on automatic alignment has so far only concerned the alignment of nodes, not the determination of the relation type. We intend to address this task with machine learning, initially relying on shallow features such as the length of the respective token strings and the amount of overlap. It is also clear that more work is needed on merging and surface realization. One possible direction here is to exploit the relatively rich linguistic representation of the input sentences (POS tags, lemmas and dependency structures), for instance along the lines of [Bangalore and Rambow, 2000]. Yet another issue concerns the type of text material. The sentence pairs from our current corpus are relatively close, in the sense that there is usually a 1-to-1 mapping between sentences, and both translations convey more or less the same information. Although this seems a good starting point for studying alignment, we intend to continue with other types of text material in future work. For instance, in extending our work to the actual output of a QA system, we expect to encounter sentences with far less overlap. Of particular interest to us is also whether sentence fusion can be shown to improve the quality of QA system output.</Paragraph>
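    <Paragraph position="3"> The shallow features mentioned above (token string lengths and amount of overlap) could be computed roughly as follows. This is only a minimal sketch under stated assumptions, not a description of an implemented system: whitespace tokenization, lowercasing and Jaccard overlap are illustrative choices, and the names shallow_features and ShallowFeatures are hypothetical. The resulting feature vector would be the input to a classifier that predicts one of the five relation labels.

```python
from dataclasses import dataclass

# The five relation labels introduced in the paper (targets of the classifier).
RELATIONS = ("equals", "restates", "specifies", "generalizes", "intersects")


@dataclass(frozen=True)
class ShallowFeatures:
    """Hypothetical feature record for one pair of aligned token strings."""
    len_a: int      # number of tokens in the first string
    len_b: int      # number of tokens in the second string
    len_diff: int   # len_a minus len_b (positive if the first string is longer)
    overlap: float  # Jaccard overlap of the two token sets, between 0 and 1


def tokenize(text: str) -> list[str]:
    # Assumption: lowercased whitespace tokenization stands in for a real tokenizer.
    return text.lower().split()


def shallow_features(a: str, b: str) -> ShallowFeatures:
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a.union(set_b)
    # Jaccard overlap: shared token types divided by all token types.
    # Two empty strings are given an overlap of 0.0 by convention here.
    overlap = len(set_a.intersection(set_b)) / len(union) if union else 0.0
    return ShallowFeatures(
        len_a=len(tokens_a),
        len_b=len(tokens_b),
        len_diff=len(tokens_a) - len(tokens_b),
        overlap=overlap,
    )


if __name__ == "__main__":
    # Invented example pair, for illustration only.
    print(shallow_features("the old man", "the man"))
```
</Paragraph>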
  </Section>
</Paper>