<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1005"> <Title>Vocabulary Usage in Newswire Summaries</Title> <Section position="8" start_page="0" end_page="0" type="concl"> <SectionTitle> 7 Conclusion </SectionTitle> <Paragraph position="0"> Previous research on the degree of agreement between documents and summaries, and between summaries, has generally indicated that there are significant differences in the vocabulary used by the authors of summaries and by the source document.</Paragraph> <Paragraph position="1"> Our study extends the investigation to a corpus currently popular in the text summarization research community and finds the majority opinion to be borne out there. In addition, our data suggest that summaries resemble the source document more closely than they do each other. The limited number of summaries available for any individual source document prevents us from learning any characteristics of the population of possible summaries. Would more summaries distribute themselves evenly throughout the semantic space defined by the source document's vocabulary? Would clumps and clusters show themselves, or a single cluster, as van Halteren and Teufel suggest? If the latter, such a grouping would have a good claim to call itself a consensus summary of the document, and a true gold standard would be revealed.</Paragraph> </Section> </Paper>