<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1005"> <Title>Vocabulary Usage in Newswire Summaries</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> Analysis of 9000 manually written summaries of newswire stories used in four Document Understanding Conferences indicates that approximately 40% of their lexical items do not occur in the source document. A further comparison of different summaries of the same document shows agreement on 28% of their vocabulary. It can be argued that these relationships establish a performance ceiling for automated summarization systems that do not perform syntactic and semantic analysis of the source document.</Paragraph> </Section> </Paper>