<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1005">
  <Title>Vocabulary Usage in Newswire Summaries</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>1 Introduction</SectionTitle>
    <Paragraph position="0">Automatic summarization systems rely on manually prepared summaries for training data, heuristics and evaluation. Generic summaries are notoriously hard to standardize; biased summaries, even in the most restricted task or application, also tend to vary between authors. It is unrealistic to expect one perfect model summary, and the presence of many, potentially quite diverse, models introduces considerable uncertainty into the summarization process. In addition, many summarization systems tacitly assume that model summaries are somehow close to the source documents.</Paragraph>
    <Paragraph position="1">We investigate this assumption and study the variability of manually produced summaries. We first describe the collection of documents with summaries that has been accumulated over several years of participation in the Document Understanding Conference (DUC) evaluation exercises sponsored by the National Institute of Standards and Technology (NIST). We then present our methodology, discuss the rather pessimistic results, and finally draw a few simple conclusions.</Paragraph>
  </Section>
</Paper>