<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-0406">
  <Title>Manual and Automatic Evaluation of Summaries</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Previous efforts in large-scale evaluation of text summarization include TIPSTER SUMMAC (Mani et al. 1998) and the Document Understanding Conference (DUC) sponsored by the National Institute of Standards and Technology (NIST). DUC aims to compile standard training and test collections that can be shared among researchers and to provide common and large scale evaluations in single and multiple document summarization for their participants.</Paragraph>
    <Paragraph position="1"> In this paper we discuss manual and automatic evaluations of summaries using data from the Document Understanding Conference 2001 (DUC-2001). Section 2 gives a brief overview of the evaluation procedure used in DUC-2001 and the Summary Evaluation Environment (SEE) interface used to support the DUC-2001 human evaluation protocol. Section 3 discusses evaluation metrics. Section 4 shows the instability of manual evaluations. Section 5 outlines a method of automatic summary evaluation using accumulative n-gram matching score (NAMS) and proposes a view that casts summary evaluation as a decision making process. It shows that the NAMS method is bounded and in most cases not usable, given only a single reference summary to compare with. Section 6 discusses why this is so, illustrating various forms of mismatching between human and system summaries. We conclude with lessons learned and future directions.</Paragraph>
  </Section>
class="xml-element"></Paper>