<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0906">
  <Title>Evaluating Summaries and Answers: Two Sides of the Same Coin?</Title>
  <Section position="2" start_page="0" end_page="41" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Recent developments in question answering (QA) and multi-document summarization point to many  interestingconvergencesthatpresentexcitingopportunities for collaboration and cross-fertilization between these largely independent communities. This position paper attempts to draw connections between the task of answering complex natural language questions and the task of summarizing multiple documents, the boundaries between which are beginning to blur, as anticipated half a decade ago (Carbonell et al., 2000).</Paragraph>
    <Paragraph position="1"> Although the complementary co-evolution of question answering and document summarization presents new directions for system-building, this paper primarily focuses on implications for evaluation. Although assessment of answer and summaryqualityemploysdifferentmethodologies, there are many lessons that each community can learn from the other. The summarization community has extensive experience in intrinsic metrics based on n-gram overlap for automatically scoring system outputs against human-generated reference texts-these techniques would help streamline aspects of question answering evaluation. In the other direction, because question answering has its roots in information retrieval, much work has focused on extrinsic metrics based on relevance and topicality, which may be valuable to summarization researchers. null This paper is organized as follows: In Section 2, we discuss the evolution of question answering research and how recent trends point to the convergence of question answering and multi-document summarization. In Section 3, we present a case study of automatically evaluating definition questions by employing metrics based on n-gram overlap, a general technique widely used in summarization and machine translation evaluations. Section 4 highlights some opportunities for knowledge transfer in the other direction: how the notions of rele- null vance and topicality, well-studied in the information retrieval literature, can guide the evaluation of topic-oriented summaries. We conclude with thoughts about the future in Section 5.</Paragraph>
  </Section>
class="xml-element"></Paper>