<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0902">
  <Title>On the Subjectivity of Human Authored Short Summaries</Title>
  <Section position="3" start_page="9" end_page="9" type="intro">
    <SectionTitle>
2 Related Work
</SectionTitle>
    <Paragraph position="0"> There have been a number of studies concerned with collating and analysing human authored summaries, with the aim of producing and evaluating machine generated summaries. A phrase weighting process called the 'pyramid method' was described in (Nenkova and Passonneau, 2004). It exploited the frequency with which the same (or similar) information appeared across multiple summaries of the same story; each such recurring unit of information was referred to as a summarisation content unit (SCU).</Paragraph>
    <Paragraph position="1"> Increasing stability of pyramid scores was observed as the pyramid grew larger. The authors concluded, however, that the initial creation of the pyramid was a tedious task because a large number of SCUs had to be hand annotated.</Paragraph>
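To make the scoring step concrete, the following is a minimal sketch of a pyramid-style score, assuming SCUs have already been hand annotated and are represented as identifiers; the function name and data layout are illustrative, not the authors' implementation.

```python
from collections import Counter

def pyramid_score(peer_scus, model_scu_sets):
    """Minimal sketch of a pyramid-style score (after Nenkova and Passonneau, 2004).

    peer_scus: set of SCU identifiers expressed by the candidate summary.
    model_scu_sets: one set of SCU identifiers per human model summary.
    SCU annotation itself is manual and is assumed to be done already.
    """
    # An SCU's weight is the number of model summaries that express it.
    weights = Counter()
    for scus in model_scu_sets:
        weights.update(scus)

    # Observed score: total weight of the SCUs the candidate expresses.
    observed = sum(weights[s] for s in peer_scus)

    # Ideal score: the heaviest possible total for the same number of SCUs.
    ideal = sum(sorted(weights.values(), reverse=True)[:len(peer_scus)])

    return observed / ideal if ideal else 0.0
```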
    <Paragraph position="2"> In (Van Halteren and Teufel, 2003), the co-occurrence of atomic information elements, called factoids, was examined whilst analysing 50 different summaries of two stories. A candidate summary was compared with the reference using factoids in order to measure its informativeness. The authors observed that, of a wide selection of factoids, only a small number were included in all summaries. Approximately 30% of the factoid pool was taken to build a consensus summary that could be used as a 'gold standard'.</Paragraph>
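As a rough illustration of the consensus step, the sketch below pools factoids over the human summaries and keeps the most widely shared ones; the threshold parameter is a hypothetical stand-in for the roughly 30% selection reported by the authors.

```python
from collections import Counter

def consensus_factoids(summaries, min_share=0.5):
    """Illustrative consensus construction in the spirit of
    (Van Halteren and Teufel, 2003).

    summaries: one set of factoid identifiers per human summary.
    min_share: hypothetical inclusion threshold (fraction of summaries
    that must contain a factoid); the original selection criterion differs.
    """
    counts = Counter()
    for factoids in summaries:
        counts.update(factoids)

    n_summaries = len(summaries)
    # Keep factoids mentioned by at least min_share of the summaries.
    return {f for f, c in counts.items() if c / n_summaries >= min_share}
```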
    <Paragraph position="3"> Summary evaluation has been recognised as a sensitive, non-trivial task. In (Radev and Tam, 2003), relative utility was calculated based on a significance ranking assigned to each sentence. A word network based summary evaluation scheme was proposed in (Hori et al., 2003), where the accuracy was weighted by the posterior probability of the manual summaries in the network. Significantly, they surmised that their criterion was independent of the variation among hand summaries.</Paragraph>
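A hedged sketch of a relative-utility style measure follows; the pooling of judges and the normalisation against an ideal extract are assumptions made for illustration, not the exact formulation of (Radev and Tam, 2003).

```python
def relative_utility(selected, judge_scores):
    """Sketch of a relative-utility style score.

    judge_scores: one list of per-sentence utility scores per judge.
    selected: indices of the sentences chosen by the summariser.
    Pooling by summation and the ideal-extract normalisation are
    illustrative assumptions.
    """
    n_sentences = len(judge_scores[0])
    # Pool the judges' significance rankings by summing per-sentence scores.
    pooled = [sum(scores[i] for scores in judge_scores)
              for i in range(n_sentences)]

    achieved = sum(pooled[i] for i in selected)
    # Best achievable total for an extract of the same length.
    ideal = sum(sorted(pooled, reverse=True)[:len(selected)])

    return achieved / ideal if ideal else 0.0
```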
    <Paragraph position="4"> A regression analysis performed in (Hirohata et al., 2005) concluded that objective evaluations were more effective than subjective approaches. Although their experiments were concerned with presentation speech, the results have a broader relevance.</Paragraph>
    <Paragraph position="5"> Another notable development in the field is the n-gram co-occurrence matching technique proposed in (Lin and Hovy, 2003a). Their tool, ROUGE, counts the number of n-gram matches between a reference and a machine generated summary. Recently, ROUGE was piloted for the evaluation of summaries of newspaper/newswire articles (Over and Yen, 2004). ROUGE simulated the manual evaluation well for that task, although it is still unclear how well it transfers to other tasks.</Paragraph>
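The core of the n-gram co-occurrence idea can be sketched as a recall-oriented overlap count, roughly in the spirit of ROUGE-n; this simplified single-reference version is illustrative and omits the stemming, stop-word, and multi-reference options of the actual tool.

```python
from collections import Counter

def rouge_n(candidate, reference, n=2):
    """Simplified ROUGE-n style recall over two token lists."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Clipped overlap: an n-gram counts at most as often as it occurs in the reference.
    overlap = sum(min(cand[g], count) for g, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0
```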
    <Paragraph position="6"> To some extent, the work described in this paper is close to that of (Nenkova and Passonneau, 2004) and (Van Halteren and Teufel, 2003). We analyse human authored summaries, associating human subjectivity with each author's unique interpretation of a story, and we consider the effect of this subjectivity when evaluating machine generated summaries.</Paragraph>
  </Section>
</Paper>