<?xml version="1.0" standalone="yes"?>
<Paper uid="A97-1033">
  <Title>Building a Generation Knowledge Source using Internet-Accessible Newswire</Title>
  <Section position="2" start_page="0" end_page="221" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In our work to date on news summarization at Columbia University (McKeown and Radev, 1995; Radev, 1996), information is extracted from a series of input news articles (MUC, 1992; Grishman et al., 1992) and is analyzed by a generation component to produce a summary that shows how perception of the event has changed over time. In this summarization paradigm, problems arise when information needed for the summary is either missing from the input article(s) or not extracted by the information extraction system. In such cases, the information may be readily available in other current news stories, in past news, or in online databases. If the summarization system can find the needed information in other online sources, then it can produce an improved summary by merging information from multiple sources with information extracted from the input articles.</Paragraph>
    <Paragraph position="1"> In the news domain, a summary needs to refer to people, places, and organizations and provide descriptions that clearly identify the entity for the reader. Such descriptions may not be present in the original text that is being summarized. For example, the American pilot Scott O'Grady, downed in Bosnia in June of 1995, was unheard of by the American public prior to the incident. If a reader tuned in to news on this event days later, descriptions from the initial articles may be more useful. A summarizer that has access to different descriptions will be able to select the description that best suits both the reader and the series of articles being summarized.</Paragraph>
    <Paragraph position="2"> In this paper, we describe a system called PROFILE that tracks prior references to a given entity by extracting descriptions for later use in summarization. In contrast with previous work on information extraction, our work has the following features: * It builds a database of profiles for entities by storing descriptions from a collected corpus of past news.</Paragraph>
    <Paragraph position="3"> * It operates in real time, allowing for connections with the latest breaking, online news to extract information about the most recently mentioned individuals and organizations.</Paragraph>
    <Paragraph position="4"> * It collects and merges information from distributed sources, thus allowing for a more complete record of information.</Paragraph>
    <Paragraph position="5"> * As it parses and identifies descriptions, it builds a lexicalized, syntactic representation of the description in a form suitable for input to the FUF/SURGE language generation system (Elhadad, 1993; Robin, 1994).</Paragraph>
    <Paragraph position="6"> The result is a system that can combine descriptions from articles appearing only a few minutes before the ones being summarized with descriptions from past news in a permanent record for future use. Its utility lies in its potential for representing entities, present in one article, with descriptions found in other articles, possibly coming from another source.</Paragraph>
    <Paragraph position="7"> Since the system constructs a lexicalized, syntactic functional description (FD) from the extracted description, the generator can re-use the description in new contexts, merging it with other descriptions into a new grammatical sentence.</Paragraph>
    <Paragraph position="8"> This would not be possible if only canned strings were used, with no information about their internal structure. Thus, in addition to collecting a knowledge source which provides identifying features of individuals, PROFILE also provides a lexicon of domain-appropriate phrases that can be integrated with individual words from a generator's lexicon to flexibly produce summary wording.</Paragraph>
    <Paragraph position="9"> We have extended the system by semantically categorizing descriptions using WordNet (Miller et al., 1990), so that a generator can more easily determine which description is relevant in different contexts.</Paragraph>
    <Paragraph position="10"> PROFILE can also be used in a real-time fashion to monitor entities and the changes of descriptions associated with them over the course of time. In the following sections, we first overview related work in the area of information extraction. We then turn to a discussion of the system components which build the profile database, followed by a description of how the results are used in generation. We close with our current directions, describing what parameters can influence a strategy for generating a sequence of anaphoric references to the same entity over time.</Paragraph>
  </Section>
</Paper>