<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2222">
  <Title>Using Leading Text for News Summaries: Evaluation Results and Implications for Commercial Summarization Applications</Title>
  <Section position="3" start_page="0" end_page="1364" type="intro">
    <SectionTitle>
2 Searchable LEAD Overview
</SectionTitle>
    <Paragraph position="0"> Searchable LEAD was originally implemented to provide LEXIS-NEXIS customers with the means to limit Boolean queries to key parts of news documents. It is based on the premise that major entities and topics of news stories are usually introduced in the leading portion of news documents. Searchable LEAD targets the subset of news information customers who want to retrieve documents that contain major references to their targeted topics but not documents that only mention those topics in passing.</Paragraph>
    <Paragraph position="1"> These customers generally can expect higher precision and lower recall when they restrict their Boolean queries to the headline and leading text than if they were to apply their queries to the full text.</Paragraph>
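<!--
The precision and recall trade-off described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the Searchable LEAD implementation: the three toy documents, the relevance judgments in RELEVANT, and the helper names tokens, search and precision_recall. It runs one Boolean AND query against the full text and then against the headline plus leading text only.

```python
import re

# Toy documents invented for illustration. "lead" stands for the leading
# portion of the body and "rest" for the remainder of the body.
DOCS = {
    "d1": {
        "headline": "Airline merger approved",
        "lead": "Regulators approved the merger of two airline carriers on Monday.",
        "rest": "Shares rose after the announcement.",
    },
    "d2": {
        "headline": "City council debates budget",
        "lead": "The council met to debate next year's budget.",
        "rest": "One member noted that an airline merger could affect airport revenue.",
    },
    "d3": {
        "headline": "Carriers to combine operations",
        "lead": "Two carriers announced plans to combine.",
        "rest": "Analysts called the airline merger the largest in a decade.",
    },
}

# Hypothetical relevance judgments: documents whose main topic is the merger.
RELEVANT = {"d1", "d3"}


def tokens(text):
    """Lower-case a text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(terms, fields):
    """Return ids of documents in which every term occurs in the chosen fields."""
    wanted = {term.lower() for term in terms}
    hits = set()
    for doc_id, doc in DOCS.items():
        text = " ".join(doc[field] for field in fields)
        if wanted <= tokens(text):  # Boolean AND over all query terms
            hits.add(doc_id)
    return hits


def precision_recall(hits):
    """Compute precision and recall of a hit set against RELEVANT."""
    true_hits = hits & RELEVANT
    precision = len(true_hits) / len(hits) if hits else 0.0
    recall = len(true_hits) / len(RELEVANT)
    return precision, recall


full_text = search(["airline", "merger"], ["headline", "lead", "rest"])
restricted = search(["airline", "merger"], ["headline", "lead"])
```

In this toy collection the restricted query drops d2, which mentions the topic only in passing, but it also misses d3, whose leading text uses different wording. That is the pattern of higher precision and lower recall the paragraph describes.
-->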
    <Paragraph position="2"> Documents in our news database have several text fields including HEADLINE and BODY fields.</Paragraph>
    <Paragraph position="3"> Searchable LEAD software identifies the leading portion of the BODY field and labels it the &quot;LEAD&quot; field. The amount of the BODY field that is included in the LEAD is based on document length.</Paragraph>
    <Paragraph position="4"> Minimum thresholds for the number of words, sentences and paragraphs to include in LEADs increase as document length increases.</Paragraph>
    <Paragraph position="5"> In an examination of more than 9,000 news documents from more than 250 publications, we found that short documents usually begin with good topic-summarizing leading sentences, which we call the logical lead. Longer documents, however, more often begin with anecdotal information before presenting the logical lead. LEAD fields must be longer for these documents in order to include the logical lead in most instances. Using a fixed amount of leading text regardless of document length would have resulted in LEADs that include too much text beyond the logical lead for shorter documents, and LEADs that miss the logical lead entirely for longer documents.</Paragraph>
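<!--
The length-dependent LEAD selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Searchable LEAD implementation: the tier boundaries and threshold values in TIERS are hypothetical (the text gives no actual numbers), paragraphs are assumed to be separated by blank lines, sentence counting uses a crude punctuation rule, and the LEAD is assumed to grow by whole paragraphs until all three minimums are met.

```python
import re
from typing import NamedTuple


class Thresholds(NamedTuple):
    min_words: int
    min_sentences: int
    min_paragraphs: int


# Hypothetical tiers: (upper bound on BODY length in words, thresholds).
# The values are illustrative only; minimums rise with document length.
TIERS = [
    (300, Thresholds(min_words=50, min_sentences=2, min_paragraphs=1)),
    (1000, Thresholds(min_words=100, min_sentences=4, min_paragraphs=2)),
    (float("inf"), Thresholds(min_words=200, min_sentences=8, min_paragraphs=4)),
]


def count_sentences(text: str) -> int:
    """Crude sentence count: runs of text ending in '.', '!' or '?'."""
    return len(re.findall(r"[^.!?]+[.!?]+", text))


def thresholds_for(body_word_count: int) -> Thresholds:
    """Pick the thresholds of the first tier whose bound covers the BODY length."""
    for upper_bound, thresholds in TIERS:
        if body_word_count <= upper_bound:
            return thresholds
    return TIERS[-1][1]


def extract_lead(body: str) -> str:
    """Return leading paragraphs of BODY until all minimum thresholds are met."""
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    limits = thresholds_for(len(body.split()))
    lead = []
    words = 0
    sentences = 0
    for paragraph in paragraphs:
        lead.append(paragraph)
        words += len(paragraph.split())
        sentences += count_sentences(paragraph)
        if (words >= limits.min_words
                and sentences >= limits.min_sentences
                and len(lead) >= limits.min_paragraphs):
            break
    return "\n\n".join(lead)
```

Under these assumed tiers, a short document yields a short LEAD, while a long document yields a longer one that has a better chance of reaching past an anecdotal opening to the logical lead.
-->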
    <Paragraph position="6"> Customers can limit part or all of a Boolean query to the LEAD, as the following query shows:</Paragraph>
  </Section>
</Paper>