<?xml version="1.0" standalone="yes"?>
<Paper uid="N03-1026">
  <Title>Statistical Sentence Condensation using Ambiguity Packing and Stochastic Disambiguation Methods for Lexical-Functional Grammar</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Recent work in statistical text summarization has put forward systems that do not merely extract and concatenate sentences, but learn how to generate new sentences from &lt;Summary, Text&gt; tuples. Depending on the chosen task, such systems either generate single-sentence &amp;quot;headlines&amp;quot; for multi-sentence text (Witbrock and Mittal, 1999), or they provide a sentence condensation module designed for combination with sentence extraction systems (Knight and Marcu, 2000; Jing, 2000). The challenge for such systems is to guarantee the grammaticality and summarization quality of the system output, i.e. the generated sentences need to be syntactically well-formed and need to retain the most salient information of the original document. For example a sentence extraction system might choose a sentence like: The UNIX operating system, with implementations from Apples to Crays, appears to have the advantage. null from a document, which could be condensed as: UNIX appears to have the advantage.</Paragraph>
    <Paragraph position="1"> In the approach of Witbrock and Mittal (1999), selection and ordering of summary terms is based on bag-of-words models and n-grams. Such models may well produce summaries that are indicative of the original's content; however, n-gram models seem to be insufficient to guarantee grammatical well-formedness of the system output. To overcome this problem, linguistic parsing and generation systems are used in the sentence condensation approaches of Knight and Marcu (2000) and Jing (2000).</Paragraph>
    <Paragraph position="2"> In these approaches, decisions about which material to include/delete in the sentence summaries do not rely on relative frequency information on words, but rather on probability models of subtree deletions that are learned from a corpus of parses for sentences and their summaries.</Paragraph>
    <Paragraph position="3"> A related area where linguistic parsing systems have been applied successfully is sentence simplification. Grefenstette (1998) presented a sentence reduction method that is based on finite-state technology for linguistic markup and selection, and Carroll et al. (1998) present a sentence simplification system based on linguistic parsing. However, these approaches do not employ statistical learning techniques to disambiguate simplification decisions, but iteratively apply symbolic reduction rules, producing a single output for each sentence.</Paragraph>
    <Paragraph position="4"> The goal of our approach is to apply the fine-grained tools for stochastic Lexical-Functional Grammar (LFG) parsing to the task of sentence condensation. The system presented in this paper is conceptualized as a tool that can be used as a standalone system for sentence condensation  or simplification, or in combination with sentence extraction for text-summarization beyond the sentence-level. In our system, to produce a condensed version of a sentence, the sentence is first parsed using a broad-coverage LFG grammar for English. The parser produces a set of functional (f)-structures for an ambiguous sentence in a packed format. It presents these to the transfer component in a single packed data structure that represents in one place the substructures shared by several different interpretations. The transfer component operates on these packed representations and modifies the parser output to produce reduced f-structures. The reduced f-structures are then filtered by the generator to determine syntactic well-formedness. A stochastic disambiguator using a maximum entropy model is trained on parsed and manually disambiguated f-structures for pairs of sentences and their condensations. Using the disambiguator, the string generated from the most probable reduced f-structure produced by the transfer system is chosen. In contrast to the approaches mentioned above, our system guarantees the grammaticality of generated strings through the use of a constraint-based generator for LFG which uses a slightly tighter version of the grammar than is used by the parser. As shown in an experimental evaluation, summarization quality of our system is high, due to the combination of linguistically fine-grained analysis tools and expressive stochastic disambiguation models.</Paragraph>
    <Paragraph position="5"> A second goal of our approach is to apply the standard evaluation methods for parsing to an automatic evaluation of summarization quality for sentence condensation systems. Instead of deploying costly and non-reusable human evaluation, or using automatic evaluation methods based on word error rate or n-gram match, summarization quality can be evaluated directly and automatically by matching the reduced f-structures that were produced by the system against manually selected f-structures that were produced by parsing a set of manually created condensations. Such an evaluation only requires human labor for the construction and manual structural disambiguation of a reusable gold standard test set. Matching against the test set can be done automatically and rapidly, and is repeatable for development purposes and system comparison. As shown in an experimental evaluation, a close correspondence can be established for rankings produced by the f-structure based automatic evaluation and a manual evaluation of generated strings.</Paragraph>
  </Section>
class="xml-element"></Paper>