<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-1426">
  <Title>THE PRACTICAL VALUE OF N-GRAMS IN GENERATION</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Langkilde and Knight (1998) introduced Nitrogen, a system that implements a new style of generation in which corpus-based ngram statistics are used in place of deep, extensive symbolic knowledge. The aim is to provide very large-scale generation (lexicons and knowledge bases on the order of 200,000 entities) while simplifying the input and improving robustness for sentence generation. Nitrogen's generation occurs in two stages, as shown in Figure 1. First, the input is mapped to a word lattice, a compact representation of multiple generation possibilities. Then a statistical extractor selects the most fluent path through the lattice.</Paragraph>
    <Paragraph position="1"> The word lattice encodes alternative English expressions for the input when the symbolic knowledge needed to make realization decisions is unavailable, whether from the input or from the knowledge bases. The Nitrogen statistical extractor ranks these alternatives using bigram (adjacent word pair) and unigram (single word) statistics collected from two years of the Wall Street Journal. The extraction algorithm is presented in Knight and Hatzivassiloglou (1995).</Paragraph>
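The two-stage idea described above (build a lattice of alternatives, then let ngram statistics pick the most fluent path) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy corpus stands in for the Wall Street Journal counts, the lattice and its state numbering are hypothetical, and the add-one smoothing and interpolation weight are arbitrary choices. It scores paths by exhaustive enumeration, which is not the extraction algorithm of Knight and Hatzivassiloglou (1995).

```python
import math
from collections import Counter

# Toy corpus standing in for the newspaper text; purely illustrative.
CORPUS = "she visited the big city . he visited a city . the city is big .".split()
UNIGRAMS = Counter(CORPUS)
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))
TOTAL = sum(UNIGRAMS.values())
VOCAB = len(UNIGRAMS)


def unigram_prob(word):
    # Add-one smoothing (an assumption) so unseen words keep non-zero mass.
    return (UNIGRAMS[word] + 1) / (TOTAL + VOCAB + 1)


def bigram_prob(prev, word, weight=0.8):
    # Interpolate the bigram estimate with the unigram one (weight is arbitrary).
    seen = UNIGRAMS[prev]
    bigram = BIGRAMS[(prev, word)] / seen if seen else 0.0
    return weight * bigram + (1 - weight) * unigram_prob(word)


def score(words):
    # Log-probability of a word sequence under the toy bigram model.
    logp = math.log(unigram_prob(words[0]))
    for prev, word in zip(words, words[1:]):
        logp += math.log(bigram_prob(prev, word))
    return logp


# Hypothetical lattice: state -> list of (word, next_state); "END" is final.
LATTICE = {
    0: [("she", 1)],
    1: [("visited", 2), ("visit", 2)],
    2: [("the", 3), ("a", 3), ("an", 3)],
    3: [("city", "END")],
}


def paths(state=0):
    # Enumerate every word sequence the lattice encodes (fine for toy sizes only).
    if state == "END":
        yield []
        return
    for word, next_state in LATTICE[state]:
        for rest in paths(next_state):
            yield [word] + rest


def best_path():
    # Pick the path the ngram statistics rate as most fluent.
    return max(paths(), key=score)
```

Calling `best_path()` on this lattice prefers "visited" over "visit" and rejects "an city", because those word pairs never occur in the toy corpus. A realistic lattice encodes far too many paths to enumerate, so a practical extractor would need a search procedure rather than the brute-force `max` used here.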
    <Paragraph position="2">  cal Knowledge in a Natural Language Generator (Knight and Hatzivassiloglou, 1995).</Paragraph>
    <Paragraph position="3"> In essence, Nitrogen uses ngram statistics to robustly make a wide variety of decisions, from tense to word choice to syntactic subcategorization, that traditionally are handled with defaults (e.g., assume present tense, use the alphabetically-first synonym, use nominal arguments), with explicit input specification, or with deep, detailed knowledge bases.</Paragraph>
    <Paragraph position="4"> However, in scaling up a generator system, these methods become unsatisfactory. Defaults are too rigid and limit quality; detailed input specs are difficult or complex to construct, or may be unavailable; and</Paragraph>
    <Paragraph position="6"> large-scale comprehensive knowledge bases of detailed linguistic relations do not exist, and even those on a smaller scale tend to be brittle.</Paragraph>
    <Paragraph position="7"> This paper examines the synergy between symbolic and statistical language processing and discusses Nitrogen's performance in practice. The analysis provides insight into the kinds of linguistic decisions that bigram frequency statistics can make, and into how they improve scalability. It also discusses the limits of bigram statistical knowledge. The paper is organized around specific examples of Nitrogen's output.</Paragraph>
  </Section>
</Paper>