<?xml version="1.0" standalone="yes"?>
<Paper uid="P02-1012">
  <Title>Pronominalization in Generated Discourse and Dialogue</Title>
  <Section position="11" start_page="0" end_page="0" type="concl">
    <SectionTitle>
6 Implementation and Evaluation
</SectionTitle>
    <Paragraph position="0"> STORYBOOK (Callaway and Lester, 2001b; Callaway and Lester, in press) is an implemented narrative generation system that converts a pre-existing Sentences as seen by the reader (antecedents underlined, pronouns in bold): Now, it happened that a wolf  narrative (discourse) plan into a multi-page fictional narrative in the fairy tale domain. Using a pipelined generation architecture, STORYBOOK performs pronominalization before sentence planning, and includes a revision component that is sensitive to pronominalization choices during clause aggregation. A previous large-scale evaluation of STORYBOOK (Callaway and Lester, 2001a) which included both a full version and a version with the pronominalization component ablated showed that including such a component significantly increases the quality of the resulting prose.</Paragraph>
    <Paragraph position="1"> However, there are significant practical obstacles to comparing the performance of different pronominalization algorithms using corpus matching criteria instead of &amp;quot;quality&amp;quot; as evaluated by human judges. Because systems that can handle a large quantity of text are very recent and because it can require years to create and organize the necessary knowledge to produce even one multi-paragraph text, much research on anaphora generation has instead relied on one of two techniques: AF Checking algorithms by hand: One verification method is to manually examine a text, identifying candidates for pronominalization and simulating the rules of a particular theory. However, this method is prone to human error.</Paragraph>
    <Paragraph position="2"> AF Checking algorithms semiautomatically: Other researchers opt instead to annotate a corpus for pronominalization and their antecedents as well as the pronoun forms that should occur, and then simulate a pronominalization algorithm on the marked-up text (Henschel et al., 2000). Similarly, this approach can suffer from interannotator agreement errors (Poesio et al., 1999b).</Paragraph>
    <Paragraph position="3"> To verify our pronominalization algorithm more rigorously, we instead used the STORYBOOK deep generation system to recreate pre-existing multi-page texts with automatically selected pronouns.  Without a full-scale implementation, it is impossible to determine whether an algorithm performs imperfectly due to human error, a lack of available corpus data for making decisions, or if it is a fault with the algorithm itself.</Paragraph>
    <Paragraph position="4"> Using the algorithm described in Figure 1, we modified STORYBOOK to substitute the types of pronouns described in Section 3. We then created the discourse plan and lexicon necessary to generate the same three articles from the New York Times as (McCoy and Strube, 1999). The results for both the newspaper texts and the Little Red Riding Hood narrative described in (Callaway and Lester, in press) are shown in Table 1.</Paragraph>
    <Paragraph position="5"> With the same three texts from the New York Times, STORYBOOK performed better than the previous reported results of 85-90% described in (Mc-Coy and Strube, 1999; Henschel et al., 2000) on both animate and all anaphora using a corpus matching technique. Furthermore, this was obtained solely by adjusting the recency parameter to 4 (it was 3 in our narrative domain), and without considering other enhancements such as gender/number constraints or domain-specific alterations.</Paragraph>
    <Paragraph position="6">  It is important to note, however, that our counts of pronouns and antecedents do not match theirs. This may stem from a variety of factors, such as including single instances of nominal descriptions, whether dialogue pronouns were considered, and if borderline quantifiers and words like &amp;quot;everyone&amp;quot; were counted. The generation community to-date has not settled on standard, marked corpora for comparison purposes as has the rest of the computational linguistics community.</Paragraph>
    <Paragraph position="7"> texts. Previous approaches, based largely on theoretical approaches such as Centering Theory, deal exclusively with anaphoric pronouns and have complex processing and definitional requirements.</Paragraph>
    <Paragraph position="8"> Given the full rhetorical structure available to an implemented generation system, we devised a simpler method of determining appropriate pronominalizations which was more accurate than existing methods simulated by hand or performed semiautomatically. This shows that approaches designed for use with anaphora resolution, which must build up discourse knowledge from scratch, may not be the most desirable method for use in NLG, where discourse knowledge already exists. The positive results from our simple counting algorithm, after only minor changes in parameters from a narrative domain to that of newspaper text, indicates that future high-quality prose generation systems are very near.</Paragraph>
  </Section>
class="xml-element"></Paper>