<?xml version="1.0" standalone="yes"?> <Paper uid="P02-1012"> <Title>Pronominalization in Generated Discourse and Dialogue</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Examples of Pronominalization </SectionTitle> <Paragraph position="0"> Pronominalization is the appropriate determination, marking, and grammatical agreement of pronouns (he, she, their, herself, it, mine, those, each other, one, etc.) used as short-hand references to an entity or event mentioned in the discourse. As with anaphora resolution, the task of a pronominalization algorithm is to correctly predict which pronoun a person would prefer in the same situation. The range of possibilities includes leaving the noun phrase as it is, reducing it by removing some of its modifiers, or replacing it with a pronoun construction.</Paragraph> <Paragraph position="0"> Our corpus analyses have identified a number of motivations for converting nouns into pronouns: 1. Anaphoric pronouns: These are the most-studied cases of pronoun occurrence, in which the pronoun sequentially follows a specific entity known as the referent. Anaphors are divided into two classes, short-distance (within the same sentence) and long-distance (in previous sentences): But John had never been to New Orleans, and he couldn't remember if anyone in his family had either.</Paragraph> <Paragraph position="1"> 2. 
Cataphoric pronouns: According to Quirk et al. (1985), cataphors are pronouns which occur before their referents in the linear flow of text within the same sentence, where the pronoun is either at a lower structural level or is part of a fronted circumstantial clause or prepositional phrase that could have appeared after the reference. Additionally, this category could include clefting pronouns.</Paragraph> <Paragraph position="2"> 3. Extratextual pronouns: This category includes document deixis (via a demonstrative pronoun), authorial or reader reference, and situational pronouns: </Paragraph> <Paragraph position="3"> This is the first document to show . . .</Paragraph> <Paragraph position="4"> We discuss these strategies in the next section. The group had never seen anything like it.</Paragraph> </Section> <Section position="8" start_page="0" end_page="0" type="metho"> <SectionTitle> 4. Reflexive and Reciprocal Pronouns </SectionTitle> <Paragraph position="0"> Most verbs use special pronouns when the subject and object corefer. A discourse history algorithm can employ that knowledge to mark reflexive and reciprocal pronouns appropriately.</Paragraph> <Paragraph position="1"> when playing.</Paragraph> <Paragraph position="2"> 5. Partitive pronouns: It is important to know conceptually what it is that the pronoun is trying to replace. Otherwise, it becomes impossible to achieve the types of pronominalizations that authors are routinely capable of creating. This requires accurate information in the knowledge base or linguistic structure from which the sentences are derived.</Paragraph> <Paragraph position="3"> As the horses ran by, she roped one.</Paragraph> <Paragraph position="4"> * As the horses ran by, she roped it.</Paragraph> <Paragraph position="5"> * As the horses ran by, she roped them.</Paragraph> <Paragraph position="6"> In addition to these motivations, we identified several factors that prevent pronouns from occurring where they otherwise might: 6. 
Pronouns across boundaries: After a chapter, section, or other obvious boundary, such as a change in time, place, or both, as in (McCoy and Strube, 1999), authors will typically &quot;reset&quot; pronominalization just as if it were the beginning of the entire text. Antecedent references that break these boundaries are sometimes marked by the authors in academic texts: As we saw in the previous section, . . .</Paragraph> <Paragraph position="7"> 7. Restrictions from modifiers: Because pronouns, unlike nouns, cannot take modifiers, adding an adjective, relative clause, or some other modifier prevents a noun from being replaced by a pronoun. For instance: The mayor had already read the full proposal.</Paragraph> <Paragraph position="8"> * The mayor had already read the full it.</Paragraph> <Paragraph position="9"> 8. Focused nouns: Especially after a vocally stressed discourse marker (Wolters and Byron, 2000) or some other marked shift in topic, a word that normally would be pronominalized is often not, as in this example: . . . and you frequently find that mice occupy an important part of the modern medical laboratory. In other words, mice are especially necessary for diagnosing human cancers . . .</Paragraph> <Paragraph position="10"> 9. Semantic and syntactic considerations: A small number of semantic relations and syntactic constructions prohibit pronominalization: * The stranger was just called him. (Bob) * Roberta was no longer a her. (child) * The father, a tyrant of a him, . . . (man) 10. Optional pronominalization: Often there are borderline cases where some authors will use pronouns while others won't. A single algorithm may be tuned to match a particular author's style, but parameterization will be necessary to match a variety of styles. 
Thus it is extremely difficult to match any particular text exactly without the ability to adjust the pronominalization algorithm.</Paragraph> <Paragraph position="11"> Pronominalization occurs as often in exposition as in dialogue, but dialogue can have slightly different pronominalizations depending on the relationship between the utterer and the hearer: 11. Speaker self-reference: &quot;John thinks John will go find John's shoes,&quot; John said.</Paragraph> <Paragraph position="12"> changes to first person singular pronouns: &quot;I think I will go find my shoes,&quot; John said. 12. Speaker references hearer(s): &quot;Mary should go find Mary's shoes,&quot; John said.</Paragraph> <Paragraph position="13"> changes to second person pronouns: &quot;You should go find your shoes,&quot; John said. 13. Reference to speaker and hearer (or to speaker and a third party): &quot;John and Mary should go find John and Mary's shoes,&quot; John said.</Paragraph> <Paragraph position="14"> changes to first person plural pronouns: &quot;We should go find our shoes,&quot; John said. 14. Reference to a third party: &quot;Bob and Mary went to eat Bob and Mary's breakfast,&quot; John said.</Paragraph> <Paragraph position="15"> changes to third person plural pronouns: &quot;They went to eat their breakfast,&quot; John said. 15. Finally, the treatment of pronouns differs depending on whether they are inside or outside of the direct quotation. For example: &quot;Oh man, I forgot to eat my breakfast!&quot; John muttered to himself while grabbing his shoes. 
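As an illustration, the speaker/hearer mappings in items 11-14 amount to a small decision function over the overlap between the referent set, the speaker, and the hearer(s). The sketch below is not the paper's implementation; the function name, entity identifiers, and return convention are all hypothetical.

```python
# Hypothetical sketch of items 11-14: inside a direct quotation, the
# grammatical person and number of a pronoun depend on how the referent
# set overlaps with the speaker and the hearer(s).

def quoted_person(referents, speaker, hearers):
    """Return (person, number) for a referent set inside a quotation.

    referents: set of entity ids being referred to
    speaker:   entity id of the utterer
    hearers:   set of entity ids being addressed
    """
    refs = set(referents)
    number = "sg" if len(refs) == 1 else "pl"
    if speaker in refs:
        # item 11 (speaker alone) or item 13 (speaker plus hearer/third party)
        return ("1st", number)
    if refs and refs.issubset(hearers):
        # item 12: reference to the hearer(s) only
        return ("2nd", number)
    # item 14: third parties keep third-person pronouns
    return ("3rd", number)
```

For instance, with John speaking to Mary, a reference to {John} maps to first person singular ("I"), {Mary} to second person ("you"), {John, Mary} to first person plural ("we"), and {Bob, Mary} with Mary absent to third person plural ("they").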
Although this enumeration is surely incomplete, it provides a basic description of the phenomena that a generation system must handle in order to produce text with the kinds of pronouns found in routine human-produced prose.</Paragraph> </Section> <Section position="9" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Architectural Concerns </SectionTitle> <Paragraph position="0"> In order to correctly account for these phenomena during generation, it is necessary to have detailed information about the underlying discourse structure. Although a template generation system could be augmented to record this information, in practice only deep-structure, full-scale NLG systems have the requisite flexibility. Because a pronominalization algorithm typically follows the discourse planner, it frequently has access to the full discourse plan.</Paragraph> <Paragraph position="1"> A typical discourse plan is a tree structure, where internal nodes represent structuring relations while leaf nodes represent individual sentential elements that are organized semantically. In addition, the elements of the discourse tree are typically rooted in the semantic knowledge base from which the discourse planner drew when constructing the discourse plan.</Paragraph> <Paragraph position="2"> The discourse plan supplies the following information that is useful for pronominalization: - Linearization: The sequencing information stored in the discourse tree can be used to motivate anaphoric and cataphoric pronouns as shown in items 1 &amp; 2 of Section 3.</Paragraph> <Paragraph position="3"> - Semantic Structure: The original subgraphs (or semantic subnetworks) derived from the knowledge base can motivate content vs. 
situational knowledge (item 3), reflexive and reciprocal pronouns via argument lists (item 4), partitive pronouns (item 5), and the existence of NP modifiers (item 7), and can identify semantic types in relations (item 9).</Paragraph> <Paragraph position="4"> - Discourse Structure: The rhetorical relations that hold between different sentences typically imply where section boundaries are located (item 6), indicate what types of discourse markers are employed (item 8), and, in the case of dialogue, identify which actors are speaking, listening, or not present (items 11-15).</Paragraph> <Paragraph position="5"> This detailed knowledge of the discourse is available to an implemented pronominalization component utilizing any theory, including Centering theory. We now turn our attention to the role this information plays in a pronominalization algorithm.</Paragraph> </Section> <Section position="10" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 A Simple Pronominalization Algorithm </SectionTitle> <Paragraph position="0"> At an abstract level, the pronominalization algorithms derived from Centering theory are easily expressed: if Centering theory predicts a pronoun would be used in anaphora resolution in a given segment of text, then generate the appropriate pronoun.</Paragraph> <Paragraph position="1"> While this works for many cases of anaphoric pronouns [84.7% in (McCoy and Strube, 1999), 87-90% in (Henschel et al., 2000)], we have seen that these form only a subset of the potential reasons for pronominalization. 
Furthermore, this approach assumes that the discourse tree was constructed with Centering theory in mind.</Paragraph> <Paragraph position="2"> (The algorithm in Figure 1 takes as given: LNE, the linearized list of nominal elements; NE, the current nominal element; SEEN, the list of encountered nominal elements; D, the dialogue state of the current leaf node; RS, the rhetorical structure near the leaf node; and SC, the sentence counter.) However, it is not clear that Centering theory itself is necessary in generation, let alone its accompanying algorithms and data structures. Because Centering theory is typically applied to parsing (which starts with no discourse tree), it may not be the most efficient technique to use in generation (which has a complete discourse tree available for inference).</Paragraph> <Paragraph position="3"> Instead, we attempted to determine whether the information already present in the discourse tree was enough to motivate a simpler algorithm based on the following available data: - Ordered sequence of nominal elements: Because the discourse tree is linearized and individual leaves of the tree annotate which elements have certain semantic roles, a very good guess can be made as to which nominal elements precede others at the clause level.</Paragraph> <Paragraph position="4"> - Known paragraph and sentence boundaries: Analysis of the rhetorical structure of the discourse tree allows for the determination of boundaries and thus a notion of metric distance between elements.</Paragraph> <Paragraph position="5"> - Rhetorical relations: The rhetorical relations can tell us which nominal elements follow discourse markers and which are used reflexively or reciprocally.</Paragraph> <Paragraph position="6"> - Dialogue: By recording the participants in dialogue, the discourse tree allows for the appropriate assignment of pronouns both inside and outside of the direct quote itself.</Paragraph> <Paragraph position="7"> The algorithm we developed considers the current discourse leaf 
node and the rhetorical structure above it, and also makes use of the following data: - Nominal element distance: How many total (non-distinct) nominal elements ago a particular element was last used.</Paragraph> <Paragraph position="8"> - Recency: How many distinct nominal elements have been seen since its last use.</Paragraph> <Paragraph position="9"> - Sentential distance: How many sentences (prototypical clauses) have appeared since the last usage of this nominal element.</Paragraph> <Paragraph position="10"> The algorithm itself (Figure 1) is best characterized as a counting method: it loops once through the linearized list of nominal elements, makes pronominalization decisions based on the local information described above, and then updates the numerical counters. Numerical parameters (e.g., recency(NE) = 3) are derived from empirical experimentation in generating multi-page prose in a narrative domain.</Paragraph> <Paragraph position="11"> While it lacks the explanatory power of a relatively mature linguistic theory, it also lacks the accompanying complexity and is immediately applicable to real-world deep generation systems. The algorithm is traced in Figure 2, although due to space limitations some phenomena such as dialogue, long-distance, and reflexive pronouns are not shown.</Paragraph> </Section> </Paper>
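To make the counting method concrete, the sketch below (not the authors' Figure 1 algorithm) makes a single pass over a linearized list of (referent, sentence) pairs, derives recency and sentential distance from per-referent counters, and pronominalizes a mention only when both fall under cutoff values. The function name, the data representation, and both threshold constants are assumptions for illustration.

```python
# Illustrative counting-method sketch. Hypothetical names and thresholds:
# RECENCY_MAX and SENT_DIST_MAX stand in for empirically tuned parameters.

RECENCY_MAX = 3    # max distinct nominal elements since the last mention
SENT_DIST_MAX = 2  # max sentences since the last mention

def pronominalize(elements):
    """elements: list of (referent_id, sentence_index) in linear order.
    Returns a parallel list of 'PRONOUN' / 'FULL-NP' decisions."""
    decisions = []
    last_seen = {}  # referent_id -> index of its most recent mention
    for i, (ref, sent) in enumerate(elements):
        if ref in last_seen:
            last_i = last_seen[ref]
            # recency: distinct other referents mentioned since last use
            recency = len({r for r, _ in elements[last_i + 1:i]} - {ref})
            sent_dist = sent - elements[last_i][1]
            if recency <= RECENCY_MAX and sent_dist <= SENT_DIST_MAX:
                decisions.append("PRONOUN")
            else:
                decisions.append("FULL-NP")
        else:
            decisions.append("FULL-NP")  # first mention stays a full NP
        last_seen[ref] = i
    return decisions
```

Because each decision depends only on local counters that are updated as the loop advances, the method needs neither backtracking nor the transition computations of Centering-based algorithms, which is what keeps it a single linear pass over the discourse.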