A Preliminary Model of Centering in Dialog*

2 The Centering model

The centering framework (Grosz et al., 1995) makes three main claims: 1) given an utterance Un, the model predicts which discourse entity will be the focus of Un+1; 2) when local focus is maintained between utterances, the model predicts that it will be expressed with a pronoun; and 3) when a pronoun is encountered, the model provides a preference ordering on possible antecedents from the prior utterance. These data structures are created for each Un:2

1. A partially-ordered list of forward-looking centers Cfn that includes all discourse entities in utterance n. Its first element is the 'preferred center', Cpn.

2. A backward-looking center Cbn, the highest-ranked element of Cfn-1 that is in Cfn.

The framework defines a preference ordering on techniques for effecting a topic change, ranked according to the inference load each places on the addressee. The transitions are called 'shift', 'retain' and 'continue', and they differ based on whether Cbn = Cbn+1 and whether Cbn+1 = Cpn+1 (a schematic sketch of these definitions, in code, follows Section 3).

At the heart of the theory are two centering rules:

Rule 1: If any member of Cfn is realized by a pronoun in Cfn+1, then Cbn+1 must be realized by a pronoun.

Rule 2: Sequences of continues are preferred over sequences of retains, which are in turn preferred over sequences of shifts.

----
* The authors would like to thank James Allen, Marilyn Walker, and the anonymous reviewers for many helpful comments on a preliminary draft of the paper. This material is based on work supported by NSF grant IRI-96-23665, ONR grant N00014-95-1-1088 and Columbia University grant OPG: 1307.
1 A more detailed report of this study is available as URCS TR #687 (Byron and Stent, 1998).
2 We provide only the briefest sketch of the centering framework. Readers unfamiliar with the model are referred to (Grosz et al., 1995) for more details.
----

3 Centering and multi-party discourse

A variety of issues must be addressed to adapt centering to two-party dialog. They include:

1. Utterance boundaries are difficult to pin down in spoken dialog, and their determination affects the Cf lists. Just how the speaker turns are broken into utterances has a huge impact on the success of the model (Brennan, 1998).

2. Should the dialog participants, referred to via first- and second-person pronouns (1/2PPs), be considered 'discourse entities' and included in Cf?

3. Which utterance should be considered 'previous' for locating Cfn-1: the same speaker's previous utterance, or the immediately preceding utterance regardless of its speaker?

4. What should be done with abandoned or partial utterances and those with no discourse entities?
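To make the definitions of Section 2 concrete, the following is a minimal Python sketch of the centering data structures and transition labels; it is not from the original paper. The names (Utterance, compute_cb, classify_transition) and the two-utterance worked example are our own, and the Cf lists are assumed to be given already rank-ordered.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Utterance:
        cf: list                   # forward-looking centers Cf, highest-ranked first
        cb: Optional[str] = None   # backward-looking center Cb, filled in below

    def compute_cb(prev, curr):
        """Cb(n): the highest-ranked element of Cf(n-1) that also appears in Cf(n)."""
        for entity in prev.cf:     # prev.cf is already rank-ordered
            if entity in curr.cf:
                return entity
        return None

    def classify_transition(prev, curr):
        """Label the move from prev to curr as 'continue', 'retain', or 'shift'."""
        cp = curr.cf[0] if curr.cf else None   # preferred center Cp of curr
        if prev.cb is None or curr.cb == prev.cb:
            return "continue" if curr.cb == cp else "retain"
        return "shift"

    # Hypothetical worked example, Cf ranked Subj > DO > IO > Other:
    u1 = Utterance(cf=["Susan", "Betsy", "hamster"])    # "Susan gave Betsy a hamster."
    u2 = Utterance(cf=["Susan", "Betsy", "question"])   # "She asked Betsy a question."
    u2.cb = compute_cb(u1, u2)                          # -> "Susan"
    print(u2.cb, classify_transition(u1, u2))           # prints: Susan continue

In the example, Cb carries over from u1 to u2 and is also u2's preferred center, so the transition is a 'continue', the cheapest move for the addressee under Rule 2.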
4 Experimental method

Our data is from four randomly chosen dialogs in the CALLHOME-English corpus (LDC, 1997). Table 1 describes the three models we created to address the issues described in Section 3.

[Table 1 was flattened during extraction; its recoverable column headers distinguish the models by whether Cf elements are created from 1/2PPs and whether both speakers' previous utterances are used to find Cb.]

We kept utterance boundaries as transcribed, even if an utterance was a fragment properly belonging at the end of the one preceding. For instance, the following two utterances seem as though they should be just one:

Example 1 [dialog 4571]
A ... and she called me one day when
A there was nobody in the house but her ...

For compound sentences, we broke each non-subordinate clause into a new utterance. The utterance break added in Example 2 is indicated by /:

Example 2 [dialog 4248]
A It does make a difference / like I always thought formula smells kind of disgusting.

Two factors in the original model are left to the algorithm implementer: the selection of items for Cf and their rank order. Both are active areas of research. In our models, all elements of Cf are created from nouns in the utterance. We do not include entities referred to by complex nominal constituents such as infinitives. Associations (e.g., part/subpart) and ellipsed items are not allowed in determining elements of Cf. We adopted a commonly used Cf ordering: Subj > DO > IO > Other. Linear sentence position is used to order multiple 'other' constituents (a sketch of this ordering appears at the end of this section). Whether discourse participants should be considered discourse entities is very perplexing.

Empty utterances (containing no discourse entities) are skipped in determining Cfn-1. Empty utterances include acknowledgements and utterances like "hard to leave behind" with no explicitly mentioned objects. The dialogs were annotated for discourse structure, so Un-1 is the previous utterance in the discourse segment, not necessarily the linearly preceding one. In model 2, the highest-ranked element of Cf from either the current speaker's prior utterance or the other speaker's previous utterance is Cb; models 1 and 3 consider only the immediately preceding utterance. We also annotated the 'real' topic of each utterance, selected according to the annotator's intuition of what the utterance is 'about'. It must be explicitly referred to in the utterance and can be an entity referred to using a 1/2PP.

After the three models were defined, one dialog was used to train the annotators (the authors); the other three were then independently annotated according to the rules outlined above. The annotators compared their results and agreed upon a reconciled version of the data, which was used to produce the results reported in Section 5. Annotator accuracy as measured against the reconciled data over all categories ranged from 80% to 89%. Accuracy was calculated by counting the number of utterances that differed from the reconciled data (including different orderings of Cf), divided by the total number of utterances.
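As an illustration only, not from the paper, here is a minimal Python sketch of the Cf ordering just described: grammatical role ranks Subj > DO > IO > Other, with linear sentence position breaking ties among 'other' constituents. The role labels and the (entity, role, position) input format are our own assumptions; in practice these would come from a parser or from hand annotation.

    ROLE_RANK = {"subj": 0, "do": 1, "io": 2}   # anything else counts as 'other'

    def order_cf(mentions):
        """Sort (entity, grammatical_role, sentence_position) triples into a Cf list.
        Rank by role (Subj > DO > IO > Other); linear position breaks ties."""
        ranked = sorted(mentions, key=lambda m: (ROLE_RANK.get(m[1], 3), m[2]))
        return [entity for entity, role, pos in ranked]

    # Hypothetical utterance: "She asked Betsy a question about the hamster."
    mentions = [("Betsy", "io", 2), ("question", "do", 3),
                ("she", "subj", 1), ("hamster", "other", 4)]
    print(order_cf(mentions))   # ['she', 'question', 'Betsy', 'hamster']

The first element of the resulting list is the preferred center Cp; feeding such lists into the compute_cb sketch after Section 3 yields the Cb and transition annotations the models compare.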