<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0103">
  <Title>Anaphora Resolution using an Extended Centering Algorithm in a Multi-modal Dialogue System</Title>
  <Section position="1" start_page="0" end_page="22" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> Anaphora in multi-modal dialogues have different aspects compared to the anaphora in language-only dialogues.</Paragraph>
    <Paragraph position="1"> They often refer to the items signified by a gesture or by visual means. In this paper, we define two kinds of anaphora: screen anaphora and referring anaphora, and propose two general methods to resolve these anaphora. One is a simple mapping algorithm that can find items refetxed with/without pointing gestures on a screen. The other is the centering algorithm with a dual cache model, which Walker's centering algorithm is extended to for a multi-modal dialogue system. The extended algorithm is appropriate to resolve various anaphora in a multi-modal dialogue .because it keeps utterances, visual information and screen switching,.tim.e. In the experiments, the system. Correctly resolved 384 anaphora out of 402 anaphom in 40 dialogues (0..54 anaphom per utterance) showing 95.5% correctness.</Paragraph>
    <Paragraph position="2"> Introduction Human face&lt;o-face communication is an ideal model for humm-computa- interface. One of the major features of face-to-face communication is its multiplicity of communication channels that acts on multiple modalities. By providing a number of channels through which information may pass between a user and a computer, a multi-modal dialogue system gives the user a * more convenient and natural interface than a language-only dialogue system. In the system, a user often uses a variety of anaphodc expressions like this, the red item, it, etc. User's intention is passed to the system through multiple channels, e.g., the auditory channel (carrying speech) and the visual channel (c~nrying gestures and/or facial expressions). For example, a user can say utterance (4) in Figure ! ! while touching an item on the screen. The user may also say utterance (8) without touching the screen when there is only one red item displayed on the screen. Moreover, the user can use anaphoric expression to refer to an entity in previous utterances as in utterance (10).</Paragraph>
    <Paragraph position="3">  (1) S: May I help you7 (2) U: I want to see some desks.</Paragraph>
    <Paragraph position="4"> (3) S: (displaying mode/200 and mode/250) We have these modeb.</Paragraph>
    <Paragraph position="5"> (4) U: (pointing to the model200) How much is thLv? (5) S: It is 150,000 Won.</Paragraph>
    <Paragraph position="6"> (6) U: I'd like to see some chairs, too.</Paragraph>
    <Paragraph position="7"> (7) S: (displaying model 100 and model 150) We have t/w~re medeb.</Paragraph>
    <Paragraph position="8"> (8) U: How much is the red item? (9) S: It is 80,000 Won.</Paragraph>
    <Paragraph position="9"> (10) U: (pointing to the model 100) l'd like to buy thb and tke prev/ous se/ect/on.  Previous rescaw, h on a multi-modal dialogue system was focused on finding the relationship between a pointing gesture and a deictic expression (Bolt (1980), Neal et al. (1988), Salisbury et al. (1990.), Shimazu et al. (1994), Shimazu and Takmhima (1996))and on mapping a predefined symbol to a simple t S means a multi-modal dialogue system and U means a user. Our goal is developing a multi-modal dialogue system (Kim and Son (1997)). of which domain is home shopping and in which a user purchases furniture using Korean utC~',mccs with pointing gestures on a touch screw.</Paragraph>
    <Paragraph position="10">  command (Johnston et al. (1997)). None of them, however, suggest methods of resolving deictic expressions with which pointing gestures are omitted: e.g.. the red item in utterance (8). These approaches do not consider resolving an anaphoric expression that refers an object mentioned in previous utterances or displayed on previous screens. It, however, is important also for a multi-modal dialogue system to resolve all of these anaphora so that the system should correctly catch his/her intention. In this paper, we propose general methods to resolve a variety of anaphoric expressions that are found in a multi-modal dialogue. We classify anaphora into two types: deictic expression with/without a pointing gesture and referring expression, and propose methods to resolve them.</Paragraph>
    <Paragraph position="11"> To resolve deictic expression like this in utterance (4) which c~rs with a pointing gesture and the red item in utterance (8) which is uttered with no pointing gestures, the system counts the &amp;quot;number of pointing gestures and the number of anaphoric noun phrases included in a user's utterance, and compares them. Then, the system maps the noun phrases to pointed items. To resolve referring expression, one of the well known methods is centering theory developed by Grosz, Jo~hi, and Weinstein (Grosz et al. (1983)). The centering algorithm was further developed by Brennan, Friedman and Pollard for pronoun resolution (Brennan et al. (1987)) and was improved by Walker (Walker (1998)). However,' those centering algorithms are not applicable to resolve anaphora in a multi-medal dialogue because the algurithm excludes the gestures and facial * expression of a dialogue partner, which are important clues to mgierstand his/her uttexances. And. the algorithm cannot resolve complex anaphora like the previous selection in (10)beeanse it does not keep the time When the previous screen is switched to the current screen. To resolve inch anaphom, we extend Walker's centedng algorithm to the one with a dual cache model, which keeps the information displayed on a ~ With screen switching-time.</Paragraph>
    <Paragraph position="12"> The rest of this paper begins with describing our approach in section !. After showing two methods to resolve anaphora in a rrmlti-modal dialogue system in section 2, we report experimental results on these methods in section 3. Finally. we draw some conclusions.</Paragraph>
  </Section>
class="xml-element"></Paper>