<?xml version="1.0" standalone="yes"?> <Paper uid="J95-1003"> <Title>Automatic Referent Resolution of Deictic and Anaphoric Expressions</Title> <Section position="2" start_page="0" end_page="60" type="abstr"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> This paper deals with the automatic referent resolution of deictic and anaphoric expressions in a research prototype of a multimodal user interface called EDWARD.</Paragraph> <Paragraph position="1"> The primary aim of our project is the development and assessment of an interface that combines the positive features of the language mode and the action mode of interaction (Claassen et al. 1990). EDWARD (Huls and Bos 1993; Bos et al. 1994) integrates a graphical graph-editor called Gr 2 (Bos in press) and a Dutch natural language (NL) dialogue system called DoNaLD (Claassen and Huls 1991). One of the application domains involves a file system environment with documents, authors, a garbage container, and so on. The user can interact with EDWARD by manipulating the graphical representation of the file system (a directed graph), by menus, by written natural or formal language, or by combinations of these. EDWARD responds in NL (either written or spoken) and graphics.</Paragraph> <Paragraph position="2"> In this paper we examine the semantic and pragmatic processes involved in the referent resolution of deictic and deixis-related expressions by EDWARD. (Syntactic issues will not be discussed here; for these, see Claassen and Huls 1991.) The proper interpretation of deictic expressions depends on the identity of the speaker(s) and the audience, the time of speech, the spatial location of speaker and audience at the time of speech, and non-linguistic communicative acts such as facial expressions and eye, hand, and body movements. Lyons (1977, p. 637) provides the following definition of deixis: &quot;the location and identification of persons, objects, events, processes and activities being talked about, or referred to, in relation to the spatiotemporal context created and sustained by the act of utterance and the participation in it, typically, of a single speaker and at least one addressee.&quot; * Nijmegen Institute for Cognition and Information, P.O. Box 9104, 6500 HE Nijmegen, The Netherlands. E-mail: huls@nici.kun.nl. © 1995 Association for Computational Linguistics. Computational Linguistics, Volume 21, Number 1.</Paragraph> <Paragraph position="3"> In the context of the present paper, we distinguish three types of deixis: personal, temporal, and spatial deixis. Personal deixis involves first- and second-person pronouns (e.g., I, we, and you). Temporal deixis is realized by the tense system of a language (e.g., he lives in Amsterdam) and by temporal modifiers (e.g., in an hour). Temporal deixis relates the time of speech to the relation(s) expressed by the utterance. Spatial deixis involves demonstratives or other referring expressions that are produced in combination with a pointing gesture (e.g., this↗ file, in which ↗ represents the pointing gesture). In the present paper, most attention will be given to spatial deixis.</Paragraph> <Paragraph position="4"> Deictic expressions can be contrasted with anaphors. Unlike deictic expressions, anaphors can be interpreted without regard to the spatiotemporal context of the speaking situation. Their interpretation depends merely on the linguistic expressions that precede them in the discourse. For example, &quot;this&quot; is an anaphor in &quot;Print the file about dialogue systems. Delete this.&quot; In many languages, the words used in deictic expressions are also used in anaphoric expressions.</Paragraph> <Paragraph position="5"> Deictic and anaphoric expressions frequently cause problems for NL analysis.</Paragraph> <Paragraph position="6"> Sijtsma and Zweekhorst (1993) find referent resolution errors in all three commercial NL interfaces they evaluate. 
In research laboratories, several systems capable of interpreting deictic expressions have recently been developed. Allgayer et al. (1989) describe XTRA, a German NL interface to expert systems, currently applied to helping users fill out a tax form. XTRA uses a dialogue memory and a tax-form hierarchy to interpret multimodal referring expressions. Data from the dialogue memory and from gesture analysis are combined (e.g., by taking the intersection of two sets of potential referents suggested by these information sources). Neal and Shapiro (1991) describe a research prototype called CUBRICON, which combines NL (English) with graphics. The application domain is military tactical air control. Like XTRA, CUBRICON uses two models to interpret deictic expressions: an attentional discourse focus space representation (adapted from Grosz and Sidner 1986) and a display model. Stock (1991) describes ALFresco, a prototype built for the exploration of frescoes, using NL (Italian) and pictures. For referent resolution in ALFresco, topic spaces (Grosz 1978) are combined with Hajičová's (1987) approach, in which entities are assumed to &quot;fade away&quot; slowly. Cohen (1992) presents Shoptalk, a prototype information and decision-support system for semiconductor and printed-circuit board manufacturing with an NL (English) component. In Shoptalk too, the interpretation process is based on the approach of Grosz and Sidner. We believe that using two separate mechanisms to model linguistic and perceptual context, as these systems do, is a disadvantage compared with using a single mechanism for referent resolution. From both a computational and an engineering standpoint, a single mechanism that handles deictic and anaphoric expressions in the same way is preferable.</Paragraph> <Paragraph position="7"> We will try to show how both deictic and anaphoric references can be resolved using a single model. 
We have used the framework presented by Alshawi (1987) to develop a general context model that can represent linguistic as well as non-linguistic effects on the dialogue context. This model is used, in conjunction with a knowledge base, by EDWARD's interpretation component to resolve deictic and anaphoric referring expressions. The same model and knowledge base are used by EDWARD's generation component to decide the form (e.g., he, the writer, a man), the content (e.g., the writer, the husband), and the mode (e.g., linguistic or simulated pointing gesture; Claassen 1992; Claassen et al. 1993) of referring expressions. In this paper, however, we focus on the use of the context model to resolve deictic and anaphoric expressions keyed in by the user.</Paragraph> <Paragraph position="8"> Figure caption: The main components of EDWARD.</Paragraph> <Paragraph position="9"> The rest of this paper is structured as follows. In Section 2, we present an overview of EDWARD. Next, we describe the knowledge sources EDWARD uses to interpret deictic and anaphoric expressions (Section 3). In Section 4, we examine the process of interpreting deictic and anaphoric expressions in some detail. Subsequently, in Section 5, we present some user interactions with EDWARD and compare the results of EDWARD's referent resolution model with those of two other models, including that of Grosz and Sidner (1986).</Paragraph> </Section> </Paper>