<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-4039">
<Title>Converting Text into Agent Animations: Assigning Gestures to Text</Title>
<Section position="2" start_page="0" end_page="0" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> Significant advances in computer graphics over the last decade have improved the expressiveness of animated characters and have promoted research on interface agents, which serve as mediators of human-computer interaction. Because an interface agent has an embodied figure, it can use its face and body to display nonverbal behaviors while speaking.</Paragraph>
<Paragraph position="1"> Previous studies of human communication suggest that gestures in particular contribute to better understanding of speech. About 90% of all gestures occur while the speaker is actually uttering something (McNeill, 1992). Experimental studies have shown that spoken sentences are heard twice as accurately when they are presented along with a gesture (Berger & Popelka, 1971), and that comprehension of a description accompanied by gestures is better than comprehension of one accompanied by only the speaker's face and lip movements (Rogers, 1978). These studies suggest that generating appropriate gestures synchronized with speech is a promising way to improve the performance of interface agents. In previous work on multimodal generation, gestures were determined according to the instruction content (André, Rist, & Müller, 1999; Rickel & Johnson, 1999), the task situation in a learning environment (Lester, Stone, & Stelling, 1999), or the agent's communicative goal in conversation (Cassell et al., 1994; Cassell, Stone, & Yan, 2000). These approaches, however, require the content developer (e.g., a school teacher designing teaching materials) to be skilled at describing semantic and pragmatic relations in logical form. A different approach (Cassell, Vilhjálmsson, & Bickmore, 2001) provides a toolkit that takes plain text as input and automatically suggests a sequence of agent behaviors synchronized with the synthesized speech. However, there has been little work in computational linguistics on how to identify and extract the linguistic information in text that is needed to generate gestures.</Paragraph>
<Paragraph position="2"> Our study addresses these issues by considering two questions: (1) Is the lexical and syntactic information in text useful for generating meaningful gestures? (2) If so, how can this information be extracted from the text and exploited by a gesture decision mechanism in an interface agent? Our goal is to develop a media conversion technique that generates agent animations synchronized with speech from plain text.</Paragraph>
<Paragraph position="3"> This paper is organized as follows. The next section reviews theoretical issues concerning the relationships between gestures and syntactic information. The empirical study we conducted based on these issues is described in Sec. 3. In Sec. 4 we describe the implementation of our presentation agent system, and in the last section we discuss future directions.</Paragraph>
</Section>
</Paper>