
<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1045">
  <Title>Predicting Student Emotions in Computer-Human Tutoring Dialogues</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> This paper explores the feasibility of automatically predicting student emotional states in a corpus of computer-human spoken tutoring dialogues. Intelligent tutoring dialogue systems have become more prevalent in recent years (Aleven and Rose, 2003), as one method of improving the performance gap between computer and human tutors; recent experiments with such systems (e.g., (Graesser et al., 2002)) are starting to yield promising empirical results. Another method for closing this performance gap has been to incorporate affective reasoning into computer tutoring systems, independently of whether or not the tutor is dialogue-based (Conati et al., 2003; Kort et al., 2001; Bhatt et al., 2004). For example, (Aist et al., 2002) have shown that adding human-provided emotional scaffolding to an automated reading tutor increases student persistence.</Paragraph>
    <Paragraph position="1"> Our long-term goal is to merge these lines of dialogue and affective tutoring research, by enhancing our intelligent tutoring spoken dialogue system to automatically predict and adapt to student emotions, and to investigate whether this improves learning and other measures of performance.</Paragraph>
    <Paragraph position="2"> Previous spoken dialogue research has shown that predictive models of emotion distinctions (e.g., emotional vs. non-emotional, negative vs. nonnegative) can be developed using features typically available to a spoken dialogue system in real-time (e.g, acoustic-prosodic, lexical, dialogue, and/or contextual) (Batliner et al., 2000; Lee et al., 2001; Lee et al., 2002; Ang et al., 2002; Batliner et al., 2003; Shafran et al., 2003). In prior work we built on and generalized such research, by de ning a three-way distinction between negative, neutral, and positive student emotional states that could be reliably annotated and accurately predicted in human-human spoken tutoring dialogues (Forbes-Riley and Litman, 2004; Litman and Forbes-Riley, 2004). Like the non-tutoring studies, our results showed that combining feature types yielded the highest predictive accuracy.</Paragraph>
    <Paragraph position="3"> In this paper we investigate the application of our approach to a comparable corpus of computer-human tutoring dialogues, which displays many different characteristics, such as shorter utterances, little student initiative, and non-overlapping speech.</Paragraph>
    <Paragraph position="4"> We investigate whether we can annotate and predict student emotions as accurately and whether the relative utility of speech and lexical features as predictors is the same, especially when the output of the speech recognizer is used (rather than a human transcription of the student speech). Our best models for predicting three different types of emotion classi cations achieve accuracies of 66-73%, representing relative improvements of 19-36% over majority class baseline errors. Our computer-human results also show interesting differences compared with comparable analyses of human-human data.</Paragraph>
    <Paragraph position="5"> Our results provide an empirical basis for enhancing our spoken dialogue tutoring system to automatically predict and adapt to a student model that includes emotional states.</Paragraph>
  </Section>
class="xml-element"></Paper>