<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-1026">
  <Title>Predicting Emotion in Spoken Dialogue from Multiple Knowledge Sources</Title>
  <Section position="10" start_page="0" end_page="0" type="concl">
    <SectionTitle>
8 Conclusions and Current Directions
</SectionTitle>
    <Paragraph position="0"> We have examined the utility of different features for automatically predicting student emotions in a corpus of tutorial spoken dialogues. Our emotion annotation schema distinguishes negative, neutral and positive emotions, with inter-annotator agreement and Kappa values that exceed those obtained for other types of spoken dialogues. From our annotated student turns we extracted a  variety of acoustic and prosodic, text-based, and contextual features. We used machine learning to examine the impact of different feature sets (with and without identi er features) on prediction accuracy. Our results show that while acoustic-prosodic features outperform a baseline, non-acoustic-prosodic features, and combinations of both types of features, perform even better. Adding certain types of contextual features and identi er features also often improves performance. Our best performing feature set, which contains speech and text-based features extracted from the current and previous student turns, yields an accuracy of 84.75% and a 44% relative improvement in error reduction over a baseline. Our experiments suggest that ITSPOKE can be enhanced to automatically predict student emotions.</Paragraph>
    <Paragraph position="1"> We are currently exploring the use of other emotion annotation schemas for emotion prediction, such as those that incorporate categorizations encompassing multiple dimensions (Craggs, 2004; Cowie et al., 2001) and those that examine emotions at smaller units of granularity than turns (Batliner et al., 2003). With respect to predicting emotions, we plan to explore additional features found to be useful in other studies of spoken dialogue (e.g., language model, speaking style, dialog act, part-ofspeech, repetition, emotionally salient keywords, word-level prosody (Batliner et al., 2003; Lee et al., 2002; Ang et al., 2002)) and in text-based applications (Qu et al., 2004). We are also exploring methods of combining information other than by feature level combination, such as data fusion across multiple classi ers (Lee et al., 2002; Batliner et al., 2003). For evaluation, we would like to see whether the ordering preferences among feature sets (as in Figure 5) are the same when recall, precision, and F-measure are plotted instead of accuracy. Furthermore, we are investigating whether greater tutor response to emotions correlates with greater student learning. Finally, when ITSPOKE's evaluation is completed, we will address the same questions for our human-computer dialogues that we have addressed here for our corresponding human-human dialogues.</Paragraph>
  </Section>
class="xml-element"></Paper>