<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-3006">
  <Title>Detecting Emotion in Speech: Experiments in Three Domains</Title>
  <Section position="4" start_page="232" end_page="233" type="metho">
    <SectionTitle>
3 Work-in-progress
</SectionTitle>
    <Paragraph position="0"> In this section I describe research I have begun to conduct and plan to complete in the coming year, as agreed-upon in February, 2006 by my dissertation committee. I will explore features that are not well studied in emotion classi cation research, primarily pitch contour and voice quality approximation. Furthermore, I will outline how I plan to implement and evaluate an emotion detection and response module into ITSpoke.</Paragraph>
    <Section position="1" start_page="232" end_page="232" type="sub_section">
      <SectionTitle>
3.1 Pitch Contour Clustering
</SectionTitle>
      <Paragraph position="0"> The global acoustic-prosodic features used in most emotion prediction studies capture meaningful prosodic variation, but are not capable of describing the linguistically meaningful intonational behavior of an utterance.</Paragraph>
      <Paragraph position="1"> Though phonological labeling methods exist, such as ToBI, annotation of this sort is time-consuming and must be done manually. Instead, I propose an automatic algorithm that directly compares pitch contours and then groups them into classes based on abstract form. Specifically, I intend to use partition clustering to de ne a disjoint set of similar prosodic contour types over our data. I hypothesize that the resultant clusters will be theoretically meaningful and useful for emotion modeling.</Paragraph>
      <Paragraph position="2"> The similarity metric used to compare two contours will be edit distance, calculated using dynamic time warping techniques. Essentially, the algorithm nds the best t between two contours by stretching and shrinking each 3With respect to certainness.</Paragraph>
      <Paragraph position="3"> contour as necessary. The score of a comparison is calculated as the sum of the normalized real-valued distances between mapped points in the contours.</Paragraph>
    </Section>
    <Section position="2" start_page="232" end_page="232" type="sub_section">
      <SectionTitle>
3.2 Voice Quality
</SectionTitle>
      <Paragraph position="0"> Voice quality is a term used to describe a perceptual coloring of the acoustic speech signal and is generally believed to play an important role in the vocal communication of emotion. However, it has rarely been used in automatic classi cation experiments because the exact parameters de ning each quality of voice (e.g., creaky and breathy) are still largely unknown. Yet, some researchers believe much of what constitutes voice quality can be described using information about glottis excitation produced by the vocal folds, most commonly referred to as the glottal pulse waveform. While there are ways of directly measuring the glottal pulse waveform, such as with an electroglottograph, these techniques are too invasive for practical purposes. Therefore, the glottal pulse waveform is usually approximated by inverse ltering of the speech signal. I will derive glottal pulse waveforms from the data using an algorithm that automatically identi es voiced regions of speech, obtains an estimate of the glottal ow derivative, and then represents this using the Liljencrants-Fant parametric model. The nal result is a glottal pulse waveform, from which features can be extracted that describe the shape of this waveform, such as the Open and Skewing Quotients.</Paragraph>
    </Section>
    <Section position="3" start_page="232" end_page="233" type="sub_section">
      <SectionTitle>
3.3 Implementation
</SectionTitle>
      <Paragraph position="0"> The motivating force behind much of the research I have presented herein is the common assumption in the research community that emotion modeling will improve spoken dialogue systems. However, there is little to no empirical proof testing this claim (See (Pon-Barry et al., In publication) for a notable exception.). For this reason, I will implement functionality for detecting and responding to student emotion in ITSpoke (the Intelligent Tutoring System described in Section 2.3) and analyze the effect it has on student behavior, hopefully showing (quantitatively) that doing so improves the system's effectiveness. null Research has shown that frustrated students learn less than non-frustrated students (Lewis and Williams, 1989) and that human tutors respond differently in the face of student uncertainty than they do when presented with certainty (Forbes-Riley and Litman, 2005). These ndings indicate that emotion plays an important role in Intelligent Tutoring Systems. Though I do not have the ability to alter the discourse- ow of ITSpoke, I will insert active listening prompts on the part of ITSpoke when the system has detected either frustration or uncertainty. Active listening is a technique that has been shown to diffuse negative emotion in general (Klein et al., 2002). I hy- null pothesize that diffusing user frustration and uncertainty will improve ITSpoke.</Paragraph>
      <Paragraph position="1"> After collecting data from an emotion-enabled ITSpoke I will compare evaluation metrics with those of a control study conducted with the original ITSpoke system. One such metric will be learning gain, the difference between student pre- and post-test scores and the standard metric for quantifying the effectiveness of educational devices. Since learning gain is a crude measure of academic achievement and may overlook behavioral and cognitive improvements, I will explore other metrics as well, such as: the amount of time taken for the student to produce a correct answer, the amount of negative emotional states expressed, the quality and correctness of answers, the willingness to continue, and subjective posttutoring assessments.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="233" end_page="233" type="metho">
    <SectionTitle>
4 Contributions
</SectionTitle>
    <Paragraph position="0"> I see the contributions of my dissertation to be the extent to which I have helped to answer the questions I posed at the outset of this paper.</Paragraph>
    <Section position="1" start_page="233" end_page="233" type="sub_section">
      <SectionTitle>
4.1 How is emotion communicated in speech?
</SectionTitle>
      <Paragraph position="0"> The experimental design of extracting features from spoken utterances and conducting machine learning experiments to predict emotion classes identi es features important for the vocal communication of emotion. Most of the features I have described here are well established in the research community; statistic measurements of fundamental frequency and energy, for example. However, I have also described more experimental features as a way of improving upon the state-of-the-art in emotion modeling. These exploratory features include breath-group segmentation, contextual information, pitch contour clustering, and voice quality estimation. In addition, exploring three domains will allow me to comparatively analyze the results, with the ultimate goal of identifying universal qualities of spoken emotions as well as those that may particular to speci c domains. The ndings of such a comparative analysis will be of practical bene t to future system builders and to those attempting to de ne a universal model of human emotion alike.</Paragraph>
    </Section>
    <Section position="2" start_page="233" end_page="233" type="sub_section">
      <SectionTitle>
4.2 Does emotion modeling help?
</SectionTitle>
      <Paragraph position="0"> By collecting data of students interacting with an emotion-enabled ITSpoke, I will be able to report quantitatively the results of emotion modeling in a spoken dialogue system. Though this is the central motivation for most researchers in this eld, there is currently no de nitive evidence either supporting or refuting this claim.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>