<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0308">
<Title>Towards a validated model for affective classification of texts</Title>
<Section position="7" start_page="59" end_page="61" type="evalu">
<SectionTitle> 4.2 Results </SectionTitle>
<Paragraph position="0"> Of the 1000 blog posts, there were 938 with at least one pattern. Table 5 shows the accuracy for the classification of these posts.</Paragraph>
<Paragraph position="1"> An important set of emotions found in the literature (Ekman, 1972) has been termed the Big Six. These emotions are fear, anger, happiness, sadness, surprise and disgust. We have used a minimally extended set, adding love and desire (Cowie and Cornelius, 2002), to cover all four quadrants (we called this set the Big Eight). Fear, anger and disgust belong to quadrant 1; sadness and surprise (the latter taken to be a synonym of 'taken aback' in the typology) belong to quadrant 2; love and desire (taken to be synonyms of 'amorous' and 'longing' in the typology) belong to quadrant 3; and happiness belongs to quadrant 4. Table 6 shows the results for the classification of the blog posts that were tagged with one of these emotions. This amounts to classifying the posts containing only the Big Eight affective states.</Paragraph>
<Paragraph position="2"> In the remaining two experiments, blog posts have been classified using a discrete scoring system. Disregarding the real value of SO, each pattern was scored with a value of +1 for a positive score and -1 for a negative score. This amounts to counting the number of patterns on each side and has the advantage of providing normalized values for E/T and A/T between -1 and +1. Normalized values are the first step towards a measure of affect, not merely a score, in the sense that they give an estimate of the strength of affect. We have not classified the posts for which the resulting score was zero, which means that even fewer posts (741) than in the previous experiment were actually evaluated. Table 7 shows the results for all moods and table 8 for the Big Eight.</Paragraph>
<Section position="1" start_page="60" end_page="61" type="sub_section">
<SectionTitle> 4.3 Analysis of Results </SectionTitle>
<Paragraph position="0"> Our concerns about the paradigm words for evaluating the activity dimension are clearly revealed in the classification results. The classifier shows a heavy negative (passive) bias in all experiments.</Paragraph>
<Paragraph position="1"> The overall accuracy for activity is consistently below that for evaluation: three of the activity accuracies (51.8%, 53.3% and 52.8%) are not statistically significant at the 1% level, and two of them (51.8% and 52.8%) are not significant even at the 5% level.</Paragraph>
<Paragraph position="2"> The classifier appears particularly confused in table 5, giving active posts an average score (-4.3) that is even more negative (more passive) than that of passive posts (-4.2). It may be that the moods present in the typology have to be shifted towards the passive end of the dimension, but further research should first look at finding better paradigm words for activity. A good starting point for calibrating the classifier for activity is the creation of a human-annotated list of activity words, comparable in size to the GI list, combined with an experiment similar to the one for which results are reported in table 3.</Paragraph>
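A minimal sketch of the discrete scoring scheme of section 4.2, which the rest of this analysis refers back to, is given below in Python. The data layout and names are assumptions made for illustration; this is not the authors' implementation.

    # Sketch of the discrete (+1/-1) scoring of section 4.2: each pattern
    # contributes the sign of its semantic orientation (SO), and averaging
    # over the T patterns of a post yields normalized evaluation and
    # activity values in [-1, +1].

    def discrete(so):
        """Map a real-valued SO to +1, -1, or 0."""
        return 1 if so > 0 else -1 if so < 0 else 0

    def normalized_scores(patterns):
        """patterns: assumed list of (evaluation_SO, activity_SO) pairs
        for the patterns of one post. Returns (E/T, A/T)."""
        t = len(patterns)
        if t == 0:
            return None
        e = sum(discrete(e_so) for e_so, _ in patterns) / t
        a = sum(discrete(a_so) for _, a_so in patterns) / t
        return e, a

    # Posts whose resulting score is zero are left unclassified; in the
    # reported experiments this left 741 of the 938 posts.

Under this scheme the sign of E/T (respectively A/T) gives the binary decision evaluated in tables 7 and 8, and its magnitude is a first estimate of the strength of affect mentioned above.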
<Paragraph position="3"> With regard to the evaluation dimension, tables 5 and 6 reveal a positive bias (despite the classifier having a 'built-in' negative bias; see section 4.1). Possible explanations for this phenomenon include the use of irony in negative posts, blogs which are expressed in more positive terms than their annotation would suggest, and failure to detect 'negative' contexts for patterns; one example of the latter is provided in table 9. This phenomenon appears to be alleviated by the use of discrete scores (see tables 7 and 8).</Paragraph>
[Table 9 example. Mood: bored (evaluation-). Post: "gah!! i need new music, any suggestions? by the way, GOOD MUSIC." Patterns: new music [JJ NN] +4.38]
<Paragraph position="4"> One way of refining the scoring system is to reduce the effect of scoring antonyms as highly as synonyms by not counting co-occurrences in the corpus where the word 'not' is in the neighbourhood (Turney, 2001). Also, the long-term goal of this research is to be able to classify texts by locating their normalized scores for evaluation and activity between -1 and +1, and we have suggested a simple method of achieving that by averaging over discrete scores. However, by combining the individual results for evaluation and activity for each post, we can already classify texts into one of the four quadrants, and we can expect the average accuracy of this classification to be approximately the product of the accuracies for the two dimensions. Table 10 shows the results for the classification directly into quadrants of the 727 posts already classified into halves (ESS, ASS) in table 8. The overall accuracy is 31.1% (the expected accuracy is 59.8% * 52.8% = 31.6%). There are biases towards Q2 and Q3, but no clear cases of confusion between two or more classes.</Paragraph>
<Paragraph position="5"> Finally, our experiments show no correlation between the length of a post (in number of patterns) and the accuracy of the classification.</Paragraph>
</Section> </Section> </Paper>
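The combination into quadrants and the expected-accuracy estimate discussed in section 4.3 can be illustrated with the short Python sketch below. The assignment of (evaluation, activity) sign pairs to Q1-Q4 is an assumption made here for illustration; the section states only which moods belong to each quadrant, not the sign convention, and the names are likewise hypothetical.

    # Sketch of classifying a post into a quadrant from the signs of its
    # evaluation and activity scores, and of the expected quadrant accuracy
    # as the product of the per-dimension accuracies.

    def sign(x):
        return 1 if x > 0 else -1 if x < 0 else 0

    # ASSUMED sign convention; the paper lists only the moods per quadrant
    # (Q1: fear, anger, disgust; Q2: sadness, surprise; Q3: love, desire;
    # Q4: happiness).
    QUADRANT = {
        (-1, +1): "Q1",
        (-1, -1): "Q2",
        (+1, -1): "Q3",
        (+1, +1): "Q4",
    }

    def quadrant(e_score, a_score):
        """Return the quadrant of a post, or None if either dimension is unscored."""
        return QUADRANT.get((sign(e_score), sign(a_score)))

    # If the two binary decisions err roughly independently, the quadrant
    # accuracy should be close to the product of the per-dimension accuracies
    # quoted in the text: 0.598 * 0.528 is about 0.316 (31.6%), against the
    # 31.1% observed in table 10.
    expected_accuracy = 0.598 * 0.528

The refinement suggested above, not counting corpus co-occurrences where 'not' appears in the neighbourhood (Turney, 2001), would act on the SO values upstream of this step and is not sketched here.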