<?xml version="1.0" standalone="yes"?> <Paper uid="W04-3253"> <Title>Sentiment analysis using support vector machines with diverse information sources</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Methods </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Semantic orientation with PMI </SectionTitle> <Paragraph position="0"> Here, the term semantic orientation (SO) (Hatzivassiloglou and McKeown, 2002) refers to a real-valued measure of the positive or negative sentiment expressed by a word or phrase. In the present work, the approach taken by Turney (2002) is used to derive such values for selected phrases in the text.</Paragraph> <Paragraph position="1"> This approach is simple and surprisingly effective.</Paragraph> <Paragraph position="2"> Moreover, it is not restricted to words of a particular part of speech, nor even to single words, but can be used with multiple-word phrases. In general, two-word phrases conforming to particular part-of-speech templates representing possible descriptive combinations are used. The phrase patterns used by Turney can be seen in figure 1. In some cases, the present approach deviates from this, utilizing values derived from single words. For the purposes of this paper, these phrases will be referred to as value phrases, since they will be the sources of SO values. Once the desired value phrases have been extracted from the text, each one is assigned an SO value. The SO of a phrase is determined based upon the phrase's pointwise mutual information (PMI) with the words &quot;excellent&quot; and &quot;poor&quot;. 
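Since SO is the difference of two PMI terms sharing the phrase's own count, that count cancels and SO reduces to a ratio of co-occurrence counts. A minimal sketch, with invented hit counts standing in for the search-engine queries (the function name and argument layout are illustrative, not from the paper):

```python
from math import log2

def so(hits_near_excellent, hits_near_poor, hits_excellent, hits_poor):
    """SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor").

    With probabilities estimated from hit counts (NEAR = the two words
    within ten words of each other), the phrase's own count cancels in
    the PMI difference, leaving:
        log2( hits(p NEAR "excellent") * hits("poor")
              / (hits(p NEAR "poor") * hits("excellent")) )
    """
    return log2((hits_near_excellent * hits_poor) /
                (hits_near_poor * hits_excellent))
```

A phrase co-occurring four times as often with &quot;excellent&quot; as with &quot;poor&quot; (at equal base frequencies) thus gets SO = 2; equal co-occurrence gives the neutral value 0.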
PMI is defined by Church and Hanks (1989) as follows:</Paragraph> <Paragraph position="3"> PMI(w1, w2) = log2 [ p(w1, w2) / (p(w1) p(w2)) ]</Paragraph> <Paragraph position="4"> The semantic orientation of a phrase is the difference between its PMI with the word &quot;excellent&quot; and its PMI with the word &quot;poor.&quot; The probabilities are estimated by querying the AltaVista Advanced Search engine1 for counts. The search engine's &quot;NEAR&quot; operator, representing occurrences of the two queried words within ten words of each other in a text, is used to define co-occurrence. The final SO equation is</Paragraph> <Paragraph position="5"> SO(phrase) = log2 [ hits(phrase NEAR &quot;excellent&quot;) hits(&quot;poor&quot;) / ( hits(phrase NEAR &quot;poor&quot;) hits(&quot;excellent&quot;) ) ]</Paragraph> <Paragraph position="6"> Intuitively, this yields values above zero for phrases with greater PMI with the word &quot;excellent&quot; and below zero for greater PMI with &quot;poor&quot;. An SO value of zero would indicate a completely neutral semantic orientation.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Osgood semantic differentiation with WordNet </SectionTitle> <Paragraph position="0"> Further feature types are derived using the method of Kamps and Marx (2002), which uses WordNet relationships to derive three values pertinent to the emotive meaning of adjectives. The three values correspond to the potency (strong or weak), activity (active or passive) and evaluative (good or bad) factors introduced in Charles Osgood's theory of semantic differentiation. [Figure 1: phrase patterns for value-phrase extraction -- 1. JJ | NN or NNS | anything; 2. RB, RBR, or RBS | JJ | not NN nor NNS; 3. JJ | JJ | not NN nor NNS; 4. NN or NNS | JJ | not NN or NNS; 5. RB, RBR, or RBS | VB, VBD, VBN or VBG | anything] These values are derived by measuring the relative minimal path length (MPL) in WordNet between the adjective in question and the pair of words appropriate for the given factor. In the case of the evaluative factor (EVA), for example, the comparison is between the MPL between the adjective and &quot;good&quot; and the MPL between the adjective and &quot;bad&quot;.</Paragraph> <Paragraph position="1"> Only adjectives connected by synonymy to each of the opposites are considered. 
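The MPL comparison can be sketched as a breadth-first search over synonym links. The toy graph below stands in for WordNet, and the normalisation by the good-bad distance follows Kamps and Marx's formulation (an assumption beyond what is stated above); words not connected to both opposites are excluded, as in the text:

```python
from collections import deque

# Toy synonymy graph standing in for WordNet's synonym links;
# edges are symmetric synonym relations.
SYNONYMS = {
    "good": {"fine"},
    "fine": {"good", "ok"},
    "ok": {"fine", "mediocre"},
    "mediocre": {"ok", "bad"},
    "bad": {"mediocre"},
}

def mpl(graph, start, goal):
    """Minimal path length between two words over synonym edges (BFS)."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        word, dist = frontier.popleft()
        if word == goal:
            return dist
        for nxt in graph.get(word, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None  # not connected -> adjective is excluded

def eva(graph, adj):
    """Evaluative factor: relative closeness to "good" vs. "bad",
    normalised by the good-bad distance (after Kamps and Marx).
    Positive values mean the adjective sits closer to "good"."""
    d_good, d_bad = mpl(graph, adj, "good"), mpl(graph, adj, "bad")
    if d_good is None or d_bad is None:
        return None
    return (d_bad - d_good) / mpl(graph, "good", "bad")
```

POT and ACT follow the same pattern with the word pairs strong/weak and active/passive.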
The method results in a list of 5410 adjectives, each of which is given a value for each of the three factors, referred to as EVA, POT, and ACT. For the purposes of this research, the values for each of these factors are averaged over all the adjectives in a text, yielding three real-valued feature values for the text, which are added to the SVM model.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Topic proximity and syntactic-relation features </SectionTitle> <Paragraph position="0"> Our approach shares the intuition of Natsukawa and Yi (2003) that sentiment expressed with regard to a particular subject can best be identified with reference to the subject itself. Collecting emotive content from a text overall can only give the most general indication of the sentiment of that text towards the specific subject. Nevertheless, in the present work, it is assumed that the pertinent analysis will occur at the text level. The key is to find a way to incorporate pertinent semantic orientation values derived from phrases into a model of texts. Our approach seeks to employ semantic orientation values from a variety of different sources and use them to create a feature space which can be separated into classes using an SVM.</Paragraph> <Paragraph position="1"> In some application domains, it is known in advance what the topic is toward which sentiment is to be evaluated. The present approach allows for the incorporation of features which exploit this knowledge, where available. 
This is done by creating several classes of features based upon the semantic orientation values of phrases given their position in relation to the topic of the text.</Paragraph> <Paragraph position="2"> Although in opinion-based texts there is generally a single primary subject about which the opinion is favorable or unfavorable, it would seem that secondary subjects may also be useful to identify.</Paragraph> <Paragraph position="3"> The primary subject of a book review, for example, is a book. However, the review's overall attitude to the author may also be enlightening, although it is not necessarily identical to the attitude towards the book. Likewise in a product review, the attitude towards the company which manufactures the product may be pertinent. It is an open question whether such secondary topic information would be beneficial or harmful to the modeling task. The approach described in this paper allows such secondary information to be incorporated, where available.</Paragraph> <Paragraph position="4"> In the second of the two datasets used in the present experiments, texts were annotated by hand using the Open Ontology Forge annotation tool (Collier et al., 2003). In each record review, references (including co-reference) to the record being reviewed were tagged as THIS WORK and references to the artist under review were tagged as THIS ARTIST.</Paragraph> <Paragraph position="5"> With these entities tagged, a number of classes of features may be extracted, representing various relationships between topic entities and value phrases similar to those described in section 3.1. The classes looked at in this work are as follows: Turney Value The average value of all value phrases' SO values for the text. 
Classification by this feature alone is not the equivalent of Turney's approach, since the present approach involves retraining in a supervised model.</Paragraph> <Paragraph position="6"> In sentence with THIS WORK The average value of all value phrases which occur in the same sentence as a reference to the work being reviewed.</Paragraph> <Paragraph position="7"> Following THIS WORK The average value of all value phrases which follow a reference to the work being reviewed directly, or separated only by the copula or a preposition.</Paragraph> <Paragraph position="8"> Preceding THIS WORK The average value of all value phrases which precede a reference to the work being reviewed directly, or separated only by the copula or a preposition.</Paragraph> <Paragraph position="9"> In sentence with THIS ARTIST As above, but with reference to the artist.</Paragraph> <Paragraph position="10"> Following THIS ARTIST As above, but with reference to the artist.</Paragraph> <Paragraph position="11"> Preceding THIS ARTIST As above, but with reference to the artist.</Paragraph> <Paragraph position="12"> The features used which make use of adjectives with WordNet derived Osgood values include the following: Text-wide EVA The average EVA value of all adjectives in a text.</Paragraph> <Paragraph position="13"> Text-wide POT The average POT value of all adjectives in a text.</Paragraph> <Paragraph position="14"> Text-wide ACT The average ACT value of all adjectives in a text.</Paragraph> <Paragraph position="15"> TOPIC-sentence EVA The average EVA value of all adjectives which share a sentence with the topic of the text.</Paragraph> <Paragraph position="16"> TOPIC-sentence POT The average POT value of all adjectives which share a sentence with the topic of the text.</Paragraph> <Paragraph position="17"> TOPIC-sentence ACT The average ACT value of all adjectives which share a sentence with the topic of the text.</Paragraph> <Paragraph position="18"> The grouping of these classes should 
reflect some common degree of reliability of features within a given class, but due to data sparseness, what might have been more natural class groupings--for example, including value-phrase preposition topic-entity as a distinct class--often had to be conflated in order to get features with enough occurrences to be representative.</Paragraph> <Paragraph position="19"> For each of these classes a value may be derived for a text. Representing each text as a vector of these real-valued features forms the basis for the SVM model. In the case of data for which no explicit topic information is available, only the Turney value is used from the first list, and the Text-wide EVA, POT, and ACT values from the second list.</Paragraph> <Paragraph position="20"> A resultant feature vector representing a text may be composed of a combination of boolean unigram-style features and real-valued favorability measures in the form of the Osgood values and the PMI-derived values.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.4 Support Vector Machines </SectionTitle> <Paragraph position="0"> SVMs are a machine learning classification technique which uses a function called a kernel to map a space of data points in which the data is not linearly separable onto a new space in which it is, with allowances for erroneous classification. For a tutorial on SVMs and details of their formulation, we refer the reader to Burges (1998) and Cristianini and Shawe-Taylor (2000). A detailed treatment of these models' application to text classification may be found in Joachims (2001).</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Experiments </SectionTitle> <Paragraph position="0"> First, value phrases were extracted and their values were derived using the method described in section 3.1. After this, supervised learning was performed using these values as features. 
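Assembling the real-valued portion of a text's feature vector then amounts to averaging each feature class over its instances in the text. A minimal sketch (function names, the subset of classes shown, and the choice of 0.0 for empty classes are illustrative assumptions):

```python
def average(values):
    """Mean of a feature class's values in one text; 0.0 when the class
    has no instances there (a sparse class contributes nothing)."""
    return sum(values) / len(values) if values else 0.0

def feature_vector(so_values, so_in_topic_sentence,
                   eva_values, pot_values, act_values):
    """Real-valued features for one text: the Turney value, one
    topic-proximity SO class, and the three text-wide Osgood factors.
    The full model adds further SO classes and boolean unigram features."""
    return [
        average(so_values),             # Turney value
        average(so_in_topic_sentence),  # in sentence with THIS WORK
        average(eva_values),            # text-wide EVA
        average(pot_values),            # text-wide POT
        average(act_values),            # text-wide ACT
    ]
```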
In training data, reviews corresponding to a below average rating were classed as negative and those with an above average rating were classed as positive.</Paragraph> <Paragraph position="1"> The first dataset consisted of a total of 1380 Epinions.com movie reviews, approximately half positive and half negative. This is the same dataset as was presented in Pang et al. (2002). In order to compare results as directly as possible, we report results of 3-fold cross validation, following Pang et al. (2002). Likewise, we include punctuation as tokens and normalize the feature values for text length. To lend further support to the conclusions we also report results for 10-fold cross validation experiments. On this dataset the feature sets investigated include various combinations of the Turney value, the three text-wide Osgood values, and word token unigrams or lemmatized unigrams.2 The second dataset consists of 100 record reviews from the Pitchfork Media online record review publication,3 topic-annotated by hand. In addition to the features employed with the first dataset, this dataset allows the use of those features described in section 3.3 which make use of topic information, namely the broader PMI-derived SO values and the topic-sentence Osgood values. Due to the relatively small size of this dataset, test suites were created using 100, 20, 10, and 5-fold cross validation, to maximize the amount of data available for training and the accuracy of the results. 
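The n-fold evaluation used throughout can be sketched as follows; the classifier is abstracted behind a callback (the experiments themselves used an SVM), and the contiguous-fold split and the trivial majority-class stand-in are simplifying assumptions for illustration:

```python
def cross_validate(examples, labels, n_folds, train_and_classify):
    """n-fold cross validation: split the data into n contiguous folds,
    hold each fold out in turn, train on the rest, average accuracy.
    `train_and_classify(train_x, train_y, test_x)` abstracts the model."""
    fold_size = len(examples) // n_folds  # assumes len divisible by n_folds
    accuracies = []
    for k in range(n_folds):
        lo, hi = k * fold_size, (k + 1) * fold_size
        test_x, test_y = examples[lo:hi], labels[lo:hi]
        train_x = examples[:lo] + examples[hi:]
        train_y = labels[:lo] + labels[hi:]
        predictions = train_and_classify(train_x, train_y, test_x)
        correct = sum(p == y for p, y in zip(predictions, test_y))
        accuracies.append(correct / len(test_y))
    return sum(accuracies) / n_folds

def majority_baseline(train_x, train_y, test_x):
    """Trivial stand-in classifier: always predict the majority class."""
    label = max(set(train_y), key=train_y.count)
    return [label] * len(test_x)
```

Larger n (e.g. 100-fold on the 100-review dataset) leaves more data in each training split, which is why the smaller dataset was evaluated at several fold counts.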
Text length normalization appeared to harm performance on this dataset, and so the models reported here for this dataset were not normalized for length.</Paragraph> <Paragraph position="2"> SVMs were built using Kudo's TinySVM software implementation.4</Paragraph> <Paragraph position="3"> Several kernel types, kernel parameters, and optimization parameters were investigated, but no appreciable and consistent benefits were gained by deviating from the default linear kernel with all parameter values set to their default, so only these results are reported here, with the exception of the Turney Values-only model on the Pitchfork dataset. This single-featured model caused segmentation faults on some partitions with the linear kernel, and so the results for this model only, seen in figure 3, were obtained using a polynomial kernel with the degree parameter set to 2 (default is 1) and the constraints violation penalty set at 2 (default is 1).</Paragraph> <Paragraph position="4"> Several hybrid SVM models were further tested using the results from the previously described models as features. In these models, the feature values for each event represent the distance from the dividing hyperplane for each constituent model.</Paragraph> </Section> </Paper>