File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/06/e06-1025_metho.xml
Size: 13,490 bytes
Last Modified: 2025-10-06 14:10:06
<?xml version="1.0" standalone="yes"?> <Paper uid="E06-1025"> <Title>Determining Term Subjectivity and Term Orientation for Opinion Mining</Title> <Section position="5" start_page="195" end_page="195" type="metho"> <SectionTitle> 3 Determining term subjectivity and </SectionTitle> <Paragraph position="0"> term orientation by semi-supervised learning The method we use in this paper for determining term subjectivity and term orientation is a variant of the method proposed in (Esuli and Sebastiani, 2005) for determining term orientation alone. This latter method relies on training, in a semi-supervised way, a binary classifier that labels terms as either Positive or Negative. A semi-supervised method is a learning process whereby only a small subset L [?] Tr of the training data Tr are human-labelled. In origin the training data in U = Tr [?] L are instead unlabelled; it is the process itself that labels them, automatically, by using L (with the possible addition of other publicly available resources) as input. The method of (Esuli and Sebastiani, 2005) starts from two small seed (i.e. training) sets Lp and Ln of known Positive and Negativeterms, respectively, and expands them into the two final training sets Trp [?] Lp andTrn [?] Ln byadding them new sets of terms Up and Un found by navigating the Word-Net graph along the synonymy and antonymy relations3. This process is based on the hypothesis that synonymy and antonymy, in addition to defining a relation of meaning, also define a relation of orientation, i.e. that two synonyms typically have the same orientation and two antonyms typically have opposite orientation. The method is iterative, generating two sets Trkp and Trkn at each iteration k, where Trkp [?] Trk[?]1p [?] ... [?] Tr1p = Lp and Trkn [?] Trk[?]1n [?] ... [?] Tr1n = Ln. At iteration k, Trkp is obtained by adding to Trk[?]1p all synonyms of terms in Trk[?]1p and all antonyms of terms in Trk[?]1n ; similarly, Trkn is obtained by adding to Trk[?]1n all synonyms of terms in Trk[?]1n and allantonyms oftermsinTrk[?]1p . Ifatotal ofK iterations are performed, then Tr = TrKp [?]TrKn .</Paragraph> <Paragraph position="1"> The second main feature of the method presented in (Esuli and Sebastiani, 2005) is that terms are given vectorial representations based on their WordNet glosses (i.e. textual definitions). For each term ti in Tr[?]Te (Te being the test set, i.e.</Paragraph> <Paragraph position="2"> thesetoftermstobeclassified), atextual representation of ti is generated by collating all the glosses of ti as found in WordNet4. Each such represen3Several other WordNet lexical relations, and several combinations of them, are tested in (Esuli and Sebastiani, 2005). In the present paper we only use the best-performing such combination, as described in detail in Section 4.2. The version of WordNet used here and in (Esuli and Sebastiani, 2005) is 2.0.</Paragraph> <Paragraph position="3"> 4In general a term ti may have more than one gloss, since tation is converted into vectorial form by standard text indexing techniques (in (Esuli and Sebastiani, 2005) and in the present work, stop words are removed and the remaining words are weighted by cosine-normalized tfidf; no stemming is performed)5. 
<Paragraph position="2"> This representation method is based on the assumption that terms with a similar orientation tend to have "similar" glosses: for instance, that the glosses of honest and intrepid will both contain appreciative expressions, while the glosses of disturbing and superfluous will both contain derogative expressions. Note that this method makes it possible to classify any term, independently of its POS, provided there is a gloss for it in the lexical resource.</Paragraph> <Paragraph position="3"> Once the vectorial representations for all terms in Tr ∪ Te have been generated, those for the terms in Tr are fed to a supervised learner, which thus generates a binary classifier. The latter, once fed with the vectorial representations of the terms in Te, classifies each of them as either Positive or Negative.</Paragraph> </Section> <Section position="6" start_page="195" end_page="197" type="metho"> <SectionTitle> 4 Experiments </SectionTitle> <Paragraph position="0"> In this paper we extend the method of (Esuli and Sebastiani, 2005) to the joint determination of term subjectivity and term orientation.</Paragraph> <Section position="1" start_page="195" end_page="196" type="sub_section"> <SectionTitle> 4.1 Test sets </SectionTitle> <Paragraph position="0"> The benchmark (i.e. test set) we use for our experiments is the General Inquirer (GI) lexicon (Stone et al., 1966). This is a lexicon of terms labelled according to a large set of categories, each one denoting the presence of a specific trait in the term. The two main categories, and the ones we will be concerned with, are Positive/Negative, which contain 1,915/2,291 terms having a positive/negative orientation (in what follows we will also refer to the category Subjective, which we define as the union of the two categories Positive and Negative). In opinion mining research the GI was first used by Turney and Littman (2003), who reduced the list of terms to 1,614/1,982 entries after removing 17 terms appearing in both categories (e.g. deal) and reducing all the multiple entries of the same term in a category, caused by multiple senses, to a single entry.</Paragraph>
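As an illustration of this cleanup, the following sketch derives one gold label per term from a hypothetical mapping of GI entries to their category tags; the entry structure and the tag spellings are assumptions about the resource, not a documented format.

```python
# Hypothetical sketch of the test-set cleanup described above. `gi_entries` is
# assumed to map each GI entry (term, sense_id) to the set of GI category tags
# attached to it; the tag names "Positiv"/"Negativ" are an assumption about the
# resource and may need adjusting.
from collections import defaultdict

def build_orientation_gold(gi_entries):
    """Collapse multiple senses per term and keep unambiguous Positive/Negative labels."""
    tags_by_term = defaultdict(set)
    for (term, _sense_id), tags in gi_entries.items():
        tags_by_term[term.lower()] |= set(tags)
    gold = {}
    for term, tags in tags_by_term.items():
        is_pos, is_neg = "Positiv" in tags, "Negativ" in tags
        if is_pos and is_neg:        # e.g. "deal": appears in both categories, dropped
            continue
        if is_pos:
            gold[term] = "Positive"
        elif is_neg:
            gold[term] = "Negative"
    return gold
```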
<Paragraph position="1"> Likewise, we take all the 7,582 GI terms that are not labelled as either Positive or Negative as being (implicitly) labelled as Objective, and reduce them to 5,009 terms after combining multiple entries of the same term, caused by multiple senses, to a single entry.</Paragraph> <Paragraph position="2"> The effectiveness of our classifiers will thus be evaluated in terms of their ability to assign the total of 8,605 GI terms to the correct category among Positive, Negative, and Objective.</Paragraph> </Section> <Section position="2" start_page="196" end_page="196" type="sub_section"> <SectionTitle> 4.2 Seed sets and training sets </SectionTitle> <Paragraph position="0"> Similarly to (Esuli and Sebastiani, 2005), our training set is obtained by expanding initial seed sets by means of WordNet lexical relations. The main difference is that our training set is now the union of three sets of training terms Tr = Tr^K_p ∪ Tr^K_n ∪ Tr^K_o, obtained by expanding, through K iterations, three seed sets Tr^1_p, Tr^1_n, Tr^1_o, one for each of the categories Positive, Negative, and Objective, respectively.</Paragraph> <Paragraph position="1"> Concerning the categories Positive and Negative, we have used the seed sets, expansion policy, and number of iterations that performed best in the experiments of (Esuli and Sebastiani, 2005), i.e. the seed sets Tr^1_p = {good} and Tr^1_n = {bad}, expanded by using the union of synonymy and indirect antonymy, restricting the relations to terms with the same POS as the original terms (i.e. adjectives), for a total of K = 4 iterations. The final expanded sets contain 6,053 Positive terms and 6,874 Negative terms.</Paragraph> <Paragraph position="2"> Concerning the category Objective, the process we have followed is similar, but with a few key differences. These are motivated by the fact that the Objective category coincides with the complement of the union of Positive and Negative; therefore, Objective terms are more varied and diverse in meaning than the terms in the other two categories. To obtain a representative expanded set Tr^K_o, we have chosen the seed set Tr^1_o = {entity} and we have expanded it by using, along with synonymy and antonymy, the WordNet relation of hyponymy (e.g. vehicle / car), and without imposing the restriction that the two related terms must have the same POS. These choices are strictly related to each other: the term entity is the root term of the largest generalization hierarchy in WordNet, with more than 40,000 terms (Devitt and Vogel, 2004), thus allowing us to reach a very large number of terms by using the hyponymy relation. Moreover, it seems reasonable to assume that terms that refer to entities are likely to have an "objective" nature, and that hyponyms (and also synonyms and antonyms) of an objective term are also objective. Note that, at each iteration k, a given term t is added to Tr^k_o only if it does not already belong to either Tr_p or Tr_n.</Paragraph>
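A minimal sketch of this seed-set expansion, using NLTK's WordNet interface, is given below. It is an illustrative approximation rather than the authors' implementation: direct antonymy stands in for indirect antonymy, and the traversal of relation combinations is reduced to its essentials.

```python
# Illustrative sketch of the seed-set expansion described in Section 4.2,
# using NLTK's WordNet interface (an assumption; not the authors' tooling).
from nltk.corpus import wordnet as wn

ADJ = {"a", "s"}  # adjective and satellite-adjective synsets

def neighbours(term, relations, pos=None):
    """Terms reachable from `term` in one step along the given relations."""
    reached = set()
    for synset in wn.synsets(term):
        if pos is not None and synset.pos() not in pos:
            continue
        for lemma in synset.lemmas():
            if "synonymy" in relations:
                reached.add(lemma.name())
            if "antonymy" in relations:
                reached.update(a.name() for a in lemma.antonyms())
        if "hyponymy" in relations:
            for hypo in synset.hyponyms():
                reached.update(l.name() for l in hypo.lemmas())
    return reached

def expand_pos_neg(seed_p, seed_n, iterations=4):
    """Grow Tr_p / Tr_n: synonyms keep the orientation, antonyms flip it."""
    tr_p, tr_n = set(seed_p), set(seed_n)
    for _ in range(iterations - 1):          # the seeds count as iteration 1
        new_p = (tr_p
                 | {t for s in tr_p for t in neighbours(s, {"synonymy"}, ADJ)}
                 | {t for s in tr_n for t in neighbours(s, {"antonymy"}, ADJ)})
        new_n = (tr_n
                 | {t for s in tr_n for t in neighbours(s, {"synonymy"}, ADJ)}
                 | {t for s in tr_p for t in neighbours(s, {"antonymy"}, ADJ)})
        tr_p, tr_n = new_p, new_n
    return tr_p, tr_n

def expand_objective(seed_o, tr_p, tr_n, iterations=3):
    """Grow Tr_o via synonymy, antonymy and hyponymy, skipping Tr_p / Tr_n terms."""
    tr_o = set(seed_o)
    for _ in range(iterations - 1):
        reached = {t for s in tr_o
                   for t in neighbours(s, {"synonymy", "antonymy", "hyponymy"})}
        tr_o |= {t for t in reached if t not in tr_p and t not in tr_n}
    return tr_o

tr_p, tr_n = expand_pos_neg({"good"}, {"bad"}, iterations=4)
tr_o = expand_objective({"entity"}, tr_p, tr_n, iterations=3)
```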
<Paragraph position="3"> We experiment with two different choices for the Tr_o set, corresponding to the sets generated in K = 3 and K = 4 iterations, respectively; this yields sets Tr^3_o and Tr^4_o consisting of 8,353 and 33,870 training terms, respectively.</Paragraph> </Section> <Section position="3" start_page="196" end_page="197" type="sub_section"> <SectionTitle> 4.3 Learning approaches and evaluation measures </SectionTitle> <Paragraph position="0"> We experiment with three "philosophically" different learning approaches to the problem of distinguishing between Positive, Negative, and Objective terms.</Paragraph> <Paragraph position="1"> Approach I is a two-stage method which consists in learning two binary classifiers: the first classifier places terms into either Subjective or Objective, while the second classifier places terms that have been classified as Subjective by the first classifier into either Positive or Negative. In the training phase, the terms in Tr^K_p ∪ Tr^K_n are used as training examples of the category Subjective.</Paragraph> <Paragraph position="2"> Approach II is again based on learning two binary classifiers. Here, one of them must discriminate between terms that belong to the Positive category and ones that belong to its complement (not Positive), while the other must discriminate between terms that belong to the Negative category and ones that belong to its complement (not Negative). Terms that have been classified into Positive by the former classifier and into (not Negative) by the latter are deemed to be positive, and terms that have been classified into (not Positive) by the former classifier and into Negative by the latter are deemed to be negative. Terms that have been classified (i) into both (not Positive) and (not Negative), or (ii) into both Positive and Negative, are taken to be Objective (see the code sketch below). In the training phase of Approach II, the terms in Tr^K_n ∪ Tr^K_o are used as training examples of the category (not Positive), and the terms in Tr^K_p ∪ Tr^K_o are used as training examples of the category (not Negative).</Paragraph> <Paragraph position="3"> Approach III consists instead in viewing Positive, Negative, and Objective as three categories with equal status, and in learning a ternary classifier that classifies each term into exactly one of the three categories.</Paragraph> <Paragraph position="4"> There are several differences among these three approaches. A first difference, of a conceptual nature, is that only Approaches I and III view Objective as a category, or concept, in its own right, while Approach II views objectivity as a nonexistent entity, i.e. as the "absence of subjectivity" (in fact, in Approach II the training examples of Objective are only used as training examples of the complements of Positive and Negative). A second difference is that Approaches I and II are based on standard binary classification technology, while Approach III requires "multiclass" (i.e. 1-of-m) classification.</Paragraph>
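The decision-combination rule of Approach II can be sketched as follows; clf_pos and clf_neg are hypothetical names for two already-trained binary classifiers (with a scikit-learn-style predict method) for Positive vs. (not Positive) and Negative vs. (not Negative), and x is the feature vector of a single term.

```python
# Sketch of Approach II's decision-combination rule. `clf_pos` and `clf_neg`
# are assumed to be trained binary classifiers returning 1 for Positive /
# Negative and 0 for the respective complement; `x` is a single-row feature matrix.
def approach_ii_label(x, clf_pos, clf_neg):
    is_pos = bool(clf_pos.predict(x)[0])   # Positive vs. (not Positive)
    is_neg = bool(clf_neg.predict(x)[0])   # Negative vs. (not Negative)
    if is_pos and not is_neg:
        return "Positive"
    if is_neg and not is_pos:
        return "Negative"
    # (not Positive) and (not Negative), or Positive and Negative: Objective
    return "Objective"
```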
<Paragraph position="5"> As a consequence, while for Approaches I and II we use well-known learners for binary classification (the naive Bayesian learner using the multinomial model (McCallum and Nigam, 1998), support vector machines using linear kernels (Joachims, 1998), the Rocchio learner, and its PrTFIDF probabilistic version (Joachims, 1997)), for Approach III we use their multiclass versions.</Paragraph> <Paragraph position="6"> Before running our learners we make a pass of feature selection, with the intent of retaining only those features that are good at discriminating our categories, while discarding those which are not.</Paragraph> <Paragraph position="7"> Feature selection is implemented by scoring each feature f_k (i.e. each term that occurs in the glosses of at least one training term) by means of the mutual information (MI) function, defined as

MI(f_k) = Σ_{c ∈ {c_1,...,c_m}} Σ_{f ∈ {f_k, f̄_k}} Pr(f, c) · log ( Pr(f, c) / (Pr(f) · Pr(c)) )    (1)

where f̄_k denotes the absence of feature f_k, and by discarding the x% of features f_k that minimize it. We will call x% the reduction factor. Note that the set {c_1,...,c_m} from Equation 1 is interpreted differently in Approaches I to III, and always consistently with what the categories at stake are.</Paragraph> <Paragraph position="8"> Since the task we aim to solve is manifold, we will evaluate our classifiers according to two evaluation measures; one of these is the accuracy of our classifiers in discriminating between Positive, Negative, and Objective, i.e. in deciding both term orientation and subjectivity.</Paragraph>
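A minimal NumPy sketch of this MI-based feature selection is given below; it is an illustrative rendering of Equation 1, not the authors' code, and it assumes X is a binary term-by-feature presence matrix built from the training glosses and y the corresponding category labels.

```python
# Illustrative sketch of MI-based feature selection (Equation 1), assuming a
# binary presence matrix X (training terms x gloss features) and integer or
# string labels y over the categories c_1, ..., c_m. Not the authors' code.
import numpy as np

def mutual_information(X, y):
    """One MI score per feature, summing over categories and feature presence/absence."""
    X = (np.asarray(X) > 0)
    y = np.asarray(y)
    eps = 1e-12                                   # avoids log(0); 0 * log 0 is treated as 0
    scores = np.zeros(X.shape[1])
    for c in np.unique(y):
        in_c = (y == c)
        p_c = in_c.mean()
        for f_mask in (X, ~X):                    # feature present, feature absent
            p_f = f_mask.mean(axis=0)             # Pr(f)
            p_fc = (f_mask & in_c[:, None]).mean(axis=0)   # Pr(f, c)
            scores += p_fc * np.log2((p_fc + eps) / (p_f * p_c + eps))
    return scores

def select_features(X, y, reduction_factor):
    """Discard the `reduction_factor` fraction of features with the lowest MI."""
    scores = mutual_information(X, y)
    keep = int(round(len(scores) * (1.0 - reduction_factor)))
    return np.sort(np.argsort(scores)[::-1][:keep])
```

</Section> </Section> </Paper>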