<?xml version="1.0" standalone="yes"?> <Paper uid="C00-1044"> <Title>Effects of Adjective Orientation and Gradability on Sentence Subjectivity</Title> <Section position="3" start_page="299" end_page="299" type="metho"> <SectionTitle> 2 Semantic Orientation </SectionTitle> <Paragraph position="0"> The semantic orientation or polarity of a word indicates the direction the word deviates from the norm for its semantic group or lexical field (Lehrer, 1974). It is an evaluative characteristic (Battistella, 1990) of the meaning of the word which restricts its usage to appropriate pragmatic contexts. Words that encode a desirable state (e.g., beautiful, unbiased) have a positive orientation, while words that represent undesirable states have a negative orientation. Within the particular syntactic class of adjectives, orientation can be expressed as the ability of an adjective to ascribe in general a positive or negative quality to the modified item, making it better or worse than a similar unmodified item.</Paragraph> <Paragraph position="1"> Most antonymous adjectives can be contrasted on the basis of orientation (e.g., beautiful-ugly); similarly, nearly synonymous terms are often distinguished by different orientations (e.g., simple-simplistic). While orientation applies to many adjectives, there are also those that have no orientation, typically as members of groups of complementary, qualitative terms (Lyons, 1977) (e.g., domestic, medical, or red). Since orientation is inherently connected with evaluative judgements, it appears to be a promising feature for predicting subjectivity.</Paragraph> <Paragraph position="2"> Hatzivassiloglou and McKeown (1997) presented a method for automatically assigning a + or - orientation label to adjectives known to have some semantic orientation.
Their method is based on information extracted from conjunctions between adjectives in a large corpus: because orientation constrains the use of the words in specific contexts (e.g., compare corrupt and brutal with *corrupt but brutal), observed conjunctions of adjectives can be exploited to infer whether the conjoined words are of the same or different orientation. Using a shallow parser on a 21 million word corpus of Wall Street Journal articles, Hatzivassiloglou and McKeown developed and trained a log-linear statistical model that predicts whether any two adjectives have the same orientation with 82% accuracy. The predicted links of same or different orientation are automatically assigned a strength value (essentially, a confidence estimate) by the model, and induce a graph that can be partitioned with a clustering algorithm into components so that all words in the same component belong to the same orientation class.</Paragraph> <Paragraph position="3"> Once the classes have been determined, frequency information is used to assign positive or negative labels to each class (there are slightly fewer positive terms, but with a significantly higher rate of occurrence than negative terms).</Paragraph> <Paragraph position="4"> Hatzivassiloglou and McKeown applied their method to 1,336 (657 positive and 679 negative) adjectives which were all the oriented adjectives appearing in the corpus 20 times or more. Orientation labels were assigned to these adjectives by hand.1 Subsequent validation of the initial selection and label assignment steps with independent human judges showed an agreement of 89% for the first step and 97% for the second step, establishing that orientation is a fairly objective semantic property.
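The link-partitioning and labeling steps described above can be sketched as follows. This is a simplified greedy two-coloring, not the clustering algorithm Hatzivassiloglou and McKeown actually used, and the link list and frequency table are hypothetical inputs:

```python
from collections import defaultdict

def partition_by_orientation(links, freq):
    """Greedily 2-color adjectives using predicted same/different links.

    links: list of (adj1, adj2, same_orientation: bool, strength: float)
    freq:  corpus frequency per adjective; the class with the higher
           average frequency receives the positive label, mirroring the
           frequency-based labeling step described above.
    Links are processed from strongest to weakest, fixing each word's
    class the first time it is seen.
    """
    label = {}
    for a, b, same, _ in sorted(links, key=lambda l: -l[3]):
        if a not in label and b not in label:
            label[a] = 0
            label[b] = 0 if same else 1
        elif a in label and b not in label:
            label[b] = label[a] if same else 1 - label[a]
        elif b in label and a not in label:
            label[a] = label[b] if same else 1 - label[b]
        # if both are already labeled, keep the earlier (stronger) decision
    classes = defaultdict(list)
    for word, c in label.items():
        classes[c].append(freq.get(word, 0))
    pos = max(classes, key=lambda c: sum(classes[c]) / len(classes[c]))
    return {word: ('+' if c == pos else '-') for word, c in label.items()}
```

A real implementation would also reconcile conflicting links between already-labeled words; the sketch simply trusts the strongest evidence seen first.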
Because the accuracy of the method depends on the density of conjunctions per adjective, Hatzivassiloglou and McKeown tested their algorithm separately for adjectives appearing in at least 2, 3, 4, or 5 conjunctions in the corpus; their results are shown in Table 1.</Paragraph> <Paragraph position="5"> In this paper, we use the labels assigned by hand by Hatzivassiloglou and McKeown, and the labels automatically obtained by their method and reported in (Hatzivassiloglou and McKeown, 1997) with the following extension: An adjective that appears in k conjunctions will receive (possibly different) labels when analyzed together with all adjectives appearing in at least 2, 3, ..., k conjunctions; since performance generally increases with the number of conjunctions per adjective, we select as the orientation label the one assigned by the experiment using the highest applicable conjunctions threshold. Overall, we have labels for 730 adjectives,2 with a prediction accuracy of 81.51%.</Paragraph> </Section> <Section position="4" start_page="299" end_page="301" type="metho"> <SectionTitle> 3 Gradability </SectionTitle> <Paragraph position="0"> Gradability (or grading) (Sapir, 1944; Lyons, 1977, p.</Paragraph> <Paragraph position="1"> 271) is the semantic property that enables a word to participate in comparative constructs and to accept modifying expressions that act as intensifiers or diminishers. Gradable adjectives express properties in varying degrees of strength, relative to a norm either explicitly 1 Some adjectives with unclear, ambiguous, or context-dependent orientation were excluded.</Paragraph> <Paragraph position="2"> or modification, for two adjectives, one gradable (cold) and one primarily non-gradable (civil).
The frequencies were computed from the 1987 Wall Street Journal corpus.</Paragraph> <Paragraph position="3"> mentioned or implicitly supplied by the modified noun (for example, a small planet is usually much larger than a large house; cf. the distinction between absolute and relative adjectives made by Katz (1972, p. 254)). This relativism in the interpretation of gradable words indicates that gradability is likely to be a good predictor of subjectivity.</Paragraph> <Section position="1" start_page="300" end_page="300" type="sub_section"> <SectionTitle> 3.1 Indicators of gradability </SectionTitle> <Paragraph position="0"> Most gradable words appear at least several times in a large corpus either in forms inflected for degree (i.e., comparative and superlative), or in the context of grading modifiers such as very. However, non-gradable words may also occasionally appear in such contexts or forms under exceptional circumstances. For example, very dead can be used for emphasis, and redder and redder (as in &quot;her face became redder and redder&quot;) can be used to indicate a progression of coloring. To distinguish between truly gradable adjectives and non-gradable adjectives in these exceptional contexts, we have developed a trainable log-linear statistical model that takes into account the number of times an adjective has been observed in a form or context indicating gradability relative to the number of times it has been seen in non-gradable contexts.</Paragraph> <Paragraph position="1"> We use a shallow parser to retrieve from a large corpus tagged for part-of-speech with Church's PARTS tagger (Church, 1988) all adjectives and their modifiers. Although the most common use of an adverb modifying an adjective is to function as an intensifier or diminisher (Quirk et al., 1985, p.
445), adverbs can also add to the semantic content of the adjectival phrase instead of providing a grading effect (e.g., immediately available, politically vulnerable), or function as emphasizers, adding to the force of the base adjective and not to its degree (e.g., virtually impossible; compare *very impossible).</Paragraph> <Paragraph position="1"> Therefore, we compiled by hand a list of 73 adverbs and noun phrases (such as a little, exceedingly, somewhat, and very) that are frequently used as grading modifiers.</Paragraph> <Paragraph position="2"> The number of times each adjective appears modified by a term from this list becomes a first indicator of gradability.</Paragraph> <Paragraph position="3"> To detect inflected forms of adjectives (which, in English, always indicate gradability subject to the exceptions discussed earlier), we have implemented an automatic morphology analysis component. This program recognizes several irregular forms (e.g., good-better-best) and strips the grading suffixes -er and -est from regularly inflected adjectives, producing a list of candidate base forms that if inflected would yield the original adjective (e.g., bigger produces three potential forms, big, bigg, and bigge). The frequency of these candidate base words is checked against the corpus, and the form with significantly higher frequency is selected. To guard against cases of base adjective forms that end in -er or -est (e.g., silver), the original word is also included among the candidates.
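The candidate-generation and frequency-check procedure just described can be sketched as follows. This is a simplified sketch: the corpus_freq mapping is an assumed input, the candidate set is the one illustrated above for regular inflections, and irregular forms like good-better-best are not handled here:

```python
def base_form_candidates(word, corpus_freq):
    """Recover the base adjective behind a regular -er/-est inflection.

    Strips the grading suffix, proposes candidate bases (e.g. 'bigger'
    -> 'bigg', 'big', 'bigge'), keeps the original word as a candidate
    to guard against bases that themselves end in -er/-est ('silver'),
    and returns the candidate with the highest corpus frequency.
    corpus_freq is assumed to map word -> frequency in the corpus.
    """
    candidates = [word]                      # e.g. 'silver' stays 'silver'
    for suffix in ('er', 'est'):
        if word.endswith(suffix):
            stem = word[:-len(suffix)]       # 'bigger' -> 'bigg'
            candidates.extend([stem,
                               stem[:-1] if len(stem) > 1 else stem,
                               stem + 'e'])  # 'bigg', 'big', 'bigge'
    return max(candidates, key=lambda c: corpus_freq.get(c, 0))
```

The paper requires the winning form to have a significantly higher frequency; the sketch simply takes the maximum.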
The total number of times this procedure is successfully applied for each adjective becomes a second indicator of gradability.</Paragraph> </Section> <Section position="2" start_page="300" end_page="301" type="sub_section"> <SectionTitle> 3.2 Determining gradability </SectionTitle> <Paragraph position="0"> The presence or absence of each of the above two indicators results in a 2 x 2 frequency table for each adjective; examples for one gradable and one non-gradable adjective are given in Table 2. To convert these four numbers to a single decision on the gradability of the adjective, we use a log-linear model. Log-linear models (Santner and Duffy, 1989) construct a linear combination (weighted sum) of the predictor variables $V_i$, $\eta = \beta_0 + \sum_i \beta_i V_i$,</Paragraph> <Paragraph position="2"> and relate it to the actual response $R$ (in this case, 0 for non-gradable and 1 for gradable) via the so-called logistic transformation, $p = \frac{1}{1 + e^{-\eta}}$. Maximum likelihood estimates for the coefficients $\beta_i$ are obtained from training samples for which the correct response $R$ is known, using the iterative reweighted non-linear least squares algorithm (Bates and Watts, 1988). This statistical model is particularly suited for modeling variables with a &quot;yes&quot;-&quot;no&quot; (binary) value, because, unlike linear models, it captures the dependency of the response's variance on its mean (Santner and Duffy, 1989).</Paragraph> <Paragraph position="3"> We normalize the counts for the two indicators of gradability, and the count of joint occurrences of both inflection and modification by grading modifiers, by dividing by the total frequency of the adjective in the corpus.
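The model just described can be sketched as follows. This is a minimal stand-in: it maximizes the same log-likelihood by plain stochastic gradient ascent rather than the iteratively reweighted least squares algorithm cited above, and the predictor values in the usage example are invented normalized counts:

```python
import math

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit p = 1 / (1 + exp(-(b0 + sum_i b_i * V_i))) by maximizing the
    log-likelihood with stochastic gradient ascent.

    X: list of predictor vectors (normalized indicator counts),
    y: list of 0/1 responses (non-gradable / gradable).
    Returns [b0, b1, ..., bn] with b0 the intercept.
    """
    n = len(X[0])
    beta = [0.0] * (n + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            eta = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
            p = 1.0 / (1.0 + math.exp(-eta))
            err = yi - p                      # gradient of the log-likelihood
            beta[0] += lr * err
            for j, v in enumerate(xi):
                beta[j + 1] += lr * err * v
    return beta

def predict_gradable(beta, x):
    """Label an adjective gradable when the fitted probability is >= 0.5."""
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-eta)) >= 0.5
```

With three normalized predictors per adjective, as in the paper, X would hold one three-element row per adjective.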
In this manner, we obtain three real-valued predictors [Figure 1: Classified as gradable: acceptable accurate afraid aware busy careful cautious cheap creative critical dangerous different disappointing equal fair familiar far favorable formal free frequent good grand inadequate intense interesting legitimate likely positive professional reasonable rich short-term significant slow solid sophisticated sound speculative thin tight tough uncertain widespread worth. Classified as non-gradable: additional alleged alternative annual antitrust automatic certain criminal cumulative daily deputy domestic elderly false financial first-quarter full hefty illegal institutional internal legislative long-distance military minimum monthly moral national official. Caption fragment: sample of 100 adjectives as gradable or not. Correct decisions (according to the COBUILD-based reference model) are indicated in bold.]</Paragraph> <Paragraph position="4"> $V_i$, $i = 1, \ldots, 3$, for the log-linear model. We also consider a modified model, where any adjective for which any occurrence of simultaneous inflection and modification has been detected is automatically labeled gradable; the remaining two predictors are used to classify the adjectives that do not fulfill this condition. This modification is motivated by the fact that observing an adjective in such a context offers a very high likelihood of gradability.</Paragraph> </Section> <Section position="3" start_page="301" end_page="301" type="sub_section"> <SectionTitle> 3.3 Experimental results </SectionTitle> <Paragraph position="0"> We extracted from the 1987 Wall Street Journal corpus (21 million words) all adjectives with a frequency of 300 or more; this produced a collection of 496 words. Gradability labels specifying whether each word is gradable or not were manually assigned, using the designations of the Collins COBUILD (Collins Birmingham University International Language Database) dictionary (Sinclair, 1987).
COBUILD marks each sense of each adjective with one of the labels QUALIT, CLASSIF, or COLOR, corresponding to gradable, non-gradable, and color adjectives. In cases where COBUILD supplies conflicting labels for different senses of a word, we either omitted that word or, if one sense was predominant, gave it the label of that sense. In some cases, the word did not appear in COBUILD; these typically were descriptive compounds specific to the domain (e.g., anti-takeover, over-the-counter) and were in most cases marked as non-gradable adjectives. Overall, 453 of the 496 adjectives (91.33%) were assigned gradability labels by hand, while the remaining 43 words were discarded because they were misclassified as adjectives by the part-of-speech tagger (e.g., such) or because they could not be assigned a unique gradability label in accordance with COBUILD.</Paragraph> <Paragraph position="1"> Out of these words, 235 (51.88%) were manually classified as gradable adjectives, and 218 (48.12%) were classified as non-gradable adjectives.</Paragraph> <Paragraph position="2"> Following the methodology of the preceding subsection, we recovered the inflection and modification indicators for these 453 adjectives, and trained both the unmodified and modified log-linear models repeatedly, using a randomly selected subset of 300 adjectives for training and 100 adjectives for testing. The entire cycle of selecting random test and training sets, fitting the model's coefficients, making predictions, and evaluating the predicted gradability labels is repeated 100 times, to ensure that the evaluation is not affected by a lucky (or unlucky) partition of the data between training and test sets. This procedure yields over the 453 adjectives gradability classifications with an average precision of 93.55% and average recall of 82.24% (measured over the gradable words reported or recovered, respectively). The overall accuracy of the predicted gradability labels is 87.97%.
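The repeated random-split evaluation protocol described above can be sketched generically. Here train_fn and predict_fn are placeholders standing in for fitting and applying the log-linear model, and the split sizes default to the paper's 300 training / 100 test adjectives over 100 repetitions:

```python
import random

def repeated_holdout(items, labels, train_fn, predict_fn,
                     n_train=300, n_test=100, repeats=100, seed=0):
    """Average precision and recall over repeated random train/test splits.

    items:  feature representations, one per adjective
    labels: True for gradable, False for non-gradable
    train_fn(train_items, train_labels) -> model
    predict_fn(model, item) -> True/False
    Returns (mean precision, mean recall) over the gradable class.
    """
    rng = random.Random(seed)
    precisions, recalls = [], []
    idx = list(range(len(items)))
    for _ in range(repeats):
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:n_train + n_test]
        model = train_fn([items[i] for i in train], [labels[i] for i in train])
        tp = fp = fn = 0
        for i in test:
            pred = predict_fn(model, items[i])
            if pred and labels[i]:
                tp += 1
            elif pred and not labels[i]:
                fp += 1
            elif not pred and labels[i]:
                fn += 1
        precisions.append(tp / max(tp + fp, 1))
        recalls.append(tp / max(tp + fn, 1))
    return sum(precisions) / repeats, sum(recalls) / repeats
```

The random reshuffling each round is what guards against a lucky or unlucky single partition, as the text notes.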
These results were obtained with the modified log-linear model, which slightly outperformed the model that uses all three predictors (in that case, we obtained an average precision of 93.86%, average recall of 81.70%, and average overall accuracy of 87.70%). Figure 1 lists the gradability labels that were automatically assigned to one of the 100 random test sets using the modified prediction algorithm.</Paragraph> <Paragraph position="3"> We also automatically assigned labels to the entire set of 453 adjectives, using 4-fold cross-validation (repeatedly training on three-fourths of the 453 adjectives and testing on the rest). This resulted in precision of 94.15%, recall of 82.13%, and accuracy of 88.08% for the entire adjective set.</Paragraph> </Section> </Section> <Section position="5" start_page="301" end_page="303" type="metho"> <SectionTitle> 4 Subjectivity </SectionTitle> <Paragraph position="0"> The main motivation for the present paper is to examine the effect that information about an adjective's semantic orientation and gradability has on its probability of occurring in a subjective sentence (and hence on its quality as a subjectivity predictor). We first review related work on subjectivity recognition and then present our results.</Paragraph> <Section position="1" start_page="301" end_page="302" type="sub_section"> <SectionTitle> 4.1 Previous work on subjectivity recognition </SectionTitle> <Paragraph position="0"> In work by Wiebe, Bruce, and O'Hara (Wiebe et al., 1999; Bruce and Wiebe, 2000), a corpus of 1,001 sentences3 of the Wall Street Journal TreeBank Corpus 3 Compound sentences were manually segmented into their conjuncts, and each conjunct treated as a separate sentence.</Paragraph> <Paragraph position="1"> (Marcus et al., 1993) was manually annotated with subjectivity classifications.
Specifically, each sentence was assigned a subjective or objective classification, according to consensus tags derived by a statistical analysis of the classes assigned by three human judges (see (Wiebe et al., 1999) for further information). The total number of subjective sentences in the data is 486, and the total number of objective sentences is 515.</Paragraph> <Paragraph position="2"> Bruce and Wiebe (2000) performed a statistical analysis of the assigned classifications, finding that adjectives are statistically significantly and positively correlated with subjective sentences in the corpus on the basis of the log-likelihood ratio test statistic. The probability of a sentence being subjective, simply given that there is at least one adjective in the sentence, is 56%, even though there are more objective than subjective sentences in the corpus. In addition, Bruce and Wiebe identified a type of adjective that is indicative of subjective sentences: those Quirk et al. (1985) term dynamic, which &quot;denote qualities that are thought to be subject to control by the possessor&quot; (p. 434). Examples are &quot;kind&quot; and &quot;careful&quot;. Bruce and Wiebe manually applied syntactic tests to identify dynamic adjectives in half of the corpus mentioned above. We include such adjectives in the analysis below, to assess whether additional lexical semantic features associated with subjectivity help improve predictability.</Paragraph> <Paragraph position="3"> Wiebe et al. (1999) developed an automatic system to perform subjectivity tagging. In 10-fold cross-validation experiments applied to the corpus described above, a probabilistic classifier obtained an average accuracy on subjectivity tagging of 72.17%, more than 20 percentage points higher than the baseline accuracy obtained by always choosing the more frequent class.
A binary feature is included for each of the following: the presence in the sentence of a pronoun, an adjective, a cardinal number, a modal other than will, and an adverb other than not.</Paragraph> <Paragraph position="4"> They also included a binary feature representing whether or not the sentence begins a new paragraph. Finally, a feature was included representing co-occurrence of word tokens and punctuation marks with the subjective and objective classification. An analysis of the system showed that the adjective feature was important to realizing the improvements over the baseline accuracy. In this paper, we use the performance of the simple adjective feature as a baseline, and identify higher-quality adjective features based on gradability and orientation.</Paragraph> </Section> <Section position="2" start_page="302" end_page="303" type="sub_section"> <SectionTitle> 4.2 Orientation and gradability as subjectivity predictors: Results </SectionTitle> <Paragraph position="0"> We measure the precision of a simple prediction method for subjectivity: a sentence is classified as subjective if at least one member of a set of adjectives S occurs in the sentence, and objective otherwise. By varying the set (e.g., all adjectives, only gradable adjectives, only negatively oriented adjectives, etc.) we can assess the usefulness of the additional knowledge for predicting subjectivity. For the present study, we use the set of all adjectives automatically identified in the corpus by Wiebe et al. (1999) (Section 4.1); the set of dynamic adjectives manually identified by Bruce and Wiebe (2000) (Section 4.1); the set of semantic orientation labels assigned by Hatzivassiloglou and McKeown (1997), both manually and automatically with our extension described in Section 2; and the set of gradability labels, both manually and automatically assigned according to the revised log-linear model of Section 3.
We calculate results (shown in Table 3) for each of these sets of all adjectives, dynamic, oriented, and gradable adjectives, as well as for unions and intersections of those sets. Note that these four sets have been extracted from comparable but different corpora (different years of the Wall Street Journal); therefore, adjectives in one corpus may sometimes not be present in the other corpus, reducing the size of intersection sets.</Paragraph> <Paragraph position="1"> Also, for gradability, we worked with a sample set of 100 adjectives rather than all possible adjectives we could automatically calculate gradability values for, since our goal in the present work is to measure correlations between these sets and subjectivity, rather than building a system for predicting subjectivity for as many adjectives as possible.</Paragraph> <Paragraph position="2"> In Table 3, the second column identifies S, the set of adjective types in question. The third column gives the number of subjective sentences that contain one or more instances of members of S, and the fourth column gives the same figure for objective sentences. Therefore these two columns together specify the coverage of the subjectivity indicator examined. The fifth column gives the conditional probability that a sentence is subjective,</Paragraph> <Paragraph position="3"> given that one or more instances of members of S appear. This is a precision metric that assesses feature quality: if instances of S appear, how likely is the sentence to be subjective?
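The precision metric just defined can be computed as follows. The toy sentences in the test are invented stand-ins for the annotated Wall Street Journal data, and the (tokens, label) layout is an assumed representation:

```python
def subjectivity_precision(sentences, adjective_set):
    """Precision of the simple predictor described above: a sentence is
    called subjective iff it contains at least one adjective from S.

    sentences: list of (tokens, is_subjective) pairs, with is_subjective
    given as 1 or 0. Returns P(subjective | some member of S appears),
    i.e., the fifth-column figure for the set S.
    """
    hits = subjective_hits = 0
    for tokens, is_subjective in sentences:
        if adjective_set & set(tokens):   # at least one member of S occurs
            hits += 1
            subjective_hits += is_subjective
    return subjective_hits / hits if hits else 0.0
```

The hit count alone (split into subjective and objective sentences) corresponds to the coverage figures in the third and fourth columns.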
The last two columns contrast the observed conditional probability with the a priori probability of subjective sentences (i.e., chance; sixth column) and with the probability assigned by the baseline all-adjectives model (i.e., the first row in the table; seventh column).</Paragraph> <Paragraph position="4"> The most striking aspect of these results is that all sets involving dynamic adjectives, positive or negative polarity, or gradability are better predictors of subjective sentences than the class of adjectives as a whole. Five of the sets are at least 25 points better (L14, L16, L21, L23, and L24); four others are at least 20 points better (L2, L9, L13, and L15); and five others are at least 15 points better (L4, L11, L18, L20, and L22). In most of these cases, the difference between these predictors and all adjectives is statistically significant4 at the 5% level or less; almost all of these predictors offer statistically significantly better than even odds in predicting subjectivity correctly. In many cases where statistical significance 4 We applied a chi-square test on the 2 x 2 cross-classification table (Fleiss, 1981).</Paragraph> <Paragraph position="5"> could not be established, this is due to small counts, caused by the small size of the set of adjectives automatically labeled for gradability.</Paragraph> <Paragraph position="6"> It is also important to note that, in most cases, the automatically classified adjectives are comparable or better predictors of subjective sentences than the manually assigned ones. Comparing the automatically generated classes with the manually identified ones, the positive polarity set decreases by 1 percentage point (L3 and L8), while the negative polarity set increases by 7 points (L4 and L9), and the gradable set increases by 5 percentage points (L6 and L11).
Among the intersection sets, in two cases the results are lower for the computer-generated sets (L13/L15 and L14/L16), but in the other four cases, the results are higher (L17/L20, L18/L21, L19/L22, and L23/L24).</Paragraph> <Paragraph position="7"> Finally, the table shows that, in most cases, predictability improves or at worst remains essentially the same as additional lexical features are considered. For the set of dynamic adjectives, the predictability is 74% (L2), and improves in 4 of the 6 cases in which it is intersected with other sets (L14, L16, L23, and L24). For the other two (L13 and L15), predictability is only 1 or 2 points lower (not statistically significant). For the manually assigned polarity and gradability sets, in one case predictability is lower (L17 < L6), but in the other cases it remains the same or improves. The results are even better for the automatically assigned polarity and gradability sets: predictability improves when both features are considered in all but one case, when predictability remains the same (L20 > L8; L21 > L9; L22 > L10; and L11 &lt;= L20, L21, and L22).</Paragraph> </Section> </Section> </Paper>