<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1071">
  <Title>Deeper Sentiment Analysis Using Machine Translation Technology</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Previous Work on Sentiment Analysis
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Analysis
</SectionTitle>
      <Paragraph position="0"> Some prior studies on sentiment analysis focused on the document-level classification of sentiment (Turney, 2002; Pang et al., 2002) where a document is assumed to have only a single sentiment, thus these studies are not applicable to our goal. Other work (Subasic and Huettner, 2001; Morinaga et al., 2002) assigned sentiment to words, but they relied on quantitative information such as the frequencies of word associations or statistical predictions of favorability. null Automatic acquisition of sentiment expressions have also been studied (Hatzivassiloglou and McKeown, 1997), but limited to adjectives, and only one sentiment could be assigned to each word.</Paragraph>
      <Paragraph position="1"> Yi et al. (2003) pointed out that the multiple sentiment aspects in a document should be extracted. This paper follows that approach, but exploits deeper analysis in order to avoid the analytic failures reported by Nasukawa and Yi (2003), which occurred when they used a shallow parser and only addressed a limited number of syntactic phenomena.</Paragraph>
      <Paragraph position="2"> In our in-depth approach described in the next section, two types of errors out of the four reported by Nasukawa and Yi (2003) were easily removed2.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Sentiment Unit
</SectionTitle>
    <Paragraph position="0"> This section describes the sentiment units which are extracted from text, and their roles in the sentiment analysis and its applications.</Paragraph>
    <Paragraph position="1"> A sentiment unit consists of a sentiment, a predicate, its one or more arguments, and a surface form. Formally it is expressed as in Figure 3.</Paragraph>
    <Paragraph position="2"> The 'sentiment' feature categorizes a sentiment unit into four types: 'favorable' [fav], 'unfavorable' [unf], 'question' [qst], and 'request' [req]. A predicate is a word, typically a verb or an adjective, which conveys the main notion of the sentiment unit. An argument is also a word, typically a noun, which modifies the predicate with a case postpositional in Japanese. They roughly correspond to a subject and an object of the predicate in English.</Paragraph>
    <Paragraph position="3"> For example, from the sentence (2)3, the extracted  sentiment unit is (2a).</Paragraph>
    <Paragraph position="4"> ABC123-ha renzu-ga subarashii.</Paragraph>
    <Paragraph position="5"> ABC123-TOPIC lens-NOM excellent 'ABC123 has an excellent lens.' (2)  [fav] excellent h ABC123, lens i (2a) The sentiment unit (2a) stands for the sentiment is 'favorable', the predicate is 'excellent' and its arguments are 'ABC123' and 'lens'. In this case, both 'ABC123' and 'lens' are counted as words which are associated with a favorable sentiment. Arguments are used as the keywords in the outliner, as in the leftmost column in Figure 1. Predicates with no argument are ignored, because they have no effects on the view and often become noise.</Paragraph>
    <Paragraph position="6">  The predicate and its arguments can be different from the surface form in the original text. Semantically similar representations should be aggregated to organize extracted sentiments, so the examples in this paper use English canonical forms to represent predicates and arguments, while the actual implementation uses Japanese expressions.</Paragraph>
    <Paragraph position="7"> Predicates may have features, such as negation, facility, difficulty, etc. For example, &amp;quot;ABC123 doesn't have an excellent lens.&amp;quot; brings a sentiment unit &amp;quot;[unf] excellent+neg h ABC123, lens i&amp;quot;. Also the facility/difficulty feature affects the sentiments such as &amp;quot;[unf] break+facil&amp;quot; for 'easy to break' and &amp;quot;[unf] learn+diff&amp;quot; 'difficult to learn'. The surface string is the corresponding part in the original text. It is used for reference in the view of the output of SA, because the surface string is the most understandable notation of each sentiment unit for humans.</Paragraph>
    <Paragraph position="8"> We use the term sentiment polarity for the selection of the two sentiments [fav] and [unf]. The other two sentiments, [qst] and [req] are important in applications, e.g. the automatic creation of FAQ. Roughly speaking, [qst] is extracted from an interrogative sentence, and [req] is used for imperative sentences or expressions such as &amp;quot;I want ...&amp;quot; and &amp;quot;I'd like you to ...&amp;quot;. From a pragmatic point of view it is difficult to distinguish between them4, but we classify them using simple rules.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Implementation
</SectionTitle>
    <Paragraph position="0"> This section describes operations and resources designed for the extraction of sentiment units. There are many techniques analogous to those for machine translation, so first we show the architecture of the transfer-based machine translation engine which is used as the basis of the extraction of sentiment units.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Transfer-based Machine Translation Engine
</SectionTitle>
      <Paragraph position="0"> As illustrated on the left side of Figure 2, the transfer-based machine translation system consists of three parts: a source language syntactic parser, a bilingual transfer which handles the syntactic tree structures, and a target language generator. Here the flow of the Japanese to English translation is shown with the following example sentence (3).</Paragraph>
      <Paragraph position="1"> 4For example, the interrogative sentence &amp;quot;Would you read it?&amp;quot; implies a request.</Paragraph>
      <Paragraph position="2"> kare hon ki</Paragraph>
      <Paragraph position="4"> First the syntactic parser parses the sentence (3) to create the tree structure as shown in Figure 4.</Paragraph>
      <Paragraph position="5"> Next, the transfer converts this Japanese parse tree into an English one by applying the translation patterns as in Figure 5. A translation pattern consists of a tree of the source language, a tree of the target language, and the word correspondences between both languages.</Paragraph>
      <Paragraph position="6"> The patterns (a) and (b) in Figure 5 match with the subtrees in Figure 4, as Figure 6 illustrates.</Paragraph>
      <Paragraph position="7"> Thismatchingoperationisverycomplicatedbecause there can be an enormous number of possible combinations of patterns. The fitness of the pattern combinations is calculated according to the similarity of the source tree and the left side of the translation pattern, the specificity of the translation pattern, and so on. This example also shows the process of matching the Japanese case markers (postpositional particles). The source tree and the pattern (a) match even though the postpositional particles are different ('ha' and 'ga'). This process may be much more complicated when a verb is transformed into special forms e.g. passive or causative. Besides this there are many operations to handle syntactic and semantic phenomena, but here we take them for granted because of space constraints.</Paragraph>
      <Paragraph position="8"> Now the target fragments have been created as in Figure 6, using the right side of the matched translation patterns as in Figure 5. The two fragments are attached at the shared node ' noun2 ', and lexicalized by using the bilingual lexicon. Finally the target sentence &amp;quot;He likes my book.&amp;quot; is generated by the target language generator.</Paragraph>
      <Paragraph position="9">  intotheEnglishtree. ThepatternsinFigure5create two English fragments, and they are attached at the nodes ' noun2 ' which share the same correspondent node in the source language tree.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Techniques Required for Sentiment Analysis
</SectionTitle>
      <Paragraph position="0"> Our aim is to extract sentiment units with high precision. Moreover, the set of arguments of each predicate should be selected necessarily and sufficiently.</Paragraph>
      <Paragraph position="1"> Here we show that the techniques to meet these requirements are analogous to the techniques for machine translation which have been reviewed in Section 4.1.</Paragraph>
      <Paragraph position="2">  matching Full syntactic parsing plays an important role to extract sentiments correctly, because the local structures obtained by a shallow parser are not always reliable. For example, expressions such as &amp;quot;I don't think X is good&amp;quot;, &amp;quot;I hope that X is good&amp;quot; are not favorable opinions about X, even though &amp;quot;X is good&amp;quot; appears on the surface. Therefore we use top-down pattern matching on the tree structures from the full parsing in order to find each sentiment fragment, that is potentially a part of a sentiment unit.</Paragraph>
      <Paragraph position="3"> In our method, initially the top node is examined to see whether or not the node and its combination of children nodes match with one of the patterns in the pattern repository. In this top-down manner, the nodes &amp;quot;don't think&amp;quot; and &amp;quot;hope&amp;quot; in the above examples are examined before &amp;quot;X is good&amp;quot;, and thus the above expressions won't be misunderstood to express favorable sentiments.</Paragraph>
      <Paragraph position="4"> There are three types of patterns: principal patterns, auxiliary patterns, and nominal patterns. Figure 7 illustrates examples of principal patterns: the  ' declinable ' denotes a verb or an adjective in Japanese. Note that the two unit s on the right side of (f) are not connected. This means two separated sentiment units can be obtained.</Paragraph>
      <Paragraph position="5"> pattern (c) converts a Japanese expression &amp;quot; noun ga warui&amp;quot; to a sentiment unit &amp;quot;[unf] bad h noun i&amp;quot;. The pattern (d) converts an expression &amp;quot; noun -wo ki-ni iru&amp;quot; to a sentiment unit &amp;quot;[fav] like h noun i&amp;quot;, where the subject (the noun preceding the postpositional ga) is excluded from the arguments because the subject of 'like' is usually the author, who is not the target of sentiment analysis.</Paragraph>
      <Paragraph position="6"> Another type is the auxiliary pattern, which expands the scope of matching. Figure 8 has two examples. The pattern (e) matches with phrases such as &amp;quot;X-wa yoi-to omowa-nai. ((I) don't think X is good.)&amp;quot; and produces a sentiment unit with the negation feature. When this pattern is attached to a principal pattern, its favorability is inverted. The pattern (f) allows us to obtain two separate sentiment units from sentences such as &amp;quot;Dezain-ga waruimonono, sousasei-ha yoi. (The design is bad, but the usability is good.)&amp;quot;.</Paragraph>
      <Paragraph position="7">  The third type of pattern is a nominal pattern. Figure 9 shows three examples. The pattern (g) is used to avoid a formal noun (nominalizer) being an argument. Using this pattern, from the sentence &amp;quot;Kawaii no-ga suki-da. ((I) like pretty things)&amp;quot;, &amp;quot;[fav] like h pretty i&amp;quot; can be extracted instead of &amp;quot;[fav] like h thing i&amp;quot;. The pattern (h) is used to convert a noun phrase &amp;quot;renzu-no shitsu (quality of the lens)&amp;quot; into just &amp;quot;lens&amp;quot;. Due to this operation, from Sentence (4), an informative sentiment unit (4a) can be obtained instead of a less informative one (4b).</Paragraph>
      <Paragraph position="8">  The pattern (i) is for compound nouns such as &amp;quot;juuden jikan (recharging time)&amp;quot;. A sentiment unit &amp;quot;long h time i&amp;quot; is not informative, but &amp;quot;long h recharging time i&amp;quot; can be regarded as a [unf] sentiment. null</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2.3 Disambiguation of sentiment polarity
</SectionTitle>
      <Paragraph position="0"> Some adjectives and verbs may be used for both favorable and unfavorable predicates. This variation of sentiment polarity can be disambiguated naturally in the same manner as the word sense disambiguation in machine translation. The adjective 'takai (high)' is a typical example, as in (5a) and (5b). In this case the sentiment polarity depends on the noun preceding the postpositional particle 'ga': favorable if the noun is 'kaizoudo (resolution)', unfavorable if the noun is a product name. The semantic category assigned to a noun holds the information used for this type of disambiguation.</Paragraph>
      <Paragraph position="1"> Kaizoudo-ga takai.</Paragraph>
      <Paragraph position="2">  In contrast to disambiguation, aggregation of synonymous expressions is important to organize extracted sentiment units. If the different expressions which convey the same (or similar) meanings are aggregated into a canonical one, the frequency increases and one can easily find frequently mentioned opinions.</Paragraph>
      <Paragraph position="3"> Using the translation architecture, any forms can be chosen as the predicates and arguments by adjusting the patterns and lexicons. That is, monolingual word translation is done in our method.</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.3 Resources for Sentiment Analysis
</SectionTitle>
      <Paragraph position="0"> We prepared the following resources for sentiment analysis: Principal patterns: The verbal and adjectival patterns for machine translation were converted to principal patterns for sentiment analysis. The left sides of the patterns are compatible with the source language parts of the original patterns, so we just assigned a sentiment polarity to each word. A total of 3752 principal patterns were created.</Paragraph>
      <Paragraph position="1"> Auxiliary/Nominal patterns: A total of 95 auxiliary patterns and 36 nominal patterns were created manually.</Paragraph>
      <Paragraph position="2"> Polarity lexicon: Some nouns were assigned sentiment polarity, e.g. [unf] for 'noise'. This polarity is used in expressions such as &amp;quot;... ga ooi. (There are many ...)&amp;quot;. This lexicon is also used for the aggregation of words.</Paragraph>
      <Paragraph position="3"> Some patterns and lexicons are domaindependent. The situation is the same as in machine translation. Fortunately the translation engine used here has a function to selectively use domain-dependent dictionaries, and thus we can prepare patterns which are especially suited for the messages on bulletin boards, or for the domain of digital cameras. For example, &amp;quot;The size is small.&amp;quot; is a desirable feature of a digital camera. We can assign the appropriate sentiment (in this case, [fav]) by using a domain-specific principal pattern.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>