<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1642">
<Title>Fully Automatic Lexicon Expansion for Domain-oriented Sentiment Analysis</Title>
<Section position="3" start_page="0" end_page="355" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> Sentiment Analysis (SA) (Nasukawa and Yi, 2003; Yi et al., 2003) is the task of recognizing writers' feelings as expressed in positive or negative comments, by analyzing unmanageably large numbers of documents. Extensive syntactic patterns enable us to detect sentiment expressions and to convert them into semantic structures with high precision, as reported by Kanayama et al. (2004). From the example Japanese sentence (1) in the digital camera domain, the SA system extracts a sentiment representation as in (2), which consists of a predicate and an argument with positive (+) polarity.</Paragraph>
<Paragraph position="1"> (1) Kono kamera-ha subarashii-to omou.</Paragraph>
<Paragraph position="2"> 'I think this camera is splendid.'
(2) [+] splendid(camera)
SA in general tends to focus on subjective sentiment expressions, which explicitly describe an author's preference, as in example (1) above. Objective (or factual) expressions such as examples (3) and (4) may be out of scope even though they describe desirable aspects in a specific domain. However, when customers or corporate users use an SA system for their commercial activities, such domain-specific expressions play a more important role, since they convey the strong or weak points of the product more directly and may influence, for example, the decision to purchase a specific product.</Paragraph>
<Paragraph position="3"> (3) Kontorasuto-ga kukkiri-suru.</Paragraph>
<Paragraph position="4"> 'The contrast is sharp.'
(4) Atarashii kishu-ha zuumu-mo tsuite-iru.
'The new model has a zoom lens, too.'
This paper addresses the Japanese version of Domain-oriented Sentiment Analysis, which identifies polar clauses conveying goodness and badness in a specific domain, including rather objective expressions. Building domain-dependent lexicons for many domains is much harder work than preparing domain-independent lexicons and syntactic patterns, because the possible lexical entries are too numerous and may differ in each domain. To solve this problem, we have devised an unsupervised method to acquire domain-dependent lexical knowledge, in which a user only has to collect unannotated domain corpora.</Paragraph>
<Paragraph position="5"> The knowledge to be acquired is a domain-dependent set of polar atoms. A polar atom is a minimum syntactic structure specifying polarity in a predicative expression. For example, to detect polar clauses in sentences (3) and (4), the following polar atoms (5) and (6) should appear in the lexicon:
(5) [+] kukkiri-suru 'to be sharp'
(6) [+] tsuku - zuumu-ga 'to have - zoom lens-NOM'
The polar atom (5) specifies the positive polarity of the verb kukkiri-suru. This atom can be used for this verb in general, regardless of its arguments. In the polar atom (6), on the other hand, the nominative case of the verb tsuku ('have') is limited to the specific noun zuumu ('zoom lens'), since the verb tsuku does not hold the polarity in itself. The automatic determination of the scope of each atom is one of the major issues.</Paragraph>
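As a concrete illustration of the distinction between atoms (5) and (6), the following sketch (hypothetical Python, not the authors' implementation; the clause and atom representations are assumptions made for this illustration) shows a verb-only atom and an atom whose polarity holds only with a particular case-marked argument:

# A minimal sketch of polar atoms and clause matching. The representation is
# assumed for illustration; it is not the system described in this paper.
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class PolarAtom:
    polarity: int                    # +1 (positive) or -1 (negative)
    predicate: str                   # e.g. "kukkiri-suru", "tsuku"
    argument: Optional[str] = None   # restricting noun, e.g. "zuumu"
    case: Optional[str] = None       # case marker of that noun, e.g. "ga" (NOM)


@dataclass
class Clause:
    predicate: str
    arguments: Dict[str, str]        # case marker -> noun


def matches(atom: PolarAtom, clause: Clause) -> bool:
    """A verb-only atom matches on the predicate alone; an atom with an
    argument restriction also requires that case-marked noun."""
    if atom.predicate != clause.predicate:
        return False
    if atom.argument is None:        # scope: the verb by itself, as in (5)
        return True
    return clause.arguments.get(atom.case) == atom.argument  # as in (6)


# (5) [+] kukkiri-suru 'to be sharp'
atom5 = PolarAtom(+1, "kukkiri-suru")
# (6) [+] tsuku - zuumu-ga 'to have - zoom lens-NOM'
atom6 = PolarAtom(+1, "tsuku", argument="zuumu", case="ga")

clause3 = Clause("kukkiri-suru", {"ga": "kontorasuto"})  # from sentence (3)
clause4 = Clause("tsuku", {"ga": "zuumu"})               # from sentence (4)
print(matches(atom5, clause3), matches(atom6, clause4))  # True True

Whether an atom should be stored in the verb-only form or in the argument-restricted form is precisely the scope question raised just above.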
<Paragraph position="6"> For lexical learning from unannotated corpora, our method uses context coherency in terms of polarity: the assumption that polar clauses with the same polarity appear successively unless the context is changed by an adversative expression. Exploiting this tendency, we can collect candidate polar atoms, with their tentative polarities, as those adjacent to polar clauses that have already been identified by the domain-independent polar atoms in the initial lexicon. We use both intra-sentential and inter-sentential contexts to obtain more candidate polar atoms.</Paragraph>
<Paragraph position="7"> Our assumption is intuitively reasonable, but there are many non-polar (neutral) clauses adjacent to polar clauses. Errors in sentence delimitation or syntactic parsing also result in false candidate atoms. Thus, to adopt a candidate polar atom into the new lexicon, threshold values for its frequency or ratio are required, but they depend on the type of the corpus, the size of the initial lexicon, and so on. Our algorithm is fully automatic in the sense that the criteria for the adoption of polar atoms are set automatically by statistical estimation based on the distributions of coherency: coherent precision and coherent density. No manual tuning process is required, so the algorithm only needs unannotated domain corpora and the initial lexicon. Thus our learning method can be used not only by the developers of the system but also by end users. This feature is very helpful for users.
In the next section, we review related work, and Section 3 describes our runtime SA system. Section 4 discusses our assumption for unsupervised learning, context coherency, and its key metrics, coherent precision and coherent density. Section 5 describes our unsupervised learning method. Experimental results are shown in Section 6, and we conclude in Section 7.</Paragraph>
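To make the coherency-based acquisition concrete, the sketch below is a hypothetical Python rendering of the candidate-collection and adoption steps. The data structures, the adversative handling, and the hand-set min_freq / min_ratio thresholds are assumptions for illustration only; in the method described here the adoption criteria are instead derived automatically from coherent precision and coherent density (Sections 4 and 5).

# Sketch of coherency-based candidate collection and adoption. Data structures
# and the explicit thresholds are assumptions; the paper sets the adoption
# criteria automatically from coherent precision and coherent density.
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# A document is a list of clauses: (candidate_atom, follows_adversative).
Doc = List[Tuple[str, bool]]


def known_polarity(atom: str, lexicon: Dict[str, int]) -> Optional[int]:
    """+1/-1 if the atom is already in the (initial) lexicon, else None."""
    return lexicon.get(atom)


def collect_candidates(docs: List[Doc], lexicon: Dict[str, int]) -> Dict[str, List[int]]:
    """Assign tentative polarities to unknown atoms adjacent to known polar clauses."""
    candidates: Dict[str, List[int]] = defaultdict(list)
    for doc in docs:
        for i, (atom, _) in enumerate(doc):
            if known_polarity(atom, lexicon) is not None:
                continue                      # polarity already known
            for j in (i - 1, i + 1):          # adjacent clauses
                if not 0 <= j < len(doc):
                    continue
                pol = known_polarity(doc[j][0], lexicon)
                if pol is None:
                    continue
                # flip the tentative polarity when the later clause of the pair
                # follows an adversative expression (a context change)
                flip = -1 if doc[max(i, j)][1] else 1
                candidates[atom].append(pol * flip)
    return candidates


def adopt(candidates: Dict[str, List[int]],
          min_freq: int = 3, min_ratio: float = 0.8) -> Dict[str, int]:
    """Keep candidates that are frequent and consistently of one polarity."""
    new_atoms: Dict[str, int] = {}
    for atom, pols in candidates.items():
        if len(pols) < min_freq:
            continue
        pos_ratio = sum(1 for p in pols if p > 0) / len(pols)
        if pos_ratio >= min_ratio:
            new_atoms[atom] = +1
        elif pos_ratio <= 1.0 - min_ratio:
            new_atoms[atom] = -1
    return new_atoms

Replacing the fixed min_freq and min_ratio with criteria estimated from the corpus-wide distributions of coherent precision and coherent density is what makes the expansion fully automatic, as described in Section 5.
</Section> </Paper>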