File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/04/c04-1145_intro.xml
Size: 3,224 bytes
Last Modified: 2025-10-06 14:02:12
<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1145"> <Title>Morpheme-based Derivation of Bipolar Semantic Orientation of Chinese Words</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> The semantic orientation (SO) of a word indicates the direction in which the word deviates from the norm for its semantic group or lexical field (Lehrer, 1974). Words that encode a desirable state (e.g., beautiful) have a positive SO, while words that represent an undesirable state (e.g., absurd) have a negative SO (Hatzivassiloglou and Wiebe, 2000).</Paragraph> <Paragraph position="1"> Hatzivassiloglou and McKeown (1997) used the conjunctions 'and', 'or', and 'but' as linguistic cues to extract pairs of adjectives. Turney (2003) assessed the SO of words from how often they occur near strongly polarized words such as 'excellent' and 'poor', achieving accuracies from 61% to 82% depending on corpus size.</Paragraph> <Paragraph position="2"> In Turney's experiment, the algorithm required a colossal corpus (about one hundred billion words) indexed by the AltaVista search engine. Internet texts undoubtedly form a very large and easily accessible corpus. However, Chinese texts on the internet are not word-segmented, so using them is not cost-effective.</Paragraph> <Paragraph position="3"> This paper presents a general strategy for inferring the SO of Chinese words from their association with a set of strongly polarized morphemes. 
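The following Python sketch illustrates the association idea. It assumes Turney's SO-PMI formulation, in which a word's score is its summed PMI with positive seeds minus its summed PMI with negative seeds, and it uses polarized morphemes as the seeds. The co-occurrence counts, the smoothing constant, and the toy morphemes 好, 美, 坏 and 恶 are hypothetical. The exact measure used here is the one presented in section 3.

```python
import math

# Hypothetical smoothing constant added to every count so that an unseen
# co-occurrence does not produce log2(0); the value is an assumption.
SMOOTHING = 0.01


def pmi(cooc, freq_a, freq_b, total):
    """PMI = log2(P(a, b) / (P(a) * P(b))), estimated from smoothed counts."""
    p_ab = (cooc + SMOOTHING) / total
    p_a = (freq_a + SMOOTHING) / total
    p_b = (freq_b + SMOOTHING) / total
    return math.log2(p_ab / (p_a * p_b))


def so_pmi(word, pos_seeds, neg_seeds, freq, cooc, total):
    """SO-PMI: summed PMI with positive seeds minus summed PMI with negative seeds.

    freq maps a word or morpheme to its corpus frequency; cooc maps a
    (word, seed) pair to how often the two co-occur (hypothetical inputs).
    """
    def association(seeds):
        return sum(
            pmi(cooc.get((word, s), 0), freq.get(word, 0), freq.get(s, 0), total)
            for s in seeds
        )

    return association(pos_seeds) - association(neg_seeds)


# Toy counts, invented for illustration only.
freq = {"优秀": 50, "好": 1000, "美": 400, "坏": 800, "恶": 300}
cooc = {("优秀", "好"): 30, ("优秀", "美"): 10, ("优秀", "坏"): 2}
total = 1_000_000

score = so_pmi("优秀", ["好", "美"], ["坏", "恶"], freq, cooc, total)
print(f"SO-PMI of 优秀: {score:.2f}")
```

A positive score suggests a positive SO and a negative score suggests a negative SO. With the toy counts above, 优秀 ('excellent') co-occurs mostly with the positive morphemes, so it receives a positive score.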
The modified system, which uses morphemes, proved more effective than the use of strongly polarized words, even in a much smaller corpus.</Paragraph> <Paragraph position="4"> Related work and potential applications of SO are discussed in section 2.</Paragraph> <Paragraph position="5"> Section 3 describes one of the methods in Turney's model for inferring SO, namely Pointwise Mutual Information (PMI), which is based on the hypothesis that the SO of a word tends to correspond to the SO of its neighbours.</Paragraph> <Paragraph position="6"> The experiment with polarized words is presented in section 4. The test set includes 1,249 words (604 positive and 645 negative). In a corpus of 34 million word tokens (410k word types), the algorithm is run with 20 and 40 polarized words, giving precisions of 79.96% and 81.05% and recalls of 45.56% and 59.57%, respectively.</Paragraph> <Paragraph position="7"> In section 5, the system is modified to use polarized morphemes instead. We first examine the distinctiveness of Chinese morphemes to explain why this modification is likely to give simpler and better results, and then introduce a more systematic selection of polarized morphemes.</Paragraph> <Paragraph position="8"> This yields a high precision of 80.23% and a greatly increased recall of 85.03%.</Paragraph> <Paragraph position="9"> In section 6, the algorithm is run with 14, 10 and 6 morphemes, giving precisions of 79.15%, 79.89% and 75.65% and recalls of 79.50%, 73.26% and 66.29%, respectively. These results show that the algorithm can also run effectively with 6 to 10 polarized morphemes in a smaller corpus.</Paragraph> <Paragraph position="10"> The conclusion and future work are discussed in section 7.</Paragraph> </Section> </Paper>