<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1145">
  <Title>Morpheme-based Derivation of Bipolar Semantic Orientation of Chinese Words</Title>
  <Section position="6" start_page="0" end_page="0" type="concl">
    <SectionTitle>
6 Conclusion and Future Work
</SectionTitle>
    <Paragraph position="0"> This paper presents an algorithm, based on the model of Turney (2003), for inferring the semantic orientation (SO) of Chinese words from their association with strongly-polarized Chinese morphemes. The algorithm was first run with 20 and with 40 strongly-polarized Chinese words in a corpus of 34 million words, giving high precision (79.96% and 81.05%, respectively) but low recall (45.56% and 59.57%). It was then run with 20 polarized Chinese morphemes, or single characters, on the same corpus, giving a similarly high precision of 80.23% and an even higher recall of 85.03%. Finally, it was run with just 14, 10 and 6 morphemes, giving a precision of 79.15%, 79.89% and 75.65% and a recall of 79.50%, 73.26% and 66.29%, respectively.</Paragraph>
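As a reference point, the SO-PMI score of Turney (2003) is usually stated as follows: a word w is scored by its pointwise mutual information (PMI) with each member of a positive paradigm set P, minus its PMI with each member of a negative paradigm set N. In the morpheme-based variant summarized here, P and N would hold polarized morphemes instead of words. The notation is ours, and the paper's exact estimation of the probabilities may differ.

```latex
\text{PMI}(w, m) = \log_2 \frac{p(w, m)}{p(w)\, p(m)}

\text{SO-PMI}(w) = \sum_{m \in P} \text{PMI}(w, m) - \sum_{m \in N} \text{PMI}(w, m)
```

Under a threshold of zero (an assumption in this sketch), w would be labeled positive when SO-PMI(w) is greater than 0 and negative when it is less than 0.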
    <Paragraph position="1"> Thus, conveniently defined morphemes in Chinese enhance the effectiveness of the algorithm: they simplify processing and yield better results, even in a corpus smaller than the one Turney (2003) used. Just 6 to 10 morphemes can give satisfactory results in a smaller corpus. Turney's algorithm can be applied efficiently with a colossal corpus, such as a hundred-billion-word corpus, because Internet texts are readily available. However, the same convenience is not available for Chinese because of the heavy cost of word segmentation.</Paragraph>
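The segmentation point can be made concrete with a minimal sketch, assuming sentence-level co-occurrence and additive smoothing (both assumptions, not details taken from the paper). Because each paradigm morpheme is a single character, it can be found in raw, unsegmented text by a plain substring test, so no word segmenter is involved. The morpheme lists, the toy corpus and the function name `so_pmi` are hypothetical.

```python
import math

# Hypothetical single-character paradigm morphemes, for illustration only;
# these are not the paper's lists.
POSITIVE_MORPHEMES = ["好", "美", "善"]
NEGATIVE_MORPHEMES = ["壞", "惡", "醜"]


def so_pmi(sentences, target, positive, negative, eps=0.01):
    """SO-PMI of `target` against single-character paradigm morphemes.

    Co-occurrence is counted per sentence (an assumption of this sketch).
    The text is raw and unsegmented: each morpheme is one character, so a
    substring test finds it without any word segmentation. eps is additive
    smoothing (also an assumption) so that unseen pairs stay finite.
    """
    n = len(sentences)
    target_count = 0
    morph_count = {m: 0 for m in positive + negative}
    joint_count = {m: 0 for m in positive + negative}
    for sentence in sentences:
        has_target = target in sentence
        if has_target:
            target_count += 1
        # Blank out the target so its own characters are not counted as
        # co-occurring morphemes (a design choice of this sketch).
        rest = sentence.replace(target, " ")
        for m in morph_count:
            if m in rest:
                morph_count[m] += 1
                if has_target:
                    joint_count[m] += 1

    def pmi(m):
        return math.log2(
            (joint_count[m] + eps) * n
            / ((target_count + eps) * (morph_count[m] + eps))
        )

    return sum(pmi(m) for m in positive) - sum(pmi(m) for m in negative)


# A made-up, unsegmented toy corpus.
corpus = ["甲很好很美", "甲真善", "乙很壞", "乙又醜又惡", "丙好", "丙壞"]
print(so_pmi(corpus, "甲", POSITIVE_MORPHEMES, NEGATIVE_MORPHEMES))  # a positive score
print(so_pmi(corpus, "乙", POSITIVE_MORPHEMES, NEGATIVE_MORPHEMES))  # a negative score
```

In this toy corpus, 甲 co-occurs only with positive morphemes and 乙 only with negative ones, so their scores have opposite signs. Blanking out the target before matching keeps a target word that itself contains a paradigm morpheme from trivially co-occurring with it.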
    <Paragraph position="3"> In our experiment, all syntactic markers are ignored. Better results can be expected if syntactic markers are taken into consideration. An obvious example is negation (e.g., not, never), which can reverse the polarity of a word. In future work, we will try to handle negation and other syntactic markers. The lists of probabilities of morphemes forming polarized words in Section 5.2 could be handled with decision lists (Yarowsky, 2000), which were not applied in this paper for simplicity. In the future, decision lists can be employed to systematically incorporate the loaded features of morphemes.</Paragraph>
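A simplified decision list in the spirit of Yarowsky (2000) might look like the following sketch: each morpheme becomes a rule whose strength is a smoothed log ratio of how many known positive and negative words contain it, rules are sorted by strength, and a new word takes the label of the first rule whose morpheme it contains. The counts, the smoothing constant `alpha` and the function names are hypothetical; this is not the paper's implementation.

```python
import math


def build_decision_list(morpheme_counts, alpha=0.1):
    """Rank morpheme rules by the strength of their polarity evidence.

    morpheme_counts maps a morpheme to (pos, neg): the number of known
    positive and negative words containing it. alpha is additive smoothing
    (an assumed value, not one taken from the paper).
    """
    rules = []
    for morpheme, (pos, neg) in morpheme_counts.items():
        score = math.log((pos + alpha) / (neg + alpha))
        if score == 0:
            continue  # no evidence either way, so no rule
        label = "positive" if score > 0 else "negative"
        rules.append((abs(score), morpheme, label))
    # Strongest evidence first, as in a decision list.
    rules.sort(reverse=True)
    return [(morpheme, label) for _, morpheme, label in rules]


def classify(word, decision_list, default="unknown"):
    """Label a word by the first (strongest) rule whose morpheme it contains."""
    for morpheme, label in decision_list:
        if morpheme in word:
            return label
    return default


# Made-up counts for illustration only.
counts = {"好": (40, 5), "美": (30, 2), "惡": (1, 25), "不": (10, 30)}
decision_list = build_decision_list(counts)
print(decision_list)
print(classify("醜惡", decision_list), classify("不好", decision_list))
```

With these made-up counts, 不好 ('not good') is labeled positive because 好 outranks the negation morpheme 不. This illustrates why negation needs its own handling rather than being treated as one more morpheme feature.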
    <Paragraph position="4"> The experiment can be conducted with different sets of paradigm morphemes and on corpora of different sizes. With the LIVAC synchronous corpus (Tsou et al., 2000), it should be possible to compare the SO of some words across communities such as Beijing, Hong Kong and Taipei.</Paragraph>
    <Paragraph position="5"> Such data would be valuable for cultural studies if the SO of some words varies across communities.</Paragraph>
    <Paragraph position="6"> SO from association can also be applied to judging news articles, such as editorials on celebrities. Given the name of a celebrity or an organization as the 'given word', we can use SO-PMI to calculate the strength of its SO and then tell whether the news about the target is positive or negative. For example, when we calculated the SO-PMI of the name 'George W Bush', the U.S. President, against thousands of polarized Chinese words in the corpus, we found that the SO-PMI of 'Bush' was about -200 from January to February 2003 and plunged to -500 from March to April 2003, when the U.S. launched an 'unauthorized war' against Iraq. Such applications will be investigated further in future work.</Paragraph>
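Tracking a name over time amounts to computing its SO-PMI separately on each corpus slice and comparing the scores; the same per-slice computation would apply to community subcorpora such as Beijing, Hong Kong and Taipei. The following is a minimal sketch, assuming precomputed co-occurrence counts per slice. The polarized words, the counts and the function name `so_pmi_from_counts` are hypothetical, the smoothing constant is an assumption, and the numbers it prints are not the paper's results.

```python
import math


def so_pmi_from_counts(n, target_hits, word_hits, joint_hits,
                       positive, negative, eps=0.01):
    """SO-PMI of a target term from precomputed co-occurrence counts.

    n: number of co-occurrence windows (e.g. sentences) in one corpus slice.
    target_hits: windows in the slice that contain the target name.
    word_hits: maps each polarized word to the windows containing it.
    joint_hits: maps each polarized word to the windows containing it
        together with the target.
    eps: additive smoothing (an assumption) so unseen pairs stay finite.
    """
    def pmi(word):
        joint = joint_hits.get(word, 0) + eps
        return math.log2(
            joint * n / ((target_hits + eps) * (word_hits.get(word, 0) + eps))
        )

    return (sum(pmi(w) for w in positive)
            - sum(pmi(w) for w in negative))


# Hypothetical polarized words and made-up counts for two time slices.
positive = ["支持", "成功"]
negative = ["批評", "失敗"]
word_hits = {"支持": 100, "成功": 80, "批評": 60, "失敗": 40}

so_early = so_pmi_from_counts(1000, 50, word_hits,
                              {"支持": 10, "成功": 8, "批評": 6, "失敗": 4},
                              positive, negative)
so_late = so_pmi_from_counts(1000, 50, word_hits,
                             {"支持": 5, "成功": 4, "批評": 20, "失敗": 15},
                             positive, negative)
# The later slice scores lower because the name co-occurs more often
# with the negative words.
print(f"early: {so_early:.2f}, late: {so_late:.2f}, change: {so_late - so_early:.2f}")
```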
  </Section>
</Paper>