<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1116">
  <Title>A Maximum Entropy Approach to HowNet-Based Chinese Word Sense Disambiguation</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> A word usually has more than one meaning, or sense, each of which is listed in the dictionary. The task of Word Sense Disambiguation (WSD) is to choose among these senses for a particular usage of the word in context. There are, however, several difficulties in WSD (Yang et al., 2000): (i) The evaluation of word sense disambiguation systems is not yet standardized. (ii) The potential for WSD varies by task. (iii) Sense-tagged corpora are crucial resources for WSD, but they are difficult to obtain. Efforts to build large Chinese corpora started in the 1990s, for example, the Sinica corpus (CKIP, 1995) and the Chinese Penn Tree Bank (Xia et al., 2000). However, these two corpora concentrate on the tagging of parts-of-speech and syntactic structures, while little work has been done on semantic annotation.</Paragraph>
    <Paragraph position="1"> Of the few efforts that were carried out, Lua1 annotated 340,000 words with semantic classes defined in a thesaurus (Mei, 1983). This resource, however, was not publicly accessible. With the  Chinese corpus of 30,000 words with the senses from HowNet. The corpus is a subset of the Sinica balanced corpus, and consists of 103 narratives on news stories, in which the words have already been segmented and tagged with parts-of-speech.</Paragraph>
    <Paragraph position="2"> Gan and Tham (1999) added sense tagging, and Gan and Wong (2000) subsequently annotated the corpus with semantic dependency relations as defined in HowNet. The corpus was released to the public in January 2002, providing essential resources for Chinese word sense disambiguation.</Paragraph>
    <Paragraph position="3"> This paper is organized as follows: Section 2 introduces HowNet. Section 3 describes the WSD task and the experimental results. Section 4 reviews previous work, followed by a conclusion in Section 5.</Paragraph>
  </Section>
</Paper>