<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-3012">
  <Title>Integrating Collocation Features in Chinese Word Sense Disambiguation</Title>
  <Section position="2" start_page="0" end_page="87" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Word sense disambiguation (WSD) tries to resolve lexical ambiguity, that is, the fact that a word may have multiple meanings. Examples are the word &quot;walk&quot; in &quot;Walk or Bike to school&quot; and in &quot;BBC Education Walk Through Time&quot;, or a Chinese word that takes one sense in the phrase glossed as &quot;local government&quot; and another in the sentence glossed as &quot;He is also partly right&quot;. WSD automatically assigns an appropriate sense to an occurrence of a word in a given context.</Paragraph>
    <Paragraph position="1"> Various approaches have been proposed to deal with the word sense disambiguation problem, including rule-based approaches, knowledge- or dictionary-based approaches, corpus-based approaches, and hybrid approaches. Among these, the supervised corpus-based approach has been applied and discussed by many researchers [2-8]. According to [1], corpus-based supervised machine learning methods are the most successful approaches to WSD; these methods mainly use contextual features to disambiguate ambiguous words.</Paragraph>
    <Paragraph position="2"> However, word occurrences in the context are too diverse to capture the right pattern: the dimension of the contextual-word feature space becomes very large when all words in the training samples are used for WSD [14]. Certain uninformative features weaken the discriminative power of a classifier, resulting in a lower precision rate. To narrow down the context, we propose to use collocations, as defined in Section 3.1.2, as contextual information. It is generally understood that the sense of an ambiguous word is unique in a given collocation [19]. For example, a Chinese word that can mean either &quot;burden&quot; or &quot;baggage&quot; means &quot;burden&quot; and not &quot;baggage&quot; when it appears in the collocation glossed as &quot;burden of thought&quot;.</Paragraph>
    <Paragraph position="3"> In this paper, we apply a classifier that combines the local features of collocations containing the target word with other contextual features to disambiguate ambiguous words. The intuition is that when the target context captures a collocation, the influence of the other dimensions of contextual words can be reduced or even ignored. For example, in the expression glossed as &quot;terrorists burned down the gene laboratory&quot;, the influence of the contextual word glossed as &quot;gene&quot; on the target word should be reduced, because the target word occurs there as part of a collocation, whereas the target word and &quot;gene&quot; do not form a collocation even though they co-occur. Our intention is not to replace contextual information with collocations altogether. Rather, we would like to use collocations as an additional feature in WSD. We still make use of other contextual features for the following reasons. Firstly, contextual information has proven effective for WSD in previous research. Secondly, collocations may be independent of the training corpus, and a sentence under consideration may not contain any collocation.</Paragraph>
    <Paragraph position="4"> Thirdly, other contextual features help to resolve tie cases. The primary purpose of using collocations in WSD is to improve the precision rate without sacrificing the recall rate. We also want to investigate whether using collocations as an additional feature can reduce the size of the hand-tagged sense corpus that is needed. The rest of this paper is organized as follows. Section 2 summarizes existing word sense disambiguation techniques based on annotated corpora. Section 3 describes the classifier and the features in our proposed WSD approach.</Paragraph>
    <Paragraph position="5"> Section 4 describes the experiments and the analysis of our results. Section 5 is the conclusion.</Paragraph>
  </Section>
</Paper>