<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0826">
  <Title>UBBNBC WSD System Description</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> According to the literature, the NBC algorithm is very efficient, and in many cases it outperforms more sophisticated methods (Pedersen 1998). This is therefore the approach we used in our research. The word sense disambiguation process has three major steps, so the application has three main components: Stemming - removal of suffixes and filtering out of irrelevant information from the corpora, using a simple dictionary-based approach. Learning - training of the classifier on the sense-tagged corpora; a database containing the number of co-occurrences is built.</Paragraph>
    <Paragraph position="1"> Disambiguating - on the basis of the database, the correct sense of a word in a given context is estimated.</Paragraph>
    <Paragraph position="2"> In the following, the three steps mentioned above are described in detail.</Paragraph>
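    <Paragraph position="3"> As a rough orientation, the three components (stemming, learning, disambiguating) could be sketched as below. This is only a minimal illustration, assuming that NBC denotes a naive Bayes classifier with add-one smoothing; the stem dictionary, the stop list, the toy sense-tagged examples and all function names are hypothetical and are not taken from the system itself.

```python
import math
from collections import Counter, defaultdict

# Hypothetical resources: a tiny stem dictionary and a stop list.
STEMS = {"banks": "bank", "rivers": "river", "loans": "loan", "deposited": "deposit"}
STOP = {"the", "a", "of", "on", "in", "at"}

def stem(tokens):
    """Stemming step: dictionary lookup plus removal of irrelevant words."""
    return [STEMS.get(t.lower(), t.lower()) for t in tokens if t.lower() not in STOP]

def learn(tagged):
    """Learning step: build the co-occurrence 'database' from sense-tagged contexts."""
    sense_counts = Counter()      # how often each sense was seen
    cooc = defaultdict(Counter)   # sense -> counts of co-occurring context words
    for sense, context in tagged:
        sense_counts[sense] += 1
        cooc[sense].update(stem(context))
    return sense_counts, cooc

def disambiguate(context, sense_counts, cooc):
    """Disambiguating step: pick the sense with the highest naive Bayes log score."""
    vocab = {w for counts in cooc.values() for w in counts}
    total = sum(sense_counts.values())

    def score(sense):
        n = sum(cooc[sense].values())
        s = math.log(sense_counts[sense] / total)  # log prior of the sense
        for w in stem(context):
            # add-one smoothing (an assumption of this sketch)
            s += math.log((cooc[sense][w] + 1) / (n + len(vocab)))
        return s

    return max(sense_counts, key=score)

# Toy sense-tagged corpus for the ambiguous word "bank" (invented for illustration).
tagged = [
    ("finance", ["deposited", "money", "at", "the", "bank"]),
    ("finance", ["the", "bank", "approved", "loans"]),
    ("river", ["fishing", "on", "the", "river", "bank"]),
    ("river", ["the", "muddy", "bank", "of", "rivers"]),
]
sense_counts, cooc = learn(tagged)
print(disambiguate(["loans", "from", "the", "bank"], sense_counts, cooc))
```

    In this sketch the co-occurrence database is simply a table of counts per sense, and the sense of a new context is estimated by summing log probabilities over its stemmed words.</Paragraph>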
  </Section>
</Paper>