<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0323">
  <Title>Exemplar-Based Word Sense Disambiguation: Some Recent Improvements</Title>
  <Section position="3" start_page="208" end_page="208" type="intro">
    <SectionTitle>
2 Learning Algorithms
2.1 PEBLS
</SectionTitle>
    <Paragraph position="0"> The heart of exemplar-based learning is a measure of the similarity, or distance, between two examples.</Paragraph>
    <Paragraph position="1"> If the distance between two examples is small, then the two examples are similar. In PEBLS (Cost and Salzberg, 1993), the distance between two symbolic values v1 and v2 of a feature f is defined as:</Paragraph>
    <Paragraph position="2"> \[ d(v_1, v_2) = \sum_{i=1}^{n} \left| P(C_i \mid v_1) - P(C_i \mid v_2) \right| \] </Paragraph>
    <Paragraph position="3"> where n is the total number of classes. P(Ci|v1) is estimated by N1,i/N1, where N1,i is the number of training examples with value v1 for feature f that are classified as class i in the training corpus, and N1 is the number of training examples with value v1 for feature f in any class. P(Ci|v2) is estimated similarly. This distance metric of PEBLS is adapted from the value difference metric of Stanfill and Waltz (1986). The distance between two examples is the sum of the distances between the values of all the features of the two examples.</Paragraph>
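    <Paragraph> As an illustration only, the value distance can be sketched in Python, assuming it is the sum over classes of the absolute difference between the estimates of P(Ci|v1) and P(Ci|v2), each estimate being a ratio of training counts. The function names and the toy data are hypothetical and are not taken from the PEBLS program; the sketch also assumes that both values occur in the training data.</Paragraph>
    <Paragraph><![CDATA[
```python
from collections import Counter, defaultdict


def value_class_counts(examples, labels, feature):
    """For one feature, count how often each value occurs with each class."""
    counts = defaultdict(Counter)
    for example, label in zip(examples, labels):
        counts[example[feature]][label] += 1
    return counts


def value_distance(counts, v1, v2, classes):
    """Sum over classes of |P(Ci|v1) - P(Ci|v2)|, with P(Ci|v) = N_v,i / N_v.

    Assumes v1 and v2 were both seen in training (otherwise N_v is zero).
    """
    n1 = sum(counts[v1].values())
    n2 = sum(counts[v2].values())
    return sum(abs(counts[v1][c] / n1 - counts[v2][c] / n2) for c in classes)


# Hypothetical toy data: one symbolic feature "w" and two sense classes.
examples = [{"w": "bank"}, {"w": "bank"}, {"w": "river"}, {"w": "river"}]
labels = ["finance", "finance", "finance", "shore"]
classes = ["finance", "shore"]

counts = value_class_counts(examples, labels, "w")
d = value_distance(counts, "bank", "river", classes)
d_same = value_distance(counts, "bank", "bank", classes)
```
]]></Paragraph>
    <Paragraph> In this toy case "bank" always occurs with one class while "river" is split evenly between the two, so the two values are far apart; a value compared with itself has distance zero.</Paragraph>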
    <Paragraph position="4"> Let k be the number of nearest neighbors to use for determining the class of a test example, k ≥ 1.</Paragraph>
    <Paragraph position="5"> During testing, a test example is compared against all the training examples. PEBLS then determines the k training examples with the shortest distance to the test example. The class to which the majority of these k closest training examples belong is assigned as the class of the test example, with ties among multiple majority classes broken randomly.</Paragraph>
    <Paragraph position="6"> Note that the nearest neighbor algorithm tested in (Mooney, 1996) uses Hamming distance as the distance metric between two symbolic feature values.</Paragraph>
    <Paragraph position="7"> This is different from the above distance metric used in PEBLS.</Paragraph>
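    <Paragraph> The k-nearest-neighbor decision rule described above can be sketched as follows. This is a minimal illustration, not the PEBLS code: the function names and data are hypothetical, the distance function is passed in as a parameter, and the demonstration uses Hamming distance purely as a simple stand-in rather than the PEBLS metric. How distance ties at the k-th neighbor are resolved is not stated in the text; the sketch simply keeps training order.</Paragraph>
    <Paragraph><![CDATA[
```python
import random
from collections import Counter


def knn_classify(test_example, train_examples, train_labels, distance, k=1, rng=random):
    """Assign the majority class among the k nearest training examples."""
    # Rank all training examples by distance to the test example.
    # sorted() is stable, so equally distant examples keep training order
    # (an assumption of this sketch).
    ranked = sorted(
        zip(train_examples, train_labels),
        key=lambda pair: distance(test_example, pair[0]),
    )
    votes = Counter(label for _, label in ranked[:k])
    best = max(votes.values())
    # Ties among multiple majority classes are broken randomly.
    tied = sorted(label for label, count in votes.items() if count == best)
    return rng.choice(tied)


def hamming(a, b):
    """Stand-in distance: number of features whose values differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical toy data: two symbolic features per example.
train = [("interest", "rate"), ("interest", "bank"), ("river", "bank")]
labels = ["money", "money", "shore"]
predicted = knn_classify(("interest", "loan"), train, labels, hamming, k=3)
```
]]></Paragraph>
    <Paragraph> Passing the distance function as an argument keeps the voting logic independent of the metric, so the same sketch could be paired with a value-difference distance instead of Hamming distance.</Paragraph>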
    <Section position="1" start_page="208" end_page="208" type="sub_section">
      <SectionTitle>
2.2 Naive-Bayes
</SectionTitle>
      <Paragraph position="0"> Our presentation of the Naive-Bayes algorithm (Duda and Hart, 1973) follows that of (Clark and Niblett, 1989). This algorithm is based on Bayes' theorem:</Paragraph>
      <Paragraph position="1"> \[ P(C_i \mid \wedge v_j) = \frac{P(\wedge v_j \mid C_i) \, P(C_i)}{P(\wedge v_j)} \] </Paragraph>
      <Paragraph position="2"> where P(Ci|∧vj) is the probability that a test example is of class Ci given feature values vj. (∧vj denotes the conjunction of all feature values in the test example.) The goal of a Naive-Bayes classifier is to determine the class Ci with the highest conditional probability P(Ci|∧vj). Since the denominator P(∧vj) of the above expression is constant for all classes Ci, the problem reduces to finding the class Ci with the maximum value for the numerator.</Paragraph>
      <Paragraph position="3"> The Naive-Bayes classifier assumes independence of example features, so that</Paragraph>
      <Paragraph position="4"> \[ P(\wedge v_j \mid C_i) = \prod_{j} P(v_j \mid C_i) \] </Paragraph>
      <Paragraph position="5"> During training, Naive-Bayes constructs the matrix P(vj|Ci), and P(Ci) is estimated from the distribution of training examples among the classes. To prevent a single zero estimate of P(vj|Ci) from nullifying the effect of the other non-zero conditional probabilities in the multiplication, we replace zero estimates of P(vj|Ci) by P(Ci)/N, where N is the total number of training examples. Other more complex smoothing procedures (such as those used by Gale et al. (1992a)) are also possible, although we have not experimented with these other variations.</Paragraph>
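      <Paragraph> A minimal Python sketch of this procedure follows, assuming that each class is scored by P(Ci) multiplied by the product of the per-feature estimates P(vj|Ci), and that any zero estimate is replaced by P(Ci)/N as described above. The function names and toy data are hypothetical; this is not the implementation found in the PEBLS program.</Paragraph>
      <Paragraph><![CDATA[
```python
from collections import Counter, defaultdict


def train_naive_bayes(examples, labels):
    """Collect class counts and per-feature value counts from training data."""
    n = len(labels)
    class_counts = Counter(labels)
    # value_counts[(j, value)][c]: examples of class c with this value for feature j
    value_counts = defaultdict(Counter)
    for example, label in zip(examples, labels):
        for j, value in enumerate(example):
            value_counts[(j, value)][label] += 1
    priors = {c: class_counts[c] / n for c in class_counts}
    return priors, class_counts, value_counts, n


def classify(example, model):
    """Return the class maximizing P(Ci) * prod_j P(vj|Ci), plus all scores."""
    priors, class_counts, value_counts, n = model
    scores = {}
    for c, prior in priors.items():
        score = prior
        for j, value in enumerate(example):
            count = value_counts.get((j, value), {}).get(c, 0)
            # A zero count is replaced by P(Ci)/N instead of zeroing the product.
            score *= count / class_counts[c] if count else prior / n
        scores[c] = score
    return max(scores, key=scores.get), scores


# Hypothetical toy data: two symbolic features per example.
examples = [("interest", "rate"), ("interest", "bank"),
            ("river", "bank"), ("river", "water")]
labels = ["money", "money", "shore", "shore"]

model = train_naive_bayes(examples, labels)
predicted, scores = classify(("interest", "bank"), model)
```
]]></Paragraph>
      <Paragraph> In the toy example the value "interest" never occurs with the class "shore", so its estimate is replaced by P(shore)/N; the class "shore" therefore still receives a small non-zero score rather than being eliminated outright.</Paragraph>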
      <Paragraph position="6"> For the experimental results reported in this paper, we used the implementation of the Naive-Bayes algorithm in the PEBLS program (Rachlin and Salzberg, 1993), which has an option for training and testing using the Naive-Bayes algorithm. We changed only the handling of zero probability counts, using the method just described.</Paragraph>
    </Section>
  </Section>
</Paper>