<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0613">
  <Title>Unsupervised Models for Named Entity Classification</Title>
  <Section position="3" start_page="100" end_page="100" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Mitchell 98). (Yarowsky 95) describes an algorithm for word-sense disambiguation that exploits redundancy in contextual features, and gives impressive performance. Unfortunately, Yarowsky's method is not well understood from a theoretical viewpoint: we would like to formalize the notion of redundancy in unlabeled data, and set up the learning task as optimization of some appropriate objective function. (Blum and Mitchell 98) offer a promising formulation of redundancy, prove some results about how the use of unlabeled examples can help classification, and suggest an objective function when training with unlabeled examples. Our first algorithm is similar to Yarowsky's, but with some important modifications motivated by (Blum and Mitchell 98). The algorithm can be viewed as heuristically optimizing an objective function suggested by (Blum and Mitchell 98); empirically it is shown to be quite successful in optimizing this criterion. The second algorithm builds on a boosting algorithm called AdaBoost (Freund and Schapire 97; Schapire and Singer 98). The AdaBoost algorithm was developed for supervised learning. AdaBoost finds a weighted combination of simple (weak) classifiers, where the weights are chosen to minimize a function that bounds the classification error on a set of training examples. Roughly speaking, the new algorithm presented in this paper performs a similar search, but instead minimizes a bound on the number of (unlabeled) examples on which two classifiers disagree. The algorithm builds two classifiers iteratively: each iteration involves minimization of a continuously differentiable function which bounds the number of examples on which the two classifiers disagree.</Paragraph>
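    <Paragraph position="1"> To make the bound-and-minimize idea concrete, the following is a minimal Python sketch written for this summary, not taken from the paper. It rests on one observation: whenever two classifiers g1 and g2 disagree in sign on an example x, the product g1(x)g2(x) is non-positive, so exp(-g1(x)g2(x)) is at least 1, and summing this quantity over unlabeled examples upper-bounds the number of disagreements. The function names (disagreement_bound, boost_view, co_train), the binary decision-stump weak learners, and the alternating pseudo-label scheme are all illustrative assumptions, not the paper's exact algorithm.</Paragraph>

```python
# Hypothetical sketch (not the paper's algorithm): bounding and reducing
# the number of unlabeled examples on which two classifiers disagree.
import numpy as np

def disagreement_bound(g1, g2):
    # If sign(g1) != sign(g2) then g1 * g2 <= 0 and exp(-g1 * g2) >= 1,
    # so this sum upper-bounds the number of disagreements.
    return np.exp(-g1 * g2).sum()

def boost_view(X, pseudo_labels, n_rounds=10):
    # AdaBoost-style loop over one view: X is a binary (m, n) feature
    # matrix; each weak hypothesis is a single-feature decision stump.
    m, n = X.shape
    scores = np.zeros(m)
    w = np.ones(m) / m                         # distribution over examples
    preds = 2 * X - 1                          # stump outputs in {-1, +1}
    for _ in range(n_rounds):
        eps = (preds.T != pseudo_labels) @ w   # weighted error of each stump
        eps = np.clip(eps, 1e-6, 1 - 1e-6)
        j = int(np.argmin(eps))                # best stump this round
        alpha = 0.5 * np.log((1 - eps[j]) / eps[j])
        scores += alpha * preds[:, j]
        w *= np.exp(-alpha * pseudo_labels * preds[:, j])
        w /= w.sum()                           # renormalize (AdaBoost's Z_t)
    return scores

def co_train(X1, X2, seed_labels, n_iters=5):
    # Alternate between the two views: re-fit each classifier against the
    # other's current predictions, heuristically driving the bound down.
    g2 = np.zeros(len(seed_labels))
    for _ in range(n_iters):
        target = np.where(g2 == 0, seed_labels, np.sign(g2))
        g1 = boost_view(X1, target)
        g2 = boost_view(X2, np.sign(g1))
    return g1, g2
```

In this sketch, exp(-g1 * g2) plays the role of the continuously differentiable surrogate for the 0/1 disagreement count mentioned above, and alternating the two boost_view calls mirrors the iterative, two-classifier structure described in the paragraph; the exact objective and update rule used in the paper differ in detail.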
    <Section position="1" start_page="100" end_page="100" type="sub_section">
      <SectionTitle>
1.1 Additional Related Work
</SectionTitle>
      <Paragraph position="0"> There has been additional recent work on inducing lexicons or other knowledge sources from large corpora. (Brin 98) describes a system for extracting (author, book-title) pairs from the World Wide Web using an approach that bootstraps from an initial seed set of examples. (Berland and Charniak 99) describe a method for extracting parts of objects from wholes (e.g., "speedometer" from "car") from a large corpus using hand-crafted patterns.</Paragraph>
      <Paragraph position="1"> (Hearst 92) describes a method for extracting hyponyms from a corpus (pairs of words in "isa" relations). (Riloff and Shepherd 97) describe a bootstrapping approach for acquiring nouns in particular categories (such as "vehicle" or "weapon" categories). The approach builds from an initial seed set for a category, and is quite similar to the decision list approach described in (Yarowsky 95). More recently, (Riloff and Jones 99) describe a method they term "mutual bootstrapping" for simultaneously constructing a lexicon and contextual extraction patterns. The method shares some characteristics of the decision list algorithm presented in this paper. (Riloff and Jones 99) was brought to our attention as we were preparing the final version of this paper.</Paragraph>
    </Section>
  </Section>
</Paper>