<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-1042">
  <Title>Uncertainty Reduction in Collaborative Bootstrapping: Measure and Algorithm</Title>
  <Section position="3" start_page="0" end_page="0" type="relat">
    <SectionTitle>
2 Related Work
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Co-Training and Bilingual Bootstrapping
</SectionTitle>
      <Paragraph position="0"> Co-training, proposed by Blum and Mitchell (1998), conducts two bootstrapping processes in parallel and makes them collaborate with each other. More specifically, it repeatedly trains two classifiers on the labelled data, labels some unlabelled data with the two classifiers, and exchanges the newly labelled data between the two classifiers. Blum and Mitchell assume that the two classifiers are based on two subsets of the entire feature set and that the two subsets are conditionally independent of each other given a class. This assumption is called 'view independence'. In their co-training algorithm, each classifier labels the instances it is most certain about and passes them to the collaborator. The word sense disambiguation method proposed in Yarowsky (1995) can also be viewed as a kind of co-training.</Paragraph>
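The loop described above can be sketched as follows. This is a hypothetical minimal illustration, not the authors' implementation: the nearest-centroid `train`/`predict` functions are toy stand-ins for the two view-specific classifiers, and each unlabelled instance is a pair of one feature per view.

```python
def train(labelled):
    """Toy 1-D nearest-centroid 'classifier': per-class feature means."""
    sums, counts = {}, {}
    for x, y in labelled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(model, x):
    """Return a label plus a confidence score (negative centroid distance)."""
    label = min(model, key=lambda y: abs(x - model[y]))
    return label, -abs(x - model[label])

def co_train(labelled1, labelled2, unlabelled, rounds=5, k=1):
    """Sketch of the co-training loop (after Blum & Mitchell 1998).

    Each unlabelled item is a pair (view1_feature, view2_feature).
    Note: mutates the labelled lists passed in.
    """
    pool = list(unlabelled)
    for _ in range(rounds):
        if not pool:
            break
        m1, m2 = train(labelled1), train(labelled2)
        # Each classifier ranks the pool by its own confidence ...
        scored1 = sorted(pool, key=lambda u: predict(m1, u[0])[1], reverse=True)
        scored2 = sorted(pool, key=lambda u: predict(m2, u[1])[1], reverse=True)
        # ... and hands its k most certain instances, newly labelled,
        # to the collaborator as extra training data.
        for u in scored1[:k]:
            if u in pool:
                labelled2.append((u[1], predict(m1, u[0])[0]))
                pool.remove(u)
        for u in scored2[:k]:
            if u in pool:
                labelled1.append((u[0], predict(m2, u[1])[0]))
                pool.remove(u)
    return train(labelled1), train(labelled2)
```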
      <Paragraph position="1"> Since the assumption of view independence cannot always be met in practice, Collins and Singer (1998) proposed a co-training algorithm based on 'agreement' between the classifiers.</Paragraph>
      <Paragraph position="2"> As for theoretical analysis, Dasgupta et al. (2001) gave a bound on the generalization error of co-training within the framework of PAC learning.</Paragraph>
      <Paragraph position="3"> The generalization error is a function of the 'disagreement' between the two classifiers. Dasgupta et al.'s result relies on the view independence assumption, which is too strict in practice.</Paragraph>
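The quantity at the heart of these bounds is simple to compute empirically. As a hypothetical sketch (the bound itself is not reproduced here), the disagreement between two classifiers over a sample is just the fraction of instances on which their predictions differ:

```python
def disagreement(preds1, preds2):
    """Empirical disagreement rate: the fraction of instances on which
    two classifiers' predictions differ over the same sample."""
    assert len(preds1) == len(preds2) and preds1, "need equal-length, non-empty prediction lists"
    return sum(a != b for a, b in zip(preds1, preds2)) / len(preds1)
```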
      <Paragraph position="5"> Abney (2002) refined Dasgupta et al.'s result by relaxing the view independence assumption into a weaker constraint. He also proposed a new co-training algorithm based on that constraint.</Paragraph>
      <Paragraph position="6"> Nigam and Ghani (2000) empirically demonstrated that bootstrapping with a random feature split (i.e., co-training), even when it violates the view independence assumption, can still work better than bootstrapping without a feature split (i.e., bootstrapping with a single classifier).</Paragraph>
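The random feature split in that experiment can be sketched as follows; this is an illustrative partition of a feature set into two disjoint 'views', assuming a simple even split rather than any particular scheme from the cited work:

```python
import random

def random_feature_split(features, seed=0):
    """Randomly partition a feature set into two disjoint 'views'
    for co-training with an artificial feature split."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    shuffled = list(features)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return set(shuffled[:half]), set(shuffled[half:])
```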
      <Paragraph position="7"> For other work on co-training, see (Muslea et al. 2000; Pierce and Cardie 2001).</Paragraph>
      <Paragraph position="8"> Li and Li (2002) proposed an algorithm for word sense disambiguation in translation between two languages, which they called 'bilingual bootstrapping'. Instead of making an assumption about the features, bilingual bootstrapping makes an assumption about the classes: it assumes that the classes of the classifiers in bootstrapping do not overlap. Bilingual bootstrapping is thus different from co-training.</Paragraph>
      <Paragraph position="9"> Because the notion of agreement is not involved in the bootstrapping of Nigam and Ghani (2000) or in bilingual bootstrapping, Dasgupta et al.'s and Abney's analyses cannot be directly applied to them.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Active Learning
</SectionTitle>
      <Paragraph position="0"> Active learning is a learning paradigm. Instead of passively using all the given labelled instances for training, as in supervised learning, active learning repeatedly asks a supervisor to label the instances it considers most critical and performs training with the labelled instances. Thus, active learning can eventually create a reliable classifier with fewer labelled instances than supervised learning. One of the strategies for selecting critical instances is called 'uncertainty reduction' (e.g., Lewis and Gale, 1994). Under this strategy, the instances that are most uncertain to the current classifier are selected and a supervisor is asked to label them.</Paragraph>
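The selection step of the uncertainty reduction strategy can be sketched as follows. This is a hypothetical illustration for a binary classifier that outputs class probabilities (uncertainty sampling in the spirit of Lewis and Gale 1994), not the cited paper's exact procedure: uncertainty peaks where the predicted probability is closest to 0.5.

```python
def select_most_uncertain(probabilities, k=1):
    """Return the indices of the k instances the classifier is least sure about.

    `probabilities` maps instance index -> P(positive class) under the
    current classifier; an instance is maximally uncertain at P = 0.5.
    """
    return sorted(probabilities, key=lambda i: abs(probabilities[i] - 0.5))[:k]
```

The selected instances would then be sent to the supervisor for labelling and added to the training set before retraining.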
      <Paragraph position="1"> To the best of our knowledge, the notion of uncertainty reduction has not previously been used for bootstrapping.</Paragraph>
    </Section>
  </Section>
</Paper>