<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2805">
  <Title>Learning to Recognize Blogs: A Preliminary Exploration</Title>
  <Section position="3" start_page="24" end_page="24" type="relat">
    <SectionTitle>
2 Related Work
</SectionTitle>
    <Paragraph position="0"> Blog classification is still in its infancy, and to our knowledge no directly related work has been published to date. There is, however, work related to several aspects of our experiments.</Paragraph>
    <Paragraph position="1"> Nanno et al. (2004) describe a system for gathering a large collection of weblogs. It covers not only blogs published with one of the many well-known authoring tools but also hand-written ones. A closely comparable system was developed and used for the experiments reported here.</Paragraph>
    <Paragraph position="2"> Members of the BlogPulse team also describe blog crawling and corpus creation in some detail (Glance et al., 2004). However, their system is aimed at gathering updates and following active blogs rather than at collecting as many complete blogs as possible, which is what our system is designed to do.</Paragraph>
    <Paragraph position="3"> As for the resampling methods used in this paper, namely bootstrapping and co-training, Jones et al. (1999) describe the application of bootstrapping to text learning tasks and report very good results. Although text learning is a rather different genre of task, their results give some hope that bootstrapping may also prove useful for our blog classification problem.</Paragraph>
    <Paragraph position="4"> Blum and Mitchell (1998) describe the use of separate weak indicators to label unlabeled instances as &quot;probably positive&quot; in order to further train a learning algorithm. Their results suggest that the method has the potential to improve performance on many practical learning problems. Indeed, their example task of web-page classification is in many ways very similar to our binary blog classification problem. In our experiments, however, we use a different kind of indicator on the unlabeled data, namely the predictions of several different types of algorithms.</Paragraph>
  </Section>
</Paper>