<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2805">
  <Title>Learning to Recognize Blogs: A Preliminary Exploration</Title>
  <Section position="7" start_page="28" end_page="30" type="concl">
    <SectionTitle>
6 Conclusion
</SectionTitle>
    <Paragraph position="0"> Our experiments have shown that binary blog classification can be performed successfully if the right attributes are chosen to describe the data, even when the classifier must rely on a small number of training instances. Almost all basic off-the-shelf machine learning algorithms perform well on this task, but support vector based algorithms performed best in this experiment. Notably, the best algorithms of each type achieved almost the same accuracy: all exceeded 90%, and the differences between them were never larger than a few percent, even though they approach the problem in completely different ways.</Paragraph>
    <Paragraph position="1"> The performance of these algorithms can be improved by using resampling methods, but not all resampling methods achieve gains, and those that do gain very little. The extremely high success rates of the plain algorithms mean that there is very little room for improvement, especially as the classification errors are almost always caused by outliers that none of the algorithms manages to classify correctly.</Paragraph>
    <Paragraph position="2"> The results of later experiments with larger numbers of manually annotated instances show that a lot of work remains to be done. Although this paper shows that applying machine learning to this problem offers substantial improvements over our baseline, the problem is still far from solved.</Paragraph>
    <Paragraph position="3"> Future work will include further analysis of the results obtained using larger manually annotated subsets as well as a detailed analysis of the contributions of the different features in the feature set described in Section 3.</Paragraph>
  </Section>
</Paper>