<?xml version="1.0" standalone="yes"?> <Paper uid="W06-3809"> <Title>Random-Walk Term Weighting for Improved Text Classification</Title> <Section position="6" start_page="57" end_page="58" type="evalu"> <SectionTitle> 5 Evaluation and Discussion </SectionTitle> <Paragraph position="0"> Tables 3, 4, 5, and 6 show the classification results for WebKB4, WebKB6, LingSpam, Reuter, and 20Newsgroups respectively. The rw2, rw4, rw6, and rw8 columns report the accuracies achieved using random-walk weighting with window sizes of 2, 4, 6, and 8 respectively. The tf column reports the results obtained with a term frequency weighting scheme.</Paragraph> <Paragraph position="1"> Examining the results, we can see that the rw.idf model outperforms the tf.idf model on all classifiers and datasets, with a single exception: the Naive Bayes classifier on Reuter. The error reductions range from 3.5%, as in {20Newsgroups, NaiveBayes, rw4}, to 44%, as in {WebKB6, Rocchio, rw6}. Even at its worst, the system gives a result comparable to the tf.idf baseline. Performance is consistent across window sizes, with no clear-cut window size that gives the best result. Further analysis with paired t-tests shows that windows of size 4 and 6 yield the most significant results across all classifiers and datasets.</Paragraph> <Paragraph position="2"> Comparing WebKB4 and WebKB6 fine-grained results, we found that both systems failed to predict the class Staff; however the significant improvea paired t-test, with p < 0.05. The result is marked by ++ when p < 0.001.</Paragraph> <Paragraph position="3"> Examining the diversity of the classification systems based on rw and tf weighting, as shown in Tables 7, 8, 9, and 10, also reveals an interesting property of the system. 
The two models are generally more diverse and less correlated with windows of size 6 and 8 than with windows of size 2 and 4. This could be due to the increasing drift from the feature independence assumption implied by tf.idf. However, increasing the dependency is not always desirable, as the reported accuracies show. We expect that at a certain window size the system performance will degrade to that of tf.idf. This threshold window size will be equal to the document size: in that case each term depends on all the remaining terms, resulting in an almost completely connected graph. Consequently, each feature contributes equally to its surroundings, resulting in similar rw scores for all features.</Paragraph> </Section> </Paper>