<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-4002">
  <Title>MMR-based feature selection for text categorization</Title>
  <Section position="6" start_page="21" end_page="21" type="concl">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> In this paper, we proposed an MMR-based feature selection method that strives to reduce redundancy between features while maintaining information gain when selecting appropriate features for text categorization.</Paragraph>
    <Paragraph position="1"> We carried out extensive experiments to verify the proposed method. The experimental results show that MMR-based feature selection is more effective than both Koller &amp; Sahami's method, which is a greedy method, and conventional information gain, which is commonly used in feature selection for text categorization. In addition, MMR-based feature selection sometimes enables conventional machine learning algorithms to outperform SVM, which is known to give the best classification accuracy.</Paragraph>
    <Paragraph position="2"> A disadvantage of MMR-based feature selection is that the cost of computing the pairwise information gain (i.e., IGpair) is quadratic in the number of features. To reduce this cost, we can apply MMR-based feature selection to the reduced feature set obtained from IG, as in our experiments in Section 4. Another drawback of our method is the need to tune λ. It appears that a tuning method based on held-out data is needed here.</Paragraph>
  </Section>
</Paper>