File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/ackno/02/c02-1074_ackno.xml
Size: 2,534 bytes
Last Modified: 2025-10-06 13:50:15
<?xml version="1.0" standalone="yes"?> <Paper uid="C02-1074"> <Title>Text Categorization using Feature Projections</Title> <Section position="3" start_page="3211" end_page="3211" type="ackno"> <SectionTitle> 4. Discussions </SectionTitle> <Paragraph position="0"> First, we compare the time complexities of k-NN and TCFP. With inverted-file indexing of the training documents, the time complexity of k-NN is O(m l/n) (Yang, 1994), where m is the number of unique words in the document, l is the number of training documents, and n is the number of unique terms in the training collection. TCFP has a time complexity of O(m ). Moreover, the time complexity of TCFP without contextual information is O(mc), where c is the number of categories. That is, classification with TCFP requires only a simple calculation proportional to the number of unique terms in the test document. In k-NN, on the other hand, a search over the whole training space must be done for each test document.</Paragraph> <Paragraph position="1"> The other strengths of TCFP are the simplicity of its algorithm and its high performance. Because the TCFP algorithm is as simple as k-NN, it can be implemented quite easily, and its training phase is also a simple process. In our experiments, TCFP achieved better performance than k-NN. Our analysis suggests that TCFP is more robust to irrelevant features than k-NN. When a document contains irrelevant features, the angle of the document vector changes in k-NN. In TCFP, however, irrelevant features contribute only to the voting of the individual features. Hence TCFP reduces the negative effect of irrelevant features.</Paragraph> <Paragraph position="2"> Conclusions In this paper, a new text categorization method, TCFP, has been presented. This algorithm has been compared with k-NN and other classifiers. Since each feature in TCFP contributes individually to the classification process, TCFP is robust to irrelevant features. 
Because the TCFP algorithm is simple, it is easy to implement and to train. The experimental results show that, in terms of performance, TCFP is superior to Rocchio, Naive Bayes, and k-NN. Moreover, it outperforms other classifiers designed to speed up classification, such as k-NNFP and k-NN with reduction. In our running-time observations, TCFP is about one hundred times faster than k-NN.</Paragraph> <Paragraph position="3"> Therefore, TCFP can be used in areas that require a fast, high-performance text classifier.</Paragraph> </Section> </Paper>