<?xml version="1.0" standalone="yes"?> <Paper uid="W06-2604"> <Title>Basque Country ccpzejaa@si.ehu.es Iñaki Alegria UPV-EHU Basque Country acpalloi@si.ehu.es Olatz Arregi UPV-EHU Basque Country acparuro@si.ehu.es</Title> <Section position="7" start_page="30" end_page="30" type="concl"> <SectionTitle> 6 Conclusions and Future Work </SectionTitle> <Paragraph position="0"> In this paper we present an approach to multilabel document categorization problems which consists of a multiclassifier system based on the k-NN algorithm. The documents are represented in a reduced dimensional space calculated by SVD. We want to emphasize that, due to the multilabel character of the database used, we have adapted the system accordingly. The system has been trained once on a single training set (9,603 documents), and the category label predictions made by the classifier have been evaluated on the testing set according to three category sets: top-10, R(90) and R(115). The microaveraged F1 scores we obtain are among the best reported for the Reuters-21578 collection.</Paragraph> <Paragraph position="1"> As future work, we want to experiment with generating more than 30 training databases and, in a preliminary phase, selecting the best among them. The predictions made using the selected training databases will be combined to obtain the final predictions. When only a small number of documents is available for a given category, the power of LSI to create a space that reflects interesting properties of the data is limited. As future work we want to include background text in the training collection and use an expanded term-document matrix that includes, besides the 9,603 training documents, some other relevant texts. 
This may improve results, especially for the categories with fewer documents (Zelikovitz and Hirsh, 2001).</Paragraph> <Paragraph position="2"> In order to assess the consistency of our classifier, we also plan to repeat the experiment on RCV1 (Lewis et al., 2004), a new benchmark collection for text categorization tasks which consists of 800,000 manually categorized newswire stories recently made available by Reuters.</Paragraph> </Section> </Paper>
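The pipeline summarized above (SVD-reduced document representations, k-NN classification adapted to the multilabel setting, and microaveraged F1 evaluation) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the voting threshold, the choice of cosine similarity, and the toy data are assumptions introduced here for clarity.

```python
# Sketch of: SVD projection of a term-document matrix, multilabel k-NN
# voting, and microaveraged F1 evaluation. Not the paper's actual code;
# threshold and similarity measure are illustrative assumptions.
import numpy as np

def svd_project(X, k):
    """Project the columns (documents) of term-document matrix X
    into a k-dimensional LSI space via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Document vectors: rows of (S_k V_k^T)^T, one row per document.
    return (np.diag(s[:k]) @ Vt[:k]).T

def knn_multilabel(train_docs, train_labels, query, k=3, vote_threshold=0.5):
    """Assign every category voted for by at least `vote_threshold` of the
    k nearest training documents (cosine similarity). train_labels is a
    binary document-by-category matrix, reflecting the multilabel setting."""
    sims = train_docs @ query / (
        np.linalg.norm(train_docs, axis=1) * np.linalg.norm(query) + 1e-12)
    nearest = np.argsort(-sims)[:k]
    votes = train_labels[nearest].mean(axis=0)  # per-category neighbour share
    return (votes >= vote_threshold).astype(int)

def micro_f1(y_true, y_pred):
    """Microaveraged F1: pool true/false positives over all categories."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0
```

In a multiclassifier variant, `knn_multilabel` would be run once per training database and the resulting binary vectors combined (e.g. by a further vote) to produce the final predictions.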