<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0608">
  <Title>Domain Kernels for Text Categorization</Title>
  <Section position="9" start_page="61" end_page="61" type="concl">
    <SectionTitle> 6 Conclusion and Future Work </SectionTitle>
    <Paragraph position="0"> In this paper we proposed and evaluated a novel technique for semi-supervised learning in TC. We defined a Domain Kernel that improves similarity estimation between documents by exploiting Domain Models. Domain Models are acquired from large collections of unannotated texts in a completely unsupervised way.</Paragraph>
    <Paragraph position="1"> An extensive evaluation on two standard benchmarks shows that the Domain Kernel drastically reduces the amount of training data required for learning. In particular, recall increases considerably while accuracy remains very good. We explained this phenomenon by showing that the similarity scores computed by the Domain Kernel take into account both variability and ambiguity, so that similarity can be estimated even between texts that have no words in common.</Paragraph>
    <Paragraph position="2"> As future work, we plan to apply our semi-supervised learning method to concrete application scenarios, such as user modeling and the categorization of personal documents in mail clients.</Paragraph>
    <Paragraph position="3"> In addition, we are investigating semi-supervised learning further by acquiring structures more complex than clusters (e.g. synonymy and hyperonymy relations) to represent Domain Models. Furthermore, we are working to adapt the general framework provided by Domain Models to a multilingual setting, in order to apply the Domain Kernel to cross-language TC.</Paragraph>
  </Section>
</Paper>