<?xml version="1.0" standalone="yes"?> <Paper uid="W05-0802"> <Title>Cross language Text Categorization by acquiring Multilingual Domain Models from Comparable Corpora</Title> <Section position="10" start_page="15" end_page="15" type="concl"> <SectionTitle> 7 Conclusion </SectionTitle> <Paragraph position="0"> In this paper we proposed a solution to cross-language Text Categorization (TC) based on acquiring Multilingual Domain Models from comparable corpora in a totally unsupervised way and without using any external knowledge source (e.g., bilingual dictionaries). These Multilingual Domain Models are exploited to define a generalized similarity function (i.e., a kernel function) among documents in different languages, which is used inside a Support Vector Machine classification framework. The similarity function exploits the presence of words common to both languages to induce a second-order similarity among the other words in the lexicons. The results have shown that this technique is sufficient to capture relevant aspects of topic similarity in cross-language TC tasks, obtaining substantial improvements over a simple baseline. As future work, we will investigate the performance of this approach on TC tasks involving more than two languages, and a possible generalization of the assumption that the common words are equal across languages.</Paragraph> </Section> </Paper>
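<!--
The idea summarized in the conclusion (words shared by two languages inducing a second-order similarity among the remaining words) can be illustrated with a minimal sketch. It rests on assumptions that the conclusion does not state: the domain model is taken to be a truncated SVD of a term-by-document matrix built over the union of both corpora, the toy English/Italian corpus is invented, and the names `bow`, `cosine`, `domain_kernel` and the choice `k = 2` are hypothetical. It is not the authors' implementation.

```python
import numpy as np

# Toy comparable corpora (hypothetical data): English and Italian documents
# on the same two topics, with no document-level alignment.
corpus = [
    "computer software network internet",   # English, computing
    "software network program computer",
    "virus patient hospital doctor",         # English, medicine
    "hospital doctor medicine virus",
    "computer software rete internet",       # Italian, computing
    "software rete programma computer",
    "virus paziente ospedale medico",        # Italian, medicine
    "ospedale medico medicina virus",
]

# One shared vocabulary: words spelled identically in both languages
# ("computer", "software", "internet", "virus") occupy a single row each.
vocab = sorted({w for doc in corpus for w in doc.split()})
index = {w: i for i, w in enumerate(vocab)}


def bow(text):
    """Bag-of-words count vector over the shared vocabulary."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in index:
            v[index[w]] += 1.0
    return v


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# Term-by-document matrix and its truncated SVD (assumed acquisition step).
T = np.column_stack([bow(d) for d in corpus])
U, s, Vt = np.linalg.svd(T, full_matrices=False)
k = 2            # number of latent domains (hypothetical choice)
D = U[:, :k]     # term-by-domain matrix


def domain_kernel(x, y):
    """Cosine similarity after projecting both texts into the domain space."""
    return cosine(D.T @ bow(x), D.T @ bow(y))


en = "network program"                # English query, computing
it_same_topic = "rete programma"      # Italian, computing, no shared words
it_other_topic = "paziente medicina"  # Italian, medicine

bow_sim = cosine(bow(en), bow(it_same_topic))
same_sim = domain_kernel(en, it_same_topic)
other_sim = domain_kernel(en, it_other_topic)
print(bow_sim, same_sim, other_sim)
```

In this toy setting the plain bag-of-words similarity between the English and Italian computing texts is zero because they share no words, while the domain-space similarity is high for the same topic and low for the other topic. Pairwise values of such a function could be collected into a Gram matrix and passed to an SVM implementation that accepts a precomputed kernel.
-->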