<?xml version="1.0" standalone="yes"?> <Paper uid="H05-1063"> <Title>Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pages 499-506, Vancouver, October 2005. ©2005 Association for Computational Linguistics Mining Context Specific Similarity Relationships Using The World Wide Web</Title> <Section position="7" start_page="504" end_page="505" type="concl"> <SectionTitle> 5 Conclusions </SectionTitle> <Paragraph position="0"> In this paper, we proposed and empirically studied an approach that improves similarity computation between text documents by creating a context-specific Web corpus and performing similarity mining within it.</Paragraph> <Paragraph position="1"> The results demonstrated that similarity errors can be reduced by an additional 50% after all the standard procedures, such as stemming, term weighting, and vector normalization, have been applied. We also established the crucial importance of the following three factors, which we believe distinguish our technique from those explored earlier and explain the more encouraging results that we obtained: 1) using an external corpus; 2) taking the context of the target collection into consideration; 3) using the appropriate mining formula. Another important distinction, and a possible explanation of the more dramatic effect, is our focus on similarity computation between text documents rather than on document retrieval tasks, which have been studied more extensively in the past. Similarity computation is a more general procedure, which in turn determines the quality of virtually all other specific tasks, such as document retrieval, summarization, clustering, categorization, topic detection, and query by example. 
Our future plans are to overcome some of the limitations of this study, specifically by using more than a single (although standard and very diverse) collection and by studying other experimental setups, such as document retrieval, text categorization, or topic detection and tracking.</Paragraph> </Section> </Paper>