<?xml version="1.0" standalone="yes"?> <Paper uid="W04-2406"> <Title>Word Sense Discrimination by Clustering Contexts in Vector and Similarity Spaces</Title> <Section position="13" start_page="0" end_page="0" type="concl"> <SectionTitle> 11 Conclusions </SectionTitle> <Paragraph position="0"> We present an extensive comparative analysis of word sense discrimination techniques using first order and second order context vectors, each of which can be clustered in either similarity space or vector space. We conclude that for larger amounts of homogeneous data, such as the Line, Hard and Serve data, the first order context vector representation and the UPGMA clustering algorithm are the most effective at word sense discrimination. We believe this is the case because in a large sample of data, the features that occur in the training data are very likely to occur in the test data as well, making it possible to represent test instances with fairly rich feature sets. When given smaller amounts of data, such as SENSEVAL-2, second order context vectors and a hybrid clustering method like Repeated Bisections perform better. This occurs because in small and sparse data, direct first order features are seldom observed in both the training and the test data. However, the indirect second order co-occurrence relationships captured by second order context vectors provide sufficient information for discrimination to proceed.</Paragraph> </Section> </Paper>