<?xml version="1.0" standalone="yes"?> <Paper uid="W04-3244"> <Title>Learning Nonstructural Distance Metric by Minimum Cluster Distortions</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> Much natural language processing still depends on the Euclidean (cosine) distance function between two feature vectors, but this function has severe problems with feature weighting and feature correlation. To address these problems, we propose an optimal metric distance that can be used as an alternative to the cosine distance and that handles both problems at once. This metric is optimal in the sense of global quadratic minimization, and can be obtained from the clusters in the training data in a supervised fashion.</Paragraph> <Paragraph position="1"> We confirmed the effect of the proposed metric distance on a synonymous sentence retrieval task, a document retrieval task, and K-means clustering of general vectorial data. The results showed consistent improvement over the baseline of Euclidean distance and tf.idf, and the improvement was especially prominent for the sentence retrieval task, with a 33% increase in the 11-point average precision.</Paragraph> </Section> </Paper>