<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-3244">
  <Title>Learning Nonstructural Distance Metric by Minimum Cluster Distortions</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Issues with Euclidean distances
</SectionTitle>
    <Paragraph position="0"> When we address nonstructural matching, linguistic expressions are often modeled by a feature vector $\vec{x} \in \mathbb{R}^n$, whose elements $x_1, \ldots, x_n$ give the number of occurrences of each feature: $x_i$ counts the $i$'th feature. If the features are simply words, this is called a 'bag of words'; in general, however, features are not restricted to words, and we will use the general term "feature" in the rest of the paper.</Paragraph>
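<Paragraph> As a minimal sketch of the count-vector representation described above, the following builds a bag-of-words vector over a fixed vocabulary. The vocabulary, the sample sentence, and the helper name bag_of_words are hypothetical and chosen only for illustration; whitespace tokenization is likewise an assumption.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Return the count vector of `text` over a fixed, ordered vocabulary."""
    # Assumption: lowercase, whitespace-separated tokens are the features.
    counts = Counter(text.lower().split())
    # Element i is the number of occurrences of the i'th feature.
    return [counts[word] for word in vocabulary]


vocabulary = ["the", "cat", "sat", "mat"]  # hypothetical feature set
x = bag_of_words("The cat sat on the mat", vocabulary)
print(x)  # [2, 1, 1, 1]
```

Words outside the vocabulary (here "on") are simply dropped, which is one of several possible design choices.</Paragraph>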
    <Paragraph position="1"> To measure the distance between two vectors $\vec{u}$ and $\vec{v}$, a dot product or the Euclidean distance</Paragraph>
    <Paragraph position="2"> $$d(\vec{u}, \vec{v}) = (\vec{u} - \vec{v})^T (\vec{u} - \vec{v})$$</Paragraph>
    <Paragraph position="3"> (where $T$ denotes transposition) has been employed so far, with a heuristic feature weighting such as tf.idf applied in a preprocessing stage.</Paragraph>
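<Paragraph> The two measures named above can be sketched as follows. This is an illustration only: the vectors are hypothetical count vectors, and the Euclidean distance is assumed to be taken in its squared form, written as a difference vector multiplied by its own transpose.

```python
def dot(u, v):
    """Dot product u^T v of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def squared_euclidean(u, v):
    """Squared Euclidean distance (u - v)^T (u - v)."""
    diff = [a - b for a, b in zip(u, v)]
    return dot(diff, diff)


u = [2, 1, 0, 1]  # hypothetical count vectors
v = [1, 0, 1, 1]
print(dot(u, v))                # 2*1 + 1*0 + 0*1 + 1*1 = 3
print(squared_euclidean(u, v))  # 1 + 1 + 1 + 0 = 3
```

Both quantities treat every feature independently and with equal weight, which is why a separate weighting step such as tf.idf is usually applied first.</Paragraph>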
    <Paragraph position="4"> However, there are two main problems with this distance: (1) The correlation between features is ignored. (2) Feature weighting is inevitably arbitrary.</Paragraph>
    <Paragraph position="6"> Problem (1) matters because linguistic features (e.g., words) generally have strong correlations between them, such as collocations or typical constructions, and a simple dot product cannot take this correlation into account.</Paragraph>
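<Paragraph> A small sketch may make the correlation problem concrete. The three features below are hypothetical: "new" and "york" stand for a strongly collocated pair, while "banana" is unrelated. With one-feature documents, the dot product and the squared Euclidean distance cannot tell the related pair from the unrelated one.

```python
def dot(u, v):
    """Dot product u^T v of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def squared_euclidean(u, v):
    """Squared Euclidean distance (u - v)^T (u - v)."""
    diff = [a - b for a, b in zip(u, v)]
    return dot(diff, diff)


# Hypothetical feature order: ["new", "york", "banana"]
doc_new = [1, 0, 0]
doc_york = [0, 1, 0]
doc_banana = [0, 0, 1]

# Each feature is its own orthogonal axis, so the collocated pair
# scores exactly the same as the unrelated pair.
print(dot(doc_new, doc_york), dot(doc_new, doc_banana))  # 0 0
print(squared_euclidean(doc_new, doc_york),
      squared_euclidean(doc_new, doc_banana))            # 2 2
```

Any per-feature reweighting leaves the dot products at zero, so weighting alone does not address this problem.</Paragraph>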
    <Paragraph position="7"> While it is possible to address this with a specific kernel function, such as a polynomial kernel (Müller et al., 2001), this approach is not available for many problems, such as information retrieval or question answering, that are not classification tasks or cannot be easily "kernelized". Problem (2) is more subtle but inherent: while tf.idf often works well in practice, there are several options, especially for tf (such as logarithms or square roots), and we have no principle by which to choose among them. Further, tf.idf has no theoretical basis that guarantees any optimality as a distance function.</Paragraph>
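<Paragraph> The arbitrariness of the tf choice can be illustrated with a toy retrieval example. Everything here is an assumption made for illustration: the query and document counts are invented, the log variant is taken to be log(1 + tf), the idf factor is omitted for brevity, and matching uses a plain dot product.

```python
import math


def dot(u, v):
    """Dot product u^T v of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Three candidate term-frequency transforms (assumed forms).
TF_VARIANTS = {
    "raw": lambda tf: float(tf),
    "sqrt": lambda tf: math.sqrt(tf),
    "log": lambda tf: math.log(1.0 + tf),
}


def best_match(query, docs, transform):
    """Name of the document with the largest dot product after transforming tf."""
    q = [transform(t) for t in query]
    scores = {name: dot(q, [transform(t) for t in vec])
              for name, vec in docs.items()}
    return max(scores, key=scores.get)


query = [1, 1]                      # hypothetical raw counts
docs = {"A": [13, 0], "B": [3, 3]}  # hypothetical raw counts
for name, transform in TF_VARIANTS.items():
    print(name, best_match(query, docs, transform))
# raw A
# sqrt A
# log B
```

The same data yields a different best match depending on the transform, and nothing in the dot product itself indicates which answer is preferable.</Paragraph>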
  </Section>
</Paper>