<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1036">
<Title>Feature Vector Quality and Distributional Similarity</Title>
<Section position="3" start_page="0" end_page="0" type="intro">
<SectionTitle> 2 Background and Definitions </SectionTitle>
<Paragraph position="0"> In the distributional similarity scheme, each word w is represented by a feature vector, where each entry in the vector corresponds to a feature f. Each feature represents another word (or term) with which w co-occurs, and possibly also specifies the syntactic relation between the two words. The value of each entry is determined by some weight function weight(w,f), which quantifies the degree of statistical association between the feature and the corresponding word.</Paragraph>
<Paragraph position="1"> Typical feature weighting functions include the logarithm of the frequency of word-feature co-occurrence (Ruge, 1992), and the conditional probability of the feature given the word, used within probabilistic measures (Pereira et al., 1993), (Lee, 1997), (Dagan et al., 1999). Probably the most widely used association weight function is (point-wise) Mutual Information (MI) (Church and Hanks, 1990), (Hindle, 1990), (Lin, 1998), (Dagan, 2000), defined by:</Paragraph>
<Paragraph position="2"> MI(w,f) = \log_2 \frac{P(w,f)}{P(w)\,P(f)} </Paragraph>
<Paragraph position="3"> A known weakness of MI is its tendency to assign high weights to rare features. Yet, similarity measures that utilize MI have shown good performance. In particular, a common practice is to filter out features by minimal frequency and weight thresholds. A word's vector is then constructed from the remaining features, which we call here active features.</Paragraph>
<Paragraph position="4"> Once feature vectors have been constructed, the similarity between two words is defined by some vector similarity metric. Different metrics have been used in the above-cited papers, such as weighted Jaccard (Dagan, 2000), cosine (Ruge, 1992), various information-theoretic measures (Lee, 1997), and others. We picked the widely cited and competitive (e.g., (Weeds and Weir, 2003)) measure of Lin (1998) as a representative case, and utilized it for our analysis and as a starting point for improvement.</Paragraph>
<Section position="1" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 2.1 Lin's ('98) Similarity Measure </SectionTitle>
<Paragraph position="0"> Lin's similarity measure between two words, w and v, is defined as follows:

sim(w,v) = \frac{\sum_{f \in F(w) \cap F(v)} \left[ weight(w,f) + weight(v,f) \right]}{\sum_{f \in F(w)} weight(w,f) + \sum_{f \in F(v)} weight(v,f)}

where F(w) and F(v) are the active features of the two words and the weight function is defined as MI. A feature is defined as a pair <term, syntactic_relation>. For example, given the word "company", the feature <earnings_report, gen↓> (genitive) corresponds to the phrase "company's earnings report", and <profit, pcomp↓> (prepositional complement) corresponds to "the profit of the company". The syntactic relations are generated by the Minipar dependency parser (Lin, 1993). The arrows indicate the direction of the syntactic dependency: a downward arrow indicates that the feature is the parent of the target word, and the upward arrow stands for the opposite.</Paragraph>
[Table omitted: features of the word pairs country-state and country-economy, along with their corresponding ranks in the sorted feature lists of the two words.]
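To make the computation concrete, here is a minimal Python sketch (not from the paper) of the MI weight function and Lin's similarity measure over feature vectors represented as dictionaries; the function names and interfaces are illustrative assumptions.

    import math

    def mi_weight(n_wf, n_w, n_f, n_total):
        # Point-wise mutual information: log2( P(w,f) / (P(w) * P(f)) ),
        # estimated from the word-feature co-occurrence count n_wf and
        # the marginal counts n_w, n_f over a corpus of n_total events.
        return math.log2((n_wf / n_total) / ((n_w / n_total) * (n_f / n_total)))

    def lin_similarity(vec_w, vec_v):
        # vec_w, vec_v map active features to their MI weights.
        # The numerator sums both words' weights over shared features;
        # the denominator sums all weights of both vectors.
        shared = vec_w.keys() & vec_v.keys()
        numerator = sum(vec_w[f] + vec_v[f] for f in shared)
        denominator = sum(vec_w.values()) + sum(vec_v.values())
        return numerator / denominator if denominator else 0.0

The measure thus rewards word pairs whose high-weight features overlap, and normalizes by the total weight mass of both vectors, yielding a score between 0 and 1.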
<Paragraph position="1"> In our implementation, we filtered out features with overall corpus frequency lower than 10 and with MI weights lower than 4. (In the tuning experiments, the filtered version showed a 10% improvement in precision over no feature filtering.) From now on we refer to this implementation as Lin98.</Paragraph>
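As an illustration of this filtering step, the following sketch applies the two thresholds when building a word's active-feature vector; the count dictionaries and the mi_weight helper (from the earlier sketch) are assumptions for demonstration, not the paper's actual code.

    MIN_FREQ, MIN_MI = 10, 4  # thresholds used in the Lin98 implementation

    def build_active_vector(word, cooc, word_freq, feat_freq, n_total):
        # cooc[word][feature] holds word-feature co-occurrence counts;
        # word_freq and feat_freq hold overall corpus frequencies.
        vector = {}
        for feat, n_wf in cooc[word].items():
            if feat_freq[feat] < MIN_FREQ:  # drop rare features
                continue
            weight = mi_weight(n_wf, word_freq[word], feat_freq[feat], n_total)
            if weight >= MIN_MI:  # keep only strongly associated features
                vector[feat] = weight
        return vector

</Section>
</Section>
</Paper>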