File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/04/c04-1146_concl.xml

Size: 2,537 bytes

Last Modified: 2025-10-06 13:53:57

<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1146">
  <Title>Characterising Measures of Lexical Distributional Similarity</Title>
  <Section position="9" start_page="0" end_page="0" type="concl">
    <SectionTitle>
7 Conclusions and further work
</SectionTitle>
    <Paragraph position="0"> We have presented an analysis of a set of distributional similarity measures. We have seen that there is a large amount of variation in the neighbours selected by different measures and therefore the choice of measure in a given application is likely to be important.</Paragraph>
    <Paragraph position="1"> We also identified one of the major axes of variation in neighbour sets as being the frequency of the neighbours selected relative to the frequency of the target word. There are three major classes of distributional similarity measures which can be characterised as 1) higher frequency selecting or high recall measures; 2) lower frequency selecting or high precision measures; and 3) similar frequency selecting or high precision and recall measures.</Paragraph>
    <Paragraph position="2"> A word tends to have high recall similarity with its hyponyms and high precision similarity with its hypernyms. Further, in the majority of cases, it tends to be more frequent than its hyponyms and less frequent than its hypernyms.</Paragraph>
    <Paragraph position="3"> Thus, there would seem to be a three-way correlation between word frequency, distributional generality and semantic generality.</Paragraph>
    <Paragraph position="4"> We have considered the impact of these observations on a technique which uses a distributional similarity measure to determine compositionality of collocations. We saw that in this application we achieve significantly better results using a measure that tends to select higher frequency words as neighbours rather than a measure that tends to select neighbours of a similar frequency to the target word.</Paragraph>
    <Paragraph position="5"> There are a variety of ways in which this work might be extended. First, we could use the observations about distributional generality and relative frequency to aid the process of organising distributionally similar words into hierarchies. Second, we could consider the impact of frequency characteristics in other applications.</Paragraph>
    <Paragraph position="6"> Third, for the general application of distributional similarity measures, it would be useful to find other characteristics by which distributional similarity measures might be classified.</Paragraph>
  </Section>
</Paper>