<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-0908">
  <Title>Improvements in Automatic Thesaurus Extraction</Title>
  <Section position="9" start_page="0" end_page="0" type="concl">
    <SectionTitle>
8 Conclusion
</SectionTitle>
    <Paragraph position="0"> In these experiments we have proposed new measure and weight functions that, as our evaluation has shown, significantly outperform existing similarity functions. The list of measure and weight functions we compared against is not complete, and we hope to add other functions to provide a general framework for thesaurus extraction experimentation. We would also like to expand our evaluation to include the direct methods used by others (Lin, 1998a) and to use the extracted thesaurus in NLP tasks.</Paragraph>
    <Paragraph position="1"> We have also investigated the speed/performance trade-off using frequency cutoffs. This has led to the proposal of a new approximate comparison algorithm based on canonical attributes and a process of coarse- and fine-grained comparisons. This approximation algorithm is dramatically faster than simple pairwise comparison, with only a small performance penalty, which means that complete thesaurus extraction on large corpora is now feasible. Further, the canonical vector parameters allow for control of the speed/performance trade-off. These experiments show that large-scale thesaurus extraction is practical and that the results, although not yet comparable with manually-constructed thesauri, may now be accurate enough to be useful for some NLP tasks.</Paragraph>
  </Section>
</Paper>