<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1046">
  <Title>Scaling Distributional Similarity to Large Corpora</Title>
  <Section position="11" start_page="367" end_page="367" type="concl">
    <SectionTitle>
9 Conclusion
</SectionTitle>
    <Paragraph position="0"> We have evaluated several state-of-the-art techniques for improving the efficiency of distributional similarity measurements. We found that, in terms of raw efficiency, Random Indexing (RI) was significantly faster than any other technique, but at the cost of accuracy. Even after our modifications to the RI algorithm to significantly improve its accuracy, SASH still provides a better accuracy/efficiency trade-off. This is more evident when considering the time to extract context information from the raw text. SASH, unlike RI, also allows us to choose both the weight function and the similarity measure used. LSH and PLEB could not match either the efficiency of RI or the accuracy of SASH.</Paragraph>
    <Paragraph position="1"> We intend to use this knowledge to process even larger corpora to produce more accurate results.</Paragraph>
    <Paragraph position="2"> Having set out to improve the efficiency of distributional similarity searches while limiting any loss in accuracy, we are producing full nearest-neighbour searches 18 times faster, with only a 2% loss in accuracy.</Paragraph>
  </Section>
</Paper>