<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0316">
  <Title>Lexicon Effects on Chinese Information Retrieval</Title>
  <Section position="8" start_page="145" end_page="146" type="concl">
    <SectionTitle>
6 Conclusion
</SectionTitle>
    <Paragraph position="0"> For the TREC-5 Chinese collection of documents and queries, it is found that a small 2175-entry lexicon coupled with some simple linguistic rules is sufficient to provide indexing features for good retrieval results.</Paragraph>
    <Paragraph position="1"> Larger lexicons can give incremental improvements.</Paragraph>
    <Paragraph position="2"> Lexicon-based or rule-based stopword removal has a negligible effect on retrieval with long queries. For short queries with a large lexicon, stopword elimination can lead to some improvement, but it runs the risk of accidentally deleting a crucial query word, which can significantly degrade retrieval. It appears advisable to keep all stopwords and use them for segmentation purposes. One need only retain high and low frequency thresholds to screen out frequency-based statistical stopwords. Experimentation with more varied queries is needed to verify these findings.</Paragraph>
  </Section>
</Paper>