File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/97/w97-0316_abstr.xml
Size: 1,040 bytes
Last Modified: 2025-10-06 13:49:05
<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0316"> <Title>Lexicon Effects on Chinese Information Retrieval</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> We investigate the effects of lexicon size and stopwords on Chinese information retrieval using our method of short-word segmentation based on simple language usage rules and statistics. These rules allow us to employ a small lexicon of only 2,175 entries and still obtain quite admirable retrieval results. We observe that accurate segmentation is not essential for good retrieval. Larger lexicons can lead to incremental improvements. The presence of stopwords does not contribute much noise to IR.</Paragraph> <Paragraph position="1"> Their removal risks eliminating crucial words in a query and adversely affecting retrieval, especially when the queries are short. Short queries of a few words perform more than 10% worse than paragraph-size queries.</Paragraph> </Section> </Paper>