File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/00/c00-1059_intro.xml
Size: 2,031 bytes
Last Modified: 2025-10-06 14:00:49
<?xml version="1.0" standalone="yes"?> <Paper uid="C00-1059"> <Title>Corpus-dependent Association Thesauri for Information Retrieval</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> A thesaurus plays an essential role in information retrieval systems. In particular, a domain-specific thesaurus greatly improves the effectiveness of information retrieval. However, constructing and maintaining a domain-specific thesaurus is a difficult problem. The goal of our present research is to establish a method for automatically generating a thesaurus from a text corpus of a domain and to demonstrate its application to information retrieval. Thesauri are classified into taxonomy-type thesauri and association thesauri.</Paragraph> <Paragraph position="1"> There have been various studies on the extraction of taxonomic information from a corpus, including extraction of hyponyms by using linguistic patterns (Hearst 1992) and extraction of synonyms based on the similarity of sets of co-occurring words (Ruge 1991; Grefenstette 1992). However, the performance of these methods is limited, and they should be considered as aids to augment hand-made thesauri. In contrast, an association thesaurus, that is, a collection of pairs of semantically associated terms, can possibly be generated from a corpus entirely automatically. Word association norms based on co-occurrence information were proposed by Church and Hanks (1990). Here we focus on the automatic generation of an association thesaurus.</Paragraph> <Paragraph position="2"> Association thesauri are as useful as taxonomy-type thesauri in information retrieval. The improvement of retrieval effectiveness by using an association thesaurus has been reported in a number of papers (Jing and Croft 1994; Schutze and Pedersen 1994). We propose to use a corpus-dependent association thesaurus interactively.
</Paragraph> </Section> </Paper>