<?xml version="1.0" standalone="yes"?> <Paper uid="H05-1121"> <Title>Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pages 963-970, Vancouver, October 2005. c(c)2005 Association for Computational Linguistics Query Expansion with the Minimum User Feedback by Transductive Learning</Title> <Section position="9" start_page="966" end_page="967" type="evalu"> <SectionTitle> 5 Experiments </SectionTitle> <Paragraph position="0"> This section provides empirical evidence on how our query expansion method can improve the performance of information retrieval. We compare our method with other traditional methods.</Paragraph> <Section position="1" start_page="966" end_page="967" type="sub_section"> <SectionTitle> 5.1 Environmental Settings </SectionTitle> <Paragraph position="0"> We use the TREC-8 data set (Voorhees and Harman, 1999) for our experiment. The document corpus contains about 520,000 news articles. Each document is preprocessed by removing stopwords and stemming. We also use fifty topics (No.401-450) and relevance judgments which are prepared for ad-hoc task in the TREC-8. Queries for an initial search are nouns extracted from the title tag in each topic.</Paragraph> <Paragraph position="1"> We use two representative retrieval models which are bases of the Okapi (Robertson, 1997) and SMART systems. They showed highest performance in the TREC-8 competition.</Paragraph> <Paragraph position="2"> Okapi : The weight function in Okapi is BM25. It calculates each document's score by the following formula.</Paragraph> <Paragraph position="4"> where Q is a query containing terms T, tf is the term's frequency in a document, qtf is the term's frequency in a text from which Q was derived. rt and nt are described in section 2. K is calculated by (7), where dl and avdl denote the document length and the average document length. In our experiments, we set k1 = 1.2,k3 = 1000,b = 0.75, and avdl = 135.6. Terms for query expansion are ranked in decreasing order of rt xw(1) for the following Okapi's retrieval tests without SGT (Okapi manual and Okapi pseudo) to make conditions the same as of TREC-8.</Paragraph> <Paragraph position="6"> df is the term's document frequency. tf, dl and avdl are the same as Okapi. When doing relevance feedback, a query vector is modified by the following Rocchio's method (with parame- null Drel and Dnrel are sets of seen relevant and non-relevant documents respectively. Terms for query expansion are ranked in decreasing order of the above Rocchio's formula.</Paragraph> <Paragraph position="7"> Table 1 shows their initial search results of Okapi (Okapi ini) and SMART (SMART ini). We adopt five evaluation measures. Their meanings are as fol- null are retrieved, where R is the number of relevant documents for the current topic.</Paragraph> <Paragraph position="8"> MAP : Mean average precision (MAP) is the average precision for a single topic is the mean of the precision obtained after each relevant document is retrieved (using zero as the precision for relevant documents that are not retrieved). R05P : Recall at the rank where precision first dips below 0.5 (after at least 10 documents have been retrieved).</Paragraph> <Paragraph position="9"> The performance of query expansion or relevance feedback is usually evaluated on a residual collection where seen documents are removed. 
<Paragraph position="3"> SMART: The weighting function in SMART follows the pivoted length normalization scheme used in TREC-8. df is the term's document frequency; tf, dl and avdl are the same as in Okapi. When doing relevance feedback, the query vector is modified by the following Rocchio method (with parameters α, β, γ): $$ \vec{q}_{new} = \alpha \, \vec{q}_{old} + \frac{\beta}{|D_{rel}|} \sum_{\vec{d} \in D_{rel}} \vec{d} - \frac{\gamma}{|D_{nrel}|} \sum_{\vec{d} \in D_{nrel}} \vec{d} $$ where D_rel and D_nrel are the sets of seen relevant and non-relevant documents, respectively. Terms for query expansion are ranked in decreasing order of their weights under the above Rocchio formula.</Paragraph> <Paragraph position="4"> Table 1 shows the initial search results of Okapi (Okapi ini) and SMART (SMART ini). We adopt five evaluation measures. They include the following. R-Prec: precision after R documents are retrieved, where R is the number of relevant documents for the current topic. MAP: mean average precision; the average precision for a single topic is the mean of the precisions obtained after each relevant document is retrieved (using zero as the precision for relevant documents that are not retrieved), and MAP is the mean of these values over all topics. R05P: recall at the rank where precision first dips below 0.5 (after at least 10 documents have been retrieved).</Paragraph>
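To make these definitions concrete, the sketch below computes average precision and R05P for a single topic's ranked list. It is our reading of the definitions above, with hypothetical inputs (a ranked list of document ids and a set of relevant ids); it is not the evaluation tooling actually used, and the edge handling in R05P is our interpretation.

```python
def average_precision(ranking, relevant):
    """Average precision for one topic: mean of the precisions at each
    relevant document's rank; unretrieved relevant docs count as 0."""
    hits, precisions = 0, []
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

def r05p(ranking, relevant):
    """Recall at the rank where precision first dips below 0.5,
    checked only after at least 10 documents have been retrieved."""
    if not relevant:
        return 0.0
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        if rank >= 10 and hits / rank < 0.5:
            break  # precision has dipped below 0.5
    return hits / len(relevant)

# MAP over the fifty topics is then the mean of average_precision values.
```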
<Paragraph position="5"> The performance of query expansion or relevance feedback is usually evaluated on a residual collection from which seen documents are removed. However, since we compare our method with pseudo-feedback-based methods, we do not use a residual collection in the following experiments.</Paragraph> <Paragraph position="6"> For manual feedback, we assume that a user tries to find relevant and non-relevant documents only within the top 10 documents of the initial search result. If a topic has no relevant document or no non-relevant document in the top 10, we do not apply manual feedback to it and instead use the result of the initial search for that topic. There are 8 topics to which we do not apply the manual feedback methods.</Paragraph> </Section> <Section position="2" start_page="967" end_page="967" type="sub_section"> <SectionTitle> 5.2 Basic Performance </SectionTitle> <Paragraph position="0"> First, we evaluate the basic performance of our query expansion method while changing the number of training examples. Since our method is based on the Okapi model, we denote it Okapi sgt (with parameters k = 0.5 × N_tr and d = 0.8 × N_tr, where k is the number of nearest neighbors, d is the number of eigenvalues to use, and N_tr is the number of training examples).</Paragraph> <Paragraph position="1"> Tables 2-5 show the five evaluation measures for Okapi sgt as the number of expansion terms changes. We test 20, 50 and 100 as the number of training examples, and 5, 10, 15 and 20 as the number of expansion terms. Regarding the number of training examples, the performance with 20 and with 50 does not differ much at any number of expansion terms, but the performance with 100 is clearly worse than with 20 and 50. The number of expansion terms has little effect on any of the evaluation measures. In the following experiments, we compare the results of Okapi sgt with 50 training examples against other query expansion methods.</Paragraph> </Section> <Section position="3" start_page="967" end_page="967" type="sub_section"> <SectionTitle> 5.3 Comparison with Other Manual Feedback Methods </SectionTitle> <Paragraph position="0"> We next compare our query expansion method with the following manual feedback methods. Okapi man: This method simply uses the single relevant document judged by hand; this approach is called incremental relevance feedback (Aalbersberg, 1992; Allan, 1996; Iwayama, 2000). SMART man: This is SMART's manual relevance feedback (with parameters α = 3, β = 2, γ = 0). γ is set to 0 because the performance is terrible when γ is set to 2.</Paragraph> <Paragraph position="1"> Table 6 shows the mean average precision of the three methods as the number of expansion terms changes. Since the number of feedback documents is extremely small, the two methods other than Okapi sgt perform worse than their initial searches. Okapi man degrades slightly as the number of expansion terms increases; in contrast, SMART man changes little. Table 7 shows the other evaluation measures when 10 expansion terms are used. It is clear that Okapi sgt outperforms the other two methods.</Paragraph> </Section> <Section position="4" start_page="967" end_page="967" type="sub_section"> <SectionTitle> 5.4 Comparison with Pseudo Feedback Methods </SectionTitle> <Paragraph position="0"> We finally compare our query expansion method with the following pseudo feedback methods. Okapi pse: This is a pseudo-feedback version of Okapi which, following the TREC-8 settings, assumes the top 10 documents of the initial search to be relevant. SMART pse: This is a pseudo-feedback version of SMART. It also assumes the top 10 documents to be relevant and, in addition, assumes documents ranked 500-1000 to be non-relevant.</Paragraph> <Paragraph position="1"> In TREC-8, the above two methods used TREC disks 1-5 for query expansion together with a phrase extraction technique; we do not adopt these techniques in our experiments. Since these methods showed the highest performance in the TREC-8 ad-hoc task, it is reasonable to compare our method against them as competitors.</Paragraph> <Paragraph position="2"> Table 8 shows the mean average precision of the three methods as the number of expansion terms changes. Performance does not differ much as the number of expansion terms changes, and Okapi sgt outperforms the others at every number of expansion terms. Table 9 shows the results for the other evaluation measures. Okapi sgt again outperforms the other methods except on R05P. In particular, its P10 performance is quite good, which is desirable behavior for practical use.</Paragraph> </Section> </Section> </Paper>
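As a closing illustration of the pseudo-feedback procedure described in Section 5.4, the sketch below selects expansion terms in the Okapi pse manner: the top 10 documents of the initial search are treated as relevant, and candidate terms are ranked by r_t × w^(1). It is a schematic sketch under our assumptions (the data structures and the exact term-selection details are hypothetical), not the systems' actual implementations.

```python
import math
from collections import Counter

def okapi_pse_terms(ranking, doc_terms, df, N, n_expand=10):
    """Sketch of Okapi-style pseudo feedback.

    ranking: doc ids from the initial search, best first.
    doc_terms: doc id -> set of terms.  df: term -> document frequency.
    N: collection size.  All inputs are hypothetical stand-ins.
    """
    pseudo_rel = ranking[:10]   # top 10 documents assumed relevant
    R = len(pseudo_rel)
    r = Counter()               # r_t = #pseudo-relevant docs containing t
    for doc in pseudo_rel:
        r.update(doc_terms[doc])

    def w1(t):
        # Robertson/Sparck Jones weight with pseudo-relevance counts
        n_t, r_t = df[t], r[t]
        return math.log(((r_t + 0.5) * (N - n_t - R + r_t + 0.5)) /
                        ((n_t - r_t + 0.5) * (R - r_t + 0.5)))

    # Rank candidate terms by r_t * w(1) and keep the top n_expand.
    return sorted(r, key=lambda t: r[t] * w1(t), reverse=True)[:n_expand]

# A SMART-style variant (SMART pse) would additionally treat documents
# ranked 500-1000 as non-relevant and rank terms by their Rocchio weights.
```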