<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1026">
<Title>The Effectiveness of Dictionary and Web-Based Answer Reranking</Title>
<Section position="7" start_page="5" end_page="10" type="evalu">
<SectionTitle>5 Experiments and Results</SectionTitle>
<Paragraph position="0">We used a set of 102 definition questions from the TREC-10 QA track as our test set. The performance of Webclopedia without dictionary or web-based answer reranking was used as the baseline, against which we compared Webclopedia with dictionary-based answer reranking.</Paragraph>
<Paragraph position="1">To study the effect of search engine choice, context window size, number of top-ranked web pages, and web gloss weight cut-off threshold on the performance of web-based answer reranking, we varied the search engine (AltaVista, Google, MSNSearch, or all three combined), the context window size W, the number of top-ranked web pages R, and the web gloss weight cut-off threshold T.</Paragraph>
<Paragraph position="2">To investigate the performance of combining dictionary and web-based answer reranking, we ran the above setup again with dictionary-based reranking applied to each question's answers as well. A total of 354 runs were performed. Manual evaluation of these 354 runs was not impossible, but it would have been very time-consuming. We instead used the answer patterns provided by NIST to score all runs automatically (see the sketch in Section 5.1 below).</Paragraph>
<Paragraph position="3">Due to space constraints, Table 3 shows the (MRR, PCT5) score pairs for 90 of the 352 reranking runs. The other two runs were the baseline run, with a score pair of (0.450, 0.637), and the dictionary-based run, (0.535, 0.667). The best run was the combined dictionary and web-based run using Google as the search engine with a 10-word context window, 70 top-ranked pages, and a gloss weight cut-off threshold of 5.</Paragraph>
<Paragraph position="4">Analyzing all runs with reference to Table 3, we made the following observations.</Paragraph>
<Paragraph position="5">(1) Dictionary-based reranking improved baseline performance by 19% in MRR and 5% in PCT5 (MRR: 0.535, PCT5: 0.667).</Paragraph>
<Paragraph position="6">(2) The best web-based reranking (MRR: 0.539, PCT5: 0.676) was achieved with W=10, R=70, and T=5. It was comparable to dictionary-based reranking.</Paragraph>
<Paragraph position="7">(3) Web-based reranking generally improved results. Only 6 runs (not shown in the table) had lower MRR scores than Webclopedia alone, and these runs were concentrated at the low page counts of R=5 and R=10.</Paragraph>
<Paragraph position="8">(6) A lower web gloss weight cut-off threshold, T=5, gave better results.</Paragraph>
<Paragraph position="9">(7) A longer context window, W=10, gave better results (not shown in the table).</Paragraph>
<Paragraph position="10">(8) Using 50 to 90 top-ranked pages gave better results.</Paragraph>
<Paragraph position="11">(9) Combining dictionary and web-based reranking always did better than using the web-based method alone.</Paragraph>
<Paragraph position="12">(10) Using WordNet and Google together was always better than using WordNet alone in both MRR and PCT5 (the underlined cells in Table 3).</Paragraph>
<Section position="1" start_page="10" end_page="10" type="sub_section">
<SectionTitle>5.1 Question Difficulty</SectionTitle>
<Paragraph position="0">To investigate the effectiveness of dictionary and web-based answer reranking on questions of different difficulty, we define question difficulty as d = 1 - (n/N), where n is the number of systems participating in TREC-10 that returned an answer in the top 5 and N is the total number of runs (67 for TREC-10). When d = 1, no system provided an answer in the top 5, while d = 0 means every run provided at least one answer in the top 5.</Paragraph>
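As a worked illustration of this difficulty measure, here is a minimal Python sketch; the function name and the example counts are ours, not the paper's.

def question_difficulty(n, N=67):
    """Question difficulty d = 1 - n/N from Section 5.1.

    n: number of TREC-10 runs that returned a correct answer in the
    top 5 for the question; N: total number of runs (67 at TREC-10).
    d = 1.0 means no run answered the question; d = 0.0 means all did.
    """
    return 1.0 - n / N

# Example: a question answered in the top 5 by 16 of the 67 TREC-10 runs
# has difficulty d = 1 - 16/67 ~= 0.76, falling in the hardest band
# (d >= 0.75) discussed below.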
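The automatic pattern-based scoring used for all 354 runs (Section 5 above) can be reproduced along these lines. This is a minimal sketch, assuming each run maps a question id to a ranked answer list and that NIST's answer patterns are regular expressions; score_run, the data shapes, and case-insensitive matching are our assumptions rather than the paper's exact procedure.

import re

def score_run(run, patterns, depth=5):
    # run: {qid: [answer strings, best first]}
    # patterns: {qid: [regex strings]}; an answer is correct if any matches.
    rr_sum, answered = 0.0, 0
    for qid, answers in run.items():
        compiled = [re.compile(p, re.IGNORECASE) for p in patterns.get(qid, [])]
        for rank, answer in enumerate(answers[:depth], start=1):
            if any(p.search(answer) for p in compiled):
                rr_sum += 1.0 / rank  # reciprocal rank of first correct answer
                answered += 1
                break                 # only the first correct answer counts
    n_questions = len(run)
    # MRR over the top `depth` answers; PCT5 = fraction answered in top 5.
    return rr_sum / n_questions, answered / n_questions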
<Paragraph position="1">Table 4 shows the improvement of MRR and PCT5 scores at four question difficulty levels under four system setups. The results indicate that using either dictionary- or web-based answer reranking improved system performance at all levels. The best results were achieved when evidence from both resources was used.</Paragraph>
<Paragraph position="2">However, Table 4 also demonstrates the difficulty of improving performance on very hard questions (d >= 0.75). This implies that we might need to consider alternative methods to improve system performance further.</Paragraph>
<Paragraph position="3">Table 3. Results of 90 runs shown as (MRR, PCT5) score pairs, where A: AltaVista, G: Google, M: MSNSearch, X: all three search engines, W: context window size, R: number of top-ranked web pages used, T: web gloss weight cut-off threshold. Runs marked with '+' use both dictionary and web-based answer reranking.</Paragraph>
<Paragraph position="4">Table 4. Improvement of MRR and PCT5 scores at four question difficulty levels. (F: Webclopedia only, F+: Webclopedia with WordNet, FG: Webclopedia with Google, and F+G: Webclopedia with WordNet and Google.)</Paragraph>
</Section>
</Section>
</Paper>