<?xml version="1.0" standalone="yes"?> <Paper uid="W02-1903"> <Title>A Reliable Indexing Method for a Practical QA System</Title> <Section position="4" start_page="21" end_page="21" type="evalu"> <SectionTitle> 3 Evaluation </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="21" end_page="21" type="sub_section"> <SectionTitle> 3.1 The Experiment data </SectionTitle> <Paragraph position="0"> To experiment on MAYA, we use two sorts of document collections. One is a collection of documents that are collected from two web sites; korea.internet.com and www.sogang.ac.kr. The former gives the members on-line articles on Information Technology (IT). The latter is a homepage of Sogang University. We call the collection WEBTEC (WEB TEst Collection).</Paragraph> <Paragraph position="1"> The other is KorQATeC 1.0 (Korean Test Collection for evaluation of QA system) (Lee, Kim and Choi (2000)). WEBTEC consists of 22,448 documents (110,004 kilobytes), and KorQATeC 1.0 consists of 207,067 balanced documents (368,768 kilobytes). WEBTEC and KorQATeC 1.0 each include 50 pairs of question-answers (QAs).</Paragraph> <Paragraph position="2"> To experiment on MAYA, we compute the performance score as the Reciprocal Answer Rank (RAR) of the first correct answer given by each question. To compute the overall performance, we use the Mean Reciprocal Answer Rank (MRAR), as shown in Equation 6</Paragraph> <Paragraph position="4"> In Equation 6, ranki is the rank of the first correct answer given by the ith question. n is the number of questions.</Paragraph> </Section> <Section position="2" start_page="21" end_page="21" type="sub_section"> <SectionTitle> 3.2 The analysis of experiment results </SectionTitle> <Paragraph position="0"> For ranking answer candidates, MAYA uses the weighted sums of global scores and local scores, as shown in Equation 4. To set the weighting factors, we evaluated performances of MAYA according to the values of the weighting factors.</Paragraph> <Paragraph position="1"> Table 3 shows overall MRAR as the values of the weighting factors are changed. In Table 3, the boldface MRARs are the highest scores in each test bed. We set a and b to 0.1 and 0.9 on the basis of the experiment.</Paragraph> <Paragraph position="2"> To evaluate the performance of MAYA, we compared MAYA with Lee2000 (Lee, Kim and Choi (2000)) and Kim2001 (Kim, Kim, Lee and Seo (2000)) in KorQATeC 1.0 because we could not obtain any experimental results on Lee2000 in WEBTEC. As shown in Table 4, the performance of MAYA is higher than those of the other systems. The fact means that the scoring features of MAYA are useful. In Table 4, Lee2000 (50-byte) returns 50-byte span of phrases that include answer candidates, and the others return answer candidates in themselves. MRAR-1 is MRAR except questions for which the QA system fails in finding correct answers. MAYA could not extract correct answers for 5 questions. The failure cases are the following: a254 The query classifier failed to identify users' asking points. We think that most of these failure queries can be dealt with by supplementing additional lexico-syntactic grammars.</Paragraph> <Paragraph position="3"> a254 The NE recognizer failed to extract answer candidates. To resolve this problem, we should supplement the entries in the PLO dictionary and regular expressions. 
<Section position="2" start_page="21" end_page="21" type="sub_section"> <SectionTitle> 3.2 The Analysis of Experimental Results </SectionTitle> <Paragraph position="0"> For ranking answer candidates, MAYA uses the weighted sum of global scores and local scores, as shown in Equation 4. To set the weighting factors, we evaluated the performance of MAYA for various values of the weighting factors.</Paragraph> <Paragraph position="1"> Table 3 shows the overall MRAR as the values of the weighting factors are changed. In Table 3, the boldface MRARs are the highest scores in each test bed. On the basis of this experiment, we set a and b to 0.1 and 0.9.</Paragraph> <Paragraph position="2"> To evaluate the performance of MAYA, we compared MAYA with Lee2000 (Lee, Kim and Choi (2000)) and Kim2001 (Kim, Kim, Lee and Seo (2000)) on KorQATeC 1.0, because we could not obtain any experimental results for Lee2000 on WEBTEC. As shown in Table 4, the performance of MAYA is higher than those of the other systems, which indicates that the scoring features of MAYA are useful. In Table 4, Lee2000 (50-byte) returns 50-byte spans of phrases that include answer candidates, while the other systems return the answer candidates themselves. MRAR-1 is the MRAR computed excluding the questions for which a QA system fails to find any correct answer. MAYA could not extract correct answers for 5 questions. The failure cases are the following: - The query classifier failed to identify the user's asking point. We think that most of these failed queries can be handled by adding lexico-syntactic grammars.</Paragraph> <Paragraph position="3"> - The NE recognizer failed to extract answer candidates. To resolve this problem, we should supplement the entries in the PLO dictionary and the regular expressions. We should also endeavor to improve the precision of the NE recognizer.</Paragraph> <Paragraph position="4"> As shown in Table 5, the average retrieval time of the IR system (Lee, Park and Won (1999)) is 0.026 seconds per query on a PC server with dual Intel Pentium III processors. MAYA takes 0.048 seconds per query. The difference in retrieval time between the IR system and MAYA is small, which means that the additional retrieval overhead of MAYA is negligible. Table 5 also shows the difference in response time between MAYA and a QA system without a predictive answer indexer, which we call Incomplete-MAYA. Incomplete-MAYA finds and ranks answer candidates at retrieval time. Hence, it needs no additional indexing time beyond the indexing time of the underlying IR system. In the response-time experiment, we made Incomplete-MAYA process answer candidates only in the top 30 documents retrieved by the underlying IR system. If Incomplete-MAYA were to find and rank answer candidates in all of the retrieved documents, its response time would be even longer than that reported in Table 5. As shown in Table 5, MAYA responds about 110 times faster than Incomplete-MAYA. Although MAYA spends 19.120 seconds per megabyte creating the answer DB, we conclude that MAYA is more efficient because most users are impatient and want a system to show answers within a few milliseconds.</Paragraph> </Section> </Section> </Paper>