<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1026">
  <Title>The Effectiveness of Dictionary and Web-Based Answer Reranking</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In an attempt to further progress in information retrieval research, the Text REtrieval Conference (TREC) sponsored by the National Institute of Standards and Technology (NIST) started a series of large-scale evaluations of domain independent automated question answering systems in TREC-8 (Voorhees 2000) and continued in TREC-9 and TREC-10. NTCIR (NII-NACSIS Test Collection for IR Systems, TREC's counterpart in Japan) initiated its question answering evaluation effort, Question Answering Challenge (QAC) in 2001 (Fukumoto et al. 2001). Research systems participating in TRECs and the coming QAC focused on the problem of answering closed-class questions that have short fact-based answers (&amp;quot;factoids&amp;quot;) from a large collection of text.</Paragraph>
    <Paragraph position="1"> These systems bear a similar structure:  (1) Question analysis - identify question  keywords to be submitted to search engines (local or web), recognize question types, and suggest expected answer types. Although most systems rely on a taxonomy of expected answer types, the number of nodes in the taxonomy varies widely from single digits to a few thousands. For example, Abney et al. (2000) used 5; Ittycheriah et al. (2001), 31; Hovy et al. (2001), 140; Harabagiu et al. (2001), 8,797.</Paragraph>
    <Paragraph position="2"> These taxonomies were mostly based on named entities and WordNet (Fellbaum 1998). Special types such definition questions (ex: &amp;quot;What is an atom?&amp;quot;) were added as necessary.</Paragraph>
    <Paragraph position="3"> (2) Passage or Sentence retrieval - this aims to provide a text pool of manageable size for extracting candidate answers. Most top performing systems in TRECs use their own retrieval methods for passages (Brill et al. 2001; Clarke et al. 2001; Harabagiu et al. 2001) or sentences (Hovy et al. 2001).</Paragraph>
    <Paragraph position="4"> (3) Candidate answer extraction - extract candidate answers according to answer types. If the expected answer types are typical named entities, information extraction engines (Bikel et al. 1999, Srihari and Li 2000) are used to extract candidate answers. Otherwise special answer patterns are used to pinpoint answers. For example, Soubbotin and Soubbotin (2001) create a set of 6 answer patterns for definition questions. (4) Answer ranking - assign scores to candidate answers according to their frequency in top ranked passages (Abney et al. 2000; Clarke et al. 2001), similarity to candidate answers extracted from external sources such as the web (Brill et al. 2001; Buchholz 2001) or WordNet (Harabagiu et al. 2001; Hovy et al. 2001), density, distance, or order of question keywords around the candidates, similarity between the dependency structures of questions and candidate answers (Harabagiu et al. 2001; Hovy et al. 2001; Ittycheriah et al. 2001), and match of expected answer types.</Paragraph>
    <Paragraph position="5"> In this paper, we describe an in-depth study of answer reranking for definition questions.</Paragraph>
    <Paragraph position="6"> Definition questions account for over 100 (20%) test questions in TREC-10. They are not named entities that have been the cornerstones of many .</Paragraph>
    <Paragraph position="7"> high performance QA systems (Srihari and Li 2000; Harabagiu et al. 2001).</Paragraph>
    <Paragraph position="8"> By reranking we mean the following. Assume a QA system such as Webclopedia (Section 3) provides an initial set of ranked candidate answers from the TREC corpus. The ranking is based on the IR engine's passage or sentence match scores. One can then measure the effectiveness of utilizing resources such as WordNet or the web to rerank the initial results, hoping to achieve better mean reciprocal rank (MRR) and percent of correctness in the top 5 (PTC5).</Paragraph>
    <Paragraph position="9"> Answer reranking is often overlooked. The answer candidates (&lt;= 400 instances per question) generated by Webclopedia from TREC corpus included answers for 83% of 102 definition questions used in this study (the TREC-10 definition questions). However, Webclopedia ranked only 64% of them in the top 5, giving an MRR score of 45%. If a perfect answer reranking function had been used, the best achievable MRR would have been 83% (an 84% increase over the original 45%).</Paragraph>
    <Paragraph position="10"> Section 2 gives a brief overview of TREC-10.</Paragraph>
    <Paragraph position="11"> Section 3 outlines the Webclopedia system.</Paragraph>
    <Paragraph position="12"> Section 4 defines definition questions and describes our dictionary and web-based reranking methods. Section 5 presents experiments and results. We conclude with lessons learned and future work.</Paragraph>
  </Section>
class="xml-element"></Paper>