<?xml version="1.0" standalone="yes"?> <Paper uid="P04-3018"> <Title>Resource Analysis for Question Answering</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Approach </SectionTitle> <Paragraph position="0"> For the purpose of this paper, resources consist of structured and semi-structured knowledge, such as the Web, web search engines, gazetteers, and encyclopedias. Although many QA systems incorporate or access such resources, few systems quantify individual resource impact on their performance and little work has been done to estimate bounds on resource impact to Question Answering. Independent of a specific QA system, we quantify the degree to which these resources are able to directly provide answers to questions.</Paragraph> <Paragraph position="1"> Experiments are performed on the 2,393 questions and the corresponding answer keys provided through NIST (Voorhees, 2003) as part of the TREC</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 8 through TREC 12 evaluations. 4 Gazetteers </SectionTitle> <Paragraph position="0"> Although the Web consists of mostly unstructured and loosely structured information, the available structured data is a valuable resource for question answering. Gazetteers in particular cover several frequently-asked factoid question types, such as &quot;What is the population of X?&quot; or &quot;What is the capital of Y?&quot;. The CIA World Factbook is a database containing geographical, political, and economical profiles of all the countries in the world. We also analyzed two additional data sources containing astronomy information (www.astronomy.com) and detailed information about the fifty US states (www.50states.com).</Paragraph> <Paragraph position="1"> Since gazetteers provide up-to-date information, some answers will differ from answers in local corpora or the Web. Moreover, questions requiring interval-type answers (e.g. &quot;How close is the sun?&quot;) may not match answers from different sources which are also correct. Gazetteers offer high precision answers, but have limited recall since they only cover a limited number of questions (See answered directly by gazetteers - shown are results for CIA Factbook and All gazetteers combined. Our extractor precision is Precision (P).</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 WordNet </SectionTitle> <Paragraph position="0"> Wordnets and ontologies are very common resources and are employed in a wide variety of direct and indirect QA tasks, such as reasoning based on axioms extracted from WordNet (Moldovan et al., 2003), probabilistic inference using lexical relations for passage scoring (Paranjpe et al., 2003), and answer filtering via WordNet constraints (Leidner et al., 2003).</Paragraph> <Paragraph position="1"> WordNet glosses (Gloss), synonyms (Syns), hypernyms and hyponyms (Hyper), and all of them combined All.</Paragraph> <Paragraph position="2"> Table 2 shows an upper bound on how many TREC questions could be answered directly using WordNet as an answer source. Question terms and phrases were extracted and looked up in WordNet glosses, synonyms, hypernyms, and hyponyms. If the answer key matched the relevant WordNet data, then an answer was considered to be found. 
<Section position="7" start_page="0" end_page="0" type="metho">
<SectionTitle> 6 Structured Data Sources </SectionTitle>
<Paragraph position="0"> Encyclopedias, dictionaries, and other web databases are structured data sources that are often employed in answering definitional questions (e.g., &quot;What is X?&quot;, &quot;Who is X?&quot;). The top-performing definitional systems at TREC (Xu et al., 2003) extract kernel facts similar to question profiles built using structured and semi-structured resources, such as WordNet (Miller et al., 1990) and the Merriam-Webster dictionary. Table 3 shows a number of data sources and their impact on answering TREC questions. N-grams were extracted from each question and run through Wikipedia and Google's define operator (which searches specialized dictionaries, definition lists, glossaries, abbreviation lists, etc.). Table 3 shows that TREC 10 and 11 questions benefit the most from the use of an encyclopedia, since they include many definitional questions. On the other hand, since TREC 12 has fewer definitional questions and more procedural questions, it does not benefit as much from Wikipedia or Google's define operator.</Paragraph>
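The n-gram lookup described above can be sketched as follows. This is not the authors' implementation: fetch_definition is a hypothetical stand-in for querying Wikipedia or Google's define operator, and the stop-word list is illustrative; only the n-gram extraction (up to trigrams) and the answer-key matching mirror the procedure described in the text.

```python
# Sketch of the n-gram definition lookup from Section 6 (not the paper's code).
import re

STOP = {"what", "who", "is", "was", "the", "a", "an", "of", "how", "did"}

def question_ngrams(question, max_n=3):
    """Extract word n-grams (up to trigrams) from a question, skipping stop words."""
    words = [w for w in re.findall(r"[a-z0-9']+", question.lower()) if w not in STOP]
    grams = []
    for n in range(1, max_n + 1):
        grams.extend(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return grams

def fetch_definition(phrase):
    """Hypothetical lookup; replace with a real Wikipedia or define-operator query."""
    toy_glossary = {"viscosity": "a measure of a fluid's resistance to flow"}
    return toy_glossary.get(phrase, "")

def covered_by_definitions(question, answer_patterns):
    """True if any answer pattern appears in a definition of any question n-gram."""
    for gram in question_ngrams(question):
        definition = fetch_definition(gram).lower()
        if definition and any(p.lower() in definition for p in answer_patterns):
            return True
    return False

# Example: the toy glossary entry for "viscosity" contains the answer pattern.
print(covered_by_definitions("What is viscosity?", ["resistance to flow"]))
```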
<Paragraph position="1"> To test coverage of different answer types, we employed the top level of the answer type hierarchy used by the JAVELIN system (Nyberg et al., 2003). The most frequent types are: definition (e.g. &quot;What is viscosity?&quot;), person-bio (e.g. &quot;Who was Lacan?&quot;), object (e.g. &quot;Name the highest mountain.&quot;), process (e.g. &quot;How did Cleopatra die?&quot;), lexicon (e.g. &quot;What does CBS stand for?&quot;), temporal (e.g. &quot;When is the first day of summer?&quot;), numeric (e.g. &quot;How tall is Mount Everest?&quot;), location (e.g. &quot;Where is Tokyo?&quot;), and proper-name (e.g. &quot;Who owns the ...?&quot;). When coverage is broken down by answer type, numeric questions are not covered very well, due to temporal consistency issues. Although the process and object types are broad answer types, the coverage is still reasonably good. As expected, the definition and person-bio answer types are covered well by these resources.</Paragraph>
<Paragraph position="2">
8 The Web as a Resource
An increasing number of QA systems are using the web as a resource. Since the Web is orders of magnitude larger than local corpora, answers occur frequently in simple contexts, which is more conducive to the retrieval and extraction of correct, confident answers (Clarke et al., 2001; Dumais et al., 2002; Lin and Katz, 2003). The web has been employed for pattern acquisition (Ravichandran et al., 2003), document retrieval (Dumais et al., 2002), query expansion (Yang et al., 2003), structured information extraction, and answer validation (Magnini et al., 2002). Some of these approaches enhance existing QA systems, while others simplify the question answering task, allowing a less complex approach to find correct answers.</Paragraph>
<Section position="1" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 8.1 Web Documents </SectionTitle>
<Paragraph position="0"> Instead of searching a local corpus, some QA systems retrieve relevant documents from the web (Xu et al., 2003). Since the density of relevant web documents can be higher than the density of relevant local documents, answer extraction may be more successful from the web. For a TREC evaluation, answers found on the web must also be mapped to relevant documents in the local corpus.</Paragraph>
<Paragraph position="1"> [Figure 1: Density of web documents containing a correct answer, and rank of the first relevant document.]</Paragraph>
<Paragraph position="2"> In order to evaluate the impact of web documents on TREC questions, we performed an experiment where simple queries were submitted to a web search engine. The questions were tokenized and filtered using a standard stop word list. The resulting keyword queries were used to retrieve 100 documents through the Google API (www.google.com/api). Documents containing the full question, question number, references to TREC, NIST, AQUAINT, Question Answering, and similar content were filtered out.</Paragraph>
<Paragraph position="3"> Figure 1 shows the density of documents containing a correct answer, as well as the rank of the first document containing a correct answer. The simple word query retrieves a relevant document for almost half of the questions. Note that for most systems, retrieval performance should be superior, since queries are usually more refined and additional query expansion is performed. However, this experiment provides an intuition and a very good lower bound on the precision and density of current web documents for the TREC QA task.</Paragraph>
</Section>
<Section position="2" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 8.2 Web-Based Query Expansion </SectionTitle>
<Paragraph position="0"> Several QA systems participating at TREC have used search engines for query expansion (Yang et al., 2003). The basic query expansion method utilizes pseudo-relevance feedback (PRF) (Xu and Croft, 1996). Content words are selected from questions and submitted as queries to a search engine. The top n retrieved documents are selected, and k terms or phrases are extracted according to an optimization criterion (e.g. term frequency, n-gram frequency, average mutual information using corpus statistics, etc.). These k items are used in the expanded query.</Paragraph>
<Paragraph position="1"> We experimented by using the top 5, 10, 15, 20, 50, and 100 documents retrieved via the Google API for each question, and extracted the most frequent fifty n-grams (up to trigrams); the experiment was applied to the 2,183 questions for which answer keys exist. The goal was to determine the quality of query expansion as measured by the density of correct answers already present in the expansion terms. Even without filtering the n-grams by expected answer type, simple PRF produces the correct answer in the top n-grams for more than half the questions. The best correct answer density is achieved using PRF with only 20 web documents.</Paragraph>
</Section>
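A minimal sketch of the PRF-style expansion just described, under the assumption that retrieve_documents is wired to some web search client (the Google API used in the paper is not modeled here; the toy snippet it returns is only so the sketch runs). It selects content words, counts n-grams up to trigrams in the top documents, keeps the fifty most frequent, and checks whether a correct answer is already among them.

```python
# Sketch of PRF expansion-term extraction from Section 8.2 (not the paper's code).
import re
from collections import Counter

STOP = {"the", "a", "an", "of", "and", "to", "in", "is", "was", "for", "on", "how", "what"}

def retrieve_documents(query, n_docs):
    """Hypothetical retrieval hook; replace with a real web search client.
    Returns a toy snippet here so the sketch runs end to end."""
    return ["Mount Everest is about 8850 meters tall, the tallest mountain."][:n_docs]

def expansion_terms(question, n_docs=20, k=50, max_n=3):
    """Return the k most frequent n-grams (up to trigrams) from the top documents."""
    content_words = [w for w in re.findall(r"[a-z0-9']+", question.lower())
                     if w not in STOP]
    counts = Counter()
    for doc in retrieve_documents(" ".join(content_words), n_docs):
        tokens = [w for w in re.findall(r"[a-z0-9']+", doc.lower()) if w not in STOP]
        for n in range(1, max_n + 1):
            counts.update(" ".join(tokens[i:i + n])
                          for i in range(len(tokens) - n + 1))
    return [gram for gram, _ in counts.most_common(k)]

def answer_in_expansion(question, answer_patterns, **kwargs):
    """Check whether any answer pattern already appears among the expansion terms."""
    terms = set(expansion_terms(question, **kwargs))
    return any(pattern.lower() in terms for pattern in answer_patterns)

# Example: "8850 meters" surfaces among the PRF n-grams for this question.
print(answer_in_expansion("How tall is Mount Everest?", ["8850 meters"]))
```

The choice of 20 documents as the default mirrors the observation above that correct-answer density peaks with PRF over only 20 web documents.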
<Section position="3" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 8.3 Conclusions </SectionTitle>
<Paragraph position="0"> This paper quantifies the utility of well-known and widely used resources such as WordNet, encyclopedias, gazetteers, and the Web for question answering. The experiments presented in this paper represent loose bounds on the direct use of these resources in answering TREC questions. We reported the performance of these resources on different TREC collections and on different question types. We also quantified web retrieval performance, and confirmed that the web contains a consistently high density of relevant documents containing correct answers, even when simple queries are used. The paper also shows that pseudo-relevance feedback alone, using web documents for query expansion, can produce a correct answer for fifty percent of the questions examined.</Paragraph>
</Section>
</Section>
</Paper>