<?xml version="1.0" standalone="yes"?>
<Paper uid="H01-1021">
  <Title>Evaluating Question-Answering Techniques in Chinese</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> A number of techniques for &quot;question answering&quot; have recently been evaluated both in the TREC environment (Voorhees and Harman, 1999) and in the DARPA TIDES program. In the standard approach to information retrieval, relevant text documents are retrieved in response to a query. The parts of those documents that may contain the most useful information or even the actual answer to the query are typically indicated by highlighting occurrences of query words in the text. In contrast, the task of a question-answering system is to identify text passages containing the relevant information and, if possible, extract the actual answer to the query. Question answering has a long history in natural language processing, and Salton's first book (Salton, 1968) contains a detailed discussion of the relationship between information retrieval and question-answering systems. The focus in recent research has been on extracting answers from very large text databases, and many of the techniques use search technology as a major component. A significant number of the queries used in information retrieval experiments are questions, for example, TREC topic 338 &quot;What adverse effects have people experienced while taking aspirin repeatedly?&quot; and topic 308 &quot;What are the advantages and/or disadvantages of tooth implants?&quot; In question-answering experiments, the queries tend to be more restricted questions, where answers are likely to be found in a single text passage, for example, TREC question-answering question 11 &quot;Who was President Cleveland's wife?&quot; and question 14 &quot;What country is the biggest producer of Tungsten?&quot; The TREC question-answering experiments have, to date, used only English text. As the first step towards our goal of cross-lingual question answering, we investigated whether the general approaches to question answering that have been used in English will also be effective for Chinese.</Paragraph>
    <Paragraph position="1"> Although it is now well known that statistical information retrieval techniques are effective in many languages, earlier research, such as Fujii and Croft (1993, 1999), was helpful in pointing out which techniques were particularly useful for languages like Japanese. This research was designed to provide similar information for question answering. In the next section, we describe the components of the Chinese question-answering system (Marsha) and the algorithm used to determine answers. In section 3, we describe an evaluation of the system using queries obtained from Chinese students and the TREC-9 Chinese cross-lingual database (164,779 documents from the People's Daily and the Xing-Hua news agencies in the period 1991-1995).</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2. Overview of the Marsha Question Answering System
</SectionTitle>
      <Paragraph position="0"> The Chinese question-answering system consists of three main components. These are the query processing module, the Hanquery search engine, and the answer extraction module. The query processing module recognizes known question types and formulates queries for the search engine.</Paragraph>
      <Paragraph position="1"> The search engine retrieves candidate texts from a large database. The answer extraction module identifies text passages that are likely to contain answers and extracts answers, if possible, from these passages. This system architecture is very similar to other question-answering systems described in the literature.</Paragraph>
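      <Paragraph> As a rough illustration of the three-stage architecture described above (this sketch is editorial and not from the paper; all function and class names are hypothetical), the pipeline could be wired together as follows:
# Hypothetical Python sketch of the three-stage pipeline: query processing,
# retrieval with a search engine, then answer extraction. Names are
# illustrative stand-ins, not the actual Marsha implementation.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProcessedQuery:
    question_type: str         # e.g. "MONEY" or "PERSON"
    question_words: List[str]  # terms passed on to the search engine


def process_query(question: str) -> ProcessedQuery:
    """Recognize the question type and formulate a search-engine query."""
    raise NotImplementedError  # template matching, sketched below


def retrieve_candidates(query: ProcessedQuery, top_n: int = 50) -> List[str]:
    """Retrieve candidate texts from the document database."""
    raise NotImplementedError  # delegated to the Hanquery search engine


def extract_answer(query: ProcessedQuery, candidates: List[str]) -> Optional[str]:
    """Identify passages likely to contain an answer and extract it if possible."""
    raise NotImplementedError


def answer_question(question: str) -> Optional[str]:
    processed = process_query(question)
    candidates = retrieve_candidates(processed)
    return extract_answer(processed, candidates)
</Paragraph>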
      <Paragraph position="2"> More specifically, the query processing module carries out the following steps: (1) The query is matched with templates to decide the question type and the &quot;question words&quot; in the query. We define 9 question types. Most of these correspond to typical named entity classes used in information extraction systems. For each question type, there are one or more templates.</Paragraph>
      <Paragraph position="3"> Currently there are 170 templates. If more than one template matches the question, we pick the longest match.</Paragraph>
      <Paragraph position="4"> For example, a question may include the Chinese phrase for &quot;how many dollars&quot;. Both the template for &quot;how many dollars&quot; and the template for &quot;how many&quot; will then match the question. In this case, we pick the longer match and assign &quot;MONEY&quot; as the question type.</Paragraph>
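      <Paragraph> A minimal sketch of the longest-match rule just described, using illustrative English patterns in place of the paper's 170 Chinese templates (the patterns and type labels below are assumptions for illustration only):
# Hypothetical sketch: pick the longest matching template and its question type.
import re
from typing import Optional, Tuple

# Illustrative stand-in templates; each maps a pattern to a question type.
TEMPLATES = [
    (re.compile(r"how many dollars", re.IGNORECASE), "MONEY"),
    (re.compile(r"how many", re.IGNORECASE), "NUMBER"),
    (re.compile(r"who (is|was)", re.IGNORECASE), "PERSON"),
]


def classify_question(question: str) -> Optional[Tuple[str, str]]:
    """Return (matched text, question type) for the longest matching template."""
    best = None
    for pattern, question_type in TEMPLATES:
        match = pattern.search(question)
        if match and (best is None or len(match.group(0)) > len(best[0])):
            best = (match.group(0), question_type)
    return best


# A question containing "how many dollars" matches both the MONEY and NUMBER
# templates; the longer match wins, so the assigned type is MONEY.
print(classify_question("How many dollars did the ticket cost?"))
</Paragraph>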
      <Paragraph position="5"> The following table gives examples for each question type:</Paragraph>
    </Section>
  </Section>
</Paper>