<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2501">
  <Title>Strategies for Advanced Question Answering</Title>
  <Section position="3" start_page="0" end_page="1" type="intro">
    <SectionTitle>
Introduction
</SectionTitle>
    <Paragraph position="0"> : &amp;quot;How have thefts impacted on the safety of Russia's nuclear navy, and has the theft problem been increased or decreased over time?&amp;quot; we may have series of simpler questions that decompose the question focus. One such example of simple question is Q  a : &amp;quot;What specific instances of theft do we know about?&amp;quot; - which is a listquestion similar to those evaluated in the recent TREC tracks (Harabagiu et al., 2003). Related, simpler ques- null asks about instantiations of the theft events, whereas question Q  b inquires about the objects of the events. The decompositions may follow other arguments of the event predicates, e.g. - the agents in</Paragraph>
    <Paragraph position="2"> : &amp;quot;Who are the perpetrators of these thefts?&amp;quot; as well as specializations of the events, e.g. &amp;quot;economical impact&amp;quot; specializing one of the possible impacts of the thefts in the question Q  d : &amp;quot;Do thefts have an economical impact on the naval bases?&amp;quot;. Furthermore, the concepts from the complex question need to be clearly understood, and often definition questions will be considered as decompositions that enable the processing of complex questions. The definition may involve entities from the complex question, e.g. Q  : &amp;quot;What does 'impact' mean?&amp;quot; There are several criteria that guide question decomposition, which also determine the answer resolution strategies. The criteria are: 1. There are coordinations in the question format, suggesting decompositions along the constituents they coordinate. Coordinations may exist at: (a) question stem level, e.g. &amp;quot;When and where did the thefts occur?&amp;quot;; (b) at predicate level, e.g. &amp;quot;How does one define an increase or a decrease in the theft problem?&amp;quot;; (c) at argument level, e.g. &amp;quot;To what degree do different thefts put nuclear or radioactive materials at risk?&amp;quot;; (d) at question level, e.g. &amp;quot;What specific instances of theft do we know about, and what are the sources of this information?&amp;quot;. Question decomposition by identifying coordinations involves: (a) disambiguation of conjunctives for identifying when they indicate separate questions as opposed to when they just co-ordinate constituents; (b) reference and ellipsis resolution of anaphoric expressions in the original question; (c) recognition of the relations between the resulting, decomposed questions, e.g. contrast, reinforcement, mutual exclusion.</Paragraph>
    <Paragraph position="3"> 2. The question asks about (a) a complex relation, e.g. cause-effect, resultative, trend, likelihood, (b) comparison with similar situations, or (c) elaboration of a state of affairs. Therefore the expected answer type is of complex nature and it requires definitions in the context of the complex scenario. The expected answer, recognized in a predicate from the question, determines the decomposition into (a) a definition question, (b) specializations of the predicate-concept, and (c) examples.</Paragraph>
    <Paragraph position="4">  3. In order to search for the complex answer, elaborations of its arguments are needed. Such elaborations, called argument-answer decompositions, may involve (a) nested predicate-argument structures, (b) quantifications, or (c) instantiations.</Paragraph>
    <Paragraph position="5">  When a complex question is processed, and is decomposed into a set of simpler questions which are analyzed independently. Each decomposed question may belong to a different class, for which certain strategies may be optimal. Such strategies implement the pragmatic processes that interact with the syntactic and semantic information that results from the derivation of: (1) expected answer types or structures, (2) name entities which are recognized, as well as (3) syntactic and semantic dependencies derived from the parsing of the question into predicate-argument structures. To be able to process the question precisely we are developing techniques that leverage a database of one million questions that have answers in a controlled corpus. This large database provides wide coverage of answer types and answer instances. It also enhances the retrieval, navigation and fusion of partial answers.</Paragraph>
    <Paragraph position="6"> The challenge of creating a set of approximately one million question and answer pairs are twofold. First, the pairs need to be diverse in terms of difficulty, where difficulty can be defined in terms of answer type complexity (common, uncommon, requiring decomposition), answer granularity (concentrated within a small fragment or spread across several passages and documents), ease of matching (requiring both surface-text and deep semantic understanding). Second, the pairs should be reliable, i.e. each question must be associated with a correct answer. Our solution is a combination of collection and generation from semi-structured resources, followed by expansion and validation. We will generate the collection of QA pairs from Frequently Asked Questions (FAQ) files on various topics. We will develop a dedicated harvesting algorithm to identify FAQ's on the Web and extract the QA pairs.</Paragraph>
    <Paragraph position="7"> The large database of questions also allows us to create a benchmark that will support the development of statistical techniques for Q/A. The architecture of the benchmark system is illustrated in Figure 1. Our system selects answers based on (1) question processing strategies; (2) passage retrieval strategies made possible by (3) question decomposition and (4) answer fusion.</Paragraph>
    <Paragraph position="8"> When a question is posed to the system, it is either decomposed on a set of simpler questions or it is processed in parallel with similar questions provided by the Interactive Question Answering component. Based on the user background, a set of similar questions may be selected and analyzed in parallel. Multiple strategies are available for retrieving relevant passages. The possible selections are once again dictated by feedback from  interactions with the user. The relevant passages may also be combined on the basis of the same interactive and background information. We propose to study and develop several kernel methods that can operate in Support Vector Machines for determining the optimal strategies and compare the results with the Maximum Entropy combinations reported in (Echihabi and Marcu, 2003). The answer is produced by an answer fusion module that uses fusion operators. Since such operators are template-like, pattern acquisition methods may be employed for acquiring them.</Paragraph>
    <Paragraph position="9"> The rest of the paper is organized as follows. The answer fusion strategies are presented in Section 2. Section 3details the methods for bootstrapping Question Answering. Section 4 describes the impact of the user background on the pragmatics of Q/A. Section 5 presents the problems engendered by processing negations in Question Answering. Section 6 summarizes the conclusions. null</Paragraph>
  </Section>
class="xml-element"></Paper>