<?xml version="1.0" standalone="yes"?> <Paper uid="W04-2508"> <Title>Experiments with Interactive Question Answering in Complex Scenarios</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> COMPLEX QUESTION: </SectionTitle> <Paragraph position="0"> What is the current status of India's Prithvi ballistic missile project?</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> DECOMPOSITION </SectionTitle> <Paragraph position="0">
(1) (a) How should 'India' be identified?
(1) (b) Pre-independence or post-independence, post-colonial, or post-1947 India?
(2) (a) What is 'Prithvi'?
(2) (b) What does Prithvi mean?
(2) (c) What class of missiles does Prithvi belong to?
(2) (d) What is its range/payload, and other technical details?
(3) (a) What is the meaning of 'status'?
(3) (b) Does status mean research and development, flight-tests, user-trials, ...?
The decompositions presented in Figure 1 introduce a number of novel challenges for Q/A systems. Three are discussed below:
Clarification Questions. Questions like What is the meaning of &quot;status&quot;? represent a new challenge for current Q/A systems. Unlike TREC-style definition questions, this class of questions (which we refer to as clarification questions) seeks to identify the most domain-specific characterization available for the concept, entity, or term in focus. Although informationally &quot;simple&quot;, answers to these questions depend on implicit domain-specific knowledge that can only be supplied by an interactive Q/A system. In order to answer a question like What is the meaning of &quot;status&quot;?, a system must be able (1) to identify the differences between the domain-specific and the domain-general characterization of the focal item, (2) to recognize which domain-specific sense the user is seeking, and finally (3) to return information that will help the user understand all of the domain-dependent semantic entailments of the term.</Paragraph> <Paragraph position="1"> Alternative Set Questions. Questions produced as part of a scenario decomposition often ask a system to distinguish between several different possible alternatives for the characterization of an entity. Faced with a question like How should &quot;India&quot; be identified? Pre-independence or post-independence? Post-colonial or post-1947 India?, the Q/A system must not only identify the named entity India but must also return enough contextual information to determine which of the named set of entities should be considered most relevant to the current contextual scenario.</Paragraph> <Paragraph position="2"> Although the set of alternatives can be overtly stipulated by the user, an interactive Q/A system should ideally possess the domain-specific knowledge and the inferential capacity to generate these kinds of alternative sets automatically. Although a set of alternatives may be extractable from a highly-specified semantic ontology for a question like What is the meaning of &quot;status&quot;?, it is unlikely that such an ontology can be used to derive the different instantiations of India listed in How should &quot;India&quot; be identified?.
In this latter case, the system would have to (1) decide whether some sort of differentiation was necessary between the available instantiations, (2) identify which of the set of instantiations were the most relevant alternatives, and finally, (3) determine which instantiation should be used to retrieve the answer.</Paragraph> <Paragraph position="3"> Context-Dependent Ellipsis. Questions that involve syntactic ellipsis must be answered in context.</Paragraph> <Paragraph position="4"> With a question like What does &quot;Prithvi&quot; mean?, the system must recognize that semantic meaning is evaluated with regard to a language (here, any of those spoken on the Indian sub-continent). The system must also be able to (1) identify examples of implicit syntactic ellipsis, (2) determine the semantic type of the syntactically-elided information, and finally, (3) supply the contextually-relevant members of that semantic class needed to return the answer.</Paragraph> <Paragraph position="5"> Based on the initial observations above, we conclude that a careful analysis of the questions generated by scenario decompositions needs to be conducted to identify new types of questions that cannot be processed by current Q/A systems. By expanding the coverage of Q/A systems for these kinds of &quot;informationally simple&quot; questions, we expect future Q/A systems to be better positioned to process questions with more complex informational goals. A careful examination of the question decompositions generated by expert users can help us better understand what kinds of domain-specific knowledge should be made available to an interactive Q/A system.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 A Practical Solution </SectionTitle> <Paragraph position="0"> The goal of question decomposition is to translate complex questions into simpler questions that have identifiable answer types. Effective question decomposition does not guarantee answers, however: current Q/A systems are only able to provide answers for approximately 55% of simple (i.e. factoid, definition, and list) questions.</Paragraph> <Paragraph position="1"> For most state-of-the-art Q/A systems, correct answers are returned only if the system identifies the correct answer type from the syntax and semantics of the question itself.</Paragraph> <Paragraph position="2"> Although current answer-type hierarchies can be fairly broad in their coverage of concepts, they do not provide an exhaustive treatment of all of the types of information that users can request for any particular domain. In LCC's current Q/A system (Harabagiu, Moldovan, et al., 2003), no answer type could be detected for questions like What business was the source of John D. Rockefeller's fortune? (TREC-1909) or What 1857 U.S. Supreme Court decision denied that blacks were citizens? (TREC-2259). The failure of our system to return answer types for these questions was attributed to identifiable gaps in our semantic ontology of answer types. By revising our answer type hierarchy to include classes of businesses or Supreme Court decisions, we could presumably enable our system to identify a viable answer type for each of these questions, and thereby improve our chances of returning a correct answer to the user.</Paragraph>
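To make this failure mode concrete, here is a minimal sketch of answer-type detection over a small fixed hierarchy; the patterns and type names are illustrative stand-ins, not the actual ontology or rules of LCC's system. A question whose type is missing from the hierarchy simply falls through:

```python
# Illustrative sketch of answer-type detection over a fixed hierarchy.
# The patterns and type names below are hypothetical examples, not the
# ontology actually used in LCC's system.
import re
from typing import Optional

ANSWER_TYPE_PATTERNS = [
    (re.compile(r"^who\b", re.I), "PERSON"),
    (re.compile(r"^where\b", re.I), "LOCATION"),
    (re.compile(r"^when\b|\bwhat year\b", re.I), "DATE"),
    (re.compile(r"^how (many|much)\b", re.I), "QUANTITY"),
]

def detect_answer_type(question: str) -> Optional[str]:
    """Return the first matching answer type, or None on an ontology gap."""
    for pattern, answer_type in ANSWER_TYPE_PATTERNS:
        if pattern.search(question):
            return answer_type
    return None

print(detect_answer_type("Who invented the telephone?"))  # -> PERSON
# Falls through the hierarchy, mirroring the TREC-1909 failure mode:
print(detect_answer_type(
    "What business was the source of John D. Rockefeller's fortune?"))  # -> None
```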
<Paragraph position="3"> However, the challenge of expanding an answer type hierarchy becomes considerably more difficult when we start considering the very specific semantic ontologies that would need to be added to a hierarchy to account for specific domains such as the development of Prithvi missiles in India, opium production in Afghanistan, or AIDS in Africa. Without expert input into ontology creation for each of these domains, NLP researchers can have only a limited idea of the conceptual knowledge that needs to be added to the answer type hierarchy in order to improve Q/A for each of these domains.</Paragraph> <Paragraph position="4"> Given these considerations, we were able to improve our coverage of domain-specific factoid questions by incorporating a database of 342 question-answer pairs (related to a series of specific domains) into our Q/A system. We used this database, known as the QUestion-Answer Base or QUAB, to measure the conceptual similarity of new questions to question-answer pairs already listed in QUAB. In the absence of a highly-articulated answer-type hierarchy, we assumed that questions that exhibited a high degree of similarity necessarily encoded similar types of information. When a new question was judged to be conceptually similar to a question in QUAB, the QUAB question's answer was returned to users as a potential answer. Questions that were not similar to any existing question in QUAB were submitted automatically to our Q/A system without providing any additional feedback to users. This type of methodology allowed us to develop a series of Just-in-Time Information Search Agents (JITISAs) which exploited different measures of conceptual and lexical similarity in order to identify answers to domain-specific questions.</Paragraph> </Section> </Section>
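A minimal sketch of this lookup-with-fallback strategy is given below; the bag-of-words cosine similarity, the 0.5 threshold, and the sample QUAB entry are illustrative assumptions, standing in for the unspecified conceptual- and lexical-similarity measures the JITISAs actually used:

```python
# Sketch of the QUAB fallback strategy: answer a new question from the most
# similar stored question/answer pair when similarity clears a threshold,
# otherwise defer to the ordinary Q/A pipeline.
from collections import Counter
from math import sqrt

def cosine(q1: str, q2: str) -> float:
    """Bag-of-words cosine similarity (a stand-in measure)."""
    a, b = Counter(q1.lower().split()), Counter(q2.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical QUAB entries; answer text elided.
QUAB = [
    ("How does the US know about the existence of biological weapons plants in Iraq?",
     "..."),
]

def run_qa_system(question: str) -> str:
    return "<answer from the full Q/A pipeline>"  # placeholder stub

def answer(question: str, threshold: float = 0.5) -> str:
    best_answer, best_sim = None, 0.0
    for quab_q, quab_a in QUAB:
        sim = cosine(question, quab_q)
        if sim > best_sim:
            best_answer, best_sim = quab_a, sim
    if best_answer is not None and best_sim >= threshold:
        return best_answer  # returned to the user as a potential answer
    return run_qa_system(question)  # no QUAB match; run the pipeline normally
```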
<Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> COMPLEX QUESTION: TOPIC DECOMPOSITION + USER DECOMPOSITION </SectionTitle> <Paragraph position="0">
COMPLEX QUESTION: Assuming another terrorist attack occurs on US soil, is it more likely to be a conventional attack or an attack using a weapon of mass destruction?
DECOMPOSITION:
What types of weapons are available?
What kinds of attacks similar to the 9-11 attack are possible?
What defenses are there against conventional attacks?
What types were used in the past, and where did they attack?
What evidence of WMD capability by terrorists exists?
What weapons have been used by terrorists in the past?
What threats have been made? What sorts of threats were they, and where were they targeted at?
What unconventional weapons do those groups own?
What conventional weapons do those groups own?
What sorts of people will they target? What are their specific goals?
What organizations are interested in attacking targets on US soil?
How difficult is it to engage in a 9-11 style attack?
How difficult is it to obtain conventional weapons?
What kinds of conventional weapons are available?
What kinds of attacks have been thwarted recently?
What are the major targets on US soil?
What are the risks associated with WMD use? How credible are these threats?
What kind of delivery systems do they have?
What are the capabilities of those organizations?
What kinds of WMD are available?
What defenses are there against WMD attacks?
How difficult is it to engage in attacks on the US?</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.3 Users of Complex Q/A </SectionTitle> <Paragraph position="0"> The performance of interactive Q/A systems can be improved by identifying what strategies different users employ to reach their informational goals. We define an informational goal as the propositional knowledge that a user is trying to obtain by participating in a dialogue with a Q/A system. We suggest that the representation of an informational goal depends crucially on the specific knowledge that individual users bring to the interaction with the Q/A system. Users who possess little or no knowledge of a particular domain will necessarily seek a different level of information than users who are intimately familiar with the domain. Based on these assumptions, we propose that interactive Q/A systems be sensitive to two kinds of users: (1) expert users, who may be expected to interact with the system based on a working knowledge of the semantic ontology underlying a domain, and (2) novice users, who are expected to have no foreknowledge of the ontological categories specific to the domain. By examining the differences in information-seeking techniques employed by expert users (such as intelligence analysts) and novice users (such as NLP researchers), we can better identify user intentions and work towards anticipating the information needs of any user.</Paragraph> </Section> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Question Decompositions </SectionTitle> <Paragraph position="0"> We suggest that question decomposition can be approached in two ways. The first approach generates a set of questions related to the complex question by maximizing the extraction of information related to the domain. In this way, the Q/A user is provided with full coverage of the information associated with the concepts expressed in the complex question. This methodology seeks to approximate domain-specific knowledge. The idea is that by caching information associated with the domain, domain coverage is maximized and the likelihood that the retrieved answers meet the user's information needs is enhanced. The questions that extract relevant domain information are clustered into related sub-topics and generate a bottom-up decomposition of the complex question.</Paragraph> <Paragraph position="1"> The second approach generates a top-down decomposition by monitoring user strategies for decomposition. The purpose is to derive general relations between topic-specific questions and the subquestions that they entail. Such relations are discovered by combining domain-specific knowledge with general coherence relations. The domain knowledge selects decompositions viable in the context of a domain scenario, whereas the coherence relations connect questions of different levels of complexity.</Paragraph> <Paragraph position="2"> In recognition of these diverse goals, we hypothesize that research in question decomposition should follow two parallel tracks: topic-centric and user-centric. These two proposed strategies have different strengths.
The user-centric strategy mimics the user's intentions when resolving an information-seeking task but may miss relevant information, since not all the right questions may be covered. In contrast, the topic-centric strategy yields good recall, but it relies on similarity functions that are hard to encode.</Paragraph> <Paragraph position="3"> The rest of this section is organized as follows: Section 3.1 presents topic-centric work, Section 3.2 presents user-centric work, and Section 3.3 speculates about the contribution of each form of decomposition to interactions with the Q/A system. We argue that there are optimal ways of combining advances in each approach to provide a unified treatment of question decomposition.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Topic-Centric Method </SectionTitle> <Paragraph position="0"> Answers to a complex question are retrieved from a set of topic-relevant documents. In our experiments, we used two sets of such documents. For the first pilot evaluations, we created our own corpus of documents relevant to the proposed topics. The corpus combined documents from Lexis-Nexis with documents we gathered from the Internet. The relevance of the documents was determined by the presence of certain concepts we deemed characteristic of each domain. In the second pilot, we used the documents provided by the Center for Non-Proliferation Studies (CNS) that were considered relevant based on the concepts that could be derived from the complex questions and their decompositions.</Paragraph> </Section> </Section> <Section position="8" start_page="0" end_page="0" type="metho"> <SectionTitle> COMPLEX QUESTION: </SectionTitle> <Paragraph position="0"> Despite having complete access, to this day UN inspections have been unable to find any biological weapons, or remnants thereof, in Iraq. Why has it proven difficult to discover hard information about Iraq's biological weapons program and what are the implications of these difficulties for the international biological arms control regime?</Paragraph> </Section> <Section position="9" start_page="0" end_page="0" type="metho"> <SectionTitle> RELEVANT CONCEPTS </SectionTitle> <Paragraph position="0">
UN inspections; complete access; biological weapons; biological weapons program; difficulty; biological arms control regime
GENERATED QUESTIONS:
What is the nature of the UN inspector team?
What kinds of technology do UN inspectors have?
Do UN inspectors have access to all public, private and government facilities?
How might the nature of Iraq aid the government in hiding a bioweapons program?
Does the natural terrain provide natural hiding places for bioweapons?
How likely is it that Iraq could destroy the bioweapons program with no trace?
As illustrated in Figure 3, topic-relevant concepts guide the generation of questions that are easier to process. Because the simpler questions contain concepts that also appear in the complex question, they remain related to it.</Paragraph> <Paragraph position="1"> Figure 3 shows several questions created by question patterns for which (1) at least one text snippet in the collection matched a trigger word and (2) that snippet contained at least one of the relevant concepts. When the relevant concept was &quot;UN inspectors&quot;, the trigger words were &quot;team&quot;, &quot;technology&quot;, and &quot;facilities&quot;, which are typical of inspections.</Paragraph>
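The following sketch illustrates this pattern-and-trigger generation scheme; the trigger words follow the &quot;UN inspectors&quot; example above, but the templates and matching logic are illustrative assumptions rather than the system's actual rules:

```python
# Sketch of topic-centric question generation: a pattern fires only when
# (1) some text snippet contains its trigger word and (2) the snippet also
# mentions a topic-relevant concept. Patterns and triggers are illustrative.

QUESTION_PATTERNS = [
    # (trigger word, question template)
    ("team",       "What is the nature of the {concept} team?"),
    ("technology", "What kinds of technology do {concept} have?"),
    ("facilities", "Do {concept} have access to all public, private and "
                   "government facilities?"),
]

def generate_questions(snippets, relevant_concepts):
    questions = []
    for snippet in snippets:
        text = snippet.lower()
        for concept in relevant_concepts:
            if concept.lower() not in text:
                continue  # condition (2): the snippet must mention the concept
            for trigger, template in QUESTION_PATTERNS:
                if trigger in text:  # condition (1): a trigger word matched
                    questions.append(template.format(concept=concept))
    return sorted(set(questions))

# Hypothetical snippet; yields the "team" and "technology" questions above.
snippets = ["The UN inspectors team was given new detection technology ..."]
print(generate_questions(snippets, ["UN inspectors"]))
```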
<Paragraph position="2"> Questions generated by topic-centric decomposition can be related to multiple relevant concepts. Figure 3 also illustrates a set of questions that relate both to the &quot;Iraq&quot; concept and to the &quot;bioweapons program&quot; concept; the latter concept is a synonym of the concept &quot;biological weapons program&quot;.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 User-Centric Method </SectionTitle> <Paragraph position="0"> Different users might decompose a complex question differently. By analyzing the complex question and the questions produced by each user, we can explore several different paths for finding the information relevant to a complex question. Additionally, the different paths indicate the kind of topic knowledge each user has available, as well as each user's level of expertise.</Paragraph> <Paragraph position="1"> In analyzing the complex question, we focus on (1) the focus of the question; (2) the context of the question; and (3) the implied results. Since complex questions may consist of multiple sentences and interrogatives, we produce such three-dimensional structures for each sentence/interrogative of the complex question, as illustrated in Figure 4 (a schematic rendering is sketched at the end of this subsection). Figure 4 also lists a set of questions that may be derived from the structure associated with each question constituent. These questions are of several different types. Some can be cast as definition questions, e.g. What is a biological weapon?. Other questions are based on knowledge of each sub-topic; for example, Did Iraq violate any international law? implies that international laws govern the international biological arms control regime.</Paragraph> </Section> </Section> <Section position="10" start_page="0" end_page="0" type="metho"> <SectionTitle> COMPLEX QUESTION: </SectionTitle> <Paragraph position="0"> Despite having complete access, to this day UN inspections have been unable to find any biological weapons, or remnants thereof, in Iraq.</Paragraph> </Section> <Section position="11" start_page="0" end_page="0" type="metho"> <SectionTitle> BACKGROUND </SectionTitle> <Paragraph position="0">
TOPIC: biological weapons
CONTEXT: Iraq, UN inspections
RESULTS: unable to find any bioweapons or remnants
Contradiction: UN inspectors have complete access
Derived questions: What is a biological weapon? How was complete access granted to UN inspectors? How likely is it that any inspector could detect any bioweapon? What limits did the UN inspectors have?</Paragraph> </Section> <Section position="12" start_page="0" end_page="0" type="metho"> <SectionTitle> COMPLEX QUESTION: CONTINUATION </SectionTitle> <Paragraph position="0"> Why has it proven so difficult to discover hard information about Iraq's biological weapons program and what are the implications of these difficulties for the international biological arms control regime?
User-centric decompositions are based on the idea that each user generates a sequence of questions representing a path from the complex question to a series of questions that are connected through coherence relations of the type ELABORATION or CAUSE-EFFECT. Since definition and list questions are also used, the set of coherence relations needs to be adapted for the task of interactive Q/A.</Paragraph>
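As a concrete illustration, the three-dimensional analysis described above might be represented by a structure like the following; the field values mirror the BACKGROUND box of Figure 4, while the class itself is a hypothetical rendering rather than the paper's implementation:

```python
# Sketch of the per-sentence analysis structure used in user-centric
# decomposition: the question's topic/focus, its context, and the implied
# results, plus any contradiction that motivates subquestions.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class QuestionAnalysis:
    topic: str                           # focus of the question
    context: List[str]                   # entities and situations framing it
    results: str                         # the implied or stated outcome
    contradiction: Optional[str] = None  # tension that drives subquestions

# Values taken from the BACKGROUND box of Figure 4.
iraq_scenario = QuestionAnalysis(
    topic="biological weapons",
    context=["Iraq", "UN inspections"],
    results="unable to find any bioweapons or remnants",
    contradiction="UN inspectors have complete access",
)
```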
<Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Experiments with Expert Users </SectionTitle> <Paragraph position="0"> This section presents a brief case study comparing decompositions produced by three users of different skill levels.</Paragraph> <Paragraph position="1"> We collected three decompositions of the following complex scenario: Despite having complete access, to this day UN inspections have been unable to find any biological weapons, or remnants thereof, in Iraq. Why has it proven difficult to discover hard information about Iraq's biological weapons program and what are the implications of these difficulties for the international biological arms control regime? This scenario asks users to elaborate on a state of affairs: namely, the failure of UN weapons inspectors to find evidence of a biological weapons program in Iraq. In addition, users are asked to return information about (1) the potential causes of this state as well as (2) the expected effects of the continued duration of the state on the &quot;biological arms control regime&quot;.</Paragraph>
DECOMPOSITION:
(1) (a) Is there such a concept as &quot;complete access&quot; or are there inevitably limits to accessing sites and facilities?
(1) (b) If there are such limits, can inspections in fact be carried out effectively; i.e., with an acceptable level of assurance that were there biological weapons and/or related systems, they would be found by inspectors?
(2) (a) What is a biological weapon?
(2) (b) Is it, for example, a quantity of pathogens or toxins, or is there more to it?
(3) (a) What are the likely signatures of a national biological weapons program and how likely is it that inspectors from the outside would be able to detect them?
(4) (a) What are the constituent parts of the &quot;international arms control regime&quot; in the context of biological weapons?
(4) (b) Does it, for example, solely consist of the 1972 Biological and Toxin Weapons Convention (BWC), or is there more to it?
(5) (a) Since Iraq was only a signatory (not a ratifier) of the BWC during the time it was developing and producing biological weapons (1985-1991), were its actions in this regard contrary to international law?
(5) (b) If not, did the international community have a different recourse to designate the Iraqi government as having violated international law or norms by having acquired biological weapons?
<Paragraph position="2"> We predict that the domain-specific knowledge that users possess will directly influence how they perform question decomposition. If a user has overt knowledge that causality can exist between the state described in the scenario and another set of states or events, then we should expect decomposition to proceed in an evidentiary mode. Since the user has evidence that two states may be causally linked in another domain, subquestions are asked in order to gather information that describes how this causal relationship is instantiated in the current domain of interest.
However, if a user only has a belief or an expectation (and no overt knowledge) that a causal link can be established between two states, we expect decomposition to be more general and epistemic in nature. Since the user has only a belief that causality exists between two states, they must first confirm that this expectation is viable before they can turn to gathering information which supports their claim. Since they have (by definition) a better conception of the semantic ontology for the domain, expert users will ask a higher percentage of factoid and evidence-seeking questions than novice users. Likewise, we expect that the decompositions of novice users will be characterized by more general questions that seek to evaluate which ontological relationships are available in a particular domain.</Paragraph> <Paragraph position="3"> Although these predictions may prove difficult to evaluate in many real texts, they do appear to be borne out in the following decompositions.</Paragraph> <Paragraph position="4"> NIST Decomposition. Figure 5 presents the scenario decomposition generated by NIST as part of the ARDA AQUAINT project. This decomposition focuses on four major topics: (1) the nature of &quot;complete access&quot; in terms of the UN inspections in Iraq, (2) the definition of the term &quot;biological weapon&quot;, (3) the potential sources of evidence which would point to the existence of a biological weapons program, and finally, (4) the clarification of international laws concerning biological weapons. Although these four topics are clearly central to the domain, it is notable that this decomposition does not include any questions that address the issue of finding biological weapons in Iraq.</Paragraph>
What was the scope of Iraq's biological weapons program? In the past? Immediately prior to the US invasion?
What quantities of biological weapons has Iraq used in past wars? In other periods? Within Iraq? Against Iran?
Does Iraq have the infrastructure necessary for destroying biological weapons safely? For creating biological weapons?
Does Iraq have the capacity to store and/or transport biological weapons? By land? By air? By sea? How has that capacity changed since 1991?
Are there personnel within the Iraqi government responsible for destroying biological weapons? Are these people civilians or military personnel?
Are there Iraqi personnel (scientists, clerks, military) that we can identify who have been traditionally associated with the Iraqi bioweapons program? What are their names? In what capacity did they participate in the bioweapons program?
Is there evidence from Iraqi military medical records for possible signs of biological warfare sickness or contamination?
Is there evidence from Iraqi civilian hospital records of doctors who have treated possible biological weapon sicknesses?
Are there individuals who have witnessed cases of biological weapon sicknesses?
Has the Iraqi military trained personnel in the use of biological weapons? At any time in the past 12 years?
Does Iraq have any military units tasked with using biological weapons? Are those units still active? When were they disbanded?
Which countries have been formally allied with Iraq? Since 1991?
Is there evidence that countries may have stored bioweapons for Iraq? Is there evidence that other countries have engaged in similar kinds of deals with Iraq in the past?
<Paragraph position="5"> Analyst Decomposition. Figure 6 presents a scenario decomposition generated by an intelligence analyst as part of a pilot study conducted by LCC. (For more details on this pilot study, see Section 4.2.) In contrast with the NIST decomposition, this decomposition focuses on establishing factual evidence for several different hypotheses concerning the failure of inspectors to find bioweapons in Iraq.</Paragraph> <Paragraph position="6"> LCC Decomposition. Finally, Figure 7 presents a scenario decomposition created by a novice LCC researcher who had no specific training in analysis techniques and no specific background in the domain. Although the LCC researcher produced considerably more questions than the two experts, these questions focus mostly on discovering the classes of hypotheses and conceptual relations that are found in the domain. Questions like What do we know about Iraq's bioweapons program in the past? suggest that the researcher's informational goals were defined at a much more general level than those of the two experts.
What is the nature of a bioweapons program? What is its goal?
What kinds of traces does an active bioweapons program leave?
What exactly is a bioweapon? What kind of infrastructure is required to make bioweapons? What sort of equipment? What sort of chemicals? What sort of technology?
Does Iraq have the infrastructure necessary to produce bioweapons? How long does it take to put together a bioweapons facility? What kinds of products do most bioweapons facilities produce? What is the desired output? What is the waste output?
What would constitute 'hard evidence' of the existence of a bioweapons program? What signatures would a bioweapons program leave? How much would be needed to be 'hard evidence'? What is the likelihood that outside inspectors could find these traces?
How could a government or an organization hide a bioweapons program? How likely is it that Iraq could hide the program with no traces? How likely is it that Iraq could destroy the program with no traces?
Have the inspection teams found evidence of bioweapons programs in other countries? Which countries?
How might the nature of Iraq aid the government in hiding a bioweapons program? What about the natural geography? How large is Iraq? Does the natural terrain provide natural hiding places for bioweapons?
Would Iraqi citizens aid the government in hiding a bioweapons program? Are Iraqi citizens still loyal to the Iraqi government?
What is the nature of the UN inspector team? How many inspectors are there? How experienced are they? What kinds of technology do the UN inspectors use to find bioweapons? What kinds of intelligence do they have? Do they have informants in Iraq?
Do inspectors have access to all public, private and government facilities? Currently? In the past? Before the 1991 war?
What does complete access mean? Is there an official definition? Criteria?
What do we know about Iraq's bioweapons program in the past? Since 1991? What evidence do we have about its existence? What products did they produce? When was the last time Iraq produced bioweapons? How much bioweapon did they produce? Was it exported to anyone? Was it tested?
Where do we get information about Iraqi bioweapons programs? How reliable is it?
In addition, the LCC researcher's questions provided a broader coverage of the topics within the domain; although this demonstrates that the researcher did have some familiarity with issues central to the domain, it also suggests that he most likely did not have access to knowledge that would have allowed him to evaluate which concepts were most central to the informational focus defined in the scenario.</Paragraph> <Paragraph position="7"> Comparison. Although all three of the decompositions above cover many of the same topics (e.g. the nature of bioweapons and bioweapons programs, the Iraqi infrastructure for supporting bioweapons programs, etc.), they differ in the level of specificity of their questions.</Paragraph> <Paragraph position="8"> While both expert and novice decompositions do include questions that establish domain-specific definitions for particular keywords or phrases (e.g. What constitutes &quot;complete access&quot; for inspectors?), questions in the expert decompositions appear to focus more on gathering evidence for particular hypotheses, while questions in novice decompositions focus more on establishing which kinds of hypotheses are viable in the given domain. This observation is supported by comparing analogous examples from the decompositions above. While the expert analyst was able to ask a rather specific question about Iraq's past use of biological weapons that demonstrated an in-depth knowledge of the geopolitical entities in the domain (i.e. What quantities of biological weapons has Iraq used in past wars? In other periods? Within Iraq? Against Iran?), the novice LCC user was only able to question whether or not the event had occurred at all (i.e. Has Iraq ever used biological weapons?).</Paragraph> <Paragraph position="9"> The results from this case study appear to confirm that interactive Q/A systems need to be sensitive to the intentions users bring to their interaction with the system. The Gricean maxim of quantity is supported in these cases: users participate in dialogues in order to obtain information that they do not already possess. Given this assumption, we predict that users' decompositions of complex scenarios will focus on questions that allow them to maximize the new information obtained from the system while minimizing the amount of old (or previously-known) information that the system returns.</Paragraph> </Section> </Section> <Section position="13" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Lessons Learned </SectionTitle> <Paragraph position="0"> This section presents preliminary results from two experiments examining scenario decomposition in an interactive Q/A context. In Section 4.1 we discuss results which confirm that a database of question/answer pairs can be used to approximate the types of specific semantic knowledge necessary to process (and answer) domain-dependent complex questions. In Section 4.2, we outline five strategies for question decomposition employed by experts that could be used to improve the automatic processing of complex information-seeking scenarios.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Results of the Interactions based on QUAB </SectionTitle> <Paragraph position="0"> Recent research has shown that the precision of Q/A systems depends on the semantic coverage of their answer type hierarchies.
For most current interactive Q/A systems, correct answers can be returned only if a system is able to identify the answer type that most closely approximates a question's informational goal. In most cases, if the Q/A system cannot identify an appropriate answer type - or if the answer type does not exist in the semantic ontology - no answer can be returned.</Paragraph> <Paragraph position="1"> However, as we pointed out in Section 3.2, ontology creation may not be possible (or effective) for every semantic domain that users ask about. In order to answer domain-dependent questions, interactive Q/A systems need to incorporate ways of approximating the domain-specific information that their answer type hierarchies may lack. In this section, we present results showing that a database of question/answer pairs (known in our system as QUAB) can be used to improve interactive Q/A for domains that may not have completely specified answer-type hierarchies.</Paragraph> <Paragraph position="2"> The utility of QUAB was evaluated in two &quot;Wizard-of-Oz&quot;-style dialogue pilot experiments conducted as part of the ARDA AQUAINT project. In each pilot, professional intelligence analysts interacted with LCC developers (and the LCC interactive Q/A system) through an Internet chat-style interface. LCC used the preparation time prior to the first experiment to seed QUAB with 140 domain-specific question/answer pairs; 182 additional question/answer pairs (based on 6 of the 12 Spring 2003 AQUAINT domains) were added to QUAB prior to the second pilot experiment.</Paragraph> <Paragraph position="3"> QUAB was primarily used to return answers for domain-specific questions that our interactive Q/A system could not process. Each user question was evaluated in terms of keyword and conceptual similarity with all of the question-answer pairs contained in QUAB; if no answer could be provided to the user's question, the most similar QUAB answers were returned. In results compiled from both pilots, QUAB provided exactly the correct answer 52% (39/75) of the time, and either exactly or partially the correct answer 73% (55/75) of the time. Table 1 presents these results organized by question domain.</Paragraph> <Paragraph position="4"> QUAB was also used to provide an interactive component to our Q/A system. Each question submitted by a user was compared to the database of question/answer pairs already contained in QUAB. If the user's question was deemed to be conceptually similar to an entry in QUAB, the user was informed that the system could return information &quot;related&quot; to the user's question. If a user requested this related information, the QUAB entry was presented to the user in the form of a question/answer pair. For example, when a user asked the question What facilities has Iraq used to produce biological weapons?, QUAB offered the answer to How does the US know about the existence of biological weapons plants in Iraq? as related information that could potentially facilitate the user's research.</Paragraph> <Paragraph position="5"> In the second dialogue pilot, question/answer pairs from QUAB were presented to users a total of 27 times in 6 different dialogues. On average, contributions from QUAB made up approximately 35% of the questions and about 23% of the answers considered by users throughout the course of the dialogue. Table 2 presents results from the 6 domains considered in the second dialogue pilot.
(It is important to note that in this pilot, users could ask the Q/A system to return more answers for any question; this explains why there are often more answers than questions in each of the dialogues.) The success of a relatively small QUAB suggests that this type of database construction may be an efficient way to augment interactive Q/A and answer-type detection for very domain-specific questions.</Paragraph> </Section> </Section> </Paper>