<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1189">
  <Title>HITIQA: Towards Analytical Question Answering</Title>
  <Section position="4" start_page="2" end_page="2" type="metho">
    <SectionTitle>
QA
</SectionTitle>
    <Paragraph position="0"> evaluations. Given the excellent results posted by the best systems and the adequate performance attained even by some entry-level systems, we believe that the process of factoid question answering is now fairly well understood (Harabagiu et al., 2002; Hovy et al., 2000; Prager et al., 2001; Wu et al., 2003).</Paragraph>
    <Paragraph position="1"> In contrast to a factoid question, an analytical question has a virtually unlimited variety of syntactic forms with only a loose connection between their syntax and the expected answer.</Paragraph>
    <Paragraph position="2"> Given the many possible forms of analytical questions, it would be counter-productive to restrict them to a predefined number of question/answer types. Therefore, the formation of an answer in analytical QA should instead be guided by the user's intended interest expressed in the question, as well as through any follow-up dialogue with the system. This clearly involves the user's intentions (the speech acts) and how they evolve with respect to the overall information strategy the user is pursuing.</Paragraph>
    <Paragraph position="3"> In this paper we argue that the semantics (though not necessarily the intent) of an analytical question is more likely to be deduced from the information that is considered relevant to the question than through a detailed analysis of its particular form. We noted that the questions analysts ask, while clearly part of a strategy, are generally quite flexible and &quot;forgiving&quot;, in the sense that there is always a strong possibility that the answer may not arrive in the expected form, and thus a change of strategy, and even of the initial expectations, may be warranted. This strongly suggests that a solution to analytical QA must involve a dialogue that combines information seeking and problem solving strategies.</Paragraph>
  </Section>
  <Section position="5" start_page="2" end_page="2" type="metho">
    <SectionTitle>
3 Document Retrieval
</SectionTitle>
    <Paragraph position="0"> HITIQA works with unstructured text data, which means that a document retrieval step is required to detect any information that may be relevant to the user's question. It has to be noted that determining &quot;relevant&quot; information is not the same as finding an answer; indeed, we can use relatively simple information retrieval methods (keyword matching, etc.) to obtain perhaps 200 &quot;relevant&quot; documents from a database. This gives us an initial information space to work on in order to determine the scope and complexity of the answer, but we are nowhere near the answer yet. The current version of HITIQA uses the INQUERY system (Callan et al., 1992), although we have also used SMART (Buckley, 1985) and other IR systems (such as Google). [Footnote: TREC QA is the annual Question Answering evaluation sponsored by the U.S. National Institute of Standards and Technology, www.trec.nist.gov.]</Paragraph>
  </Section>
  <Section position="6" start_page="2" end_page="4" type="metho">
    <SectionTitle>
4 Text Framing
</SectionTitle>
    <Paragraph position="0"> In HITIQA we use a text framing technique to delineate the gap between the possible meaning of the user's question and the system's &quot;understanding&quot; of this question. We can approximate the meaning of the question by extracting references to known concepts in it, including named entities. The information retrieved from the database may well lead to other interpretations of the question, and we need to determine which of these are &quot;correct&quot;. The framing process imposes a partial structure on the text passages that allows the system to systematically compare different passages against each other and against the question. Framing does not attempt to capture the entire meaning of the passage; it needs to be just sufficient to communicate with the user about the differences between their question and the returned text. In particular, the framing process may uncover topics or aspects within the answer space which the user has not explicitly asked for, and of whose existence the user may be unaware. If these topics or aspects align closely with the user's question (i.e., they match many of the salient attributes), we may want to make the user aware of them and let him/her decide if they should be included in the answer.</Paragraph>
    <Paragraph position="1"> Frames are built from the retrieved data, after clustering it into several topical groups. Passages are clustered using a combination of hierarchical clustering and n-bin classification (Hardy et al., 2002a). Each cluster represents a topic theme within the retrieved set: usually an alternative or complementary interpretation of the user's question. Since clusters are built out of small text passages, we initially associate a frame with each passage that serves as a seed of a cluster. We subsequently merge passages and their associated frames to arrive at one or more combined frames for the cluster.</Paragraph>
    <Paragraph position="2"> HITIQA starts text framing by building a general frame on the seed passages of the clusters and any of the top N (currently N=10) scored passages that are not already in a cluster. The general frame represents an event or a relation involving any number of entities, which make up the frame's attributes, such as LOCATION, PERSON, ORGANIZATION, DATE, etc. Attributes are extracted from text passages by BBN's Identifinder, which tags 24 types of named entities. The event/relation itself could be pretty much anything, e.g., accident, pollution, trade, etc. and it is captured into the TOPIC attribute from the central verb or noun phrase of the passage. In the general frame, attributes have no assigned roles; they are loosely grouped around the TOPIC (Figure 2).</Paragraph>
    <Paragraph position="3"> We have also defined three slightly more specialized typed frames by assigning roles to selected attributes in the general frame. These three &quot;specialized&quot; frames are: (1) a Transfer frame with three roles, FROM, TO and OBJECT; (2) a two-role Relation frame with AGENT and OBJECT roles; and (3) a one-role Property frame. These typed frames represent certain generic events/relationships, which then map into more specific event types in each domain. Other frame types may be defined if needed, but we do not anticipate there will be more than a handful altogether.</Paragraph>
    <Paragraph position="4">  For example, another 3-role frame may be a State-Change frame with AGENT, OBJECT and INSTRUMENT roles, etc.</Paragraph>
    <Paragraph position="5">
FRAME TYPE: General
TOPIC: imported
LOCATION: Iraq, France, Israel
ORGANIZATION: IAEA [missed: Nukem]
PERSON: Leonard Spector
WEAPON: uranium, nuclear bomb
DATES: 1981, 30 November 1990, ..
</Paragraph>
    <Paragraph position="6"> FIGURE 2: A general frame obtained from the text passage in Figure 3 (not all attributes shown). Whereas the general frame is little more than a &quot;bag of attributes&quot;, the typed frames capture some internal structure of an event, but only to the extent required to enable an efficient dialogue with the user. Typed frames are &quot;triggered&quot; by the appearance of specific words in the text; for example, the word export may trigger a Transfer frame. A single text passage may invoke one or more typed frames, or none at all. When no typed frame is invoked, the general frame is used as the default. If a typed frame is invoked, HITIQA will attempt to identify the roles, e.g., FROM, TO, OBJECT, etc. This is done by mapping general frame attributes selected from the text onto the typed attributes in the frames. In any given domain, e.g., weapon nonproliferation, both the trigger words and the role identification rules can be specialized from a training corpus of typical documents and questions.</Paragraph>
    <Paragraph position="7"> [Footnotes: Scalability is certainly an outstanding issue here, and we are working on effective frame acquisition methods, which are outside the scope of this paper. While classifications such as (Levin, 1993) or FrameNet (Fillmore, 2001) are relevant, we are currently aiming at a less detailed system. A more detailed discussion of possible frame types is beyond the scope of the current paper.]</Paragraph>
    <Paragraph position="8"> For example, the role-id rules rely both on syntactic cues and the expected entity types, which are domain adaptable.</Paragraph>
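The triggering step described above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not HITIQA's implementation: the lexicon TRIGGERS and the function invoked_frame_types are hypothetical names, only the pairing of "export" with a Transfer frame is taken from the text, and words are matched exactly, with no stemming.

```python
import re

# Hypothetical trigger lexicon. Only "export" -> Transfer comes from the text;
# the remaining entries are illustrative assumptions.
TRIGGERS = {
    "export": "Transfer",
    "import": "Transfer",
    "own": "Relation",
    "large": "Property",
}


def invoked_frame_types(passage):
    """Return the typed frames triggered by words in the passage.

    Frame types are listed in order of first appearance. When no trigger
    word is present, the General frame is returned as the default.
    """
    words = re.findall(r"[a-z]+", passage.lower())
    found = []
    for word in words:
        frame_type = TRIGGERS.get(word)
        if frame_type and frame_type not in found:
            found.append(frame_type)
    return found or ["General"]
```

In a domain-adapted setup, such a lexicon would be learned or tuned from a training corpus rather than written by hand, as the text notes.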
    <Paragraph position="9"> Domain adaptation is desirable for obtaining a more focused dialogue, but it is not necessary for HITIQA to work. We used both setups under different conditions: the generic frames were used with the TREC document collection to measure the impact of IR precision on QA accuracy (Small et al., 2004). The domain-adapted frames were used for sessions with intelligence analysts working with the WMD Domain (see below). Currently, the adaptation process includes manual tuning followed by corpus bootstrapping using an unsupervised learning method (Strzalkowski &amp; Wang, 1996). We generally rely on BBN's Identifinder for extraction of basic entities, and use bootstrapping to define additional entity types as well as to assign roles to attributes.</Paragraph>
    <Paragraph position="10"> The version of HITIQA reported here and used by analysts during the evaluation has been adapted to the Weapons of Mass Destruction Non-Proliferation domain (WMD domain, henceforth). Figure 3 contains an example passage from this data set. In the WMD domain, the typed frames were mapped onto the 3-role WMDTransfer frame and two 2-role frames, WMDTreaty and WMDDevelop. Adapting the frames to the WMD domain required only minimal modification, such as adding the WEAPON entity to augment the Identifinder entity set, generating a list of international weapon control treaties, etc.</Paragraph>
    <Paragraph position="11"> The Bush Administration claimed that Iraq was within one year of producing a nuclear bomb. On 30 November 1990... Leonard Spector said that Iraq possesses 200 tons of natural uranium imported and smuggled from several countries.</Paragraph>
    <Paragraph position="12"> Iraq possesses a few working centrifuges and the blueprints to build them. Iraq imported centrifuge materials from Nukem of the FRG and from other sources. One decade ago, Iraq imported 27 pounds of weapons-grade uranium from France, for Osirak nuclear research center. In 1981, Israel destroyed the Osirak nuclear reactor. In November 1990, the [Footnote: MUC, the Message Understanding Conference, funded by DARPA, involved the evaluation of information extraction systems applied to a common task.]</Paragraph>
    <Paragraph position="13"> (Humphreys et al., 1998). What we are trying to do here is to &quot;fit&quot; a frame over a text passage. This also means that multiple frames can be associated with a text passage, or to be exact, with a cluster of passages. Since most of the passages that undergo the framing process are part of some cluster of very similar passages, the added redundancy helps to reinforce the most salient features for extraction. This makes the framing process potentially less error-prone than MUC-style template filling.</Paragraph>
    <Paragraph position="14"> A very similar framing process is applied to the user's question, resulting in one or more Goal frames, which are subsequently compared to the data frames obtained from retrieved text passages. A Goal frame can be a general frame or any of the typed frames. Goal frames generated from the question &quot;Has Iraq been able to import uranium?&quot; are shown in Figures 4 and 5.</Paragraph>
  </Section>
  <Section position="7" start_page="4" end_page="4" type="metho">
    <SectionTitle>
FRAME TYPE: General
</SectionTitle>
    <Paragraph position="0"> The frame in Figure 4 is simply a General frame which is invoked first. HITIQA then discovers that TOPIC=import denotes a Transfer-event in the WMD domain, so it creates a WMDTransfer frame that replaces the general frame. This new frame, shown in Figure 5, has three role attributes TRF_TO, TRF_FROM and TRF_OBJECT, plus the relation type (TRF_TYPE).</Paragraph>
    <Paragraph position="1"> Each role attribute is defined over an underlying general frame attribute (given in parentheses), which is used to compare frames of different types. The role-id rules rely both on syntactic cues and the expected entity types, which are domain adaptable.</Paragraph>
    <Paragraph position="2"> HITIQA automatically judges a particular data frame as relevant, and subsequently the corresponding segment of text as relevant, by comparison to the Goal frame. The data frames are scored based on the number of conflicts found with the Goal frame. The conflicts are mismatches on values of corresponding attributes, specifically when the data frame attribute list does not contain any of the entities in the corresponding Goal frame attribute list. If a data frame is found to have no conflicts, it is given the highest relevance rank, and a conflict score of zero.</Paragraph>
    <Paragraph position="3"> All other data frames are scored with an increasing value based on the number of conflicts: score 1 for frames with one conflict with the Goal frame, score 2 for two conflicts, etc. Frames that conflict with all information found in the query are given the score 99, indicating the lowest rank.</Paragraph>
    <Paragraph position="4"> Currently, frames with a conflict score of 99 are excluded from further processing as outliers. The frame in Figure 6 is scored as relevant to the user's query and included in the answer space.</Paragraph>
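The conflict scoring described above can be sketched in Python as follows. This is a minimal illustration rather than HITIQA's actual code: frames are assumed to be plain dictionaries mapping attribute names to sets of entity strings, the function name conflict_score is hypothetical, and "conflicts with all information found in the query" is read here as "every Goal attribute is in conflict".

```python
OUTLIER_SCORE = 99  # score the text assigns to frames that conflict with the whole query


def conflict_score(goal, data):
    """Count attribute conflicts between a Goal frame and a data frame.

    Both frames are assumed to be dicts mapping an attribute name
    (e.g. "LOCATION") to a set of entity strings. An attribute conflicts
    when the data frame's values contain none of the Goal frame's entities;
    an attribute missing from the data frame therefore also conflicts.
    """
    conflicts = 0
    for attribute, goal_values in goal.items():
        data_values = data.get(attribute, set())
        if goal_values.isdisjoint(data_values):
            conflicts += 1
    # Assumption: a frame that conflicts on every Goal attribute is an outlier.
    if goal and conflicts == len(goal):
        return OUTLIER_SCORE
    return conflicts
```

Sorting data frames by this score puts the zero-conflict frames first, and frames scoring 99 can simply be filtered out before any further processing.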
  </Section>
  <Section position="8" start_page="4" end_page="4" type="metho">
    <SectionTitle>
5 Enabling Dialogue with the User
</SectionTitle>
    <Paragraph position="0"> Framed information allows HITIQA to automatically judge text passages as fully or partially relevant and to conduct a meaningful dialogue with the user about their content. The purpose of the dialogue is to help the user navigate the answer space and to negotiate more precisely what information he or she is seeking. The main principle here is that the dialogue is primarily content oriented. Thus, it is okay to ask the user whether information about the AIDS conference in Cape Town should be included in the answer to a question about combating AIDS in Africa.</Paragraph>
    <Paragraph position="1"> However, the user should never be asked if a particular keyword is useful or not, or if a document is relevant or not.</Paragraph>
    <Paragraph position="2"> Our approach to dialogue in HITIQA is modeled to some degree upon the mixed-initiative dialogue management adopted in the AMITIES project (Hardy et al., 2002b). The main advantage of the AMITIES model is its reliance on data-driven semantics, which allows for spontaneous and mixed-initiative dialogue to occur. By contrast, the major approaches to implementation of dialogue systems to date rely on systems of functional transitions that make the resulting system much less flexible. In the grammar-based approach, which is prevalent in commercial systems, such as in various telephony products, as well as in practically oriented research prototypes (e.g., DARPA Communicator; Seneff and Polifroni, 2000; Ferguson and Allen, 1998), a complete dialogue transition graph is designed to guide the conversation and predict user responses, which is suitable for closed domains only. In the statistical variation of this approach, a transition graph is derived from a large body of annotated conversations (e.g., Walker, 2000; Litman and Pan, 2002). This latter approach is facilitated through a dialogue annotation process, e.g., using Dialogue Act Markup in Several Layers (DAMSL) (Allen and Core, 1997), which is a system of functional dialogue acts.</Paragraph>
    <Paragraph position="3"> Nonetheless, an efficient, spontaneous dialogue cannot be designed on a purely functional layer.</Paragraph>
    <Paragraph position="4"> Therefore, here we are primarily interested in the semantic layer, that is, the information exchange and information building effects of a conversation. In order to properly understand a dialogue, both semantic and functional layers need to be considered. In this paper we are concentrating exclusively on the semantic layer.</Paragraph>
  </Section>
  <Section position="9" start_page="4" end_page="4" type="metho">
    <SectionTitle>
6 Clarification Dialogue
</SectionTitle>
    <Paragraph position="0"> In the clarification dialogue, the user and the system negotiate the information task that needs to be performed. Data frames with a conflict score of 0 form the initial kernel answer space, and HITIQA proceeds by generating an answer from this space. Depending upon the presence of other frames outside of this set, the system may initiate a dialogue with the user. When the Goal frame is a general frame, HITIQA first initiates a clarification dialogue on existing general data frames that have one conflict. All of these 1-conflict general frames are first grouped on their common conflict attribute. HITIQA begins asking the user questions on these near-miss frame groups, with the largest group first. Each group must contain at least N frames, where N is a user-controlled setting. This setting restricts all of HITIQA's generated dialogue. HITIQA then checks for the existence of any data frames that are one of the three typed frames. Clarification dialogue will be initiated on these when all of their general attributes agree with the corresponding general attributes of the Goal frame. Alternatively, if the Goal frame is one of the three type-specific frames, a clarification dialogue is first initiated on groups of 1-conflict data frames that are of the same type as the Goal frame. The clarification dialogue will then continue to the remaining two type-specific frames, if any exist, and finally on to any General data frames.</Paragraph>
    <Paragraph position="1"> A 1-conflict frame has only a single attribute mismatch with the Goal frame. This could be a mismatch on any of the general frame attributes, for example, LOCATION, ORGANIZATION, TIME, etc., or in one of the role-assigned attributes, TO, FROM, OBJECT, etc. A special case arises when the conflict occurs on the TOPIC attribute, which indicates the event type. Since all other attributes match, we may be looking at potentially different events (though of a similar type) involving the same entities, occurring at the same location or time. The purpose of the clarification dialogue in this case is to probe which of these additional events may be of interest to the user.</Paragraph>
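The grouping of near-miss frames can be sketched in Python as shown here. This is an illustrative sketch under assumptions, not the system's code: frames are taken to be dictionaries mapping attribute names to sets of entity strings, and the names conflicting_attributes, near_miss_groups and min_group_size (standing in for the user-controlled N) are hypothetical.

```python
from collections import defaultdict


def conflicting_attributes(goal, data):
    """Return the Goal attributes on which the data frame shares no entity."""
    return [attribute for attribute, values in goal.items()
            if values.isdisjoint(data.get(attribute, set()))]


def near_miss_groups(goal, data_frames, min_group_size=1):
    """Group 1-conflict frames by their single conflicting attribute.

    Groups smaller than min_group_size are dropped, and the remaining
    groups are returned largest first, which is the order in which
    clarification questions would be asked.
    """
    groups = defaultdict(list)
    for frame in data_frames:
        conflicts = conflicting_attributes(goal, frame)
        if len(conflicts) == 1:
            groups[conflicts[0]].append(frame)
    kept = [(attribute, frames) for attribute, frames in groups.items()
            if len(frames) >= min_group_size]
    return sorted(kept, key=lambda item: len(item[1]), reverse=True)
```

Raising min_group_size suppresses questions about isolated near misses, so the user is only asked about conflicts that recur across several frames.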
    <Paragraph position="2"> Another special case arises when the Goal frame has a different value of the TYPE attribute than a data frame. The purpose of the clarification dialogue here is to see if the user wishes to expand the answer space to include events of a different type. This situation is illustrated in the exchange shown in Figure 7.</Paragraph>
    <Paragraph position="3"> The clarification question asked by HITIQA in Figure 7 is obtained by comparing the Goal frame (Figure 5) to a partly matching frame (Figure 8) generated from another text passage.</Paragraph>
    <Paragraph position="4"> We note first that the Goal frame for this example is of the WMDTransfer type, while the data frame in Figure 8 is of the WMDDevelop type. Nonetheless, both frames match on their general-frame attributes WEAPON and LOCATION. Therefore, HITIQA asks the user if it should expand the answer space to include development of uranium in Iraq as well.</Paragraph>
    <Paragraph position="5"> User: &quot;Has Iraq been able to import uranium?&quot;</Paragraph>
  </Section>
  <Section position="10" start_page="4" end_page="4" type="metho">
    <SectionTitle>
HITIQA: IRAQ REPORTED TO HAVE
IMPORTED MISSILES AND URANIUM
</SectionTitle>
    <Paragraph position="0"> Supporting passage: ...that Iraq possesses 200 tons of natural uranium imported and smuggled from several countries. Iraq possesses a few working centrifuges and the blueprints to build them. Iraq imported centrifuge materials from Nukem of the FRG and from other sources. One decade ago, Iraq imported 27 pounds of weapons-grade uranium from France, for Osirak nuclear research center...</Paragraph>
    <Paragraph position="1"> HITIQA: &quot;Are you also interested in background information on the uranium  During the dialogue, as new information is obtained from the user, the Goal frame is updated and the scores of all the data frames are reevaluated. If the user responds with the equivalent of &quot;yes&quot; to the system's clarification question in the dialogue in Figure 7, a corresponding WMDDevelop frame will be added to the set of active Goal frames and all WMDDevelop frames obtained from text passages will be re-scored for possible inclusion in the answer.</Paragraph>
    <Paragraph position="2">  in Figure 7.</Paragraph>
    <Paragraph position="3"> The user may end the dialogue at any point using the generated answer given the current state of the frames. Currently, the answer is simply composed of text passages from the zero-conflict frames. In addition, HITIQA will generate a &quot;headline&quot; for the text passages in the answer space. This is done using a combination of text templates and simple grammar rules applied to the attributes of the passage frame. Figure 7 shows a portion of the answer generated by HITIQA for the Iraq query.</Paragraph>
  </Section>
  <Section position="11" start_page="4" end_page="4" type="metho">
    <SectionTitle>
7 HITIQA Preliminary Evaluations
</SectionTitle>
    <Paragraph position="0"> We have evaluated HITIQA in a series of workshops with professional analysts in order to obtain an in-depth and comprehensive assessment of the system usability and performance. In addition to evaluating our research progress, the purpose of these workshops was to test several evaluation instruments to see if they can be meaningfully applied to a complex information system such as HITIQA.</Paragraph>
    <Paragraph position="1"> For the participating analysts, the primary activity at these workshops involved preparation of reports in response to &quot;scenarios&quot; - complex questions that often encompass multiple subquestions, aspects and hypotheses. For example, in one scenario, analysts were asked to locate information about the al Qaeda terrorist group: its membership, sources of funding and activities. In another scenario, the analysts were requested to find information on the chemical weapon Sarin.</Paragraph>
    <Paragraph position="2"> Figure 9 shows one of the analytical scenarios used in these workshops. We prepared a database of over 1 GByte of text documents; it included articles from the Center for Non-proliferation Studies (CNS) data collected for the AQUAINT program and similar data retrieved from the web using Google. The analysts' task was to prepare a report &quot;as much like what you would do in your normal work environment as possible.&quot; Over the six days of the workshops, each analyst prepared five such reports in sessions of one to three hours. Each session involved multiple questions posed to the system, as well as clarification dialogue, visual browsing and report construction. Figure 10 shows an abridged transcript from another analytical session with  One of our primary concerns was to design tasks that were similar in scope and difficulty to those that the analysts are used to performing at work and to ensure that they felt comfortable using the system. Five questions in the scenario evaluation dealt with this issue; for example, one question asked how the scenarios compared in difficulty with the tasks the analysts normally perform at work. The mean score for these five questions was 3.75 on a 5-point scale (five is the best score). The lowest score (M=2.88) was received on the question &quot;How did the scenario compare in difficulty to tasks that you normally perform at work?&quot;; this slightly above average rating of difficulty of the tasks was quite satisfactory for our purposes.</Paragraph>
    <Paragraph position="3"> In the final evaluation, analysts were asked to rate their agreement with statements such as &quot;Having HITIQA helps me find important information&quot; (score 4.50), &quot;Having Hitiqa at work would help me find information faster than I can currently find it&quot; (score 4.33), and &quot;Hitiqa would be a useful addition to the tools that I already have at work&quot; (score 4.25). The mean normalized score for the combined final evaluation of Workshop I was 3.75 on the 5-point scale; this means that the system received many more ratings of 4 and 5 than of 1 and 2. Comments made by the analysts in the group discussion and in the individual interviews confirmed that analysts liked the interactive dialogue and were very pleased with the results.</Paragraph>
    <Paragraph position="4"> For example, one analyst said &quot;I learned more about Sarin gas in 30 minutes than I probably would have at work in a half a day.&quot; As desired, the analysts also made many suggestions for improving the interface and the interoperation of the visual and text display. For a research system undergoing its first rigorous evaluation, these results are very satisfactory - they support the value of the design of the HITIQA system, including the interactive mode and the visual display, and encourage us to move forward with this approach. [Scenario text (Figure 9): The department chief has requested a report by the close of business today on the nuclear arms program in Iraq and how it was influenced by the neighboring countries. List the extent of the nuclear program in each involved country including funding, capabilities, quantity, etc. Your report should also include key figures in Iraq nuclear program as well as in other countries in the region, and any travels that these key figures have made to other countries in regards to a nuclear program, any weapons that have been used in the past by either country, any purchases or trades that have been made relevant to weapons of mass destruction (possibly oil trade, etc.), any ingredients and chemicals that have been used, any potential weapons that could be under development, countries that are involved or have close ties to Iraq or her trade partners, possible locations of development sites, and possible companies or organizations that these countries work with for their nuclear arms program. Add any other information relating to the Iraqi Nuclear Arms Programs.]</Paragraph>
    <Paragraph position="5"> FIGURE 10: Fragment of an analytical session</Paragraph>
  </Section>
</Paper>