<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0705">
<Title>Using Scenario Knowledge in Automatic Question Answering</Title>
<Section position="9" start_page="36" end_page="38" type="evalu">
<SectionTitle> 6 Experimental Results </SectionTitle>
<Paragraph position="0"> In this section, we present preliminary results from four sets of experiments that show how forms of textual and contextual entailment can enhance the quality of answers returned by an automatic Q/A system.</Paragraph>
<Paragraph position="1"> Questions used in these experiments were gathered from human interactions with the interactive Q/A system described in (Hickl et al., 2006a). A total of 6 users were asked to spend approximately 90 minutes gathering information related to three different information-gathering scenarios similar to the one in Table 1. Each user researched two different scenarios, resulting in a total of 12 research sessions. Once all research sessions were completed, linguistically well-formed questions were extracted from the system logs for each session; ungrammatical questions and keyword-style queries were not used in our experiments. Table 2 presents a breakdown of the total number of questions collected for each of the 6 scenarios.</Paragraph>
<Paragraph position="3"> In order to evaluate the performance of our Q/A system under each of the experimental conditions described below, questions were re-submitted to the Q/A system and the top 10 answers were retrieved. Two annotators were then tasked with judging the correctness or relevance of each returned answer to the original question. If the answer provided either a complete or partial answer to the original question, it was marked as correct; if the answer contained information that could not be construed as an answer to the original question, it was marked as incorrect.</Paragraph>
<Section position="1" start_page="36" end_page="37" type="sub_section">
<SectionTitle> 6.1 Textual Entailment </SectionTitle>
<Paragraph position="0"> Following (Harabagiu and Hickl, 2006), we used TE information to filter out answers identified by the Q/A system that were not entailed by the user's original question. After filtering, the top-ranked entailed answer (as determined by the Q/A system) was returned as the system's answer to the question. Results from both a baseline version and a TE-enhanced version of our Q/A system are presented in Table 4.</Paragraph>
<Paragraph position="1"> Although no information from the scenario was used in this experiment, performance of the Q/A system increased by more than 6% over the baseline system for each of the three scenarios. These results suggest that TE can be used effectively to boost the percentage of relevant answers found among the top answers returned by a system: by focusing only on answers that are entailed by a user's question, we believe that systems can better identify passages that might contain information relevant to a user's information need.</Paragraph>
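The filtering step described above can be summarized with a minimal sketch. The code below assumes a hypothetical `entails(question, answer)` predicate standing in for the TE classifier; the function and variable names are illustrative and are not taken from the authors' system.

```python
def answer_with_te_filter(question, ranked_answers, entails):
    """Return the top-ranked answer that is entailed by the question.

    `ranked_answers` is the list of candidate answers in the order
    assigned by the Q/A system; `entails(question, answer)` is a
    hypothetical textual-entailment classifier returning True/False.
    """
    # Keep only candidates entailed by the user's original question,
    # preserving the Q/A system's ranking.
    entailed = [a for a in ranked_answers if entails(question, a)]
    if entailed:
        return entailed[0]
    # If nothing survives filtering, fall back to the original top answer.
    return ranked_answers[0] if ranked_answers else None
```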
</Section>
<Section position="2" start_page="37" end_page="37" type="sub_section">
<SectionTitle> 6.2 Contextual Entailment </SectionTitle>
<Paragraph position="0"> In order to evaluate the performance of our contextual entailment system directly, two annotators were tasked with identifying instances of CE amongst the passages and answers returned by our Q/A system. Annotators were instructed to mark a passage as contextually entailed by a scenario only when the passage could reasonably be expected to be associated with one of the subtopics they believed to be entailed by the complex scenario. If the passage could not be associated with the extension of any subtopic they believed to be entailed by the scenario, annotators were instructed to mark the passage as not contextually entailed by the scenario. For evaluation purposes, only passages marked as contextually entailed by both annotators were considered valid examples of CE.</Paragraph>
<Paragraph position="1"> Annotators were tasked with evaluating three types of output from our Q/A system: (1) the ranked list of passages retrieved by our system's Passage Retrieval module, (2) the list of passages identified as contextually entailed by the scenario, and (3) the set of answers marked as contextually entailed by the scenario (AnsSet3). Results from the annotation of these passages are presented in Table 4.</Paragraph>
<Paragraph position="2"> Annotators marked 39.3% of retrieved passages as contextually entailed by one of the three scenarios. This number increased substantially when only passages identified by the CE system were considered: annotators judged 48.6% of CE passages and 45.2% of CE-filtered answers to be valid instances of contextual entailment.</Paragraph>
</Section>
<Section position="3" start_page="37" end_page="37" type="sub_section">
<SectionTitle> 6.3 Intrinsic Evaluation </SectionTitle>
<Paragraph position="0"> In order to evaluate the impact of CE on a Q/A system, we compared the quality of answers produced (1) when no CE information was used (AnsSet1), (2) when CE information was used to select a list of entailed paragraphs that were submitted to an Answer Processing module (AnsSet2), and (3) when CE information was used directly to select answers (AnsSet3). Results from these three experiments are presented in Table 5.</Paragraph>
<Paragraph position="2"> As with the TE-based experiments described in Section 6.1, we found that the Q/A system was more likely to return at least one relevant answer among the top-ranked answers when contextual entailment information was used either to rank or to select answers. When CE was used to rank passages for Answer Processing (AnsSet2), accuracy increased by nearly 9% over the baseline (AnsSet1), while accuracy increased by almost 14% overall when CE was used to select answers directly (AnsSet3).</Paragraph>
</Section>
<Section position="4" start_page="37" end_page="38" type="sub_section">
<SectionTitle> 6.4 Extrinsic Evaluation </SectionTitle>
<Paragraph position="0"> In order to evaluate the performance of the framework illustrated in Figure 6, we compared the performance of a question-focused MDS system (first described in (Lacatusu et al., 2006)) that did not use CE with a similar system that used CE to rank passages for a summary answer.</Paragraph>
<Paragraph position="1"> When CE was not used, sentences identified by the Q/A and MDS systems for each question were combined and ranked based on the number of question keywords found in each sentence. In the CE-enabled system (analogous to the system depicted in Figure 6), only sentences that were contextually entailed by the scenario were considered; these sentences were then ranked using the real-valued entailment confidence computed by the CE system for each sentence. Results from this experiment are presented in Table 6.</Paragraph>
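As a rough illustration of the two ranking strategies just described, the sketch below contrasts keyword-count ranking with ranking by CE confidence. It assumes a hypothetical `ce_confidence(scenario, sentence)` scorer returning a value in [0, 1] and an illustrative entailment threshold; neither the names nor the threshold come from the authors' system.

```python
import re

def _tokens(text):
    # Simple lowercase word tokenizer, used only for this illustration.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def rank_by_keyword_overlap(question, sentences):
    """Baseline: order candidate sentences by how many question keywords
    they contain (higher counts first)."""
    keywords = _tokens(question)
    return sorted(sentences, key=lambda s: len(keywords & _tokens(s)), reverse=True)

def rank_by_ce_confidence(scenario, sentences, ce_confidence, threshold=0.5):
    """CE-enabled variant: keep only sentences whose contextual-entailment
    confidence exceeds a threshold (the threshold is an assumption, not a
    value reported in the paper), then order them by that confidence."""
    scored = [(ce_confidence(scenario, s), s) for s in sentences]
    entailed = [(c, s) for (c, s) in scored if c >= threshold]
    return [s for (c, s) in sorted(entailed, key=lambda cs: cs[0], reverse=True)]
```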
<Paragraph position="2"> Although the CE-enabled system was more likely to return a scenario-relevant sentence in the top position (48.23%) than the system that did not use CE (41.09%), differences between the two systems were much less apparent when the top 5 answers returned by each system were compared.</Paragraph>
</Section>
</Section>
</Paper>