<?xml version="1.0" standalone="yes"?> <Paper uid="P00-1070"> <Title>Importance of Pronominal Anaphora resolution in Question Answering systems</Title> <Section position="7" start_page="1" end_page="1" type="evalu"> <SectionTitle> 5 Evaluation </SectionTitle> <Paragraph position="0"> For this evaluation, several people unacquainted with this work proposed 150 queries Here, we mean that #0Crstly we obtain the maximum number of repetitions for an antecedentinthe remaining list. After that, we extract from that list the antecedents that have this value of repetition. whose correct answer appeared at least once into the analysed collection. These queries were also selected based on their expressing the user's information need clearly and their being likely answered in a single sentence.</Paragraph> <Paragraph position="1"> First, relevant documents for each query were retrieved using the IR system described earlier. Only the best 50 matching documents were selected for QA evaluation. As the document containing the correct answer was included into the retrieved sets for only 93 queries #28a 62#25 of the proposed queries#29, the remaining 57 queries were excluded for this evaluation.</Paragraph> <Paragraph position="2"> Once retrieval of relevant document sets was accomplished for each query, the system applied anaphora resolution algorithm to these documents. Finally, sentence matching and ranking was accomplished as described in section 4.2 and the system presented a ranked list containing the 10 most relevant sentences to each query.</Paragraph> <Paragraph position="3"> For a better understandingof evaluation results, queries were classi#0Ced into three groups depending on the following characteristics: #0F Group A. There are no pronominal references in the target sentence #28sentence containing the correct answer#29.</Paragraph> <Paragraph position="4"> #0F Group B. The information required as answer is referenced via pronominal anaphora in the target sentence.</Paragraph> <Paragraph position="5"> #0F Group C. Any term in the query is referenced pronominally in the target sentence. null Group A was made up by 37 questions.</Paragraph> <Paragraph position="6"> Groups B and C contained 25 and 31 queries respectively. Figure 3 shows examples of queries classi#0Ced into groups B and C.</Paragraph> <Paragraph position="7"> Evaluation results are presented in Figure 4 as the number of target sentences appearing into the 10 most relevant sentences returned by the system for each query and also, the number of these sentences that are considered a correct answer. An answer is considered correct if it can be obtained by simply looking at the target sentence. Results Question: &quot;Who is the village head man of Digha ?&quot; Answer: &quot;He is the sarpanch, or village head man of Digha, a hamlet or mud-and-straw huts 10 are classi#0Ced based on question type introduced above. The number of queries pertaining to each group appears in the second column. Third and fourth columns show base-line results #28without solving anaphora#29. Fifth and sixthcolumnsshow resultsobtained when pronominal references have been solved.</Paragraph> <Paragraph position="8"> Results show several aspects we have to takeinto account. Bene#0Cts obtained from applying pronominal anaphora resolution vary depending on question type. 
<Paragraph position="8"> The results show several aspects we have to take into account. The benefit obtained from applying pronominal anaphora resolution varies depending on the question type. Results for group A and B queries show that relevance to the query is the same as with the baseline system.</Paragraph> <Paragraph position="9"> So, it seems that pronominal anaphora resolution does not achieve any improvement.</Paragraph> <Paragraph position="10"> This is true only for group A questions. Although target sentences are ranked similarly for group B questions, the target sentences returned by the baseline cannot be considered correct, because the answer cannot be obtained by simply looking at the returned sentences. The correct answer is displayed only when pronominal anaphora is solved and pronominal references are substituted by the noun phrase they refer to. Only if pronominal references are solved will the user not need to read more text to obtain the correct answer.</Paragraph> <Paragraph position="11"> For noun-phrase extraction QA systems the improvement is greater: if pronominal references are not solved, this information will not be analysed and a wrong noun phrase will probably be given as the answer to the query. Results improve again if we analyse the performance on group C queries. These queries have the following characteristic: some of the query terms are referenced via pronominal anaphora in the relevant sentence. When this situation occurs, target sentences are retrieved earlier in the final ranked list than in the baseline list. This improvement arises because the similarity between query and target sentence increases when pronouns are weighted with the same score as the terms they refer to. The percentage of target sentences obtained increases by 38.71 points (from 29.03% to 67.74%). The aggregate results presented in Figure 4 measure the improvement obtained by the system as a whole: the overall percentage of target sentences obtained increases by 12.90 points (from 41.94% to 54.84%) and the level of correct answers returned by the system increases by 25.81 points (from 29.03% to 54.84%). At this point we need to consider the following question: will these results hold for any other question set? We have analysed the test questions in order to determine whether the results obtained depend on the question test set. We argue that a well-balanced query set would have a percentage of target sentences containing pronouns (PTSC) similar to the pronominal reference ratio of the text collection being queried. Besides, we suppose that the probability of finding an answer in a sentence is the same for all sentences in the collection. Comparing the LAT ratio of pronominal reference (55.20%) with the PTSC of the question test set, we can measure how a question set can affect the results. Our question set's PTSC value is 60.22%, so we obtain only 5.02% more target sentences containing pronouns than expected when test queries are randomly selected. In order to obtain results for a well-balanced question set, we discarded five questions from each of groups B and C. Figure 5 shows that the results for this well-balanced question set are similar to the previous ones. Aggregate results show that the overall percentage of target sentences increases by 10.84 points when solving pronominal anaphora, and the level of correct answers retrieved increases by 22.89 points (instead of the 12.90 and 25.81 points obtained in the previous evaluation, respectively).</Paragraph>
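The group C mechanism can be pictured with a toy term-overlap score. The scoring scheme and the example data below are illustrative inventions, not the similarity measure of section 4.2: they only show how a resolved pronoun starts matching query terms with its antecedent's weight.

    # Toy illustration: once a pronoun is mapped to its antecedent, it matches
    # query terms and contributes the same weight as the referring term would.
    def sentence_score(query_terms, sentence_tokens, antecedent_of, weight=1.0):
        score = 0.0
        for token in sentence_tokens:
            term = antecedent_of.get(token, token)  # a resolved pronoun counts as its referent
            if term in query_terms:
                score += weight
        return score

    query = {"clinton", "china"}
    sentence = ["he", "visited", "china", "in", "june"]
    print(sentence_score(query, sentence, {}))                 # 1.0: only "china" matches
    print(sentence_score(query, sentence, {"he": "clinton"}))  # 2.0: "he" now matches as "clinton"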
<Paragraph position="12"> As these results show, pronominal anaphora resolution improves the performance of QA systems in several respects. First, precision increases when query terms are referenced anaphorically in the target sentence. Second, pronominal anaphora resolution reduces the amount of text a user has to read when the answer sentence is displayed, because pronominal references are substituted with their coreferent noun phrases. And third, for noun-phrase extraction QA systems, solving pronominal references is essential if good performance is pursued.</Paragraph> </Section> </Paper>