<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-1043">
  <Title>References</Title>
  <Section position="6" start_page="297" end_page="297" type="evalu">
    <SectionTitle>
5 Evaluation
</SectionTitle>
    <Paragraph position="0"> Both qualitative and quantitative evaluations of the integration of surface-text-based and knowledge-based methods for Q/A are required. Quantitatively, Table 3 summarizes the scores obtained when only shallow methods were employed, in contrast with the results when knowledge-based methods were integrated. We have separately measured the effect of integrating the knowledge-based methods at the question processing and answer processing levels.</Paragraph>
    <Paragraph position="1"> We have also evaluated the precision of the system when both integrations were implemented. The results consisted of the first five answers returned, each within 250 bytes of text, when approximately half a million TREC documents were mined. We used the 200 questions from TREC-8 and the correct answers provided by NIST. The performance was measured both with the NIST scoring method employed in TREC-8 and by simply assigning a score of 1 to each question having a correct answer, regardless of its position.</Paragraph>
    <Paragraph position="2">  When using the NIST scoring method to evaluate an individual answer, we used only six values: (1, .5, .33, .25, .2, 0), representing the score that the answer's question obtains. If the first answer is correct, it obtains a score of 1; if the second one is correct, it is scored with .5; if the third one is correct, the score becomes .33; if the fourth is correct, the score is .25; and if the fifth one is correct, the score is .2. Otherwise, it is scored with 0. No additional credit is given if multiple answers are correct. Table 3 shows that both knowledge-based methods enhanced the precision, regardless of the scoring method.</Paragraph>
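The two scoring schemes described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluation code used for the reported results: the function names (nist_score, lenient_score, mean_score) and the sample judgments are hypothetical, and each question is assumed to be represented as a ranked list of booleans marking which of its returned answers are correct.

```python
# Scores for a first correct answer at ranks 1 to 5, as listed in the text.
RANK_SCORES = (1.0, 0.5, 0.33, 0.25, 0.2)


def nist_score(judgments):
    """Return the score of the highest-ranked correct answer, or 0 if none."""
    for rank, correct in enumerate(judgments[:5]):
        if correct:
            # Only the first correct answer counts; later ones add nothing.
            return RANK_SCORES[rank]
    return 0.0


def lenient_score(judgments):
    """Return 1 if any of the first five answers is correct, regardless of rank."""
    return 1.0 if any(judgments[:5]) else 0.0


def mean_score(all_judgments, scorer):
    """Average a per-question scorer over a collection of questions."""
    if not all_judgments:
        return 0.0
    return sum(scorer(j) for j in all_judgments) / len(all_judgments)


if __name__ == "__main__":
    # Hypothetical judgments for three questions (not data from the paper).
    sample = [
        [True, False, False, False, False],
        [False, False, True, True, False],
        [False, False, False, False, False],
    ]
    print(mean_score(sample, nist_score))
    print(mean_score(sample, lenient_score))
```

Keeping the two scorers as separate functions that share one averaging helper mirrors the way the text reports both measures over the same question set.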
    <Paragraph position="3"> To further evaluate the contribution of the justification option, we evaluated separately the precision of the prover for those questions for which the surface-text-based methods of our system, when operating alone, cannot find correct answers. There were 45 TREC-8 questions for which this evaluation of the prover was performed. Table 4 summarizes the accuracy of the prover.</Paragraph>
    <Paragraph position="4">  Qualitatively, we find that the integration of knowledge-based methods is very beneficial. Table 2 illustrates the correct answer obtained with these methods, in contrast to the incorrect answer provided when only the shallow techniques are applied.</Paragraph>
  </Section>
</Paper>