<?xml version="1.0" standalone="yes"?> <Paper uid="M92-1003"> <Title>THE STATISTICAL SIGNIFICANCE OF THE MUC-4 RESULTS</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> INTRODUCTION </SectionTitle> <Paragraph position="0"> The MUC-4 scores for recall, precision, and the F-measures are used to measure the performance of the participating systems. A difference in scores between any two systems may be due to chance or may reflect a genuine difference between the two systems. To assess whether a difference can be attributed to chance, statistical hypothesis testing is used. The method of hypothesis testing used is a computationally intensive method known as approximate randomization. This paper discusses the method and the statistical significance of the results for the two MUC-4 test sets, TST3 and TST4.</Paragraph> <Paragraph position="1"> In our hypothesis testing, our objective was to determine whether a system is characteristically different from another system. This was achieved by comparing two systems to see if their actual difference in performance stands out in comparison with the results for random combinations of their scores. If their actual difference stands out, then it is unlikely that this difference arose by chance.</Paragraph> </Section> </Paper>
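<!--
The comparison described in the introduction, in which the actual difference between two systems is set against the differences obtained from random combinations of their scores, can be sketched in Python. This is a minimal illustration and not the procedure used for the MUC-4 evaluation: it assumes each system has one score per test message, that the statistic of interest is the mean score, and that a random combination is formed by swapping the two systems' scores on each message with probability 0.5. The function name, the parameters, and the example values are all hypothetical.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def approximate_randomization(scores_a, scores_b, statistic=mean,
                              trials=9999, seed=0):
    """Estimate how often a random recombination of two systems' scores
    yields a difference at least as large as the observed one.

    scores_a, scores_b: per-message scores, aligned by message (assumed).
    statistic: function that summarizes a list of scores (assumed: mean).
    trials: number of random recombinations to draw.
    Returns the estimated significance level (a value in (0, 1]).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems need a score for every message")

    rng = random.Random(seed)
    observed = abs(statistic(scores_a) - statistic(scores_b))

    at_least_as_large = 0
    for _ in range(trials):
        shuffled_a, shuffled_b = [], []
        for a, b in zip(scores_a, scores_b):
            # Swap the two systems' scores for this message half the time.
            if rng.random() < 0.5:
                a, b = b, a
            shuffled_a.append(a)
            shuffled_b.append(b)
        pseudo = abs(statistic(shuffled_a) - statistic(shuffled_b))
        if pseudo >= observed:
            at_least_as_large += 1

    # Add one to both counts so the estimate is never exactly zero.
    return (at_least_as_large + 1) / (trials + 1)


if __name__ == "__main__":
    # Made-up scores for illustration only.
    system_a = [0.62, 0.55, 0.70, 0.48, 0.66, 0.59]
    system_b = [0.51, 0.49, 0.61, 0.45, 0.58, 0.50]
    print(approximate_randomization(system_a, system_b, trials=999))
```

A small returned value means that few random recombinations produced a difference as large as the actual one, which is the sense in which the actual difference "stands out". Passing a different `statistic` function would let the same loop be applied to another summary of the scores.
-->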