<?xml version="1.0" standalone="yes"?>
<Paper uid="X98-1029">
<Title>TIPSTER Information Extraction Evaluation: The MUC-7 Workshop</Title>
<Section position="4" start_page="233" end_page="233" type="evalu">
<SectionTitle>FORMAL EVALUATION</SectionTitle>
<Paragraph position="0"> The evaluation began with the distribution of the formal run test for NE at the beginning of March 1998. The training set of articles, the ST guidelines, and the keys were made available at the beginning of March, and one month afterward the test set of articles was made available by electronic transfer from SAIC. The deadline for completing the TE, TR, ST, and CO tasks was 6 April 1998, via electronic file transfer of system outputs to SAIC.</Paragraph>
<Paragraph position="1"> Tests were run by individual participating sites at their own facilities, following a written test procedure. Sites could conduct official &quot;optional&quot; tests in addition to the basic test, and adaptive systems were permitted. Each site's system output was scored against the answer keys according to the following categories: correct, incorrect, missing, spurious, possible (the number of fills in the key, affected by the inclusion or omission of optional data), and actual (the number of fills in the response). Metrics included recall (a measure of how many of the key's fills were produced in the response), precision (a measure of how many of the response's fills are actually in the key), F-measure (combining recall and precision into a single measure), and ERR (error per response fill).</Paragraph>
<Paragraph position="2"> Additional supporting metrics of undergeneration, overgeneration, and substitution were provided as well. The scoring procedure was completely automatic. Initial results for the five tasks are presented in Figure 1.</Paragraph>
<Paragraph position="3"> The TE and TR tasks were designed to capture the basic elements of the texts and the domain-independent relations that hold between these elements. The hope was that this would lead to performance improvements on the Scenario Template task. The evaluation domain for MUC-7 was concerned with vehicle launch events. The template consisted of one high-level event object with seven slots, including two relational objects, three set fills, and two pointers to low-level objects. The domain represented a change from the person-oriented domain of MUC-6 to a more artifact-oriented one.</Paragraph>
<Paragraph position="4"> While there have been important advances in information extraction for named entity tasks, and substantial improvement in the other tasks for which these MUC evaluations were developed, much remains to be done to put production-level information extraction systems on users' desks. We leave these breakthroughs to future researchers, with thanks and recognition of the groundbreaking efforts of all the MUC participants throughout the years.</Paragraph>
</Section>
</Paper>
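
The scoring categories and metrics above are only named in the text. The sketch below shows how they combine, assuming the standard MUC definitions (possible = correct + incorrect + missing; actual = correct + incorrect + spurious) and ignoring the half credit that the official scorer gives partially correct fills. The function and class names are illustrative, not taken from the SAIC scoring software.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    correct: int
    incorrect: int
    missing: int
    spurious: int


def muc_scores(c: Counts) -> dict:
    # Fills in the answer key vs. fills produced in the system response.
    possible = c.correct + c.incorrect + c.missing
    actual = c.correct + c.incorrect + c.spurious
    recall = c.correct / possible if possible else 0.0
    precision = c.correct / actual if actual else 0.0
    # F-measure: harmonic mean of recall and precision.
    f_measure = (2 * recall * precision / (recall + precision)
                 if recall + precision else 0.0)
    # Error per response fill: wrong fills over all fills scored.
    total = c.correct + c.incorrect + c.missing + c.spurious
    err = (c.incorrect + c.missing + c.spurious) / total if total else 0.0
    return {"recall": recall, "precision": precision,
            "f_measure": f_measure, "error_per_response_fill": err}


# Example: 80 correct, 10 incorrect, 10 missing, 5 spurious
# -> recall 0.80, precision ~0.84, F ~0.82, ERR ~0.24
print(muc_scores(Counts(correct=80, incorrect=10, missing=10, spurious=5)))
```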
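
The slot structure of the launch-event template can be made concrete with a small sketch. The slot names below are reconstructed from the MUC-7 launch-event task description (vehicle and payload relational objects; mission type, function, and status set fills; date and site pointers to low-level objects) and should be read as illustrative, not as the official template definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    """Low-level TE object (e.g., a time or location mention)."""
    text: str


@dataclass
class VehicleInfo:
    """Relational object linking a launch vehicle to its owner."""
    vehicle: Entity
    owner: Optional[Entity] = None


@dataclass
class PayloadInfo:
    """Relational object linking a payload to its owner."""
    payload: Entity
    owner: Optional[Entity] = None


@dataclass
class LaunchEvent:
    """High-level event object with seven slots, as described above."""
    vehicle_info: VehicleInfo   # relational object
    payload_info: PayloadInfo   # relational object
    mission_type: str           # set fill
    mission_function: str       # set fill
    mission_status: str         # set fill
    launch_date: Entity         # pointer to a low-level time object
    launch_site: Entity         # pointer to a low-level location object


# Example instantiation with placeholder strings:
event = LaunchEvent(
    vehicle_info=VehicleInfo(Entity("vehicle name")),
    payload_info=PayloadInfo(Entity("payload name")),
    mission_type="TYPE",
    mission_function="FUNCTION",
    mission_status="STATUS",
    launch_date=Entity("DATE"),
    launch_site=Entity("SITE"),
)
```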