File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/91/m91-1007_concl.xml
Size: 1,345 bytes
Last Modified: 2025-10-06 13:56:41
<?xml version="1.0" standalone="yes"?> <Paper uid="M91-1007"> <Title>Matched Only Matched / Missing All Templates</Title> <Section position="11" start_page="67" end_page="67" type="concl"> <SectionTitle> SUMMARY </SectionTitle> <Paragraph position="0"> The GE system performed very well on MUC-3, but our official run on TST2 produced scores substantially lower than our TST1 results, in spite of other tests that showed system improvement over time. Even in a revised run that fixed system-level problems, our TST2 score was about the same as on TST1. In trying to explain why the performance was lower than our expectations, we made some interesting observations about the test, including the apparent relationship between template overgeneration and recall.</Paragraph> <Paragraph position="1"> The result of this analysis is that while the highest scoring systems all produced comparable results in the MATCHED/MISSING row, there are major differences in the way the systems produced those results. We propose several ways of finding these differences in the score reports, as well as one correction to the test design to reduce some problems with template-level decisions. Finally, we strongly support the methodology of MUC while warning against repeated, prolonged testing in any single domain.</Paragraph> </Section> </Paper>