<?xml version="1.0" standalone="yes"?>
<Paper uid="M92-1016">
  <Title>PARAMAX SYSTEMS CORPORATION: MUC-4 TEST RESULTS AND ANALYSIS</Title>
  <Section position="3" start_page="0" end_page="129" type="metho">
    <SectionTitle>
TEST RESULTS
</SectionTitle>
    <Paragraph position="0"> The Paramax MUC-4 system's ALL TEMPLATES score summaries for the TST3 and TST4 test sets are listed below. The Paramax system generated more spurious responses in each of the two tests than any other system: the average number of TST3 spurious responses for all systems participating in MUC-4 was 883 and the average number of TST4 spurious responses was 867; the Paramax system generated  Since the Paramax MUC-4 implementation is substantially different from the Paramax MUC-3 submission, the two systems are difficult to compare.2 The rules developed for the MUC-4 system were initially based on rules developed for the MUC-3 system, but the MUC-3 and MUC-4 rule formalisms differ significantly in structure and functionality. In Figure 1, the TST2 scores for the Paramax MUC-3 system and the TST3 progress scores for the MUC-4 system are listed.3 An examination of the scores in Figure 1 indicates that improvements in recall between MUC-3 and MUC-4 have generally resulted in degraded precision scores. However, the P&amp;R F scores for the MUC-3 TST2 and MUC-4 TST3 evaluations indicate an improvement of 1.34 in overall performance. F measures are determined using the following formula: $F = \frac{(\beta^2 + 1.0) \times P \times R}{\beta^2 \times P + R}$, where P is precision, R is recall, and $\beta$ is the relative importance given to recall over precision.4 No analyses of statistical significance were performed between the MUC-3 TST2 and MUC-4 TST3 performances. However, analyses of statistical significance were performed among F scores across systems participating in MUC-4. The results of these analyses indicate that for the P&amp;R F measure (in which precision and recall are given equal weight), there was no significant difference in performance between the Paramax system and the system submitted by SRA.
Similarly, on the 2P&amp;R F measure (in which precision is given more weight), there was no significant difference in performance between the Paramax system and the systems submitted for evaluation by McDonnell-Douglas (MDC) and New Mexico-Brandeis (NM-BR). Finally, on the P&amp;2R F measure (in which recall is given more weight), there was no significant difference in performance between the Paramax system and the system submitted by BBN. Appendix G provides additional information on F scores and how the analyses of statistical significance were performed.</Paragraph>
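    <Paragraph> The F measure defined above can be sketched in a few lines of Python. This is an illustrative sketch only, not code from the Paramax system or the MUC-4 scoring program. The function name f_measure and the sample precision and recall values are hypothetical. The beta values 0.5 and 2.0 for the 2P&amp;R and P&amp;2R measures are an assumption based on the usual convention; the text itself states only that beta is 1.0 when recall and precision are weighted equally.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Combine precision and recall into a single F score.

    beta is the relative importance given to recall over precision.
    beta = 1.0 weights the two equally; values below 1.0 favor
    precision and values above 1.0 favor recall.
    """
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both scores are zero, so the combined score is zero as well.
        return 0.0
    return (beta ** 2 + 1.0) * precision * recall / denominator


# Hypothetical precision and recall values, not taken from the paper.
sample_precision, sample_recall = 40.0, 30.0

# Assumed beta settings: 1.0 (equal weight), 0.5 (precision-heavy),
# 2.0 (recall-heavy).
for label, beta in (("equal", 1.0), ("precision-heavy", 0.5), ("recall-heavy", 2.0)):
    print(label, round(f_measure(sample_precision, sample_recall, beta), 2))
```

Because the formula is a weighted harmonic mean, the precision-heavy score lies closer to the precision value and the recall-heavy score lies closer to the recall value.</Paragraph>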
  </Section>
  <Section position="4" start_page="129" end_page="130" type="metho">
    <SectionTitle>
ANALYSIS
</SectionTitle>
    <Paragraph position="0"> The Paramax MUC-4 implementation satisfied the key goal of its developers: a fast rule development cycle. The Paramax MUC-3 system was implemented using a forward-chaining engine called Pfc, which is written in Prolog. Although the Pfc rule formalism has a number of interesting properties, including in particular a mechanism for easily escaping to Prolog in order to use Prolog's built-in factbase and to reason in a backward-chaining fashion, the system as a whole was inefficient. Processing a standard test set of 100 messages using the MUC-3 implementation required 40 hours of processing time running on three separate Sun workstations. In contrast, the Paramax MUC-4 system implemented in CLIPS can process 100 messages in just 3 1/2 hours on one Sun workstation. This dramatic improvement in the rule development cycle made it possible to achieve a respectable level of performance in a small amount of time.</Paragraph>
    <Paragraph position="1"> The mid-range performance of the Paramax MUC-4 system could have been significantly improved if additional staffing had been available to better engineer the implementation.5 After the MUC-4 test, it was determined that a bug existed in the preprocessing code for recognizing sentence boundaries: sentence endings terminated by double quotes were not recognized. Since sentence boundaries play a very important role in determining the relative likelihood of possible slot values, this problem had a significant impact on the accuracy of the system's slot value preferencing heuristics. The problem could have been easily resolved if enough staffing had been available to examine system output more carefully during training runs. Bugs in the forward-chaining rule base were also discovered after the MUC-4 test; these would have been easy to correct, and they had a dramatic cumulative impact on performance. Examples of such bugs are given in the Paramax MUC-4 system summary.</Paragraph>
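    <Paragraph> The sentence-boundary bug described above can be illustrated with a small Python sketch. This is not the Paramax preprocessing code, whose details are not given here. The pattern names, the function split_sentences, and the sample text are all hypothetical, and the regular expressions are a deliberately simple assumption that ignores abbreviations and other hard cases.

```python
import re

# Naive rule: a sentence ends at ., ! or ? followed by whitespace or end of text.
# A closing double quote after the punctuation blocks the match, which
# reproduces the kind of failure described in the text.
NAIVE_END = re.compile(r'[.!?](?=\s|$)')

# Repaired rule: allow an optional closing double quote after the punctuation.
QUOTE_AWARE_END = re.compile(r'[.!?]"?(?=\s|$)')


def split_sentences(text: str, boundary: re.Pattern = QUOTE_AWARE_END) -> list[str]:
    """Split text into sentences at every match of the boundary pattern."""
    sentences = []
    start = 0
    for match in boundary.finditer(text):
        sentence = text[start:match.end()].strip()
        if sentence:
            sentences.append(sentence)
        start = match.end()
    # Keep any trailing text that has no terminating punctuation.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


# Hypothetical sample text, not taken from the MUC-4 corpus.
sample = 'The spokesman said, "The bomb exploded at dawn." No injuries were reported.'
print(split_sentences(sample, NAIVE_END))
print(split_sentences(sample))
```

With the naive pattern the two sentences are returned as one, so any heuristic that depends on sentence boundaries would treat material from both sentences as belonging together.</Paragraph>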
    <Paragraph position="2"> The Paramax system's high rate of spurious responses was caused by poor performance in establishing coreference among event descriptions. This poor performance was caused in large part by a lack of time and staffing to develop routine heuristics for merging similar templates. For example, in some cases the Paramax system would generate two identical templates for the same message. In other cases, the same target would arise in two different templates of the same type for the same message (i.e., the same building would be bombed, the same individual would be killed, and so forth). Improving the set of heuristics used to establish object coreference will be a top priority for the Paramax team in MUC-5. These improvements should result in a lower rate of spurious responses.</Paragraph>
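    <Paragraph> A minimal sketch of the kind of template-merging heuristic discussed above is given below in Python. It is an assumption made for illustration and is not the heuristic the Paramax team planned or built. The slot names (message_id, incident_type, target, perpetrator), the function merge_templates, and the sample data are hypothetical. The sketch treats two templates as describing the same event when they come from the same message, have the same type, and name the same target.

```python
def merge_templates(templates):
    """Merge templates that share a message, an incident type, and a target."""
    merged = {}
    for template in templates:
        # Target names are compared case-insensitively.
        key = (
            template["message_id"],
            template["incident_type"],
            template["target"].lower(),
        )
        if key not in merged:
            merged[key] = dict(template)
            continue
        # Same event seen again: fill in slots the earlier template left empty.
        for slot, value in template.items():
            if not merged[key].get(slot):
                merged[key][slot] = value
    return list(merged.values())


# Hypothetical templates for a single message.
templates = [
    {"message_id": "MSG-0001", "incident_type": "BOMBING", "target": "Embassy", "perpetrator": ""},
    {"message_id": "MSG-0001", "incident_type": "BOMBING", "target": "embassy", "perpetrator": "unknown group"},
    {"message_id": "MSG-0001", "incident_type": "MURDER", "target": "mayor", "perpetrator": ""},
]

for template in merge_templates(templates):
    print(template)
```

Exact matching on the target string is the simplest possible criterion. A practical heuristic would also need to handle different descriptions of the same target, which this sketch does not attempt.</Paragraph>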
    <Paragraph position="3"> 4 In the case of P&amp;R F scores, for which recall and precision are given equal weight, $\beta = 1.0$.</Paragraph>
    <Paragraph position="4"> 5 No formal mechanism exists for determining the level of effort dedicated to the development of MUC-4 systems, and the informal estimates offered by the participating research groups are surely inaccurate. We estimate that implementations which performed better than the Paramax system generally involved double the staffing level--most such systems were developed with government support, which is not the case for the Paramax system.</Paragraph>
  </Section>
  <Section position="5" start_page="130" end_page="130" type="metho">
    <SectionTitle>
CONCLUDING REMARKS
</SectionTitle>
    <Paragraph position="0"> The Paramax MUC-4 system takes about 3 1/2 hours to process 100 messages on a Sparc2 with 32MB of memory and a normal CPU load (i.e., with a text editor or two in use). The CLIPS-based data extraction component's average elapsed processing time per text in the MUC-4 TST3 data set is 1 minute, 47 seconds.</Paragraph>
    <Paragraph position="1"> This processing speed permits a fast rule development cycle, which is critical in building knowledge-based systems.</Paragraph>
    <Paragraph position="2"> A failure to ensure a rapid rule development cycle is a common mistake among research groups that are not accustomed to building large-scale text processing systems. This mistake was made by a number of research groups in MUC-3, and the Paramax team and other research groups, most notably SRI, rectified this mistake in MUC-4. The MUC-4 development strategies of Paramax and SRI were roughly similar: a rapid rule development cycle was ensured by stripping away inefficient linguistic analysis techniques. The SRI MUC-4 system performed significantly better than the Paramax submission, but this is more likely a result of greater staffing resources than the consequence of some fundamental difference in approach.</Paragraph>
    <Paragraph position="3"> For both the Paramax and SRI research teams, the decision to eliminate linguistic analysis techniques was more a recognition of the primary importance of satisfying the requirements of knowledge-based systems than it was a rejection of linguistic analysis as a useful methodology in text processing. Linguistic analysis is still clearly necessary for achieving finer-grained data extraction capabilities, but additional research must be performed to improve the efficiency and robustness of the techniques. Meanwhile, systems with only rudimentary linguistic analysis techniques are capable of generating databases with sufficient detail to cause researchers to begin worrying about system development issues beyond the data extraction process itself. Paramount among these issues is the need to perform object coreference on the database level--in other words, to recognize that multiple database records are describing the same object. Until object coreference on the database level becomes a manageable problem, it will be difficult to use the databases that are now being extracted.</Paragraph>
    <Paragraph position="4"> The decision on the part of the Paramax team to build a completely new text processing implementation for MUC-4 was a difficult one to make. Although it was clearly necessary to achieve a fast rule development cycle, it was also clear that building a new implementation in only a couple of months with limited staffing was a high-risk venture. But in retrospect, the Paramax team is confident that the right decision was made; system development requirements were prioritized and the need for a rapid rule development cycle came out on top.</Paragraph>
    <Paragraph position="5"> What is truly surprising is that the Paramax MUC-4 system did as well as it did, given the level of effort that went into developing it. CLIPS has proven to be an excellent choice for building rule-based text analysis systems: it is an extremely fast forward-chaining engine, and it is easily integrated with other analysis components. Several CLIPS rule modules developed for the MUC-4 system can be reused, particularly the rules used to recognize proper names. Since the MUC-4 test, the Paramax team has implemented a specialized proper name database in C, containing over 9,000 entries, in order to reduce the size of the CLIPS fact base. This strategy should further improve the modularity and reasoning efficiency of the text processing system.</Paragraph>
  </Section>
</Paper>