<?xml version="1.0" standalone="yes"?>
<Paper uid="M91-1006">
  <Title>BBN PLUM: MUC-3 Test Results and Analysis</Title>
  <Section position="1" start_page="0" end_page="0" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> Perhaps the most important facts about our participation in MUC-3 reflect our starting point and goals. In March, 1990, we initiated a pilot study on the feasibility and impact of applying statistical algorithms in natural language processing. The experiments were concluded in March, 1991 and led us to believe that statistical approaches can effectively improve knowledge-based approaches [Weischedel, et al., 1991a, Weischedel, Meteer, and Schwartz, 1991]. Due to the nature of that effort, we had focussed on many well-defined algorithm experiments.</Paragraph>
    <Paragraph position="1"> We did not have a complete message processing system; nor was the pilot study designed to create an application system.</Paragraph>
    <Paragraph position="2"> For the Phase I evaluation, we supplied a module to New York University. At the time of the Phase I Workshop (12-14 February 1991) we decided to participate in MUC with our own entry. The Phase I Workshop provided invaluable insight into what other sites were finding successful in this particular application. On 25 February, we started an intense effort not just to be evaluated on the FBIS articles, but also to create essential components (e.g., discourse component and template generator) and to integrate all components into a complete message processing system.</Paragraph>
    <Paragraph position="3"> Although the timing of the Phase II test (6-12 May) was hardly ideal for evaluating our site's capabilities, it was ideally timed to serve as a benchmark prior to starting a four-year plan for research and development in message understanding. Because of this, we were determined to try alternatives that we believed would be different from those employed by other groups, wherever time permitted. These are covered in the next section.</Paragraph>
    <Paragraph position="4"> Our results were quite positive, given these circumstances. Our max-tradeoff version achieved 45% recall and 52% precision with 22% overgeneration. (See Figure 2.) PLUM can be run in several modes, trading off recall versus precision and overgeneration. Our other official run shows this tradeoff when PLUM is more conservative in generating templates. In this mode, we achieved 42% recall, 58% precision, and only 14% overgeneration. (See Figure 3.) This conservative version is actually our preferred mode of running the system. By being more conservative, recall dropped only 3 points, while precision increased 7 points and overgeneration was cut by one third.</Paragraph>
  </Section>
</Paper>