<?xml version="1.0" standalone="yes"?>
<Paper uid="M92-1011">
  <Title>Precision A Matched/ Missing</Title>
  <Section position="2" start_page="0" end_page="108" type="intro">
    <SectionTitle>
NLP OBJECTIVES
</SectionTitle>
    <Paragraph position="0"> LSI's overall natural language processing (NLP) objective is the development of a broad-coverage, reusable system that is readily transportable to additional domains, applications, and sublanguages in English, and that also provides a foundation for our multilingual work. Our system, called DBG (Data Base Generator), comprises a set of NLP components that have been developed, extended, and rebuilt over a period of some years. The core of the system is an innovative principle-based parser, using ideas from [1], which we began developing in the course of MUC-3 to replace our previous chart parser. Our approach thus treats powerful, robust parsing as the most crucial component of an NLP system. In applying our NLP system to text extraction, our ultimate objective is to develop a high-quality text extraction system, where &quot;high quality&quot; is defined as scoring above 80%, a level well beyond any current MUC score.</Paragraph>
    <Paragraph position="1"> In line with these NLP objectives, our major focus for MUC-4 was to follow up on our main &quot;lesson learned&quot; in MUC-3, namely to acquire a machine-readable dictionary (MRD) and integrate its content into the DBG system. When attempts to acquire the computer-friendly Longmans or one of the Oxford dictionaries were unsuccessful, we turned to ACL's CD-ROM containing the Collins English Dictionary (CED). The most correct version of the CED on the ACL CD-ROM was apparently derived directly from a medium prepared for the typographer and, unfortunately, lacks any documentation of its features, fonts, language, etc. Acquiring and integrating the CED was clearly worthwhile: we were able to triple the number of entries in our lexicon in a relatively short time (see Table 1). The larger lexicon will benefit all the applications LSI is currently working on.</Paragraph>
    <Paragraph position="2">  1. The work reported in this paper was supported in part by the Defense Advanced Research Projects Agency, Software and Intelligent Systems Technology Office, under Contract No. N66001-90-C-0192 (Subcontract 19-930042-31 to SAIC), and by the U.S. Army Ballistic Research Laboratory under Contract No.</Paragraph>
    <Paragraph position="3"> DAAA15-89-C-0004 (Subcontract No. 05-562-01 to Logicon, Inc.)</Paragraph>
  </Section>
</Paper>