File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/96/x96-1025_metho.xml
Size: 3,663 bytes
Last Modified: 2025-10-06 14:14:28
<?xml version="1.0" standalone="yes"?> <Paper uid="X96-1025"> <Title>SRA PROJECT FOR ARPA / USACOM</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> SRA PROJECT FOR ARPA / USACOM </SectionTitle> <Paragraph position="0"/> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> I. INTRODUCTION </SectionTitle> <Paragraph position="0"> The USACOM project is part of the larger</Paragraph> </Section> <Section position="3" start_page="0" end_page="107" type="metho"> <SectionTitle> SAIC/BBN Advanced Joint Planning (AJP) Advanced Concepts Technology Demonstration </SectionTitle> <Paragraph position="0"> (ACTD). The primary objective of the project is to demonstrate the effectiveness of Tipster-sponsored data extraction technologies. Although this project is just now getting underway, we plan to investigate the applicability of data extraction technologies in a number of scenarios, including: Improved accuracy of text filtering based on integrating natural-language based extraction with detection (text retrieval) Using data extraction technology to automatically categorize incoming texts into predelrmed folders Integrated structured information extracted from texts as an additional source of input to the planning process Enabling the collection of aggregate information derived from texts for information presentation Methods of displaying text to focus on area of interest explicitly II. NAMETAG OVERVIEW NameTag consists of a software engine that applies name recognition rules to text, supported by lexical resources and limited lists of proper names. Users implement an application driver that invokes the engine and configures the processing. NameTag can either generate a document that has the names annotated with SGML (Standardized Generalized Markup Language), or provide a table of the names with indices (offsets) to the text -- &quot;annotation&quot; mode. A name recognition rule consists of a pattern and an action. The pattern is similar to a regular expression and consists of special operators and operands that match portions of text. Typically, patters recognize structural or contextual indicators of names and thus perform dynamic recognition. The action performs operations on the text, such as tagging a name with a classification. The rules are partitioned to form processing phases that primarily recognize one class of name. For example, NameTag has separate phases for recognizing personal names and organizational names. This feature allows for the recognition of certain classes of names to impact the recognition of other names. The lexical resources contain information about words, such as their part-of-speech and their meaning.</Paragraph> <Paragraph position="1"> III. PROJECT PLANS This project will integrate with two communitydeveloped reference architectures. These are the Joint Task Force (JTF) Reference Architecture and the Tipster Architecture. With respect to the JTF Reference Architecture, SRA's core data extraction product (NameTag) will be integrated into the Generic Server layer, with a customized data extraction prototype integrated into the Application Layer. With respect to the Tipster Architecture, a Tipster-compliant version of NameTag will be used. Additional portions of the developing architecture will be evaluated for the appropriateness of insertion into the demonstration (for example, the document manager or text annotator). The entire project will be evaluated by the Tipster Engineering Review Board (ERB) for documentation and certification as appropriate.</Paragraph> </Section> class="xml-element"></Paper>