<?xml version="1.0" standalone="yes"?>
<Paper uid="X96-1005">
  <Title>ARCHITECTURE OVERVIEW</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
ARCHITECTURE OVERVIEW
TIPSTER SE/CM
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
THE TIPSTER ARCHITECTURE
</SectionTitle>
    <Paragraph position="0"> The TIPSTER Architecture is a software architecture for providing Document Detection (i.e., Document Retrieval and Message Routing) and Information Extraction functions to text handling applications. The high-level architecture is described in the Architecture Design Document.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
PURPOSE OF THE ARCHITECTURE
</SectionTitle>
    <Paragraph position="0"> The TIPSTER Architecture is intended to facilitate the deployment into the workplace of advanced Document Detection and Information Extraction software. It provides a component and module design which has been jointly developed by a significant number of providers of advanced software of this type. In addition, this design meets the requirements of a number of US Government agencies.</Paragraph>
    <Paragraph position="1"> The Architecture was developed so that US Government agencies with similar text handling requirements can share some of the software modules and knowledge sources that meet those requirements. Use of the Architecture for Government procurements will also shorten the development process for new text handling applications, because a basis for design will already exist and be understood by vendor and customer alike. Finally, the Architecture will allow systems to be upgraded in a modular fashion as new text handling technology becomes available. Similarly, the research community can take advantage of the Architecture to facilitate the testing of new ideas in advanced text handling.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="33" type="metho">
    <SectionTitle>
SCOPE OF THE ARCHITECTURE
</SectionTitle>
    <Paragraph position="0"> The Architecture has been designed to meet a large number of text handling requirements for CIA, DIA, and NSA. It meets, however, only those requirements having to do with Document Detection and Information Extraction functions. Most requirements for other functions, such as Machine Translation or Optical Character Recognition, must be met outside the TIPSTER Architecture. Selected requirements in these areas may be part of TIPSTER Phase III as the Architecture is expanded. In addition, User Interface (GUI) requirements are not covered by the Architecture, but are unique to the specific application. Analytical tools, such as link analysis tools, timelines, or other displays showing document clustering, are considered part of the User Interface or the application. These tools lie outside the Architecture, but use information about document relevancy, relationships between documents, phrase lists, name lists, and relational or object data base records that has been exported by the functionality residing within the TIPSTER Architecture.</Paragraph>
  </Section>
  <Section position="5" start_page="33" end_page="34" type="metho">
    <SectionTitle>
ARCHITECTURE COMPONENTS
</SectionTitle>
    <Paragraph position="0"> There are four components: Detection, Extraction, Annotation, and Document Management. Detection encompasses the technology which does document retrieval and document or message routing.</Paragraph>
    <Paragraph position="1"> Extraction encompasses the technology which identifies specific entities and the relationships between entities in free text so they can be used to build a database.</Paragraph>
    <Paragraph position="2"> Annotation allows these two components to share information at a component level.</Paragraph>
    <Paragraph position="3"> Primarily, at present, it is the method for recording and passing forward the information developed by the Extraction component. Items of specific types, such as personal names, places, or organization names, can be located in the text by appropriate annotators, and the text locations and data types can be passed to any other component or part of the application, through Annotations, for further processing or viewing.</Paragraph>
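The idea of an annotation as a typed text location can be illustrated with a minimal sketch. It is not the TIPSTER interface: the Annotation class, its fields, and the functions annotate_terms and spans_of_type are hypothetical names chosen for this illustration, and the term-lookup annotator merely stands in for a real Extraction component.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A typed span of text given by character offsets (hypothetical layout)."""
    type: str   # data type of the item, e.g. "person", "place", "organization"
    start: int  # offset of the first character of the span
    end: int    # offset one past the last character of the span


def annotate_terms(text: str, terms: dict[str, str]) -> list[Annotation]:
    """Toy annotator: record every occurrence of each known term with its type."""
    found = []
    for term, kind in terms.items():
        pos = text.find(term)
        while pos != -1:
            found.append(Annotation(kind, pos, pos + len(term)))
            pos = text.find(term, pos + 1)
    # Return annotations in document order.
    return sorted(found, key=lambda a: a.start)


def spans_of_type(text: str, annotations: list[Annotation], kind: str) -> list[str]:
    """What a downstream component might do: recover the text of one data type."""
    return [text[a.start:a.end] for a in annotations if a.type == kind]
```

Because each annotation in this sketch carries only a type and a pair of offsets, a consumer such as a viewer or a database loader can work from the annotations and the original text without knowing how the annotator found the items.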
    <Paragraph position="4"> The Document Management component handles the document storage and archive.</Paragraph>
    <Paragraph position="5"> This function can be performed by existing document managers or Commercial Off-the-Shelf (COTS) products, such as a standard Data Base Management System (DBMS), with the addition of a wrapper that makes them compatible with the TIPSTER Architecture.</Paragraph>
    <Paragraph position="6"> The TIPSTER Architecture is explained in more detail in the &quot;TIPSTER Text Phase II Architecture Concept&quot; in this volume.</Paragraph>
  </Section>
</Paper>