File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/96/x96-1042_metho.xml
Size: 46,362 bytes
Last Modified: 2025-10-06 14:14:24
<?xml version="1.0" standalone="yes"?> <Paper uid="X96-1042"> <Title>Architecture Committee</Title> <Section position="2" start_page="224" end_page="224" type="metho"> <SectionTitle> 2.0 ARCHITECTURE CONCEPT </SectionTitle> <Paragraph position="0"> The TIPSTER Architecture is a flexible, cost-effective, technology-independent framework for building text-processing applications for Government analysts. It is designed to accommodate a wide range of text-processing requirements, in particular: The TIPSTER Architecture is documented in an Interface Control Document (ICD), which specifies the form and content of the inputs and outputs to TIPSTER modules. The TIPSTER Architecture Design Document describes the design underlying the Architecture.</Paragraph> <Paragraph position="1"> The TIPSTER Architecture is modular. Modules are loosely coupled, communicating by means of shared data and control messages. The module interfaces are unambiguously defined in the ICD.</Paragraph> <Paragraph position="2"> The TIPSTER Architecture is open. It is designed to be easily extended and enhanced as text-processing technology advances, and as modules with new functionality are developed in response to the needs of particular applications. Modules that comply with the TIPSTER ICD specifications may be sharable by other applications. A list of TIPSTER-compliant modules will be maintained under Configuration Management. As a related service, the TIPSTER program will also maintain a catalog of other sharable resources, such as lexicons, gazetteers, data dictionaries, and query libraries, as well as tools to assist in building compliant applications.</Paragraph> <Paragraph position="3"> Procedures for establishing an application's compliance with the Architecture, as well as procedures for extending the Architecture itself, have been established. They are set forth in the Configuration Management Plan. 
The TIPSTER Configuration Control Board has the responsibility for determining compliance with and extensions to the Architecture.</Paragraph> </Section> <Section position="3" start_page="224" end_page="225" type="metho"> <SectionTitle> </SectionTitle> <Paragraph position="0"> </Paragraph> <Paragraph position="1"> Because the main benefits from the Architecture derive from its commonality, its usefulness will be directly proportional to the number of applications which are built upon it. The more applications use and contribute sharable modules and data, the better and more extensive the Architecture will become, and the more cost savings will be realized.</Paragraph> <Section position="1" start_page="224" end_page="225" type="sub_section"> <SectionTitle> 2.1 Architecture vs. Application </SectionTitle> <Paragraph position="0"> Care should be taken to distinguish the TIPSTER Architecture from an application which is built in compliance with it. The following section outlines some of the differences between the Architecture itself and an Architecturally-compliant application. This is followed by a description of how the Architecture and an application interact with one another, and by a definition of an &quot;Architecturally-compliant application&quot;.</Paragraph> <Paragraph position="1"> 2.1.1 What the Architecture Provides The TIPSTER Architecture provides: * functional descriptions for each of a set of modules * for each included module, the form and content of inputs to it and outputs from it.</Paragraph> <Paragraph position="2"> Architecture-related services include: * maintenance of a catalog of previously-built TIPSTER modules and Persistent Knowledge bases. 
* maintenance of the Architecture itself, through the Configuration Control Board.</Paragraph> <Paragraph position="3"> A TIPSTER-compliant application is responsible for: * selecting the programming language and operating system, and making other related hardware/software decisions * determining what functionalities are needed to meet the User's needs * for those functionalities covered by TIPSTER specifications, building (or obtaining from previous TIPSTER applications) the necessary modules, ensuring that the interfaces between them conform to details on documents.) 2.1.3 Differences between Architecture and Application As can be seen from the above, the differences between the Architecture and a TIPSTER-compliant application fall into three main areas: functionality covered, computing environment, and internals vs. interfaces. These are discussed below.</Paragraph> </Section> </Section> <Section position="4" start_page="225" end_page="229" type="metho"> <SectionTitle> 2.1.3.1 Functionality Covered </SectionTitle> <Paragraph position="0"> In the majority of cases, the set of modules in a given application will not be the same as the set of modules covered by the ICD.</Paragraph> <Paragraph position="1"> On the one hand, it is likely that the Architecture will define specifications for more modules than are needed by any particular application. For example, an application may be needed only for document detection, even though the Architecture also provides specifications for extraction-related modules. The selection of which Architecturally-specified modules to use is the responsibility of the application.</Paragraph> <Paragraph position="2"> On the other hand, an application will also need some functionalities not covered by the Architecture. For example, it may require the parsing of aircraft tail numbers, even though no ICD specifications exist for such a module. 
Obviously, if an aircraft-tail-number parser were to be linked to TIPSTER modules, its input and output would have to conform, respectively, to the output and input specifications of the TIPSTER modules surrounding it.</Paragraph> <Paragraph position="3"> Every application will need at least one component not covered by the ICD: the User Interface Component. While the Architecture provides a means for handling a wide range of User needs, as well as a specification for the way in which the relevant data are input into the Architecture, it does not specify which of those User needs a particular application must satisfy, nor does it specify the manner in which the User Interface Component should operate. It is the application's responsibility to select the appropriate capabilities, to build the User Interface Component (including user commands, screen layout, and sequence of user operations), and to ensure that the output of this component meets the TIPSTER ICD specifications.</Paragraph> <Paragraph position="4"> To make full use of the features of the Architecture, most applications will also need to provide an additional module which is not covered by the Architecture: the Document Setup Module. The Architecture provides a means for handling many types of information within documents (e.g., different types of formatted text vs. free text; text in different languages; and text at different security levels). It also provides a specification of the way these different types of information should be marked as input into the Architecture. In other words, the Architecture specifies an internal structure to which documents being input into TIPSTER modules should conform, in order to ensure optimal processing. Since documents, as originally received, are not likely to conform to this structure, it will usually be necessary to convert documents as they exist in the external world into documents that conform to the internal document structure used by the Architecture. 
Ensuring that this happens (&quot;document setup&quot;) is the responsibility of the application. (Section 4.1.3 provides more details on documents and document setup.) 2.1.3.2 Computing Environment The TIPSTER Architecture is general and designed for use in a variety of software and hardware environments. The selection of the computing environment is the responsibility of the application. 2.1.3.3 Interfaces vs. Internals The TIPSTER ICD defines interfaces (inputs and outputs) between modules. The internal operation of modules is left completely to the application. As long as the inputs and outputs of a module conform to ICD specifications, it is considered to be a TIPSTER module.</Paragraph> <Paragraph position="5"> It should be noted that a TIPSTER module may, in fact, comprise several modules from the point of view of a particular application. These sub-modules may be modules in themselves, or complete systems, or databases.</Paragraph> <Paragraph position="6"> Individually, they may or may not be TIPSTER-compliant. The existence or nature of sub-modules is irrelevant for the Architecture, as long as the module itself accepts TIPSTER-compliant input and outputs TIPSTER-compliant output. From the point of view of the Architecture, only one TIPSTER module is involved, and only its interfaces must conform to the ICD.</Paragraph> <Paragraph position="7"> Non-TIPSTER modules may be made TIPSTER-compliant by the use of wrappers which provide TIPSTER-compliant interfaces. In this case, the module plus the wrapper is equivalent to a single TIPSTER module. The relationship between the TIPSTER Architecture and TIPSTER applications is expected to be a close and mutually beneficial one. The Architecture will facilitate the development of text-processing applications, and new text-processing applications will contribute to the development of the Architecture. As noted above, the TIPSTER ICD, which defines the Architecture, specifies inputs and outputs to components and modules. 
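The interface-versus-internals rule, including sub-modules and wrappers, can be illustrated with a minimal Python sketch. It is only an illustration under assumptions: the ICD's actual data formats are not reproduced here, and every name in it (`Annotation`, `TipsterModule`, `process`, `LegacyTagger`, `LegacyTaggerWrapper`, `CompositeModule`) is hypothetical. What it shows is that compliance is judged solely by the shape of a module's inputs and outputs: the composite hides its sub-modules, and the wrapper makes a non-compliant component present the same interface.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Annotation:
    # Hypothetical stand-in for an ICD-specified output record.
    start: int
    end: int
    kind: str


class TipsterModule(Protocol):
    # Hypothetical ICD-style interface: only inputs and outputs are fixed;
    # how a module computes its output is left to the application.
    def process(self, text: str) -> List[Annotation]: ...


class SentenceSplitter:
    # Internals are unconstrained; only the process() signature matters.
    def process(self, text: str) -> List[Annotation]:
        spans, start = [], 0
        for i, ch in enumerate(text):
            if ch == '.':
                spans.append(Annotation(start, i + 1, 'sentence'))
                start = i + 1
        return spans


class LegacyTagger:
    # A non-TIPSTER component with its own output format (hypothetical toy).
    def tag(self, text: str):
        return [('NAME', 0, 5)] if text.startswith('Smith') else []


class LegacyTaggerWrapper:
    # Wrapper: the legacy module plus this wrapper behave as one TIPSTER module.
    def __init__(self, legacy: LegacyTagger):
        self.legacy = legacy

    def process(self, text: str) -> List[Annotation]:
        return [Annotation(s, e, label.lower()) for label, s, e in self.legacy.tag(text)]


class CompositeModule:
    # Several sub-modules behind one compliant interface; from the
    # Architecture's point of view only this interface is visible.
    def __init__(self, parts: List[TipsterModule]):
        self.parts = parts

    def process(self, text: str) -> List[Annotation]:
        result: List[Annotation] = []
        for part in self.parts:
            result.extend(part.process(text))
        return result


doc = 'Smith arrived. He left.'
module = CompositeModule([SentenceSplitter(), LegacyTaggerWrapper(LegacyTagger())])
found = module.process(doc)
print([a.kind for a in found])  # ['sentence', 'sentence', 'name']
```

Because the check is structural, any of these classes could be replaced by another object exposing the same `process` signature without the surrounding modules noticing.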
Initially, those ICD specifications will be based on the module interfaces in the first applications built under the TIPSTER Text Phase II program.</Paragraph> <Paragraph position="8"> Developers of new TIPSTER-compliant text-processing applications will follow the existing ICD specifications for those modules whose functionalities are included in their application. Application development may be expedited by adapting previously-developed Architecturally-compliant modules.</Paragraph> <Paragraph position="9"> [Figure 2-2 panels: (a) Modify the Module and its Interfaces; (b) Build a Wrapper around it]</Paragraph> <Paragraph position="11"> [Figure 2-2: Two Ways to &quot;TIPSTERize&quot; a Module. Diagram labels: TIPSTER-compliant output of previous module; input to non-TIPSTER module; non-compliant output of non-TIPSTER module; input to TIPSTER-compliant module.] It is likely that a new application will also require additional functionalities beyond those for which the ICD specifies an interface, especially in the early phases of the Architecture. It is expected that any newly-implemented functionality which is of potential use to other applications will be submitted to the TIPSTER Configuration Control Board (CCB) as an extension to the Architecture. If accepted, its specifications would be included in the ICD. (The TIPSTER Configuration Management Plan describes the procedures involved.) Application developers may also find that individual ICD specifications need to be modified and extended to meet their needs. Such modifications and extensions may be submitted to the CCB as proposed changes to the ICD. Because changes and improvements to the Architecture are expected, the Architecture will be under version control, administered by the TIPSTER Configuration Control Board. 
TIPSTER applications will be associated with a particular version of the Architecture.</Paragraph> <Section position="1" start_page="227" end_page="229" type="sub_section"> <SectionTitle> 2.2 Available Help in Building a TIPSTER Application </SectionTitle> <Paragraph position="0"> Let us suppose that a developer has been given the task of building a TIPSTER-compliant text processing application that accomplishes certain tasks. The TIPSTER Architecture gives the developer several sources of help.</Paragraph> <Paragraph position="1"> First, the TIPSTER Architecture Committee will maintain a list of previously-built TIPSTER applications, the contractors who built them, and detailed information about each module in those applications. The developer may be able to obtain one or more of those modules from the contractor(s), and to modify them for his purposes with minimal effort.</Paragraph> <Paragraph position="2"> Even if no modules have yet been built which serve his purposes, the developer may be able to find help in the TIPSTER Interface Control Document. This document may contain specifications for some of the modules he needs to implement. By following those specifications, the developer is assured that his modules will be TIPSTER-compliant. It may be the case, however, that the ICD does not include specifications for the modules the developer needs. In that case, he may look in the Architecture Design Document for guidance. Since this document gives the overall design of the TIPSTER Architecture, the developer may be able to determine where his modules would fit in the TIPSTER design. Once his modules are implemented, he may be able to submit them for consideration as extensions to the TIPSTER ICD. 
Details about the Configuration Management Policy are given in Section 5.0 of this document, as well as in the Configuration Management Plan.</Paragraph> <Paragraph position="3"> Finally, even if the developer can find no reference to his proposed modules in the Architecture Design Document, he may look in the Architecture Requirements Document. This document describes the goals of the TIPSTER Architecture over the long term, and would give the developer an indication as to whether or not his modules would even be considered relevant to TIPSTER.</Paragraph> </Section> </Section> <Section position="5" start_page="229" end_page="236" type="metho"> <SectionTitle> 3.0 BENEFITS OF THE ARCHITECTURE FOR INTERESTED PARTICIPANTS </SectionTitle> <Paragraph position="0"> Many people are involved in many different ways in working with a text processing application. Each has a different set of interests. Seven major groups of participants are identified below, according to their interests regarding a text processing application. 
(Depending on the agency or the contract, these roles may be combined.)</Paragraph> <Section position="1" start_page="229" end_page="231" type="sub_section"> <SectionTitle> Participant Interests </SectionTitle> <Paragraph position="0"> End User: * Wants to be able to specify application needs efficiently * Needs operational, useful, efficient applications * May need application which can be developed in a short time and modified quickly as requirements change. COTR/Program Manager: * Must facilitate the identification of application contractors and potential teaming members * Desires ability to specify application which meets End User needs at lowest cost in a timely manner * Must evaluate application technical and cost alternatives * Needs clear and unambiguous understanding of application design * Must reduce the risk of fielding applications with new technology. Technology Transfer Officer: * Would like flexible applications which can be easily integrated, reconfigured, and extended.</Paragraph> <Paragraph position="1"> Manager: * Concerned with the cost of a new application * Needs to know immediate staffing impact of a new application * Needs accurate estimates of life-cycle support required (cost and</Paragraph> </Section> <Section position="2" start_page="231" end_page="231" type="sub_section"> <SectionTitle> 3.1 End User </SectionTitle> <Paragraph position="0"> The End User is expected to be someone in a United States Government agency who uses text processing applications.</Paragraph> <Paragraph position="1"> The End User is the person whose needs the application is designed to meet. The specification of those needs is usually made by the End User and the Technology Transfer Officer working as a team. Typically, at the procurement stage and during the early stages of development, more technologically-oriented input from the Technology Transfer Officer is required; as the project develops and is deployed, the End User provides more input regarding needs. 
The Architecture Requirements Document provides a standard framework and terminology in which to specify and discuss needs efficiently. This document is a compilation of all the text processing needs which are envisioned to be required by the U.S. Government during the life of the TIPSTER project. It is not necessary that the End User be familiar with the Architecture Requirements Document in order to benefit from it: the availability of this document, whether or not the End User is aware of it, will reduce the chances of misunderstanding or overlooking critical requirements.</Paragraph> <Paragraph position="2"> The End User is the person who most directly benefits from an operational, useful, efficient application. To ensure that an application meets these criteria, it must be tested. Because an Architecturally-compliant application will follow standard conventions for internal and external interfaces, it will be possible to build standard test suites. Easier and better testing will contribute to an application which better meets the End User's needs.</Paragraph> <Paragraph position="3"> The End User generally has immediate needs which must be met as quickly as possible. When the needs change, the technology should be able to adapt quickly. One of the goals of the Architecture is to provide a catalog of previously-developed TIPSTER modules which may be adapted for use in other applications, thus saving time in developing that module.</Paragraph> </Section> <Section position="3" start_page="231" end_page="231" type="sub_section"> <SectionTitle> 3.2 COTR/Program Manager </SectionTitle> <Paragraph position="0"> The COTR/Program Manager's responsibility is to ensure that the most suitable developer(s) are selected for a project, and to oversee the progress of the project. (For simplicity, the term COTR will be used in the following discussion.) 
For some projects, the COTR and the Technology Transfer Officer are the same person.</Paragraph> <Paragraph position="1"> The TIPSTER Architecture will help the COTR choose an Application Developer team, since a review of previous applications and modules will identify developers that have provided or supported a particular technology. This will also allow the COTR to recommend sources for specific technology and encourage teaming arrangements.</Paragraph> <Paragraph position="2"> Like the End User, the COTR/Program Manager must be able to specify requirements, but at a more technical level.</Paragraph> <Paragraph position="3"> The availability of standards for describing text processing components and interfaces should greatly facilitate COTR/contractor discussions about requirements.</Paragraph> <Paragraph position="4"> Once the requirements are established, the COTR must be able to evaluate the technical and cost alternatives for meeting them. Again, the availability of standards will provide common ground for discussions. Also, the COTR will have access to information about any TIPSTER-compliant components which have already been developed and which may be appropriate for use.</Paragraph> <Paragraph position="5"> The COTR must monitor and oversee the application design. The modular nature of the Architecture supports review packages that have well-defined characteristics.</Paragraph> <Paragraph position="6"> The COTR must identify and focus attention on the riskier parts of his development effort. 
Because the list of TIPSTER applications and their modules will be available, the COTR will be able to identify what areas of his application development are new and under-tested, and thus probably riskier.</Paragraph> </Section> <Section position="4" start_page="231" end_page="232" type="sub_section"> <SectionTitle> 3.3 Technology Transfer Officer </SectionTitle> <Paragraph position="0"> The Technology Transfer Officer is responsible for the integration of an application into the End User's environment and for major upgrades to the application. For some projects, the Technology Transfer Officer and the COTR are the same person.</Paragraph> <Paragraph position="1"> The Architecture will promote easy insertion of new technology into existing environments because it defines a common set of external interfaces. For the same reason, reconfiguration and extension of an Architecturally-compliant application will be easier than for a non-compliant application.</Paragraph> </Section> <Section position="5" start_page="232" end_page="232" type="sub_section"> <SectionTitle> 3.4 Manager </SectionTitle> <Paragraph position="0"> The Manager is primarily concerned with the cost and staffing issues (both immediate and long-term) which are associated with an application.</Paragraph> <Paragraph position="1"> Because the Architecture is modular and identifies equivalent components, there may be a basis on which to compare cost and staffing across applications. If a component has previously been used in another application, its associated costs, both to develop and to maintain, may be known. Its impact on staffing is also likely to be known. 
This provides a basis from which to estimate the cost and staffing required for a similar, new component.</Paragraph> <Paragraph position="2"> The more widely the Architecture is used, the better the basis for estimating cost and staffing requirements will be.</Paragraph> </Section> <Section position="6" start_page="232" end_page="232" type="sub_section"> <SectionTitle> 3.5 Application Developer </SectionTitle> <Paragraph position="0"> For the Application Developer, the Architecture will provide most of the same support that is available for the COTR in identifying appropriate TIPSTER technology, identifying teaming partners, monitoring the project, making risk assessments, and measuring performance. In addition, the Architecture will assist the Application Developer in the actual implementation of the application.</Paragraph> <Paragraph position="1"> The Architecture will assist the Developer in communicating with the End User/Technology Transfer Officer team, because everyone will have the same reference points for conceiving and evaluating the capabilities of TIPSTER modules, and everyone will be using the same terminology for those modules.</Paragraph> <Paragraph position="2"> The Architecture will help the Developer get an application to the End User more quickly and will provide a more flexible application design. It will provide a starting point for application development, because many (if not all) of the required components will be identified and their interfaces defined. Some previously-developed TIPSTER modules may be available which can be easily adapted for a new application. When new modules must be implemented, the standards for the interfaces will be pre-established, so that a minimum of investigation about related application modules will be required.</Paragraph> <Paragraph position="3"> Developers who do not produce entire applications will still be able to produce Architecturally-compliant components or modules in their areas of expertise. 
Since their interfaces will be standard, these pieces can readily be used by other developers.</Paragraph> </Section> <Section position="7" start_page="232" end_page="233" type="sub_section"> <SectionTitle> 3.6 R&amp;D Researcher </SectionTitle> <Paragraph position="0"> The TIPSTER Architecture will support the R&amp;D Researcher with many of the same capabilities as are provided for the Application Developer.</Paragraph> <Paragraph position="1"> The ability to test new ideas without having to develop all the components of an application is particularly attractive to the R&amp;D Researcher. For example, if a Researcher has a new concept for extracting data, he may be able to use existing document management capabilities, Persistent Knowledge bases, matching functions, and a defined user interface to test the idea quickly and cost-effectively. In this way, proof-of-concept and hypothesis testing can be performed with more comprehensive, realistic applications, and thus more effectively. The Researcher, operating in an established environment, is free to concentrate on specific, well-defined areas and not be burdened with developing the necessary infrastructure to test the new component. This will encourage new research ideas, increase interest, and expand both the technology and its associated funding.</Paragraph> <Paragraph position="2"> The Architecture may also help the Researcher to look for gaps in the technology, since all modules in TIPSTER applications will be documented. The Researcher who is considering implementing a new idea will thus be able to determine whether or not a similar TIPSTER module already exists.</Paragraph> <Paragraph position="3"> As for Application Developers, the Architecture will aid the Researcher in identifying collaborators to fill their technology gaps.</Paragraph> <Paragraph position="4"> The Architecture will make it easier for the Researcher to evaluate component interactions. 
With interfaces clearly defined, it will be easier for the Researcher to insert measurement tools around new and supporting components.</Paragraph> </Section> <Section position="8" start_page="233" end_page="234" type="sub_section"> <SectionTitle> 3.7 System Support Officer </SectionTitle> <Paragraph position="0"> The System Support Officer is mainly concerned with the life-cycle support of the application. The Architecture helps System Support Officers perform their work more effectively in several ways.</Paragraph> <Paragraph position="1"> Over the long term, it is envisaged that multiple applications will be able to use a set of common Persistent Knowledge Repository items. This would simplify the task of maintaining the data in the Repository. For example, adding words to a lexicon should occur less often than if every application had its own lexicon. Also, the maintenance procedures should be more consistent across different types of Persistent Knowledge Repository items.</Paragraph> <Paragraph position="2"> The use of common shared modules in the Architecture will require the System Support Officer to become knowledgeable about fewer application parts. This will expedite problem identification and reporting, facilitate direct user support, and in general provide a more structured and consistent operational environment. Also, if the same components have been used in different applications, it is likely that they have been exercised more extensively than non-shared components. A more robust TIPSTER application, requiring less maintenance, should result.</Paragraph> <Paragraph position="3"> Section 4.2. The reason for this is that often, different types of output need to be manipulated, thus becoming input. 
From a user's point of view (the point of view taken here), the distinction blurs.</Paragraph> </Section> <Section position="9" start_page="234" end_page="236" type="sub_section"> <SectionTitle> 4.1 Documents </SectionTitle> <Paragraph position="0"> Documents are one of the links between the outside world and the TIPSTER environment. The purposes of this section are: * to define the types of documents and document parts that will be exploitable within the TIPSTER Architecture; * to describe types of setup processing which might be performed on a document to facilitate and improve the output of TIPSTER; * to illustrate some of the ways in which the TIPSTER Architecture might be extended for particular applications, in order to exploit a wider range of documents and document parts.</Paragraph> <Paragraph position="1"> Below are definitions of some terms as used in the discussion of TIPSTER documents. For TIPSTER purposes, it is useful to distinguish between different Forms of a document, on the basis of the types of processing performed on it. Forms 0-4 are defined below. Note that, for any given document, Forms 1-4 may or may not be distinct from one another, depending on the amount of processing performed on the document. * anything done to it between receipt by the End User's site and input into the TIPSTER Architecture.</Paragraph> <Paragraph position="2"> A Form 3 document may contain markups (e.g., the results of a non-TIPSTER document retrieval application). It may be reassembled, decrypted, or decompressed.</Paragraph> <Paragraph position="3"> Form 4 Document - The internal TIPSTER document. 
A Form 4 document is a Form 3 document plus: * anything done to the document (e.g., identifying parts) to prepare it for the appropriate TIPSTER algorithms.</Paragraph> </Section> </Section> <Section position="6" start_page="236" end_page="242" type="metho"> <SectionTitle> FM DIA WASHINGTON DC INFO RUEAHQA/CSAF WASHINGTON DC RUETIAQ/MPC FT GEORGE G MEADE MD RUEHC /SECSTATE WASHINGTON DC RUEAIIA/CIA WASHINGTON DC RULKQAN/MARCORINTCEN QUANTICO VA RUDMONI/ONI SUITLAND MD//NAVATAC// </SectionTitle> <Paragraph position="0"> Viewed in this context, a particular natural language is one of many possible sets of conventions used to convey information within a document. Each set of conventions requires a different type of processing in order to exploit information represented in it. For example, the portion of a document that is contained in a table will require different processing from the portion of a document that is written in English, which will, in turn, require different processing from the portion of a document that is written in Arabic.</Paragraph> <Paragraph position="1"> A Document Part will be defined as a portion of a document which requires a particular type of processing in order to exploit its information.</Paragraph> <Paragraph position="2"> If no Parts are identified, the basic TIPSTER Architecture will make the default assumption that all Parts of a given document are text Parts of one particular, user-specified language. Note that under this assumption, information conveyed through a combination of text and non-text can still be partially exploited. For example, some information may be gleaned from the words in a table, without actually processing the table structure. 4.1.3.1 Markup of Documents A document markup will be defined as information which has been added to a document before it becomes a Form 4 document. Markups may be added manually or by automatic means, such as a message handling application or a detection application. 
The most common markup is probably one that indicates Parts of a document. Markups may be embedded within the document or they may co-exist with it in the form of annotations, possibly containing pointers to specific locations in the document.</Paragraph> <Paragraph position="3"> To exploit Document Parts, markups must indicate, at a minimum, boundaries between the Parts and identify the Part type. Then the appropriate type of processing can be applied to each Part. An application must be able to ignore any markups it does not use; that is, unused markups should not cause it to break.</Paragraph> <Paragraph position="4"> The TIPSTER program has undertaken to identify several types of Document Parts. These include some message headers and two languages, English and Japanese. Other Part types will be identified on a case-by-case basis by Architecture users who need to exploit them.</Paragraph> <Section position="1" start_page="238" end_page="238" type="sub_section"> <SectionTitle> 4.1.3.2 Processing of Document Parts </SectionTitle> <Paragraph position="0"> As noted above, if no Parts are identified, the basic TIPSTER Architecture will make the default assumption that all Parts of a given document are text Parts of one particular, user-specified language. Obviously, if that document is to yield information, that default language must be one of the languages for which TIPSTER modules and components exist.</Paragraph> <Paragraph position="1"> In addition, the TIPSTER Architecture will process Part types that are not traditionally viewed as part of Natural Language Understanding (NLU), such as communication headers.</Paragraph> <Paragraph position="2"> As the TIPSTER Architecture grows, processing capabilities for additional generic Part types can be added. 
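The Part-related rules above can be sketched as a dispatch over stand-off markups. This is a hypothetical illustration, not a TIPSTER-defined format: `Markup`, `PROCESSORS`, `process_document`, the Part type labels such as 'text:english', and the toy processors are all assumptions made for the example. It shows markups supplying Part boundaries and Part types, unused markups being skipped rather than breaking the application, and the default assumption of a single text Part in one user-specified language when no Parts are marked.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Markup:
    # Stand-off markup: an annotation pointing into the text, not an embedded tag.
    start: int
    end: int
    kind: str        # 'part' for Part boundaries; anything else is other markup
    value: str = ''  # for 'part' markups, the Part type


def process_header(segment: str) -> str:
    # Toy header processor: report the first field.
    return 'header:' + segment.split()[0]


def process_english(segment: str) -> str:
    # Toy English processor: report the word count.
    return 'english:' + str(len(segment.split()))


# Hypothetical table of Part types this application can process.
PROCESSORS: Dict[str, Callable[[str], str]] = {
    'header': process_header,
    'text:english': process_english,
}


def process_document(text: str, markups: List[Markup],
                     default_language: str = 'english') -> List[str]:
    # Only Part markups are used; other markups (e.g. 'bold') are ignored
    # instead of causing the application to fail.
    parts = [(m.value, m.start, m.end) for m in markups if m.kind == 'part']
    if not parts:
        # Default assumption: the whole document is a single text Part
        # in one user-specified language.
        parts = [('text:' + default_language, 0, len(text))]
    results = []
    for part_type, start, end in parts:
        handler = PROCESSORS.get(part_type)
        if handler is not None:
            results.append(handler(text[start:end]))
        # Part types with no available processor are skipped in this sketch.
    return results


doc = 'FM DIA WASHINGTON DC\nTroops moved north today.'
split = doc.index('\n')
markups = [
    Markup(0, split, 'part', 'header'),
    Markup(split + 1, len(doc), 'part', 'text:english'),
    Markup(split + 1, split + 7, 'bold'),  # markup this application does not use
]
print(process_document(doc, markups))                # ['header:FM', 'english:4']
print(process_document('Plain text only here', []))  # ['english:4']
```

Note that a document carrying only non-Part markups (such as bolding) still falls back to the default single-language assumption, since those markups say nothing about Part boundaries.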
Although the Architecture will in principle be extensible to the processing of any Part type (e.g., LISP code, spreadsheets), the TIPSTER program is expected to emphasize the processing of textual information. In particular, Parts in any natural language will be processable within the Architecture, provided that the language can be identified and indicated through markups and that Architecturally compliant modules and components exist for that language.</Paragraph> <Paragraph position="3"> The TIPSTER Architecture is not envisioned to encompass the processing of non-generic Parts, such as specific types of tables or outlines. Such processing could be included in an application, but outside its Architecture-compliant portion.</Paragraph> </Section> <Section position="2" start_page="238" end_page="239" type="sub_section"> <SectionTitle> 4.1.3.3 Syntax of Markups </SectionTitle> <Paragraph position="0"> Markups present in Form 3 documents require a key before the information they represent can be used. For example, if the markups were in the Standard Generalized Markup Language (SGML), then the syntax and definitions (e.g., that the embedded tags <NAME> </NAME> mark a name) would need to be available in the form of &quot;Document Type Definitions&quot; (DTDs).</Paragraph> <Paragraph position="1"> The Architecture will provide a standard for document markups. During document setup, pre-existing markups must be converted to this standard in order to be used by the TIPSTER portions of the application. Markups not needed by the application need not be converted.</Paragraph> <Paragraph position="2"> The Architecture may not include a standard for some pre-existing markup that would be useful to another part of the application. For example, formatting markup for bold text, which the Architecture may not specify, might be used by the (application-specific) User Interface Component.
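Converting pre-existing markups during document setup might look like the following sketch. It assumes the source markups (for example, SGML elements defined by a DTD) have already been parsed into (tag, start, end) spans, and it converts only the tags the application needs into a standard annotation type. The names convert_markups, Annotation, PreparedDocument, and the "person" and "location" types are hypothetical stand-ins, not the Architecture's actual markup standard. The bold-formatting tag "B" is deliberately left unconverted yet remains available alongside the retained original text.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A pre-existing markup already parsed from its source format (e.g. an SGML
# element defined in a DTD) into (tag name, start offset, end offset).
SourceMarkup = Tuple[str, int, int]


@dataclass(frozen=True)
class Annotation:
    """Hypothetical Architecture-standard annotation: a type plus a span."""
    ann_type: str
    start: int
    end: int


@dataclass
class PreparedDocument:
    original_text: str                    # the retained "original" document
    original_markups: List[SourceMarkup]  # all pre-existing markups, unchanged
    annotations: List[Annotation]         # only the markups that were converted


def convert_markups(text: str,
                    source_markups: List[SourceMarkup],
                    conversion_table: Dict[str, str]) -> PreparedDocument:
    """Convert only the pre-existing markups the application needs.

    conversion_table maps a source tag name to a (hypothetical) standard
    annotation type. Tags absent from the table are not converted, but they
    remain available through original_markups.
    """
    annotations = [
        Annotation(conversion_table[tag], start, end)
        for tag, start, end in source_markups
        if tag in conversion_table
    ]
    return PreparedDocument(text, list(source_markups), annotations)


text = "Ali Hassan arrived in Cairo."
source = [("NAME", 0, 10), ("B", 0, 10), ("PLACE", 22, 27)]  # "B" marks bold text
doc = convert_markups(text, source, {"NAME": "person", "PLACE": "location"})
print([(a.ann_type, text[a.start:a.end]) for a in doc.annotations])
# [('person', 'Ali Hassan'), ('location', 'Cairo')]
print(len(doc.original_markups))  # 3
```

Keeping the original text and markups next to the converted annotations is what lets an application-specific component still use formatting markup that the standard does not cover.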
Because the &quot;original&quot; document is always retained, all pre-existing markups are available to the application, whether or not they are converted to the Architecture standard.</Paragraph> </Section> <Section position="3" start_page="239" end_page="241" type="sub_section"> <SectionTitle> 4.2 Interactions with the Application </SectionTitle> <Paragraph position="0"> This section discusses some ways in which users will need to interact with a text-handling application. An application need not provide all of these capabilities to be Architecturally compliant. However, if it provides any of them, it must follow the TIPSTER standards for user/application interfaces in order to be compliant. The Architecture also does not prohibit an application from providing additional capabilities outside the scope of the Architecture.</Paragraph> <Paragraph position="1"> Different users of a TIPSTER application will typically interact with it in different ways to accomplish different tasks. This discussion is organized around the typical interactions of the End User, the Application Developer, and the System Support Officer. This organization is not meant to imply that only those individuals should be able to make those inputs.</Paragraph> <Paragraph position="2"> The End User will need to interact with the application in many ways. Most obviously, the End User will submit &quot;information requests&quot;: in a detection application, these will most likely be retrieval or routing requests; in an extraction application, they will probably be requests to fill templates with information from documents. The End User must be able to control many other aspects of the application as well. The Architecture will provide standards for handling the following types of End User interactions.</Paragraph> <Paragraph position="3"> 4.2.1.1 Detection Information Requests
a. creating detection criteria in the form of a statement of relevance (text), an example (text), a Boolean expression, keywords (including negative operators), or a combination of these elements
b. manipulating and refining detection criteria (e.g., reformulating them, annotating them with priorities that are then transferred to the returned documents, weighting parts of the detection criteria)
c. using external databases (e.g., word lists) to help formulate a query. If such a list is used frequently by several applications, it may be added to a Persistent Knowledge base.</Paragraph> <Paragraph position="4">
d. storing, searching, and retrieving detection criteria
e. comparing the results of using different versions of detection criteria
f. specifying the following aspects of the returned document list: criteria for prioritizing (e.g., relevance, author, date of document, date of receipt, subject), destination (e.g., file, extraction application), cutoff point for the ranked document list, additional document-related information to be returned with the document list (e.g., classification, bibliographic information, abstract), and the order in which the information returned with the document list is to be presented
g. saving and manipulating the document list (e.g., reordering it manually)
h. simultaneously searching an archival index and an index of newly acquired documents, and merging their results
i. searching foreign-language documents using an English statement of relevance and receiving a single ranked list of documents
j. annotating foreign-language documents with English glosses and viewing the annotated documents
k. associating specific portions of the detection criteria with specific zones of a document
4.2.1.2 Extraction Information Requests
a. accessing a library of templates (for pattern-based extraction applications) to reuse, or to use as the basis for a new template
b. writing, or selecting from a library, fill-rules and patterns to accompany a template
c. combining fill-rules, templates, and documents in various languages
d. editing filled templates
4.2.1.3 Information Requests Common to Both Detection and Extraction
a. viewing the original document
b. creating ordered and unordered groups of documents
c. specifying multiple sources from which to get information (e.g., multiple collections, message streams, databases)
d. narrowing a search by using attributes of the document or of part of the document (e.g., date, source, language)
e. clustering documents by similarity; identifying duplicate documents
f. viewing a document together with the reason for its selection or slot fill (e.g., for detection, highlighting the text that caused the document to be selected, or presenting a bar graph of term weights; for extraction, highlighting the source text for any given slot fill)
g. viewing data about documents (e.g., size, classification, author, dates) and collections (e.g., concordances, KWIC indexes)
h. finding documents containing identical passages and annotating them
i. requesting and modifying items from the Persistent Knowledge bases (e.g., collections of detection criteria, template and template component collections, indexes, lexicons)
j. specifying whether processing occurs in the background or foreground
k. specifying the time at which to begin processing a collection
l. adding user-specific annotations to be used for future processing
m. viewing the annotations on a document
n. obtaining an abstract or summary of the document
a. requesting Persistent Knowledge (e.g., query and profile collections, template and template component collections, document collections, indexes, lexicons, document lists, data dictionaries that list the annotations and attributes related to each module/component)
b. manipulating the Persistent Knowledge Repository (e.g., adding a lexicon, modifying templates)
c. testing individual modules and components using standard evaluation tools
d. modifying existing modules and components
e. creating templates for extraction from scratch
f. modifying existing templates by deleting or adding slots (for pattern-based extraction components). These templates would be objects, and each slot would be linked to the parts of the application used to fill it. This implies the ability to view and modify each part of the application linked to those slots.</Paragraph> <Paragraph position="5">
g. grouping annotations into sets
a. ensuring that each part of the application is marked and handled at the proper level of security classification. This would include documents and parts of documents, application modules, and Persistent Knowledge Repository items (e.g., lexicons, saved queries and profiles)
b. ensuring that access control is properly handled. This would include being able to access and change</Paragraph> </Section> <Section position="4" start_page="241" end_page="242" type="sub_section"> <SectionTitle> User and User Group permissions to read/write/execute on a module level </SectionTitle> <Paragraph position="0">
c. requesting auditing and usage information (e.g., number of documents and sources accessed; how often certain types of data are searched; source and classification of each document)
d. combining detection criteria from multiple users for a single pass against a collection
e. applying detection criteria against multiple document lists simultaneously
f. maintaining an archival index and a corresponding index for new documents
g. viewing diagnostic information about problem documents, document processing, and errors
h. viewing the results of multiple diagnostic runs</Paragraph> </Section> </Section> <Section position="7" start_page="242" end_page="243" type="metho"> <SectionTitle> 5.0 CONFIGURATION MANAGEMENT </SectionTitle> <Paragraph position="0"> Responsibility for maintaining the Architecture resides with the Configuration Control Board.
The TIPSTER Configuration Management Policy is documented in the Configuration Management Plan, which describes procedures for:
* obtaining the current version of the ICD
* submitting ICD specifications for a new TIPSTER module
* proposing modifications to existing ICD specifications
* obtaining certification for a TIPSTER-compliant application
* obtaining information about previously developed TIPSTER-compliant modules.
The Executive Summary of the Configuration Management Plan is reprinted below for reference.</Paragraph> <Section position="1" start_page="242" end_page="243" type="sub_section"> <SectionTitle> 5.1 Architecture Compliance </SectionTitle> <Paragraph position="0"> The CM review process will result in a document that details the ways in which an application or vendor product conforms to the Architecture Design Document and agrees with the TIPSTER Architecture design. This document is a TIPSTER Application Conformance Assessment Document (TACAD).</Paragraph> <Paragraph position="1"> For an application or vendor product to obtain a TACAD, the following conditions must be met:
For TIPSTER Application development:
* The TIPSTER Application development complies with the TIPSTER CM process, the details of which are contained in this document. In short, the TIPSTER Application must undergo a Preliminary Design Review (PDR) and a Final Operating Capability (FOC) review. At these reviews, any discrepancy or deviation from the TIPSTER Architecture must be documented and justified or explained.</Paragraph> <Paragraph position="2"> * Any new code or capabilities for the TIPSTER Application must be developed in accordance with the TIPSTER Architecture.
Failure to do so will be documented and justified in the TACAD.</Paragraph> <Paragraph position="3"> * To the extent possible and in the Government's best interest, existing code and capabilities to be incorporated into the TIPSTER Application will be re-engineered in accordance with the TIPSTER Architecture. Failure to do so will be documented and justified in the TACAD.</Paragraph> <Paragraph position="4"> For Vendor Products:
* If the vendor's product is used in a TIPSTER Application, the criteria stated above in &quot;For TIPSTER Application development&quot; will apply.</Paragraph> <Paragraph position="5"> * A vendor's product may be determined to be TIPSTER compliant through a TACAD, independently of being part of a TIPSTER Application. To support the development of this TACAD, the vendor will demonstrate, by inspection, module-by-module compliance with the TIPSTER Architecture.</Paragraph> <Paragraph position="6"> On the basis of the TACAD, the Configuration Control Board (CCB) will determine whether a TIPSTER Application is conformant, that is, whether it exhibits sufficient overlap with the Architecture Design Document. The extent of TIPSTER conformance will be determined on a &quot;per module&quot; basis and documented in the TACAD. As a result of the TIPSTER Application reviews (described in section 1.2, below), a summary matrix will be available, as shown in Appendix A, Figure A-1, below.</Paragraph> <Paragraph position="7"> The TACAD may be used by Vendors to facilitate teaming with other Vendors or the insertion of new capabilities into existing TIPSTER systems.</Paragraph> </Section> <Section position="2" start_page="243" end_page="243" type="sub_section"> <SectionTitle> 5.2 Configuration Management in a TIPSTER Application Life Cycle </SectionTitle> <Paragraph position="0"> The TIPSTER CM process imposes two control gates, PDR and FOC, on the TIPSTER Application development life cycle, as shown in Figure 5-1.
In preparation for these control gates, the developing contractor and the SE/CM are expected to work together to prepare the documentation and to identify any discrepancies between the Architecture design, as detailed in the Interface Control Document (ICD), and the TIPSTER Application's design.</Paragraph> <Paragraph position="1"> This cooperation will take the form of an Engineering Review Board (ERB).</Paragraph> <Paragraph position="2"> At the PDR control gate, the following TIPSTER Application documents are expected to be put under TIPSTER CM control:</Paragraph> </Section> </Section> </Paper>