<?xml version="1.0" standalone="yes"?>
<Paper uid="X93-1001">
  <Title>TIPSTER PROGRAM OVERVIEW</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
TIPSTER PROGRAM OVERVIEW
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
rhmerch@afterlife.ncsc.mil
1. TIPSTER PHASE I
</SectionTitle>
    <Paragraph position="0"> The task of TIPSTER Phase I was to advance the state of the art in two language technologies, Document Detection and Information Extraction.</Paragraph>
    <Paragraph position="1"> Document Detection includes two subtasks, routing (running static queries against a stream of new data), and retrieval (running ad hoc queries against archival data). Information Extraction is a technology in which pre-specified types of information are located within free text, extracted, and placed within a database.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2. THE STATE OF THE ART IN DOCUMENT
DETECTION BEFORE TIPSTER
</SectionTitle>
    <Paragraph position="0"> Before TIPSTER, users searching large volumes of data with many queries had few information retrieval tools other than the Boolean keyword search systems that had been developed more than a decade earlier. The characteristics of these Boolean systems are:
* low recall (the user loses an unknown quantity of useful information because the system is unable to retrieve many of the relevant documents)
* low precision (the user has to read a very large number of irrelevant documents which the system has mistakenly retrieved)
* no ranking or prioritization (the user must scan the entire list of retrieved documents because a good document is just as likely to be at the end of the list as at the beginning)
* exact matches (the user must generate variant spellings or alternate word choices by hand because there are no built-in rules for adding variants)
* hand-built queries (the user has to understand how the system works and the syntax of queries in order to use the system)</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3. DOCUMENT DETECTION DELIVERABLES IN
PHASE II
</SectionTitle>
    <Paragraph position="0"> As a result of algorithm development in Phase I, during TIPSTER Phase II, prototype systems will be built, giving the user Document Detection tools which feature the technology developed in Phase I:
* improved recall (comparative evaluation of systems in TIPSTER and TREC [1] has demonstrated higher recall of relevant documents)
* improved precision (the user will read fewer useless documents in order to find the ones he wants)
* ranked retrievals (the user reviews documents statistically ranked according to how well they match the query, thus improving the chances that the most useful documents will be near the top of the queue)
* query expansion (the system, not the user, automatically expands queries to draw in more relevant documents by using concept-based tools such as thesauri)
* automatic query generation (the system uses a natural language description of the subject supplied by the user to generate queries)</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4. THE STATE OF THE ART IN INFORMATION
EXTRACTION BEFORE TIPSTER
</SectionTitle>
    <Paragraph position="0"> Notwithstanding ARPA and commercial support for the development of information extraction technology and the positive impact of the series of Message Understanding Conferences, before TIPSTER, information extraction had been applied to the database update task as largely a manual procedure. Manual extraction is characterized by:  The deployment of information extraction systems was rare for both commercial and Government applications. Such systems have been characterized by</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="2" type="metho">
    <SectionTitle>
5. INFORMATION EXTRACTION DELIVERABLES
IN PHASE II
</SectionTitle>
    <Paragraph position="0"> As a result of algorithm development in Phase I, during TIPSTER Phase II, prototype systems will be built with the following characteristics:</Paragraph>
  </Section>
</Paper>