<?xml version="1.0" standalone="yes"?>
<Paper uid="X93-1005">
  <Title>DOCUMENT DETECTION OVERVIEW</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2. TEST DESIGN
</SectionTitle>
    <Paragraph position="0"> The test design called for the creation of a set of training data and a set of test data. The training data consisted of large numbers of documents (between 1 and 2 gigabytes of text), 50 training topics, and lists of documents for each of the topics that were known to be relevant (the &amp;quot;fight answers&amp;quot;). The test data consisted of 50 new topics and about a gigabyte of new documents.</Paragraph>
    <Paragraph position="1"> A slight departure from traditional information retrieval methodology was needed to better handle the TIPSTER environment. All previous test collections have assumed that the test questions or topics are closely related to the actual queries submitted to the retrieval systems, as the test questions are generally transformed automatically into the structure of terms submitted to the retrieval systems as input. This input structure is called the query in the TIPSTER environment, with the test question itself referred to as the topic. Since most previous research has involved simple automatic generation of queries from topics, there was no need for a distinction to be made between topics and queries. In TIPSTER this distinction became important because the topics needed to carry a large amount of highly specific information, and the methods of query construction therefore became more complex.</Paragraph>
    <Paragraph position="2">  Figure 1 shows a schematic of the test design, including the various components of the test methodology. The diagram reflects the four data sets (2 sets of topics and 2 sets of documents) that were provided to contractors. The first set of topics and documents (T-Train and D-Train) were provided to allow system training and to serve as the base for routing and adhoc experiments. The roudng task assumes a static set of topics (T-Train), with evaluation of routing done by providing new test documents (D-Test).</Paragraph>
    <Paragraph position="3"> The adhoc task assumes a static set of documents (D-Train), with evaluation of adhoc retrieval done by providing new topics (T-Test).</Paragraph>
    <Paragraph position="4"> Three different sets of queries were generated from the data sets. Q1 is the set of queries (probably multiple sets) created to help in adjusting a retrieval system to this task. The results of this research were used to create Q2, the routing queries to be used against the new test documents (D-Test). Q3 is the set of queries created from the new test topics (T-Test) as adhoc queries for .~earching against the old documents (D-Train). The results from searches using Q2 and Q3 were the official evaluation results sent to NIST for both TIPSTER and TREC.</Paragraph>
    <Paragraph position="5"> The Japanese language test design paralleled exactly the English language test design.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="10" type="metho">
    <SectionTitle>
3. EVALUATION SCHEDULE
</SectionTitle>
    <Paragraph position="0"> For the English language document detection task there  * routing test -- topics 1-50 against disk 2 Because of the lateness of data availability, and the scarcity of sample relevance assessments for training, the emphasis was put on doing adhoc evaluation and only half of the routing test was done.</Paragraph>
    <Paragraph position="1"> 18-month evaluation * D-Train -- disks 1 &amp; 2 (about 2 gigabytes of documents) null * T-Train -- topics 51-100 * D-Test -- subset of future disk 3 (about 500 megabytes of documents) * T-Test -- revised topics 1-50 * adhoc test -- topics 1-50 against disks 1 &amp; 2 * routing test -- topics 51-100 against subset of disk 3  By the 18-month evaluation point, large numbers of relevance judgments were available for training (due to the many TREC-1 participants). This second evaluation therefore concentrated on the routing task, although adhoc evaluation was also done.</Paragraph>
    <Paragraph position="2">  24-month evaluation * D-Train -- disks 1 &amp; 2 (about 2 gigabytes of documents) null * T-Train -- topics 1-100 * D-Test -- disk 3 (about 1 gigabyte of documents) * T-Test -- topics 101-150 * adhoc test -- topics 101-150 against disks 1 &amp; 2 * routing test -- topics 51-100 against all of disk 3  This data point corresponded directly to the TREC-2 data and therefore allows comparison between the 24-month TIPSTER results and the TREC-2 results.</Paragraph>
  </Section>
  <Section position="5" start_page="10" end_page="11" type="metho">
    <SectionTitle>
4. SPECIFIC TASK GUIDELINES
</SectionTitle>
    <Paragraph position="0"> Because the TIPSTER contractors and TREC participants used a wide variety of indexing/knowledge base building techniques, and a wide variety of approaches to generate search queries, it was important to establish clear guidelines for the evaluation task. The guidelines deal with the methods of indexing/knowledge base construction, and with the methods of generating the queries from the supplied topics. In general they were constructed to reflect an actual operational environment, and to allow .as fair as possible a separation among the diverse query construction approaches.</Paragraph>
    <Paragraph position="1"> There were guidelines for constructing and manipulating the system data structures. These structures were defined to consist of the original documents, any new structures built automatically from the documents (such as inverted files, thesauri, conceptual networks, etc.) and any new structures built manually from the documents (such as thesauri, synonym lists, knowledge bases, rules, etc.).</Paragraph>
    <Paragraph position="2"> The following guidelines were developed for the TIP- null STER task.</Paragraph>
    <Paragraph position="3"> 1. System data structures should be built using the  initial training set (documents D-Train, training topics T-Train, and the relevance judgments).</Paragraph>
    <Paragraph position="4"> They may be modified based on the test documents D-Test, but not based on the test topics. In particular, the processing of one test topic should not affect the processing of another test topic. For example, it is not allowed to update a system knowledge base based on the analysis of one test topic in such a way that the interpretation of subsequent test topics was changed in any fashion.</Paragraph>
    <Paragraph position="5"> 2. There are several parts of the Wall Street Journal and the Ziff material that contain manually assigned controlled or uncontrolled index terms.</Paragraph>
    <Paragraph position="6"> These fields are delimited by SGML tags, as specified in the documentation files included with the da~ Since the primary focus is on retrieval and routing of naturally occurring text, these manually indexed terms should not be used.</Paragraph>
    <Paragraph position="7"> 3. Special care should be used in handling the routing topics. In a true routing situation, a single document would be indexed and compared against the routing topics. Since the test documents are generally indexed as a complete set, routing should be simulated by not using any test document information (such as IDF based on the test collection, total frequency based on the test collection, etc.) in the searching. It is permissible to use training-set collection information however.</Paragraph>
    <Paragraph position="8"> Additionally there were guidelines for constructing the queries from the provided topics. These guidelines were considered of great importance for fair system comparison and were therefore carefully constructed. Three generic categories were defined, based on the amount and kind of manual intervention used.</Paragraph>
    <Paragraph position="9">  1. Method 1 -- completely automatic initial query construction.</Paragraph>
    <Paragraph position="10"> adhoc queries -- The system will automatically  extract information from the topic (the topic fields used should be identified) to construct the query.</Paragraph>
    <Paragraph position="11"> The query will then be submitted to the system (with no manual modifications) and the results</Paragraph>
    <Paragraph position="13"> from the system will be the results submitted to NIST. There should be no manual intervention that would affect the results.</Paragraph>
    <Paragraph position="14"> routing queries -- The queries should be constructed automatically using the training topics, the training relevance judgments and the training documents. The queries should then be submitted to NIST before the test documents are released and should not be modified after that point. The unmodified queries should be run against the test documents and the results submitted to NIST.</Paragraph>
    <Paragraph position="15"> Method 2 -- manual initial query construction.</Paragraph>
    <Paragraph position="16"> adhoc queries -- The query is constructed in some manner from the topic, either manually or using machine assistance. The methods used should be identified, along with the human expertise (both domain expertise and computer expertise) needed to construct a query. Once the query has been constructed, it will be submitted to the system * (with no manual intervention), and the results from the system will be the results submitted to NIST. There should be no manual intervention after initial query conslrucfion that would affect the results. (Manual intervention is covered by</Paragraph>
    <Section position="1" start_page="11" end_page="11" type="sub_section">
      <SectionTitle>
Method 3.)
</SectionTitle>
      <Paragraph position="0"> routing queries -- The queries should be constructed in the same manner as the adhoc queries for method 2, but using the training topics, relevance judgments, and training documents. They should then be submitted to NIST before the test documents are released and should not be modified after that point. The unmodified queries should be run against the test documents and the results submitted to NIST.</Paragraph>
      <Paragraph position="1"> Method 3 -- automatic or manual query construction with feedback.</Paragraph>
      <Paragraph position="2"> adhoc queries -- The initial query can be constructed using either Method 1 or Method 2. The query is submitted to the system, and a subset of the retrieved documents is used for manual feedback, i.e. a human makes judgments about the relevance of the documents in this subset. These judgments may be communicated to the system, which may automatically modify the query, or the human may simply choose to modify the query himself. At some point, feedback should end, and the query should be accepted as final. Systems that submit runs using this method must submit several different sets of results to allow tracking of the time/cost benefit of doing relevance feedback.</Paragraph>
      <Paragraph position="3"> routing queries -- Method 3 cannot be used for routing queries as routing systems have typically not supported feedback.</Paragraph>
    </Section>
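The manual feedback loop of Method 3 can be sketched with a standard Rocchio-style update, in which judged relevant documents pull the query toward them and judged non-relevant documents push it away. This is a textbook formulation used purely as an illustration, not the procedure followed by any particular contractor.

```python
# Rocchio-style manual-feedback sketch (illustrative only): the query is a
# term -> weight dict; after a human judges a few retrieved documents, the
# query is moved toward relevant documents and away from non-relevant ones.
import re
from collections import Counter

def doc_vector(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))

def rocchio_update(query, relevant_docs, nonrelevant_docs,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """relevant_docs / nonrelevant_docs: lists of judged document texts."""
    updated = {t: alpha * w for t, w in query.items()}
    for docs, sign, coef in ((relevant_docs, 1, beta), (nonrelevant_docs, -1, gamma)):
        if not docs:
            continue
        for text in docs:
            for t, tf in doc_vector(text).items():
                updated[t] = updated.get(t, 0.0) + sign * coef * tf / len(docs)
    # Negative weights are usually dropped before the query is re-run.
    return {t: w for t, w in updated.items() if w > 0}
```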
  </Section>
class="xml-element"></Paper>