<?xml version="1.0" standalone="yes"?>
<Paper uid="A97-1046">
  <Title>Fast Statistical Parsing of Noun Phrases for Document Indexing</Title>
  <Section position="8" start_page="316" end_page="316" type="evalu">
    <SectionTitle>
5 Results analysis
</SectionTitle>
    <Paragraph position="0"> We used as our document set the Wall Street Journal database in Tipster Disk2 (Harman 96), whose size is about 250 megabytes. We performed the experiments using the TREC-5 ad hoc topics (i.e., TREC topics 251-300). Each run involves an automatic feedback step with the top 10 documents returned from the initial retrieval. The CLARIT automatic feedback is performed by adding terms from a query-specific thesaurus extracted from the top N documents returned from the initial retrieval (Evans and Lefferts 95). The results are evaluated using the standard measures of recall and precision. Recall measures how many of the relevant documents have actually been retrieved; precision measures how many of the retrieved documents are indeed relevant. They are calculated by the following simple formulas: Recall = (number of relevant items retrieved) / (total number of relevant items in collection); Precision = (number of relevant items retrieved) / (total number of items retrieved). We used the standard TREC evaluation package provided by Cornell University and used the judged-relevant documents from the TREC evaluations as the gold standard (Harman 94).</Paragraph>
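    <Paragraph> The two formulas above can be sketched directly as set operations; this is a minimal illustration with hypothetical document IDs, not the actual TREC evaluation package:

```python
def recall(retrieved, relevant):
    """Fraction of all relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant)

def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved)

# Hypothetical example: 4 documents retrieved, 3 relevant in the collection.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d5"}
r = recall(retrieved, relevant)     # 2 of 3 relevant docs retrieved
p = precision(retrieved, relevant)  # 2 of 4 retrieved docs relevant
```

In practice the official trec_eval package computes these (and many interpolated variants) from ranked run files; the sketch only mirrors the set-based definitions given above.</Paragraph>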
    <Paragraph position="1"> In Table 1, we give a summary of the results and compare the three phrase combination runs with the corresponding baseline run. In the table, &quot;Ret-rel&quot; means &quot;retrieved-relevant&quot; and refers to the total number of relevant documents retrieved. &quot;Init Prec&quot; means &quot;initial precision&quot; and refers to the highest level of precision over all the points of recall. &quot;Avg Prec&quot; means &quot;average precision&quot; and is the average of all the precision values computed after each new relevant document is retrieved.</Paragraph>
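    <Paragraph> The &quot;Avg Prec&quot; measure described above can be sketched over a ranked list; this is a simplified reading of the definition (averaging precision at each rank where a new relevant document appears), with hypothetical document IDs rather than actual TREC runs:

```python
def average_precision(ranked, relevant):
    """Average of the precision values computed at each rank
    where a new relevant document is retrieved."""
    hits = 0
    precisions = []
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Hypothetical ranking: relevant docs found at ranks 1 and 3,
# so Avg Prec = (1/1 + 2/3) / 2.
ap = average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
```

Note that the standard TREC variant divides by the total number of relevant documents in the collection rather than the number retrieved; the sketch follows the wording of the paragraph above.</Paragraph>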
    <Paragraph position="2"> It is clear that phrases help both recall and precision when supplementing single words, as can be seen from the improvement of all phrase runs (WD-HM-SET, WD-NP-SET, WD-HM-NP-SET) over the single-word run WD-SET.</Paragraph>
    <Paragraph position="3"> It can also be seen that when only one kind of phrase (either the full NPs or the head modifiers) is used to supplement the single words, each can lead to a great improvement in precision. However, when we combine the two kinds of phrases, the effect is a greater improvement in recall rather than precision.</Paragraph>
    <Paragraph position="4"> The fact that each kind of phrase can improve precision significantly when used separately shows that these phrases are indeed very useful for indexing.</Paragraph>
    <Paragraph position="5"> Combining the two kinds of phrases, however, yields a smaller precision improvement but a much greater increase in recall. This may indicate that more experiments are needed to understand how to combine and weight different phrases effectively.</Paragraph>
    <Paragraph position="6"> The same parsing method has also been used to generate phrases from the same data for the CLARIT NLP track experiments in TREC-5 (Zhai et al. 97), and similar results were obtained, although the WD-NP-SET was not tested. The results in (Zhai et al. 97) are not identical to the results here, because they are based on two separate training processes. Different training processes may result in slightly different parameter estimates, because the corpus is arbitrarily segmented into chunks of roughly 4 megabytes for training, and the chunks actually used in different training processes may vary slightly.</Paragraph>
  </Section>
</Paper>