<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1665">
  <Title>Context-Dependent Term Relations for Information Retrieval</Title>
  <Section position="4" start_page="555" end_page="557" type="intro">
    <SectionTitle>
4. Experimental Evaluation
</SectionTitle>
    <Paragraph position="0"> We evaluate query expansion with different relations on four TREC collections, which are described in Table 1. All documents have been processed in a standard manner: terms are stemmed using Porter stemmer and stopwords are removed. We only use titles of topics as queries, which contain 3.58 words per query on average.</Paragraph>
    <Paragraph position="1">  In our experiments, the document model remains the same while the query model changes. The document model uses the following Dirichlet smoothing:  where ),( Dwtf i is the term frequency of wi in D, )|( CwP iML is the collection model and u is the Dirichlet prior, which is set at 1000 following (Zhai and Lafferty, 2001).</Paragraph>
    <Paragraph position="2"> There are two other smoothing parameters 1l , and 2l to be determined. In our experiments, we use a simple method to set them: the parameters are tuned empirically using a training collection containing AP1989 documents and queries 101150. These preliminary tests suggest that the best value of 1l and 2l (in Equations 1-2) are relatively stable (we will show this later). In the experiments reported below, we will use 4.01 =l , and 3.02 =l .</Paragraph>
    <Section position="1" start_page="555" end_page="556" type="sub_section">
      <SectionTitle>
4.1 Experimental Results
</SectionTitle>
      <Paragraph position="0"> The main experimental results are described in Table 2, which reports average precision with different methods as well as the number of relevant documents retrieved. UM is the basic unigram model without query expansion (i.e. we use MLE for the query model, while the document model is smoothed with Dirichlet method). CIQE is the context-independent query expansion model using unigram relations (Model  1). CDQE is the context-dependent query expansion model using biterm relations (Model 2). In the table, we also indicate whether the improvement in average precision obtained is statistically significant (t-test).</Paragraph>
      <Paragraph position="1">  significant according to t-test: * indicates p&lt;0.05, ** indicates p&lt;0.01; (.) is compared to UM and [.] is compared to CIQE.</Paragraph>
      <Paragraph position="2"> CIQE and CDQE vs. UM It is interesting to observe that query expansion, either by CIQE or CDQE, consistently outperforms the basic unigram model on all the collections. In all the cases except CIQE for WSJ, the improvements in average precision are statistically significant. At the same time, the increases in the number of relevant documents retrieved are also consistent with those in average precision.</Paragraph>
      <Paragraph position="3"> The improvement scales obtained with CIQE are relatively small: from 1% to 10%. These correspond to the typical figure using this method.</Paragraph>
      <Paragraph position="4"> Comparing CIQE and CDQE, we can see that context-dependent query expansion (CDQE)  always produces better effectiveness than context-independent expansion (CIQE). The improvements range between 10% and 17%. All the improvements obtained by CDQE are statistically significant. This result strongly suggests that in general, the context-dependent term relations identify better expansion terms than context-independent unigram relations. This confirms our earlier hypothesis.</Paragraph>
      <Paragraph position="5"> Indeed, when we look at the expansion results, we see that the expansion terms suggested by biterm relations are usually better. For example, the (stemmed) expansion terms for the query &amp;quot;insider trading&amp;quot; suggested respectively by CIQE and CDQE are as follows:  We can see that in general, the terms suggested by CDQE are much more relevant. In particular, it has been able to suggest &amp;quot;boeski&amp;quot; (Boesky) who is involved in an insider trading scandal. Several other terms are also highly relevant, such as scandal, investing, sec, drexel, fraud, etc. The addition of these new terms does not only improve recall. Precision of top-ranked documents is also improved. This can be seen in Figure 1 where we compare the full precision-recall curve for the AP collection for the three models. We can see that at all the recall levels, the precision values always follow the following order: CDQE &gt; UM. The same observation is also made on the other collections. This shows that the CDQE method does not increase recall to the detriment of precision, but both of them. In contrast, CIQE increases precision at all but 0.0 recall points: the precision at the 0.0 recall point is 0.6565 for CIQE and 0.6699 for UM. This shows that CIQE can slightly deteriorate the top-ranked few documents.</Paragraph>
      <Paragraph position="6">  to be an effective query expansion method. In many previous experiments, it produced very good results. The mixture model (Zhai and Lafferty, 2001) is a representative and effective method to implement pseudo-relevance feedback: It uses a set of feedback documents to smooth the original query model. Compared to the mixture model, our CDQE method is also more effective: By manually tuning the parameters of the mixture model to their best, we obtained the average precisions of 0.3171, 0.2393 and 0.2565 respectively for AP, SJM and WSJ collections. These values are lower than those obtained with CDQE, which has not been heavily tuned.</Paragraph>
      <Paragraph position="7"> For the same query &amp;quot;insider trading&amp;quot;, the mixture model determines the following expansion terms:  We can see that some of these terms overlap with those suggested by biterm relations. However, interesting words such as boeski, drexel and scandal are not suggested.</Paragraph>
      <Paragraph position="8"> The above comparison shows that our method outperforms the state-of-the-art methods of query expansion developed so far.</Paragraph>
    </Section>
    <Section position="2" start_page="556" end_page="557" type="sub_section">
      <SectionTitle>
4.2 Effect of the Smoothing Parameter
</SectionTitle>
      <Paragraph position="0"> In the previous experiments, we have fixed the smoothing parameters. In this series of tests, we  analyze the effect of this smoothing parameter on retrieval effectiveness. The following figure shows the change of average precision (AvgP) using CDQE (Model 2) along with the change of the parameter 2l (UM is equivalent to 12 =l ).  We can see that for all the three collections, the effectiveness is good when the parameter is set in the range of 0.1-0.5. The best value for different collections remains stable: 0.2-0.3. The effect of 1l on Model 1 is slightly different, but we observe the same trend.</Paragraph>
    </Section>
    <Section position="3" start_page="557" end_page="557" type="sub_section">
      <SectionTitle>
4.3 Number of Expansion Terms
</SectionTitle>
      <Paragraph position="0"> In the previous tests, we limit the number of expansion terms to 80. When different numbers of expansion terms are used, we obtain different effectiveness measures. The following figure shows the variation of average precision (AvgP) with different numbers of expansion terms, using  We can see that when more expansion terms are added, the effectiveness does not always increase. In general, a number around 80 will produce good results. In some cases, even if better effectiveness can be obtained with more expansion terms, the retrieval time is also longer. The number 80 seems to produce a good compromise between effectiveness and retrieval speed: the retrieval time remains less than 1 sec. per query.</Paragraph>
    </Section>
    <Section position="4" start_page="557" end_page="557" type="sub_section">
      <SectionTitle>
4.4 Suitability of Relations Across Collections
</SectionTitle>
      <Paragraph position="0"> In many real applications (e.g. Web search), we do not have a static document collection from which relations can be extracted. The question is whether it is possible and beneficial to extract relations from one text collection and use them to retrieve documents in another text collection. Our intuition is that this is possible because the relations (especially context-dependent relations) encode general knowledge, which can be applied to a different collection. In order to show this, we extracted term relations from each collection, and applied them on other collections. The following tables show the effectiveness produced using respectively unigram and bi-term relations.</Paragraph>
      <Paragraph position="1">  From this table, we can observe that relations extracted from any collection are useful to some degree: they all outperform UM (see Table 2). In particular, the relations extracted from AP are the best for almost all the collections. This can be explained by the larger size and wider coverage of the AP collection. This suggests that we do not necessarily need to extract term relations from the same text collection on which retrieval is performed. It is possible to extract relations from a large text collection, and apply them to other collections. This opens the door to the possibility of constructing a general relation base for various document collections.</Paragraph>
    </Section>
  </Section>
</Paper>