<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1098">
  <Title>A Trigger Language Model-based IR System</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Using language models for information retrieval has been studied extensively recently(Jin et al 2002 Lafferty and Zhai 2001 Srikanth and Srihari 2002 Lavrenko and Croft 2001 Liu and Croft 2002). The basic idea is to compute the conditional probability P(Q|D), i.e. the probability of generating a query Q given the observation of a document D. Several different methods have been applied to compute this conditional probability. In most approaches, the computation is conceptually decomposed into two distinct steps: (1) Estimating a document language model; (2) Computing the query likelihood using the estimated document model based on some query model. For example, Ponte and Croft emphasized the first step, and used several heuristics to smooth the Maximum Likelihood of the document language model, and assumed that the query is generated under a multivariate Bernoulli model (Ponte and Croft 1998). The BBN method (Miller et al 1999) emphasized the second step and used a two-state hidden Markov model as the basis for generating queries, which, in effect, is to smooth the MLE with linear interpolation, a strategy also adopted in Hiemstra and Kraaij (Hiemstra and Kraaij 1999). In Zhai and Lafferty (Zhai and Lafferty 2001), it has been found that the retrieval performance is affected by both the estimation accuracy of document language models and the appropriate modeling of the query, and a two stage smoothing method was suggested to explicitly address these two distinct steps.</Paragraph>
    <Paragraph position="1"> It's not hard to see that the unigram language model IR method contains the following assumption: Each word appearing in the document set and query has nothing to do with any other word. Obviously this assumption is not true in reality. Though statistical MT approach (Berger and Lafferty 1999 ) alleviates the situation by taking the synonymy factor into account, it never helps to judge the different meanings of the same word in varied context. In this paper we propose the trigger language model based IR system to resolve the problem. Though the basic idea of using the triggered words to improve the performance of language model was proposed by Raymond almost 10 years ago (Raymond et al 1993), Our method adopts a different approach for other objectivity in the IR field. Firstly we compute the mutual information of the words from training corpus and then design the algorithm to get the triggered words of the query in order to fix down the topic of query more clearly. We introduce the relative parameters into the document language model to form the trigger language model based IR system.</Paragraph>
    <Paragraph position="2"> Experiments show that the performance of trigger language model based IR system has been improved greatly.</Paragraph>
    <Paragraph position="3"> In what follows, Section 2 describes trigger language model based IR system in detail.</Paragraph>
    <Paragraph position="4"> Section 3 is our evaluation about the model.</Paragraph>
    <Paragraph position="5"> Finally, Section 4 summarizes the work in this paper.</Paragraph>
  </Section>
class="xml-element"></Paper>