<?xml version="1.0" standalone="yes"?> <Paper uid="H94-1071"> <Title>Learning from Relevant Documents in Large Scale Routing Retrieval</Title> <Section position="3" start_page="0" end_page="358" type="intro"> <SectionTitle> 2. PIRCS RETRIEVAL SYSTEM </SectionTitle> <Paragraph position="0"> PIRCS (acronym for Probabilistic Indexing and Retrieval -Components- System) is a network-based system implementing a Bayesian decision approach to IR [9,10] and extended with the concept of document components [11], as shown in Fig.1. The network [12] has three layers of nodes representing the queries (Q), terms (T) and documents (D), with edges connecting adjacent layers in a bidirectional fashion. A retrieval operation consists of initializing a document node d_i to activation 1 and spreading it via the edge weights to terms t_k and on to a query node q_a under focus. q_a receives activation Σ_k w_ik w_ka, which is regarded as the query-focused retrieval status value (RSV) of d_i for ranking purposes. If activation originates from a query q_a and spreads towards d_i, we accumulate the document-focused RSV Σ_k w_ak w_ki, which is based on term usage statistics different from those of the query-focused case. Combining the two can provide more effective results than either alone.</Paragraph> <Paragraph position="1"> The edge weights of the net are first initialized with default values using global and local term usage statistics. Later they can learn from experience, as illustrated in Fig.2. In particular, for routing experiments, the edges on the query-term side of the net are first created based on the routing queries and the terms of the training collection, and given default values called self-learn relevant weights. Relevant training documents are then linked in on the document-term side of the net. 
Knowing which document is relevant to which query allows edge weights on the term-query side, like w_ka, to adapt according to the term usage statistics of the relevant sets via a learning rule borrowed from artificial neural network studies. New edges in both directions can also grow between queries and terms using, for example, the K highest activated terms of the relevant documents, a process we call level K query expansion.</Paragraph> <Paragraph position="3"> After learning, these query-term edges and weights are frozen, the training documents are removed, and new unseen testing documents are linked in to simulate the routing operation.</Paragraph> <Paragraph position="4"> Thus, test documents are ranked with respect to each routing query based on term usage statistics seen in the training collection and the relevant documents.</Paragraph> </Section> </Paper>
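<!--
The spreading-activation ranking and the level K query expansion described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the PIRCS implementation: the dictionary-based weight maps, the function names (rsv, combined_rsv, level_k_expansion), the linear combination with a parameter alpha, and the definition of a term's activation as the sum of document-to-term weights over the relevant documents are all hypothetical choices made for this example. The learning rule that adapts existing weights is not specified in this section and is not modelled here.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical representation: each node stores its edge weights keyed by term.
Weights = Dict[str, float]


def rsv(source_to_term: Weights, term_to_target: Weights) -> float:
    """Spread activation 1 from a source node through the term layer.

    Computes sum over k of w_sk * w_kt, using only the terms that are
    linked to both the source node and the target node.
    """
    return sum(
        w * term_to_target[term]
        for term, w in source_to_term.items()
        if term in term_to_target
    )


def combined_rsv(
    w_ik: Weights,  # document d_i to terms
    w_ka: Weights,  # terms to query q_a
    w_ak: Weights,  # query q_a to terms
    w_ki: Weights,  # terms to document d_i
    alpha: float = 0.5,  # assumed mixing parameter, not from the paper
) -> float:
    """Combine query-focused and document-focused RSVs.

    The linear mix is an assumption; the section only says the two
    values are combined.
    """
    query_focused = rsv(w_ik, w_ka)  # activation received by q_a
    doc_focused = rsv(w_ak, w_ki)    # activation received by d_i
    return alpha * query_focused + (1.0 - alpha) * doc_focused


def level_k_expansion(
    relevant_docs: Iterable[Weights],
    query_terms: Set[str],
    k: int,
) -> List[str]:
    """Pick the K most activated terms of the relevant documents.

    Assumption: a term's activation is the sum of the document-to-term
    weights over all relevant documents, and only terms not yet linked
    to the query are candidates for new edges.
    """
    activation: Dict[str, float] = {}
    for doc in relevant_docs:
        for term, w in doc.items():
            activation[term] = activation.get(term, 0.0) + w
    candidates = [t for t in activation if t not in query_terms]
    # Highest activation first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda t: (-activation[t], t))
    return candidates[:k]
```

In a routing run along the lines of the text, level_k_expansion would be applied once per routing query on its relevant training documents, and the resulting query-term weights would then be held fixed while combined_rsv ranks each unseen test document.
-->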