<?xml version="1.0" standalone="yes"?>
<Paper uid="H05-1014">
  <Title>Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pages 105-112, Vancouver, October 2005. (c) 2005 Association for Computational Linguistics. Novelty Detection: The TREC Experience</Title>
  <Section position="2" start_page="0" end_page="105" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The problem of novelty detection has long been a significant one for retrieval systems. The &quot;selective dissemination of information&quot; (SDI) paradigm assumed that people wanted to track new information relating to known topics as their primary search task. While most SDI and information filtering systems have focused on similarity to a topical profile (Robertson, 2002) or to a community of users with a shared interest (Belkin and Croft, 1992), recent efforts (Carbonell and Goldstein, 1998; Allan et al., 2000; Kumaran et al., 2003) have looked at the retrieval of specifically novel information.</Paragraph>
    <Paragraph position="1"> The TREC novelty track experiments were conducted from 2002 to 2004 (Harman, 2002; Soboroff and Harman, 2003; Soboroff, 2004). The basic task was defined as follows: given a topic and an ordered set of documents related to that topic, segmented into sentences, return those sentences that are both relevant to the topic and novel given what has already been seen in that document set.</Paragraph>
    <Paragraph position="2"> This task models an application where a user is skimming a set of documents, and the system highlights new, on-topic information.</Paragraph>
    <Paragraph position="3"> There are two problems that participants must solve in this task. The first is identifying relevant sentences, which is essentially a passage retrieval task. Sentence retrieval differs from document retrieval because there is much less text to work with, and identifying a relevant sentence may involve examining the sentence in the context of those surrounding it. The sentence was specified as the unit of retrieval in order to standardize the task across a variety of passage retrieval approaches, as well as to simplify the evaluation.</Paragraph>
    <Paragraph position="4"> The second problem is that of identifying those relevant sentences that contain new information. The operational definition of &quot;new&quot; here is information that has not appeared previously in this topic's set of documents. In other words, we allow the system to assume that the user is most concerned about finding new information in this particular set of documents, and is tolerant of reading information he already knows because of his background knowledge.</Paragraph>
    <Paragraph position="5"> Since each sentence adds to the user's knowledge, and later sentences are to be retrieved only if they contain new information, novelty retrieval resembles a filtering task.</Paragraph>
    <Paragraph position="6"> Novelty is an inherently difficult phenomenon to operationalize. Document-level novelty detection, while intuitive, is rarely useful because nearly every document contains something new, particularly when the domain is news. Hence, our decision to use sentences as the unit of retrieval. Moreover, determining ground truth for a novelty detection task is more difficult than for topical relevance, because one is forced not only to face the idiosyncratic nature of relevance, but also to rely all the more on the memory and organizational skills of the assessor, who must try to remember everything he has read.</Paragraph>
    <Paragraph position="7"> We wanted to determine whether people could accomplish this task to any reasonable level of agreement, as well as to see which computational approaches best solve this problem.</Paragraph>
  </Section>
</Paper>