<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0804">
  <Title>How to Find Better Index Terms Through Citations (Sydney, July 2006. ©2006 Association for Computational Linguistics)</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Information Retrieval (IR) is an established field and, today, the 'conventional' IR task is embodied by web searching. IR is mostly term-based, relying on the words within documents to describe them and, thence, to determine which documents are relevant to a given user query. There are theoretically motivated and experimentally validated techniques that have become standard in the field. An example is the Okapi model, a probabilistic function for term weighting and document ranking (Spärck Jones, Walker &amp; Robertson 2000). IR techniques using such statistical models almost always outperform more linguistically based ones. So, as statistical models are developed and refined, the question arises: 'Can Computational Linguistics improve Information Retrieval?' Our particular research involves IR on scientific papers. There are definite parallels between the web and scientific literature, such as hyperlinks between webpages and citation links between papers. However, there are also fundamental differences, such as the greater variability of webpages and the independent quality control of academic texts through the peer review process.</Paragraph>
    <Paragraph position="1"> The analogy between hyperlinks and citations is itself not perfect: whereas the number of hyperlinks varies greatly from webpage to webpage, the number of citations in papers is more constrained, due to the combination of strict page limits, the need to cite to show awareness of other work, and the need to conserve space by including only the most relevant citations. Thus, while some aspects of web-based techniques will carry across to the current research domain, others will probably not.</Paragraph>
    <Paragraph position="2"> We are interested in investigating which lessons learned from web IR can successfully be applied to this slightly different domain.</Paragraph>
  </Section>
</Paper>