<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0804">
  <Title>Sydney, July 2006. ©2006 Association for Computational Linguistics. How to Find Better Index Terms Through Citations</Title>
  <Section position="7" start_page="30" end_page="30" type="concl">
    <SectionTitle>
5 Discussion and Conclusions
</SectionTitle>
    <Paragraph position="0"> It is not too hard to find examples of citations that show a fixed window size is suboptimal for finding terms used in reference to cited papers. In extracting the ideal reference terms from only 24 citations for our case study, we saw just how difficult it is to decide which terms refer to which citations.</Paragraph>
    <Paragraph position="1"> We, the authors, came across examples where it was ambiguous how many citations certain terms referred to, ones where knowledge of the cited papers was required to interpret the scope of the citation and ones where we simply did not agree. This is a highly complex indexing task; one which humans have difficulty with, one for which we expect low human agreement and, therefore, the type that computational linguistics struggles to achieve high performance on. We agree with O'Connor (1982) that it is hard. We make no claims that computational linguistics will provide a full solution.</Paragraph>
    <Paragraph position="2"> Nevertheless, our examples suggest that even simple computational linguistics techniques should help to more accurately locate reference terms. While it may be impossible to automatically pick out each specific piece of text that does refer to a given citation, there is much scope for improvement over a fixed window. The examples in Section 2 suggest that altering the size of the window that is applied would be a good first step.</Paragraph>
    <Paragraph position="3"> Some form of text segmentation, whether it be full-blown discourse analysis or simple sentence boundary detection, may be useful in determining the extent of the reference text.</Paragraph>
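The contrast between a fixed window and a sentence-bounded window can be sketched in a few lines of Python. This is only an illustrative sketch, not the method used in the case study: the citation pattern, the naive sentence splitter and the function names `fixed_window` and `sentence_window` are all assumptions introduced here.

```python
import re

# Assumption: citations look like "(Name 1999)"; real citation detection is harder.
CITATION = re.compile(r"\([^()]*\d{4}\)")
# Assumption: a term is a run of letters and hyphens.
WORD = re.compile(r"[A-Za-z][A-Za-z-]*")
# Assumption: a sentence ends at the first '.', '!' or '?' (naive boundary detection).
SENTENCE = re.compile(r"[^.!?]+[.!?]")


def fixed_window(text, size):
    """Collect up to `size` words on each side of the first citation."""
    match = CITATION.search(text)
    if match is None:
        return []
    left = WORD.findall(text[:match.start()])
    right = WORD.findall(text[match.end():])
    # Keep the last `size` words before the citation and the first `size` after it.
    return left[max(0, len(left) - size):] + right[:size]


def sentence_window(text):
    """Collect the words of the first sentence that contains a citation."""
    for sentence in SENTENCE.findall(text):
        if CITATION.search(sentence):
            # Drop the citation itself so the cited name is not counted as a term.
            return WORD.findall(CITATION.sub(" ", sentence))
    return []
```

On an invented text such as "We use a parser. Tagging follows the maximum entropy model (Smith 1999). Results are reported on news text.", a fixed window of two words either side picks up "Results" and "are" from the following sentence, whereas the sentence-bounded version stops at the boundary and returns only the words of the citing sentence.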
    <Paragraph position="4"> While the case study presented here highlights several interesting effects of using terms from around citations as additional index terms for the cited paper, it cannot answer questions about how successful a practical method based on these observations would be, over using a simple fixed window, for example. In order for any real improvement in IR, the term profile of a document would have to be significantly altered by the reference terms. Enough terms, in particular repeated terms, would have to be successfully found via citations for such a quantitative improvement. It is not clear that computational linguistic techniques will improve over the statistical effects of redundant data.</Paragraph>
    <Paragraph position="5"> We are thus in the last stages of setting up a larger experiment that will shed more light on this question. The experimental setup requires data where there are a significant number of citations to a number of test documents and a significant number of reference set terms. We have recently presented a test collection of scientific research papers (Ritchie, Teufel &amp; Robertson 2006), which we intend to use for this experiment.</Paragraph>
  </Section>
</Paper>