File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/06/w06-1613_intro.xml

Size: 18,109 bytes

Last Modified: 2025-10-06 14:03:57

<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1613">
  <Title>Automatic classification of citation function</Title>
  <Section position="3" start_page="0" end_page="106" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Why do researchers cite a particular paper? This is a question that has interested researchers in discourse analysis, sociology of science, and information sciences (library sciences) for decades (Gar eld, 1979; Small, 1982; White, 2004). Many annotation schemes for citation motivation have been created over the years, and the question has been studied in detail, even to the level of in-depth interviews with writers about each individual citation (Hodges, 1972).</Paragraph>
    <Paragraph position="1"> Part of this sustained interest in citations can be explained by the fact that bibliometric metrics are commonly used to measure the impact of a researcher's work by how often they are cited (Borgman, 1990; Luukkonen, 1992). However, researchers from the eld of discourse studies have long criticised purely quantitative citation analysis, pointing out that many citations are done out of politeness, policy or piety (Ziman, 1968), and that criticising citations or citations in passing should not count as much as central citations in a paper, or as those citations where a researcher's work is used as the starting point of somebody else's work (Bonzi, 1982). A plethora of manual annotation schemes for citation motivation have been invented over the years (Gar eld, 1979; Hodges, 1972; Chubin and Moitra, 1975).</Paragraph>
    <Paragraph position="2"> Other schemes concentrate on citation function (Spiegel-Rcurrency1using, 1977; O'Connor, 1982; Weinstock, 1971; Swales, 1990; Small, 1982)). One of the best-known of these studies (Moravcsik and Murugesan, 1975) divides citations in running text into four dimensions: conceptual or operational use (i.e., use of theory vs. use of technical method); evolutionary or juxtapositional (i.e., own work is based on the cited work vs. own work is an alternative to it); organic or perfunctory (i.e., work is crucially needed for understanding of citing article or just a general acknowledgement); and nally con rmative vs. negational (i.e., is the correctness of the ndings disputed?). They found, for example, that 40% of the citations were perfunctory, which casts further doubt on the citationcounting approach.</Paragraph>
    <Paragraph position="3"> Based on such annotation schemes and handanalyzed data, different in uences on citation behaviour can be determined. Nevertheless, researchers in the eld of citation content analysis do not normally cross-validate their schemes with independent annotation studies with other human annotators, and usually only annotate a small number of citations (in the range of hundreds or thousands). Also, automated application of the annotation is not something that is generally considered in the eld, though White (2004) sees the future of discourse-analytic citation analysis in automation.</Paragraph>
    <Paragraph position="4"> Apart from raw material for bibliometric studies, citations can also be used for search purposes in document retrieval applications. In the library world, printed or electronic citation indexes such as ISI (Gar eld, 1979) serve as an orthogonal  word similarity by the relative entropy or Kulbach-Leibler (KL) distance, between the corresponding conditional distributions.</Paragraph>
    <Paragraph position="5"> His notion of similarity seems to agree with our intuitions in many cases, but it is not clear how it can be used directly to construct word classes and corresponding models of association.</Paragraph>
    <Paragraph position="6">  search tool to nd relevant papers, starting from a source paper of interest. With the increased availability of documents in electronic form in recent years, citation-based search and automatic citation indexing have become highly popular, cf. the successful search tools Google Scholar and CiteSeer (Giles et al., 1998).1 But not all search needs are ful lled by current citation indexers. Experienced researchers are often interested in relations between articles (Shum, 1998). They want to know if a certain article criticises another and what the criticism is, or if the current work is based on that prior work. This type of information is hard to come by with current search technology. Neither the author's abstract, nor raw citation counts help users in assessing the relation between articles.</Paragraph>
    <Paragraph position="7"> Fig. 1 shows a hypothetical search tool which displays differences and similarities between a target paper (here: Pereira et al., 1993) and the papers that it cites and that cite it. Contrastive links are shown in grey links to rival papers and papers the current paper contrasts itself to. Continuative links are shown in black links to papers that use the methodology of the current paper. Fig. 1 also displays the most characteristic textual sentence about each citation. For instance, we can see which aspect of Hindle (1990) our example paper criticises, and in which way the example paper's work was used by Dagan et al. (1994).</Paragraph>
    <Paragraph position="8"> Note that not even the CiteSeer text snippet 1These tools automatically citation-index all scienti c articles reached by a web-crawler, making them available to searchers via authors or keywords in the title, and displaying the citation in context of a text snippet.</Paragraph>
    <Paragraph position="9"> can ful l the relation search need: it is always centered around the physical location of the citations, but the context is often not informative enough for the searcher to infer the relation. In fact, studies from our annotated corpus (Teufel, 1999) show that 69% of the 600 sentences stating contrast with other work and 21% of the 246 sentences stating research continuation with other work do not contain the corresponding citation; the citation is found in preceding sentences (which means that the sentence expressing the contrast or continuation is outside the CiteSeer snippet). A more sophisticated, discourse-aware citation indexer which nds these sentences and associates them with the citation would add considerable value to the researcher's bibliographic search (Ritchie et al., 2006b).</Paragraph>
    <Paragraph position="10"> Our annotation scheme for citations is based on empirical work in content citation analysis. It is designed for information retrieval applications such as improved citation indexing and better bibliometric measures (Teufel et al., 2006). Its 12 categories mark relationships with other works. Each citation is labelled with exactly one category. The following top-level four-way distinction applies: Explicit statement of weakness Contrast or comparison with other work (4 categories) Agreement/usage/compatibility with other work (6 categories), and A neutral category.</Paragraph>
    <Paragraph position="11"> In this paper, we show that the scheme can be reliably annotated by independent coders. We also report results of a supervised machine learning experiment which replicates the human annotation.</Paragraph>
    <Paragraph position="12"> 2 An annotation scheme for citations Our scheme (given in Fig. 2) is adapted from that of Spiegel-Rcurrency1using (1977) after an analysis of a corpus of scienti c articles in computational linguistics. We avoid sociologically orientated distinctions ( paying homage to pioneers ), as they can be dif cult to operationalise without deep knowledge of the eld and its participants (Swales, 1986). Our rede nition of the categories aims at reliably annotation; at the same time, the categories should be informative enough for the document management application sketched in the introduction. null  PMot This citation is positive about approach used or problem addressed (used to motivate work in current paper) PSim Author's work and cited work are similar PSup Author's work and cited work are compatible/provide support for each other Neut Neutral description of cited work, or not enough textual evidence for above categories, or unlisted citation function  Our categories are as follows: One category (Weak) is reserved for weakness of previous research, if it is addressed by the authors. The next four categories describe comparisons or contrasts between own and other work. The difference between them concerns whether the contrast is between methods employed or goals (CoCoGM), or results, and in the case of results, a difference is made between the cited results being worse than the current work (CoCo-), or comparable or better results (CoCoR0). As well as considering differences between the current work and other work, we also mark citations if they are explicitly compared and contrasted with other work (i.e. not the work in the current paper). This is expressed in category CoCoXY. While this is not typically annotated in the literature, we expect a potential practical bene t of this category for our application, particularly in searches for differences and rival approaches.</Paragraph>
    <Paragraph position="13"> The next set of categories we propose concerns positive sentiment expressed towards a citation, or a statement that the other work is actively used in the current work (which we consider the ultimate praise). We mark statements of use of data and methods of the cited work, differentiating unchanged use (PUse) from use with adaptations (PModi). Work which is stated as the explicit starting point or intellectual ancestry is marked with our category PBas. If a claim in the literature is used to strengthen the authors' argument, or vice versa, we assign the category PSup. We also mark similarity of (an aspect of) the approach to the cited work (PSim), and motivation of approach used or problem addressed (PMot).</Paragraph>
    <Paragraph position="14"> Our twelfth category, Neut, bundles truly neutral descriptions of cited work with those cases where the textual evidence for a citation function was not enough to warrant annotation of that category, and all other functions for which our scheme did not provide a speci c category.</Paragraph>
    <Paragraph position="15"> Citation function is hard to annotate because it in principle requires interpretation of author intentions (what could the author's intention have been in choosing a certain citation?). One of our most fundamental principles is thus to only mark explicitly signalled citation functions. Our guidelines explicitly state that a general linguistic phrase such as better or used by us must be present; this increases the objectivity of de ning citation function. Annotators must be able to point to textual evidence for assigning a particular function (and are asked to type the source of this evidence into the annotation tool for each citation). Categories are de ned in terms of certain objective types of statements (e.g., there are 7 cases for PMot, e.g. Citation claims that or gives reasons for why problem Y is hard ). Annotators can use general text interpretation principles when assigning the categories (such as anaphora resolution and parallel constructions), but are not allowed to use in-depth knowledge of the eld or of the authors.</Paragraph>
    <Paragraph position="16"> Guidelines (25 pages, 150 rules) describe the categories with examples, provide a decision tree and give decision aids in systematically ambiguous cases. Nevertheless, subjective judgement of the annotators is still necessary to assign a single tag in an unseen context, because of the many difcult cases for annotation. Some of these concern the fact that authors do not always state their purpose clearly. For instance, several earlier studies found that negational citations are rare (Moravcsik and Murugesan, 1975; Spiegel-Rcurrency1using, 1977); MacRoberts and MacRoberts (1984) argue that the reason for this is that they are potentially politically dangerous. In our data we found ample evidence of the meekness effect. Other dif culties concern the distinction of the usage of a method from statements of similarity between a method and the own method (i.e., the choice between categories PSim and PUse). This happens in cases where authors do not want to admit (or stress)  that they are using somebody else's method. Another dif cult distinction concerns the judgement of whether the authors continue somebody's research (i.e., consider their research as intellectual ancestry, i.e. PBas), or whether they simply use the work (PUse).</Paragraph>
    <Paragraph position="17"> The unit of annotation is a) the full citation (as recognised by our automatic citation processor on our corpus), and b) names of authors of cited papers anywhere in running text outside of a formal citation context (i.e., without date). These latter are marked up, slightly unusually in comparison to other citation indexers, because we believe they function as important referents comparable in importance to formal citations.2 In principle, there are many other linguistic expressions by which the authors could refer to other people's work: pronouns, abbreviations such as Mueller and Sag (1990), henceforth M &amp; S , and names of approaches or theories which are associated with particular authors. The fact that in these contexts citation function cannot be annotated (because it is not technically feasible to recognise them well enough) sometimes causes problems with context dependencies.</Paragraph>
    <Paragraph position="18"> While there are unambiguous example cases where the citation function can be decided on the basis of the sentence alone, this is not always the case. Most approaches are not criticised in the same sentence where they are also cited: it is more likely that there are several descriptive sentences about a cited approach between its formal citation and the evaluative statement, which is often at the end of the textual segment about this citation.</Paragraph>
    <Paragraph position="19"> Nevertheless, the annotator must mark the function on the nearest appropriate annotation unit (citation or author name). Our rules decree that context is in most cases constrained to the paragraph boundary. In rare cases, paper-wide information is required (e.g., for PMot, we need to know that a praised approach is used by the authors, information which may not be local in the paragraph). Annotators are thus asked to skim-read the paper before annotation.</Paragraph>
    <Paragraph position="20"> One possible view on this annotation scheme could consider the rst two sets of categories as negative and the third set of categories positive , in the sense of Pang et al. (2002) and Turney (2002). Authors need to make a point (namely, 2Our citation processor can recognise these after parsing the citation list.</Paragraph>
    <Paragraph position="21"> that they have contributed something which is better or at least new (Myers, 1992)), and they thus have a stance towards their citations. But although there is a sentiment aspect to the interpretation of citations, this is not the whole story. Many of our positive categories are more concerned with different ways in which the cited work is useful to the current work (which aspect of it is used, e.g., just a de nition or the entire solution?), and many of the contrastive statements have no negative connotation at all and simply state a (value-free) difference between approaches. However, if one looks at the distribution of positive and negative adjectives around citations, it is clear that there is a non-trivial connection between our task and sentiment classi cation.</Paragraph>
    <Paragraph position="22"> The data we use comes from our corpus of 360 conference articles in computational linguistics, drawn from the Computation and Language E-Print Archive (http://xxx.lanl.gov/cmp-lg). The articles are transformed into XML format; headlines, titles, authors and reference list items are automatically marked up. Reference lists are parsed using regular patterns, and cited authors' names are identi ed. Our citation parser then nds citations and author names in running text and marks them up. Ritchie et al. (2006a) report high accuracy for this task (94% of citations recognised, provided the reference list was error-free). On average, our papers contain 26.8 citation instances in running text3. For human annotation, we use our own annotation tool based on XML/XSLT technology, which allows us to use a web browser to interactively assign one of the 12 tags (presented as a pull-down list) to each citation.</Paragraph>
    <Paragraph position="23"> We measure inter-annotator agreement between three annotators (the three authors), who independently annotated 26 articles with the scheme (containing a total of 120,000 running words and 548 citations), using the written guidelines. The guidelines were developed on a different set of articles from the ones used for annotation.</Paragraph>
    <Paragraph position="24">  Kappa, which follows the formula K = P(A)[?]P(E)1[?]P(E) where P(A) is observed, and P(E) expected agreement. Kappa ranges between -1 and 1. K=0 means agreement is only as expected by chance. Generally, Kappas of 0.8 are considered stable, and Kappas of .69 as marginally stable, according to the strictest scheme applied in the eld.</Paragraph>
    <Paragraph position="25">  (e.g., non-local dependencies) of the task. The relative frequency of each category observed in the annotation is listed in Fig. 3. As expected, the distribution is very skewed, with more than 60% of the citations of category Neut.5 What is interesting is the relatively high frequency of usage categories (PUse, PModi, PBas) with a total of 18.9%. There is a relatively low frequency of clearly negative citations (Weak, CoCo-, total of 4.1%), whereas the neutral contrastive categories (CoCoR0, CoCoXY, CoCoGM) are slightly more frequent at 7.6%.</Paragraph>
    <Paragraph position="26"> This is in concordance with earlier annotation experiments (Moravcsik and Murugesan, 1975; Spiegel-Rcurrency1using, 1977).</Paragraph>
  </Section>
class="xml-element"></Paper>