<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1312">
  <Title>An annotation scheme for citation function</Title>
  <Section position="2" start_page="0" end_page="83" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> We study the interplay of the discourse structure of a scienti c argument with formal citations. One subproblem of this is to classify academic citations in scienti c articles according to their rhetorical function, e.g., as a rival approach, as a part of the solution, or as a awed approach that justi es the current research. Here, we introduce our annotation scheme with 12 categories, and present an agreement study.</Paragraph>
    <Paragraph position="1"> 1 Scienti c writing, discourse structure and citations In recent years, there has been increasing interest in applying natural language processing technologies to scienti c literature. The overwhelmingly large number of papers published in elds like biology, genetics and chemistry each year means that researchers need tools for information access (extraction, retrieval, summarization, question answering etc). There is also increased interest in automatic citation indexing, e.g., the highly successful search tools Google Scholar and CiteSeer (Giles et al., 1998).1 This general interest in improving access to scienti c articles ts well with research on discourse structure, as knowledge about the overall structure and goal of papers can guide better information access.</Paragraph>
    <Paragraph position="2"> Shum (1998) argues that experienced researchers are often interested in relations between articles. They need to know if a certain article criticises another and what the criticism is, or if the current work is based on that prior work. This type of information is hard to come by with current search technology. Neither the author's abstract, nor raw citation counts help users in assessing the relation between articles. And even though CiteSeer shows a text snippet around the physical location for searchers to peruse, there is no guarantee that the text snippet provides enough information for the searcher to infer the relation. In fact, studies from our annotated corpus (Teufel, 1999), show that 69% of the 600 sentences stating contrast with other work and 21% of the 246 sentences stating research continuation with other work do not contain the corresponding citation; the citation is found in preceding 1CiteSeer automatically citation-indexes all scienti c articles reached by a web-crawler, making them available to searchers via authors or keywords in the title.</Paragraph>
    <Paragraph position="3">  word similarity by the relative entropy or Kulbach-Leibler (KL) distance, between the corresponding conditional distributions.</Paragraph>
    <Paragraph position="4"> His notion of similarity seems to agree with our intuitions in many cases, but it is not clear how it can be used directly to construct word classes and corresponding models of association.</Paragraph>
    <Paragraph position="5">  sentences (i.e., the sentence expressing the contrast or continuation would be outside the CiteSeer snippet). We present here an approach which uses the classi cation of citations to help provide relational information across papers.</Paragraph>
    <Paragraph position="6"> Citations play a central role in the process of writing a paper. Swales (1990) argues that scienti c writing follows a general rhetorical argumentation structure: researchers must justify that their paper makes a contribution to the knowledge in their discipline. Several argumentation steps are required to make this justi cation work, e.g., the statement of their speci c goal in the paper (Myers, 1992). Importantly, the authors also must relate their current work to previous research, and acknowledge previous knowledge claims; this is done with a formal citation, and with language connecting the citation to the argument, e.g., statements of usage of other people's approaches (often near textual segments in the paper where these approaches are described), and statements of contrast with them (particularly in the discussion or related work sections). We argue that the automatic recognition of citation function is interesting for two reasons: a) it serves to build better citation indexers and b) in the long run, it will help constrain interpretations of the overall argumentative structure of a scienti c paper.</Paragraph>
    <Paragraph position="7"> Being able to interpret the rhetorical status of a citation at a glance would add considerable value to citation indexes, as shown in Fig. 1. Here differences and similarities are shown between the example paper (Pereira et al., 1993) and the papers it cites, as well as  the papers that cite it. Contrastive links are shown in grey links to rival papers and papers the current paper contrasts itself to. Continuative links are shown in black links to papers that are taken as starting point of the current research, or as part of the methodology of the current paper. The most important textual sentence about each citation could be extracted and displayed. For instance, we see which aspect of Hindle (1990) the Pereira et al. paper criticises, and in which way Pereira et al.'s work was used by Dagan et al. (1994).</Paragraph>
    <Paragraph position="8"> We present an annotation scheme for citations, based on empirical work in content citation analysis, which ts into this general framework of scienti c argument structure. It consists of 12 categories, which allow us to mark the relationships of the current paper with the cited work. Each citation is labelled with exactly one category. The following top-level four-way distinction applies: Weakness: Authors point out a weakness in cited work Contrast: Authors make contrast/comparison with cited work (4 categories) Positive: Authors agree with/make use of/show compatibility or similarity with cited work (6 categories), and Neutral: Function of citation is either neutral, or weakly signalled, or different from the three functions stated above.</Paragraph>
    <Paragraph position="9"> We rst turn to the point of how to classify citation function in a robust way. Later in this paper, we will report results for a human annotation experiment with three annotators.</Paragraph>
    <Paragraph position="10"> 2 Annotation schemes for citations In the eld of library sciences (more speci cally, the eld of Content Citation Analysis), the use of information from citations above and beyond simple citation counting has received considerable attention. Bibliometric measures assesses the quality of a researcher's output, in a purely quantitative manner, by counting how many papers cite a given paper (White, 2004; Luukkonen, 1992) or by more sophisticated measures like the h-index (Hirsch, 2005). But not all citations are alike. Researchers in content citation analysis have long stated that the classi cation of motivations is a central element in understanding the relevance of the paper in the eld. Bonzi (1982), for example, points out that negational citations, while pointing to the fact that a given work has been noticed in a eld, do not mean that that work is received well, and Ziman (1968) states that many citations are done out of politeness (towards powerful rival approaches), policy (by namedropping and argument by authority) or piety (towards one's friends, collaborators and superiors). Researchers also often follow the custom of citing some  1. Cited source is mentioned in the introduction or discussion as part of the history and state of the art of the research question under investigation.</Paragraph>
    <Paragraph position="11"> 2. Cited source is the speci c point of departure for the research question investigated.</Paragraph>
    <Paragraph position="12"> 3. Cited source contains the concepts, de nitions, interpretations used (and pertaining to the discipline of the citing article).</Paragraph>
    <Paragraph position="13"> 4. Cited source contains the data (pertaining to the discipline of the citing article) which are used sporadically in the article.</Paragraph>
    <Paragraph position="14"> 5. Cited source contains the data (pertaining to the discipline of the citing particle) which are used for comparative purposes, in tables and statistics.</Paragraph>
    <Paragraph position="15"> 6. Cited source contains data and material (from other disciplines than citing article) which is used sporadically in the citing text, in tables or statistics.</Paragraph>
    <Paragraph position="16"> 7. Cited source contains the method used.</Paragraph>
    <Paragraph position="17"> 8. Cited source substantiated a statement or assumption, or points to further information.</Paragraph>
    <Paragraph position="18"> 9. Cited source is positively evaluated.</Paragraph>
    <Paragraph position="19"> 10. Cited source is negatively evaluated.</Paragraph>
    <Paragraph position="20"> 11. Results of citing article prove, verify, substantiate the data or interpretation of cited source.</Paragraph>
    <Paragraph position="21"> 12. Results of citing article disprove, put into question the data as interpretation of cited source. 13. Results of citing article furnish a new interpreta- null particular early, basic paper, which gives the foundation of their current subject ( paying homage to pioneers ). Many classi cation schemes for citation functions have been developed (Weinstock, 1971; Swales, 1990; Oppenheim and Renn, 1978; Frost, 1979; Chubin and Moitra, 1975), inter alia. Based on such annotation schemes and hand-analyzed data, different inuences on citation behaviour can be determined, but annotation in this eld is usually done manually on small samples of text by the author, and not con rmed by reliability studies. As one of the earliest such studies, Moravcsik and Murugesan (1975) divide citations in running text into four dimensions: conceptual or operational use (i.e., use of theory vs. use of technical method); evolutionary or juxtapositional (i.e., own work is based on the cited work vs. own work is an alternative to it); organic or perfunctory (i.e., work is crucially needed for understanding of citing article or just a general acknowledgement); and nally con rmative vs. negational (i.e., is the correctness of the ndings disputed?). They found, for example, that 40% of the citations were perfunctory, which casts further doubt on the citation-counting approach.</Paragraph>
    <Paragraph position="22"> Other content citation analysis research which is rel- null evant to our work concentrates on relating textual spans to authors' descriptions of other work. For example, in O'Connor's (1982) experiment, citing statements (one or more sentences referring to other researchers' work) were identi ed manually. The main problem encountered in that work is the fact that many instances of citation context are linguistically unmarked. Our data conrms this: articles often contain large segments, particularly in the central parts, which describe other people's research in a fairly neutral way. We would thus expect many citations to be neutral (i.e., not to carry any function relating to the argumentation per se).</Paragraph>
    <Paragraph position="23"> Many of the distinctions typically made in content citation analysis are immaterial to the task considered here as they are too sociologically orientated, and can thus be dif cult to operationalise without deep knowledge of the eld and its participants (Swales, 1986). In particular, citations for general reference (background material, homage to pioneers) are not part of our analytic interest here, and so are citations in passing , which are only marginally related to the argumentation of the overall paper (Ziman, 1968).</Paragraph>
    <Paragraph position="24"> Spiegel-Rcurrency1using's (1977) scheme (Fig. 2) is an example of a scheme which is easier to operationalise than most. In her scheme, more than one category can apply to a citation; for instance positive and negative evaluation (category 9 and 10) can be cross-classi ed with other categories. Out of 2309 citations examined, 80% substantiated statements (category 8), 6% discussed history or state of the art of the research area (category 1) and 5% cited comparative data (category 5).  rent work is better than cited work) CoCoXY Contrast between 2 cited methods PBas author uses cited work as starting point PUse author uses tools/algorithms/data PModi author adapts or modi es tools/algorithms/data PMot this citation is positive about approach or problem addressed (used to motivate work in current paper) PSim author's work and cited work are similar PSup author's work and cited work are compatible/provide support for each other Neut Neutral description of cited work, or not enough textual evidence for above categories or unlisted citation function Figure 3: Our annotation scheme for citation function Our scheme (given in Fig. 3) is an adaptation of the scheme in Fig. 2, which we arrived at after an analysis of a corpus of scienti c articles in computational linguistics. We tried to rede ne the categories such that they should be reasonably reliably annotatable; at the same time, they should be informative for the application we have in mind. A third criterion is that they should have some (theoretical) relation to the particular discourse structure we work with (Teufel, 1999). Our categories are as follows: One category (Weak) is reserved for weakness of previous research, if it is addressed by the authors (cf. Spiegel-Rcurrency1using's categories 10, 12, possibly 13). The next three categories describe comparisons or contrasts between own and other work (cf. Spiegel-Rcurrency1using's category 5). The difference between them concerns whether the comparison is between methods/goals (CoCoGM) or results (CoCoR0).</Paragraph>
    <Paragraph position="25"> These two categories are for comparisons without explicit value judgements. We use a different category (CoCo-) when the authors claim their approach is better than the cited work.</Paragraph>
    <Paragraph position="26"> Our interest in differences and similarities between approaches stems from one possible application we have in mind (the rhetorical citation search tool). We do not only consider differences stated between the current work and other work, but we also mark citations if they are explicitly compared and contrasted with other work (not the current paper). This is expressed in category CoCoXY. It is a category not typically considered in the literature, but it is related to the other contrastive categories, and useful to us because we think it can be exploited for search of differences and rival approaches.</Paragraph>
    <Paragraph position="27"> The next set of categories we propose concerns positive sentiment expressed towards a citation, or a statement that the other work is actively used in the current work (which is the ultimate praise). Like Spiegel-Rcurrency1using, we are interested in use of data and methods (her categories 4, 5, 6, 7), but we cluster different usages together and instead differentiate unchanged use (PUse) from use with adaptations (PModi). Work which is stated as the explicit starting point or intellectual ancestry is marked with our category PBas (her category 2). If a claim in the literature is used to strengthen the authors' argument, this is expressed in her category 8, and vice versa, category 11. We collapse these two in our category PSup. We use two categories she does not have de nitions for, namely similarity of (aspect of) approach to other approach (PSim), and motivation of approach used or problem addressed (PMot). We found evidence for prototypical use of these citation functions in our texts. However, we found little evidence for her categories 12 or 13 (disproval or new interpretation of claims in cited literature), and we decided against a state-of-the-art category (her category 1), which would have been in con ict with our PMot de nition in many cases.</Paragraph>
    <Paragraph position="28"> Our fourteenth category,Neut, bundles truly neutral descriptions of other researchers' approaches with all those cases where the textual evidence for a citation function was not enough to warrant annotation of that category, and all other functions for which our scheme did not provide a speci c category. As stated above, we do in fact expect many of our citations to be neutral.  Citation function is hard to annotate because it in principle requires interpretation of author intentions (what could the author's intention have been in choosing a certain citation?). Typical results of earlier citation function studies are that the sociological aspect of citing is not to be underestimated. One of our most fundamental ideas for annotation is to only mark explicitly signalled citation functions. Our guidelines explicitly state that a general linguistic phrase such as better or used by us must be present, in order to increase objectivity in nding citation function. Annotators are encouraged to point to textual evidence they have for assigning a particular function (and are asked to type the source of this evidence into the annotation tool for each citation). Categories are de ned in terms of certain objective types of statements (e.g., there are 7 cases for PMot). Annotators can use general text interpretation principles when assigning the categories, but are not allowed to use in-depth knowledge of the eld or of the authors.</Paragraph>
    <Paragraph position="29"> There are other problematic aspects of the annotation. Some concern the fact that authors do not always state their purpose clearly. For instance, several earlier studies found that negational citations are rare (Moravcsik and Murugesan, 1975; Spiegel-Rcurrency1using, 1977); MacRoberts and MacRoberts (1984) argue that the reason for this is that they are potentially politically dangerous, and that the authors go through lengths to diffuse the impact of negative references, hiding a negative point behind insincere praise, or diffusing the thrust of criticism with perfunctory remarks. In our data we found ample evidence of this effect, illustrated by the following example: Hidden Markov Models (HMMs) (Huang et al. 1990) offer a powerful statistical approach to this problem, though it is unclear how they could be used to recognise the units of interest to phonologists. (9410022, S-24)2 It is also sometimes extremely hard to distinguish usage of a method from statements of similarity between a method and the own method. This happens in cases where authors do not want to admit they are using somebody else's method: The same test was used in Abney and Light (1999). (0008020, S-151) Uni cation of indices proceeds in the same manner as uni cation of all other typed feature structures (Carpenter 1992).</Paragraph>
    <Paragraph position="30"> (0008023, S-87) In this case, our annotators had to choose between categories PSim and PUse.</Paragraph>
    <Paragraph position="31"> It can also be hard to distinguish between continuation of somebody's research (i.e., taking somebody's 2In all corpus examples, numbers in brackets correspond to the of cial Cmp lg archive number, S- numbers to sentence numbers according to our preprocessing.</Paragraph>
    <Paragraph position="32"> research as starting point, as intellectual ancestry, i.e. PBas) and simply using it (PUse). In principle, one would hope that annotation of all usage/positive categories (starting withP), if clustered together, should result in higher agreement (as they are similar, and as the resulting scheme has fewer distinctions). We would expect this to be the case in general, but as always, cases exist where a con ict between a contrast (CoCo) and a change to a method (PModi) occur: In contrast to McCarthy, Kay and Kiraz, we combine the three components into a single projection. (0006044, S-182) The markable units in our scheme are a) all full citations (as recognized by our automatic citation processor on our corpus), and b) all names of authors of cited papers anywhere in running text outside of a formal citation context (i.e., without date). Our citation processor recognizes these latter names after parsing the citation list an marks them up. This is unusual in comparison to other citation indexers, but we believe these names function as important referents comparable in importance to formal citations. In principle, one could go even further as there are many other linguistic expressions by which the authors could refer to other people's work: pronouns, abbreviations such as Mueller and Sag (1990), henceforth M &amp; S , and names of approaches or theories which are associated with particular authors. If we could mark all of these up automatically (which is not technically possible), annotation would become less dif cult to decide, but technical dif culty prevent us from recognizing these other cases automatically. As a result, in these contexts it is impossible to annotate citation function directly on the referent, which sometimes causes problems. Because this means that annotators have to consider non-local context, one markable may have different competing contexts with different potential citation functions, and problems about which context is stronger may occur. We have rules that context is to be constrained to the paragraph boundary, but for some categories paperwide information is required (e.g., for PMot, we need to know that a praised approach is used by the authors, information which may not be local in the paragraph).</Paragraph>
    <Paragraph position="33"> Appendix A gives unambiguous example cases where the citation function can be decided on the basis of the sentence alone, but Fig. 4 shows a more typical example where more context is required to interpret the function. The evaluation of the citation Hindle (1990) is contrastive; the evaluative statement is found 4 sentences after the sentence containing the citation3. It consists of a positive statement (agreement with authors' view), followed by a weakness, underlined, which is the chosen category. This is marked on the nearest markable (Hindle, 3 sentences after the citation). null  S-5 Hindle (1990)/Neut proposed dealing with the sparseness problem by estimating the likelihood of unseen events from that of similar events that have been seen.</Paragraph>
    <Paragraph position="34"> S-6 For instance, one may estimate the likelihood of a particular direct object for a verb from the likelihoods of that direct object for similar verbs.</Paragraph>
    <Paragraph position="35"> S-7 This requires a reasonable de nition of verb similarity and a similarity estimation method.</Paragraph>
    <Paragraph position="36"> S-8 In Hindle/Weak 's proposal, words are similar if we have strong statistical evidence that they tend to participate in the same events.</Paragraph>
    <Paragraph position="37"> S-9 His notion of similarity seems to agree with our intuitions in many cases, but it is not clear how it can be used directly to construct word classes and correspond- null A naive view on this annotation scheme could consider the rst two sets of categories in our scheme as negative and the third set of categories positive .</Paragraph>
    <Paragraph position="38"> There is indeed a sentiment aspect to the interpretation of citations, due to the fact that authors need to make a point in their paper and thus have a stance towards their citations. But this is not the whole story: many of our positive categories are more concerned with different ways in which the cited work is useful to the current work (which aspect of it is used, e.g., just a de nition or the entire solution?), and many of the contrastive statements have no negative connotation at all and simply state a (value-free) difference between approaches. However, if one looks at the distribution of positive and negative adjectives around citations, one notices a (non-trivial) connection between our task and sentiment classi cation.</Paragraph>
    <Paragraph position="39"> There are written guidelines of 25 pages, which instruct the annotators to only assign one category per citation, and to skim-read the paper before annotation. The guidelines provide a decision tree and give decision aids in systematically ambiguous cases, but subjective judgement of the annotators is nevertheless necessary to assign a single tag in an unseen context. We implemented an annotation tool based on XML/XSLT technology, which allows us to use any web browser to interactively assign one of the 12 tags (presented as a pull-down list) to each citation.</Paragraph>
  </Section>
class="xml-element"></Paper>