The number of non-self-citations (NSC) to one author from one paper approximates a Zipfian distribution
Citation hacking, by definition, takes place at the level of the individual paper. The frequency of citations, within one published paper, to any of a single author’s entire body of published papers is approximately power-law distributed, or Zipfian. For our data, we find that a linear fit (ordinary least squares) of NSC frequency on NSC count, on a log-log scale, explains more than 90% of the variability (R2 > 0.9) for more than 95% of the authors. Figure 2 shows that this linear log-log relationship, characteristic of a Zipfian distribution, holds approximately overall for our author subset. Self-citations (SC) did not follow a Zipfian distribution, but this was expected since they are governed by different mechanisms (i.e., an author’s preference as opposed to external awareness/interest).
Zipfian distributions are known to arise in a variety of natural systems and are thought to be governed by laws of preferential attachment 15,16. An important implication of Zipf’s law for this study is that, because the trend is linear in log-space, future values in the series can be approximated from the initial or early values. Thus, the frequency with which any author receives n citations from a paper should be proportional to the frequency with which they receive n + 1, n + 2, etc. This frames the overall problem by defining statistical expectations of what is “normal” and enabling null hypothesis testing for observed frequency distributions. It does not mean that authors with skewed distributions have necessarily engaged in RLM, but it does mean that all authors who do engage in RLM will have distributions skewed in proportion to their activity. The more references requested per attempt and the more frequent the requests, the more severe the skew.
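As a minimal sketch of this check, the log-log OLS fit can be reproduced on a synthetic, ideally Zipfian frequency series (the data below are hypothetical, not drawn from our corpus):

```python
import numpy as np

# Hypothetical frequency series: freqs[k] = number of papers giving
# exactly nsc_values[k] NSC to an author; an ideal Zipf law (freq ~ 1/n).
nsc_values = np.arange(1, 21)
freqs = 1000 / nsc_values

# OLS fit in log-log space
log_n, log_f = np.log(nsc_values), np.log(freqs)
slope, intercept = np.polyfit(log_n, log_f, 1)

# R^2 of the linear fit
pred = slope * log_n + intercept
ss_res = np.sum((log_f - pred) ** 2)
ss_tot = np.sum((log_f - np.mean(log_f)) ** 2)
r2 = 1 - ss_res / ss_tot
print(round(float(slope), 2), round(float(r2), 2))  # ideal Zipf: slope -1.0, R^2 1.0
```

An author would pass the Zipfian check described above whenever this R² exceeds 0.9.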
Identifying “red flags” suggestive of citation hacking
Detecting citation hacking is complicated by several factors. First, the fraction of all authors regularly attempting to influence citation of their work to a large degree during peer review is presumed to be small. Second, there is no gold standard or ground truth against which to evaluate how well a metric reflects citation hacking activity. Third, citation hackers may have different strategies and/or opportunities to influence reference lists. Finally, accusations of citation hacking are a sensitive matter and, consequently, more than one line of evidence is highly desirable. These are common circumstances in the field of anomaly (or outlier) detection. One way of gaining confidence in declaring a data point an anomaly is for it to be anomalous by several different measures attributable to a common cause 17. For example, readings across multiple sensors within a device are often used to diagnose a potential root cause (e.g., overheating, electrical surge, etc.). So, we examined several citation pattern “red flags” that are suggestive, but not independently conclusive, of citation hacking (Table 1).
Table 1
Summary of red flags used to identify patterns of behavior that are suggestive of RLM.
| Red Flag # | Abbr. | For each author, an unusually high frequency of: |
| --- | --- | --- |
| 1 | Blocks | Consecutive NSC to an author within papers |
| 2 | NSCI | H-index for papers with multiple NSC to an author |
| 3 | 17+ | # of papers with extreme NSC events |
| 4 | MCJI | Journal-specific NSC H-index for an author |
| 5 | %SC | % of reference list used for self-citation |
Each flag is motivated by economic considerations: if someone wants to increase their perceived influence via increased citation of their work, but their supply of opportunities is limited, then their incentive is to maximize the number of references added per available opportunity. This is mitigated by other factors such as author compliance and editorial intervention, but also by whatever expectation hackers hold regarding the potential costs of their behavior being called into question.
The Gini Index as a proposed metric for quantifying skew in a frequency distribution
The Gini Index is a well-known statistical measure of dispersion and inequality 18. It is especially popular in economics to quantify inequality in income distributions. Formally, the Gini coefficient is proportional to the area between the Lorenz curve, which plots the cumulative share of a variable of interest across a population normalized to percentages (1%–100%), and the straight line representing perfect uniformity. Gini values range from 0 (perfect uniformity) to 1 (one observation contains all the values). An equivalent, simpler formula defines the Gini coefficient as half of the relative mean absolute difference of the variable under study 19. For an author cited at least once within n papers written by other authors, where xi is the number of non-self citations (NSC) to that author given within the ith paper (i = 1,…,n) and xj the NSC from the jth paper (j = 1,…,n), the Gini coefficient can be computed as:

$$G = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} \lvert x_i - x_j \rvert}{2 n^2 \bar{x}}$$

where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the mean NSC per paper. A friendly implementation of the formula is provided by the function “Gini” within the ineq R package. Thus, the Gini is a measure of how skewed the NSC distribution is for each author across all papers referencing any of that author’s entire body of work. If all papers gave the same number of citations the Gini would be zero (perfect uniformity), and if all the citations came from one paper it would approach one (maximally non-uniform). One advantage of the Gini is that, as a scale-independent relative measure, it does not require normalization. The distribution of Gini index values for all authors is shown in Fig. 3.
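As a minimal sketch (our own re-implementation of the half relative mean absolute difference form; the “Gini” function in the ineq R package remains the reference implementation):

```python
def gini(x):
    """Gini coefficient: half the relative mean absolute difference.

    x: per-paper NSC counts for one author."""
    n = len(x)
    mean = sum(x) / n
    # mean absolute difference over all ordered pairs
    mad = sum(abs(xi - xj) for xi in x for xj in x) / (n * n)
    return mad / (2 * mean)

print(gini([5, 5, 5, 5]))   # perfect uniformity -> 0.0
print(gini([0, 0, 0, 10]))  # all NSC from one of 4 papers -> 0.75, i.e. (n-1)/n
```

The double loop is O(n²); sorted-rank forms of the Gini are faster for long lists, but this version matches the formula term for term.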
To examine how sensitive the Gini is to outliers, we removed the paper with the most NSC for each author, recalculated their Gini, and compared the rankings using Spearman’s correlation. We found high concordance (R = 0.996), and within the top 1% of Ginis (n = 208) the most any individual author dropped was 32 positions, suggesting that membership at the extreme end is not sensitive to removal of a single outlier.
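The procedure behind this robustness check can be sketched on a hypothetical trio of authors (toy data; the real rankings were computed over the full author set). Author B, whose skew comes entirely from one paper, is exactly the pathological case the check probes:

```python
import numpy as np

def gini(x):
    x = np.asarray(x, dtype=float)
    n = len(x)
    return float(np.abs(x[:, None] - x[None, :]).sum() / (2 * n * n * x.mean()))

def spearman(a, b):
    # rank correlation via Pearson on ranks (no tie-averaging, for brevity)
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return float(np.corrcoef(rank(a), rank(b))[0, 1])

def drop_top_paper(nsc):
    nsc = list(nsc)
    nsc.remove(max(nsc))  # remove the single most-citing paper
    return nsc

authors = {"A": [1, 1, 2, 2, 3],    # fairly even NSC across papers
           "B": [1, 1, 1, 1, 30],   # dominated by one paper
           "C": [1, 2, 4, 8, 16]}   # skewed, but not single-paper

full    = [gini(v) for v in authors.values()]
trimmed = [gini(drop_top_paper(v)) for v in authors.values()]
rho = spearman(full, trimmed)  # low here, because B's Gini collapses to 0;
                               # over the real corpus, R = 0.996
```

With real authors the top paper rarely carries the whole distribution, which is why the observed concordance was so high.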
We also wanted to examine whether the Gini might be influenced by “mega-reviews”. Mega-reviews are papers with an unusually high number of references that attempt to summarize work within a large area of research. Thus, they might be prone to citing individuals in the field more frequently in a single paper. Since there is no standard definition of how many references make a paper a mega-review, we recalculated Ginis after excluding NSC from all papers with > 150 references, which encompasses only 0.8% of all papers but 6.3% of all references. Within the top 1%, one author’s Gini fell by a striking 788 positions, but the average drop for the remaining authors was only 1 rank. Further examination of this author shows that 128/991 (13%) of the papers with ≥ 1 NSC to them also had > 150 references, suggesting 150 may be too low a cutoff for some fields, and that this drastic change in Gini was driven by a large number of reference-heavy papers rather than by a small number of mega-reviews.
Finally, we estimated how much each author’s Most Citing Author (MCA) affected their Gini, where the MCA is identified by an H-index-like measure: the citing author with the largest n such that at least n NSC were observed in at least n papers. Within the top 1%, the ranking for two authors dropped substantially (Fig. 4), but most did not change appreciably. Note that Gini contributions from single outliers and mega-reviews are viewed as potential confounds, but are not necessarily innocuous. For example, an MCA may simply be an admirer, or a quid pro quo may exist; a mega-review might simply reflect an author’s specialization within the topic of the review, or its emphasis on coverage may have created the impression for a reviewer/editor that more of their work qualifies. Based on string similarity of each author’s name versus their MCA, we estimate that about 1.4% of the authors have an MCA that is actually a variation on the spelling of their own name. This suggests that mistaking SC for NSC happens at a fairly low rate.
Red Flag #1: Seeing large and/or frequent blocks of consecutive NSC to an author within a paper
References are expected to support the topic at hand and, although there may be valid reasons for an author-centric series of consecutive references, it is not the norm. Unusually large and/or frequent blocks of contiguous NSC are highly suggestive that authors may be accommodating a request from either a reviewer or an editor. Not only did we observe this for the coercive reviewer documented in our case report 8, but it makes sense that when authors are adding citations solely to satisfy a reviewer’s concerns, they would generally do it in one, or possibly a few, blocks of consecutive citations. The alternative would be to weave them throughout the text which, particularly for unmerited citations, would be quite difficult to do in a way that appears natural or logical to the reader. Thus, we expect that a common “fingerprint” left by citation hackers would be large blocks of contiguous citations to them within a paper. This metric lends itself to validation by examining the surrounding context of citation blocks: the more generic the statement (e.g., “other work has been done in this area”) and the larger the block, the less likely the citations were motivated by the topic or necessary to the paper. Considering authors with at least 200 NSC (n = 20,712), the ratio of total consecutive NSC (3 minimum) to total NSC shows that such citation blocks are relatively uncommon events in general, with 30% of authors having none at all (Fig. 5).
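The block metric can be sketched as follows, assuming each reference position is reduced to a single cited-author label (a simplification, since real references have multiple authors):

```python
def block_ratio(ref_targets, author, min_run=3):
    """Fraction of an author's NSC that sit in consecutive runs of >= min_run.

    ref_targets: cited-author label for each reference-list position, in order.
    """
    total = sum(t == author for t in ref_targets)
    in_blocks, run = 0, 0
    for t in ref_targets + [None]:  # sentinel flushes the final run
        if t == author:
            run += 1
        else:
            if run >= min_run:
                in_blocks += run
            run = 0
    return in_blocks / total if total else 0.0

refs = ["X", "A", "A", "A", "A", "Y", "A", "Z", "A", "A", "A"]
print(block_ratio(refs, "A"))  # 7 of 8 NSC to "A" sit in runs of >= 3 -> 0.875
```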
Red Flag #2: Multiple papers with an unusually large number of NSC to an author relative to their total NSC
Supplementary Table S1 shows the probability of observing n citations to one author’s work in a paper, for both NSC and SC. For example, although it is rare to see more than 5 references to someone who is not an author of the paper (~ 1%), it is fairly common to see more than 5 citations to one of the paper’s authors (23%). Thus, conceptually similar to the H-index, we can calculate an NSC Index (NSCI) as the largest number n such that at least n papers each contain at least n NSC to a specific author. Because the H-index correlates with the square root of an author’s total citations 20, the NSCI is normalized (see methods). Figure 6 shows the distribution of NSCI values for all authors.
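The NSCI construction might look as follows; the square-root normalization is our reading of the scaling noted above, and the exact normalization in the methods may differ:

```python
import math

def h_like_index(per_paper_nsc):
    """Largest n such that at least n papers each gave >= n NSC (H-index style)."""
    counts = sorted(per_paper_nsc, reverse=True)
    h = 0
    for i, c in enumerate(counts, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

def nsci(per_paper_nsc):
    # normalize by sqrt(total NSC), since the H-index scales with sqrt(citations)
    total = sum(per_paper_nsc)
    return h_like_index(per_paper_nsc) / math.sqrt(total) if total else 0.0

print(h_like_index([9, 7, 6, 2, 1]))    # three papers with >= 3 NSC -> 3
print(round(nsci([9, 7, 6, 2, 1]), 2))  # 3 / sqrt(25) -> 0.6
```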
Red Flag #3: Papers that contain an extremely large number of NSC to an author’s body of work
The NSCI, similar to the H-index, seeks to discount the extreme end of the citation curve in favor of a metric that more stably reflects the entire distribution. However, extreme events are not only informative, but reflect how egregious the hacker can be and also represent instances where the peer-review system has clearly broken down. For example, if someone coerces the insertion of 49 references to their work, the NSCI could detect this if evenly spread out (7 papers with 7 NSC each), but not if they are all in one paper. Similarly, it seems less concerning to discover an editor did not question or notice a reviewer requesting 7 self-citations in 7 separate reviews spread out over time than an editor who did not question or notice a reviewer requesting 49 self-citations in one review. Whereas the NSCI prioritizes consistency over egregiousness, the observed-to-expected ratio of extreme events prioritizes egregiousness over consistency. Note: there may be valid reasons for a paper to contain an extreme number of NSC to one author (e.g., honoring a retired or deceased author). However, we expect this to be relatively infrequent for most authors, but much more common among citation hackers.
As Supplementary Table S1 shows, the odds that 17 or more NSC will come from one paper to one author are approximately 0.025%, or once per 3,942 papers that contain at least one NSC to an author. The threshold of 17 was chosen simply because it is high enough that observing 17+ NSC from a single paper to a single author would be an uncommon event for the average author. Indeed, 82% of authors in our subset have never received 17+ NSC. A total of 12,110 instances of 17+ NSC to one author within one paper were observed within our subset. These 12,110 instances contained a total of 261,067 citations (avg = 22). For authors with at least one 17+ NSC event, an expected number of such events is computed by modeling the increase with total NSC using Poisson regression (Fig. 7).
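A simplified observed-to-expected calculation, using the constant marginal rate above rather than the Poisson regression on total NSC (which lets the expectation grow with an author's citation volume), can be sketched as:

```python
# Marginal rate from Supplementary Table S1: ~1 extreme event (17+ NSC from a
# single paper) per 3,942 papers that contain at least one NSC to the author.
P_17PLUS = 1 / 3942

def obs_exp_ratio(n_extreme_papers, n_citing_papers):
    expected = n_citing_papers * P_17PLUS  # constant-rate expectation
    return n_extreme_papers / expected

# Hypothetical author: cited by 2,000 papers, 5 of them extreme
print(round(obs_exp_ratio(5, 2000), 1))  # -> 9.9 (about 10x the expected rate)
```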
Red Flag #4: An unusually large number of NSC to one author coming from papers published within one journal
A high number of NSC to one author from papers published in a specific journal is suggestive of a researcher who may have requested citations to their work in their capacity as handling editor or, possibly, as a reviewer frequently used by an editor. First, for each journal publishing a paper with at least one NSC to an author, an H-index-like measure, the MCJI, is calculated, reflecting the largest number n such that at least n NSC were observed in each of at least n papers. The journal with the highest n is denoted here as the Most Citing Journal (MCJ) for that author. Figure 8 shows the distribution of MCJI values normalized to the author’s total number of NSC from their MCJ.
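The MCJ/MCJI construction could be sketched as follows (journal names and counts are hypothetical):

```python
from collections import defaultdict

def mcj_index(nsc_by_paper):
    """nsc_by_paper: (journal, NSC count) pairs over all papers citing one author.
    Returns (Most Citing Journal, its H-index-like score)."""
    per_journal = defaultdict(list)
    for journal, nsc in nsc_by_paper:
        per_journal[journal].append(nsc)

    def h_like(counts):
        # largest n with at least n papers giving >= n NSC
        counts = sorted(counts, reverse=True)
        return max([i for i, c in enumerate(counts, 1) if c >= i], default=0)

    scores = {j: h_like(c) for j, c in per_journal.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

papers = [("J1", 5), ("J1", 4), ("J1", 3), ("J1", 1), ("J2", 20), ("J2", 1)]
print(mcj_index(papers))  # J1's three papers with >= 3 NSC beat J2's score of 1
```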
Note
Although a high MCJI may be informative, a low MCJI could reflect lack of editorial appointments or reluctance to use that venue for citation hacking. For example, unlike reviewers, editors lack anonymity and their requests would be seen by both authors and reviewers. In fact, a recently published case of an editor using his position to coerce reviewers into citing his papers found he created email pseudonyms so that requests to cite his papers appeared to be coming from a reviewer 21.
Red Flag #5: Self-citation at the cost of excluding field coverage
Excessive self-citation, as measured by the fraction of space in an author’s reference list reserved for self-citation, is suggestive that an author is not merely attempting to draw attention to their prior work but, more specifically, using the publication opportunity to maximize the total number of citations to their work. SC are generally transparent and can be subtracted from metrics such as the H-index, when desired. But given that SC may or may not be subtracted, for someone who wants to increase the perceived influence of their work, there is no reason to restrict their efforts to only papers they handle during peer review, particularly when there is no consensus on whether or not excessive self-citation is an ethical breach 22,23. Thus, there is potential reward without much risk. Figure 9 shows the distribution of fractional self-citation per author, for authors with a minimum of 9 “anchor” author (i.e., 1st or last author) papers with at least 100 total references among them, broken down by anchor vs. middle authorship.
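The %SC metric might be sketched as follows, assuming each reference is reduced to its set of cited-author names (a hypothetical data format):

```python
def pct_self_citation(author, anchor_papers):
    """Percent of the reference list citing the author's own work.

    anchor_papers: one reference list per anchor-author paper; each reference
    is the set of that cited paper's authors."""
    total = sum(len(refs) for refs in anchor_papers)
    own = sum(author in ref for refs in anchor_papers for ref in refs)
    return 100 * own / total if total else 0.0

refs = [[{"Smith J"}, {"Wu Q"}, {"Smith J", "Lee K"}, {"Chen Y"}]]
print(pct_self_citation("Smith J", refs))  # 2 of 4 references -> 50.0
```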
Gini captures 95% of the ranking information from other metrics
Examining the correlation structure of each red flag metric (Table 2) shows that each contains some information about the others, but they are not so highly correlated as to be redundant. While correlations among the first five NSC-based variables were expected, we were initially surprised to see such a strong correlation between the NSC-based metrics and %SC which, in theory, should be completely unrelated.
Factor analysis, similar to Principal Component Analysis, searches for one or more latent/unobserved variables that best explain joint variation within a group of variables. Here, the first latent factor accounts for 55.4% of the group’s total variation, and the second factor only 11%. Factor scores for each variable suggest how well they reflect the behavior of the group. Notably, Gini has the highest predictive power, explaining 51.3% of the first latent variable. This, plus its relative simplicity and the fact that it does not require normalization, makes it a good metric for detecting potential citation hacking. Supplementary Figs. 1–7 show plots of each factor versus the others, normalized and non-normalized.
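As a rough, PCA-style illustration of this kind of decomposition, one can eigendecompose the Table 2 correlation matrix directly. Factor analysis additionally models unique variances, so these numbers will not match the 55.4%/11% reported above:

```python
import numpy as np

# 6x6 Spearman correlation matrix among the red-flag metrics (Table 2,
# excluding the aggregate Factor column)
R = np.array([
    [1.00, 0.82, 0.63, 0.48, 0.68, 0.55],   # Gini
    [0.82, 1.00, 0.66, 0.52, 0.61, 0.52],   # NSCI
    [0.63, 0.66, 1.00, 0.43, 0.47, 0.51],   # Blocks
    [0.48, 0.52, 0.43, 1.00, 0.39, 0.35],   # 17+
    [0.68, 0.61, 0.47, 0.39, 1.00, 0.38],   # MCJI
    [0.55, 0.52, 0.51, 0.35, 0.38, 1.00],   # %SC
])
eigvals = np.linalg.eigvalsh(R)        # ascending order
share = eigvals[::-1] / eigvals.sum()  # variance share per component, descending
print(round(float(share[0]), 3))       # first component carries most variance
```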
Table 2
Spearman’s rank correlation among “red flags” that are suggestive of citation hacking. Gini = Gini index of NSC distribution; NSCI = Non-Self Citation Index; %SC = average percent of the reference list used for self-citation; Blocks = citations in contiguous blocks (≥ 3); 17+ = papers with 17 or more NSC to one author; MCJI = Most Citing Journal Index; Factor = aggregate factor analysis score.
|  | Gini | NSCI | Blocks | 17+ | MCJI | %SC | Factor |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Gini | 1 | 0.82 | 0.63 | 0.48 | 0.68 | 0.55 | 0.95 |
| NSCI | 0.82 | 1 | 0.66 | 0.52 | 0.61 | 0.52 | 0.93 |
| Blocks | 0.63 | 0.66 | 1 | 0.43 | 0.47 | 0.51 | 0.75 |
| 17+ | 0.48 | 0.52 | 0.43 | 1 | 0.39 | 0.35 | 0.58 |
| MCJI | 0.68 | 0.61 | 0.47 | 0.39 | 1 | 0.38 | 0.73 |
| %SC | 0.55 | 0.52 | 0.51 | 0.35 | 0.38 | 1 | 0.63 |
Estimating levels of chronic and acute citation hacking
The Gini coefficient can be written in many different forms. For example, Lubrano showed that the Gini can be written as a scaled mean of absolute differences and that the Gini coefficient can be seen as the covariance between a variable and its rank 24. Covariance is itself a mean, which converges to a normal distribution. Under the assumption that the distribution of NSC Gini index values is approximately Gaussian, which is supported by prior studies 25, we can estimate two things. First, we can assign a statistical confidence with which we can reject the null hypothesis that an author’s Gini index value belongs to the normal distribution. Chronic citation hackers who engage in repeated and/or egregious RLM would be expected to fall well outside the norm. Second, under the additional assumption that authors would rarely ask for removal of references to their work from papers they review or handle as editor, in a world where RLM did not exist the left and right-hand sides of the curve should be symmetric. Because we know, from at least a limited number of reports, that RLM has happened, the extent to which the real-world curve is right-shifted relative to the ideal curve provides a quantitative estimate of the difference.
The black line in Fig. 10 shows the distribution of Gini values for all authors, whereas the red dotted line shows a Gaussian distribution with a mean and standard deviation that best fit the left-hand side of the curve. We then compute, for each Gini value, the p-value with which we can reject the null hypothesis that it is part of the reference distribution, correcting for false discovery rate (FDR) 26. Authors with the lowest FDR values correspond to those who have an abnormally large NSC Gini index and for whom the null hypothesis that such patterns are normal can be confidently rejected. The full list of authors and their FDR p-values is available upon request to the corresponding author. There are 81 authors (0.4%) with FDR < 0.05 and 231 (1.1%) with FDR < 0.10. Summing 1-FDR across the entire set provides an estimate that 3,284 (16%) of the authors in our subset have higher Gini values than expected (Fig. 10, grey area). Note that this is a population-level statement, not a threshold for evaluating individual Gini scores. It suggests that about 16% of authors may have engaged to some degree, on one or more occasions, in RLM.
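A sketch of this procedure follows; the median-mirroring estimator is a simplified stand-in for the left-side Gaussian fit described above, and Benjamini-Hochberg is one standard FDR correction:

```python
import math

def normal_sf(z):
    # right-tail (survival) probability of the standard normal
    return 0.5 * math.erfc(z / math.sqrt(2))

def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adj, prev = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        prev = min(prev, pvals[i] * m / rank)
        adj[i] = prev
    return adj

def right_tail_fdr(ginis):
    """Fit a Gaussian to the left half (values at or below the median, mirrored),
    then FDR-correct the right-tail p-value of every author."""
    med = sorted(ginis)[len(ginis) // 2]
    left = [g for g in ginis if g <= med]
    sd = math.sqrt(sum((g - med) ** 2 for g in left) / max(len(left) - 1, 1))
    return bh_fdr([normal_sf((g - med) / sd) for g in ginis])

ginis = [0.20, 0.21, 0.19, 0.22, 0.20, 0.18, 0.90]  # toy data with one outlier
fdrs = right_tail_fdr(ginis)
print(round(fdrs[-1], 4))  # the 0.90 outlier is confidently flagged -> 0.0
```

Summing 1-FDR over all authors then gives the population-level excess described above.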
Excessive self-citation suggests an author may be more likely to coerce others to cite their work
Because we were somewhat surprised to see the %SC ranking correlate so well with all the NSC-based metrics (Table 2), we examined whether %SC effectively represents a risk factor for RLM. A prior small-scale study of Norwegian authors found a correlation between SC and NSC 27. They hypothesized that the increase in NSC was due to an “advertising” effect caused by SC. However, an alternative hypothesis is that authors who place a high value on citation of their work may be inclined to use all venues available to them, citing themselves and asking others to do so as well.
Figure 11 shows that, as an author’s %SC rises, their NSC-based Gini index FDR value drops. This means the more of their reference list an author reserves for self-citation, the more distorted their single-paper NSC frequency distribution is. Interestingly, the figure shows the average FDR curve flattening around 20% SC, suggesting that the ability of %SC to predict coercive NSC behavior reaches a point of diminishing returns. Recursive partitioning identified ≥ 18% SC as the optimal threshold for the group, separating a set of 948 authors at almost 50% risk (average FDR ≤ 0.5), significantly higher than the rate for the group as a whole (t-test statistic = 37, p < 1e-16).
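The single-split search at the core of recursive partitioning can be sketched as follows (synthetic %SC/FDR pairs; the real analysis used the full author set):

```python
def best_split(x, y):
    """Cutoff on x maximizing the between-group sum of squares of y,
    i.e. CART's criterion for a single split on a continuous outcome.

    x: %SC per author; y: Gini-based FDR per author."""
    pairs = sorted(zip(x, y))
    n, ybar = len(pairs), sum(y) / len(y)
    best_cut, best_score = None, -1.0
    for k in range(1, n):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between equal x values
        left = [v for _, v in pairs[:k]]
        right = [v for _, v in pairs[k:]]
        score = (len(left) * (sum(left) / len(left) - ybar) ** 2 +
                 len(right) * (sum(right) / len(right) - ybar) ** 2)
        if score > best_score:
            best_score, best_cut = score, (pairs[k - 1][0] + pairs[k][0]) / 2
    return best_cut

sc  = [2, 5, 8, 10, 15, 20, 25, 30]                     # %SC (hypothetical)
fdr = [0.95, 0.90, 0.92, 0.88, 0.85, 0.50, 0.40, 0.45]  # lower = more skewed Gini
print(best_split(sc, fdr))  # splits between 15 and 20 -> 17.5
```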
Case Studies
We provide a list of all authors analyzed, their Gini values and their red flag metrics in a supplementary Excel file, which is available upon request to the corresponding author. It is important to note that numbers are calculated using the PMC citation network subset and will differ from the same calculations (e.g., total papers and citations) derived using other sources such as ISI or Google Scholar. Our goal in this section is not to judge guilt or innocence, but to illustrate how high Gini scores tend to be associated with multiple unusual patterns (“red flags”) suggestive of citation hacking, and to show how the red flag metrics lend themselves to reasonable hypotheses regarding the potential origin of the distortion: for example, patterns suggestive of reviewer-coerced citation, editorial coercion or intervention, or author-author co-citation patterns that suggest mutual benefit.
The author with the highest Gini (Gini Rank #1) received 17+ NSC to his own papers 73 times, despite being in the 40th percentile in total NSC, and has by far the highest observed-to-expected 17+ NSC ratio (114) among all authors. His MCJ index is also the highest among all authors (2.87), owing to 72 of these 73 papers coming from one journal (Surgical Endoscopy). Examining a random subset of these papers, we find they are predominantly commentaries on other papers published in the journal rather than research papers. He has the highest rank in multiple red flag categories. He has the highest ratio of consecutive blocks per NSC (0.664), and the presence of very large blocks (> 20) of consecutive citations was confirmed by manual examination of a random subset. He fell below the threshold for fractional self-citation calculations with only 8 anchor author papers in the citation network, but averaged 32% self-citations in these 8. Interestingly, unusual self-citation and co-citation patterns for this author were reported in a prior study 7, which hypothesized that such a pattern suggested he was attempting to raise his H-index. Combined, this is highly suggestive of editorial citation hacking.
The author with Gini Rank #2 ranks in the 12th percentile for total NSC, but received 17+ NSC to his work from 60 separate papers (Obs/Exp = 42), the 3rd highest. An estimated 35% of the extracted references in the papers he authored are self-citations (rank = 19th out of 20,803). His MCJI suggests these distortions, in general, are not attributable to influence at one specific journal. He has the 7th highest number of blocks/NSC. Examining the context of the block citations in the published papers, we found a high degree of textual similarity surrounding them and that the context of the citations appears trivial (e.g., mentioning that user-friendly webservers are important, followed by a very large number of citations to his papers). We also noticed that his name is mentioned frequently in the title of papers with excessive NSC, and that these have become increasingly frequent in recent years. Querying MEDLINE directly, we estimated that almost 200 publications mention him by name in the title, but found only ~ 10% within our extracted citation network, suggesting the magnitude of his Gini distortion may be a significant underestimate. Googling textual phrases preceding large citation blocks to his papers (e.g., “to develop a really useful predictor”) shows that the same phrases appear verbatim in hundreds of papers as well as in a post-publication peer-review report online 28. In this report, the reviewer requests 147 citations, the vast majority to this author, and after the first round of revision rejects the paper because the authors did not accept the 1st-round request to change their title to include his name. This pattern is highly suggestive of reviewer-coerced citation, at least as a primary mechanism. However, we did observe an unusually large, but transient, surge of extreme NSC per paper coupled with his name being mentioned in the title of papers within two journals (Prot Pept Let from 2012-13 and J Theor Biol from 2018-19).
We contacted J Theor Biol in 2019 because his activity was ongoing at the time, and an investigation by the editors in chief revealed that he had committed a number of ethical breaches as editor, including coercive requests to cite a large number of his papers 21.
The authors with Gini ranks #3 and #8 cite each other extensively in papers they do not co-author, with #8 being the Most Citing Author (MCA) of #3 but not vice versa. Having the same last name suggests they are related, the bylines in their papers show they work at the same institution, the titles of their non-joint papers suggest they research the same subject, and PubMed shows they frequently co-author together. One reason they may score so prominently is that they also have a high rate of self-citation: a total of 19% and 20%, respectively, of their reference lists were self-citations, which puts each of them in the top 3%. So on one hand, their NSC distortion could be attributed to a proclivity for self-citation plus entangled research activity but, on the other, it can be argued they each benefit from these co-citation patterns.