Researcher samples. Each co-author assembled an example set of researchers from within her/his field, which we broadly defined as archaeology (S.A.C.), chemistry (J.M.C.), ecology (C.J.A.B.), evolution/development (V.W.), geology (K.T.), microbiology (B.A.E.), ophthalmology (J.R.S.), and palaeontology (J.A.L.). Our basic assembly rules for each of these discipline samples were: (i) 20 researchers from each stage of career, defined here arbitrarily as early career (0–10 years since first peer-reviewed article published in a recognized scientific journal), mid-career (11–20 years since first publication), and late career (> 20 years since first publication); each discipline therefore had a total of 60 researchers, for a total sample of 8 × 60 = 480 researchers across all disciplines. (ii) Each sample had to include an equal number of women and men from each career stage. (iii) Each researcher had to have a unique, publicly accessible Google Scholar profile with no obvious errors, inappropriate additions, omissions, or duplications. The entire approach we present here assumes that each researcher’s Google Scholar profile is accurate, up-to-date, and complete.
We did not impose any other rules for sample assembly, but encouraged each compiler to include only a few previous co-authors. Our goal was to have as much ‘inside knowledge’ as possible of each discipline, while also including a wide array of researchers who were largely independent of each of us. The composition of each sample is largely immaterial for the purposes of our example dataset; we merely attempted gender and career-stage balance to demonstrate the properties of the ranking system (i.e., we did not intend the sampling to be a definitive comment on the performance of particular researchers, nor did we mean for each sample to represent an entire discipline). Finally, we completely anonymized the sample data for publication.
Citation data. Our overall aim was to provide a meaningful and objective method for ranking researchers by citation history without requiring extensive online research or information not easily obtainable from a publicly available, online profile. We also wanted to avoid an index that is overly influenced by outlier citations, while still retaining valuable performance information regarding high-citation outputs and total productivity (number of outputs).
For each researcher, the algorithm requires the following information collected from Google Scholar: (i) i10-index, the number of publications in the researcher’s profile with at least 10 citations (denoted i10); one condition is that a researcher must have i10 ≥ 1 for the algorithm to function correctly; (ii) h-index, the researcher’s Hirsch number 4: the largest number h of publications each having at least h citations (denoted h); (iii) the number of citations for the researcher’s most highly cited paper (denoted cm); and (iv) the year the researcher published her/his first peer-reviewed article in a recognized scientific journal (denoted Y1). For the designation of Y1, we excluded any reports, chapters, books, theses, or other forms of publication that preceded the year of the first peer-reviewed article; however, we included citations from these sources in the researcher’s i10, h, and cm.
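To make these inputs concrete, a minimal sketch in R (the language of the function referenced below) might assemble them as follows; all values and column names here are hypothetical, chosen only for illustration:

```r
# Hypothetical per-researcher inputs harvested from Google Scholar profiles;
# values and column names are illustrative, not the actual study data.
researchers <- data.frame(
  id  = c("R1", "R2", "R3"),
  i10 = c(35, 12, 60),      # publications with >= 10 citations (must be >= 1)
  h   = c(22, 9, 40),       # Hirsch h-index
  cm  = c(480, 150, 1200),  # citations of the most highly cited paper
  Y1  = c(2005, 2014, 1992) # year of first peer-reviewed article
)
# opportunity: years since first publication
researchers$t <- as.integer(format(Sys.Date(), "%Y")) - researchers$Y1
```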
Ranking algorithm. The algorithm first characterizes a power-law-like relationship between the vector of frequencies (as measured from Google Scholar): i10, h, and 1, and the vector of their corresponding values: 10, h, and cm, respectively. Thus, h is, by definition, both a frequency (y-axis) and a value (x-axis). We then fitted a simple linear model of the form y ~ α + βx, where
$$y = \log_e \left[\begin{array}{c} i_{10} \\ h \\ 1 \end{array}\right] \quad \text{and} \quad x = \log_e \left[\begin{array}{c} 10 \\ h \\ c_m \end{array}\right]$$
(y is the citation frequency, and x is the citation value) for each researcher (Supplementary Information Fig. S2). The corresponding \(\hat{\alpha}\) and \(\hat{\beta}\) for each relationship allowed us to calculate a standardized integral (the area under the power-law relationship, Arel) relative to the researcher in the sample with the highest cm; that is, all areas were scaled to the sample maximum.
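As an illustration of this step, the following minimal R sketch fits the log-log linear model for each researcher and integrates the fitted power law analytically. The integration bounds (from 1 to the largest cm in the sample) and all numeric values are illustrative assumptions only; the full R function referenced below remains the definitive implementation.

```r
# Sketch: fit log(frequency) ~ log(value) for one researcher and compute the
# area under the fitted power law y = exp(alpha) * x^beta.
pl.area <- function(i10, h, cm, x.max) {
  fit <- lm(log(c(i10, h, 1)) ~ log(c(10, h, cm)))  # y ~ alpha + beta * x
  a <- coef(fit)[[1]]; b <- coef(fit)[[2]]
  exp(a) * (x.max^(b + 1) - 1) / (b + 1)  # analytic integral from 1 to x.max (b != -1)
}

cm.all <- c(480, 150, 1200)  # hypothetical sample values of cm
A <- mapply(pl.area, i10 = c(35, 12, 60), h = c(22, 9, 40), cm = cm.all,
            MoreArgs = list(x.max = max(cm.all)))
A.rel <- A / max(A)  # standardize each area to the sample maximum
```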
A researcher’s Arel therefore represents her/his citation mass, but this value still requires correction for individual opportunity (time since first publication, t = current year – Y1) to compare researchers at different stages of their careers. This is where career gaps can be taken into account explicitly for any researcher in the sample: subtracting ai = the total cumulative time absent from research (e.g., maternity or paternity leave, sick leave, secondment) for individual i from t gives that individual’s career gap-corrected opportunity t′i = ti − ai. We therefore constructed another linear model of the form Arel ~ γ + θ loge t across all researchers in the sample, and took the residual (ε) of an individual researcher’s Arel from the predicted relationship as a metric of citation performance relative to the rest of the researchers in that sample (Supplementary Information Fig. S3). This residual ε allows us to rank all individuals in the sample from highest (highest citation performance relative to opportunity and the entire sample) to lowest (lowest citation performance relative to opportunity and the entire sample). Any researcher in the sample with a positive ε is considered to be performing above expectation (relative to the group and the time since first publication), and those with a negative ε fall below expectation. This approach also has the advantage that separate linear models can be fitted to subcategories within a sample to rank researchers within their respective groupings (e.g., by gender; Supplementary Information Fig. S4). An R function to produce the index and its variants using a sample dataset is available from github.com/cjabradshaw/EpsilonIndex.
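A corresponding sketch of the opportunity correction and residual ranking, continuing the hypothetical values above (the gap years ai are likewise invented for illustration):

```r
# Sketch: correct citation mass for opportunity, then rank by residual.
A.rel <- c(0.41, 0.09, 1.00)     # standardized areas from the previous sketch
t.yrs <- c(18, 7, 30)            # years since first publication (t)
a.gap <- c(1, 0, 2)              # cumulative career-gap years (ai)
t.adj <- t.yrs - a.gap           # gap-corrected opportunity t' = t - ai
fit.t <- lm(A.rel ~ log(t.adj))  # A.rel ~ gamma + theta * log_e(t)
eps <- resid(fit.t)              # epsilon: residual from the expected relationship
rank(-eps)                       # rank 1 = highest performance relative to opportunity
```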
Discipline standardization. Each discipline has its own citation characteristics and trends 16, so we expect the distribution of residuals (ε) within each discipline to be meaningful only for that discipline’s sample. We therefore endeavored to scale (‘normalize’) the results such that researchers in different disciplines could be compared objectively and fairly.
We first scaled the Arel within each discipline by dividing each researcher i’s Arel by the sample’s root mean square:
$$A^{\prime}_{\mathrm{rel}_i} = \frac{A_{\mathrm{rel}_i}}{\sqrt{\frac{\sum_{i=1}^{n} A_{\mathrm{rel}_i}^{2}}{n-1}}}$$
where n = the total number of researchers in each discipline’s sample (n = 60). We then regressed these discipline-scaled \(A^{\prime}_{\mathrm{rel}}\) against the loge number of years since first publication, pooling all disciplines together, and then ranked these scaled residuals (ε′) as described above.
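As a final illustrative sketch (again with hypothetical values and column names, not our actual data), the within-discipline scaling and pooled regression might be computed as:

```r
# Sketch: scale A.rel within each discipline by its root mean square, then
# regress the scaled values against log(opportunity) pooled across disciplines.
dat <- data.frame(
  discipline = rep(c("ecology", "geology"), each = 3),
  A.rel      = c(1.00, 0.41, 0.09, 0.80, 0.50, 0.20),  # hypothetical
  t.adj      = c(30, 18, 7, 25, 15, 8)                 # gap-corrected opportunity
)
rms <- function(x) sqrt(sum(x^2) / (length(x) - 1))    # as in the equation above
dat$A.rel.s <- with(dat, ave(A.rel, discipline, FUN = function(x) x / rms(x)))
fit.pool <- lm(A.rel.s ~ log(t.adj), data = dat)       # pooled across disciplines
dat$eps.s <- resid(fit.pool)                           # epsilon-prime
dat[order(-dat$eps.s), ]                               # cross-discipline ranking
```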