We examined the prevalence of doping in competitive and recreational sport, as estimated with IEM, through a systematic review combining meta-analysis and bibliometric analysis.
Study Characteristics
Against the rich literature on IEM spanning over half a century [54, 55], application of these methods to doping only started around the turn of the millennium [25, 26, 28, 29, 31], with the first full publication appearing in the English literature in 2006 [56]. Our findings indicate limited variability in study origin, with the preponderance of studies included in the meta-analysis conducted in European countries. A plausible explanation for our finding that the majority of the European studies originated from Germany, or were based on German samples, is the recent methodological focus on IEM in that country [14, 57]. Bibliometric analyses revealed that this trend has primarily been driven by the dominance of two closely-knit but separate research groups in Germany. Over time, however, the trends depicted in Figs. 2, 3 and 4 show the emergence of new research groups in the UK and the Netherlands. The WADA Doping Prevalence Project between 2017 and 2023, with its focus on survey development [13], also facilitated the observed recent expansion in the number of outputs, authors and diversity in IEM.
Study participants were mostly multi-sport and diverse in competition level, comprising international, regional, national, local, recreational, university and school levels. However, we found limited variability in the estimation models, with the majority of studies included in the meta-analysis applying the Unrelated Question and Forced Response models, which reflects the maturity of these models [54, 71]. Other IEM applied in the studies we reviewed, such as the Single Sample Count and the Crosswise Model, have a more recent history [14, 54]. Another potential reason for the extensive use of the Unrelated Question and Forced Response models is researcher preference: the research group around R. Ulrich applies the Unrelated Question model, whereas W. Pitsch and colleagues work with the Forced Response model. Work arising from the WADA Prevalence Project predominantly features the Extended Crosswise Model [40, 46], with limited application of the Single Sample Count [86] in earlier studies.
Our finding that most data were not collected at sport events may be attributed to the bureaucratic and practical exigencies of data collection at sports events [59]. Due to the focus on doping in sports and the sampling of elite athletes in many studies, it is reasonable that the WADA code was applied as a definition of doping for data collection in most studies. On the other hand, the sampling of non-competitive and recreational sportspersons in other studies may explain the application of non-WADA definitions of doping.
Doping Prevalence
Doping is typically detected through biological testing of urine and blood samples producing Adverse Analytical Findings (AAFs, also known as positive doping tests), or via longitudinal analysis of selected biomarkers (e.g., the Athlete Biological Passport). Due to the clearance rate of prohibited substances, AAFs can only indicate an incidence that is specific to a substance or group of substances and bound by a short time window [1, 3, 60], whereas the Athlete Biological Passport is highly sensitive to potential confounding factors [61] and predominantly applied in specific (endurance) sports. In contrast, past year/season and lifetime use of doping substances, particularly for large non-competitive samples, is more amenable to self-report methods such as surveys and interviews. It is therefore reasonable that most studies included in the meta-analysis assessed past year/season and lifetime doping prevalence.
The estimated lifetime and past year admitted doping prevalence rates and confidence intervals in our study suggest that roughly one in six competitive athletes and recreational sportspersons in our sample of included studies admitted to doping under IEM, with largely overlapping confidence intervals between the two groups. It is noteworthy that lifetime prevalence is naturally higher than past year/season prevalence owing to the former's wider coverage. Additionally, a plausible interpretation of the absence of a significant lifetime and past year prevalence difference between competitive athletes and recreational sportspersons is the value of IEM in protecting respondents, thereby facilitating honest responses [14]. Exploring the differences and advantages of combining both lifetime and past year questions into a single compound variable for prevalence estimation, Sayed et al. [46] proposed the use of a multinomial model to estimate the prevalence of past year users more efficiently than the binomial model with a single question, and to create the degree of freedom necessary to test for compliance with survey instructions.
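To make the mechanics of the two dominant models concrete, the sketch below back-computes a prevalence estimate from the observed proportion of 'yes' answers. It is a minimal illustration of the textbook estimators, not any included study's implementation; the design parameters (a 1/6 forced-'yes' probability, a 50%-prevalence unrelated question) and the sample counts are assumed for the example.

```python
def forced_response_estimate(n_yes, n, p_truth, p_forced_yes):
    """Forced Response model: each respondent answers the sensitive
    question truthfully with probability p_truth, and is otherwise
    directed (e.g., by a die roll) to say 'yes' with probability
    p_forced_yes.  Observed yes-rate: lam = p_truth*pi + p_forced_yes."""
    lam = n_yes / n
    return (lam - p_forced_yes) / p_truth

def uqm_estimate(n_yes, n, p_sensitive, pi_unrelated):
    """Unrelated Question model: with probability p_sensitive the
    respondent answers the sensitive question, otherwise an unrelated
    question with known prevalence pi_unrelated.
    Observed yes-rate: lam = p*pi + (1 - p)*pi_unrelated."""
    lam = n_yes / n
    return (lam - (1 - p_sensitive) * pi_unrelated) / p_sensitive

# 270 'yes' answers out of 1,000 under Forced Response with
# p_truth = 2/3 and p_forced_yes = 1/6 imply ~15.5% prevalence.
print(forced_response_estimate(270, 1000, 2/3, 1/6))
```

Because only the aggregate yes-rate is informative, no individual answer reveals the respondent's status, which is the protective property discussed above.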
Quality of Included Studies and Research Instruments
It appears that our quality assessment was affected more negatively by the general prevalence criteria [62] than by the novel IEM-specific criteria [14]. Specifically, at least half of the studies received a 'penalty point' for representation, sampling frame, random selection, and the validity and reliability of the instrument applied, which comprise four of the ten criteria. In contrast, three of the ten IEM-specific criteria were failed by 50% or more of the included studies. These were: statistical power (due to the relatively small sample sizes), noncompliance with survey instructions (a known threat to the validity of IEM-generated prevalence rates), and lack of precision, defined as a 95% CI larger than 25% of the prevalence estimate (which is a function of the IEM, the estimated prevalence rate and the sample size). Paradoxically, IEM with a higher level of protection and sufficient degrees of freedom to detect and correct noncompliance tend to have larger 95% CIs. Thus, applications involve a compromise between the validity of the data, the precision of the estimation, and the protection offered to participants.
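The precision criterion and the protection-precision trade-off can be illustrated with the standard binomial variance of the Forced Response design (a textbook formula applied to assumed numbers, not a reanalysis of the included studies): the sampling variance of the observed yes-rate is inflated by 1/p_truth², so stronger protection (a smaller probability of answering truthfully) directly widens the 95% CI.

```python
import math

def fr_ci_width(lam, n, p_truth):
    # 95% CI width of the Forced Response estimate: the binomial
    # variance of the observed yes-rate lam is inflated by 1/p_truth**2.
    var = lam * (1 - lam) / n / p_truth**2
    return 2 * 1.96 * math.sqrt(var)

def meets_precision_criterion(n_yes, n, p_truth, p_forced_yes):
    # Quality criterion used in the review: 95% CI no larger than
    # 25% of the prevalence estimate.
    lam = n_yes / n
    pi_hat = (lam - p_forced_yes) / p_truth
    return fr_ci_width(lam, n, p_truth) <= 0.25 * pi_hat

# The same 27% yes-rate fails the criterion at n = 1,000 but passes
# at n = 10,000 (parameters assumed for illustration).
print(meets_precision_criterion(270, 1000, 2/3, 1/6))
print(meets_precision_criterion(2700, 10000, 2/3, 1/6))
```

This makes the 'penalty point' for precision tangible: with typical IEM designs, sample sizes that would be ample for a direct question can still leave the CI wider than a quarter of the estimate.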
That a considerable segment of the included studies failed the general prevalence criteria [62] raises questions about the suitability of some of these criteria, developed in clinical settings where standardized assessment tools are common, for IEM. Representation, sampling frame and random selection of participants do not seem to be specifically affected by the IEM format. However, one surprising aspect of the quality assessment is the low score for validity and reliability for all studies but one [79], even for articles otherwise rated as high quality and low risk of bias. Again, this perplexing outcome raises the question of the applicability of the previously used criteria [14] to research instruments using IEM. A more detailed exploration of IEM validity and reliability is warranted.
Handling of Instruction Noncompliance
Among the studies included in this review, fewer than half addressed noncompliance with survey instructions. This is concerning because noncompliance presents the biggest threat to the validity and reliability of an IEM instrument, and therefore negatively impacts the quality of the data for prevalence estimation. The rate of assessed noncompliance in our study (28.8 ± 17.4%) is in line with the literature, where the average rate of noncompliance was estimated at 24.4% with a wide range of 3.7–67.5% [63].
The interpretation and handling of noncompliance portrayed a diverse picture. Some authors in our review interpreted noncompliance as cheating [48, 51, 85]. Alternatively, some studies [38, 39, 50, 81] reported the proportion of honest 'no' responses, thus leaving the combination of honest 'yes' responses (admitted doping) and survey noncompliers open to interpretation. Others [27, 44, 76] assumed that survey noncompliance is motivated by self-protective cheating, and thus reported the maximum value of noncompliance as the possible upper limit of the discriminating behaviour, which resonates with a similar interpretation by Ostapczuk et al. [64]. Prevalence estimations using the Single Sample Count (SSC) model [13, 36, 49, 74] reported the estimated noncompliance proportionate to admitted dopers and honest non-dopers.
Several plausible hypotheses can be devised about how dopers and non-dopers might respond to a survey, and whether motivated as well as nonmotivated noncompliance is equally present among dopers and non-dopers (e.g., Ulrich et al. [37]), but these assumptions, to date, lack empirical evidence. Cruyff et al. [40] addressed self-protective noncompliance based on empirical evidence from a series of studies and the literature [65], but noted the lack of a test for inattentive noncompliance in the Crosswise Models. Furthermore, Nepusz et al. [66] showed that the independent model, where noncompliance is assumed to be independent of possessing the guilty attribute, cannot be statistically outperformed by a dependent model that assumes that noncompliance and the guilty attribute are not independent. For example, the guilty subsample may have a higher degree of noncompliance because it combines nonmotivated careless responding with motivated self-protective lying, whilst the non-guilty group is only affected by the former. Unfortunately, the SSC model cannot help decide which assumption better describes actual noncompliance, because for every dependent model there is an equally fitting independent model.
The magnitude of noncompliance in this review, as well as in the broader IEM literature [63], highlights that the weakness of indirect estimation models, and of self-reports in general, is the unknown probability of dishonest and inattentive (random) responding, that is, the human element. Naturally, attention has turned to understanding, comprehension and trust (e.g., [67–69]) and to self-protective cheating (e.g., [70, 71]). However, as much as providing a safe survey environment addresses respondents' fear of exposure, it does not necessarily motivate full engagement with the survey. Previous studies have also shown that random responding is present in applications of IEM (e.g., [72, 87]). At the maximum prevalence of inattentive, random responding (i.e., all respondents answer randomly), the estimated prevalence rate approaches 50%, whereas its impact is negligible if the proportion of random responding is low [72].
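The pull toward 50% under random responding can be demonstrated with the Crosswise model estimator (a generic sketch with assumed parameters, not a reanalysis of the cited studies): mixing a fraction r of random answers into the observed rate drags the estimate from the true prevalence toward 0.5.

```python
def crosswise_estimate(lam, p):
    # Crosswise model: respondents report whether their answers to the
    # sensitive statement and an innocuous statement (known prevalence
    # p, p != 0.5) are the same. P(same) = pi*p + (1 - pi)*(1 - p).
    return (lam - (1 - p)) / (2 * p - 1)

def observed_same_rate(pi_true, p, r_random):
    # A fraction r_random of respondents picks 'same'/'different' at
    # random (rate 0.5); the rest follow the design.
    lam_true = pi_true * p + (1 - pi_true) * (1 - p)
    return (1 - r_random) * lam_true + 0.5 * r_random

# Assumed true prevalence 15%, innocuous-statement prevalence 25%:
# as r grows from 0 to 1, the estimate moves from 0.15 toward 0.50.
for r in (0.0, 0.5, 1.0):
    print(r, crosswise_estimate(observed_same_rate(0.15, 0.25, r), 0.25))
```

Because fully random answering yields an observed rate of exactly 0.5, the estimator returns exactly 0.5 regardless of the true prevalence, which is why undetected inattention is so damaging.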
Impact and Relevance of IEM in Estimating Doping Prevalence
Bibliometric analysis offered insight into the impact of published IEM-based doping prevalence studies by exploring publication and citation patterns. The choice of journal outlets for these studies appears to be influenced by two competing interests: the authors' interest in the methods (i.e., positioning the article as a methodological paper where doping prevalence is only an application generating empirical proof for the proposed method) and the authors' primary interest in the results (i.e., estimating doping prevalence). Juxtaposing the first, last and/or corresponding authors' subject fields onto the journals where the prevalence papers were published suggests that the outlet choice was driven more by the authors' 'familiarity' with the type of journal than by careful consideration of the audience (who should read about the prevalence of doping). Altogether, these results position scientific impact as a quality component of the sample of included studies.
Our finding that the local citation network is composed of a single component suggests that all papers in the sample are directly or indirectly connected to each other through citation relations, so that no unconnected part of the sample exists. These structural features describe a highly coherent research line, in which research on the topic develops on the basis of previous results. A certain set of 'core' papers is identifiable, such as Pitsch et al. [29], Striegel et al. [52], and Dietz et al. [53], that serves as the referential basis of more recent publications. On authorship, an important feature of the sample of included studies is the substantial internal variation in publication years within author communities, which suggests stable or regular collaborations with only a few new, unconnected entrants to the field. In sum, the sample is quite coherent with regard to authorship patterns as well.
Utility of Bibliometric Analysis in Systematically Reviewing the IEM Approach to Doping Prevalence
A secondary objective of this study was to introduce bibliometric analysis into the systematic review, as bibliometric analysis holds untapped potential to contribute significantly to the primary goals of systematic reviews and meta-analyses. One area where bibliometric analysis adds substantial value is in characterizing the quality, impact, and relevance of reviewed papers. This information enhances research synthesis by providing additional evidence for evaluating the scientific quality of knowledge and identifying publication biases. Bibliometric analysis also allows the research topic to be placed in context for a broader understanding within the research landscape. Conversely, systematic reviews and meta-analyses contribute to bibliometric analysis by providing a structured framework for synthesizing and interpreting findings from a diverse range of studies.
In this study, we employed various bibliometric methods to characterize the multidimensional aspects of the quality of the included studies. The scientific impact of individual papers was assessed through their field-normalized citation score (NCS), providing a comparable measure for each publication. The overall scientific impact of the sample surpasses the international average, and papers published prior to 2019 predominantly fall around or above the global average. These findings characterize scientific impact as a key quality element of the reviewed papers and present a fundamental method for evaluating their significance.
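As a minimal sketch of the NCS computation (the citation counts and baseline are invented for illustration; in practice the field baselines come from bibliometric databases): a paper's citations are divided by the mean citations of publications from the same field, publication year, and document type, so that a score above 1 marks above-world-average impact.

```python
def normalized_citation_score(citations, field_year_mean):
    # Field-normalized citation score: a paper's citations relative to
    # the mean citations of comparable publications (same field, year,
    # and document type). NCS > 1 means above the world average.
    return citations / field_year_mean

# Hypothetical paper: 30 citations where comparable papers average 20
# citations yields NCS = 1.5, above the world average of 1.0.
print(normalized_citation_score(30, 20))
```

Averaging these scores across the included studies yields the sample-level impact figure reported above.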
The papers included in the review delineate a remarkably cohesive research trajectory, showcasing a consistent awareness of prior findings within the topic. A specific group of 'core' papers can be recognized, functioning as the foundational reference for more contemporary publications. In addition, citation analysis was employed to achieve a more comprehensive understanding of how a specific paper fits into the broader landscape of the literature on the IEM approach to estimating doping prevalence. Content-based citation analysis represents the evolution of traditional citation analysis, going beyond mere citation frequencies to delve into the semantic aspects of reference information, including how a reference is cited and how knowledge concepts or domain entities are referenced. Consequently, analysing the content of the included articles, particularly through citation behavior, provided insights into knowledge development and the "semantics" of the information flow.
This analysis highlighted the functional relationships between individual research papers, specifically the transmission of methods, results, or other relevant aspects from previous work in the field. Within the sample, two models, the Forced Response (FR) and the Unrelated Question Model (UQM), dominate, with the most frequent types of connections transferring the FR and UQM models. These findings suggest methodological continuity or, at the very least, methodological awareness within this research community, which seems to hold, to date, against the emergence of new models such as the SSC or the Crosswise Model. The collaboration network offers a plausible explanation for this observation, indicating that the choice of IEM for a specific study might be based more on preference for, or familiarity with, a method than on its merits (see Supplementary Table 6).
In essence, the integration of bibliometric analysis into systematic reviews and meta-analyses creates a symbiotic relationship: bibliometrics offers in-depth insights into the scientific landscape and the quality of the reviewed studies, while systematic reviews and meta-analyses provide a holistic framework for interpreting and synthesizing this information. By integrating these two approaches, researchers can offer a more nuanced and comprehensive analysis of a research field, combining depth from systematic reviews with quantitative insights from bibliometrics. Hence, this combined approach enhances the robustness and comprehensiveness of research synthesis efforts, and identifies areas for future research better than either approach alone.
Strengths, Limitations, and Implications
To our knowledge, the present study is the first systematic review of the prevalence of doping in sport based on IEM. The multi-lingual (English, German, Dutch, French, Russian, and Spanish) literature search, the inclusion of a quality assessment, and the combination of meta-analysis, qualitative synthesis, and bibliometric analysis are further strengths of our study. Study limitations include the predominance of European, and particularly German, samples; the use of the average estimate for the Cheater Detection Models; and sample heterogeneity (recreational, competitive, bodybuilders, etc.) limiting generalizability to a specific population of sportspersons. It is noteworthy that, because some studies using the Cheater Detection Model addressed survey noncompliance by either combining the estimated admitted prevalence and noncompliance as 'potential use' of doping or reporting honest 'no' sayers, we used the midpoint of the lower and upper bound as the point estimate of doping use for the combined honest users and noncompliers in the meta-analysis. Other limitations include the low interrater reliability (Kappa), albeit resolved through discussion, and the susceptibility of past year and lifetime prevalence estimates to recall bias due to their retrospective nature. It is plausible that the latter is more applicable to recreational athletes than to their competitive counterparts, who are more likely to be cognizant of the severe consequences of using prohibited substances in competitive sport.
The quality assessment, showing that the majority of studies are of moderate quality, indicates some weaknesses in the included studies. The key factor affecting the quality of the included studies is the low reliability and validity of study instruments. This is partially explained by the lack of a well-validated self-report measure of doping, and underlines the importance of developing such a measure for empirical research. More importantly, future work is required to define validity and reliability for IEM, and to establish approaches for evidencing these important properties for IEM-generated data on doping prevalence. Reassessment of the prevalence studies for data quality, using such new criteria, is also warranted. The results of the quality assessment further indicate that future studies can be improved mainly by ensuring clear reporting of the parameters of prevalence estimates (e.g., CI and SE), ensuring high response rates, and conducting a priori power analyses to ensure adequate study power. Moreover, future research can be improved by studying representative samples, assessing noncompliance, using adequate sampling frames, and applying random sampling.
In short, our estimated lifetime and past year doping prevalence rates underline the need to intensify effort to address the issue of doping in sport.
Given the preponderance of included studies conducted in European countries, particularly Germany, more research is recommended in other regions and countries, particularly among samples not from Western, Educated, Industrialized, Rich, and Democratic (WEIRD) contexts. Given the limited variability in the estimation models, with the majority of studies applying the Unrelated Question, Forced Response or Single Sample Count models, more empirical applications of other IEM, such as the Extended Crosswise Model, Kuk's Design, and the Cheater Detection Model, are encouraged. Future IEM research on doping prevalence in sports is encouraged to navigate the bureaucratic and practical obstacles to collecting data at sports events. As noted previously, a more detailed exploration of IEM validity and reliability, as well as recommendations for future studies applying IEM to doping prevalence (e.g., IEM selection, data collection, analysis, and dissemination), is warranted. We endeavour to address these in a separate article, along with recommendations for conducting and reporting IEM-based doping prevalence studies.