We examined the prevalence of doping in competitive and recreational sport, as estimated with IEM, through a systematic review combining meta-analysis and bibliometric analysis.
Study Characteristics
Against the rich literature on IEM spanning over half a century [54, 55], application of these methods to doping only started around the turn of the millennium [25, 26, 28, 29, 31], with the first full publication appearing in the English literature in 2006 [56]. Our findings indicate limited variability in study origin, with the preponderance of studies included in the meta-analysis conducted in European countries. A plausible explanation for our finding that the majority of the European studies originated from Germany, or were based on German samples, is the recent methodological focus on IEM in that country [14, 57]. Bibliometric analyses revealed that this trend has primarily been driven by the dominance of two closely-knit but separate research groups in Germany. Over time, however, the trends depicted in Figs. 2, 3 and 4 show the emergence of new research groups in the UK and the Netherlands. The WADA Doping Prevalence Project between 2017 and 2023, with its focus on survey development [13], also facilitated the observed recent expansion in the number of outputs, authors and diversity in IEM.
Study participants were mostly multi-sport and diverse in competition level, comprising international, regional, national, local, recreational, university and school levels. However, we found limited variability in the estimation models, with the majority of studies included in the meta-analysis applying the Unrelated Question and Forced Response models, which reflects the maturity of these models [54, 71]. Other IEM applied in the studies we reviewed, such as the Single Sample Count and the Crosswise Model, have a more recent history [14, 54]. Another potential reason for the extensive use of the Unrelated Question and Forced Response models is researcher preference: the research group around R. Ulrich applies the Unrelated Question model, whereas W. Pitsch and colleagues work with the Forced Response model. Work arising from the WADA Prevalence Project predominantly features the Extended Crosswise Model [40, 46], with limited application of the Single Sample Count [86] in earlier studies.
Our finding that most data were not collected at sport events may be attributed to the bureaucratic and practical exigencies of data collection at sports events [59]. Due to the focus on doping in sports and the sampling of elite athletes in many studies, it is reasonable that the WADA code was applied as a definition of doping for data collection in most studies. On the other hand, the sampling of non-competitive and recreational sportspersons in other studies may explain the application of non-WADA definitions of doping.
Doping Prevalence
Doping is typically detected through biological testing of urine and blood samples producing Adverse Analytical Findings (AAFs, also known as positive doping tests), or via longitudinal analysis of selected biomarkers (e.g., the Athlete Biological Passport). Due to the clearance rate of prohibited substances, AAFs can only indicate an incidence that is specific to a substance or group of substances and bound by a short time window [1, 3, 60], whereas the Athlete Biological Passport is highly sensitive to potential confounding factors [61] and predominantly applied in specific (endurance) sports. In contrast, past year/season and lifetime use of doping substances, particularly for large non-competitive samples, is more amenable to self-report methods such as surveys and interviews. It is therefore reasonable that most studies included in the meta-analysis assessed past year/season and lifetime doping prevalence.
The estimated lifetime and past year admitted doping prevalence rates and confidence intervals in our study suggest that roughly one in six competitive athletes and recreational sportspersons in our sample of included studies admitted to doping under IEM, with largely overlapping confidence intervals between the two groups. It is noteworthy that lifetime prevalence is naturally higher than past year/season prevalence owing to the former's wider coverage. Additionally, a plausible interpretation of the absence of a significant lifetime and past year prevalence difference between competitive athletes and recreational sportspersons is the value of IEM in protecting respondents, thereby facilitating honest responses [14]. Exploring the differences and advantages of combining both lifetime and past year questions into a single compound variable for prevalence estimation, Sayed et al. [46] proposed the use of a multinomial model to estimate the prevalence of past year users more efficiently than the binomial model with a single question, and to create the degree of freedom necessary to test for compliance with survey instructions.
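To make the mechanics of the two dominant models concrete, the sketch below back-computes a prevalence estimate from the observed proportion of 'yes' answers. It is a minimal illustration of the textbook estimators, not any included study's implementation; the design parameters (a 1/6 forced-'yes' probability, a 50%-prevalence unrelated question) and the sample counts are assumed for the example.

```python
def forced_response_estimate(n_yes, n, p_truth, p_forced_yes):
    """Forced Response model: each respondent answers the sensitive
    question truthfully with probability p_truth, and is otherwise
    directed (e.g., by a die roll) to say 'yes' with probability
    p_forced_yes.  Observed yes-rate: lam = p_truth*pi + p_forced_yes."""
    lam = n_yes / n
    return (lam - p_forced_yes) / p_truth

def uqm_estimate(n_yes, n, p_sensitive, pi_unrelated):
    """Unrelated Question model: with probability p_sensitive the
    respondent answers the sensitive question, otherwise an unrelated
    question with known prevalence pi_unrelated.
    Observed yes-rate: lam = p*pi + (1 - p)*pi_unrelated."""
    lam = n_yes / n
    return (lam - (1 - p_sensitive) * pi_unrelated) / p_sensitive

# 270 'yes' answers out of 1,000 under Forced Response with
# p_truth = 2/3 and p_forced_yes = 1/6 imply ~15.5% prevalence.
print(forced_response_estimate(270, 1000, 2/3, 1/6))
```

Because only the aggregate yes-rate is informative, no individual answer reveals the respondent's status, which is the protective property discussed above.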
Quality of Included Studies and Research Instruments
It appears that our quality assessment was affected more negatively by the general prevalence criteria [62] than by the novel IEM-specific criteria [14]. Specifically, at least half of the studies received a 'penalty point' for representation, sampling frame, random selection, and the validity and reliability of the instrument applied, which comprise four of the ten criteria. In contrast, three of the ten IEM-specific criteria were failed by 50% or more of the included studies. These were: statistical power (due to the relatively small sample sizes), noncompliance with survey instructions (a known threat to the validity of IEM-generated prevalence rates), and lack of precision, defined as a 95% CI larger than 25% of the prevalence estimate (which is a function of the IEM, the estimated prevalence rate and the sample size). Paradoxically, IEM with a higher level of protection and sufficient degrees of freedom to detect and correct noncompliance tend to have larger 95% CIs. Thus, applications involve a compromise between the validity of the data, the precision of the estimation, and the protection offered to participants.
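The precision criterion and the protection-precision trade-off can be illustrated with the standard binomial variance of the Forced Response design (a textbook formula applied to assumed numbers, not a reanalysis of the included studies): the sampling variance of the observed yes-rate is inflated by 1/p_truth², so stronger protection (a smaller probability of answering truthfully) directly widens the 95% CI.

```python
import math

def fr_ci_width(lam, n, p_truth):
    # 95% CI width of the Forced Response estimate: the binomial
    # variance of the observed yes-rate lam is inflated by 1/p_truth**2.
    var = lam * (1 - lam) / n / p_truth**2
    return 2 * 1.96 * math.sqrt(var)

def meets_precision_criterion(n_yes, n, p_truth, p_forced_yes):
    # Quality criterion used in the review: 95% CI no larger than
    # 25% of the prevalence estimate.
    lam = n_yes / n
    pi_hat = (lam - p_forced_yes) / p_truth
    return fr_ci_width(lam, n, p_truth) <= 0.25 * pi_hat

# The same 27% yes-rate fails the criterion at n = 1,000 but passes
# at n = 10,000 (parameters assumed for illustration).
print(meets_precision_criterion(270, 1000, 2/3, 1/6))
print(meets_precision_criterion(2700, 10000, 2/3, 1/6))
```

This makes the 'penalty point' for precision tangible: with typical IEM designs, sample sizes that would be ample for a direct question can still leave the CI wider than a quarter of the estimate.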
That a considerable segment of the included studies failed the general prevalence criteria [62] raises questions about the suitability of some of these criteria, developed in clinical settings where standardized assessment tools are common, for IEM. Representation, sampling frame and random selection of participants do not seem to be specifically affected by the IEM format. However, one surprising aspect of the quality assessment is the low score for validity and reliability for all studies but one [79], even for articles otherwise rated as high quality and low risk of bias. Again, this perplexing outcome raises the question of the applicability of the previously used criteria [14] to research instruments using IEM. A more detailed exploration of IEM validity and reliability is warranted.
Handling of Instruction Noncompliance
Among the studies included in this review, fewer than half addressed noncompliance with survey instructions. This is concerning because noncompliance presents the biggest threat to the validity and reliability of an IEM instrument, and therefore negatively impacts the quality of the data for prevalence estimation. The rate of assessed noncompliance in our study (28.8 ± 17.4%) is in line with the literature, where the average rate of noncompliance was estimated at 24.4% with a wide range of 3.7–67.5% [63].
The interpretation and handling of noncompliance portrayed a diverse picture. Some authors in our review interpreted noncompliance as cheating [48, 51, 85]. Alternatively, some studies [38, 39, 50, 81] reported the proportion of honest 'no' responses, thus leaving the combination of honest 'yes' responses (admitted doping) and survey noncompliers open to interpretation. Others [27, 44, 76] assumed that survey noncompliance is motivated by self-protective cheating, and thus reported the maximum value of noncompliance as the possible upper limit of the discriminating behaviour, which resonates with a similar interpretation by Ostapczuk et al. [64]. Prevalence estimations using the Single Sample Count (SSC) model [13, 36, 49, 74] reported the estimated noncompliance proportionate to admitted dopers and honest non-dopers.
Several plausible hypotheses can be devised about how dopers and non-dopers might respond to a survey, and whether motivated as well as nonmotivated noncompliance is equally present among dopers and non-dopers (e.g., Ulrich et al. [37]), but these assumptions, to date, lack empirical evidence. Cruyff et al. [40] addressed self-protective noncompliance based on empirical evidence from a series of studies and the literature [65], but noted the lack of a test for inattentive noncompliance in the Crosswise Models. Furthermore, Nepusz et al. [66] showed that the independent model, where noncompliance is assumed to be independent of possessing the guilty attribute, cannot be statistically outperformed by a dependent model that assumes that noncompliance and the guilty attribute are not independent. For example, the guilty subsample may have a higher degree of noncompliance because it combines nonmotivated careless responding with motivated self-protective lying, whilst the non-guilty group is only affected by the former. Unfortunately, the SSC model cannot help decide which assumption better describes actual noncompliance, because for every dependent model there is an equally fitting independent model.
The magnitude of noncompliance in this review, as well as in the broader IEM literature [63], highlights that the weakness of indirect estimation models, and of self-reports in general, is the unknown probability of dishonest and inattentive (random) responding, that is, the human element. Naturally, attention has turned to understanding, comprehension and trust (e.g., [67–69]) and to self-protective cheating (e.g., [70, 71]). However, as much as providing a safe survey environment addresses respondents' fear of exposure, it does not necessarily motivate full engagement with the survey. Previous studies have also shown that random responding is present in applications of IEM (e.g., [72, 87]). At the maximum prevalence of inattentive, random responding (i.e., all respondents answer randomly), the estimated prevalence rate approaches 50%, whereas its impact is negligible if the proportion of random responding is low [72].
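The pull toward 50% under random responding can be demonstrated with the Crosswise model estimator (a generic sketch with assumed parameters, not a reanalysis of the cited studies): mixing a fraction r of random answers into the observed rate drags the estimate from the true prevalence toward 0.5.

```python
def crosswise_estimate(lam, p):
    # Crosswise model: respondents report whether their answers to the
    # sensitive statement and an innocuous statement (known prevalence
    # p, p != 0.5) are the same. P(same) = pi*p + (1 - pi)*(1 - p).
    return (lam - (1 - p)) / (2 * p - 1)

def observed_same_rate(pi_true, p, r_random):
    # A fraction r_random of respondents picks 'same'/'different' at
    # random (rate 0.5); the rest follow the design.
    lam_true = pi_true * p + (1 - pi_true) * (1 - p)
    return (1 - r_random) * lam_true + 0.5 * r_random

# Assumed true prevalence 15%, innocuous-statement prevalence 25%:
# as r grows from 0 to 1, the estimate moves from 0.15 toward 0.50.
for r in (0.0, 0.5, 1.0):
    print(r, crosswise_estimate(observed_same_rate(0.15, 0.25, r), 0.25))
```

Because fully random answering yields an observed rate of exactly 0.5, the estimator returns exactly 0.5 regardless of the true prevalence, which is why undetected inattention is so damaging.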
Impact and Relevance of IEM in Estimating Doping Prevalence
Bibliometric analysis offered insight into the impact of published IEM-based doping prevalence studies by exploring publication and citation patterns. The choice of journal outlets for these studies appears to be influenced by two competing interests: the authors' interest in the methods (i.e., positioning the article as a methodological paper where doping prevalence is only an application generating empirical proof for the proposed method) and the authors' primary interest in the results (i.e., estimating doping prevalence). Juxtaposing the first, last and/or corresponding authors' subject fields onto the journals where the prevalence papers were published suggests that the outlet choice was driven more by the authors' 'familiarity' with the type of journal than by careful consideration of the audience (who should read about the prevalence of doping). Altogether, these results position scientific impact as a quality component of the sample of included studies.
Our finding that the local citation network is composed of a single component suggests that all papers in the sample are directly or indirectly connected to each other through citation relations, so that no unconnected part of the sample exists. These structural features describe a highly coherent research line, in which research on the topic develops on the basis of previous results. A certain set of 'core' papers is identifiable, such as Pitsch et al. [29], Striegel et al. [52], and Dietz et al. [53], that serves as the referential basis of more recent publications. On authorship, an important feature of the sample of included studies is the substantial internal variation in publication years within author communities, which suggests stable or regular collaborations with only a few new, unconnected entrants to the field. In sum, the sample is quite coherent with regard to authorship patterns as well.
Utility of Bibliometric Analysis in Systematically Reviewing the IEM Approach to Doping Prevalence
A secondary objective of this study was to introduce bibliometric analysis into the systematic review, as bibliometric analysis holds untapped potential to contribute significantly to the primary goals of systematic reviews and meta-analyses. One area where bibliometric analysis adds substantial value is in characterizing the quality, impact, and relevance of reviewed papers. This information enhances research synthesis by providing additional evidence for evaluating the scientific quality of knowledge and identifying publication biases. Bibliometric analysis also allows the research topic to be placed in context for a broader understanding within the research landscape. Conversely, systematic reviews and meta-analyses contribute to bibliometric analysis by providing a structured framework for synthesizing and interpreting findings from a diverse range of studies.
In this study, we employed various bibliometric methods to characterize the multidimensional aspects of the quality of the included studies. The scientific impact of individual papers was assessed through their field-normalized citation score (NCS), providing a comparable measure for each publication. The overall scientific impact of the sample surpasses the international average, and papers published prior to 2019 predominantly fall around or above the global average. These findings characterize scientific impact as a key quality element of the reviewed papers and present a fundamental method for evaluating their significance.
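As a minimal sketch of the NCS computation (the citation counts and baseline are invented for illustration; in practice the field baselines come from bibliometric databases): a paper's citations are divided by the mean citations of publications from the same field, publication year, and document type, so that a score above 1 marks above-world-average impact.

```python
def normalized_citation_score(citations, field_year_mean):
    # Field-normalized citation score: a paper's citations relative to
    # the mean citations of comparable publications (same field, year,
    # and document type). NCS > 1 means above the world average.
    return citations / field_year_mean

# Hypothetical paper: 30 citations where comparable papers average 20
# citations yields NCS = 1.5, above the world average of 1.0.
print(normalized_citation_score(30, 20))
```

Averaging these scores across the included studies yields the sample-level impact figure reported above.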
The papers included in the review delineate a remarkably cohesive research trajectory, showcasing a consistent awareness of prior findings within the topic. A specific group of 'core' papers can be recognized, functioning as the foundational reference for more contemporary publications. In addition, citation analysis was employed to achieve a more comprehensive understanding of how a specific paper fits into the broader landscape of the literature on the IEM approach to estimating doping prevalence. Content-based citation analysis represents the evolution of traditional citation analysis, going beyond mere citation frequencies to delve into the semantic aspects of reference information, including how a reference is cited and how knowledge concepts or domain entities are referenced. Consequently, analysing the content of the included articles, particularly through citation behavior, provided insights into knowledge development and the "semantics" of the information flow.
This analysis highlighted the functional relationships between individual research papers, specifically the transmission of methods, results, or other relevant aspects from previous work in the field. Within the sample, two models, the Forced Response (FR) and the Unrelated Question Model (UQM), dominate, with the most frequent types of connections transferring the FR and UQM models. These findings suggest methodological continuity or, at the very least, methodological awareness within this research community, which seems to hold, to date, against the emergence of new models such as the SSC or the Crosswise Model. The collaboration network offers a plausible explanation for this observation, indicating that the choice of IEM for a specific study might be based more on preference for, or familiarity with, a method than on its merits (see Supplementary Table 6).
In essence, the integration of bibliometric analysis into systematic reviews and meta-analyses creates a symbiotic relationship: bibliometrics offers in-depth insights into the scientific landscape and the quality of the reviewed studies, while systematic reviews and meta-analyses provide a holistic framework for interpreting and synthesizing this information. By integrating these two approaches, researchers can offer a more nuanced and comprehensive analysis of a research field, combining depth from systematic reviews with quantitative insights from bibliometrics. Hence, this combined approach enhances the robustness and comprehensiveness of research synthesis efforts, and identifies areas for future research better than either approach alone.
Strengths, Limitations, and Implications
To our knowledge, the present study is the first systematic review of the prevalence of doping in sport based on IEM. The multi-lingual (English, German, Dutch, French, Russian, and Spanish) literature search, the inclusion of a quality assessment, and the combination of meta-analysis, qualitative synthesis, and bibliometric analysis are further strengths of our study. Study limitations include the predominance of European, and particularly German, samples; the use of the average estimate for the Cheater Detection Models; and sample heterogeneity (recreational, competitive, bodybuilders, etc.) limiting generalizability to a specific population of sportspersons. It is noteworthy that, because some studies using the Cheater Detection Model addressed survey noncompliance by either combining the estimated admitted prevalence and noncompliance as 'potential use' of doping or reporting honest 'no' sayers, we used the midpoint of the lower and upper bound as the point estimate of doping use for the combined honest users and noncompliers in the meta-analysis. Other limitations include the low interrater reliability (Kappa), albeit resolved through discussion, and the susceptibility of past year and lifetime prevalence estimates to recall bias due to their retrospective nature. It is plausible that the latter is more applicable to recreational athletes than to their competitive counterparts, who are more likely to be cognizant of the severe consequences of using prohibited substances in competitive sport.
The quality assessment, showing that the majority of studies are of moderate quality, indicates some weaknesses in the included studies. The key factor affecting the quality of the included studies is the low reliability and validity of study instruments. This is partially explained by the lack of a well-validated self-report measure of doping, and underlines the importance of developing such a measure for empirical research. More importantly, future work is required to define validity and reliability for IEM, and to establish approaches for evidencing these important properties for IEM-generated data on doping prevalence. Reassessment of the prevalence studies for data quality, using such new criteria, is also warranted. The results of the quality assessment further indicate that future studies can be improved mainly by ensuring clear reporting of the parameters of prevalence estimates (e.g., CI and SE), ensuring high response rates, and conducting a priori power analyses to ensure adequate study power. Moreover, future research can be improved by studying representative samples, assessing noncompliance, using adequate sampling frames, and applying random sampling.
In short, our estimated lifetime and past year doping prevalence rates underline the need to intensify effort to address the issue of doping in sport.
Given the preponderance of included studies conducted in European countries, particularly Germany, more research is recommended in other regions and countries, particularly among samples not from Western, Educated, Industrialized, Rich, and Democratic (WEIRD) contexts. Given the limited variability in the estimation models, with the majority of studies applying the Unrelated Question, Forced Response or Single Sample Count models, more empirical applications of other IEM, such as the Extended Crosswise Model, Kuk's Design, and the Cheater Detection Model, are encouraged. Future IEM research on doping prevalence in sports is encouraged to navigate the bureaucratic and practical obstacles to collecting data at sports events. As noted previously, a more detailed exploration of IEM validity and reliability, as well as recommendations for future studies applying IEM to doping prevalence (e.g., IEM selection, data collection, analysis, and dissemination), is warranted. We endeavour to address these in a separate article, along with recommendations for conducting and reporting IEM-based doping prevalence studies.