This descriptive review found that a majority of meta-analyses of drug effects published in high-impact medical journals employed a variety of ERSs, most commonly the Cochrane Risk of Bias Tool for interventional trials and the Newcastle-Ottawa scale for observational studies, and that systems rating the body of evidence were used less commonly.
Notably, the meta-analysis reporting use of AHRQ methods referenced the AHRQ Methods Guide for Effectiveness and Comparative Effectiveness Reviews; this document provides information about the process of evaluating risk of bias but does not propose its own ERS per se, and the meta-analysis did not explicitly state which ERS was actually used to assess risk of bias [14]. The variety of unique ERSs identified confirms that there is no well-accepted gold-standard system, consistent with statements to this effect by respected organizations commenting on this topic [14]. While most meta-analyses included an ERS to rate the quality of individual studies, only 19.1% incorporated an ERS evaluating the body of literature. In addition, inter-journal practices varied, as evidenced by the disparity in the proportion of meta-analyses using any ERS, which was as low as 53.7% in Lancet, compared to 100% in Annals of Internal Medicine and JAMA.
These findings indicate that adherence to the recommendations for reporting several aspects of meta-analytic designs may be suboptimal. For example, the PRISMA statement and the Cochrane Handbook recommend that authors specify assessments of risk of bias for each study, across studies, and at the outcome level. Similarly, the MOOSE guidelines suggest, less specifically, that risk of bias should be discussed [6–8]. Our findings, in which 88.4% of meta-analyses in high-impact medical journals used an ERS, together with the variation in use across journals, suggest these standards are not consistently met. Systematic reviewers should nonetheless retain the flexibility to choose the tool that best matches their study design and literature base.
Another notable finding was that more than 10 meta-analyses used modifications of an ERS. This practice may not be optimal, as authors of the GRADE system have argued that such modifications undermine the goal of promoting a single system with which all readers can become familiar [25]. Indeed, recent commentaries have questioned the rigor of using modified versions of ERSs in meta-analyses, specifically the Newcastle-Ottawa scale [26]. Readers of meta-analyses should therefore be alert to such modifications and consider their effects on estimates of study quality. Additionally, author-defined systems were used to evaluate interventional and observational studies in 12 and 2 meta-analyses, respectively. One of these rated study quality using methods from a previous publication that was not an ERS per se but instead evaluated how specific elements of study design biased estimates of an intervention's effect [27, 28]. In such instances, authors should be explicit about the methods used to assess risk of bias so that the process is clear to readers.
Our findings suggest potential improvements in standards for the publication of meta-analyses; it could be beneficial for journals to issue more explicit statements about the amount and type of detail expected in risk-of-bias assessments to improve reporting. For example, guidance from the journals included in this analysis refers authors to PRISMA, MOOSE, and other relevant guidelines for reporting systematic reviews [29–33]. In addition to the commonly used system developed by the GRADE Working Group, other systems have been developed by the AHRQ, the US Preventive Services Task Force, and the Oxford Centre for Evidence-Based Medicine [18–20]. These systems evaluate different domains and incorporate their own processes for translating assessments of a body of literature into clinical recommendations, such as those provided by clinical practice guidelines. However, an assessment of 40 such systems in the 2002 AHRQ report determined that they are less uniform than those used for assessing individual studies, which may complicate the selection of an appropriate system to rate a body of evidence [12]. Further instructions from journals may help address the limitations identified in this study. For example, while certain ERSs will undoubtedly be preferred in different scenarios, journals may consider establishing a preferred ERS for meta-analyses characteristic of the journal's scope, such as one preferred ERS for interventional studies and another for observational studies. This may allow more meaningful comparisons of estimates between meta-analyses published in the same journal when, for example, one meta-analysis evaluates the efficacy and another the safety of the same drug; such comparisons would be more complex if the meta-analyses used different systems to rate the quality of evidence. Journal-specific preference for particular ERSs may also cultivate greater awareness and familiarity among readers and facilitate the application of evidence in practice.
Our analysis has several limitations. Firstly, we considered only a narrow scope of journals: the top five in the Medicine, General & Internal category, which covers resources on medical specialties such as general medicine, internal medicine, clinical physiology, pain management, and military and hospital medicine. The Pharmacology and Toxicology category was not included; it covers resources on the discovery and testing of bioactive substances, including animal research, clinical experience, delivery systems, and the dispensing of drugs, as well as resources on the biochemistry, metabolism, and toxic or adverse effects of drugs. Our findings may therefore not be representative of non-medical journals, medical journals with lower impact factors, or journals from specialized practice areas; in addition, an Embase search should also have been performed. Secondly, the editorial and peer-review standards of the higher-impact journals in this analysis may have produced findings more reflective of "best practice" in meta-analysis production. Among these journals, a large number of meta-analyses originated in BMJ, and the requirements of this journal may disproportionately influence our overall findings. Thirdly, we evaluated only meta-analyses of drug effects, so our conclusions are not generalizable to meta-analyses of other interventions. Together, these limitations indicate that there may be greater variety in ERS utilization outside the journals and interventions considered in this review.
Future research conducting an overview of systematic reviews would be needed. The 40 fields of the protocol should be prospectively registered on PROSPERO, an international prospective register of systematic reviews developed and managed by the Centre for Reviews and Dissemination (CRD) at the University of York.
Our review represents, to the authors' knowledge, the first description of the frequency of use of ERSs in the medical literature. Further research should build on these findings to develop a general framework for best practices in this field.