Many types of biomedical research generate time-to-event data; this is particularly prominent in cancer research, which rightly attracts considerable focus given the substantial economic and health burden it inflicts on societies throughout the world [1, 2].
Meta-analysis is a tool for combining data from multiple studies in order to increase the precision of an estimate of treatment effect. It is commonly used to combine data from large clinical trials and is therefore regarded as among the highest levels of evidence in medicine [3, 4]. It also allows for the assessment of heterogeneity, that is, investigation of the sources of variation in the estimates reported by individual studies. Preclinical studies are a pillar of the 'bench to bedside' process and are vital in informing the development of novel treatments throughout medicine.
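For concreteness, the core pooling step is inverse-variance (fixed-effect) weighting, with Cochran's Q and I² quantifying heterogeneity. The Python sketch below illustrates this with hypothetical study values (not data from any study cited here); dedicated meta-analysis packages additionally fit random-effects models.

```python
import numpy as np

def pool_fixed_effect(effects, ses):
    """Inverse-variance (fixed-effect) pooling of study-level effect
    estimates (e.g. log hazard ratios), with Cochran's Q and I^2 as
    heterogeneity statistics. Illustrative sketch only."""
    effects = np.asarray(effects, dtype=float)
    w = 1.0 / np.asarray(ses, dtype=float) ** 2      # inverse-variance weights
    pooled = np.sum(w * effects) / np.sum(w)         # weighted mean effect
    se_pooled = np.sqrt(1.0 / np.sum(w))             # SE of the pooled effect
    q = np.sum(w * (effects - pooled) ** 2)          # Cochran's Q
    df = len(effects) - 1
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, se_pooled, q, i2

# Hypothetical log hazard ratios and standard errors from four studies
print(pool_fixed_effect([-0.45, -0.30, -0.60, -0.10], [0.20, 0.15, 0.25, 0.18]))
```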
However, translation of treatments from the preclinical to the clinical phase is known to be inefficient [5, 6]. Systematic review and meta-analysis have been adopted as tools to summarise thoroughly the literature pertaining to a particular area or treatment and to identify pitfalls that might explain this translational inefficiency: typically, these include limitations in external or internal validity, and publication bias [7–9]. Investigation and explanation of heterogeneity is the key step in meta-analysis of preclinical studies. This is frequently done using meta-regression, although meta-regression is likely limited by low sensitivity, so type II errors are an issue in the interpretation of results [10, 11]. Furthermore, collinearity is often present in preclinical survival datasets, so multivariate meta-regression has a theoretical advantage in identifying and accounting for associations between predictors; however, this is thought to come at the cost of a further compromise in sensitivity. Publication bias is an issue encountered almost universally in preclinical meta-analyses; while tools to estimate its effect have been developed, they may underestimate its influence [12, 13].
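To illustrate the univariate versus multivariate distinction, the hedged sketch below runs weighted least-squares meta-regression with inverse-variance weights via statsmodels. The covariate names and all values are hypothetical, and this fixed-effect approximation omits the between-study variance term that dedicated meta-regression software also estimates.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical study-level data: log effect sizes, their variances, and
# two deliberately correlated covariates (e.g. dose and treatment duration).
y        = np.array([-0.50, -0.30, -0.70, -0.20, -0.40, -0.55])
variance = np.array([0.040, 0.020, 0.060, 0.030, 0.050, 0.035])
dose     = np.array([1.0, 2.0, 3.0, 1.5, 2.5, 2.8])
duration = np.array([7.0, 14.0, 21.0, 10.0, 14.0, 20.0])

# Univariate meta-regression: effect size on dose, inverse-variance weighted.
uni = sm.WLS(y, sm.add_constant(dose), weights=1.0 / variance).fit()
print(uni.params, uni.pvalues)

# Multivariate meta-regression: both covariates enter together, so each
# coefficient is adjusted for the other, at a cost in power when the
# covariates are collinear.
X = sm.add_constant(np.column_stack([dose, duration]))
multi = sm.WLS(y, X, weights=1.0 / variance).fit()
print(multi.params, multi.pvalues)
```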
Analysis of time-to-event data typically focuses on the generation and comparison of hazard functions, most commonly via log-rank or Cox proportional hazards methods [14]. The hazard function is the most popular basis for assessing survival data because it is versatile, accommodates censored data (i.e. individuals for whom the event of interest does not happen during the study, or who drop out of the study early), and provides a single metric representing the risk of event occurrence throughout the observation period [15, 16]. Similarly, hazard ratios (HRs) are deemed the gold-standard summary statistic for use in meta-analysis of clinical trials [17]. However, calculating and pooling precise HRs for meta-analysis is only possible by obtaining individual patient data (IPD) or by using measures relating to the hazard function that are reported directly in each included study. The former is impractical because of the time and resources required to obtain and handle individual patient data, and the latter is unfeasible because relevant hazard data are presented in only a minority of clinical trial manuscripts [18].
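For completeness, the standard definitions underlying HR estimation can be written as follows (for a single binary treatment indicator x):

```latex
% Hazard: instantaneous event rate at time t, conditional on survival to t
h(t) = \lim_{\Delta t \to 0}
       \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}

% Cox model: baseline hazard h_0(t) scaled by covariate effects; for a
% single binary treatment indicator x, the hazard ratio is e^{\beta}
h(t \mid x) = h_0(t)\, e^{\beta x}, \qquad
\mathrm{HR} = \frac{h(t \mid x=1)}{h(t \mid x=0)} = e^{\beta}
```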
In preclinical studies, the information needed to use HRs is rarely included in manuscripts, and contacting study authors for individual animal data cannot be relied upon, as response rates to direct communication are inconsistent. Furthermore, preclinical meta-analyses typically consist of a large number of small studies, which would compound the issues with IPD meta-analysis discussed above. While methods have been reported to estimate hazard functions from Kaplan–Meier graphs [19], this process is cumbersome compared with the collection of other outcome data, such as those relating to volume or functional performance scales. Other methods of HR estimation have been reported but are generally laborious or require specific programming [20–22].
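To indicate what such indirect estimation involves, the sketch below derives a log HR from a reported two-sided log-rank p-value and event counts. It is a rough sketch in the spirit of published indirect-estimation methods, not a reproduction of any method cited above, and it assumes 1:1 allocation so that the log-rank variance is approximately (total events) / 4.

```python
import numpy as np
from scipy.stats import norm

def ln_hr_from_logrank(p_two_sided, events_treated, events_control,
                       favours_treatment=True):
    """Indirect log hazard ratio from a reported two-sided log-rank
    p-value and event counts. Assumes 1:1 allocation, so the log-rank
    variance V is approximately (total events) / 4. Rough sketch only."""
    v = (events_treated + events_control) / 4.0   # approximate variance
    z = norm.isf(p_two_sided / 2.0)               # |z| from the p-value
    ln_hr = (-z if favours_treatment else z) / np.sqrt(v)
    return ln_hr, 1.0 / np.sqrt(v)                # estimate and its SE

# e.g. p = 0.03 with 18 and 10 observed events, direction favouring treatment
print(ln_hr_from_logrank(0.03, 18, 10))
```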
There are other metrics that can be used to summarise survival data. The simplest, and most intuitive, is the median survival; this is frequently, if not universally, reported when describing survival datasets, for example in clinical trials. It is much simpler to generate accurately than the other methods and can easily be measured from a Kaplan–Meier chart for a study of any size. The significant limitations, however, are the lack of an inherent measure of spread and the concern that median survival may not accurately represent the entire observation period: for example, a difference in long-term survivorship may not be captured by this metric but would be detected with hazard-based analyses. Odds ratio or risk ratio-based summary statistics have the same problem [20].
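That simplicity is easy to demonstrate. In the sketch below (hypothetical data, using the lifelines package), the median survival is simply the earliest time at which the Kaplan–Meier estimate of survival falls to 0.5 or below, and no measure of spread accompanies it.

```python
from lifelines import KaplanMeierFitter

# Hypothetical survival times (days) and event indicators for one small
# animal experiment (1 = death observed, 0 = censored).
durations = [12, 15, 19, 22, 22, 27, 30, 34]
events    = [ 1,  1,  1,  1,  0,  1,  1,  0]

kmf = KaplanMeierFitter().fit(durations, event_observed=events)

# Median survival: earliest time at which the Kaplan-Meier estimate of
# S(t) drops to 0.5 or below; note the absence of any spread measure.
print(kmf.median_survival_time_)
```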
Michiels et al. [23] compared meta-analyses of clinical trial data using IPD-derived HR, an odds ratio-based approach (comparable to the Kaplan–Meier HR estimation tool described in [19]), and the median survival ratio (MSR) as the summary statistic. They found comparable global estimates of effect for each method, and on this basis we developed a technique for pooling survival data using the median survival ratio, with the number of animals per experiment used for weighting in place of inverse variance [24]. This approach appears to have given sensible results in preclinical meta-analyses [25–27], especially when compared with prior metrics [28]. Michiels et al. did, however, observe that a proportion of MSRs favoured the opposite arm (treatment versus control) to their counterpart HRs, and on this basis the authors advised against the use of MSR in clinical meta-analysis.
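A minimal sketch of this kind of pooling, log-transformed MSRs weighted by animal numbers rather than by inverse variance, is given below; the function and all values are hypothetical, and the published implementation in [24] is not reproduced here.

```python
import numpy as np

def pool_msr(median_treated, median_control, n_animals):
    """Pool median survival ratios across experiments on the log scale,
    weighting by the number of animals per experiment rather than by
    inverse variance. Sketch only; not the published implementation."""
    log_msr = np.log(np.asarray(median_treated) / np.asarray(median_control))
    w = np.asarray(n_animals, dtype=float)        # animal-number weights
    return float(np.exp(np.sum(w * log_msr) / np.sum(w)))

# Hypothetical experiments: treated and control median survivals (days),
# and total animals per experiment
print(pool_msr([30, 25, 40], [20, 22, 28], [10, 8, 16]))
```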
It could be argued that these findings are not relevant to animal data: the authors pooled data from a small number of medium-sized clinical trials (n in the hundreds), in which a single MSR figure may be more, or less, representative of the dataset as a whole than it is for small animal studies. The weightings between studies were relatively constant, and study sizes varied much less than in the animal experiments encountered in prior glioma systematic reviews and meta-analyses (SRMAs; n = 3–30 per group). Heterogeneity in preclinical meta-analyses is generally much higher than that seen in their meta-analysis. Crucially, the authors did not compare the performance of the summary statistics in the investigation of heterogeneity. The applicability of their conclusions to preclinical meta-analysis, where the focus falls on the investigation of heterogeneity rather than on the magnitude and precision of efficacy estimates, is therefore limited.
In this study, we therefore aimed to assess MSR as a summary statistic compared with the gold-standard, IPD-derived HR using a simulated dataset. Secondly, we aimed to compare the performance of each summary statistic at meta-analysis in terms of the detection of overall treatment effects, between-study heterogeneity, and effects of covariates at both univariate and multivariate meta-regression. Finally, we aimed to assess the impact of a publication bias effect on these simulations.
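To make the intended comparison concrete, the sketch below simulates one animal experiment with exponential survival times under a known true HR and then derives both the IPD HR (via a Cox model from the lifelines package) and the MSR. This is a minimal illustration under assumed parameters, not the simulation design used in this study.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)

def simulate_experiment(n_per_arm=10, true_hr=0.5, control_rate=0.05):
    """One simulated animal experiment with exponential survival times:
    the control hazard is control_rate and the treated hazard is
    control_rate * true_hr. No censoring or covariates are modelled."""
    times = np.concatenate([
        rng.exponential(1.0 / control_rate, n_per_arm),              # control
        rng.exponential(1.0 / (control_rate * true_hr), n_per_arm),  # treated
    ])
    return pd.DataFrame({
        "time": times,
        "event": 1,                                   # every event observed
        "treated": [0] * n_per_arm + [1] * n_per_arm,
    })

df = simulate_experiment()
cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
ipd_hr = float(np.exp(cph.params_["treated"]))
msr = df.loc[df.treated == 1, "time"].median() / df.loc[df.treated == 0, "time"].median()
print(f"IPD-derived HR: {ipd_hr:.2f}, MSR: {msr:.2f}")
```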