Study selection
The search identified 11,118 articles. After duplicates were removed, 10,773 articles were screened. A further ten articles were identified from the reference lists of eligible articles and two through a search of the citations of eligible articles. Review of titles and abstracts reduced the number of articles for full review to 169. Review of full-text articles resulted in a further 125 exclusions (additional file 3 lists the articles excluded at this point). The main reasons for exclusion after full-text review were: the article did not present an original method or the original application of a method for the analysis of AEs (33%); there was no comparison group or no comparison was made (23%); and the article was a published conference abstract and was therefore not peer-reviewed and/or lacked sufficient detail to undergo a full review (14%). This left 44 eligible articles for inclusion, which proposed 73 individual methods (figure 1).
Characteristics of articles
Articles were predominantly published by authors working in industry (n=20 (45%)), eight (18%) were published by academic authors and four (9%) were published by authors from the public sector. Eight (18%) articles were from an industry/academic collaboration, two (5%) an academic/public sector collaboration, one (2%) an industry/public sector collaboration and one (2%) from an industry/academic/public sector collaboration.
Taxonomy of statistical methods for AE analysis
Due to the number and variety of methods identified, we developed a taxonomy to classify methods. Four groups were identified (figure 2).
1. Visual summary methods
Methods that propose graphical approaches to view single or multiple AEs as the principal analysis method.
2. Hypothesis testing methods
Methods under the frequentist paradigm. These methods set up a testable hypothesis and use evidence against the null hypothesis in terms of p-values based on the data observed in the current trial.
3. Estimation methods
Methods that quantify distributional differences in AEs between treatment groups without a formal test.
4. Methods that provide decision making probabilities
Statistical methods under the Bayesian paradigm. The overarching characteristic of these methods is output of (posterior) predicted probabilities regarding the chance of a predefined threshold of risk being exceeded based on the data observed in the current trial and/or any relevant prior knowledge.
All methods were further sub-divided according to whether they were for use on prespecified or emerging events. Prespecified events can be listed in advance as harm outcomes of interest to follow up; they may already be known or suspected to be associated with the intervention, or may be followed for reasons of caution. Emerging (not prespecified) events are reported and collected during the trial and may be unexpected. We also distinguished between (group) sequential methods, which monitor accumulating data from ongoing studies, and methods for a final/one analysis (figure 3).
The number of articles and methods identified by type is provided in Table 1. Articles most frequently proposed estimation methods (15 articles proposing 24 methods), followed by hypothesis testing methods (11 articles proposing 16 methods). Ten articles proposed 13 methods to provide decision-making probabilities and eight articles proposed 20 visual summaries. The majority of articles developed methods for emerging events (35 articles proposing 61 methods) and for a final/one analysis (34 articles proposing 61 methods). Individual article classifications and brief summaries are presented in Table 2, and articles ranked according to ease of comprehension/implementation are provided in additional file 4.
Table 1: Summary level classifications
Taxonomy of methods (columns): cells show articles n (%), with methods n (%) in square brackets.

| Classification | Visual: Articles N=8 [Methods N=20] | Hypothesis testing: Articles N=11 [Methods N=16] | Estimation: Articles N=15 [Methods N=24] | Decision making probabilities: Articles N=10 [Methods N=13] |
| --- | --- | --- | --- | --- |
| | n (%) | n (%) | n (%) | n (%) |
| Type of event | | | | |
| Prespecified | 0 (0) [0 (0)] | 5 (55.6) [7 (58.3)] | 0 (0) [0 (0)] | 4 (44.4) [5 (41.7)] |
| Emerging | 8 (22.9) [20 (32.8)] | 6 (17.1) [9 (14.8)] | 15 (42.9) [24 (39.3)] | 6 (17.1) [8 (13.1)] |
| Time of analysis | | | | |
| (Group) sequential | 0 (0) [0 (0)] | 5 (50.0) [6 (50.0)] | 0 (0) [0 (0)] | 5 (50.0) [6 (50.0)] |
| Final/one-analysis | 8 (23.5) [20 (32.8)] | 6 (17.6) [10 (16.4)] | 15 (44.1) [24 (39.3)] | 5 (14.7) [7 (11.5)] |
Summaries of methods by taxonomy
Visual summaries – Emerging events
The review identified eight articles published between 2001 and 2018 that proposed twenty methods to visually summarise harm data, including binary AEs and continuous laboratory (e.g. blood tests, culture data) and vital signs (e.g. temperature, blood pressure, electrocardiograms) data (additional file 5, table 3).[14, 25-31] The majority of the proposed plots were designed to display summary measures of harm data (n=14) and the remaining plots displayed individual participant data (n=6). None of the plots required the event to be prespecified. Eight of the plots were designed to display multiple binary AEs; an example of one such plot is the volcano plot (figure 4).[25, 32] The remaining plots each focus on a single event: three were time-to-event plots and nine were plots to analyse emerging, individual, continuous harm outcomes such as laboratory or vital signs data. These plots can aid the identification of treatment effects and of outlying observations for further evaluation.
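As an illustration of how a plot for multiple binary AEs can be built, the sketch below computes plotting coordinates for a volcano-style display. It assumes, as is common for such plots but not stated in this section, that each AE is placed at its risk difference on the horizontal axis and at -log10 of a two-sided Fisher exact p-value on the vertical axis, with the total number of events available as a marker size. The function names and the example counts are hypothetical.

```python
from math import comb, log10


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1))
            if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def volcano_point(events_trt, n_trt, events_ctl, n_ctl):
    """Return (x, y, size) for one AE: risk difference, -log10(p), total events."""
    risk_difference = events_trt / n_trt - events_ctl / n_ctl
    p = fisher_exact_two_sided(events_trt, n_trt - events_trt,
                               events_ctl, n_ctl - events_ctl)
    return risk_difference, -log10(p), events_trt + events_ctl


# Hypothetical counts: (events on treatment, N treatment, events on control, N control).
example_aes = {"Nausea": (15, 100, 4, 100), "Headache": (12, 100, 10, 100)}
points = {ae: volcano_point(*counts) for ae, counts in example_aes.items()}
```

Events that sit far from zero horizontally and high vertically are those with both a large difference between groups and strong statistical evidence, which is what such a display is intended to bring out.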
Hypothesis tests – Prespecified outcomes
Five articles published between 2000 and 2012 presented seven methods to analyse prespecified harm outcomes under a hypothesis-testing framework (additional file 5, table 4).[33-37] Six of these methods were specifically designed and promoted for sequentially monitoring prespecified harm outcomes. Two of the methods incorporated an alpha-spending function (as originally proposed for efficacy outcomes)[22], two performed likelihood ratio tests, one used conditional power to monitor the futility of establishing safety and one proposed an arbitrary reduction in the traditional significance threshold when monitoring a harm outcome.[33-35, 37] In addition, one method proposed a non-inferiority approach for the final analysis of a prespecified harm outcome.[36]
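To make the idea of an alpha-spending function concrete, the following sketch evaluates two widely used spending functions at a set of interim looks. The section does not state which spending functions the reviewed methods used, so the O'Brien-Fleming-type and Pocock-type forms here are assumptions chosen for illustration, as are the function names and the look schedule.

```python
from math import e, log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def obf_type_spend(t, alpha=0.05):
    """Cumulative two-sided alpha spent at information fraction t (O'Brien-Fleming type)."""
    if not 0 < t <= 1:
        raise ValueError("information fraction must be in (0, 1]")
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 2 * (1 - _STD_NORMAL.cdf(z / sqrt(t)))


def pocock_type_spend(t, alpha=0.05):
    """Cumulative alpha spent at information fraction t (Pocock type)."""
    if not 0 < t <= 1:
        raise ValueError("information fraction must be in (0, 1]")
    return alpha * log(1 + (e - 1) * t)


# Hypothetical schedule: four equally spaced looks at the accumulating data.
looks = [0.25, 0.50, 0.75, 1.00]
cumulative = [obf_type_spend(t) for t in looks]
# Alpha available at each look is the increment in cumulative alpha spent.
incremental = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]
```

Both functions spend the full alpha by the final look; the O'Brien-Fleming-type form spends very little early on, whereas the Pocock-type form spends it more evenly, which may matter when early detection of harm is the priority.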
Hypothesis tests – Emerging
Six articles published between 1990 and 2014 suggested nine methods to perform hypothesis tests to analyse emerging AE data (additional file 5, table 5).[38-43] All of the methods were designed for a final analysis; one also incorporated an alpha-spending function, allowing it to be used to monitor ongoing studies. Methods were suggested for both binary and time-to-event data, with several accounting for recurrent events.
Two methods proposed a p-value adjustment for multiple hypothesis tests to control the false discovery rate (FDR).[42, 43] One article proposed two likelihood ratio statistics to test for differences between treatment groups when incorporating time-to-event and recurrent event data.[41] Three articles adopted multivariate approaches to undertake global likelihood ratio tests to detect differences in the overall AE profile, where the overall profile describes multiple events that are combined for evaluation.[38-40]
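For readers unfamiliar with FDR adjustment, the sketch below implements the standard Benjamini-Hochberg step-up adjustment. This is offered only as a generic illustration of an FDR-based p-value adjustment; it is not claimed to be the specific procedure proposed in the reviewed articles, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical unadjusted p-values, one per AE type.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
flagged = [i for i, p in enumerate(adj) if p <= 0.05]
```

An AE would be flagged when its adjusted p-value falls at or below the chosen FDR level (0.05 in this example).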
Estimation – Emerging
Fifteen articles published between 1991 and 2016 proposed 24 methods for emerging events (additional file 5, table 6).[15, 44-57] The estimates reflect different characteristics of harm outcomes, such as point estimates for incidence or duration, measures of precision around such estimates, or estimates of the probability of occurrence of events. These methods rely on subjective comparisons of distributional differences to identify treatment effects.
Point estimates such as the risk difference, risk ratio and odds ratio to compare treatment groups, with corresponding confidence intervals (CIs) such as the binomial exact CI (also known as the Clopper-Pearson CI), are simple approaches for AE analysis.[4, 46] Three articles proposed alternative means to estimate CIs.[45, 51, 53]
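The simple estimates named above can be sketched as follows. The comparative measures are computed from a single 2x2 table, and the Clopper-Pearson interval is obtained for each arm's event proportion by numerically inverting the binomial tail probabilities. The bisection approach, the function names and the example counts are illustrative assumptions; statistical packages typically obtain the same limits from beta quantiles.

```python
from math import comb


def comparative_estimates(events_trt, n_trt, events_ctl, n_ctl):
    """Risk difference, risk ratio and odds ratio (treatment versus control)."""
    risk_trt, risk_ctl = events_trt / n_trt, events_ctl / n_ctl
    # Ratios are undefined when the control arm has no events (or no non-events
    # on treatment); this sketch does not handle those cases.
    return {
        "risk_difference": risk_trt - risk_ctl,
        "risk_ratio": risk_trt / risk_ctl,
        "odds_ratio": (events_trt * (n_ctl - events_ctl))
                      / (events_ctl * (n_trt - events_trt)),
    }


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) CI for a binomial proportion x / n."""
    def solve(f):
        # Bisection for the root of f on [0, 1], where f is increasing in p.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: P(X >= x | p) = alpha / 2; upper limit: P(X <= x | p) = alpha / 2.
    lower = 0.0 if x == 0 else solve(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    upper = 1.0 if x == n else solve(lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


# Hypothetical example: 15/100 events on treatment versus 5/100 on control.
estimates = comparative_estimates(15, 100, 5, 100)
ci_trt = clopper_pearson(15, 100)
```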
Nine articles provided methods to calculate estimates that take into account AE characteristics, such as recurrent events, exposure time, time-to-event information and duration, which can help develop a profile of overall AE burden.[15, 44, 47-49, 52, 54, 56, 57] Examples include the mean cumulative function, the mean cumulative duration and parametric survival models estimating hazard ratios. Several of these methods incorporated plots that can highlight when differences between treatment groups start to emerge, which would otherwise be masked by single point estimates.
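As an example of an estimate that accommodates recurrent events, the sketch below computes a nonparametric mean cumulative function (MCF): at each distinct event time, the number of events is divided by the number of participants still under observation, and these increments are accumulated. The input layout (a follow-up time and a list of event times per participant), the function name and the data are hypothetical, and the sketch assumes every event occurs within that participant's follow-up.

```python
def mean_cumulative_function(participants):
    """Return [(time, MCF)] from a list of (follow_up_time, [event_times])."""
    event_times = sorted({t for _, events in participants for t in events})
    mcf, curve = 0.0, []
    for t in event_times:
        # Participants still under observation at time t.
        at_risk = sum(1 for follow_up, _ in participants if follow_up >= t)
        # All events (possibly several per participant) occurring at time t.
        n_events = sum(events.count(t) for _, events in participants)
        mcf += n_events / at_risk
        curve.append((t, mcf))
    return curve


# Hypothetical treatment group: three participants with differing follow-up.
treatment_group = [(10, [2, 8]), (6, [5]), (4, [])]
curve = mean_cumulative_function(treatment_group)
```

Computing one curve per treatment group and plotting them together shows when the expected number of events per participant begins to diverge between groups.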
A Bayesian approach was developed to estimate the probability of experiencing different severity grades of each AE, accounting for the AE structure of events within body systems.[50] One article developed a score to indicate if continuous outcomes such as laboratory values were within normal reference ranges and to flag abnormalities.[55]
Decision making probabilities – Prespecified outcomes
Four articles suggested five Bayesian approaches to monitor prespecified harm outcomes (additional file 5, table 7).[58-61] The first paper was published in 1989, but no further research was published in this area until 2012; the last paper was published in 2016. Each of the methods incorporates prior knowledge through a Bayesian framework, outputting posterior probabilities that can be used to guide the decision whether to continue with the study based on the harm outcome.
Each of the methods was designed for use in interim analyses to monitor ongoing studies but could be used for the final analysis without modification. They could be implemented for continuous monitoring (i.e. after each observed event) or in a group sequential manner after several events have occurred. These methods require a prespecified event, an assumption about the prior distribution of this event, a ‘tolerable risk difference’ and an ‘upper threshold probability’ to be set at the outset of the trial.[60] At each analysis, the posterior probability that the ‘tolerable risk difference’ is exceeded is calculated; if this probability exceeds the predetermined ‘upper threshold probability’, the data indicate a predefined unacceptable harmful effect.
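A minimal sketch of this kind of monitoring rule is given below. It assumes independent Beta priors on the event risk in each arm and approximates the posterior probability by Monte Carlo sampling; the reviewed methods may use different priors or computations. The function names, the default uniform prior, the number of draws and the example counts are all hypothetical.

```python
import random


def prob_risk_difference_exceeds(events_trt, n_trt, events_ctl, n_ctl,
                                 tolerable_rd, prior=(1.0, 1.0),
                                 draws=20000, seed=0):
    """Posterior P(risk_trt - risk_ctl > tolerable_rd) under Beta-binomial models."""
    rng = random.Random(seed)
    a, b = prior
    exceed = 0
    for _ in range(draws):
        # Conjugate update: Beta(a + events, b + non-events) for each arm.
        p_trt = rng.betavariate(a + events_trt, b + n_trt - events_trt)
        p_ctl = rng.betavariate(a + events_ctl, b + n_ctl - events_ctl)
        if p_trt - p_ctl > tolerable_rd:
            exceed += 1
    return exceed / draws


def unacceptable_harm(posterior_prob, probability_threshold):
    """Decision rule: flag harm when the posterior probability exceeds the threshold."""
    return posterior_prob > probability_threshold


# Hypothetical interim look: 20/50 events on treatment versus 5/50 on control,
# a tolerable risk difference of 0.05 and a probability threshold of 0.90.
posterior = prob_risk_difference_exceeds(20, 50, 5, 50, tolerable_rd=0.05)
stop_signal = unacceptable_harm(posterior, probability_threshold=0.90)
```

The same calculation can be repeated after each event or after each group of events, with the prior parameters chosen to reflect any relevant knowledge available before the trial.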
Decision making probabilities – Emerging outcomes
Six articles published between 2004 and 2013 proposed eight Bayesian methods to analyse the body of emerging AE data (additional file 5, table 8).[62-67] Each of the methods utilises a Bayesian framework to borrow strength from medically similar events. Berry and Berry were the first, proposing a Bayesian three-level random effects model.[62] The model treats AEs within the same body system as more alike, and information can be borrowed both within and across systems. For example, within a body system, a large difference for one event amongst events with much smaller differences will be shrunk toward zero. This work was extended to incorporate person-time adjusted incidence rates using a Poisson model and to allow sequential monitoring.[63, 67] Two alternative approaches were also developed following similar principles. The output from all these models is the posterior probability that the relative measure does not equal zero or that the AE rate is greater on treatment than on control.