ACE item 3b (new): Type of adaptive design used, with details of the pre-planned adaptations and the statistical information informing the adaptations
Explanation – A description of the type of AD indicates the underlying design concepts and the applicable adaptive statistical methods. Although nomenclature for classifying ADs is used inconsistently and the related methodology continues to grow [13], some currently used types of ADs are presented in Box 3. A clear description will also improve the indexing of AD methods and make them easier to identify during literature reviews.
Specification of pre-planned opportunities for adaptations and their scope is essential to preserve the integrity of AD randomised trials [22] and for regulatory assessments, regardless of whether the adaptations were triggered during the trial [14,113,114]. Details of pre-planned adaptations enable readers to assess the appropriateness of the statistical methods used to evaluate operating characteristics of the AD (item 7a) and to perform statistical inference (item 12b). Unfortunately, pre-planned adaptations are often insufficiently described [124]. Authors are encouraged to explain the scientific rationale for the chosen pre-planned adaptations, as encapsulated under the CONSORT 2010 item “scientific background and explanation of rationale” (item 2a). This rationale should focus on the goals of the considered adaptations in line with the study objectives and hypotheses (item 2b) [112,113,124,150].
Details of pre-planned adaptations with rationale should be documented in accessible study documents for readers to be able to evaluate what was planned and unplanned (e.g., protocol, interim and final SAP or dedicated trial document). Of note, any pre-planned adaptation that modifies eligibility criteria (e.g., in population enrichment ADs [96,151]) should be clearly described.
Adaptive trials use accrued statistical information to make pre-planned adaptation(s) (item 14c) at interim analyses, guided by pre-planned decision-making criteria and rules (item 7b). Reporting this statistical information and how it is gathered is paramount. Analytical derivations of the statistical information guiding pre-planned adaptations, using statistical models or formulae, should be described to facilitate reproducibility and interpretation of results. The use of supplementary material or references to published literature is sufficient. For example, sample size re-assessment (SSR) can be performed using different methods, with or without knowledge or use of treatment arm allocation [42,45,49,152]. Around 43% (15/35) of regulatory submissions needed further clarification because of failure to describe how an SSR would be performed [124]. Early stopping of a trial or treatment group for futility can be evaluated based on statistical information, supporting a lack of evidence of benefit, that is derived and expressed in several ways: for example, conditional power [57,153–156], predictive power [56,154,157–159], a threshold for the treatment effect, the posterior probability of the treatment effect [105], or some form of clinical utility that quantifies the balance of benefits against harms [160,161] or between patient and society perspectives on health outcomes [105].
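Among the quantities listed above, conditional power is one of the most frequently reported. As a minimal illustrative sketch (not taken from any cited trial), the conditional power of a single normally distributed test statistic under the "current trend" assumption can be computed as follows; the critical value, interim z-statistic, and futility threshold are all hypothetical:

```python
from math import sqrt
from statistics import NormalDist

def conditional_power(z_interim: float, info_frac: float, z_crit: float = 1.96) -> float:
    """Conditional power under the 'current trend' assumption: the probability
    of crossing the final critical value z_crit, given the z-statistic observed
    at information fraction info_frac (Brownian-motion model of the test statistic)."""
    num = z_interim - z_crit * sqrt(info_frac)
    den = sqrt(info_frac * (1 - info_frac))
    return NormalDist().cdf(num / den)

# Hypothetical interim look at 50% information with a weak signal (z = 0.5):
cp = conditional_power(z_interim=0.5, info_frac=0.5)  # low value, suggesting futility
```

A trial team might, for instance, pre-specify stopping for futility when this quantity falls below some threshold such as 0.10 or 0.20; the threshold itself is a design choice that should be reported.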
Box 9. Exemplars on reporting item 3b elements
|
Example 1. Pre-planned adaptations and rationale; inferentially seamless phase 2/3 AD
|
“The adaptive (inferentially) seamless phase II/III design is a novel approach to drug development that combines phases II and III in a single, two-stage study. The design is adaptive in that the wider choice of doses included in stage 1 is narrowed down to the dose(s) of interest to be evaluated in stage 2. The trial is a seamless experience for both investigators and patients in that there is no interruption of ongoing study treatment between the two phases. Combining the dose-finding and confirmatory phases of development into a single, uninterrupted study has the advantages of speed, efficiency and flexibility [15,17] … The primary aim of stage 1 of the study was to determine the risk-benefit of four doses of indacaterol (based on efficacy and safety results in a pre-planned interim analysis) in order to select two doses to carry forward into the second stage of the study” [145]
|
Example 2. Analytical derivation of statistical information to guide adaptations; population enrichment AD with SSR
|
Mehta et al. [104] detail formulae used to calculate the conditional power to guide modification of the sample size or to enrich the patient population at an interim analysis for both cutaneous and non-cutaneous patients (full population) and only cutaneous patients (subpopulation) in the supplementary material. In addition, the authors detail formulae used to derive associated conditional powers and p-values used for decision-making to claim evidence of benefit both at the interim and final analysis (linked to item 12b).
|
Example 3. Pre-planned adaptations; 5-arm 2-stage AD allowing for regimen selection, early stopping for futility and SSR
|
“This randomized, placebo-controlled, double-blind, phase 2/3 trial had a two-stage adaptive design, with selection of the propranolol regimen (dose and duration) at the end of stage 1 (interim analysis) and further evaluation of the selected regimen in stage 2 [69,71]. Pre-specified possible adaptations to be made after the interim analysis, as outlined in the protocol and statistical analysis plan (accessible via journal website), were selection of one or two regimens, sample-size reassessment, and non‑binding stopping for futility.” [103]
|
Example 4. Type of AD; pre-planned adaptations and rationale; Bayesian adaptive-enrichment AD allowing for enrichment and early stopping for futility or efficacy
|
“The DAWN trial was a multicenter, prospective, randomized, open-label trial with a Bayesian adaptive–enrichment design and with blinded assessment of endpoints [12]. The adaptive trial design allowed for a sample size ranging from 150 to 500 patients. During interim analyses, the decision to stop or continue enrolment was based on a pre-specified calculation of the probability that thrombectomy plus standard care would be superior to standard care alone with respect to the first primary endpoint (described in the paper). The enrichment trial design gave us the flexibility to identify whether the benefit of the trial intervention was restricted to a subgroup of patients with relatively small infarct volumes at baseline. The interim analyses, which included patients with available follow-up data at the time of the analysis, were pre-specified to test for the futility, enrichment, and success of the trial.” [105] See supplementary appendix via journal website (from page 39) for details.
|
Example 5. Rationale; type of AD and pre-planned adaptations; information to inform adaptations; information-based GSD
|
“Because little was known about the variability of LVMI changes in CKD during the planning stage, we prospectively implemented an information-based (group sequential) adaptive design that allowed sample size re-estimation when 50% of the data were collected [51,162]” [163]. Pritchett et al. [51] provide details of the pre-planned adaptations and statistical information used to inform SSR and efficacy early stopping.
|
Example 6. Pre-planned adaptation and information for SSR
|
“To reassess the sample size estimate, the protocol specified that a treatment-blinded interim assessment of the standard deviation (SD) about the primary endpoint (change from baseline in total exercise treadmill test duration at trough) would be performed when 231 or one half of the planned completed study patients had been randomized and followed up for 12 weeks. The recalculation of sample size, using only blinded data, was adjusted based on the estimated SD of the primary efficacy parameter (exercise duration at trough) from the aggregate data … [164–166].” [39]
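The blinded SSR described in this example rests on the standard sample size formula for comparing two means, re-evaluated with the interim SD estimate. A sketch with hypothetical numbers (the trial's actual planning values are not reproduced here):

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(sd: float, delta: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Per-group sample size for a two-sample comparison of means
    (normal approximation): n = 2 * (sd * (z_a + z_b) / delta)^2."""
    z = NormalDist().inv_cdf
    z_a = z(1 - alpha / 2)   # two-sided significance level
    z_b = z(power)
    return ceil(2 * (sd * (z_a + z_b) / delta) ** 2)

# Hypothetical planning values: SD 55 s, target difference 20 s in exercise duration.
n_planned = n_per_group(sd=55, delta=20)
# A blinded interim look suggests larger variability (say SD 70 s) -> recompute.
n_revised = n_per_group(sd=70, delta=20)
```

Because only the aggregate (blinded) SD enters the calculation, the recalculation does not reveal interim treatment effects.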
|
CONSORT 2010 item 3b: Important changes to the design or methods after trial commencement (such as eligibility criteria), with reasons
ACE item 3c (modification, renumbered): Important changes to the design or methods after trial commencement (such as eligibility criteria) outside the scope of the pre-planned adaptive design features, with reasons
Explanation – Unplanned changes to certain aspects of design or methods in response to unexpected circumstances that occur during the trial are common and will need to be reported in AD randomised trials, as in fixed-design trials. These may include deviations from pre-planned adaptations and decision rules [15,63], as well as changes to the timing and frequency of interim analyses. Traditionally, unplanned changes with explanation have been documented as protocol amendments and reported as discussed in the CONSORT 2010 statement [3,4]. Unplanned changes, depending on what they are and why they were made, may introduce bias and compromise trial credibility. Some unplanned changes may render the planned adaptive statistical methods invalid or may complicate interpretation of results [22]. It is therefore essential for authors to detail important changes that occurred outside the scope of the pre-planned adaptations and to explain why deviations from the planned adaptations were necessary. Furthermore, it should be clarified whether unplanned changes were made following access to key trial information (e.g., interim data seen by treatment group or interim results). Such information will help readers assess potential sources of bias and implications for interpretation of results. For ADs, it is essential to distinguish unplanned changes from pre-planned adaptations (item 3b) [167].
Box 10. Exemplars on reporting item 3c elements
|
Example. Inferentially seamless phase 2/3 (5-arm 2-stage) AD allowing for regimen selection, SSR and futility early stopping
|
Although this should ideally have been referenced in the main report, Léauté-Labrèze et al. [103] (on pages 17-18 of supplementary material) summarise important changes to the trial design including an explanation and discussion of implications. These changes include a reduction in the number of patients assigned to the placebo across stages – randomisation was changed from 1:1:1:1:1 to 2:2:2:2:1 (each of the 4 propranolol regimens: placebo) for stage 1 and from 1:1 to 2:1 for stage 2 in favour of the selected regimen; revised complete or nearly complete resolution success rates for certain treatment regimens – as a result, total sample size was revised to 450 (excluding possible SSR); and a slight increase in the number of patients (from 175 to 180) to be recruited for the interim analysis.
|
Section 6. Outcomes
CONSORT 2010 item 6a: Completely define pre-specified primary and secondary outcome measures, including how and when they were assessed
ACE item 6a (modification): Completely define pre-specified primary and secondary outcome measures, including how and when they were assessed. Any other outcome measures used to inform pre-planned adaptations should be described with the rationale
Comment – Authors should also refer to the CONSORT 2010 statement [3,4] for the original text when applying this item.
Explanation – It is paramount to provide a detailed description of pre-specified outcomes used to assess clinical objectives including how and when they were assessed. For operational feasibility, ADs often use outcomes that can be observed quickly and easily to inform pre-planned adaptations (adaptation outcomes). Thus, in some situations, adaptations may be based on early observed outcome(s) [168] that are believed to be informative for the primary outcome even though different from the primary outcome. The adaptation outcome (such as a surrogate, biomarker, or an intermediate outcome) together with the primary outcome influences the adaptation process, operating characteristics of the AD, and interpretation and trustworthiness of trial results. Despite many potential advantages of using early observed outcomes to adapt a trial, they pose additional risks of making misleading inferences if they are unreliable [169]. For example, a potentially beneficial treatment could be wrongly discarded, an ineffective treatment incorrectly declared effective or wrongly carried forward for further testing, or the randomisation updated based on unreliable information.
Authors should therefore clearly describe adaptation outcomes in the same way as the pre-specified primary and secondary outcomes in the CONSORT 2010 statement [3,4]. Authors are encouraged to provide a clinical rationale supporting the use of an adaptation outcome that differs from the primary outcome, in order to aid the clinical interpretation of results. For example, evidence supporting that the adaptation outcome can provide reliable information on the primary outcome will suffice.
Box 11. Exemplars on reporting item 6a elements
|
Example 1. SSR; description of the adaptation and primary outcomes
|
“The primary endpoint is a composite of survival free of debilitating stroke (modified Rankin score >3) or the need for a pump exchange. The short-term endpoint will be assessed at 6 months and the long-term endpoint at 24 months (primary). Patients who are urgently transplanted due to a device complication before a pre-specified endpoint will be considered study failures. All other transplants or device explants due to myocardial recovery that occur before a pre-specified endpoint will be considered study successes ... The adaptation was based on interim short-term outcome rates.” [170]
|
Example 2. Seamless phase 2/3 Bayesian AD with treatment selection; details of adaptation outcomes
|
“Four efficacy and safety measures were considered important for dose selection based on early phase dulaglutide data: HbA1c, weight, pulse rate and diastolic blood pressure (DBP) [171]. These measures were used to define criteria for dose selection. The selected dulaglutide dose(s) had to have a mean change of ≤+5 beats per minute (bpm) for PR and ≤+2 mmHg for DBP relative to placebo at 26 weeks. In addition, if a dose was weight neutral versus placebo, it had to show HbA1c reduction ≥1.0% and/or be superior to sitagliptin at 52 weeks. If a dose reduced weight relative to placebo ≥ 2.5 kg, then non-inferiority to sitagliptin would be acceptable. A clinical utility index was incorporated in the algorithm to facilitate adaptive randomization and dose selection [160,172] based on the same parameters used to define dose-selection criteria described above (not shown here).” [102]
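Decision criteria of this kind are essentially a pre-specified algorithm. Purely as an illustration, the quoted dose-selection criteria could be transcribed as the following predicate; the function name and arguments are ours, the quote leaves intermediate weight changes (between weight neutral and a 2.5 kg loss) unspecified, and this sketch folds them into the weight-neutral rule:

```python
def dose_meets_criteria(pr_change_bpm: float, dbp_change_mmhg: float,
                        weight_change_kg: float, hba1c_reduction_pct: float,
                        superior_to_sitagliptin: bool,
                        noninferior_to_sitagliptin: bool) -> bool:
    """Sketch of the quoted dose-selection criteria. All changes are mean
    differences versus placebo; names and structure are illustrative."""
    # Safety gates: pulse rate <= +5 bpm and DBP <= +2 mmHg versus placebo.
    if pr_change_bpm > 5 or dbp_change_mmhg > 2:
        return False
    if weight_change_kg <= -2.5:
        # Weight loss of >= 2.5 kg vs placebo: non-inferiority is acceptable.
        return noninferior_to_sitagliptin
    # Treated here as weight neutral: needs >= 1.0% HbA1c reduction
    # and/or superiority to sitagliptin.
    return hba1c_reduction_pct >= 1.0 or superior_to_sitagliptin
```

Writing the criteria out in this form makes it easier for readers and reviewers to check that every interim scenario maps to a defined decision.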
|
Example 3. Seamless phase 2/3 AD with treatment selection; details of adaptation outcomes
|
“For the dose selection, the joint primary efficacy outcomes were the trough FEV1 on Day 15 (mean of measurements at 23 h 10 min and 23 h 45 min after the morning dose on Day 14) and standardized (average) FEV1 area under the curve (AUC) between 1 and 4 h after the morning dose on Day 14 (FEV1AUC1–4h), for the treatment comparisons detailed below (not shown here).” [145]
|
Example 4. MAMS AD; adaptation rationale (part of item 3b); rationale for adaption outcome different from the primary outcome; description of the adaptation and primary outcomes
|
“This seamless phase 2/3 design starts with several trial arms and uses an intermediate outcome to adaptively focus accrual away from the less encouraging research arms, continuing accrual only with the more active interventions. The definitive primary outcome of the STAMPEDE trial is overall survival (defined as time from randomisation to death from any cause). The intermediate primary outcome is failure-free survival (FFS) defined as the first of: PSA failure (PSA >4 ng/mL and PSA >50% above nadir); local progression; nodal progression; progression of existing metastases or development of new metastases; or death from prostate cancer. FFS is used as a screening method for activity on the assumption that any treatment that shows an advantage in overall survival will probably show an advantage in FFS beforehand, and that a survival advantage is unlikely if an advantage in FFS is not seen. Therefore, FFS can be used to triage treatments that are unlikely to be of sufficient benefit. It is not assumed that FFS is a surrogate for overall survival; an advantage in FFS might not necessarily translate into a survival advantage.” [173]
|
CONSORT 2010 item 6b: Any changes to trial outcomes after the trial commenced, with reasons
ACE item 6b (modification): Any unplanned changes to trial outcomes after the trial commenced, with reasons
Comment – Authors may wish to cross-reference the CONSORT 2010 statement [3,4] for background details.
Explanation – Outcome reporting bias occurs when the selection of outcomes to report is influenced by the nature and direction of results. The prevalence of outcome reporting bias in medical research is well documented: discrepancies between pre-specified outcomes in protocols or registries and those published in reports [12,174–177]; outcomes that portray favourable beneficial effects of treatments and safety profiles being more likely to be reported [178]; and some pre-specified primary or secondary outcomes being modified or switched after trial commencement [179]. For example, only 13% (9/67) of published trials had all primary and secondary outcomes completely reported consistently with the protocol or registry. Changes to trial outcomes may also include changes to how outcomes were assessed or measured, when they were assessed, or their order of importance in addressing objectives [177].
Sometimes when planning trials, there is considerable uncertainty around the magnitude of treatment effects on potential outcomes viewed as acceptable primary endpoints [37,177]. As a result, although uncommon, a pre-planned adaptation could include the choice of the primary endpoints or hypotheses for assessing the benefit-risk ratio. In such circumstances, the adaptive strategy should be clearly described as a pre-planned adaptation (item 3b). Authors should clearly report any additional changes to outcomes outside the scope of the pre-specified adaptations, including an explanation of why such changes occurred, in line with the CONSORT 2010 statement. This will enable readers to distinguish pre-planned adaptations of outcomes from unplanned changes, thereby allowing them to judge outcome reporting bias.
Box 12. Exemplar on reporting item 6b
|
Example. Bayesian adaptive-enrichment AD; unplanned change from a secondary to a co-primary outcome, rationale, and when it happened
|
“The second primary endpoint was the rate of functional independence (defined as a score of 0, 1, or 2 on the modified Rankin scale) at 90 days. This endpoint was changed from a secondary endpoint to a co-primary endpoint at the request of the Food and Drug Administration at 30 months after the start of the trial, when the trial was still blinded.” [105]
|
Section 7. Sample size and operating characteristics
CONSORT 2010 item 7a: How the sample size was determined
ACE item 7a (modification): How sample size and operating characteristics were determined
Comments – This section heading was modified to reflect additional operating characteristics that may be required for some ADs in addition to the sample size. Items 3b, 7a, 7b and 12b are connected so they should be cross-referenced when reporting.
Explanation – Operating characteristics, which relate to the statistical behaviour of a design, should be tailored to address trial objectives and hypotheses, factoring in logistical, ethical, and clinical considerations. These may encompass the maximum sample size, expected sample sizes under certain scenarios, probabilities of identifying beneficial treatments if they exist, and probabilities of making false positive claims of evidence [180,181]. Specifically, the predetermined sample size for ADs is influenced, among other things, by:
-
type and scope of adaptations considered (item 3b);
-
decision-making criteria used to inform adaptations (item 7b);
-
criteria for claiming overall evidence (e.g., based on the probability of the treatment effect being above a certain value, the targeted treatment effect of interest, and the threshold for statistical significance [182,183]);
-
timing and frequency of the adaptations (item 7b);
-
type of primary outcome(s) (item 6a) and nuisance parameters (e.g., outcome variance);
-
method for claiming evidence on multiple key hypotheses (part of item 12b);
-
desired operating characteristics (see Box 2), such as statistical power and an acceptable level of making a false positive claim of benefit;
-
adaptive statistical methods used for analysis (item 12b);
-
statistical framework (frequentist or Bayesian) used to design and analyse the trial.
Information that guided estimation of sample size(s) including operating characteristics of the considered AD should be described sufficiently to enable readers to reproduce the sample size calculation. The assumptions made concerning design parameters should be clearly stated and supported with evidence if possible. Any constraints imposed (e.g., due to limited trial population) should be stated. It is good scientific practice to reference the statistical tools used (e.g., statistical software, program, or code) and to describe the use of statistical simulations when relevant (see item 24b discussion).
In situations where changing the sample size is a pre-planned adaptation (item 3b), report the initial sample sizes (at interim analyses before the expected change in sample size) and, if applicable, the maximum allowable sample size per group and in total. The planned sample sizes (or expected number of events for time-to-event data) at each interim analysis and at the final analysis should be reported by treatment group and overall. The timing of interim analyses can be specified as a fraction of information gathered rather than as a sample size.
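When the timing of interim analyses is expressed as information fractions, the implied interim event counts (or sample sizes) follow directly, which is one reason this form of reporting aids reproducibility. A trivial sketch with hypothetical numbers:

```python
max_events = 1600                      # hypothetical maximum number of events
info_fractions = [0.50, 0.75, 1.00]    # two planned interim looks plus the final analysis

# Planned number of events accrued at each look:
planned_looks = [round(f * max_events) for f in info_fractions]
```

Here the looks would fall at 800, 1,200, and 1,600 events; reporting the fractions alongside the maximum lets readers reconstruct the schedule.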
Box 13. Exemplars on reporting item 7a elements
|
Example 1. MAMS AD; assumptions and adaptive methods; approach for claiming evidence or informing adaptations; statistical program
|
“The primary response (outcome) from each patient is the difference between the baseline HOMA-IR score and their HOMA-IR score at 24 weeks. The sample size calculation is based on a one-sided type I error of 5% and a power of 90%. If there is no difference between the mean response on any treatment and that on control, then a probability of 0.05 is set for the risk of erroneously ending the study with a recommendation that any treatment be tested further. For the power, we adopt a generalisation of this power requirement to multiple active treatments due to Dunnett [184]. Effect sizes are specified as the percentage chance of a patient on active treatment achieving a greater reduction in HOMA-IR score than a patient on control as this specification does not require knowledge of the common SD, σ. The requirement is that, if a patient on the best active dose has a 65% chance of a better response than a patient on control, while patients on the other two active treatments have a 55% chance of showing a better response than a patient on control, then the best active dose should be recommended for further testing with 90% probability. A 55% chance of achieving a better response on active dose relative to control corresponds to a reduction in mean HOMA-IR score of about a sixth of an SD (0.178σ), while the clinically relevant effect of 65% corresponds to a reduction of about half an SD (0.545σ). The critical values for recommending that a treatment is taken to further testing at the interim and final analyses (2.782 and 2.086) have been chosen to guarantee these properties using a method described by Magirr et al [185], generalising the approach of Whitehead and Jaki [186]. The maximum sample size of this study is 336 evaluable patients (84 per arm), although the use of the interim analysis may change the required sample size. The study will recruit additional patients to account for an anticipated 10% dropout rate (giving a total sample size of 370). 
An interim analysis will take place once the primary endpoint is available for at least 42 patients on each arm (i.e., total of 168, half of the planned maximum of 336 patients). Sample size calculation was performed using the MAMS package in R [187].” [58]
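The correspondence quoted above between the probability scale and the σ-scale can be checked directly: under normally distributed outcomes with common SD, P(X_T > X_C) = Φ(δ/(√2·σ)), so δ = √2·Φ⁻¹(p)·σ. A quick verification of the quoted 0.178σ and 0.545σ values:

```python
from math import sqrt
from statistics import NormalDist

def prob_better_to_effect(p_better: float) -> float:
    """Convert 'chance a treated patient beats a control patient' into a
    standardised mean difference (in units of sigma), assuming normal
    outcomes with common SD: delta = sqrt(2) * Phi^-1(p) * sigma."""
    return sqrt(2) * NormalDist().inv_cdf(p_better)

uninteresting = prob_better_to_effect(0.55)        # ~0.178 sigma
clinically_relevant = prob_better_to_effect(0.65)  # ~0.545 sigma
```

This conversion is what allows the trial team to specify effect sizes without knowing the common SD, σ.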
|
Example 2. 3-arm 2-stage AD with dose selection; group sequential approach; assumptions; adaptation decision-making criteria; stage 1 and 2 sample sizes; use of simulations
|
“Sample size calculations are based on the primary efficacy variable (composite of all-cause death or new MI through day 7), with the following assumptions: an event rate in the control group of 5.0%, based on event rates from the phase II study (24); a relative risk reduction (RRR) of 25%; a binomial 1-sided (α=0.025) superiority test for the comparison of 2 proportions with 88% power; and a 2-stage adaptive design with one interim analysis at the end of stage 1 data (35% information fraction) to select 1 otamixaban dose for continuation of the study at stage 2. Selection of the dose for continuation was based on the composite end point of all-cause death, Myocardial Infarction (MI), thrombotic complication, and the composite of Thrombosis in Myocardial Infarction (TIMI) major bleeding through day 7, with an assumed probability for selecting the “best” dose according to the primary endpoint (r = 0.6), a group sequential approach with futility boundary of relative risk of otamixaban versus UFH plus eptifibatide ≥1.0, and an efficacy boundary based on a gamma (-10) α spending function [188]. Based on the above assumptions, simulations (part of item 24b, see supplementary material) showed that 13,220 patients (a total of 5625 per group for the 2 remaining arms for the final analysis) are needed for this study.” [189] See Figure 1.
Fig 1 legend: Adapted from Steg et al. [197].
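For readers wishing to check the order of magnitude of the quoted numbers, the standard fixed-design formula for comparing two proportions under these assumptions gives roughly 5,300 patients per group; the quoted 5,625 per group additionally reflects the dose-selection stage and the group sequential boundaries, which the authors evaluated by simulation. A sketch of the fixed-design part only:

```python
from math import ceil, sqrt
from statistics import NormalDist

z = NormalDist().inv_cdf
p_ctrl, rrr = 0.05, 0.25
p_trt = p_ctrl * (1 - rrr)            # 3.75% under a 25% relative risk reduction
alpha_one_sided, power = 0.025, 0.88

# Standard two-proportion sample size formula (pooled null variance):
p_bar = (p_ctrl + p_trt) / 2
num = (z(1 - alpha_one_sided) * sqrt(2 * p_bar * (1 - p_bar))
       + z(power) * sqrt(p_ctrl * (1 - p_ctrl) + p_trt * (1 - p_trt)))
n_fixed = ceil((num / (p_ctrl - p_trt)) ** 2)   # per group, fixed design
```

The gap between this fixed-design number and the trial's per-group target illustrates why ADs typically require simulation rather than closed-form formulae alone (item 24b).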
|
CONSORT 2010 item 7b: When applicable, explanation of any interim analyses and stopping guidelines
ACE item 7b (replacement): Pre-planned interim decision-making criteria to guide the trial adaptation process; whether decision-making criteria were binding or non-binding; pre-planned and actual timing and frequency of interim data looks to inform trial adaptations
Comments – This item is a replacement so when reporting, the CONSORT 2010 [3] item 7b content should be ignored. Items 7b and 8b overlap but we intentionally reserved item 8b specifically to enhance complete reporting of ADs with randomisation updates as a pre-planned adaptation. Reporting of these items is also connected to items 3b and 12b.
Explanation – Transparency and complete reporting of pre-planned decision-making criteria (Box 2) and how overall evidence is claimed are essential as they influence operating characteristics of the AD, credibility of the trial, and clinical interpretation of findings [22,32,190].
A key feature of an AD is that interim decisions about the course of the trial are informed by observed interim data (element of item 3b) at one or more interim analyses, guided by decision rules describing how and when the proposed adaptations will be activated (pre-planned adaptive decision-making criteria). Decision rules, as defined in Box 2, may include, but are not limited to, rules for making the adaptations described in Box 3. Decision rules are often constructed with input from key stakeholders (e.g., clinical investigators, statisticians, patient groups, health economists, and regulators) [191]. For example, statistical methods exist for formulating decision rules for early stopping of a trial or of treatment group(s) [52,53,192–195].
Decision boundaries (e.g., stopping boundaries), pre-specified limits or parameters used to determine the adaptations to be made, and criteria for claiming overall evidence of benefit and/or harm (at an interim or the final analysis) should be clearly stated. These are influenced by the statistical information used to inform adaptations (item 3b). Decision trees or algorithms can aid the representation of complex adaptive decision-making criteria.
Allowing for trial adaptations too early in a trial, with inadequate information, severely undermines the robustness of adaptive decision-making criteria and the trustworthiness of trial results [196,197]. Furthermore, methods and results can only be reproducible when the timing and frequency of interim analyses are adequately described. Therefore, authors should detail when and how often the interim analyses were planned to be implemented. The planned timing can be described in terms of information, such as the interim sample size or number of events relative to the maximum sample size or maximum number of events, respectively. In circumstances when the pre-planned and actual timing and/or frequency of the interim analyses differ, reports should clearly state what actually happened (item 3c).
Clarification should be made on whether decision rules were binding or non-binding, to help assess the implications if they are overruled or ignored. For example, when a binding futility boundary is overruled and the trial is continued, this leads to type I error inflation. Non-binding decision rules, in contrast, are those that can be overruled without compromising control of the type I error rate. The use of non-binding futility boundaries is often advised [56].
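The effect of overruling a binding futility boundary can be seen in a toy Monte Carlo simulation of a two-stage design with equal stage-wise information (all values illustrative, not from any cited trial): calibrating the final critical value on the assumption that the futility stop is always taken, and then ignoring that stop, pushes the type I error rate above its nominal level.

```python
import random
from math import sqrt

random.seed(1)
N = 200_000          # simulated trials under the null hypothesis
ALPHA = 0.025        # nominal one-sided type I error
FUTILITY = 1.0       # illustrative binding rule: stop if the stage-1 z < 1.0

draws = []
for _ in range(N):
    z1 = random.gauss(0.0, 1.0)              # stage-1 z-statistic (50% information)
    z2 = random.gauss(0.0, 1.0)              # independent stage-2 increment
    draws.append((z1, (z1 + z2) / sqrt(2)))  # (interim z, final z)

# Calibrate the final critical value assuming the futility stop is binding,
# i.e. trials with z1 < FUTILITY never reject.
continuing = sorted(z for z1, z in draws if z1 >= FUTILITY)
c = continuing[-int(ALPHA * N)]

reject_binding = sum(z1 >= FUTILITY and z > c for z1, z in draws) / N
reject_ignored = sum(z > c for _, z in draws) / N  # futility stop overruled
# reject_binding is ~0.025 by construction; reject_ignored exceeds 0.025.
```

The inflation depends on where the futility boundary sits; with a non-binding boundary, the critical value would instead be calibrated without assuming the stop, so overruling it cannot inflate the error rate.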
Box 14. Exemplars on reporting item 7b elements
|
Example 1. 2-arm 2-stage AD with options for early stopping for futility or superiority and to increase the sample size; binding stopping rules
|
“To calculate the number of patients needed to meet the primary end-point, we expected a 3-year overall survival rate of 25% in the group assigned to preoperative chemotherapy (arm A) (based on two previous trials [198,199]). In comparison, an increase of 10% (up to 35%) was anticipated by preoperative CRT. Using the log-rank test (one-sided at this point) at a significance level of 5%, we calculated to include 197 patients per group to ensure a power of 80%. In the first stage of the planned two-stage adaptive design [200], the study was planned to be continued on the basis of a new calculation of patients needed if the comparison of patient groups will be 0.0233< p1< 0.5. Otherwise, the study may be closed for superiority (p1< 0.0233) or shall be closed for futility (p1≥ 0.5). There was no maximum sample size cap and stopping rules were binding.” [201]. p1 and p2 are p-values derived from independent stage 1 and stage 2 data, respectively. Evidence of benefit will be claimed if the overall two-stage p-value derived from p1 and p2 is ≤0.05.
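The two-stage design cited here [200] combines independent stage-wise p-values. For Fisher's product criterion, the overall p-value has a closed form: P(P1·P2 ≤ q) = q(1 − ln q) for independent uniform p-values under the null. A sketch with illustrative stage-wise p-values (not trial data):

```python
from math import log

def fisher_combined_p(p1: float, p2: float) -> float:
    """Overall p-value for Fisher's product criterion on two independent
    stage-wise p-values, as used in Bauer-Koehne two-stage designs:
    P(P1*P2 <= q) = q * (1 - ln q) under the null."""
    q = p1 * p2
    return q * (1 - log(q))

# Illustrative values: neither stage significant alone at 0.025.
overall = fisher_combined_p(0.10, 0.04)
```

With p1 = 0.10 and p2 = 0.04 the overall p-value is about 0.026, which would meet the quoted ≤0.05 criterion even though neither stage crosses the early-stopping threshold on its own. Note that the trial's actual decision rules also involve the stage-1 boundaries (0.0233 and 0.5) quoted above.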
|
Example 2. Timing and frequency of interim analyses; planned stopping boundaries for superiority and futility
|
Table 3. Stopping boundaries

| Interim analysis | Number of primary outcome events (information fraction) | Superiority: hazard ratio | Superiority: P-value | Futility: hazard ratio | Futility: P-value |
|---|---|---|---|---|---|
| 1 | 800 (50%) | <0.768 | <0.0002 | >0.979 | >0.758 |
| 2 | 1200 (75%) | <0.806 | <0.0002 | >0.931 | >0.216 |
| Final | 1600 (100%) | <0.906 | <0.0500 | – | – |

Adapted from Pocock et al. [202]; primary outcome events are cardiovascular deaths, myocardial infarction, or ischemic stroke.
|
Example 3. Planned timing and frequency of interim analysis; pre-specified dose selection rules for an inferentially seamless phase 2/3 (7-arm 2-stage) AD
|
“The interim analysis was pre-planned for when at least 110 patients per group (770 total) had completed at least 2 weeks of treatment. The dose selection guidelines were based on efficacy and safety. The mean effect of each indacaterol dose versus placebo was judged against pre-set efficacy reference criteria for trough FEV1 and FEV1AUC1–4h. For trough FEV1, the reference efficacy criterion was the highest value of: (a) the difference between tiotropium and placebo, (b) the difference between formoterol and placebo, or (c) 120 mL (regarded as the minimum clinically important difference). For standardized FEV1AUC1–4h, the reference efficacy criterion was the highest value of: (a) the difference between tiotropium and placebo or (b) the difference between formoterol and placebo. If more than one indacaterol dose exceeded both the efficacy criteria, the lowest effective dose plus the next higher dose were to be selected. Data on peak FEV1, % change in FEV1, and FVC were also supplied to the DMC for possible consideration, but these measures were not part of the formal dose selection process and are not presented here. The DMC also took into consideration any safety signals observed in any treatment arm.” [145]
Example 4. Timing and frequency of interim analyses; decision-making criteria for population enrichment and sample size increase
“Cohort 1 will enrol a total of 120 patients and followed them until 60 PFS events are obtained. At an interim analysis based on the first 40 PFS events, an independent data monitoring committee will compare the conditional power for the full population (CPF) and the conditional power for the cutaneous subpopulation (CPS). The formulae for these conditional powers are given in the supplementary appendix (part of item 3b, example 2, Box 9). (a) If CPF <0.3 and CPS <0.5, the results are in the unfavourable zone; the trial will enrol 70 patients to cohort 2 and follow them until 35 PFS events are obtained (then test effect in the full population). (b) If CPF <0.3 and CPS >0.5, the results are in the enrichment zone; the trial will enrol 160 patients with cutaneous disease (subpopulation) to cohort 2 and follow them until 110 PFS events have been obtained from the combined patients in both cohorts with cutaneous disease only (then test effect only in the cutaneous subpopulation). (c) If 0.3≤ CPF ≤0.95, the results are in the promising zone (so increase sample size); the trial will enrol 220 patients (full population) to cohort 2 and follow them up until 110 PFS events are obtained (then test effect in the full population). (d) If CPF >0.95, the results are in the favourable zone; the trial will enrol 70 patients to cohort 2 and follow them until 35 PFS events are obtained (then test effect in full population)” [104]. See Figure 2 of Mehta et al. [104] for a decision-making tree.
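The quoted zone logic, together with a generic normal-approximation conditional-power formula under the current-trend assumption, can be sketched as follows. The formula is an assumption here, since the trial's exact formulae are given in its supplementary appendix; the behaviour at exactly CPS = 0.5 is not specified in the quote.

```python
from statistics import NormalDist

def conditional_power(z_t, t, alpha=0.025):
    """Conditional power under the current-trend assumption, with
    interim z-score z_t at information fraction t:
    CP = Phi((z_t / sqrt(t) - z_crit) / sqrt(1 - t))."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha)
    return nd.cdf((z_t / t ** 0.5 - z_crit) / (1 - t) ** 0.5)

def zone(cpf, cps):
    """Classify per the quoted rules (CPF = full population,
    CPS = cutaneous subpopulation; CPS == 0.5 mapped to unfavourable
    here, an assumption)."""
    if cpf < 0.3:
        return "enrichment" if cps > 0.5 else "unfavourable"
    if cpf <= 0.95:
        return "promising"
    return "favourable"
```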
Example 5. Bayesian GSD with futility early stopping; frequency and timing of interim analyses; adaptation decision-making criteria; criteria for claiming treatment benefit
“We adopted a group-sequential Bayesian design [194] with three stages, of 40 patients each (in total), and two interim analyses after 40 and 80 randomised participants, and a final analysis after a maximum of 120 randomised participants. We decided that the trial should be stopped early if there is a high (posterior) probability (90% or greater) (item 3b details) that the 90-day survival odds ratio (OR) falls below 1 (i.e. REBOA is harmful) at the first or second interim analysis. REBOA will be declared “successful” if the probability that the 90-day survival OR exceeds 1 at the final analysis is 95% or greater.” [203]
Example 6. Sample size cap following SSR and treatment selection
“… the IDMC may recommend to increase the sample size at the interim analysis if there is less than 80% conditional power (CP) (item 3b details) to demonstrate superiority of the selected regimen (or at least one of the selected regimens in the case where two regimens are chosen) over placebo. The maximum permitted sample size increase will be fixed at 100 additional patients per selected propranolol regimen and 50 additional patients on the placebo arm” [103]. Extracted from supplementary material.
Example 7. 2-stage AD; use of non-binding futility boundary
“It has to be noted that, to protect the global type 1 error in case the decision was taken to overrule the futility rule, nonbinding boundaries would be used adding a very conservative boundary for efficacy. Overwhelming efficacy would be assessed on the adjudicated primary efficacy endpoint using the gamma (-10) alpha spending function [188] , in comparing the observed one-sided p-value with 0.00004 during this interim analysis. With such a conservative alpha spending function, the global alpha level of the study would be maintained at 0.05 (2-sided).” [148]. Extracted from supplementary material.
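The gamma (-10) spending function cited belongs to the Hwang–Shih–DeCani (gamma) family. A sketch of the standard form is below; the mapping of the quoted one-sided 0.00004 threshold to a specific information fraction is not given in the quote, so no attempt is made to reproduce it.

```python
import math

def hsd_alpha_spent(t, alpha=0.05, gamma=-10.0):
    """Hwang-Shih-DeCani (gamma-family) alpha-spending function:
    alpha(t) = alpha * (1 - exp(-gamma * t)) / (1 - exp(-gamma)),
    where t is the information fraction in (0, 1]. A strongly
    negative gamma spends almost no alpha at early looks."""
    return alpha * (1.0 - math.exp(-gamma * t)) / (1.0 - math.exp(-gamma))
```

With gamma = -10, the cumulative alpha spent at an interim halfway through the information is a tiny fraction of the overall 0.05, which is what makes the interim efficacy boundary so conservative.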
Section 8. Randomisation (Sequence generation)
CONSORT 2010 item 8b: Type of randomisation; details of any restriction (such as blocking and block size)
ACE item 8b (modification): Type of randomisation; details of any restriction (such as blocking and block size); any changes to the allocation rule after trial adaptation decisions; any pre-planned allocation rule or algorithm to update randomisation with timing and frequency of updates
Comments – In applying this item, the reporting of randomisation aspects before activation of trial adaptations must adhere to CONSORT 2010 items 8a and 8b. This E&E document only addresses additional randomisation aspects that are essential when reporting any AD where the randomisation allocation changes. Note that the contents of extension items 7b and 8b overlap.
Explanation – In AD randomised trials, the allocation ratio(s) may remain fixed throughout or change during the trial as a consequence of pre-planned adaptations (e.g., when modifying randomisation to favour treatments more likely to show benefits, after treatment selection, or introduction of a new arm to an ongoing trial) [76]. Unplanned changes may also change allocation ratios (e.g., after early stopping of a treatment arm due to unforeseeable harms).
This reporting item is particularly important for response-adaptive randomisation (RAR) ADs as several factors influence their efficiency and operating characteristics, which in turn influence the trustworthiness of results and necessitate adequate reporting [13,204–207]. For RAR ADs, authors should therefore detail the pre-planned:
- burn-in period before activating randomisation updates, including any period when the control group allocation ratio was fixed; use of RAR with an inadequate burn-in period, or without one, has raised debate about the credibility of some trials [206,207];
- type of randomisation method, with allocation ratios per group during the burn-in period, as detailed in the updated standard CONSORT item 8b;
- method or algorithm used to adapt or modify the randomisation allocations after the burn-in period;
- information used to inform the adaptive randomisation algorithm and how it was derived (item 3b); specifically, when Bayesian RAR is used, we encourage authors to provide details of the statistical models and the rationale for the chosen prior distribution;
- frequency of updating the allocation ratio (e.g., after accrual of a certain number of participants with outcome data, or at defined regular time intervals); and
- adaptive decision-making criteria to declare early evidence in favour of or against certain treatment arms (part of item 7b).
In addition, any envisaged changes to the allocation ratio as a consequence of other trial adaptations (e.g., early stopping of an arm or addition of a new arm) should be stated.
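As an illustration of what such reporting describes, a minimal Thompson-sampling-style RAR update for binary outcomes is sketched below. The Beta–Bernoulli model, the fixed control allocation, and all parameter values are assumptions; real trials typically use more elaborate models, utility weighting, and scheduled update frequencies.

```python
import numpy as np

def rar_allocation(successes, failures, control_ratio=0.2,
                   n_draws=10000, seed=0):
    """Illustrative RAR update: Beta(1 + successes, 1 + failures)
    posterior per active arm; active-arm allocation proportional to
    the posterior probability each arm is best, with the control
    allocation held fixed at control_ratio."""
    rng = np.random.default_rng(seed)
    s = np.asarray(successes)
    f = np.asarray(failures)
    # Posterior draws: one row per arm, one column per Monte Carlo draw.
    draws = rng.beta(1 + s[:, None], 1 + f[:, None], size=(len(s), n_draws))
    p_best = np.bincount(draws.argmax(axis=0), minlength=len(s)) / n_draws
    return control_ratio, (1 - control_ratio) * p_best
```

For example, after a burn-in period, the allocation probabilities would be recomputed at each pre-planned update (e.g., weekly, or after every k participants with outcome data).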
Box 15. Exemplars on reporting item 8b elements
Example 1. Pre-planned changes to allocation ratios as a consequence of treatment selection or/and sample size increase
“All new patients recruited after the conclusions of the interim analysis are made, will be randomised in a (2:) 2: 1 ratio to the selected regimen(s) of propranolol or placebo until a total of (100:)100: 50 patients (or more in the case where a sample size increase is recommended) have been randomised over the two stages of the study.” [103] Extracted from supplementary material. (2:) and (100:) are only applicable if the second best regimen is selected at stage 1.
Example 2. Bayesian RAR; pre-planned algorithm to update allocation ratios; frequency of updates (after every participant); no burn-in period; period of a fixed control allocation ratio; information that informed adaptation; decision-making criteria for dropping treatments (part of item 7b)
See Additional file 1 as extracted from Giles et al. [73].
Example 3. Bayesian RAR; burn-in period; fixed control allocation ratio; details of adaptive randomisation including additional adaptations and decision-making criteria (part of item 7b); derivation of statistical quantities; details of Bayesian models and prior distribution with rationale
“ … eligible patients were randomized on day 1 to treatment with placebo or neublastin 50, 150, 400, 800, or 1200 mg/kg, administered by intravenous injection on days 1, 3, and 5. The first 35 patients were randomized in a 2:1:1:1:1:1 ratio to placebo and each of the 5 active doses (randomisation method required) (i.e., 10 patients in the placebo group and 5 for each dose of active treatment). Subsequently, 2 of every 7 enrolled patients were assigned to placebo. Interim data evaluations of pain (AGPI) and pruritus questionnaire data (proportion of patients who reported “the itch is severe enough to cause major problems for me” on an Itch Impact Questionnaire) were used to update the allocation probability according to a Bayesian algorithm for adaptive allocation and to assess efficacy and futility criteria for early stopping of enrolment (Fig. 1, not shown here). Interim evaluations and updates to the allocation probabilities were performed weekly. Enrolment was to be stopped early after ≥50 patients had been followed for 4 weeks if either the efficacy criterion (>80% probability that the maximum utility dose reduces the pain score by ≥1.5 points more than the placebo) or the futility criterion (<45% probability that the maximum utility dose reduces pain more than the placebo) was met.” [144] Details of the statistical models used, including computation of posterior quantities; the prior distribution with rationale; generation of the utility function; and weighting of randomisation probabilities are accessible via a weblink provided (https://links.lww.com/PAIN/A433).
Section 11. Blinding
ACE item 11c (new): Measures to safeguard the confidentiality of interim information and minimise potential operational bias during the trial
Explanation – Preventing or minimising bias is central to robust evaluation of the beneficial and harmful effects of interventions. Analysis of accumulating trial data brings challenges regarding how knowledge or leakage of information, or mere speculation about interim treatment effects, may influence the behaviour of key stakeholders involved in the conduct of the trial [22,128,208]. Such behavioural changes may include differential clinical management; reporting of harmful effects; clinical assessment of outcomes; and decision-making to favour one treatment group over the other. Inconsistencies in trial conduct before and after adaptations have wide implications that may affect trial validity and integrity [22]. For example, the use of statistical methods that combine data across stages may become questionable, or the overall results may become uninterpretable. In AD randomised trials whose integrity was severely compromised by disclosure of interim results, regulators questioned the credibility of the conclusions [209,210]. Most AD randomised trials, 76% (52/68) [117] and 60% (151/251) [125] in two reviews, did not disclose methods to minimise potential operational bias during interim analyses. The seriousness of this potential risk depends on various trial characteristics; the purpose of disclosure is to enable readers to judge the risk of potential sources of bias, and thus how trustworthy the results can be assumed to be.
The literature covers processes and procedures that researchers could consider to preserve the confidentiality of interim results and minimise potential operational bias [46,150,211]. No universal approach suits every situation, owing to factors such as feasibility, the nature of the trial, and the available resources and infrastructure. Some authors discuss the roles and activities of independent committees in adaptive decision-making processes, and control mechanisms for limiting access to interim information [211–213].
Description of the process and procedures put in place to minimise the potential introduction of operational bias related to interim analyses and decision-making to inform adaptations is essential [22,131,211]. Specifically, authors should give consideration to:
- who recommended or made adaptation decisions; the roles of the sponsor or funder, clinical investigators, and trial monitoring committees (e.g., an independent data monitoring committee or a dedicated adaptation committee) in the decision-making process should be clearly stated;
- who had access to interim data and who performed the interim analyses;
- safeguards that were in place to maintain confidentiality (e.g., how, to whom, and when the interim results were communicated).
Box 16. Exemplars on reporting item 11c elements
Example 1. Inferentially seamless phase 2/3 AD
“The interim analysis was carried out by an independent statistician (from ClinResearch GmbH, Köln, Germany), who was the only person outside the Data Monitoring Committee (DMC) with access to the semi-blinded randomization (sic) codes (treatment groups identified by letters A to G). This statistician functioned independently of the investigators, the sponsor’s clinical trial team members and the team that produced statistical programming for the interim analysis (DATAMAP GmbH, Freiburg, Germany). The independent statistician was responsible for all analyses of efficacy and safety data for the interim analysis. The DMC was given semi-blinded results with treatment groups identified by the letters A to G, with separate decodes sealed in an envelope to be opened for decision-making. The personnel involved in the continuing clinical study were told which two doses had been selected, but study blinding remained in place and the results of the interim analysis were not communicated. No information on the effects of the indacaterol doses (including the two selected) was communicated outside the DMC.” [145]
Example 2. Bayesian inferentially seamless phase 2/3 AD with RAR
“An independent Data Monitoring Committee (DMC) external to Lilly provided oversight of the implementation of the adaptive algorithm and monitored study safety. The DMC fulfilled this role during the dose-finding portion, and continued monitoring after dose selection until an interim database lock at 52 weeks, at which time the study was unblinded to assess the primary objectives. Sites and patients continued to be blinded to the treatment allocation until the completion of the study. The DMC was not allowed to intervene with the design operations. A Lilly Internal Review Committee (IRC), independent of the study team, would meet if the DMC recommended the study to be modified. The role of the IRC was to make the final decision regarding the DMC’s recommendation. The external Statistical Analysis Center (SAC) performed all interim data analyses for the DMC, evaluated the decision rules and provided the randomization updates for the adaptive algorithm. The DMC chair and the lead SAC statistician reviewed these (interim) reports and were tasked to convene an unscheduled DMC meeting if an issue was identified with the algorithm or the decision point was triggered” [102]
Example 3. Inferentially seamless phase 2/3 AD with treatment selection, SSR, and nonbinding futility stopping
“Following the interim analysis of the data and the review of initial study hypotheses, the committee (IDMC) chairman will recommend in writing to the sponsor whether none, one or two regimen(s) of propranolol is (are) considered to be the ‘best’ (the most efficacious out of all regimens with a good safety profile) for further study in stage two of the design. The second ‘best’ regimen will only be chosen for further study along with the ‘best’ regimen if the first stage of the study suggests that recruitment in the second stage will be too compromised by the fact that 1 in 3 patients are assigned to placebo. The IDMC will not reveal the exact sample size increase in the recommendation letter in order to avoid potential sources of bias (only the independent statistician, the randomisation team and the IP suppliers will be informed of the actual sample size increase). Any safety concerns will also be raised in the IDMC recommendation letter. The chairman will ensure that the recommendations do not unnecessarily unblind the study. In the case where the sponsor decides to continue the study, the independent statistician will communicate to the randomisation team which regimen(s) is (are) to be carried forward.” [103] Extracted from supplementary material.
Section 12. Statistical methods
CONSORT 2010 item 12a: Statistical methods used to compare groups for primary and secondary outcomes
ACE item 12a (modification): Statistical methods used to compare groups for primary and secondary outcomes, and any other outcomes used to make pre-planned adaptations
Comment – This item should be applied with reference to the detailed discussion in the CONSORT 2010 statement [3,4].
Explanation – The CONSORT 2010 statement [3,4] addresses the importance of detailing statistical methods to analyse primary and secondary outcomes at the end of the trial. This ACE modified item extends this to require similar description to be made of statistical methods used for interim analyses. Furthermore, statistical methods used to analyse any other adaptation outcomes (item 6) should be detailed to enhance reproducibility of the adaptation process and results. Authors should focus on complete description of statistical models and aspects of the estimand of interest [214,215] consistent with stated objectives and hypotheses (item 2b) and pre-planned adaptations (item 3b).
For Bayesian ADs, item 12b (paragraph 6) describes similar information that should be reported for Bayesian methods.
Box 17. Exemplars on reporting item 12a elements
Example 1. Frequentist AD
Authors are referred to the CONSORT 2010 statement [3,4] for examples.
Example 2. 2-stage Bayesian biomarker-based AD with RAR
In a methods paper, Gu et al. [216] detail Bayesian logistic regression models for evaluating treatment and marker effects at the end of stage 1 and 2 using non-informative normal priors during RAR and futility early stopping decisions. Strategies for variable selection and model building at the end of stage 1 to identify further important biomarkers for use in RAR of stage 2 patients are described (part of item 3b), including a shrinkage prior used for biomarker selection with rationale.
ACE item 12b (new): For the implemented adaptive design features, statistical methods used to estimate treatment effects for key endpoints and to make inferences
Comment – Note that items 7a and 12b are connected.
Explanation – A goal of every trial is to provide reliable estimates of the treatment effect for assessing benefits and risks to reach correct conclusions. Several statistical issues may arise when using an AD, depending on its type and the scope of adaptations, the adaptive decision-making criteria, and whether frequentist or Bayesian methods are used to design and analyse the trial [22]. Conventional estimates of treatment effect based on fixed design methods may be unreliable when applied to ADs (e.g., they may exaggerate the patient benefit) [101,217–221]. The precision of the estimated treatment effects may also be mis-stated (e.g., confidence intervals may have incorrect width). Other methods available to summarise the level of evidence in hypothesis testing (e.g., p-values) may give different answers. Some factors and conditions that influence the magnitude of estimation bias have been investigated, and there are circumstances when it may not be of concern [217,222–226]. Cameron et al. [227] discuss methodological challenges in performing network meta-analysis when combining evidence from randomised trials with ADs and fixed designs. Statistical methods for estimating the treatment effect and its precision exist for some ADs [71,228–237], and implementation tools are being developed [86,238–240]. However, these methods are rarely used or reported and the implications are unclear [50,217,241]. Debate and research on inference for some ADs with complex adaptations are ongoing.
In addition to statistical methods for comparing outcomes between groups (item 12a), we specifically encourage authors to clearly describe statistical methods used to estimate measures of treatment effects with associated uncertainty (e.g., confidence or credible intervals) and p-value (when appropriate); referencing relevant literature is sufficient. When conventional or naïve estimates derived from fixed design methods are used, it should be clearly stated. In situations where statistical simulations were used to either explore the extent of bias in estimation of the treatment effects or operating characteristics, it is good practice to mention this and provide supporting evidence (item 24c). For example, some authors quantified bias in the estimation of treatment effects in a specific trial context through simulations [189,242].
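A minimal simulation sketch of why naive estimates can overstate the effect is given below. The set-up is an illustrative assumption (z-scale estimates, one-sided efficacy stopping only), not any specific trial's design: when large stage-1 results trigger early stopping and are reported alone, the average reported estimate exceeds the true effect.

```python
import numpy as np

def simulate_naive_bias(mu=1.5, boundary=1.96, n_sim=20000, seed=1):
    """Two-stage design on the z-scale: each stage yields an estimate
    ~ N(mu, 1). Stop at stage 1 if z1 > boundary and report z1 alone;
    otherwise report the pooled mean (z1 + z2) / 2. Returns the mean
    reported estimate across simulated trials."""
    rng = np.random.default_rng(seed)
    z1 = rng.normal(mu, 1.0, n_sim)
    z2 = rng.normal(mu, 1.0, n_sim)
    estimate = np.where(z1 > boundary, z1, (z1 + z2) / 2)
    return estimate.mean()
```

With mu = 1.5 and a 1.96 boundary, the mean reported estimate comes out noticeably above the true value of 1.5, illustrating the selection bias that adjusted estimators aim to correct.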
ADs tend to increase the risk of making misleading or unjustified claims of treatment effects if traditional methods that ignore trial adaptations are used. In general, this arises when selecting one or more hypothesis test results from a possible list in order to claim evidence of the desired conclusion. For instance, the risks may increase by testing the same hypothesis several times (e.g., at interim and final analyses), hypothesis testing of multiple treatment comparisons, selecting an appropriate population from multiple target populations, adapting key outcomes, or a combination of these [22]. A variety of adaptive statistical methods exist for controlling specific operating characteristics of the design (e.g., type I error rate, power), depending on the source of inflation of the false positive error rate [52,62,86,98,200,243–248].
Authors should therefore state operating characteristics of the design that have been controlled and details of statistical methods used. The need for controlling a specific type of operating characteristic (e.g., pairwise or familywise type I error rate) is context dependent (e.g., regulatory considerations, objectives and setting) so clarification is encouraged to help interpretation. How evidence of benefit and/or risk is claimed (part of item 7a) and hypotheses being tested (item 2b) should be clear. In situations where statistical simulations were used, we encourage authors to provide a report, where possible (item 24b).
When data or statistical tests across independent stages are combined to make statistical inference, authors should clearly describe the combination test method (e.g., Fisher's combination method, the inverse normal method, or the conditional error function) [200,246,247,249,250] and the weights used for each stage (when not obvious). This information is important because different methods and weights may produce results that lead to different conclusions. Bauer and Einfalt [112] found low reporting quality of these methods.
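For example, the inverse normal method with pre-specified stage weights can be sketched as follows (a generic sketch for two independent one-sided stage-wise p-values; weight conventions vary across references):

```python
from statistics import NormalDist

def inverse_normal_p(p1, p2, w1=1.0, w2=1.0):
    """Inverse normal combination of independent one-sided stage-wise
    p-values with pre-specified weights:
    z = (w1*z(p1) + w2*z(p2)) / sqrt(w1^2 + w2^2), overall p = 1 - Phi(z)."""
    nd = NormalDist()
    z = (w1 * nd.inv_cdf(1 - p1) + w2 * nd.inv_cdf(1 - p2)) \
        / (w1 ** 2 + w2 ** 2) ** 0.5
    return 1 - nd.cdf(z)
```

Changing the stage weights changes the combined p-value, which is why the weights (often fixed by the pre-planned stage sample sizes) must be reported.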
Brard et al. [251] found evidence of poor reporting of Bayesian methods. To address this, when a Bayesian AD is used, authors should detail the model used for analysis to estimate the posterior probability distribution; the prior distribution used and the rationale for its choice; whether and how the prior was updated in light of interim data; and the stages at which the prior information was used (interim or/and final analysis). If an informative prior was used, the source of data informing it should be disclosed where applicable. Of note, some in the Bayesian community argue that it is not principled to control frequentist operating characteristics in Bayesian ADs [252], although these can be computed and presented [22,160,253].
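As an illustration of the kind of Bayesian quantity such reporting should make reproducible (e.g., a posterior probability that a survival odds ratio is below 1, as in the Bayesian GSD exemplar earlier), a minimal Monte Carlo sketch is given below. The model, priors, and data values are assumptions for illustration only.

```python
import numpy as np

def posterior_prob_or_below_1(surv_t, n_t, surv_c, n_c,
                              a=1, b=1, n_draws=100000, seed=2):
    """Posterior probability that the survival odds ratio
    (treatment vs control) is below 1, under independent Beta(a, b)
    priors and binomial likelihoods, estimated by Monte Carlo."""
    rng = np.random.default_rng(seed)
    pt = rng.beta(a + surv_t, b + n_t - surv_t, n_draws)
    pc = rng.beta(a + surv_c, b + n_c - surv_c, n_draws)
    odds_ratio = (pt / (1 - pt)) / (pc / (1 - pc))
    return (odds_ratio < 1).mean()
```

Reporting the model, the prior (here a flat Beta(1, 1)), and the computation method lets readers reproduce such probabilities and the stopping decisions based on them.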
Typically, ADs require quickly observed adaptation outcomes relative to the expected length of the trial. In some ADs, randomised participants who have received the treatment may not have their outcome data available at the interim analysis (referred to as overrunning participants) for various reasons [254]. These delayed responses may pose ethical dilemmas depending on the adaptive decisions taken, present logistical challenges, or diminish the efficiency of the AD depending on their prevalence and the objective of the adaptations [209]. It is therefore useful for readers to understand how overrunning participants were dealt with at interim analyses especially after a terminal adaptation decision (e.g., when a trial or treatment arms were stopped early for efficacy or futility). If outcome data of overrunning participants were collected, a description should be given of how these data were analysed and combined with interim results after the last interim decision was made. Some formal statistical methods to deal with accrued data from overrunning participants have been proposed [255].
Box 18. Exemplars on reporting item 12 elements
Example 1. GSD; statistical method for estimating treatment effects
“Stagewise ordering was used to compute the unbiased median estimate and confidence limits for the prognosis-group-adjusted hazard rates. [256]” [257]
Example 2. Inferentially seamless (4-arm 2-stage) AD with dose selection; statistical methods for controlling operating characteristics
“ … the power of the study ranged from 71% to >91% to detect a treatment difference at a one-sided α of 0.025 when the underlying response rate of ≥1 of the crofelemer dose groups exceeded placebo by 20%. The clinical response of 20% was based on an estimated response rate of 55% in crofelemer and 35% in placebo during the 4-week placebo-controlled assessment period … For the primary endpoint, the test for comparing the placebo and treatment arms reflected the fact that data were gathered in an adaptive fashion and controlled for the possibility of an increased Type I error rate. Using the methods of Posch and Bauer, [71] as agreed upon during the special protocol assessment process, a p-value was obtained for comparison of each dose to the placebo arm from the stage I data, and an additional p-value was obtained for comparison of the optimal dose to the placebo arm from the independent data gathered in stage II. For the final primary analysis, the p-values from the first and second stages were combined by the inverse normal weighting combination function, and a closed testing procedure was implemented to test the null hypothesis using the methods of Posch and Bauer, [71] based on the original work of Bauer and Kieser [84]. This closed test controlled the experiment-wise error rate for this 2-stage adaptive design at a one-sided α of 0.025.” [258] Extracted from appendix material.
Example 3. 3-arm 2-stage group-sequential AD with treatment selection; combination test method; multiplicity adjustments; statistical method for estimating treatment effects
“The proposed closed testing procedure will combine weighted inverse normal combination tests using pre-defined fixed weights, the closed testing principle, [71,259,260] and the Hochberg-adjusted 1-sided P-value on stage 1 data. This testing procedure strongly controls the overall type I error rate at α level (see “Simulations run to assess the type I error rate under several null hypothesis scenarios”). Multiplicity-adjusted flexible repeated 95% 2-sided CIs [225] on the percentage of patients will be calculated for otamixaban dose 1, otamixaban dose 2, and UFH plus eptifibatide. Relative risk and its 95% 2-sided CIs will also be calculated. Point estimates based on the multiplicity-adjusted flexible repeated CIs will be used.” [189] See supplementary material of the paper for details.
Example 4. Population-enrichment AD with SSR; criteria for claiming evidence of benefit; methods for controlling familywise type I error; combination test weights
Mehta et al. [104] published a methodological paper detailing a family of three hypotheses being tested; the use of the closed testing principle [260] to control the overall type I error; how evidence is claimed; and analytical derivations of the Simes-adjusted p-values [261]. This includes the use of a combination test approach with pre-defined weights based on the accrued information fraction for the ‘full population’ (cutaneous and non-cutaneous patients) and the ‘subpopulation’ (cutaneous patients). Analytical derivations were presented for the two cases, assuming enrichment occurs at the interim analysis or no enrichment after the interim analysis. Details are reported in a supplementary file accessible via the journal website.
Example 5. Inferentially seamless (7-arm 2-stage) AD with dose selection; use of traditional naïve estimates
“Unless otherwise stated, efficacy data are given as least squares means with standard error (SE) or 95% confidence interval (CI).” [83]
Example 6. Inferentially seamless phase 2/3 (5-arm 2-stage) AD with dose selection; dealing with overrunning participants
“Patients already assigned to an unselected regimen of propranolol by the time that the conclusions of the interim analysis are available, will continue the treatment according to the protocol but efficacy data for these patients will not be included in the primary analysis of primary endpoint.” [103] Extracted from the supplementary material.