Characteristics of the included studies
We screened 3,463 phase 3 studies that assessed survival outcomes as an endpoint. Of these, 208 studies with 158,250 participants were eligible for the analysis (Fig. 1). Seventy-nine trials (38.0%) resulted in FDA approval, and 129 (62.0%) did not. Tables 1 and 2 show the study characteristics and cancer types by FDA approval status. Pharmaceutical companies sponsored most of the included studies (197 trials, 94.7%), and most studies started after 2005 (195 trials, 93.7%). The mean study duration was 5.6 years, and the mean sample size was 761 patients. The dataset covered 27 cancer types (19 carcinomas, one sarcoma, and seven hematologic malignancies). Equal numbers of trials were conducted in the first-line setting (99 trials, 47.6%) and in the second-line or later setting (99 trials, 47.6%). The most common cancer type was non-small cell lung cancer (41 trials, 19.7%), followed by prostate cancer (27 trials, 13.0%) and gastric cancer, melanoma, and pancreatic cancer (14 trials, 6.7% each).
There was no significant difference between the FDA-approved and non-approved groups in sample size, study duration, study sponsorship, cancer type, treatment line, modality of the study drug, time the control drug had been on the market, median OS of the control drug, primary endpoint, study arm design, fast track designation, or orphan drug designation. By contrast, the year of study start (p = 0.017), blinding (p = 0.029), and breakthrough therapy designation (p < 0.001) differed significantly between the FDA-approved and non-approved drugs.
The funnel plot for the FDA-approved drugs was symmetric, with no evidence of significant publication bias (Egger test, p = 0.727; Begg-Mazumdar test, p = 0.403; see Supplementary Fig. S1 online). By contrast, the funnel plot for the non-approved drugs was slightly asymmetric, and both tests detected significant asymmetry (Egger test, p < 0.001; Begg-Mazumdar test, p < 0.001).
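For readers unfamiliar with the two asymmetry tests, the sketch below shows their standard textbook forms applied to per-trial log hazard ratios and their standard errors. It is not the authors' code; the function names `egger_test` and `begg_mazumdar_test` are hypothetical, and Kendall's tau is computed with SciPy's default method rather than the normal approximation in the original Begg-Mazumdar paper, so p-values may differ slightly from those of dedicated meta-analysis software.

```python
import numpy as np
from scipy import stats


def egger_test(log_hr, se):
    """Egger's regression test for funnel-plot asymmetry (hypothetical helper).

    Regresses the standardized effect (log HR / SE) on precision (1 / SE).
    An intercept that differs from zero suggests small-study effects.
    Returns (intercept, two-sided p-value).
    """
    log_hr = np.asarray(log_hr, dtype=float)
    se = np.asarray(se, dtype=float)
    z = log_hr / se
    precision = 1.0 / se
    X = np.column_stack([np.ones_like(precision), precision])
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    df = len(z) - 2
    sigma2 = resid @ resid / df                 # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)       # OLS covariance of (intercept, slope)
    t_stat = beta[0] / np.sqrt(cov[0, 0])
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    return beta[0], p_value


def begg_mazumdar_test(log_hr, se):
    """Begg-Mazumdar rank correlation test (hypothetical helper).

    Standardizes each deviation from the fixed-effect pooled estimate by its
    variance, then correlates it with the within-study variance (Kendall's tau).
    Returns (tau, p-value).
    """
    log_hr = np.asarray(log_hr, dtype=float)
    v = np.asarray(se, dtype=float) ** 2
    w = 1.0 / v
    pooled = np.sum(w * log_hr) / np.sum(w)
    v_star = v - 1.0 / np.sum(w)                # variance of (y_i - pooled)
    std_dev = (log_hr - pooled) / np.sqrt(v_star)
    tau, p_value = stats.kendalltau(std_dev, v)
    return tau, p_value
```

Each test would be run separately on the FDA-approved and non-approved subsets, mirroring the per-group results reported above.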
Pooled effect size by subgroup meta-analysis
To explore differences in average survival benefit by regulatory status and study characteristics, we performed a subgroup meta-analysis using a random-effects model with restricted maximum likelihood (REML) estimation (Fig. 2). The overall pooled HR for OS across all 208 trials was 0.85 (95% CI: 0.83, 0.87), with I² = 70.12%, indicating substantial heterogeneity in the dataset. The clearest separation was observed between the FDA-approved and non-approved subgroups (p < 0.001). Across all FDA-approved drugs, the pooled risk of death was 29% lower than with the control (HR 0.71 [95% CI: 0.69, 0.73]), whereas the non-approved drugs showed only a 6% reduction (HR 0.94 [95% CI: 0.92, 0.96]). Second only to FDA approval status, breakthrough therapy designation (BTD) showed a significant separation (HR 0.70 [95% CI: 0.66, 0.74] with BTD vs HR 0.87 [95% CI: 0.85, 0.90] without BTD, p < 0.001). By contrast, fast track designation and orphan drug designation were not associated with a significant difference in the pooled risk reduction (p = 0.83 and p = 0.13, respectively). Pooled effect sizes by study characteristics are detailed in Supplementary Figs. S2 and S3 online.
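A minimal sketch of REML random-effects pooling of log hazard ratios follows, assuming each trial contributes a log HR and its standard error. It estimates the between-trial variance τ² by fixed-point iteration on the REML estimating equation and uses Wald (±1.96 SE) confidence limits; the authors' software may instead use Fisher scoring or a Knapp-Hartung adjustment. I² here is the Higgins-Thompson value from Cochran's Q, whereas some packages (e.g., metafor in R) derive I² from the REML τ², so reported values can differ. The risk reductions quoted in the text are simply 1 − HR (for example, 1 − 0.71 = 0.29, i.e., 29%). The function name is hypothetical.

```python
import numpy as np


def reml_random_effects(log_hr, se, tol=1e-10, max_iter=500):
    """Random-effects pooling of log HRs with a REML estimate of tau^2.

    tau^2 solves  tau2 = sum(w^2 * ((y - mu)^2 - v)) / sum(w^2) + 1 / sum(w),
    where w = 1 / (v + tau2); it is found by fixed-point iteration and
    truncated at zero. Returns (pooled HR, 95% CI low, 95% CI high, tau2, I^2 %).
    """
    y = np.asarray(log_hr, dtype=float)
    v = np.asarray(se, dtype=float) ** 2
    tau2 = 0.0
    for _ in range(max_iter):
        w = 1.0 / (v + tau2)
        mu = np.sum(w * y) / np.sum(w)
        new_tau2 = max(0.0, np.sum(w**2 * ((y - mu) ** 2 - v)) / np.sum(w**2)
                       + 1.0 / np.sum(w))
        converged = abs(new_tau2 - tau2) < tol
        tau2 = new_tau2
        if converged:
            break

    # Pooled log HR and its standard error under the random-effects weights
    w = 1.0 / (v + tau2)
    mu = np.sum(w * y) / np.sum(w)
    se_mu = np.sqrt(1.0 / np.sum(w))

    # Higgins-Thompson I^2 from Cochran's Q with fixed-effect weights
    w_fe = 1.0 / v
    mu_fe = np.sum(w_fe * y) / np.sum(w_fe)
    q = np.sum(w_fe * (y - mu_fe) ** 2)
    df = len(y) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    return (np.exp(mu), np.exp(mu - 1.96 * se_mu), np.exp(mu + 1.96 * se_mu),
            tau2, i2)
```

A subgroup comparison such as approved vs non-approved would call this once per subgroup and then test whether the pooled log HRs differ.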
Boundary between FDA-approved drugs and non-approved drugs
The histogram in Fig. 3-a shows the distribution of the HR for OS by FDA approval status; each group has a distinct peak. The FDA-approved group peaked at an HR for OS of around 0.65 to 0.70, whereas the non-approved group peaked at around 0.95 to 1.0. The two distributions overlapped between HR 0.70 and 0.90, with the greatest overlap between HR 0.80 and 0.85.
Fig. 3-b overlays a scatter plot of the observed probability of FDA approval against the HR for OS with a logistic regression curve that uses the HR for OS to predict the probability of FDA approval. The fitted curve was a reverse sigmoid, indicating that the probability of FDA approval was highly sensitive to the HR for OS (coefficient: 19.8, p < 0.001).
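The logistic model described above can be sketched as an unpenalized maximum-likelihood fit by Newton-Raphson (iteratively reweighted least squares), assuming one row per trial with the HR for OS and a 0/1 approval indicator. The function names are hypothetical, and the paper does not state which software or parameterization was used. With the HR entered directly as the predictor, a reverse sigmoid corresponds to a negative slope `b1`.

```python
import numpy as np


def fit_logistic(hr_os, approved, n_iter=100, tol=1e-10):
    """Fit P(approved) = 1 / (1 + exp(-(b0 + b1 * HR))) by Newton-Raphson.

    hr_os: hazard ratio for OS per trial; approved: 1 if FDA-approved else 0.
    Returns (b0, b1). No regularization is applied.
    """
    x = np.asarray(hr_os, dtype=float)
    y = np.asarray(approved, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)                        # score vector
        info = X.T @ (X * (p * (1 - p))[:, None])   # Fisher information matrix
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta[0], beta[1]


def predict_approval(b0, b1, hr_os):
    """Predicted probability of FDA approval at the given HR(s) for OS."""
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * np.asarray(hr_os, dtype=float))))
```

Unpenalized fitting is chosen deliberately here: common machine-learning defaults (e.g., L2 regularization) would shrink the slope and flatten the curve.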
The model projected a 20% probability of FDA approval at HR 0.86 and an 80% probability at HR 0.74. Based on the predefined criteria, we therefore identified the HR for OS range of 0.74 to 0.86 as the boundary between FDA-approved and non-approved drugs. This boundary was consistent with the overlap observed in the histogram. Drugs within the boundary had significantly higher odds that the FDA consulted the Oncologic Drugs Advisory Committee (ODAC) than drugs outside it (odds ratio: 22.3, p = 0.0001; see Supplementary Tables S2 and S3 online).
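The two steps in this paragraph can be sketched as follows, under stated assumptions. The boundary comes from inverting the fitted logistic curve, solving logit(p) = b0 + b1·HR for HR at p = 0.2 and p = 0.8. The paper does not say which test produced p = 0.0001 for the ODAC comparison; Fisher's exact test on a 2 × 2 table (inside/outside the boundary × ODAC consulted or not) is assumed here purely for illustration. All function names and the coefficients in the test values are hypothetical, not the study's estimates.

```python
import math

from scipy import stats


def hr_at_probability(b0, b1, prob):
    """HR for OS at which the logistic model predicts `prob` approval.

    Solves logit(prob) = b0 + b1 * HR for HR.
    """
    return (math.log(prob / (1 - prob)) - b0) / b1


def approval_boundary(b0, b1, low=0.2, high=0.8):
    """HR interval between the low- and high-probability thresholds."""
    a = hr_at_probability(b0, b1, low)
    b = hr_at_probability(b0, b1, high)
    return min(a, b), max(a, b)


def odac_odds_ratio(in_consulted, in_not, out_consulted, out_not):
    """Sample odds ratio and Fisher exact p-value for ODAC consultation,
    comparing trials inside vs. outside the boundary (assumed test)."""
    table = [[in_consulted, in_not], [out_consulted, out_not]]
    odds_ratio, p_value = stats.fisher_exact(table)
    return odds_ratio, p_value
```

Because the curve is a reverse sigmoid, the 80% threshold falls at the lower HR and the 20% threshold at the higher HR; `approval_boundary` sorts the two so the interval is always returned low to high.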
We plotted the same histograms for the other survival endpoints: extension of median OS, HR for PFS, and extension of median PFS. Unlike the HR for OS, none of these endpoints showed a clear separation between the FDA-approved and non-approved groups (see Supplementary Fig. S4 online). Similarly, in the ROC analysis (Fig. 4), the predictive model using the HR for OS had the largest AUC among the survival endpoints (AUC: 0.97 [95% CI: 0.94, 0.99]), followed by extension of median OS (AUC: 0.92 [95% CI: 0.89, 0.96]), HR for PFS (AUC: 0.81 [95% CI: 0.75, 0.87]), and extension of median PFS (AUC: 0.72 [95% CI: 0.64, 0.80]).
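The AUC values compared above can be understood through the Mann-Whitney interpretation of the empirical ROC curve: the AUC is the probability that a randomly chosen approved trial scores "better" than a randomly chosen non-approved trial, with ties counted as one half. The sketch below assumes lower values predict approval, as for the HR endpoints; for the median-extension endpoints, where larger is better, the values would be negated first. The function name is hypothetical, and confidence intervals (e.g., by DeLong's method or bootstrapping) are not computed here.

```python
import numpy as np


def auc_lower_is_better(values_approved, values_not_approved):
    """Empirical ROC AUC when lower values (e.g., HR for OS) predict approval.

    Compares every approved/non-approved pair: a win when the approved trial
    has the lower value, half a win for a tie. Equivalent to the Mann-Whitney
    U statistic divided by n_approved * n_not_approved.
    """
    a = np.asarray(values_approved, dtype=float)[:, None]
    n = np.asarray(values_not_approved, dtype=float)[None, :]
    wins = (a < n).sum() + 0.5 * (a == n).sum()
    return wins / (a.size * n.size)
```

For example, with hypothetical HRs of 0.6, 0.7, and 0.75 for approved trials and 0.72, 0.9, and 1.0 for non-approved trials, 8 of the 9 pairs are correctly ordered, giving an AUC of 8/9 ≈ 0.89.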