Final Data Set
The variables used in the analyses and the number of participants with values for those variables are listed in Supplementary Table 2, and complete descriptive statistics for each quantitative variable are provided in Supplementary Table 3. In addition, the frequencies of different tumor types among those identified as participating in the Phase I trials in our data set are provided in Supplementary Table 4. Overall, males made up 51.1% of our sample and 58.6% of participants identified as white. ECOG scores were distributed as follows: 0 for 11.4% of participants in our study, 1 for 69.4%, 2 for 7.3%, and 3 for fewer than 1%. The mean (standard deviation), median, and mode of the number of participants in these trials were 5.42 (9.67), 2.0, and 1.0, respectively. Thirty trials had more than 10 participants and 102 had only 1 participant. Figure 1 provides correlations between the candidate continuous prognostic biomarkers, with standard Pearson correlations below the diagonal and non-parametric Spearman correlations above the diagonal.
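As an illustration of the layout used in Figure 1, the following is a minimal sketch of how a single matrix can hold Pearson correlations below the diagonal and Spearman correlations above it. The function names and the toy data are assumptions made for illustration and are not taken from the study's code; the sketch also assumes complete data, so any handling of missing values would need to be added.

```python
from statistics import mean

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def combined_matrix(columns):
    """Pearson below the diagonal, Spearman above, 1.0 on the diagonal."""
    names = list(columns)
    n = len(names)
    matrix = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i > j:
                matrix[i][j] = pearson(columns[names[i]], columns[names[j]])
            elif i < j:
                matrix[i][j] = spearman(columns[names[i]], columns[names[j]])
    return names, matrix

# Toy data (hypothetical): a monotone but non-linear relationship.
toy = {"marker_a": [1, 2, 3, 4, 5], "marker_b": [1, 4, 9, 16, 25]}
names, matrix = combined_matrix(toy)
```

In the toy data the relationship is monotone but not linear, so the Spearman entry above the diagonal is larger than the Pearson entry below it, which is the kind of contrast such a combined matrix is meant to expose.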
Initial Trial Index and Tumor Type Analysis
We initially explored the relationships between the trial indices and the outcomes. Table 1 provides the results of a stepwise logistic regression analysis for the two dichotomous outcomes, and Supplementary Table 5 has the stepwise regression results for the two continuous outcomes. Table 1 suggests that 5 or 6 of the trials have a stronger association with the outcomes than others. This in turn suggests that some trials may have tested inferior therapies relative to others, or enrolled participants who were more compromised to begin with and therefore more likely to manifest the negative outcomes of interest. These analyses also suggest there is potential for heterogeneity in the relationships between the prognostic biomarkers and the outcomes that could possibly be accommodated in random effects models.
The results of the multiple regression and stepwise regression analyses relating the candidate prognostic factors and the different cancer types to the outcomes are provided in Supplementary Table 6a for the continuous outcomes using linear regression (as well as treating time on study as a survival endpoint in a survival analysis), and in Supplementary Table 6b for the dichotomous outcomes using logistic regression. These analyses did not account for random effects across the trials. Hemoglobin and albumin both had positive relationships with time on study, as did whether the trial was focused on renal cell carcinoma, whereas a focus on rectal cancer had a negative relationship, based on stepwise multiple regression [37]. For time on therapy ratio, no cancer type was associated with the outcome, but DOPT score, WBC, and uric acid had positive effects. For the survival analysis of time on study, only albumin had a significant, yet negative, relationship. Due most likely to multicollinearity, a few other variables had significant relationships with the continuous outcomes in a standard multiple linear regression analysis that included all candidate prognostic factors. For the dichotomous outcomes, many more candidate prognostic variables exhibited associations, with hemoglobin and albumin levels exhibiting negative relationships and WBC and LDH exhibiting positive relationships with both outcomes based on the stepwise multiple logistic regression analysis. Renal cell carcinoma and non-Hodgkin’s lymphoma trials were negatively associated with both outcomes, while liver cancer trials were positively associated, based on stepwise multiple logistic regression.
Machine Learning Model Analysis
Analyses evaluating the performance of the various ML models based on many widely used metrics (see Methods section) are provided in Supplementary Tables 7–9 for each outcome. We emphasize that the performance metrics for the models are based on analyses of the test data sets, not the training data sets. More detailed information, including a description of the various performance metrics for each model, is provided in the Supplementary Methods section. In particular, to overcome imbalances in the number of individuals exhibiting the mortality phenotypes during the trials versus those who did not, we used the ‘Synthetic Minority Over-sampling Technique (SMOTE)’ [40, 41] (see the Supplementary Methods). In addition, the ML methods did not account for random effect variation across the trials, but rather focused on the overall accuracy of the predictions about outcomes. Therefore, individual variable importance was not an emphasis in these analyses. However, Supplementary Fig. 1 provides a graphical representation of the importance or influence of each independent variable on 90-day mortality and Grade 5 toxicity based on the Random Forest (RF) analyses, and suggests that albumin levels, WBC, and whether the therapy tested was a targeted therapy are the more important predictors of response. Figure 2 provides example ROC curve plots derived from the different ML methods for these two categorical outcomes. Note that some techniques produced the same ROC curves, so their curves overlap. Ultimately, our use of an aggregated model, the Super Learner, provided some of the best fits (Fig. 2), which is to be expected given that it pulls together information from the other models.
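To illustrate the idea behind SMOTE, the following is a minimal sketch of its core step: each synthetic minority-class record is placed at a random point on the line segment between a minority record and one of its nearest minority-class neighbours. This is a simplified illustration rather than the implementation used in the study; the function name, the default of k = 5 neighbours, the Euclidean distance, and the toy data are all assumptions.

```python
import random

def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows from a list of numeric minority-class rows.

    Assumes at least two minority rows and purely numeric features.
    """
    rng = random.Random(seed)

    def squared_distance(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        # Pick a minority row at random as the starting point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest neighbours among the other minority rows.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: squared_distance(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        # Interpolate a random fraction of the way towards the neighbour.
        gap = rng.random()
        synthetic.append([u + gap * (v - u) for u, v in zip(base, neighbour)])
    return synthetic

# Toy minority class (hypothetical values): four rows with two features.
minority_rows = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
new_rows = smote_sketch(minority_rows, n_new=6, k=2)
```

Because every synthetic row is an interpolation between two observed minority rows, it always lies within the range spanned by the observed minority data.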
Mixed Model Analysis
As noted, we used trial index as a grouping variable in mixed or random effects models to account for variation across the trials (see Methods). Final models only considered significant random intercept effects based on univariate model analysis for the various outcomes. The most significant predictors are described in Table 2a for the continuous outcomes and Table 2b for the dichotomous outcomes. No random effects survival analysis was pursued for the time on treatment outcome. Table 3 provides the significance levels for those predictors that exhibited significant (p < 0.05) random effects across the outcomes in univariate models and also exhibited significant fixed effects in the stepwise multiple linear and logistic regression analyses. Figure 3 provides an example plot of the random effects for lymphocyte count and 90-day mortality, assuming both a random y-intercept effect and a random slope effect in the model, and clearly shows that the relationship between lymphocyte count and 90-day mortality varies across the trials.
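For orientation, one common way to write a logistic mixed model with trial as the grouping variable and a single predictor is sketched here. The notation is an assumption made for illustration and is not taken from the Methods: i indexes participants, j indexes trials, x is the predictor (e.g., lymphocyte count), and y is the dichotomous outcome (e.g., 90-day mortality).

```latex
\operatorname{logit}\,\Pr(y_{ij} = 1)
  = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\,x_{ij},
\qquad
(u_{0j},\, u_{1j})^{\top} \sim \mathcal{N}(\mathbf{0},\, \boldsymbol{\Sigma})
```

Here the fixed effects are the overall intercept and slope, and the trial-specific deviations are the random intercept and random slope. A random-intercept-only model corresponds to setting the random slope to zero for every trial; for a continuous outcome, the logit on the left-hand side would be replaced by the outcome itself plus a residual error term.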
Time on Study. The strongest predictors for time on study include hemoglobin (HGB), WBC, lymphocyte count, LDH, albumin level, ECOG2 assessment, and breast cancer (left panels of Table 2a). WBC, LDH, and ECOG2 assessment exhibited negative relationships and the others had positive relationships. All but hemoglobin exhibited significant random intercept effects (Table 3, first panel).
Time on Therapy Ratio. DOPT, WBC, albumin, and uric acid were associated with time on therapy ratio (right panel of Table 2a). Only albumin and whether the therapy was targeted had negative effects, and only albumin and WBC exhibited random effects (Table 3, second panel).
90-Day Mortality. WBC, albumin, LDH, hemoglobin, BMI, DOPT and lymphocyte count were associated with 90-day mortality (left columns of Table 2b). Albumin, hemoglobin, BMI, DOPT, and lymphocyte count all had negative associations. BMI and DOPT did not exhibit random effects, but all the other predictor variables did (3rd column of Table 3).
Grade 5 Toxicity. Albumin, WBC, hemoglobin, and ECOG2 assessment were associated with grade 5 toxicity with albumin and hemoglobin exhibiting negative associations (right panel of Table 2b). Albumin, WBC and hemoglobin also exhibited random effects (last column of Table 3).
Sensitivity Analysis
We considered the potential influence of the use of imputed data on our results [33]. Supplementary Fig. 2 provides the results: a comparison of models using participants with and without imputed values did not reveal any notable or consistent differences across the various accuracy metrics. Supplementary Table 10 has the results for the 90-day mortality analyses. Results for the other outcomes are not shown but are available upon request.
Example Predictions
We pursued example predictions based on the fixed effects from the final random effects models (Tables 2a and 2b; see Methods). Supplementary Table 11 provides the values used for the predictions as well as the coefficients used in the linear and logistic regression models. The code to produce the predictions is available from the authors, but the use of the coefficients provided in Supplementary Table 11 should be straightforward.
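As a hedged illustration of how fixed-effect coefficients can be turned into a prediction, the sketch below computes a linear predictor (the predicted value for a continuous outcome) and passes it through the inverse logit to obtain a predicted probability for a dichotomous outcome. The coefficient and participant values are hypothetical placeholders chosen for easy arithmetic; they are not the values reported in the tables, and the function names are assumptions.

```python
import math

def linear_predictor(intercept, coefficients, values):
    """Intercept plus the sum of coefficient * value over all predictors."""
    return intercept + sum(
        coefficients[name] * values[name] for name in coefficients
    )

def predicted_probability(intercept, coefficients, values):
    """Inverse logit of the linear predictor, for a logistic model."""
    eta = linear_predictor(intercept, coefficients, values)
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients and participant values, for illustration only.
example_coefficients = {"albumin": -1.0, "wbc": 0.25}
example_participant = {"albumin": 4.0, "wbc": 8.0}
example_intercept = 2.0

eta = linear_predictor(example_intercept, example_coefficients, example_participant)
probability = predicted_probability(
    example_intercept, example_coefficients, example_participant
)
```

With these placeholder numbers the linear predictor is 2.0 - 4.0 + 2.0 = 0, so the predicted probability is 0.5. Because only fixed effects are used, a prediction of this kind describes a participant in a typical trial rather than in any specific trial.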