We trained prediction models for short-term mortality after colorectal cancer surgery based solely on preoperative covariates, achieving excellent discrimination and good calibration. Discrimination in terms of AUROC ranged from 0.871 to 0.876 and AUPRC from 0.35 to 0.54, while calibration-in-the-large ranged from 0.98 to 1.01, calibration slope from 1.001 to 1.02, calibration intercept from −0.06 to 0.05, and Brier score from 0.04 to 0.07, with solid calibration plots as seen in Figs. 5–7. Compared to models based on only age and sex as predictors, the data-driven prediction models showed vastly better performance. Based on the calibration plots, the model slightly underpredicted risks for patients with more than a 50% risk of mortality.
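For readers who wish to compute the same performance measures on their own data, the discrimination and calibration metrics reported above can be sketched as follows. This is an illustrative example on simulated data using standard scikit-learn functions, not the study's actual analysis pipeline; all variable names and the simulation itself are our assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, average_precision_score, brier_score_loss

rng = np.random.default_rng(0)
# Simulated predicted probabilities and binary outcomes (illustrative only)
p = rng.uniform(0.01, 0.6, 1000)
y = rng.binomial(1, p)

# Discrimination: AUROC and AUPRC; overall accuracy of probabilities: Brier score
auroc = roc_auc_score(y, p)
auprc = average_precision_score(y, p)
brier = brier_score_loss(y, p)

# Calibration slope: logistic regression of the outcome on the log-odds of p
# (a weak penalty approximates an unpenalized fit)
logit = np.log(p / (1 - p)).reshape(-1, 1)
cal = LogisticRegression(C=1e6).fit(logit, y)
slope = cal.coef_[0][0]

# Calibration-in-the-large as the ratio of observed to expected events
citl = y.mean() / p.mean()
```

Because the simulated probabilities are the true event probabilities, the calibration slope lands near 1 and calibration-in-the-large near 1, mirroring the well-calibrated ranges reported above.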
All predictors used in the prediction model could be available at a preoperative MDT-conference. The risk factors for short-term postoperative mortality are aligned with the current literature: increased age, high American Society of Anesthesiologists (ASA) score, exploratory procedures, and poor tumor differentiation were risk factors for mortality [13]. We found that predictors such as young age, low World Health Organization Performance Status (WHO PS), low ASA score, and a slightly overweight body mass index (BMI) were associated with a lower risk of death during the time at risk.
Designing prediction models targeted for clinical use is not a new phenomenon. The most well-known surgical risk assessment tool is the American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP) Surgical Risk Calculator. Its discriminative accuracy for 30-day mortality showed an AUROC of 0.944 and a Brier score of 0.011 [14]; however, in validation for colorectal cancer patients, performance was somewhat lower, with an AUROC of 0.86 and a Brier score of 0.018. Comparably, Fazio et al. designed a 30-day mortality model after colorectal surgery with an AUROC of 0.801 [15], van der Sluis et al. created the Identification of Risk in Colorectal Surgery score with an AUROC of 0.83 [16], and van den Bosch et al. designed a 30-day mortality model with an AUROC of 0.82 [17]. Generally, most studies do not report as many performance metrics, and a majority show no calibration measures such as calibration-in-the-large, calibration intercept, and calibration slope. The Brier score has previously been criticized for not being an optimal measure of performance and calibration in clinical models [18], and other parameters such as calibration-in-the-large are considered essential for external validation [19]. The four studies above all defined the covariates of their models up front; in contrast, our approach was to provide the LASSO model with all available covariates and let it exclude the irrelevant ones, retaining every covariate that affected the prediction. The resulting model requires 50 variables, encoded as 114–142 binary (positive or negative) covariates, which is a large amount of data to input into the model. However, this issue could be addressed by automating data retrieval for the models through software interfaces to the electronic health record (EHR).
Although having a large number of covariates might be impractical for input purposes, including many variables through a data-driven approach minimizes the bias introduced when experts select variables assumed to be important without considering all possible options. On the other hand, including variables without clinician-set boundaries may lead to greater variance in covariates and covariate weights. Fazio et al. used 6 preoperative covariates, van der Sluis et al. used 8 pre-, intra-, and postoperative covariates, and the ACS NSQIP Surgical Risk Calculator included 21 preoperative covariates. In comparison, we found 50 weighted preoperative predictors.
We view our model as a tool to estimate mortality risk and tailor different patient treatment trajectories, because the current treatment guidelines for colorectal cancer lead some patients to overtreatment and others to undertreatment, both with unnecessarily high risk for the patient. The model should be viewed as a decision-support tool rather than a decision-making tool: individual patient risks should be put into context by experienced clinicians and fuel multidisciplinary treatment approaches.
Knowledge about individual risks of mortality shortly after surgery can support the MDT-conferences in making individualized treatment plans that take all relevant risk factors into consideration. Personalizing treatment to risk profiles may limit both over- and undertreatment and their consequences.
A significant limitation of this study is the lack of external validation, which is essential for testing model generalizability and has been shown to improve clinicians' trust in the model and its predictions [20]. Also, due to the complexity of the treatment of colorectal cancer and the multitude of different variables in DCCG, some variables may be proxies for outcomes or actions in the patient course, which can lead to multicollinearity [21]. However, this is partly addressed by using LASSO logistic regression, whose penalty shrinks the weights of correlated variables and tends to drop redundant ones [22].
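As a minimal sketch of this behavior, an L1-penalized (LASSO) logistic regression drops most irrelevant columns and splits or prunes the weight of perfectly collinear ones. The simulated binary covariates below are our assumption for illustration only, not the DCCG data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, k = 2000, 30
# 30 binary covariates; column 1 is made a perfect duplicate (proxy) of column 0
X = rng.binomial(1, 0.3, size=(n, k)).astype(float)
X[:, 1] = X[:, 0]

# True outcome depends only on columns 0 and 2
logits = -2.0 + 1.5 * X[:, 0] + 1.0 * X[:, 2]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))

# L1 penalty (small C = strong regularization) zeroes out most noise columns
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.05).fit(X, y)
n_selected = int(np.sum(lasso.coef_[0] != 0))
```

With the duplicated column, the signal carried by column 0 is shared between the two collinear covariates, while most of the 28 noise columns receive a coefficient of exactly zero, which is the variable-selection property the model relies on.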
Strengths of this study include the development of a prediction model based on a large, national, validated quality-assurance health database covering more than 95% of all patients with colorectal cancer in Denmark, and the fact that the model only includes preoperative data, making it available as a clinical decision-support tool in the preoperative setting. The utilization of OMOP-CDM allows for future external validation and enrichment with data from other databases.
In conclusion, we found that designing a short-term mortality prediction model for colorectal cancer surgery using a data-driven approach and only preoperative covariates is feasible and leads to models with excellent discrimination and good calibration.