Our study analyzed and compared data from patients with and without SIC on the first day of sepsis. Two variants of machine-learning models (the full and the compact models) were developed, and both could dynamically predict SIC with significantly improved accuracy. The relationships between clinical variables and SIC were analyzed through model interpretation.
Our study compared the characteristics of the SIC and non-SIC groups at the onset of sepsis. As shown in Table 1, SIC patients were significantly younger but had worse physiological status (higher Simplified Acute Physiology Score [SAPS]-II, higher SOFA, and a higher rate of support treatment) than non-SIC patients. On the first day, the SIC group received more types of antibiotics and was given heparin less often. Interestingly, linezolid and vancomycin were administered to a higher proportion of SIC patients. This was probably because patients with SIC had more severe infections. On the other hand, these two antibiotics could themselves decrease platelet counts and exacerbate clotting abnormalities [27, 28]. Additionally, the SIC group had a significantly higher mortality rate and longer hospital and ICU stays than the non-SIC group, consistent with previous research [13].
Currently, there is a lack of reliable tools for the early prediction of coagulopathy in septic patients. Our study has demonstrated that advanced machine-learning algorithms can predict SIC with high accuracy and excellent AUC. They outperformed conventional logistic regression and SIC scores in both internal and external validation. CatBoost, an open-source gradient boosting algorithm, has not been widely adopted in critical care research. Gradient boosting is a powerful machine-learning technique that iteratively trains weak learners (e.g., decision trees), each fitted to the residuals of the preceding models. Among gradient boosting algorithms, CatBoost handles categorical features during training rather than at preprocessing time [29]. Categorical features therefore no longer need to be encoded beforehand, and a CatBoost model can be developed directly from raw data. Another advantage of the algorithm is that it uses a new scheme to calculate leaf values when selecting the tree structure. This scheme helps to reduce overfitting, a major problem that constrains the generalization ability of machine-learning models [29].
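The residual-fitting idea behind gradient boosting can be illustrated with a minimal sketch. It assumes a squared-error loss, a single numeric feature, and depth-one trees (stumps); the data and all function names are hypothetical and chosen only for illustration. It is not the CatBoost implementation and does not reproduce the models developed in this study.

```python
# Toy gradient boosting for squared-error loss using one-feature decision stumps.
# All names and data here are hypothetical; this is not the study's CatBoost model.

def fit_stump(xs, targets):
    """Return (threshold, left_mean, right_mean) minimizing squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest value so the right side is never empty
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def stump_value(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm

def fit_boosting(xs, ys, n_rounds=50, learning_rate=0.3):
    base = sum(ys) / len(ys)          # initial prediction: the mean outcome
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # Each new weak learner is fitted to what the current ensemble still gets wrong.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_value(stump, x) for p, x in zip(preds, xs)]
    return base, learning_rate, stumps

def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_value(s, x) for s in stumps)

def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)

# Hypothetical toy data: one feature and a binary outcome.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
baseline = fit_boosting(xs, ys, n_rounds=0)   # mean-only model
boosted = fit_boosting(xs, ys, n_rounds=50)
print(mse(baseline, xs, ys), mse(boosted, xs, ys))
```

In this sketch the training error of the boosted ensemble falls below that of the mean-only baseline because each round corrects part of the remaining residual. Production libraries add regularization, multiple features, and other loss functions on top of this basic loop.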
In this study, we developed two variants of CatBoost models that can identify patients at high risk of SIC and provide clinical decision-makers with more information. In general, models built on more variables discriminate better but are harder to use in clinical practice. Therefore, two model variants were developed for different application scenarios. The full model predicted SIC based on 88 clinical variables and reached the greatest AUC in this study. In external validation, the full model maintained good discrimination, with only a slight reduction in AUC. However, collecting 88 variables in order to apply this model is difficult. The full model is therefore recommended for hospitals with a well-designed clinical data system. By contrast, the compact model was trained on 15 selected variables and was designed to be as practical as possible while maintaining the necessary accuracy. As shown in Fig. 5, our models had high and comparable AUC across different patient cohorts, demonstrating that machine-learning models based on big data can generalize well. In addition, a web-based tool was developed to help clinicians use the compact model in clinical practice. After logging in to the website and entering the values of the 15 variables, the user receives the compact model's prediction together with an interpretation of that prediction.
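Discrimination across cohorts is summarized here by the AUC, which can be read as the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it. The sketch below computes this quantity directly from that definition. The labels, scores, and cohort names are hypothetical toy values, not results from this study.

```python
# AUC computed from its pairwise definition (ties count as half a win).
# The cohorts below are hypothetical toy data, not study results.

def auc(labels, scores):
    """Probability that a random positive case outranks a random negative case."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical internal cohort: 1 = outcome present, 0 = outcome absent.
internal_labels = [0, 0, 1, 1]
internal_scores = [0.1, 0.4, 0.35, 0.8]

# Hypothetical external cohort scored by the same model.
external_labels = [0, 1, 0, 1, 1]
external_scores = [0.2, 0.9, 0.6, 0.5, 0.7]

internal_auc = auc(internal_labels, internal_scores)
external_auc = auc(external_labels, external_scores)
print(internal_auc, external_auc)
```

Computing the same statistic separately in each cohort is what allows internal and external discrimination to be compared on a common scale.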
Interpretation of the full model showed that many clinical variables can help to indicate the risk of SIC. In this study, renal function indicators (urine output and creatinine) were second in importance only to the coagulation profile. As shown in Fig. 3, patients with poorer renal function (lower urine output and higher serum creatinine) tended to have a higher risk of SIC. Body mass index (BMI), vital signs (heart rate and mean arterial pressure), laboratory tests (such as lactate and white blood cell count), the use of MV and vasopressors, and SAPS-II scores can also help to assess the risk of SIC. In addition, prediction results can be interpreted at the instance level, as shown in Fig. 6, which makes our model clinically explainable.
Several limitations of this study should be considered. First, only septic adults in critical care were included, whereas other hospitalized sepsis cases were not analyzed. In addition, given the immaturity of the coagulation system in children, especially newborns, more research is needed on SIC in children with sepsis. Second, our models identify patients at high risk of SIC but do not indicate who will benefit from anticoagulant therapy. It is still up to clinicians to decide whether to administer anticoagulant agents. However, the progression from sepsis to severe coagulopathy is a continuum that arises from coagulation disorder. Early and accurate prediction of SIC can give clinicians more time to adjust treatment strategies and can also help in studying the potential effect of anticoagulant therapy at an early stage. Third, this is a retrospective observational study. Missing data and input errors exist, despite the very high quality of the MIMIC-IV and eICU-CRD databases. Therefore, prospective validation is still needed. Compared with septic shock, for which recent advances have led to significant improvements in survival, there is still a long way to go in the diagnosis and management of sepsis-associated coagulopathy.