In this study, we applied machine-learning algorithms to develop prognostic models for predicting mortality in confirmed cases of COVID-19. All models performed well in the overall population. In particular, the GBDT model outperformed the LR models in the subgroup with severe COVID-19. Furthermore, we developed a simplified LR-5 model with 5 indices as a convenient tool for clinicians; it showed an acceptable AUC and accuracy.
The demographic and clinical characteristics of this cohort were representative. Most of the risk factors found in non-survivors have been reported in previous studies [14–16]. The top ten features in the models included LDH, BUN, lymphocyte count, age, SpO2, platelets, CRP, IL-10, HDL-C, and SaO2, most of which have been repeatedly documented in the literature [6,17,18]. These variables reflect different aspects of COVID-19, for example respiratory failure (SpO2 and SaO2) and renal dysfunction (BUN). Notably, the indicators of systemic inflammation (LDH, CRP, IL-10, platelets) made up almost half of the top ten features. Systemic inflammation has been reported in severe COVID-19 [19]. The cytokine storm may play a crucial role in the development of respiratory failure and, consequently, organ failure [20,21]. Higher cytokine levels (IL-2R, IL-6, IL-10, and TNF-α) were found in the non-survivor group in this study, which is consistent with previous studies [21,22]. Moreover, one of the top ten features in the machine-learning models was IL-10, a cytokine with potent anti-inflammatory properties that can induce T cell exhaustion [23,24]. This might partially contribute to the lymphopenia seen in severe COVID-19.
The models in this study were derived from real-world data with comprehensive details; selection bias was therefore limited, and the results were more representative than those of other models. All three models performed well, with AUCs of 0.911–0.943 and NPVs exceeding 97%. However, the PPVs were relatively low, which is consistent with the other prediction models reported in the literature. The major reason for this could be the dynamic course of the disease. All the models in this study, as well as those in the literature, were derived from baseline data collected on admission, where high heterogeneity existed. A dynamic model might perform better.
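The combination of a high NPV and a low PPV is also partly an arithmetic consequence of a low event rate. The sketch below illustrates this with Bayes' rule; the function name `ppv_npv` and the sensitivity, specificity, and prevalence values are illustrative assumptions, not results from this study.

```python
def ppv_npv(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary classifier via Bayes' rule."""
    tp = sensitivity * prevalence              # true-positive fraction
    fn = (1 - sensitivity) * prevalence        # false-negative fraction
    tn = specificity * (1 - prevalence)        # true-negative fraction
    fp = (1 - specificity) * (1 - prevalence)  # false-positive fraction
    return tp / (tp + fp), tn / (tn + fn)


# Assumed operating point (NOT taken from the paper): 85% sensitivity,
# 90% specificity, evaluated at three hypothetical mortality rates.
for prevalence in (0.05, 0.10, 0.30):
    ppv, npv = ppv_npv(0.85, 0.90, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

At a fixed sensitivity and specificity, the PPV rises and the NPV falls as the event rate increases, so a low PPV in a cohort with few deaths does not by itself indicate poor discrimination.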
Compared with the LR models, GBDT performed better in mortality prediction in both the full cohort and the severity subgroups. GBDT is not sensitive to missing data and can therefore serve as a good tool for early detection of potentially critical patients and for optimizing the allocation of medical resources. In contrast, the LR model is fast to compute and yields results that are easy to interpret, which might make it more user-friendly in clinics. However, the full LR model included 161 features, and applying it could be cumbersome in daily clinical practice, especially when healthcare systems are confronting severe staff shortages. As a simplified model, the LR-5 model, which incorporates only 5 common variables and showed an excellent PPV and satisfactory accuracy, could be recommended as a simple tool for clinical use.
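To illustrate why a five-variable logistic model is easy to apply and interpret, the sketch below evaluates a generic logistic regression score. The function `lr_risk`, the variable names, and every coefficient are hypothetical placeholders chosen for illustration; the fitted LR-5 coefficients are not given in this section.

```python
import math


def lr_risk(features, coef, intercept):
    """Predicted probability from a logistic model: sigmoid(intercept + sum(b_i * x_i))."""
    z = intercept + sum(coef[name] * features[name] for name in coef)
    # Numerically stable sigmoid for large positive or negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# HYPOTHETICAL coefficients and intercept (not the fitted LR-5 model).
# Signs are assumed: higher LDH, urea and age raise risk; higher
# lymphocyte percentage and SpO2 lower it.
coef = {"ldh": 0.004, "urea": 0.10, "lymphocyte_pct": -0.08, "age": 0.05, "spo2": -0.10}
intercept = 3.0

# HYPOTHETICAL patient values, for illustration only.
patient = {"ldh": 239.0, "urea": 5.0, "lymphocyte_pct": 25.0, "age": 60, "spo2": 97}
print(f"predicted risk: {lr_risk(patient, coef, intercept):.3f}")
```

Each coefficient is the change in log-odds per unit of its variable, which is what makes such a model straightforward to read at the bedside.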
We also conducted an external validation test using data from Brunei. Between 29 February and 29 March 2020, a total of 72 confirmed cases of COVID-19 were followed, of whom 2 died (Supplemental Appendix E). For the LR-5 model, data on leukomonocyte (%), urea, age and SpO2 were collected for analysis, whereas data on LDH were unavailable. LDH was therefore filled with the median value estimated from the training set (median = 239 U/L). Individually, leukomonocyte (%) showed the highest AUC (0.917), followed by urea (0.867), age (0.826), and SpO2 (0.704) (data not shown). As a prediction tool, the LR-5 model showed a strong ability to predict death, with a very high AUC of 0.97. However, it should be noted that selection bias due to the small sample size cannot be ruled out, and further external validation in a larger sample is warranted.
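The two steps described here, filling a missing predictor with the training-set median and scoring discrimination by AUC, can be sketched as follows. The helper names `impute_ldh` and `auc` and the toy records are hypothetical; only the median of 239 U/L comes from the text.

```python
TRAINING_LDH_MEDIAN = 239.0  # U/L, training-set median reported in the text


def impute_ldh(records, fill_value=TRAINING_LDH_MEDIAN):
    """Return copies of the records with a missing or None 'ldh' set to fill_value."""
    return [
        {**r, "ldh": fill_value if r.get("ldh") is None else r["ldh"]}
        for r in records
    ]


def auc(scores, labels):
    """AUC as the fraction of (death, survivor) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy records (HYPOTHETICAL, not the Brunei data): LDH is missing throughout.
cohort = [{"age": 72, "died": 1}, {"age": 35, "died": 0}, {"age": 58, "died": 0}]
filled = impute_ldh(cohort)
print([r["ldh"] for r in filled])
print(auc([r["age"] for r in filled], [r["died"] for r in filled]))
```

Because every patient receives the same imputed LDH, that variable cannot contribute to ranking patients in such a cohort. With 2 deaths among 72 patients, the AUC is also computed over only 2 × 70 = 140 death-survivor pairs, which is one reason the estimate should be read cautiously.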
There were several limitations in this study. First, we relied on 5-fold cross-validation rather than external validation because of the lack of external data. Second, only Chinese patients were included; the generalisability and implementation of these models across different settings and populations remain unknown.
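For reference, 5-fold cross-validation partitions the cohort into five folds and holds out each fold once for testing. The sketch below is a minimal standard-library version; the function name `stratified_kfold` is hypothetical, and stratifying by outcome is an assumption, since the text does not say whether the folds were stratified.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with each class spread evenly over k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin into the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Toy outcome vector (HYPOTHETICAL): 10 deaths among 100 patients.
labels = [1] * 10 + [0] * 90
for train, test in stratified_kfold(labels):
    deaths = sum(labels[i] for i in test)
    print(f"train={len(train)}  test={len(test)}  deaths in test fold={deaths}")
```

Stratification keeps the number of deaths per fold roughly constant, which matters when the outcome is rare; cross-validation still reuses the same source population and so cannot replace validation in an independent cohort.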
In conclusion, three models were developed in this study. The GBDT model performed best across the severity subgroups. LR-5 is a simple tool for routine care.