Characteristics of the study population
The training dataset comprised 96 patients from China and the test dataset comprised 43 patients from Iran. The mean age of patients in the training and test datasets was 63.47 and 63.37 years, respectively. The patients in the two datasets differed in several characteristics at the time of admission (Table 1). Male patients accounted for 49 (51%) of the training dataset and 30 (69.8%) of the test dataset (P=0.039). Fever (89.6% versus 46.5%), fatigue (89.6% versus 42.2%), and diarrhea (20.8% versus 2.3%) were more common in the training dataset than in the test dataset. In addition, patients in the training dataset had faster respiratory rates (27.24 versus 22.76) than those in the test dataset. The proportion of deaths was roughly the same in the two datasets (32.3% versus 30.2%).
Feature selection
Figure 2 shows the results of the information gain ranking. The top 8 of the 60 available variables (LDH, NE, SaO2, LY, NLR, CKMB, D-dimer, and CRP) met the selection criterion (information gain > 0.2) and were selected for modeling. As shown in Supplementary Figure 1A, LDH, NE, SaO2, NLR, CKMB, D-dimer, and CRP were significantly higher and LY was lower in severe patients who died during hospitalization than in those who survived.
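The selection rule above can be illustrated with a minimal sketch of information gain for a discretized variable. This is an assumption-laden illustration, not the study's pipeline: the text does not state how continuous variables were discretized or which software was used, and the function names (`entropy`, `information_gain`, `select_features`) and the toy inputs in the usage note are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the outcome minus its entropy conditional on a
    discrete (already binned) feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_features(table, labels, threshold=0.2):
    """Return the names of features whose information gain exceeds the
    threshold, ranked from highest to lowest gain.

    `table` maps a feature name to its list of binned values; the
    default threshold mirrors the criterion quoted in the text.
    """
    gains = {name: information_gain(vals, labels) for name, vals in table.items()}
    kept = [name for name, gain in gains.items() if gain > threshold]
    return sorted(kept, key=lambda name: gains[name], reverse=True)
```

For example, with hypothetical toy data in which feature `a` separates the outcome perfectly and feature `b` carries no information, `select_features({"a": ["high", "high", "low", "low"], "b": ["x", "y", "x", "y"]}, [1, 1, 0, 0])` keeps only `a`.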
Derivation and validation of NSL model and NL model
When used individually to predict the risk of death, the top 8 ranked variables had AUCs ranging from 0.763 to 0.880, sensitivities from 73% to 100%, and specificities from 51% to 88% (Table 2). Each indicator predicted the risk of death well on its own, but there were exceptions: some patients with normal values for an indicator also died during hospitalization. Integrated prediction models were therefore needed to compensate for the shortcomings of any single indicator in predicting the risk of death.
In the modeling, we used as few variables as possible to facilitate clinical application. Because NE and LY had a reciprocal relationship and the integrated models were based on logistic regression, we established three model groups depending on whether NE, LY, or the neutrophil-to-lymphocyte ratio (NLR) was added to the model. Across all integrated models, AUCs ranged from 0.903 to 0.948, sensitivities from 77% to 97%, and specificities from 77% to 97% (Table 2). The integrated model combining all top 8 variables (AUC 0.945; sensitivity 97% and specificity 83%), the NSL model combining the top 3 variables (AUC 0.932; sensitivity 97% and specificity 78%; Supplementary Figure 1B), and the NL model combining NE and LDH (AUC 0.903; sensitivity 94% and specificity 82%; Supplementary Figure 1B) all had high sensitivity and specificity in predicting the risk of death. Considering the need for convenient clinical application, including in regions with limited medical resources, we selected the NSL model and NL model for validation in the test dataset. The NL model could be used in regions where patients’ SaO2 cannot be tested.
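Because the integrated models are logistic regressions, each one maps a weighted sum of its variables to a probability of death. The sketch below shows only that general form. The fitted coefficients are not given in this section (they are reported in Supplementary Table 1), so the function takes them as arguments, and the name `logistic_risk` and its parameters are hypothetical.

```python
from math import exp


def logistic_risk(intercept, coefficients, values):
    """Predicted probability from a logistic regression model.

    `coefficients` and `values` are dicts keyed by variable name
    (for an NSL-style model these would be NE, SaO2, and LDH).
    No fitted values are assumed here; the caller supplies them.
    """
    if set(coefficients) != set(values):
        raise ValueError("coefficients and values must cover the same variables")
    linear = intercept + sum(coefficients[name] * values[name] for name in coefficients)
    return 1.0 / (1.0 + exp(-linear))
```

A linear predictor of zero corresponds to a predicted risk of 0.5, and a positive coefficient means the predicted risk rises as that variable increases.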
In the test dataset, the NSL model (AUC 0.910; sensitivity 92% and specificity 96%) and the NL model (AUC 0.871; sensitivity 92% and specificity 82%) predicted in-hospital death with accuracy similar to that in the training dataset (Table 2 and Supplementary Figure 1C).
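The three metrics reported for each model can be computed as in the following sketch. It assumes that the outcome is coded 1 for death and 0 for survival and that a score at or above the cutoff predicts death; neither convention is stated in the text, and the function names are hypothetical. The AUC is computed as the proportion of death/survivor pairs in which the patient who died has the higher score, with ties counted as one half.

```python
def sensitivity_specificity(y_true, y_score, cutoff):
    """Sensitivity and specificity when score >= cutoff predicts death (1)."""
    pairs = list(zip(y_true, y_score))
    tp = sum(1 for y, s in pairs if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in pairs if y == 1 and s < cutoff)
    tn = sum(1 for y, s in pairs if y == 0 and s < cutoff)
    fp = sum(1 for y, s in pairs if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Applying the same two functions to the training and test predictions gives directly comparable figures, which is the comparison summarized in Table 2.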
Nomogram prediction for in-hospital death of severe patients
To help clinicians easily calculate the risk of mortality using the NSL model or NL model, we created two nomograms that graphically depict all indicators in the NSL model and NL model, respectively (Figure 3A,B). In both the training and test datasets, the calibration plots of the nomograms showed good agreement between the predicted risk and the observed probability of death (Figure 3C-F). The Hosmer–Lemeshow tests for the NSL model and NL model were not significant (P=0.47 and P=0.45), indicating no evidence of poor fit of either model for the prediction of in-hospital death from COVID-19.
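For readers unfamiliar with the Hosmer–Lemeshow test, the sketch below computes its test statistic: patients are sorted by predicted risk, split into groups, and the observed number of deaths in each group is compared with the number expected from the predictions. The number of groups used in the study is not stated, so the default of 10 is only the conventional choice, and the function name is hypothetical. The P value is conventionally obtained by referring the statistic to a chi-square distribution with (number of groups − 2) degrees of freedom, which is left out here to keep the example dependency-free.

```python
def hosmer_lemeshow_statistic(y_true, y_prob, n_groups=10):
    """Hosmer-Lemeshow chi-square statistic.

    y_true: outcomes coded 1 (death) or 0 (survival).
    y_prob: predicted probabilities of death from the model.
    """
    pairs = sorted(zip(y_prob, y_true))  # order patients by predicted risk
    n = len(pairs)
    statistic = 0.0
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)  # expected deaths in the group
        observed = sum(y for _, y in chunk)  # observed deaths in the group
        mean_p = expected / size
        denominator = size * mean_p * (1.0 - mean_p)
        if denominator > 0:
            statistic += (observed - expected) ** 2 / denominator
    return statistic
```

A small statistic (and therefore a large P value) means that observed and expected deaths agree closely within each risk group.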
Development of risk scoring system for predicting in-hospital death
In addition to providing nomograms to help clinicians predict the mortality risk of severe patients, we also developed two risk scoring systems based on the NSL model and NL model. As shown in Table 3, simple point systems were developed based on the logistic regression coefficients (Supplementary Table 1) and reference values for each significant risk factor. The NSL risk score included NE (16 points), SaO2 (9 points), and LDH (9 points). The total points ranged from 0 to 34, and the risk of death increased with increasing total points. Points of 0–13 were associated with a less than 10% risk of death and points of 14–20 with a 10–50% risk of death. Finally, points above 20 were associated with an extremely high risk of death of over 50%. The cut-off of the NSL risk score for the prediction of death in the training dataset was 15 (sensitivity 94% and specificity 82%, Supplementary Table 2). The AUCs of the NSL risk score were 0.928 and 0.901 in the training and test datasets, respectively. In addition, the NL risk score included NE (16 points) and LDH (9 points). The score ranged from 0 to 25. The AUCs of the NL risk score were 0.895 and 0.857 in the training and test datasets, respectively. Points of 0–9 were associated with a less than 10% risk of death, points of 10–15 with a 10–50% risk of death, and points above 16 with an extremely high risk of death of over 50%. The cut-off of the NL risk score for the prediction of death in the training dataset was 12 (sensitivity 94% and specificity 75%, Supplementary Table 2). In clinical practice, clinicians can calculate the risk score of each patient at admission based on the points provided in Table 3 and Table 4.
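The mapping from a total NSL score to its risk band can be written as a simple lookup, sketched below. The bands are taken from the text, under the assumptions that points are whole numbers and that "above 20" means 21 to 34. The per-variable point assignments are in Table 3 and are not reproduced here, so the function takes the already summed total. The names `NSL_BANDS` and `nsl_risk_band` are hypothetical.

```python
# (lowest points, highest points, risk of in-hospital death) per the text
NSL_BANDS = (
    (0, 13, "less than 10%"),
    (14, 20, "10-50%"),
    (21, 34, "over 50%"),
)


def nsl_risk_band(total_points):
    """Return the risk band for a total NSL score between 0 and 34."""
    if isinstance(total_points, bool) or not isinstance(total_points, int):
        raise TypeError("total_points must be a whole number")
    for low, high, label in NSL_BANDS:
        if low <= total_points <= high:
            return label
    raise ValueError("NSL total points must be between 0 and 34")
```

For example, a patient with a total of 18 points falls in the 10–50% band. An analogous table could be written for the NL score once its band boundaries are fixed.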