Our study developed and validated a COVID-19 mortality prediction model based on clinical and epidemiological data of 4,049 patients with confirmed COVID-19 recruited by the Korea Centers for Disease Control and Prevention. The high AUC of 0.9684 indicated good reliability and performance of our model.
The clinical course of COVID-19 ranges from asymptomatic infection to acute respiratory distress syndrome (ARDS) and death. The longer the global pandemic lasts, the sooner medical resources run short. Therefore, evidence-based, differentiated patient management is required. Risk stratification also provides evidence for allocating medical resources efficiently when they are limited4.
Several previous Korean studies have reviewed the characteristics of COVID-19 deaths. The Korean Society of Infectious Diseases and the Korea Centers for Disease Control and Prevention analyzed 54 COVID-19 deaths that occurred between the first death on February 19 and March 10, 2020. The median age of the deceased was 75.5 years, and 61.1% were men. Most of these patients also had underlying diseases such as hypertension, heart disease, diabetes, dementia, and stroke18. Another Korean study, based on medical chart review, focused on 20 deaths in Gyeongbuk Province and Daegu city, where the second outbreak wave occurred in February19. The mean age of the deceased was 72 years; 55.1% were women, and 74.5% had an underlying disease. The median time from hospitalization to death was 8 days. Comorbidities such as diabetes, chronic lung disease, and chronic neurologic disease were significant risk factors for COVID-19 mortality. Clinical manifestations observed before death were abnormal heart rate, systolic blood pressure, respiratory rate, and oxygen saturation measured by pulse oximetry on room air, as well as altered mental status19.
Although these two studies described the clinical characteristics of the deceased in detail at the level of descriptive epidemiology, contributing to an overall understanding of COVID-19 patients, their numbers of cases were relatively small and insufficient for associational inference.
One study developed an evidence-based COVID-19 prognostic model for military personnel in Korea20. Although its generalizability is limited because it was developed for soldiers, it reported age, body temperature, physical activity, history of cardiovascular disease, hypertension, visit to a region with an outbreak, feverishness, dyspnea, lethargy, and chills as significant predictors (overall C statistic: 0.963; 95% CI: 0.936–0.99)15.
Another COVID-19 mortality prediction model was developed using machine learning15, based on 10,237 patients with confirmed COVID-19 (228 deaths) recruited between January 20 and April 16, 2020. This model used various variables, including socioeconomic status, obtained through linkage with the National Health Insurance Service (NHIS) database15. However, specific clinical and epidemiological variables were lacking because that study focused on linkage with NHIS data. For mortality prediction, that study used LASSO and a linear support vector machine (SVM), with AUCs of 0.963 and 0.962, respectively15. The most significant factors in the LASSO model were old age, preexisting diabetes mellitus, and cancer, whereas those in the random forest model were old age, infection route (cluster infection or infection through personal contact), and underlying hypertension15. However, owing to the lack of specific clinical variables, that model could not be immediately applied in the field or in clinics.
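LASSO adds an L1 penalty to the logistic log-likelihood, which shrinks some coefficients exactly to zero and thereby selects variables; this is what allows a LASSO model to report a short list of "most significant factors". The stdlib-only Python sketch below fits such a model by proximal gradient descent (ISTA) on simulated data. The variable names, effect sizes, penalty value (lam=0.1), and the solver are assumptions chosen for illustration; none of them are taken from the cited study.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    # Proximal operator of t*|w|: shrinks v toward zero and returns exactly 0 when |v| <= t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fit_lasso_logistic(X, y, lam, lr=1.0, n_iter=1000):
    """L1-penalised logistic regression fitted by proximal gradient descent (ISTA).

    Minimises mean log-loss + lam * sum(|w_j|); the intercept b0 is not penalised.
    """
    n, p = len(X), len(X[0])
    b0, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, xi))) - yi  # residual
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        b0 -= lr * g0 / n  # plain gradient step for the unpenalised intercept
        w = [soft_threshold(w[j] - lr * g[j] / n, lr * lam) for j in range(p)]
    return b0, w

# Hypothetical simulated cohort: standardised age, a binary comorbidity, and a pure-noise column.
rng = random.Random(0)
X, y = [], []
for _ in range(300):
    age_z, diabetes, noise = rng.gauss(0, 1), float(rng.random() < 0.3), rng.gauss(0, 1)
    X.append([age_z, diabetes, noise])
    y.append(1 if rng.random() < sigmoid(-2.0 + 2.5 * age_z + 1.0 * diabetes) else 0)

b0, w = fit_lasso_logistic(X, y, lam=0.1)
for name, coef in zip(["age_z", "diabetes", "noise"], w):
    # A coefficient of exactly 0 means the L1 penalty dropped the variable;
    # at this penalty a weaker true predictor can be dropped as well.
    print(f"{name:>8}: {coef:+.3f}")
```

Raising lam removes more variables; in practice the penalty would be tuned, for example by cross-validation.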
Studies from other countries have reported substantial heterogeneity in the clinical course and prognosis of patients with confirmed COVID-19, ranging from asymptomatic infection to mild, moderate, and severe disease with low survival rates21–23. A COVID-19 mortality prediction model was previously developed by analyzing data from 3,841 patients with confirmed COVID-19 in New York, USA, recruited from March 9 to April 6, 2020, using machine learning20. Sex, age, race, oxygen saturation, COPD, hypertension, and diabetes were significant variables in that model, which achieved AUCs of 0.91 to 0.94. However, blood test results were not included in that model. In that study, minimum oxygen saturation was emphasized as a central factor in mortality prediction20.
A study from Israel, conducted early in the COVID-19 pandemic when individual-level data were not yet available, estimated the risk of COVID-19 mortality7. That study adopted a hybrid methodology under the hypothesis that the risk of severe respiratory infection or sepsis shares a common etiology with the risk from COVID-19. Its major predictors were age, lymphocytes, and albumin, with an AUC of 0.8207. The predictors found in the Israeli study were similar to those in our study, but the predictive power of our model was much higher (AUC of about 0.97).
A systematic review of 13 studies on models for the diagnosis and prognosis of COVID-19 found that most of the reviewed models did not perform sufficiently as predictive models because of a high risk of bias, and it called for collaborative efforts based on documented individual participant data13.
Another prediction model was developed by analyzing 53,001 ICU patients from the US Medical Information Mart for Intensive Care (MIMIC) who required mechanical ventilation or were diagnosed with pneumonia. When that model was applied to 114 patients with confirmed COVID-19, the reported AUCs at 12, 24, 48, and 72 hours were 0.82, 0.81, 0.77, and 0.75, respectively24.
Our study probably used the largest dataset to date for predicting COVID-19 mortality with specific clinical features of COVID-19 patients in Korea. The main advantage of our study is that the clinical and epidemiological variables were collected at the time of diagnosis. Results could thus be obtained after a certain period of contact with the health system or right after the diagnosis of COVID-19. Although we used only logistic regression, both the development and validation sets showed high AUCs (0.9656 and 0.9684, respectively).
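Since the model is a plain logistic regression assessed by AUC in development and validation sets, that workflow can be sketched as follows. This is a minimal, stdlib-only Python illustration on simulated data, not the analysis code of this study: the variables (age_z, comorbid), effect sizes, the 2:1 split, and the gradient-descent fitter are all assumptions. The auc function computes the AUC as the probability that a randomly chosen death receives a higher predicted risk than a randomly chosen survivor (the Mann-Whitney formulation).

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, n_iter=500):
    """Unpenalised logistic regression fitted by batch gradient descent on the mean log-loss."""
    n, p = len(X), len(X[0])
    b0, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        b0 -= lr * g0 / n
        w = [wj - lr * gj / n for wj, gj in zip(w, g)]
    return b0, w

def predict(model, X):
    # Predicted probability of death for each row of X.
    b0, w = model
    return [sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]

def auc(scores, labels):
    """P(score of a random death > score of a random survivor); ties count 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical simulated cohort: standardised age and one binary comorbidity.
rng = random.Random(42)
X, y = [], []
for _ in range(600):
    age_z, comorbid = rng.gauss(0, 1), float(rng.random() < 0.4)
    X.append([age_z, comorbid])
    y.append(1 if rng.random() < sigmoid(-2.5 + 2.0 * age_z + 1.0 * comorbid) else 0)

# Development/validation split (2:1 by order here, since the simulated rows are i.i.d.).
X_dev, y_dev, X_val, y_val = X[:400], y[:400], X[400:], y[400:]
model = fit_logistic(X_dev, y_dev)
auc_dev = auc(predict(model, X_dev), y_dev)
auc_val = auc(predict(model, X_val), y_val)
print(f"development AUC = {auc_dev:.4f}, validation AUC = {auc_val:.4f}")
```

Because the AUC depends only on how patients are ranked, it measures discrimination; calibration of the predicted probabilities would need to be checked separately.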
Moreover, with our model, the factors associated with an individual's high mortality risk can be easily interpreted according to the detailed algorithm shown in the model. For this reason, our model has high practical value for risk stratification in clinical settings.
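Because a logistic regression is additive on the log-odds scale, each coefficient can be read as an odds ratio, and each patient's predicted risk can be decomposed into per-variable contributions. The sketch below illustrates this interpretation in Python; the intercept, coefficients, and variable names (age_decades_over_50, diabetes, dyspnea) are hypothetical values invented for illustration and are not the estimates of our model.

```python
import math

# Hypothetical coefficients for illustration only; NOT the estimates of our model.
INTERCEPT = -6.0
COEFS = {
    "age_decades_over_50": 0.9,  # (age - 50) / 10
    "diabetes": 0.6,             # 1 if present, else 0
    "dyspnea": 1.1,              # 1 if present, else 0
}

def odds_ratios(coefs):
    # exp(beta): multiplicative change in the odds of death per one-unit increase.
    return {name: math.exp(b) for name, b in coefs.items()}

def log_odds_contributions(patient, coefs):
    # Additive contribution of each variable to the patient's log-odds of death.
    return {name: b * patient.get(name, 0.0) for name, b in coefs.items()}

def predicted_risk(patient, intercept=INTERCEPT, coefs=COEFS):
    # Inverse-logit of the intercept plus all contributions.
    logit = intercept + sum(log_odds_contributions(patient, coefs).values())
    return 1.0 / (1.0 + math.exp(-logit))

patient = {"age_decades_over_50": 3.0, "diabetes": 1.0, "dyspnea": 1.0}  # an 80-year-old
print({k: round(v, 2) for k, v in odds_ratios(COEFS).items()})
print(log_odds_contributions(patient, COEFS))
print(f"predicted risk of death: {predicted_risk(patient):.3f}")
```

For the hypothetical 80-year-old with diabetes and dyspnea, the intercept and contributions sum to a log-odds of -1.6, which corresponds to a predicted risk of about 17%.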
The main limitation of our study concerns validation. Although our dataset was relatively large and included specific clinical features, we could conduct only internal validation because no Korean dataset of similar size with comparable variables was available. Thus, the model's performance may be overestimated, and our results should be interpreted with caution. An external validation study using data from patients with COVID-19 diagnosed afterward is required.
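One common way to estimate how much an internally validated AUC is overestimated is Harrell-style bootstrap optimism correction: refit the model on bootstrap resamples, measure how much better each refit performs on its own resample than on the original data, and subtract the mean of that optimism from the apparent AUC. The stdlib-only Python sketch below illustrates the idea on simulated data. It is not the validation procedure of our study; the data, the number of bootstrap replicates (20, kept small for speed), and the gradient-descent fitter are assumptions.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, n_iter=200):
    """Logistic regression by batch gradient descent (few iterations, for speed)."""
    n, p = len(X), len(X[0])
    b0, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        b0 -= lr * g0 / n
        w = [wj - lr * gj / n for wj, gj in zip(w, g)]
    return b0, w

def predict(model, X):
    b0, w = model
    return [sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]

def auc(scores, labels):
    # Mann-Whitney estimate of the AUC; ties count 1/2.
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one death and one survivor")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def optimism_corrected_auc(X, y, n_boot=20, seed=0):
    """Apparent AUC minus the mean bootstrap optimism."""
    rng = random.Random(seed)
    n = len(X)
    apparent = auc(predict(fit_logistic(X, y), X), y)
    optimisms = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        m = fit_logistic(Xb, yb)
        # Optimism: AUC on the resample the model was fitted to minus AUC on the original data.
        optimisms.append(auc(predict(m, Xb), yb) - auc(predict(m, X), y))
    return apparent, apparent - sum(optimisms) / n_boot, optimisms

# Hypothetical simulated cohort: two true predictors plus two pure-noise columns.
rng = random.Random(7)
X, y = [], []
for _ in range(120):
    age_z, comorbid = rng.gauss(0, 1), float(rng.random() < 0.4)
    X.append([age_z, comorbid, rng.gauss(0, 1), rng.gauss(0, 1)])
    y.append(1 if rng.random() < sigmoid(-1.5 + 1.5 * age_z + 1.0 * comorbid) else 0)

apparent, corrected, optimisms = optimism_corrected_auc(X, y)
print(f"apparent AUC = {apparent:.4f}, optimism-corrected AUC = {corrected:.4f}")
```

Because every resample is drawn from the same cohort, this correction only quantifies overfitting within that population; it complements, but does not replace, external validation in patients diagnosed later.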