In this study, the presence and extent of lung involvement on the initial CXR of COVID-19 patients had prognostic value in both univariable and multivariable analyses. In the first multivariable models, based on age, gender and radiological features, the ExtScoreCXR was the strongest predictor of severity and the second strongest predictor of in-hospital mortality after age. However, adding parameters usually recorded at admission significantly improved the predictive accuracy of the models. These results show that CXR is most useful as a prognostic tool in COVID-19 when considered together with SatO2/FiO2, age and laboratory parameters. Even with careful analysis of the imaging, slight opacities, overlapping structures, indeterminate opacities or a normal CXR are not uncommon (31.8% in our series). Prognostic models aim to support safe decision-making, avoiding both the discharge home of patients who require hospital care and unnecessary hospitalizations or CT overuse.
We also confirmed the strong negative correlation between the extent of lung involvement and SatO2/FiO2. This supports the widely accepted indication for CT pulmonary angiography in cases of oxygen desaturation or dyspnea with normal or mild lung involvement on CXR [15], in order to look for extensive slight lung opacities not visible on CXR or for pulmonary thrombosis/embolism [16]. Consistent with this, both prognostic models include SatO2/FiO2 as a strong predictor.
The distribution and density of the opacities were not sufficiently strong predictors to be retained in the final models.
In the literature, a CXR severity score in the ED was predictive of the risk of hospital admission and intubation in COVID-19 patients aged 21-50 [6]. A risk score that included CXR abnormality as a predictor, but without assessing its extent, showed an AUC-ROC of 0.88 [17]. On admission chest CT, a proportion of well-aerated lung parenchyma below 73% was associated with ICU admission or death after adjustment for patient demographics and clinical parameters [10]. The extent of lung involvement was also associated with worse outcomes in severe acute respiratory syndrome [18, 19].
The most frequently reported predictors of severe prognosis in patients with COVID-19 include age, sex, features derived from CT, CRP, LDH and lymphocyte count [9, 20], and the most frequently reported predictors of mortality are older age [21-23] and D-dimer level [20, 21]. These predictors coincide with most of those we observed in the multivariable analysis and included in the two predictive models (SatO2/FiO2, age, CRP, lymphocyte count, ExtScoreCXR, LDH, D-dimer level and platelet count).
Days with symptoms, clinical presentation, institutionalization, comorbidities and the remaining CXR features did not show enough predictive power to be included in the models. The number of days with symptoms on arrival at the ED was not related to the extent of lung involvement either. Another series likewise found no significant difference between severe and non-severe patients in the median number of days from symptom onset to hospital admission [24]. Tobacco use, comorbidities such as hypertension, diabetes, cardiovascular disease, respiratory diseases and cancer history, and the presence of fever, dyspnoea, haemoptysis and unconsciousness were also associated with a worse prognosis in some publications [17, 23, 25], but not in our study. The ExtScoreCXR and laboratory parameters had a large impact on the models, in contrast to symptoms and comorbidities. This supports performing these tests in all COVID-19 patients with viral symptomatology, regardless of the type of symptoms and chronic diseases.
Adding a CNN-based diagnostic tool produced a non-significant improvement in the predictive metrics of the mortality prediction model, probably because only the “consolidation” and “lung opacity” indices, and not the extent, were included as predictors. A CNN-based extent score promises to stage disease severity on the CXR of COVID-19 patients, and its weight in a predictive model remains to be investigated [26, 27].
The National Early Warning Score 2 (NEWS2), based on vital signs, is the most widely used score in the ED. Its predictive accuracy in COVID-19 patients is higher than that of other clinical risk scores [28, 29], but the models developed in this study exceed this accuracy (AUC-ROC=0.94 and AUC-ROC=0.97 for severity and mortality, respectively), as expected given the addition of other relevant variables.
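For readers less familiar with the metric, the AUC-ROC values quoted above can be read as the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it. The following is a minimal sketch of that rank-based definition only; the function name `auc_roc` and the toy labels and scores are hypothetical and are not taken from the study data or the authors' code.

```python
def auc_roc(labels, scores):
    """Rank-based AUC-ROC: fraction of (positive, negative) pairs in which
    the positive case has the higher score; ties count as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy example: 1 = outcome occurred, 0 = it did not.
toy_labels = [0, 0, 1, 0, 1, 1]
toy_scores = [0.10, 0.35, 0.40, 0.60, 0.80, 0.90]
toy_auc = auc_roc(toy_labels, toy_scores)  # 8 of 9 pairs correctly ordered
```

An AUC-ROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.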
Recently, a multivariable model including CXR at admission was developed to predict critical illness in hospitalized COVID-19 patients [7]. The predictors that remained in the model were male gender, obstructive lung disease, symptom duration > 7 days, neutrophil count, CRP, LDH, distribution of lung disease and CXR score, with an AUC-ROC=0.77. CRP, LDH and the extent of lung involvement on CXR are also included in our final model, but the remaining predictors do not coincide. This is probably explained by differences in model development methodology. First, the feature selection strategy differed: they used a univariable statistical test, whereas we based our selection on the correlation among parameters and between each parameter and the clinical outcome. Second, the data pre-processing steps differed, as we combined over- and under-sampling techniques with data standardization. Third, different model architectures were considered: they employed a multivariable logistic regression, which relies on transformations for non-linear features. To overcome this limitation, we tested three model architectures: Support Vector Machine, Random Forest and Gradient Boosting, which can handle non-linear features as well as their interactions, and perform well in a large feature space.
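The correlation-based feature selection mentioned above could be sketched as follows. This is only an illustration under stated assumptions: the text does not specify the correlation measure or any cut-offs, so the use of Pearson correlation, the thresholds `min_outcome_r` and `max_pair_r`, the function names and the toy feature values are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(features, outcome, min_outcome_r=0.3, max_pair_r=0.8):
    """Keep features related to the outcome and not redundant with each other.

    Both thresholds are assumed values chosen for illustration only.
    """
    relevance = {name: abs(pearson(col, outcome)) for name, col in features.items()}
    # Rank candidates by strength of association with the outcome.
    ranked = sorted(
        (name for name in relevance if relevance[name] >= min_outcome_r),
        key=lambda name: relevance[name],
        reverse=True,
    )
    selected = []
    for name in ranked:
        # Drop a candidate that is highly correlated with one already kept.
        if all(abs(pearson(features[name], features[kept])) < max_pair_r
               for kept in selected):
            selected.append(name)
    return selected


# Hypothetical toy data (not study data).
toy_outcome = [0, 0, 0, 1, 1, 1]
toy_features = {
    "extent_score": [1, 2, 3, 4, 5, 6],
    "extent_score_rescaled": [2, 4, 6, 8, 10, 13],  # nearly redundant copy
    "noise_marker": [3, 1, 2, 3, 1, 2],             # unrelated to outcome
}
toy_selected = select_features(toy_features, toy_outcome)
```

In this toy case only `extent_score` is kept: the rescaled copy is discarded as redundant and the noise marker as uninformative.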
As a potential source of bias, the severity level is a decision-based clinical outcome with a certain degree of inter- and intra-observer variability, unlike mortality. However, decisions about hospital or ICU stay and treatment followed an agreed action guide that depended on the patient's clinical status, which decreased this variability. Also, the proportion of patients with critical evolution (22%) was within the range published in larger series (15-36%) [30, 31]. The number of comorbidities was not analysed because it was strongly associated with age, which was included as a predictor. Internal validation was performed with 88 cases, whereas a minimum sample size of 100 has been recommended to achieve a robust validation [32]. In addition, an external validation with cases from other hospitals is desirable to assess the generalizability of the developed models and their potential use in daily clinical practice.
In conclusion, the developed multivariable prognosis prediction models showed a high predictive accuracy that could allow triage of symptomatic COVID-19 patients in the ED and improve decision-making. The application to estimate the severity level and the in-hospital mortality is available at http://upv.datahub.egi.eu:30054/hulafecovid19models and should be validated in different EDs.