Parsimonious models combining only one quantitative lung CT density parameter (either SD-CT, Q875, or F1) and one clinical parameter (Neutrophil-Lymphocyte Ratio) allowed accurate prediction of critical illness (ICU, mechanical ventilation/ECMO and/or death) in Covid-19 patients, with AUC ranging from 0.82 to 0.92, and prediction of hospital length-of-stay while controlling for the mortality risk. Remarkably, the performance was not adversely affected by the presence of IV contrast in the CT images and was even slightly better in the contrast-enhanced group, although the lack of randomization for the IV contrast precludes evaluation of whether this factor leads to better predictions.
The three best CT parameters were Q875, F1 and SD-CT. SD-CT is a well-established a priori global radiomic feature that reflects the spread of the CT histogram and is widely available in commercial lung imaging software. In this study, the frequently used parameters MLA, Skewness and Kurtosis6 performed slightly worse than SD-CT, Q875, or F1 (see Table 1) and were excluded from the final models. Q875 is another radiomic parameter, defined as the HU value reached when 87.5% of the lung voxels have been counted (starting with the lowest densities). Q875 is the high-density counterpart of the 15th percentile density index (PD15) used to quantify the severity of emphysema in lung CT densitometry. The F1 parameter results from the functional principal component analysis (FPCA) of the CT histograms in the patient cohort. Without a priori knowledge or information about the patient outcome, FPCA extracts the main modes of variation in the sample of CT histograms. F1 score values represent the degree to which a CT histogram shifts from homogeneous low lung densities (better outcome) toward heterogeneous, much higher densities (worse outcome); see Fig. 1. In the current study, the other modes of variation (F2, F3, etc.) were not predictive of the patient outcome. F2 seems to reflect the transition from normal lung densities to extended ground glass opacifications (about -800 HU to -600 HU) and was not a significant predictor of critically ill status (AUC: 0.53, P = 0.60). The overall results using FPCA are consistent with previous ones for pulmonary disease subtyping18,19 or patient neurologic outcome prediction20, confirming the value of the FPCA method. First, as a non-specific data-driven exploration tool, it offers interpretable modes of variation of the CT histograms in the whole patient cohort.
Second, it is a generic method that gives accurate predictors related to histogram variations without a priori knowledge or a delicate selection among high-dimensional radiomic parameters. All three CT predictors are highly correlated (Spearman rho: Q875 vs. F1, 0.97; Q875 vs. SD-CT, 0.90; F1 vs. SD-CT, 0.92) and practically interchangeable. However, the data-driven FPCA approach offers a unique tool for analyzing the CT histograms in the whole patient cohort. Current machine learning research is actively extending the FPCA method with supervised FPCA21, multivariate FPCA22 (neuroimaging data), robust FPCA23, etc., offering a rich toolbox for future medical imaging studies. See also Pratt et al.24 for a recent application in pulmonology.
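The three histogram features discussed above can be illustrated with a minimal sketch. It is not the implementation used in this study. The nearest-rank percentile convention, the HU grid (-1000 to 200 HU in 24 bins), and the function names are all assumptions made for illustration. The F1-like score is approximated by an ordinary principal component analysis of discretized histograms (computed by power iteration), whereas a full FPCA would typically include a smoothing or basis-expansion step.

```python
import math
import statistics


def q875(hu_values):
    """HU value reached once 87.5% of voxels are counted from the lowest density."""
    ordered = sorted(hu_values)
    rank = math.ceil(0.875 * len(ordered))  # nearest-rank percentile (assumed convention)
    return ordered[rank - 1]


def sd_ct(hu_values):
    """Standard deviation of the lung voxel densities (spread of the histogram)."""
    return statistics.pstdev(hu_values)


def density_histogram(hu_values, lo=-1000.0, hi=200.0, n_bins=24):
    """Histogram on a fixed HU grid, normalized to sum to 1 (the grid is an assumption)."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in hu_values:
        if lo <= v < hi:
            counts[int((v - lo) // width)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def first_mode_scores(histograms, n_iter=200):
    """Scores of each histogram on the leading principal component of the cohort."""
    n, p = len(histograms), len(histograms[0])
    mean = [sum(h[j] for h in histograms) / n for j in range(p)]
    centered = [[h[j] - mean[j] for j in range(p)] for h in histograms]
    # Start from a ramp over the HU bins: a uniform start vector would be
    # orthogonal to every centered histogram, since each one sums to zero.
    ramp = [j - (p - 1) / 2 for j in range(p)]
    v = ramp[:]
    for _ in range(n_iter):
        proj = [sum(r[j] * v[j] for j in range(p)) for r in centered]
        w = [sum(centered[i][j] * proj[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("histograms show no variation along the start vector")
        v = [x / norm for x in w]
    # Orient the component so that higher scores mean a shift to higher densities.
    if sum(a * b for a, b in zip(v, ramp)) < 0:
        v = [-x for x in v]
    return [sum(r[j] * v[j] for j in range(p)) for r in centered]


# Made-up voxel samples for three hypothetical patients.
cohort = [
    [-850] * 90 + [-300] * 10,
    [-850] * 70 + [-300] * 30,
    [-850] * 40 + [-300] * 40 + [0] * 20,
]
scores = first_mode_scores([density_histogram(p) for p in cohort])
```

In this toy cohort the score increases as the histogram shifts toward higher densities, which mirrors the interpretation given to F1 in the text.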
The good performance of CT density features for patient outcome prediction in COVID patients is concordant with results from a few previous studies10–12, including Lanza et al.11, who showed that COVID patients requiring oxygenation and ventilation had higher amounts of compromised lung volume (-500 to 100 HU), statistically significant at 6–23% and greater than 23%, respectively. Another large study by Colombi et al. showed that a software-calculated percentage of well-aerated lung on CT of 71% or less (OR, 3.8; 95% CI: 1.9, 7.5; P < .001) was associated with ICU admission or death12.
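A threshold-based measure of this kind reduces to counting voxels in a HU window. The sketch below is illustrative only: the function names are hypothetical, the window bounds are taken as inclusive by assumption, and the category labels simply restate the 6–23% and greater-than-23% ranges quoted above rather than any validated decision rule.

```python
def compromised_lung_percent(hu_values, lo=-500, hi=100):
    """Percentage of lung voxels whose density lies in [lo, hi] HU (inclusive bounds assumed)."""
    in_range = sum(1 for v in hu_values if lo <= v <= hi)
    return 100.0 * in_range / len(hu_values)


def compromised_range(percent):
    """Map a percentage to the ranges quoted in the text (boundary handling is an assumption)."""
    if percent > 23.0:
        return "greater than 23%"
    if percent >= 6.0:
        return "6-23%"
    return "below 6%"


# Made-up voxel sample: 80% aerated lung, 20% inside the compromised window.
voxels = [-900] * 80 + [-400] * 15 + [50] * 5
percent = compromised_lung_percent(voxels)
label = compromised_range(percent)
```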
Lung volume is known to affect the lung CT density in a complex way. First, optimal lung inflation is difficult to obtain in severe acute lung disease and spirometry-controlled lung CT is often not feasible, so partially inflated lungs may increase the apparent lung CT density. Bressem et al.10 have mentioned the potential confounding effect of the lung volume variation among patients when using CT density as a biomarker in Covid-19 patients. Second, the lung tissue density increases with disease severity, in association with extended GGO and consolidations. Third, lung CT density appears to be lower in subjects with larger lungs because of greater air spaces25. In this study, adding the lung volume feature did not improve the performance of the predictive models. However, in the non-critically ill patient group, a moderate but significant correlation with lung volume was observed for the CT parameters Q875 (Rho: -0.60 (-0.76, -0.38)), F1 (Rho: -0.54 (-0.72, -0.30)) and Mean CT (Rho: -0.68 (-0.81, -0.48)), but not for SD-CT (Rho: -0.30 (-0.49, -0.08), P = 0.0072), in agreement with the research literature. The supplementary Fig. S3 shows the relationship between Q875 and lung volume for both patient outcome groups. The linear relationship between Q875 and log volume in the non-critically ill group, over a large range of lung volumes, may be best explained by the hypothesis of Robert et al. on lung CT density change with normal lung growth25.
Moreover, the performance of the clinical predictor, the Neutrophil-Lymphocyte Ratio (NLR), in both the univariate analysis (Table 1) and the best multivariate logistic regression models (Table 2, Models 2-3-4) supports the conclusion of the recent meta-analysis by Li et al.26, which pointed out the value of this biomarker for predicting disease severity and patient mortality in Covid-19 patients.
Our quantitative CT density features were compared with both the COVID-GRAM score and the CT severity score for predicting critical illness. The CT severity score (Reader-1) alone performed well, with AUC: 0.91 (0.80, 0.96) and intra-class correlation (ICC): 0.90 (0.85, 0.94), as shown in prior studies4,27. However, the Reader-2 CT severity score was suboptimal and illustrated the inter-reader variability of subjective features; see for example Fig. 2 (Odds Ratios). The COVID-GRAM score performed poorly in our study for predicting critical illness, with an AUC of 0.64 (95% CI: 0.52, 0.74). A possible explanation of this poor performance, compared to the original study in China in which Liang et al.5 developed the model, is the presence of older patients (60.8 vs 48.9 years) and a higher prevalence of one or more pre-existing comorbidities (71.3% vs 25.1%) in our study. Remarkably, Al Hassan et al.28 recently reported similar findings, with an AUC of 0.64 for the COVID-GRAM score for risk stratification of Covid-19 patients.
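For readers less familiar with the AUC values quoted here, the following sketch computes the AUC as the probability that a randomly chosen critically ill patient has a higher score than a randomly chosen non-critically ill patient (the Mann-Whitney formulation, with ties counted as one half). The scores and labels are made up for illustration and are not data from this study.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical severity scores; label 1 = critically ill, 0 = not critically ill.
example_scores = [3, 10, 12, 15, 7, 18]
example_labels = [0, 0, 1, 1, 1, 0]
example_auc = auc(example_scores, example_labels)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.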
Hospital length of stay (LOS) and hospital mortality are mutually related and thus require a competing risks method for proper assessment of the cumulative incidence of each event of interest (discharge or death). Using this method, our study showed that the patient groups with Q875 > -380 HU, F1 > 0.099 or SD-CT > 213.8 HU were all associated with significantly higher cumulative incidences of longer lengths of stay while controlling for hospital mortality. This information is valuable in capacity planning to provide accurate predictions of the number of beds required at each level of care.
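The idea behind a competing risks analysis can be sketched with a nonparametric cumulative incidence estimator of the Aalen-Johansen type. This is a minimal illustration under stated assumptions, not necessarily the procedure or software used in this study: the event coding (0 = censored, 1 = discharge, 2 = death), the function name, and the example data are all hypothetical.

```python
def cumulative_incidence(times, events, cause):
    """Cumulative incidence of `cause` in the presence of competing events.

    events: 0 = censored, any other integer = an event type.
    Returns a list of (time, cumulative incidence) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    event_free = 1.0  # probability of having had no event of any type so far
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_here = d_any = d_cause = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_here += 1
            if data[i][1] != 0:
                d_any += 1
                if data[i][1] == cause:
                    d_cause += 1
            i += 1
        if d_any > 0:
            # Probability of reaching t event-free, then having `cause` at t.
            cif += event_free * d_cause / n_at_risk
            event_free *= 1.0 - d_any / n_at_risk
            curve.append((t, cif))
        n_at_risk -= n_here
    return curve


# Made-up lengths of stay in days; 1 = discharge, 2 = death, 0 = censored.
los_days = [2, 3, 3, 5, 8, 10]
outcome = [1, 1, 2, 0, 1, 2]
discharge_curve = cumulative_incidence(los_days, outcome, cause=1)
death_curve = cumulative_incidence(los_days, outcome, cause=2)
```

Unlike one minus a Kaplan-Meier estimate that treats deaths as censored, this estimator keeps the cumulative incidences of discharge and death from summing to more than one.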
This study has several limitations. It is retrospective and has a modest sample size, which results in wide confidence intervals or suboptimal statistical power for subgroup analyses (such as IV contrast vs. non-contrast CT) and prevents us from drawing conclusions about in-hospital mortality because of the low number of death events (14/80). The predictive accuracy results were computed with cross-validation to correct the performance for overfitting. However, future work involving multiple sites would be necessary to test the performance in a fully separate test dataset.
Another limitation is that CT chest protocols varied based on the clinical indication, with almost twice as many pulmonary angiogram studies as non-contrast studies, which prevented us from better understanding the role of CT contrast in the predictive performance.
Finally, the methods discussed in this study are focused on a global lung CT histogram analysis. Multi-threshold lung density analysis methods, such as those described in the studies already mentioned10,12,15, or more advanced CT density/texture methods based on local lung pattern classification29 were not tested and deserve future attention.
In conclusion, the extensive and diffuse changes in lung CT density affecting the whole lungs in COVID-19 pneumonia patients offered the opportunity to compare predefined and data-driven imaging features related to the lung CT density histograms. All three features (SD-CT, Q875 and F1) could accurately predict both critical patient illness and hospital length-of-stay. Combined models with one of these features and the inflammation biomarker Neutrophil-Lymphocyte Ratio give the highest predictive performance. This application of CT densitometry provided similar results for both the non-enhanced CT group and the contrast-enhanced group. The FPCA method allowed the unsupervised analysis of the lung density histograms in the whole patient cohort to extract interpretable CT density features with high predictive values. This approach may be considered for other predictive models in diffuse lung diseases.