STUDY OVERSIGHT
This is an observational retrospective study conducted in a tertiary referral University Hospital in Milan (Lombardy, Italy). The study protocol followed the ethical guidelines established in the 1975 Declaration of Helsinki, complied with the procedures of the local ethical committee, and was approved by the Institutional Review Board. This study received no financial support.
DATA SOURCES
We obtained laboratory, clinical, and radiological data of patients hospitalized with COVID-19 from the electronic medical records; the inclusion period for the analyses ran from January 25th, 2020, to April 28th, 2020.
COVID-19 was diagnosed based on a positive result of an rtRT-PCR assay on nasal and pharyngeal swab specimens [14]. We included only laboratory-confirmed cases who received a non-contrast chest CT on admission to the emergency department.
We analyzed patients’ demographics and their clinical and laboratory findings on admission. Patient demographics included age, sex, body mass index (BMI), concomitant/previous diseases, and smoking habit. Clinical and laboratory assessments consisted of body temperature, PaO2, PaCO2, and C-reactive protein. We also noted the time from symptom onset, days of hospitalization, ICU admission, medical therapy administered, and the most invasive level of oxygenation support provided. In particular, we distinguished between low-flow oxygenation (nasal cannula, face mask), high-flow oxygenation (Venturi mask, helmet CPAP), and mechanical ventilation through an endotracheal tube. We collected the time interval between CT and oxygenation support, as well as the first available PaO2/FiO2 ratio. In-hospital deaths and the discharge dates of recovered patients were also noted.
Confusion (mental test score of 8 or less), urea, respiratory rate, and blood pressure were also recorded to calculate the CURB-65, a validated score for predicting the severity of community-acquired pneumonia [15] that stratifies patients into groups 1 to 3 according to the risk of mortality.
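As an illustration of how the score is assembled, the following minimal sketch awards one point per criterion. The thresholds (urea above 7 mmol/L, respiratory rate of at least 30 breaths/min, systolic pressure below 90 mmHg or diastolic pressure of 60 mmHg or less, age of 65 years or more) are those of the standard published CURB-65 definition and are not restated in this section; the mapping of scores 0-1, 2, and 3-5 onto groups 1, 2, and 3 is likewise an assumption, and the function names are hypothetical.

```python
def curb65_score(confusion, urea_mmol_l, resp_rate, sbp, dbp, age):
    """Return the CURB-65 score (0-5): one point per criterion met."""
    criteria = [
        bool(confusion),        # C: confusion (mental test score of 8 or less)
        urea_mmol_l > 7.0,      # U: urea > 7 mmol/L (assumed standard threshold)
        resp_rate >= 30,        # R: respiratory rate >= 30 breaths/min
        sbp < 90 or dbp <= 60,  # B: systolic < 90 or diastolic <= 60 mmHg
        age >= 65,              # 65: age >= 65 years
    ]
    return sum(criteria)


def curb65_group(score):
    """Map a score to risk groups 1-3 (assumed: 0-1 -> 1, 2 -> 2, 3-5 -> 3)."""
    if score <= 1:
        return 1
    if score == 2:
        return 2
    return 3
```

For example, a 70-year-old patient with urea of 7.5 mmol/L and otherwise normal findings would score 2 and fall into group 2 under these assumptions.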
CHEST CT AND QUANTITATIVE ANALYSIS
All patients received a standard non-contrast chest CT with a multidetector CT scanner (Philips Brilliance, Amsterdam, The Netherlands) with the following setup: collimation, 64 × 0.25; voltage, 120 kV; tube current, 130-200 mAs, 240 mA; pitch, 1.4; slice thickness after reformat, 2.5 mm. The field of view included the whole chest, and images were acquired during forced inspiration as far as patient compliance allowed. The dataset was anonymized and exported to a dedicated segmentation suite for medical image computing (3D-Slicer, www.slicer.org) [16] equipped with a semi-automated segmentation algorithm (Chest Imaging Platform) [17]. This software, previously validated in the surgical setting [18], performed a first-pass automated segmentation; lung volumes were then manually refined using three-dimensional tools such as spherical brushes or erasers.
As a rule, a complete segmentation included both lungs with interstitial structures, segmental vessels, and bronchi; the main pulmonary arteries and bronchi, all mediastinal structures, and any pleural effusion were excluded, as were lung masses (e.g. tumours, fungal disease).
Lung volumes, expressed as percentages of the total volume, were extracted according to different Hounsfield Unit (HU) intervals: non-aerated (%NNL, –100 to 100 HU), poorly aerated (%PAL, –500 to –101 HU), normally aerated (%NAL, –900 to –501 HU), and hyperinflated (–1000 to –901 HU) [19]. An additional volume, “compromised lung” (%CL), was defined as the sum of %PAL and %NNL (–500 to 100 HU) (Figure 1). The authors in charge of segmentation (E.L., C.L., R.M.) were unaware of the laboratory and clinical parameters and of the hospitalization outcomes. Disagreements were resolved by consensus. The principal investigator reviewed and confirmed all segmentations before data entry. The time needed to complete each analysis was recorded.
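The HU binning can be sketched as below. This is a minimal illustration, not the segmentation software's own routine: it assumes the segmented lung is available as a flat sequence of integer HU values from equally sized voxels (so that voxel counts are proportional to volumes) and that the denominator is the total number of segmented voxels. The function name and the "HYPER" label are hypothetical.

```python
# HU intervals from the text (inclusive bounds, integer HU values assumed).
HU_INTERVALS = {
    "NNL": (-100, 100),      # non-aerated
    "PAL": (-500, -101),     # poorly aerated
    "NAL": (-900, -501),     # normally aerated
    "HYPER": (-1000, -901),  # hyperinflated (label is illustrative)
}


def lung_volume_percentages(hu_values):
    """Return each compartment as a percentage of all segmented voxels."""
    values = list(hu_values)
    if not values:
        raise ValueError("empty segmentation")
    total = len(values)
    # Voxels outside every interval still count towards the total (assumption).
    pct = {
        name: 100.0 * sum(lo <= v <= hi for v in values) / total
        for name, (lo, hi) in HU_INTERVALS.items()
    }
    # Compromised lung: poorly aerated plus non-aerated (-500 to 100 HU).
    pct["CL"] = pct["PAL"] + pct["NNL"]
    return pct
```

On a real CT volume the same counting would normally be done on the masked voxel array in a vectorised way; the plain-Python form is used here only to keep the sketch dependency-free.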
OUTCOMES
The primary objective was to identify and validate the lung volume derived by quantitative CT (QCT) that most accurately predicted the two main study outcomes: the need for oxygenation support and the need for intubation in patients affected by COVID-19.
Other objectives included correlation with pulmonary dysfunction as measured by the PaO2/FiO2 ratio and prediction of in-hospital death.
STATISTICAL ANALYSIS
Development of prediction models
All analyses were performed using Stata 13 (StataCorp LP, College Station, USA). Multiple binomial logistic regressions were performed to explore the association of the lung volumes %NAL, %PAL, and %CL with the two outcomes of interest. All clinically relevant predictors without missing data were included in the final model as covariates: age, sex, smoking habit, C-reactive protein (CRP), heart disease, chronic lung disease, cancer, diabetes, chronic kidney failure, urea levels, and CURB-65 group. Three similar models were thus developed: %NAL-model, %PAL-model, and %CL-model.
BMI was available for 161 patients and was tested in a separate model.
A Pearson's product-moment correlation was run to assess the relationship between the selected lung interval and PaO2/FiO2 in 106 patients (nasal cannula = 26, Venturi mask = 28, helmet CPAP = 21, endotracheal tube = 27).
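For reference, the correlation coefficient itself can be computed as in the following minimal sketch. The function name is hypothetical, and the sketch returns only the coefficient r, not the associated p value reported by a statistical package.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    # Sum of cross-products divided by the product of the root sums of squares.
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
```

With made-up illustrative values, `pearson_r([10, 25, 40, 60], [400, 310, 220, 150])` gives a strongly negative coefficient, the pattern expected if a larger lung percentage accompanies a lower PaO2/FiO2 ratio.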
A Cox regression survival analysis was performed to explore potential predictors of mortality. All potential candidates were tested with univariate analyses; the cutoff for inclusion in the final model was set at p < 0.2.
Model validation
All simulations were run using the Python programming language (Python Software Foundation, https://www.python.org/). Categorical variables were preliminarily tested for correlation using Chi-squared tests; Wilcoxon rank-sum tests were performed on continuous covariates to test whether they were sampled from the same distribution. Two separate multivariate regressions, without regularization, were performed on both outcomes over the space of covariates. The models’ coefficients, confidence intervals, and associated p values were examined to assess whether the selected lung interval remained significant after adjusting for possible confounders. Predictive machine learning models were built using logistic regression with regularization. To adjust for class imbalance in both outcomes while preserving the limited number of available observations, the logistic regression was stacked upon a SMOTE model during training [20].
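The idea behind SMOTE can be sketched as follows: each synthetic minority-class observation is placed at a random position on the segment joining a minority sample to one of its nearest minority-class neighbours. This is a simplified, dependency-free illustration of the technique, not the implementation used in the study (which is not specified here); the function name, the default `k=5`, and the fixed seed are assumptions.

```python
import random


def smote_oversample(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base` by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic
```

Oversampling of this kind is applied to the training portion only, so that validation folds contain real observations exclusively.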
Hyperparameters were chosen by randomized selection over 1000 possible validations, each by means of 10-fold cross-validation, for a total of 5000 actually trained models. The aim was to alleviate class imbalance by maximizing the class-weighted F1-score, where F1 is the harmonic mean of precision and recall.
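The selection metric can be illustrated with the sketch below. It assumes that "class weighted" means the support-weighted average of the per-class F1 scores (each class weighted by its share of true observations); the function name is hypothetical.

```python
def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    total = 0.0
    for cls in set(y_true):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and
        # recall; taken as 0 when the class has no true positives.
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        support = tp + fn  # number of true observations of this class
        total += support / len(y_true) * f1
    return total
```

For instance, with three positive and seven negative cases and one error in each direction, the per-class F1 scores are 2/3 and 6/7, giving a weighted score of 0.3 × 2/3 + 0.7 × 6/7 = 0.8.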
Cross-validated receiver operating characteristic (CV-ROC) curves and mean areas under the curve (CV-AUC) were calculated [21]. Separate cross-validated cut-points at 90% sensitivity and at 90% specificity were estimated for both outcomes.
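Cut-point selection at a fixed sensitivity or specificity can be sketched as below for a single set of predictions; in a cross-validated setting the same selection would be repeated on each fold and the results aggregated. The sketch assumes binary labels (1 = event), a positive prediction whenever the score is at or above the threshold, and candidate thresholds taken from the observed scores; function and parameter names are hypothetical.

```python
def sensitivity_specificity(y_true, scores, threshold):
    """Sensitivity and specificity when predicting 1 for score >= threshold."""
    tp = sum(y == 1 and s >= threshold for y, s in zip(y_true, scores))
    fn = sum(y == 1 and s < threshold for y, s in zip(y_true, scores))
    tn = sum(y == 0 and s < threshold for y, s in zip(y_true, scores))
    fp = sum(y == 0 and s >= threshold for y, s in zip(y_true, scores))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return tp / (tp + fn), tn / (tn + fp)


def cut_point(y_true, scores, target=0.90, fix="sensitivity"):
    """Threshold meeting the target on one metric while favouring the other."""
    candidates = sorted(set(scores))
    index = 0 if fix == "sensitivity" else 1
    ok = [
        t for t in candidates
        if sensitivity_specificity(y_true, scores, t)[index] >= target
    ]
    if not ok:
        return None  # no observed score reaches the target
    # Sensitivity falls as the threshold rises, so take the highest qualifying
    # threshold; specificity rises with it, so take the lowest one.
    return max(ok) if fix == "sensitivity" else min(ok)
```

Choosing the highest threshold that still reaches 90% sensitivity keeps specificity as high as possible, and the lowest threshold reaching 90% specificity does the same for sensitivity.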