Study Population. The study was formally approved by the ethics committee of the University G. D’Annunzio of Chieti-Pescara, Italy (Comitato Etico per la Ricerca Biomedica delle Province di Chieti e Pescara e dell’Università degli Studi “G. d’Annunzio” di Chieti e Pescara, Italy), which also waived the requirement for informed consent. The study was conducted according to the ethical principles laid down in the latest version of the Declaration of Helsinki. We retrospectively included 120 consecutive patients (COVID) diagnosed with SARS-CoV-2 infection based on RT-PCR who underwent a clinically indicated high-resolution chest CT (HRCT) between March 2020 and April 2020. Patients were included if they met both of the following criteria: (a) GGO as the predominant feature on chest CT scans, (b) baseline HRCT performed at hospital admission. A second set of 310 patients (nCOVID) with clinically indicated HRCT for acute respiratory disease performed between August 2019 and April 2020 was also retrospectively enrolled. For this second set, patients were included if they met both of the following criteria: (a) GGO as the predominant feature on chest CT scans, (b) availability of a final diagnosis (clinical, laboratory, or pathology). In the first set (COVID), we excluded 92 patients: 14 had severe respiratory artefacts and 78 had a non-predominant GGO pattern. In the second set (nCOVID), we excluded 280 patients: 32 had severe respiratory artefacts, 210 had a non-predominant GGO pattern, and 38 were treated in another center, so that the final diagnosis was not available. The final study population comprised 28 COVID and 30 nCOVID patients, for a total of 58 patients (Fig. 1).
CT Protocol. Non-enhanced chest CT scans were performed in the supine position, during inspiratory breath-hold, from the apex to the lung bases, with a 128-slice multi-detector CT device (Somatom Definition AS, Siemens Healthineers, Germany). The field of view (FOV) ranged from 35 to 40 cm according to body size. The display window settings were a window width (W) of 1200–1600 HU and a window (center) level (L) between −600 and −750 HU. The main scan parameters were: tube voltage = 120 kVp, automatic tube current modulation (30–70 mAs), pitch = 0.99–1.22, matrix = 512 × 512. The images were reconstructed with a slice thickness of 0.625–1.250 mm and the same increment, using a high-spatial-frequency reconstruction algorithm (B50, I50).
Radiomics Analysis. A whole-volume semi-automated GGO delineation was independently performed by two senior radiology residents (C.V. and M.V.) using an open-source medical image computing platform, 3DSlicer Version 4.8 (www.3dslicer.org) (Fig. 2a). The GGO threshold was manually set between −1350 and −700 HU using the “threshold-effect” tool (9, 25, 26). Where necessary, each reader manually corrected the segmentation to exclude automatically segmented pixels lying outside the GGOs. In addition, the lungs were automatically extracted via Convolutional Neural Network (CNN) algorithms to create binary masks (27). A logical “and” between these masks and the segmentations obtained by the radiology residents was then performed (using “3dcalc”) to exclude automatically segmented pixels lying outside the lungs, thus obtaining the final ROIs (28). All ROIs were finally checked by a radiologist with more than 10 years of experience in chest imaging (M.M.) to verify their correct position and correspondence with the underlying CT images. The reproducibility of the features extracted from the two independent segmentation sets of the 58 CT scans (28 COVID, 30 nCOVID) was then assessed. Radiomic features were extracted using PyRadiomics, a flexible open-source platform capable of extracting a large panel of engineered features from medical images; this radiomic quantification platform enables the standardization of both feature definitions and image processing (29). To avoid data heterogeneity bias, HRCT images were resampled to a voxel size of 2 × 2 × 2 mm. For each ROI, ten built-in filters (Original, wavelet, Laplacian of Gaussian (LoG), square, square root, logarithm, exponential, Gradient, LBP2D, LBP3D) were applied and seven feature classes (first order statistics, shape descriptors, glcm, glrlm, ngtdm, gldm, glszm) were calculated, for a total of 1409 radiomic features.
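The thresholding and masking steps described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (which used 3DSlicer and “3dcalc”): the flat list of voxel values, the lung mask, the function names, and the inclusive treatment of the threshold bounds are all assumptions made for illustration.

```python
# Illustrative sketch only: a flat list of HU values stands in for a 3-D HRCT volume.
GGO_MIN_HU = -1350  # lower threshold quoted in the text
GGO_MAX_HU = -700   # upper threshold quoted in the text


def threshold_mask(hu_values, lo=GGO_MIN_HU, hi=GGO_MAX_HU):
    """Return 1 for voxels with lo <= HU <= hi, else 0 (inclusive bounds are an assumption)."""
    return [1 if lo <= hu <= hi else 0 for hu in hu_values]


def logical_and(mask_a, mask_b):
    """Voxel-wise logical AND of two binary masks of equal size."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    return [a & b for a, b in zip(mask_a, mask_b)]


# Hypothetical voxel values and a hypothetical CNN-derived lung mask.
hu = [-1000, -800, -720, -650, -900, 40]
lung_mask = [1, 1, 1, 1, 0, 0]

reader_mask = threshold_mask(hu)                 # thresholded segmentation
final_roi = logical_and(reader_mask, lung_mask)  # drop voxels outside the lungs
print(final_roi)
```

In this toy case the fifth voxel passes the threshold but lies outside the lung mask, so the logical “and” removes it from the final ROI.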
Machine Learning Approach: Partial Least Squares (PLS) Regression
A machine learning (i.e., multivariate) approach was implemented to exploit the multidimensionality of the radiomic features (Fig. 2b). When trying to predict an output from these features, information redundancy (i.e., high correlation among radiomic features), coupled with a low number of independent samples (i.e., subjects), makes the prediction sensitive to noise and prone to poor generalization (30, 31). To address this problem, two main approaches were implemented. The first was to reduce the number of features by selecting only those that were highly repeatable (r > 0.95) between the two masks (delineated by the two radiologists). The second was to implement a machine learning framework based on a linear regression analysis that employs a dimensionality reduction procedure, namely partial least squares (PLS) regression (30, 32, 33). PLS was used to differentiate COVID from nCOVID patients. PLS constructs regression equations by reducing the predictors to a smaller set of uncorrelated components, i.e., linear combinations of the original predictors, and performs the regression on these components (33, 34). The goal of PLS is to identify components (e.g., linear combinations of all radiomic features) that capture most of the information in the independent variables that is useful for predicting the dependent variable (e.g., COVID vs. nCOVID). PLS can be regarded as the supervised counterpart of Principal Component Analysis (PCA) (35, 36). The learning process (fitting) of the PLS algorithm delivers regression loadings that can be used to retrieve the weights (β-weights) linking the original independent variables to the dependent variable, which indicate the importance and sign of each original variable in the prediction. PLS has one hyperparameter to be optimized, namely the number of uncorrelated components used in the regression.
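The repeatability-based feature selection can be sketched as follows. This is a hedged illustration, not the authors' Matlab code: the text does not state which correlation coefficient r denotes, so Pearson correlation is assumed here, and the feature names and values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # a constant feature carries no repeatability information
        return 0.0
    return sxy / sqrt(sxx * syy)


def repeatable_features(reader1, reader2, threshold=0.95):
    """Keep features whose values agree between readers with r > threshold.

    reader1 / reader2 map a feature name to one value per patient.
    """
    return [name for name in reader1
            if pearson_r(reader1[name], reader2[name]) > threshold]


# Hypothetical feature values for five patients, one dictionary per reader.
reader1 = {"feat_a": [1.0, 2.0, 3.0, 4.0, 5.0], "feat_b": [1.0, 2.0, 3.0, 4.0, 5.0]}
reader2 = {"feat_a": [1.1, 2.0, 2.9, 4.1, 5.0], "feat_b": [3.0, 1.0, 4.0, 2.0, 5.0]}

kept = repeatable_features(reader1, reader2)
print(kept)
```

Here "feat_a" is nearly identical between the two readers and is retained, whereas "feat_b" correlates only at r = 0.5 and is discarded.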
Nested cross-validation (nCV) is an approach for optimizing hyperparameters and evaluating the generalization performance of a procedure while minimizing the loss of samples during model training (37). In nCV, the data are divided into folds and the model is trained on all data except one fold in an iterative, nested manner. Hyperparameter optimization and performance assessment are performed on the held-out fold and averaged across iterations. If the number of folds equals the number of samples (one fold per sample), the procedure is termed leave-one-out nCV (38, 39). This approach is well suited to medical applications, where each sample represents one subject. In this work, a leave-one-out nCV was implemented to optimize the number of PLS components and to assess the generalization performance of PLS. The β-weights of the PLS analysis were obtained by running the algorithm on the complete dataset with the optimal number of components delivered by the nCV analysis. The machine learning analyses were implemented in Matlab.
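As a rough illustration of the procedure, the following Python sketch combines a single-response PLS fit (PLS1, computed with the NIPALS algorithm) with a leave-one-out nested cross-validation. It is a sketch under stated assumptions rather than the authors' Matlab implementation: the function names, the synthetic data, the ±1 coding of the two groups, and the use of squared error as the inner selection criterion are all assumptions.

```python
import random
from math import sqrt


def fit_pls1(X, y, n_comp):
    """Fit PLS1 by NIPALS; returns the training means and per-component (w, p, q)."""
    n, d = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    E = [[v - m for v, m in zip(row, x_mean)] for row in X]  # residual predictors
    f = [v - y_mean for v in y]                              # residual response
    comps = []
    for _ in range(n_comp):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(d)]
        norm = sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # no covariance left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(d)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        if tt < 1e-12:
            break
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(d)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(d)] for i in range(n)]  # deflate X
        f = [f[i] - q * t[i] for i in range(n)]                            # deflate y
        comps.append((w, p, q))
    return x_mean, y_mean, comps


def predict_pls1(model, x):
    """Predict one sample by replaying the component-wise deflation."""
    x_mean, y_mean, comps = model
    e = [v - m for v, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in comps:
        t = sum(a * b for a, b in zip(e, w))
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, p)]
    return y_hat


def loo_predictions(X, y, n_comp):
    """Plain leave-one-out predictions for a fixed number of components."""
    return [predict_pls1(fit_pls1(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_comp), X[i])
            for i in range(len(X))]


def nested_loo(X, y, max_comp):
    """Outer loop: hold out one subject. Inner loop: choose the number of components."""
    outer_preds, chosen = [], []
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]

        def inner_error(k):
            preds = loo_predictions(X_tr, y_tr, k)
            return sum((a - b) ** 2 for a, b in zip(preds, y_tr))

        best_k = min(range(1, max_comp + 1), key=inner_error)
        outer_preds.append(predict_pls1(fit_pls1(X_tr, y_tr, best_k), X[i]))
        chosen.append(best_k)
    return outer_preds, chosen


# Synthetic data: 12 "subjects", two correlated informative features, one noise feature.
rng = random.Random(0)
y = [1.0 if i % 2 == 0 else -1.0 for i in range(12)]
X = [[2.0 * yi + rng.uniform(-0.1, 0.1), -yi + rng.uniform(-0.1, 0.1),
      rng.uniform(-0.1, 0.1)] for yi in y]

preds, chosen = nested_loo(X, y, max_comp=3)
```

Each entry of `preds` is an out-of-training-sample prediction, because the held-out subject takes part in neither the inner selection of the number of components nor the final fit for that iteration.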
Statistical Analysis
The COVID vs. nCOVID classification performance was assessed through Receiver Operating Characteristic (ROC) analysis, comparing the inferred (out-of-training-sample) group with the true group. COVID patients were assigned to the “positive” group, whereas nCOVID patients were assigned to the “negative” group. The ROC analysis was also performed on randomly shuffled group labels to simulate the null hypothesis and evaluate its confidence interval (repeated 10^6 times). The ROC analysis delivered an Area Under the Curve (AUC), which was transformed into a z-score to assess its statistical significance using the null distribution obtained from the randomly shuffled group labels. The statistical analysis was performed in Matlab.
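The AUC and its label-shuffling z-score can be sketched in Python as below. This is an illustrative stand-in for the Matlab analysis, under stated assumptions: the function names and the example scores are hypothetical, the AUC is computed as the fraction of positive-negative pairs ranked correctly, and the number of shuffles is reduced to 2000 (instead of the 10^6 reported above) so that the example runs quickly.

```python
import random
from statistics import mean, pstdev


def roc_auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_z(scores, labels, n_perm=2000, seed=0):
    """Return the observed AUC and its z-score against a shuffled-label null."""
    rng = random.Random(seed)
    observed = roc_auc(scores, labels)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between scores and labels
        null.append(roc_auc(scores, shuffled))
    return observed, (observed - mean(null)) / pstdev(null)


# Small worked example: three of the four positive-negative pairs are ranked correctly.
small_auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])

# Hypothetical, perfectly separated out-of-sample scores for 20 subjects.
scores = [i / 20 for i in range(20)]
labels = [0] * 10 + [1] * 10
auc, z = permutation_z(scores, labels)
print(small_auc, auc, round(z, 2))
```

The z-score expresses how many standard deviations the observed AUC lies above the mean of the shuffled-label distribution; using more shuffles makes the estimates of the null mean and standard deviation more stable.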