Patients
We retrospectively included 180 patients with pathologically confirmed NSCLC at Zhejiang Cancer Hospital between January 2016 and September 2020. The inclusion criteria were as follows: (1) age older than 18 years; (2) treatment with PD-1/PD-L1 monotherapy; (3) stage IIIB/IIIC or IV disease according to the 8th edition of the TNM staging system formulated by the International Association for the Study of Lung Cancer (IASLC); (4) follow-up time of more than 6 months before progressive disease; and (5) no artifacts in CT images. The exclusion criteria were as follows: (1) no postcontrast CT image within 30 days before the start of immunotherapy; (2) no measurable (long diameter >10 mm) primary lung cancer lesion; (3) advanced NSCLC with postoperative recurrence and metastasis; (4) loss to imaging follow-up after treatment; and (5) patients who died 10 days after the first treatment due to severe immune-related pneumonia. The study ultimately enrolled 112 eligible patients. The inclusion and exclusion diagram is shown in Fig. 2.
This study was approved by the Ethics Committee of Zhejiang Cancer Hospital. Informed consent was waived because the patients' private information was concealed in the retrospective data.
Clinical features
We included a total of 20 clinical features, which were all previously reported to be associated with the prognosis of immunotherapy. The clinical features in our study included baseline demographic characteristics, clinical characteristics, and peripheral blood indicators, as follows:
The demographic and clinical characteristics included age [18], sex [19], body mass index (BMI, kg/m²) [20], smoking history [21], chronic obstructive pulmonary disease (COPD) [22], Eastern Cooperative Oncology Group (ECOG) score [23], histologic type [14], type of ICIs [24], therapy line [25], tumor stage [26], bone metastasis [27], brain metastasis [28], liver metastasis [15], and pleural effusion [29].
Laboratory data were obtained within 2 weeks prior to the first ICI treatment. The final peripheral blood indicators for data analysis included hemoglobin (g/dL) [30], serum albumin (g/dL) [31], lactate dehydrogenase (LDH) (U/L) [32], and composite inflammation indicators including the neutrophil-to-lymphocyte ratio (NLR) [33], platelet-to-lymphocyte ratio (PLR) [33], and lymphocyte-to-monocyte ratio (LMR) [34].
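As a minimal sketch of how the composite inflammation indicators follow from the blood counts, the ratios can be computed as below. The function name `inflammation_ratios`, its argument names, and the example counts are hypothetical and not taken from the study; all counts are assumed to share the same unit.

```python
def inflammation_ratios(neutrophils, lymphocytes, platelets, monocytes):
    """Return NLR, PLR and LMR from absolute counts given in the same unit."""
    # Both ratios with lymphocytes or monocytes in the denominator need positive counts.
    if lymphocytes <= 0 or monocytes <= 0:
        raise ValueError("lymphocyte and monocyte counts must be positive")
    return {
        "NLR": neutrophils / lymphocytes,   # neutrophil-to-lymphocyte ratio
        "PLR": platelets / lymphocytes,     # platelet-to-lymphocyte ratio
        "LMR": lymphocytes / monocytes,     # lymphocyte-to-monocyte ratio
    }


# Hypothetical counts for illustration only (10^9 cells/L).
ratios = inflammation_ratios(
    neutrophils=4.0, lymphocytes=2.0, platelets=250.0, monocytes=0.5
)
print(ratios)
```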
Efficacy evaluation criteria and follow-up
If a patient received multiple lines of immunotherapy, the analysis was performed using the first line. Response assessment included complete response (CR), partial response (PR), stable disease (SD) and progressive disease (PD) based on the Response Evaluation Criteria in Solid Tumors (RECIST), version 1.1 [35]. When PD determined retrospectively according to RECIST 1.1 was inconsistent with PD determined by clinicians based on the condition of the patient, the PD cases identified by clinicians in real time were regarded as events. Therapeutic efficacy was classified as DCB (CR, PR or SD lasting > 6 months) or no durable benefit (NDB: PD or SD lasting ≤ 6 months) [17]. PFS was defined as the time from the first ICI treatment to disease progression or death from any cause, and patients without progression were censored at the time of the last clinical visit.
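The DCB/NDB rule above can be sketched as a small labeling function. This is an illustration rather than the study's code: the name `classify_benefit` and its arguments are hypothetical, and it assumes that the 6-month duration condition is applied only to SD, which is one reading of the definition.

```python
from typing import Optional


def classify_benefit(best_response: str, sd_months: Optional[float] = None) -> str:
    """Map a RECIST 1.1 category to 'DCB' or 'NDB'.

    Assumption: the > 6 months condition is applied only to stable disease.
    """
    response = best_response.upper()
    if response in ("CR", "PR"):
        return "DCB"
    if response == "PD":
        return "NDB"
    if response == "SD":
        if sd_months is None:
            raise ValueError("SD requires its duration in months")
        # SD lasting > 6 months counts as benefit; exactly 6 months does not.
        return "DCB" if sd_months > 6 else "NDB"
    raise ValueError(f"unknown RECIST category: {best_response!r}")


print(classify_benefit("PR"))        # DCB
print(classify_benefit("SD", 6.0))   # NDB
print(classify_benefit("SD", 7.5))   # DCB
```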
Image acquisition
The CT scans of all patients were acquired with a 16- or 64-row multi-slice spiral CT scanner (Siemens SOMATOM Sensation 16; Siemens SOMATOM Definition Flash 64; GE Optima CT680). During the scan, the patients were instructed to hold their breath at the end of deep inhalation to avoid breathing motion artifacts. The tube voltage was 120 kV, and the tube current was 150-200 mAs with automatic adjustment. The scanning range was continuous from the lung apex to the lung base, and the pitch was 1.2-1.375. The slice thickness and slice spacing were both 5 mm. The CT images were reconstructed with a 512 × 512 matrix. For contrast-enhanced scanning, a high-pressure injector was used to inject non-ionic contrast agent into the antecubital vein. The injection rate was 2.0-2.5 mL/s, and the injection volume was 80-100 mL. The contrast-enhanced scan was delayed by 38-40 s.
Image segmentation and feature extraction
Image segmentation and feature extraction were performed with YITU AI Enabler, using the Python package pyradiomics (version 3.0.1). All imaging data were preprocessed by resampling to a voxel size of 1 mm × 1 mm × 1 mm to minimize the impact of different scanning protocols or equipment on quantitative radiomics analysis. Manual segmentation, regarded as the gold standard for clinical segmentation, was used to ensure the accuracy of the region of interest (ROI). The primary lung lesions were delineated as ROIs layer-by-layer for the entire tumor by a radiologist (NL, 5 years of experience in diagnosing thoracic tumors). Another senior radiologist (LS, 15 years of experience in diagnosing thoracic tumors) then confirmed and adjusted the outlined boundary. Both radiologists were blinded to the therapeutic efficacy.
ROIs were delineated on the postcontrast CT images, avoiding blood vessels and atelectasis as far as possible, and were then copied to the precontrast CT images. Nine hundred and sixty features were first extracted per patient from each of the precontrast and postcontrast CT images. A feature stability check was then performed by slightly perturbing the ROIs: unstable features were filtered out using intraclass correlation coefficients (ICC) between the features extracted within the lesion ROIs and those extracted within the extended lesion ROIs. The extended lesion ROIs were produced by extending the boundary of the ROIs by 1 image pixel. Features with an ICC greater than 0.8 were preserved as stable features.
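The stability filter can be illustrated with a short sketch. The ICC form is not specified here, so the code assumes the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1); the function names `icc_2_1` and `stable_features` and the feature values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for rows = patients and columns = ROI versions (original, extended)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((r - grand) ** 2 for r in row_means)
    ss_cols = n * sum((c - grand) ** 2 for c in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    # A constant feature gives a zero denominator; NaN never passes the threshold.
    return (ms_rows - ms_err) / denom if denom else float("nan")


def stable_features(original, extended, threshold=0.8):
    """Keep feature names whose ICC between the two ROI versions exceeds the threshold."""
    kept = []
    for name in original:
        pairs = [list(p) for p in zip(original[name], extended[name])]
        if icc_2_1(pairs) > threshold:
            kept.append(name)
    return sorted(kept)


# Hypothetical feature values for four patients (illustration only).
original = {"feat_a": [10.0, 12.0, 15.0, 20.0], "feat_b": [1.0, 2.0, 3.0, 4.0]}
extended = {"feat_a": [10.2, 11.9, 15.3, 19.8], "feat_b": [4.0, 1.0, 3.0, 2.0]}
print(stable_features(original, extended))
```

In this toy input, `feat_a` barely changes when the ROI is extended and is retained, whereas `feat_b` changes rank order between the two ROIs and is dropped.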
In precontrast CT images, there were 790 stable features (Supplementary Fig. S1) from each patient, including 14 shape features, 167 first-order statistics features, 213 gray level co-occurrence matrix (GLCM) features, 131 gray level dependence matrix (GLDM) features, 155 gray level run length matrix (GLRLM) features and 110 gray level size zone matrix (GLSZM) features. In postcontrast CT images, there were 767 stable features (Supplementary Fig. S2) from each patient, including 14 shape features, 161 first-order statistics features, 196 GLCM features, 141 GLDM features, 151 GLRLM features, and 104 GLSZM features.
Model construction
We used recursive feature elimination (RFE) to select the 10 radiomic features most related to the therapeutic efficacy from the precontrast and postcontrast radiomic data, respectively. The scikit-learn package (version 1.0.2) in Python (version 3.9.7) was used for model construction and evaluation. We performed random over-sampling (imblearn package; version 0.9.0) of the minority class and used the resulting balanced datasets to develop the machine learning models. All code is available at https://github.com/BioAI-kits/RadClin.
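To make the balancing step concrete, the following standard-library sketch duplicates randomly chosen minority-class samples until both classes are the same size. It illustrates the idea of random over-sampling and is not the imblearn implementation used in the study; `random_oversample`, the seed, and the toy data are hypothetical.

```python
import random


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Sampling with replacement from the existing samples of this class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy data for illustration: three NDB and two DCB patients, one feature each.
X = [[0.1], [0.2], [0.3], [0.8], [0.9]]
y = ["NDB", "NDB", "NDB", "DCB", "DCB"]
balanced_X, balanced_y = random_oversample(X, y)
print(balanced_y.count("NDB"), balanced_y.count("DCB"))
```

Because over-sampled duplicates of a patient can land in both the training and validation folds, one common design choice is to over-sample only the training portion of each fold; whether that was done is not stated here.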
To select the most suitable machine learning method for fitting the radiomic data, efficacy classification models were constructed from the radiomic features using different machine learning algorithms, including logistic regression (LR), support vector machine (SVM), multi-layer perceptron (MLP), and random forest (RF). For each machine learning algorithm, we used a three-step approach to build the model. First, we constructed models using various combinations of tunable hyperparameters, which were adjusted depending on the algorithm. Second, we tested the performance of the model for each hyperparameter combination using the average AUC from 5-fold cross validation. Finally, we selected the hyperparameters with the highest average AUC for each algorithm. We then rebuilt the model of each algorithm with its best hyperparameters and evaluated these models with multiple metrics, including AUC, balanced accuracy, specificity, and sensitivity, to select the optimal algorithm.
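The evaluation metrics and the selection rule can be sketched in plain Python as follows. This is an illustrative sketch, not the scikit-learn code used in the study: the function names, the 0.5 decision threshold, the hyperparameter labels, and all numbers are hypothetical, with DCB coded as 1 and NDB as 0.

```python
from statistics import mean


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity and balanced accuracy for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical labels and predicted probabilities (illustration only).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
auc = roc_auc(y_true, scores)
metrics = confusion_metrics(y_true, y_pred)

# Hypothetical per-fold AUCs for two hyperparameter settings; pick the highest mean.
fold_aucs = {
    "setting_A": [0.70, 0.75, 0.72, 0.68, 0.74],
    "setting_B": [0.73, 0.76, 0.74, 0.71, 0.77],
}
best_setting = max(fold_aucs, key=lambda name: mean(fold_aucs[name]))
print(auc, metrics, best_setting)
```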
Based on the optimal machine learning algorithm, we further constructed five RF models with different input datasets. The input datasets and corresponding models were as follows: precontrast CT radiomic features, precontrast model; postcontrast CT radiomic features, postcontrast model; precontrast and postcontrast CT radiomic features, radiomic model; clinical features, clinical model; combined clinical and radiomic features, combined model. The models were constructed with the methods described above. We evaluated the prediction performance of the different models using the AUC of the ROC curves. In addition, calibration curves were generated as a supplement to the model evaluation to visualize the goodness of fit of the predictive models. The patients were divided into two groups according to the label predicted by the combined model (predicted DCB vs. predicted NDB). Survival analysis was then performed on the PFS of these two groups of patients.
Statistical analysis
Comparisons of clinical features were performed using SPSS 26.0. Continuous variables are presented as mean (standard deviation, SD) or median (interquartile range, IQR) and were compared using the independent-samples t test or the Mann-Whitney U test. Categorical variables were compared using the Chi-square test or Fisher’s exact test, as appropriate. Kaplan-Meier analysis was used to generate survival curves, and the log-rank test was performed to compare PFS between the two groups in R (survminer; version 0.4.9). All statistical tests were two-sided, and differences were considered statistically significant at P < 0.05.
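For readers unfamiliar with the estimator behind the survival curves, a minimal product-limit (Kaplan-Meier) sketch is given below. It is written in plain Python for illustration, whereas the study used R; the function name `kaplan_meier` and the PFS values are hypothetical, and the log-rank test is not reproduced.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events: 1 = progression or death observed, 0 = censored at last visit.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients sharing the same time point.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after time t.
        at_risk -= n_removed
    return curve


# Hypothetical PFS times in months for one predicted group (illustration only).
pfs_months = [2, 4, 4, 6, 9]
progressed = [1, 1, 0, 1, 0]
curve = kaplan_meier(pfs_months, progressed)
print(curve)
```

Censored patients do not lower the curve themselves; they only shrink the risk set used for later event times.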