Study design
A total of 64 patients with advance inoperable ESCC receiving 200mg every 3 weeks of Sintilimab plus Docetaxel (60mg/m2) and Carboplatin (AUC=5) at two centers between January 2019 and June 2020 were included in these two institutionals review board-approved study. Informed consent was waived. The study patients were confirmed by biopsy and immunohistochemistry of the original tumor tissue. All the enrolled patients were first-visit and prior to treatment. Patients who had never received cancer related treatments including radiotherapy, chemotherapy, comprehensive treatment and surgery and those who lacked CT imaging data and necessary clinical information before the initial treatment (immunotherapy plus chemotherapy) were excluded from this study. Exclusion criteria also included patients with non-squamous cell carcinoma including adenocarcinoma and signet ring cell carcinoma and those who discontinued treatment due to adverse events. Flow chart of patient enrollment is shown in Fig. 1. For patients’ clinical characteristics, information of age, gender, Body Mass Index (BMI), clinical TNM stage, hemoglobin, blood albumin, leucocyte, C-reactive protein and underlying diseases was acquired from electronic medical records system. BMI was calculated based on height and weight. Clinical TNM stage was confirmed by pre-treatment gastroscopy, CT examination, etc.
Response Kinetics and Scan Protocol
Contrast enhanced computed tomography (CE-CT) scans were acquired before (baseline) and around six weeks (two cycles) after start of treatment (follow-up). Treatment response was evaluated by assessing the relative change in diameter between baseline and follow-up, using RECIST 1.1 criteria [25]. Patients were divided into responders [complete/partial response disease] and non-responders [stable and progressive disease] according to RECIST. For progressive disease, pseudoprogression was confirmed by follow-up observation.
All preoperative enhanced CT images were obtained with multidetector CT scanners during inspiration. Detailed information of the CT scanners including manufacturer, country of origin, tube voltage, slice thickness and spacing was shown in Supplementary Table 1. Iopromide (300 mg I/m1, Schering Pharmaceutical Ltd) was used as the contrast agent for enhanced scanning protocol, and 80-100 ml was injected at 3-4 ml/s flow rate.
Lesion Segmentation and Radiomics Features Extraction
All enhanced CT images were manually segmented with an open-source software ITK-SNAP (http://www.itksnap.org/pmwiki/pmwiki.php) for feature extraction. DICOM data was outlined by chest radiologist ZY with 10 years of experience. 2D ROI was selected as the slice with maximum axial diameter of the tumor, and 3D ROI was segmented slice by slice on the whole volume of the lesions.
To correct variability from spatial information in three axes (x, y, z) and different CT protocols, all enrolled CT images were resampled to a same isotropic voxel spacing. Considering the distribution of our data, we resampled the 2D ROIs to 1 × 1mm2, and the 3D ROIs to 1 × 1 × 1 mm3 to balance between the loss of in-plane information and the interpolation of out-of-plane information. Afterwards, the CT radiomics features, from 2D and 3D ROIs respectively, were extracted with an open-source python platform Pyradiomics (version 2.1.2, https://pyradiomics.readthedocs.io/en/latest/#) . Features used in this study included 14 shape-based features (description of size and shape of ROI), 18 first order statistics features (distribution of voxel intensities within the image region from gray-level histogram of Hounsfield units) and 68 texture features containing the gray-level co-occurrence matrix (GLCM, 22 features), gray level run length matrix (GLRLM, 16 features), gray level size zone matrix (GLSZM, 16 features) and gray level dependence matrix (GLDM, 14 features). Besides the original images, eight filters were also generated for feature extraction, including wavelet transform filter (eight decompositions with low and high frequencies). All the categories of features other than shape originated from the original and filtered images were calculated. Therefore, in this study, a total of (18+68+14) + (18+68) *8=788 features were statistically analyzed.
To control the potential bias caused by various imaging acquisition protocols on the prediction efficacy of the model, ComBat correction method (https://github.com/Jfortin1/ComBatHarmonization) was applied to 2D and 3D ROIs, resulting in four different groups of features for comparison: (1) 2D uncorrected radiomics features, (2) 2D corrected radiomics features, (3) 3D uncorrected radiomics features, (4) 3D corrected radiomics features.
Feature selection
Feature selection was performed separately for each group of features. Three steps were applied to reduce dimensionality: (1) features with variance larger than 0.8 were included for further analysis, (2) univariate feature selection was done by ANOVA (continuous variable) or chi-square test (discrete variable) to explore the associations between features and treatment response. The features with p value>0.05 would be excluded from further analysis, (3) the most significant features were selected by the least absolute shrinkage and selection operator (LASSO) method. Since the total patient number was limited, the nonzero feature coefficients ranking the first five were selected for each group to avoid overfitting.
Prediction Models and Workflow
After feature selection, traditional machine learning algorithms, including support vector machine (SVM), k nearest neighbors, random forest, decision tree (DT), logistic regression (LR), were applied to build prediction radiomics models for each feature group. The performance of the models was compared by using 5-fold cross-validation in the validation cohort, with the best one being selected. All the patients were randomly split into 80% for training and the remaining 20% for validation, with 100 iterations. All feature selection and radiomics algorithm selection were based on the data in the training dataset to ensure independence from validation dataset.
Radiomic nomogram was built based on the multivariable logistic analysis of the selected radiomics features in the training group. Calibration curves accompanied by the Hosmer–Lemeshow test were plotted to evaluate the effectiveness of the radiomics nomogram.Decision curve analysis was conducted to determine the clinical usefulness of the radiomics nomogram by quantifying the net benefits at different threshold probabilities in the validation dataset. Flow chart of radiomics nomogram building was illustrated in Fig. 2.
Statistical Analysis
Statistical analyses were performed by using SPSS 22.0 (IBM, USA). Variables were described as frequency (n%). The chi-square test was used to compare patients' basic information between groups (responders versus non-responders) and P<0.05 was considered statistically significant. All machine learning analyses were performed by using the Python package scikit-learn (0.19.0), and statistical plots were generated by R software (3.6.1, http://www.R-project.org). Area under the Receiver-Operating Characteristic Curves (AUCs) were calculated to evaluate the performance of the algorithms for each model, and the Youden Index was used to generate the optimal threshold to convert probabilities into binarized labels. Statistical metrics, including accuracy, sensitivity, specificity, NPV (Negative Predictive Value), PPV (Positive Predictive Value) and AUC were also calculated to evaluate the performance of the ultimate selected algorithm in the training cohort and the validation cohort for the different radiomics models. Wilcoxon rank test with Bonferroni correction was applied for multiple comparisons, and p<0.0125 was considered statistically significant.