Patients
We searched consecutive patients who underwent surgery at our hospital between March 2019 and October 2019 and identified 150 patients with GC. The inclusion criteria were as follows: (1) postoperative pathological confirmation of GC and (2) availability of abdominal contrast-enhanced CT within 2 weeks before surgery. The exclusion criteria were as follows: (1) a history of GC treatment before surgery (n=4); (2) no definite information on PD-L1 and PD-L1-TILs (n=2); (3) lesions hardly visible on CT images because of their small size (n=21); (4) insufficient distention of the stomach (n=19); and (5) poor imaging quality due to respiratory or peristaltic motion (n=3). The flow chart of patient selection is shown in Fig. 1. Our Institutional Review Board approved the current study, which followed the regulations outlined in the Declaration of Helsinki.
A total of 101 patients (male, 80; female, 21; median age, 66 years; age range, 31-86 years) met the criteria.
CT image acquisition
CT examinations were performed on 64-row scanners (VCT, Discovery HD 750, GE Healthcare, and uCT 780, United Imaging). All patients were asked to fast for at least 6 h and to drink 600-1000 mL of warm water before the examination to distend the stomach. All patients were scanned in the supine position, and the scan covered the upper or entire abdomen. The patients were trained to hold their breath during CT scans. Following the unenhanced scan, 1.5 mL/kg of iodinated contrast agent (Omnipaque 350 mg I/mL, GE Healthcare) was injected intravenously at a flow rate of 3.0 mL/s using a high-pressure injector (Medrad Stellant CT Injector System, Medrad Inc.). Images were acquired 30-40 s and 70 s after the start of contrast material injection, corresponding to the arterial and venous phases, respectively. The CT scan parameters were as follows: tube voltage 100-120 kV, tube current 150-250 mA, slice thickness 5 mm, slice interval 5 mm, field of view 35-50 cm, matrix 512 × 512, rotation time 0.7 s, and pitch 1.0875.
Image analysis
Axial venous-phase CT images of all patients were downloaded from the picture archiving and communication system and uploaded into Imaging Biomarker Explorer software. A polygonal region of interest (ROI) was manually drawn along the margin of the tumor on the maximal transverse slice, as illustrated in Fig. 2, carefully avoiding normal gastric wall tissue and gastric cavity contents. ROI segmentations were performed manually by reader 1 (X.X. with 8 years of experience in abdominal imaging), who was unaware of the patients' clinicopathological information but was informed of the general location of the tumor (cardia, body, or antrum). To evaluate interobserver reproducibility, 20 cases were randomly selected, and reader 2 (X.X. with 8 years’ experience in abdominal imaging) repeated the ROI segmentation and feature extraction as described above. In total, 744 radiomic features were generated automatically from the ROIs. Detailed explanations and formulas of the radiomic features are provided in the supplementary material.
In addition, two radiologists evaluated the routine CT characteristics of each lesion in consensus as follows: (1) location (cardia, body, antrum, or diffuse); (2) morphological type (thickening type or mass type); (3) adjacent adipose tissue (clear or muddy); and (4) lymphadenectasis (absent or present).
Development and performance of signatures
The workflow is depicted in Fig. 2. First, the intraclass correlation coefficient (ICC) was calculated with the “irr” package (version 0.84) to evaluate the interobserver variability of radiomic feature extraction. Radiomic features with ICC values >0.8 were regarded as highly reproducible and were retained. Second, the Mann-Whitney U test was used to select radiomic features that differed significantly between PD-L1/PD-L1-TILs status groups, and the chi-square or Fisher's exact test (n<5) was used to select morphological characteristics that differed significantly. Third, the least absolute shrinkage and selection operator (LASSO) was used for dimension reduction of the radiomic features and morphological characteristics. The optimal variables were then entered into our in-house software programmed with the Python Scikit-learn package (Python version 3.8, Scikit-learn version 0.22.2, http://scikit-learn.org/). Four classic algorithms, namely Support Vector Machine (SVM), Naive Bayes (NB), Decision Tree (DT), and Random Forest (RF), were used to generate signatures. The data were split into training and testing sets at a ratio of 4:1. In the training phase, the Synthetic Minority Oversampling Technique (SMOTE), a widely used data-preprocessing method in machine learning, was applied to handle the class imbalance problem. The models were evaluated by repeated stratified 5-fold cross-validation.
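The oversampling step can be illustrated with a minimal sketch. It is not the in-house software used in the study: the function name `smote`, the neighbor count `k`, the random seed, and the toy two-feature data are all assumptions made for illustration. Each synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its nearest minority-class neighbors, and the procedure is applied to the training set only.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbors.
    Hypothetical helper for illustration, not the study's implementation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbors of the base sample
        neighbors = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbors)]
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy training data (assumed values): six majority and three minority samples
majority = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4], [0.4, 0.1]]
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3]]

# Oversample the minority class until both classes have the same size
new_samples = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_samples
```

Restricting the oversampling to the training data keeps synthetic samples out of the testing set, so the reported performance reflects real cases only.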
Detection of PD-L1 and PD-L1-TILs expression status
The PD-L1 and PD-L1-TILs expression status was measured by immunohistochemistry on paraffin-embedded tumor tissues. The markers cytokeratin and lymphocyte common antigen were used to differentiate tumor cells from tumor-infiltrating lymphocytes. Positivity for PD-L1 and PD-L1-TILs was assessed by one pathologist using SP142 (Abcam) staining. The expression of PD-L1 and PD-L1-TILs was scored according to the tumor cell or tumor-infiltrating lymphocyte proportion, defined as the percentage of tumor cells or tumor-infiltrating lymphocytes with complete or partial membranous staining at any intensity.
Statistical analysis
The normality of the distribution of radiomic features was evaluated with the Shapiro-Wilk test. Based on the normality test results, differences in radiomic features between groups were analyzed with the Mann-Whitney U test. Differences in morphological characteristics were assessed with the chi-square or Fisher's exact test (n<5). Interobserver agreement of radiomic features was estimated with the ICC (0.000-0.200: poor; 0.201-0.400: fair; 0.401-0.600: moderate; 0.601-0.800: good; 0.801-1.000: excellent). Receiver operating characteristic (ROC) analysis was performed, and the area under the ROC curve (AUC) was calculated, to evaluate the diagnostic performance of the signatures. Precision, recall, and F1 score were calculated to assess the machine learning models. Precision is defined as true-positive results divided by the sum of false-positive and true-positive results. Recall is defined as true-positive results divided by the sum of true-positive and false-negative results. F1 score is defined as the harmonic mean of precision and recall. All statistical analyses were performed with SPSS (version 22.0 for Microsoft Windows x64, SPSS), MedCalc Statistical Software (version 11.4.2.0 MedCalc Software bvba; http://www.medcalc.org; 2011), R software package (version 3.5.2: http://www.Rproject.org), and Python Scikit-learn package (Python version 3.8, Scikit-learn version 0.22.2, http://scikit-learn.org/). A two-tailed p value <0.05 was considered statistically significant.
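The definitions of precision, recall, and F1 score given above can be written out as a short sketch. The function names and the toy labels are assumptions for illustration, and returning 0.0 when a denominator is zero is a convention chosen here rather than one stated in the study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, and false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN);
    F1 = harmonic mean of precision and recall.
    A zero denominator yields 0.0 (assumed convention)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy labels (assumed): 1 = positive expression status, 0 = negative
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp, fp, fn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
```

In this toy example there are three true positives, one false positive, and one false negative, so precision, recall, and F1 score all equal 0.75.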