Literature search
A systematic online search using electronic databases was conducted by two independent medical doctors (Y.H.D. and P.C.S.). If there was an inconsistent selection and lack of agreement, another medical doctor (C.H.L.) made the final judgment and decision. The three electronic databases MEDLINE (PubMed), EMBASE, and LILACS were searched for studies dating from January 1993 to December 2018. The Medical Subject Headings terms of stereotactic body radiotherapy, SBRT, hepatocellular carcinoma, and HCC were used. The search was limited to humans, the English language, and patient numbers > 100.
Patients for Clinical Evaluation and Radiomic Analysis
Clinical data of patients with HCC who received SBRT at our institution between 2007 and 2016 were retrospectively reviewed. For radiomic analysis, patients who underwent locoregional ablative treatments in the past were excluded. Enrolled patients had undergone dynamic CT before radiotherapy and follow-up CT to evaluate responses. Follow-up time was obtained from the date of treatment to the last outpatient visit, with a median follow-up time of 26.6 months. Because of the retrospective nature of this study, we obtained approval from our Institutional Review Board for a waiver of informed consent (IRB number: 1-107-05-016, analysis of treatment outcome of patients with hepatic tumors).
SBRT and clinical endpoints
All patients were treated with SBRT using the Cyberknife image-guided radiosurgery system (Accuray, Sunnyvale, CA) as described previously [11-13]. The modified Response Evaluation Criteria in Solid Tumors (mRECIST) system was used to assess therapeutic responses [14]. Tumors achieving complete and partial responses, as per mRECIST criteria, were considered as having local control (LC). The LC that occurred during the first year after treatment was considered as a 1-year LC or within 1-year response (W1R) in our study. End response was defined as a complete/partial response achieved by the end of the follow-up period (EndR). Tumors within the radiotherapy field that did not show increase in size at the end of the follow-up period were considered to have attained in-field failure-free (IFFF). EndR and IFFF were used to evaluate two different aspects of clinical response, with the former aiming to identify the final responders and the latter to address progressive events.
Extraction of radiomic features
Complete dynamic, multiphasic CT scans with contrast were retrieved individually in the Digital Imaging and Communications in Medicine (DICOM) format. All scans were performed with a tube voltage of 120 kVp and a pitch of 0.984. The images were then imported to the Computational Environment for Radiotherapy Research (CERR) platform based on MATLAB (MathWorks, Naticks, MA, 2017b) for radiation planning analysis. For each contrast-phase CT scan (arterial (A), portovenous (E), and delayed (D)), tumor regions were contoured by two experienced radiation oncologists (W.Y.H. and C.H.L.) and modified by another experienced radiologist (W.C.C.) to minimize delineation uncertainty. Each slice was resized to 128 × 128 voxels using bicubic interpolation (0.8301 × 0.8301 × 3 mm3 for each voxel). The intensity distribution was standardized by histogram equalization. To reduce image noise and increase the sensitivity of radiomic analyses, a bin width of 25 Hounsfield units was used for discretization prior to global texture analysis [15]. Radiomic features were subsequently extracted and divided into 4 categories: first-order statistics, shape, global, and texture. For matrix-based texture features, different combinations of quantization (equal probability (Ep) and uniform quantization (U)) of gray levels (8, 16, 32, and 64) were performed [16]. Incorporating CT scans from each phase, the training datasets combined features from different CT phases, quantization methods, and gray levels, and were termed AEp (8, 16, 32, and 64), AU (8, 16, 32, and 64), EEp (8, 16, 32, and 64), EU (8, 16, 32, and 64), DEp (8, 16, 32, and 64), and DU (8, 16, 32, and 64). Altogether, there were 41 tumors identified and 46 radiomic features prior further processing.
Machine Learning Characterization
Data augmentation and adjustment of imbalanced data
Oversampling with bootstrapping was used to expand our data population to 5000 samples. Synthetic Minority Oversampling Technique (SMOTE) was then used to adjust the imbalanced data. For the minority class, SMOTE is advantageous for making the decision region more general and improving the classifier performance [17].
All 41 identified tumors were used for radiomic analyses. 78% (32/41) of the initial samples were extracted randomly and oversampled for training. The remaining 22% (9/41) was oversampled in a similar way for subsequent testing. The final cohort comprised 5201 training and 1301 testing samples, both of which were then balanced by SMOTE for positive and negative LRs.
Feature selection
The radiomic features extracted from the contoured region of each tumor were averaged and subsequently normalized across the cohort. To select the radiomic features, a regression method used to improve prediction accuracy by incorporating penalized estimation functions known as Least Absolute Shrinkage and Selection Operator (LASSO) was used [18]. LASSO started feature selection by tuning a parameter (λ). During this process, most covariate coefficients were shrunk to zero, and the remaining features with non-zero coefficients were selected. For each training set, which included the training features with the corresponding training responses (i.e. W1R, ENdR, and IFFF), the features selected by LASSO were used to build the classifiers.
Support Vector Machine (SVM) and Logistic Regression (LRG) classifiers
With high prediction accuracy in various clinical settings [19, 20], SVM and LRG were adopted in our study to construct classifiers for LR. The SVM classifier deals with non-linear interaction and was used to discriminate whether a LR was achieved or not [21]. The penalty parameter C was set to 1 to determine the tradeoff between fitting error and model complexity. Radial basis function was used as the kernel function in our SVM classifier. The LRG was used to predict the likelihood of positive LR, and a probability equal to 0.5 was set as the minimum threshold to determine the predicted class.
Evaluation of model performance
The model performance of SVM and LRG classifiers were evaluated for accuracy, sensitivity, and specificity after 10-fold cross-validation in the training cohorts. F1 score was used to evaluate the model robustness in the testing cohorts. The classifier with the highest mean F1 scores for the three LRs was chosen as the candidate model. Receiver operating characteristic (ROC) curves were then used to assess the output quality with area under curve (AUC) and 5-fold cross-validation. All the functions used in our analyses were based on Python.
Statistical analysis
For the evaluation of radiomic data processing methods, the 50th percentile of accuracy was used as the threshold. The numbers of processing methods with an accuracy above the 50th percentile were identified and used for comparison of parameter robustness. Kaplan-Meier (KM) analysis with log-rank test was used to evaluate the effect of therapeutic response on survival. P value < 0.05 was considered significant. All statistical analyses and survival calculation were performed in R.