Patient Selection
Patients with newly diagnosed IDH-wildtype GBM treated with RT at Sunnybrook Health Sciences Centre, Toronto, between January 2014 and December 2018 were considered eligible for the retrospective study. The institutional ethics committee approved the study, and the requirement for informed consent were waived. Patients who had T1-weighted gadolinium contrast-enhanced (T1-CE), T2-weighted FLAIR (T2-FLAIR), and apparent diffusion coefficient (ADC) MRI sequences and completed the scheduled course of RT were considered eligible for the study. Patients with prior RT, missing survival information, missing or motion artifact corrupted images were excluded.
Radiation Planning Contour Extraction and Image Preprocessing
Patients were treated with RT and chemotherapy following surgery according to standard protocols [6, 8]. Following consultation and decision for RT, planning CT and MRI was performed for all patients. Radiation planning MRIs were acquired on GE Signa HDxT 1.5T MRI scanners (GE Medical Systems, WI, USA) or Philips Ingenia 1.5T systems (Philips Medical, WI, USA). The slice thickness for T1-CE, T2-FLAIR, and ADC was 1 mm, 2mm, and 5mm, respectively. The RT target volumes were drawn by the responsible radiation oncologists. GTV encompassed the enhancing residual disease on the T1-CE sequence and the surgical cavity. CTV included a 1.5 cm expansion beyond the GTV typically encompassing the PTR identified as hyperintense region on T2-FLAIR, duly edited from anatomical barriers like the falx, tentorium cerebelli, and bone. A planning target volume of 5 mm was used. All patients were treated on a linear accelerator device using intensity-modulated RT with image guidance. Typical dose prescriptions included 60 Gy in 30 fractions over 6 weeks in patients less than 65 years or 40 Gy in 15 fractions over 3 weeks in elderly patients or patients with poor performance status. The decision regarding concurrent and adjuvant temozolomide was taken by the responsible neuro-oncologist and radiation oncologist.
Contours for the GTV and CTV that were planned on fused CT/T1-CE images in Pinnacle (Philips Medical, WI, USA) were exported into Matlab R2018b (The Mathworks, Inc., MA, USA), where the CT-based contours were transformed to the T1-CE reference frame using the CT/T1-CE registration information. Images were skull stripped using HD-BET [9], and then the T2-FLAIR and ADC volumes were rigidly registered to the corresponding T1-CE volumes using the FMIRB Software Library (FSL) tool FLIRT [10–12]. The contours and images were resampled to a uniform resolution of 1 mm3, N4 bias field corrected [13, 14] and intensity normalized by histogram matching [15–17] to the T1-CE and T2-FLAIR images acquired on the GE scanners. Finally, the CTV contours were manually refined using ITK-SNAP [18] (http://www.itksnap.org) to omit the ventricles, prominent sulci, and the skull where applicable, and the GTV was subtracted from the CTV to generate two disjoint contours.
Feature Extraction
Radiomic features were extracted from the GTV and modified CTV for each of the three MRI sequences. Fixed bin width (FBW) quantization was used to discretize pixel intensities within each contour. To determine the FBW for each modality and contour type, the minimum and maximum intensities were measured, and the FBW was selected as the maximum width that produced bin counts greater than or equal to 30. Feature extraction was performed using PyRadiomics [19] software V2.2.0. The feature set included the following: 18 first-order statistical features; 22 gray level co-occurrence matrix features; 16 gray level size zone features; 16 gray level run length matrix features; 5 neighboring gray tone difference matrix features; and 14 gray level dependence matrix features. Ninety-one features were extracted for the GTV and CTV for each of the three MR modalities, resulting in a total of 546 radiomic features per patient. A detailed description of the features can be found on the PyRadiomics website (https://pyradiomics.readthedocs.io). A schematic of the data processing is shown in Figure 1.
Statistical Analysis
Risk Modelling
The statistical analysis was performed using R V4.0.3 [20] (http://www.R-project.org). Four independent models were considered: one including all patients and one for each of the three surgical treatment subgroups i.e., gross total resection, subtotal resection, and biopsy. Internal validation was performed using leave-one-out cross-validation. First, a patient was left out, and the radiomic features from the remaining patients were shifted and scaled to zero mean and unit standard deviation using the caret [21] package function “PreProcess”, and then the shifting/scaling parameters were applied to the test patient. Using the package glmnet [22, 23], the radiomics signatures were constructed by least absolute shrinkage and selection operator (LASSO) Cox regression. To reduce potential overfitting, the regularization weight λ was optimized to minimize the 10-fold cross-validation error on the training set. The fitted LASSO Cox model was then applied to the training and test patients and radiomics risk scores were derived by a linear combination of the features weighed by their model coefficients. The median of the training patients’ radiomics risk scores was used as a threshold to assign the test patient to either low or high risk. Using the survival [24, 25] package, Cox proportional hazard models were fit to the clinical features of the training data, which included age, GTV volume, and surgical treatment, and linear predictions were made on the test patient to produce a clinical risk score. Additionally, a combined radiomics and clinical Cox model was fit on the training data and applied to the test patient to create a combined risk score. All modeling steps were repeated from scratch for each left-out patient. A flowchart of the leave-one-out cross-validation procedure is shown in Supplementary Figure 1.
Evaluation of Predictive Accuracy
Cross-validated Kaplan-Meier curves for high/low overall survival (OS) risk were constructed using the survival package function “survfit.” The split into binary risk groups (high and low) were obtained using the major split as obtained from the algorithm. The date of surgery was considered as baseline for the survival analysis. To assign statistical significance to the log-rank tests, the permutation distributions of the log-rank statistics were obtained by randomly permuting the correspondence between radiomic features and OS and repeating the entire leave-one-out cross-validation procedure 100 times. To test the hypothesis of predicting OS using radiomic features, the proportion of permuted log-rank statistics greater than or equal to the un-permuted statistic was taken as the significance level [26]. A weighted log-rank test was used (G-rho rank test, rho = 1). The cross-validated time-dependent area under the receiver operator characteristic curves for the radiomics, clinical, and combined risk scores were generated using the package risksetROC [27], and the integrated areas under the time-dependent curves (iAUC) were evaluated with the function “risksetAUC.” To evaluate the incremental value of radiomics to clinical in OS prediction, the null hypothesis that radiomics are independent of OS and clinical variables was tested by obtaining the permutation distribution of the iAUCs. In this case, the correspondence between radiomic features and OS were randomly permuted while clinical variables were left un-permuted and the leave-one-out cross validation procedure was repeated 100 times. The difference in iAUC measures between the combined and clinical models was used as the test metric. The proportion of iAUC differences with permuted radiomic features greater than or equal to the un-permuted iAUC differences were taken as the significance level. Kendall’s τb was used to test the association between the predicted radiomic risk scores and surgical treatment. Normality of the radiomics risk scores was assessed with the Shapiro-Wilk test and a one-way ANOVA or Kurskal-Wallis test for risk scores by surgical treatment was used where appropriate. p-values < 0.05 were taken as significant.