Patient Selection
This retrospective study was approved by the institutional research ethics review board, and the requirement for informed consent was waived. Adult patients (≥ 18 years of age) with either IDH-wildtype GBM or LGG were identified from an institutional database covering January 2014 to December 2018, with MRI data available from two identical scanners. For patients in the GBM cohort, preoperative MRI, usually acquired within one week before surgery, was used for analysis. The LGG cohort included grade 2 astrocytomas and oligodendrogliomas. Patients with high-risk or atypical features on histopathological evaluation, or with higher-grade features on imaging such as intervening contrast-enhancing regions or significant T2W heterogeneity, were excluded. For the LGG cohort, post-operative scans were included, but patients with prior radiotherapy or chemotherapy were excluded. The surgical cavity and tract were manually excluded during segmentation. Tumors whose segments measured less than 1 cm in at least one dimension were excluded from the study.
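The 1 cm size criterion can be checked automatically once a segment is available as a binary mask. The sketch below assumes that "dimension" means the extent of the segment's axis-aligned bounding box along each image axis, scaled by the voxel spacing in mm; both the interpretation and the function names are hypothetical and not taken from the study.

```python
import numpy as np

def min_extent_mm(mask: np.ndarray, spacing_mm) -> float:
    """Smallest axis-aligned bounding-box extent of a binary 3D mask, in mm."""
    coords = np.argwhere(mask)  # (N, 3) voxel indices inside the segment
    if coords.size == 0:
        return 0.0
    # Number of voxels spanned along each axis, converted to millimetres
    extent_vox = coords.max(axis=0) - coords.min(axis=0) + 1
    return float(np.min(extent_vox * np.asarray(spacing_mm, dtype=float)))

def passes_size_criterion(mask, spacing_mm, threshold_mm=10.0) -> bool:
    """Hypothetical check: keep a segment only if it is >= 1 cm along every axis."""
    return min_extent_mm(mask, spacing_mm) >= threshold_mm
```

For example, a segment spanning 20 × 20 × 4 voxels at 0.5 × 0.5 × 1.0 mm spacing is only 4 mm thick along the last axis and would be excluded.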
MRI Protocol
Imaging was acquired on a GE Signa HDxT 1.5T MRI scanner (General Electric Medical Systems, Waukesha, WI, USA) using an 8-channel head coil. Three sequences were included in the current study: gadolinium (Gd)-enhanced 3D T1-weighted FSPGR (T1-CE; Gadovist 0.1 mmol/kg, 10 ml maximum, bolus); Gd-enhanced T2 fluid-attenuated inversion-recovery PROPELLER (T2-FLAIR); and apparent diffusion coefficient (ADC) maps derived from diffusion-weighted imaging. Diffusion-weighted images were obtained with a single-shot echo-planar imaging sequence using 3 diffusion directions and a b-value of 1000 s/mm², and ADC maps were reconstructed online using GE's Functool. The MRI acquisition parameters are shown in Supplementary Table 1.
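The internals of GE's Functool are not described here, but ADC maps from a b=0 image and a single nonzero b-value are conventionally computed with the mono-exponential model S_b = S_0 · exp(−b · ADC). The sketch below assumes that model and forms the trace-weighted image as the geometric mean of the directional images, which is equivalent to averaging the directional ADCs; the function name is hypothetical.

```python
import numpy as np

def adc_map(s0, dwi_directions, b=1000.0, eps=1e-6):
    """Mono-exponential ADC (mm^2/s) from a b=0 image and directional DWIs at one b-value (s/mm^2)."""
    s0 = np.asarray(s0, dtype=float)
    dwi = np.asarray(dwi_directions, dtype=float)  # shape (n_directions, *image_shape)
    # Trace-weighted DWI: geometric mean over the diffusion directions
    trace = np.exp(np.mean(np.log(np.maximum(dwi, eps)), axis=0))
    # S_b = S_0 * exp(-b * ADC)  =>  ADC = ln(S_0 / S_b) / b
    return np.log(np.maximum(s0, eps) / trace) / b
```

With 3 directions at b = 1000 s/mm² whose directional diffusivities are 0.8, 1.0 and 1.2 × 10⁻³ mm²/s, this returns 1.0 × 10⁻³ mm²/s, the mean diffusivity across directions.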
Image Preprocessing and Segmentation
Figure 1 shows the schema used for the current study. The T2-FLAIR and ADC scans were first resampled to the corresponding T1-CE volume to match its field-of-view and resolution using FLIRT from the FMRIB Software Library (FSL). Skull stripping was performed with HD-BET, a pre-trained artificial neural network-based automated method. The extracted brain volumes from the T2-FLAIR scans were then rigidly registered to the corresponding T1-CE volumes using FLIRT. For the ADC scans, the b=0 s/mm² images were used for brain extraction and registration, and the resulting transformations were applied to the ADC volumes. Data handling and scripting were performed in Matlab R2018b (The MathWorks, Inc., Natick, MA, USA). The T1-CE, T2-FLAIR, and ADC volumes for each patient were combined into a single workspace in ITK-SNAP (http://www.itksnap.org) for manual segmentation [11].
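The order of the preprocessing steps can be summarized as a sequence of FSL FLIRT and HD-BET calls. The sketch below only builds the command lines and does not run them. The specific flags (`-applyxfm -usesqform` for header-based resampling, `-dof 6` for rigid registration, `-omat`/`-init` for saving and reapplying a transform, and HD-BET's `-i`/`-o`) are our assumptions about a typical invocation, not the study's actual scripts, which were written in Matlab. File names are hypothetical.

```python
import shlex

def preprocessing_commands(t1ce, t2f, b0, adc, out="work"):
    """Hypothetical FLIRT/HD-BET command sequence (argument lists); nothing is executed."""
    def p(name):
        return f"{out}/{name}.nii.gz"

    return [
        # 1) Resample T2-FLAIR, b=0 and ADC onto the T1-CE grid using header geometry
        ["flirt", "-in", t2f, "-ref", t1ce, "-applyxfm", "-usesqform", "-out", p("t2f_rs")],
        ["flirt", "-in", b0, "-ref", t1ce, "-applyxfm", "-usesqform", "-out", p("b0_rs")],
        ["flirt", "-in", adc, "-ref", t1ce, "-applyxfm", "-usesqform", "-out", p("adc_rs")],
        # 2) Skull stripping with HD-BET (the b=0 image stands in for the ADC map)
        ["hd-bet", "-i", t1ce, "-o", p("t1ce_brain")],
        ["hd-bet", "-i", p("t2f_rs"), "-o", p("t2f_brain")],
        ["hd-bet", "-i", p("b0_rs"), "-o", p("b0_brain")],
        # 3) Rigid (6 degrees of freedom) registration of extracted brains to the T1-CE brain
        ["flirt", "-in", p("t2f_brain"), "-ref", p("t1ce_brain"), "-dof", "6",
         "-omat", f"{out}/t2f2t1.mat", "-out", p("t2f_reg")],
        ["flirt", "-in", p("b0_brain"), "-ref", p("t1ce_brain"), "-dof", "6",
         "-omat", f"{out}/b02t1.mat", "-out", p("b0_reg")],
        # 4) Apply the b=0 -> T1-CE transform to the resampled ADC map
        ["flirt", "-in", p("adc_rs"), "-ref", p("t1ce_brain"), "-applyxfm",
         "-init", f"{out}/b02t1.mat", "-out", p("adc_reg")],
    ]

if __name__ == "__main__":
    for cmd in preprocessing_commands("t1ce.nii.gz", "t2f.nii.gz", "b0.nii.gz", "adc.nii.gz"):
        print(shlex.join(cmd))  # e.g. subprocess.run(cmd, check=True) with FSL and HD-BET installed
```

Keeping the b=0 transform in a separate matrix file is what allows it to be reapplied to the ADC map, which has too little anatomical contrast to be registered reliably on its own.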
Segmentation was initially performed manually by a radiation oncologist (AD), and all cases were individually reviewed by a neuroradiologist (PM) and a neuro-radiation oncologist (AS) to reach a final consensus. In LGG, the segments included the tumor visible as T2-FLAIR hyperintensity, with the surgical cavity excluded in post-operative cases. In patients with GBM, the peritumoral region (PTR) was segmented to include the T2-FLAIR hyperintense region beyond the contrast-enhancing tumor core.
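Although the segments were drawn manually, the two region definitions amount to simple set operations on binary masks. The sketch below is illustrative only and assumes hypothetical masks for the T2-FLAIR hyperintensity, the contrast-enhancing core, and the surgical cavity.

```python
import numpy as np

def ptr_mask(flair_hyper, enhancing_core):
    """GBM peritumoral region: T2-FLAIR hyperintensity outside the contrast-enhancing core."""
    return np.asarray(flair_hyper, dtype=bool) & ~np.asarray(enhancing_core, dtype=bool)

def lgg_mask(flair_hyper, cavity=None):
    """LGG segment: T2-FLAIR hyperintensity, with the surgical cavity removed if present."""
    m = np.asarray(flair_hyper, dtype=bool)
    return m if cavity is None else m & ~np.asarray(cavity, dtype=bool)
```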
Normalization & Feature Extraction
Prior to feature extraction, z-score intensity normalization was performed for all images by subtracting the mean intensity within each brain and dividing by the standard deviation. The normalized images were then multiplied by 100 and shifted by 300 so that the majority of pixel intensities (those within ±3σ of the mean) were non-negative. Fixed bin width (FBW) quantization was used to discretize pixel intensities within each segment [12, 13]. The FBWs and corresponding bin counts (BCs) were 13 for the T1-CE images (median BC = 52; BC range = 16–87), 20 for the T2-FLAIR images (median BC = 27; BC range = 16–43), and 7 for the ADC images (median BC = 52; BC range = 18–124).
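A minimal NumPy sketch of the two intensity steps follows, assuming a brain mask is available and that the z-score statistics are computed over brain voxels only. The binning function follows the fixed-bin-width formula given in the PyRadiomics documentation, floor(x/W) - floor(min(x)/W) + 1, so that the bin count (BC) of a segment is its largest bin index. Function names are hypothetical.

```python
import numpy as np

def zscore_shift(image, brain_mask, scale=100.0, shift=300.0):
    """z-score using brain-voxel statistics, then scale by 100 and shift by 300."""
    img = np.asarray(image, dtype=float)
    brain = img[np.asarray(brain_mask, dtype=bool)]
    z = (img - brain.mean()) / brain.std()
    return z * scale + shift  # values within +/-3 sigma map to [0, 600]

def fixed_bin_width(values, bin_width):
    """Discretize segment intensities with a fixed bin width; bin indices start at 1."""
    v = np.asarray(values, dtype=float)
    return (np.floor(v / bin_width) - np.floor(v.min() / bin_width) + 1).astype(int)
```

For example, segment intensities of 300, 312, 313 and 326 with the T1-CE bin width of 13 fall into bins 1, 2, 2 and 3, giving a bin count of 3. A fixed width keeps each bin's intensity meaning comparable across patients, which is why the resulting bin counts vary from segment to segment.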
Feature extraction was performed using PyRadiomics V2.2.0. The feature set included 18 first-order statistical features, 22 gray level co-occurrence matrix (GLCM) features, 16 gray level size zone matrix (GLSZM) features, 16 gray level run length matrix (GLRLM) features, 5 neighboring gray tone difference matrix (NGTDM) features, and 14 gray level dependence matrix (GLDM) features. Additional features were extracted from images pre-processed with either wavelet or Laplacian of Gaussian (LoG) filters; LoG features were extracted with kernel sizes of 1, 2, 3, 4, and 5 mm. All features were extracted from the segments in 3D. Pixel intensities outside the segments were set to zero prior to image filtering to reduce contamination from intensities outside the segment. In total, 91 features were derived from the unfiltered images, 728 from the wavelet-filtered images (8 sub-bands × 91), and 455 from the LoG-filtered images (5 kernel sizes × 91), giving 1274 features per modality and 3822 features overall. A detailed description of the features can be found in the PyRadiomics documentation (https://pyradiomics.readthedocs.io/en/latest/features.html).
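The feature counts above can be verified with a short tally. This sketch assumes a single-level 3D wavelet decomposition, which yields 8 sub-bands (every combination of low- and high-pass filtering along the three axes), and one filtered image per LoG kernel size; shape features are not counted because they are not listed in the feature set.

```python
from itertools import product

# Texture and first-order feature counts per class, as listed in the text
FEATURES_PER_CLASS = {"firstorder": 18, "glcm": 22, "glszm": 16,
                      "glrlm": 16, "ngtdm": 5, "gldm": 14}
# Single-level 3D wavelet: LLL, LLH, ..., HHH (8 sub-bands)
WAVELET_SUBBANDS = ["".join(c) for c in product("LH", repeat=3)]
LOG_KERNELS_MM = [1, 2, 3, 4, 5]
MODALITIES = ["T1-CE", "T2-FLAIR", "ADC"]

def feature_budget():
    """Number of features per filtered image type, per modality, and overall."""
    per_image = sum(FEATURES_PER_CLASS.values())
    original = per_image
    wavelet = per_image * len(WAVELET_SUBBANDS)
    log = per_image * len(LOG_KERNELS_MM)
    per_modality = original + wavelet + log
    return {"original": original, "wavelet": wavelet, "log": log,
            "per_modality": per_modality, "total": per_modality * len(MODALITIES)}

if __name__ == "__main__":
    print(feature_budget())
```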
Feature Selection & Classification
All model building steps were performed in Python using scikit-learn V0.22.2 [14]. Three feature selection approaches were used. Two were filter-based methods: the ANOVA F-test and minimum redundancy maximum relevance (mRMR). The third, recursive feature elimination (RFE), is a wrapper-based approach; it used a linear support vector machine (SVM) classifier (regularization parameter C = 1) as the base learner and, at each iteration, eliminated the 5 least important features until a pre-determined number of features remained. For each feature selection method, the top 4 features were retained. This threshold was chosen based on the commonly used rule of thumb that, to mitigate overfitting, a model should have at least 10 training samples per class for each feature [15]. To demonstrate the impact of feature selection on classification performance, a fourth strategy applied no feature reduction, i.e., models were trained on all available features.
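The three selectors can be sketched with scikit-learn. The ANOVA F-test and RFE use the library's `SelectKBest`/`f_classif` and `RFE` with a linear `SVC` (C = 1, step = 5, 4 features). scikit-learn has no mRMR implementation, so the third function is one common variant we chose for illustration (the F-test correlation quotient: relevance is the F-statistic, redundancy is the mean absolute Pearson correlation with already-selected features); it is not necessarily the mRMR variant used in the study. Function names are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.svm import SVC

K = 4  # number of features retained per method

def anova_top_k(X, y, k=K):
    """Indices of the k features with the largest ANOVA F-statistic."""
    return np.sort(SelectKBest(f_classif, k=k).fit(X, y).get_support(indices=True))

def rfe_top_k(X, y, k=K):
    """RFE with a linear SVM (C=1) ranking by |coef|; 5 features dropped per iteration."""
    rfe = RFE(SVC(kernel="linear", C=1.0), n_features_to_select=k, step=5).fit(X, y)
    return np.sort(np.flatnonzero(rfe.support_))

def mrmr_fcq_top_k(X, y, k=K):
    """Greedy mRMR (F-test correlation quotient variant), in selection order."""
    relevance = np.nan_to_num(f_classif(X, y)[0])
    corr = np.nan_to_num(np.abs(np.corrcoef(X, rowvar=False)))
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        redundancy = corr[np.ix_(remaining, selected)].mean(axis=1)
        score = relevance[remaining] / np.maximum(redundancy, 1e-12)
        selected.append(remaining[int(np.argmax(score))])
    return np.array(selected)
```

The two filter methods score features independently of any classifier, whereas RFE repeatedly refits the SVM, which makes it far more expensive on a 3822-feature set.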
To prevent the data leakage that occurs when the same samples used for feature selection are reused for model validation, a leave-one-patient-out (LOPO) cross-validation approach was used, with feature selection repeated within each training fold. This strategy precludes the identification of a single “optimal” feature set, because different features can be selected as the training fold changes; however, it reduces the optimistic bias that can affect radiomics studies relying on internal validation schemes. Instead, this approach provides descriptive statistics on how often different feature types are selected and allows a degree of assessment of feature selection stability. At each LOPO iteration, the training features were scaled to zero mean and unit standard deviation, and the learned scaling parameters were applied to the features of the held-out test patient.
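A minimal sketch of the leakage-free loop is shown below. Placing the scaler and selector inside a scikit-learn pipeline that is refit for every held-out patient ensures both are learned from the training patients only, and counting the selected indices across folds gives the selection frequencies described above. `LeaveOneGroupOut` with one group per patient implements LOPO; if each patient contributes a single row, the default groups reduce it to leave-one-out. The ANOVA selector and linear SVM are illustrative choices.

```python
from collections import Counter

import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def lopo_selection_frequency(X, y, groups=None, k=4):
    """LOPO predictions plus how often each feature index was selected across folds."""
    groups = np.arange(len(y)) if groups is None else np.asarray(groups)
    counts = Counter()
    preds = np.empty_like(y)
    for train, test in LeaveOneGroupOut().split(X, y, groups):
        # Scaling and selection are fit on the training patients only
        model = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=k),
                              SVC(kernel="linear", C=1.0))
        model.fit(X[train], y[train])
        counts.update(np.flatnonzero(model.named_steps["selectkbest"].get_support()).tolist())
        preds[test] = model.predict(X[test])  # held-out patient, scaled with training parameters
    return counts, preds
```

Dividing each count by the number of folds gives a per-feature selection frequency; features chosen in nearly every fold indicate a stable selection.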
Four machine learning classifiers were investigated: a support vector machine with a linear kernel (SVM), K-nearest neighbors (K-NN), linear discriminant analysis (LDA), and adaptive boosting with decision stumps as the base learner (AdaBoost). At each LOPO iteration, hyperparameters were selected by grid search with nested 5-fold cross-validation, maximizing balanced accuracy. For SVM, the regularization parameter C ranged from 10⁻⁴ to 10⁵ in multiples of 10; for K-NN, the number of neighbors ranged from 1 to 11 in steps of 1; and for AdaBoost, the number of trees (estimators) ranged from 50 to 450 in steps of 100. All other tunable hyperparameters were left at their scikit-learn default values (https://scikit-learn.org/0.22/). LOPO model performance was quantified by accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve (ROC AUC).
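The nested tuning and evaluation can be sketched as follows, using the hyperparameter grids stated above. Each outer LOPO fold runs `GridSearchCV` with 5 inner folds scored by balanced accuracy; LDA has no tuned parameters and is fit directly. Feature selection is omitted for brevity and could be added as a pipeline step. Labels are assumed to be 0/1 with 1 as the positive class (a hypothetical encoding). ROC AUC is computed on the pooled out-of-fold decision scores (or class-1 probabilities for K-NN). AdaBoost relies on scikit-learn's default base learner, a depth-1 decision tree (stump).

```python
import numpy as np
from sklearn.base import clone
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import GridSearchCV, LeaveOneGroupOut
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

MODELS = {
    "svm": (SVC(kernel="linear"), {"clf__C": np.logspace(-4, 5, 10)}),
    "knn": (KNeighborsClassifier(), {"clf__n_neighbors": list(range(1, 12))}),
    "lda": (LinearDiscriminantAnalysis(), {}),
    "adaboost": (AdaBoostClassifier(), {"clf__n_estimators": list(range(50, 451, 100))}),
}

def lopo_evaluate(name, X, y, groups=None):
    """Outer LOPO loop with inner 5-fold grid search; returns pooled performance metrics."""
    estimator, grid = MODELS[name]
    groups = np.arange(len(y)) if groups is None else np.asarray(groups)
    y_pred = np.empty_like(y)
    y_score = np.empty(len(y), dtype=float)
    for train, test in LeaveOneGroupOut().split(X, y, groups):
        pipe = Pipeline([("scale", StandardScaler()), ("clf", clone(estimator))])
        model = GridSearchCV(pipe, grid, scoring="balanced_accuracy", cv=5) if grid else pipe
        model.fit(X[train], y[train])
        y_pred[test] = model.predict(X[test])
        if hasattr(model, "decision_function"):
            y_score[test] = model.decision_function(X[test])
        else:  # e.g. K-NN exposes only probabilities
            y_score[test] = model.predict_proba(X[test])[:, 1]
    tn, fp, fn, tp = confusion_matrix(y, y_pred, labels=[0, 1]).ravel()
    return {"accuracy": accuracy_score(y, y_pred),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "roc_auc": roc_auc_score(y, y_score)}
```

Sensitivity and specificity here are the recalls of the positive and negative classes, respectively, computed once from the pooled LOPO predictions rather than averaged over folds.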