Patient selection and study design
A total of 421 treatment-naïve patients (215 female; mean age 63 ± 11 years [SD]; range 28–91 years) who underwent whole-body [¹⁸F]FDG-PET/CT between 1 April 2012 and 30 April 2019 were included in this single-center retrospective study according to the following criteria. All patients had histologically confirmed adenocarcinoma of the lung.
Patients with severe image artifacts on PET/CT, concomitant pneumonia, atelectasis or pleural adhesions, or other cancer types were excluded. For all patients, clinical data including age, sex, TNM stage, predominant growth pattern, degree of tumor differentiation, EGFR mutation status, smoking status and treatment strategy were collected from the medical records where available, or inferred from primary clinical data such as medical imaging data and surgical or histological records.
Tumor stages were defined according to the American Joint Committee on Cancer (AJCC) Staging Manual [23]. Growth patterns and tumor grades were defined according to the 2011 IASLC/ATS/ERS classification of lung adenocarcinomas [24]. Following the literature on prognostic subgroups of lung adenocarcinoma [4–7], micropapillary (MMP), solid predominant (SPA) and variants of invasive (VIA) lung adenocarcinomas were stratified as high-risk lesions, and all other growth patterns as low-risk lesions. All inconclusive histologic samples were grouped under "others". Histopathologic tumor grades were dichotomized into poor versus moderate-to-high differentiation. Patients were split into four cohorts (4-year OS, 3-year OS, tumor grade [TG], growth pattern risk [GPR]) based on the data available for each predicted endpoint (see Table 1).
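The two dichotomizations above amount to simple lookups. The following is a minimal sketch only: the function names and the accepted input spellings ("poor", "moderate", "high", "well") are hypothetical and not taken from the study's data dictionary.

```python
# Hypothetical sketch of the GPR and TG label definitions described above.
# High-risk codes mirror the text's abbreviations; every other pattern,
# including inconclusive samples grouped under "others", is low risk.
HIGH_RISK_PATTERNS = {"MMP", "SPA", "VIA"}


def growth_pattern_risk(pattern: str) -> str:
    """Return 'high' for MMP, SPA or VIA, otherwise 'low'."""
    return "high" if pattern.strip().upper() in HIGH_RISK_PATTERNS else "low"


def grade_label(differentiation: str) -> str:
    """Dichotomize tumor differentiation into 'poor' vs 'moderate-to-high'."""
    d = differentiation.strip().lower()
    if d == "poor":
        return "poor"
    if d in {"moderate", "high", "well"}:
        return "moderate-to-high"
    raise ValueError(f"unknown differentiation: {differentiation!r}")
```

For example, `growth_pattern_risk("acinar")` returns `"low"` and `grade_label("moderate")` returns `"moderate-to-high"`.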
The primary endpoint of this study was overall survival (OS), defined as the time from the PET/CT examination until death or the date of the last contact with the patient. In the 4-year and 3-year OS cohorts, follow-up ranged from 0.07 to 87.85 months (median ± SD: 30.1 ± 20.4 and 29.8 ± 21.1 months, respectively). The secondary endpoints were the prediction of tumor grade and histologic growth pattern risk.
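One plausible way to turn follow-up time and vital status into the binary 4-year (48-month) and 3-year (36-month) OS labels is sketched below. Excluding patients censored before the horizon is our assumption about how the OS cohorts were formed from the available data, not a documented step, and `os_label` is a hypothetical helper.

```python
from typing import Optional


def os_label(followup_months: float, died: bool, horizon_months: float) -> Optional[int]:
    """Binary OS label at a fixed horizon (assumed rule, see lead-in).

    1    -> death observed at or before the horizon
    0    -> patient followed beyond the horizon (alive, or died later)
    None -> censored before the horizon, so the label cannot be ascertained
    """
    if died and followup_months <= horizon_months:
        return 1
    if followup_months >= horizon_months:
        return 0
    return None
```

Under this rule, a patient last seen alive at 30 months receives a 3-year label of `None` and would drop out of the 3-year OS cohort.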
Ethical approval was obtained and the need for informed consent was waived by the medical ethics committee of the Peking University Third Hospital, Beijing, People's Republic of China.
FDG PET/CT image acquisition
All [¹⁸F]FDG-PET/CT studies were obtained on a Siemens Biograph TruePoint PET/CT system.
Before tracer injection, all patients fasted for at least 6 h. Blood glucose levels were 104 ± 16 mg/dl. Patients received an intravenous injection of 5.55 MBq/kg (0.15 mCi/kg) [¹⁸F]FDG and then rested recumbent in a quiet environment. Sixty minutes post-injection, scans were acquired from the skull base to the upper femur. PET data were acquired in 5–7 bed positions at 2 minutes per bed position, followed by a deep-inspiration high-resolution CT scan at 120 kVp and 100 mAs without intravenous contrast. PET images were reconstructed iteratively using the ordered-subset expectation maximization (OSEM) algorithm (21 subsets, 3 iterations) and corrected for scatter and attenuation using the CT data (PET matrix size 168 × 168, CT matrix size 512 × 512).
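The acquisition parameters involve some simple arithmetic: weight-based activity, the MBq/mCi conversion and the total PET emission time. The helper below makes this explicit; the function names and the 70 kg example weight are illustrative only, and radioactive decay between dose preparation and injection is not modelled.

```python
MBQ_PER_MCI = 37.0  # 1 mCi = 37 MBq (exact by definition)
GLUCOSE_MG_PER_MMOL = 180.16  # molar mass of glucose, g/mol


def injected_activity_mbq(weight_kg: float, dose_mbq_per_kg: float = 5.55) -> float:
    """Weight-based [18F]FDG activity (decay to injection time not modelled)."""
    return weight_kg * dose_mbq_per_kg


def mbq_to_mci(activity_mbq: float) -> float:
    return activity_mbq / MBQ_PER_MCI


def pet_emission_minutes(bed_positions: int, minutes_per_bed: float = 2.0) -> float:
    """Total PET emission time, excluding the CT scan and table movement."""
    return bed_positions * minutes_per_bed


def glucose_mmol_per_l(mg_per_dl: float) -> float:
    # mg/dl -> mg/l (x10), then divide by mg per mmol of glucose
    return mg_per_dl * 10.0 / GLUCOSE_MG_PER_MMOL
```

For a 70 kg patient this gives 388.5 MBq (10.5 mCi); 5–7 bed positions correspond to 10–14 minutes of PET emission time, and the mean glucose of 104 mg/dl is roughly 5.8 mmol/l.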
Texture analysis
Volumes of interest (VOIs) of the primary and secondary tumor lesions were initially delineated semiautomatically using the Hybrid 3D software ver. 4.0.0 (Hermes Medical Solutions, Stockholm, Sweden), with delineation criteria set according to the PET Response Criteria in Solid Tumors (PERCIST 1.0) recommendations for target lesions at baseline [25], using the mediastinal blood pool as background and a minimum lesion size of 130 voxels. Lesions smaller than 130 voxels or with subthreshold uptake were delineated manually based on the CT images. The delineations were then reviewed and, where needed, adjusted manually by consensus of two nuclear medicine experts. Finally, all delineated VOIs were resampled to a uniform voxel size of 2.0 × 2.0 × 2.0 mm using kriging interpolation [26].
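The threshold-plus-minimum-size logic of the semiautomatic step can be illustrated with a plain-Python sketch. It is not the Hermes Hybrid 3D implementation. Assumptions: the volume is a nested list indexed `[z][y][x]` in SUL units, voxels are 6-connected, and the threshold is 2.0 × blood-pool mean + 2 × SD, which is our reading of the PERCIST 1.0 blood-pool reference rule. Kriging resampling is not shown.

```python
from collections import deque
from typing import List, Set, Tuple

Voxel = Tuple[int, int, int]
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def percist_blood_pool_threshold(bp_mean: float, bp_sd: float) -> float:
    # Assumed PERCIST 1.0 blood-pool rule: 2.0 x mean SUL + 2 SD
    return 2.0 * bp_mean + 2.0 * bp_sd


def segment_lesions(vol: List[List[List[float]]], threshold: float,
                    min_voxels: int = 130) -> List[List[Voxel]]:
    """6-connected components of voxels at or above threshold, keeping
    only components with at least min_voxels voxels."""
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])
    seen: Set[Voxel] = set()
    lesions: List[List[Voxel]] = []
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if (z, y, x) in seen or vol[z][y][x] < threshold:
                    continue
                # breadth-first flood fill from this seed voxel
                component: List[Voxel] = []
                queue = deque([(z, y, x)])
                seen.add((z, y, x))
                while queue:
                    cz, cy, cx = queue.popleft()
                    component.append((cz, cy, cx))
                    for dz, dy, dx in NEIGHBOURS:
                        n = (cz + dz, cy + dy, cx + dx)
                        if (0 <= n[0] < nz and 0 <= n[1] < ny and 0 <= n[2] < nx
                                and n not in seen and vol[n[0]][n[1]][n[2]] >= threshold):
                            seen.add(n)
                            queue.append(n)
                # components below min_voxels would go to manual CT-based delineation
                if len(component) >= min_voxels:
                    lesions.append(component)
    return lesions
```

In this sketch, a 6 × 6 × 6 hot cube (216 voxels) is kept as a lesion while an isolated 2 × 2 × 2 cube (8 voxels) is discarded, mirroring the 130-voxel rule.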
Following the Image Biomarker Standardisation Initiative (IBSI) guidelines, radiomic features with “very strong” and “strong” consensus values were extracted from each individual lesion using the Medical University of Vienna (MUW) Radiomic Engine (ver. 2.0), which had previously been validated according to IBSI standards [27] (see Supplemental Material, Table 1a). To obtain feature vectors of equal length per patient, independent of the number of lesions, the lesions of each patient were ordered by volume and the distribution of each radiomic feature across the ordered lesions was determined, yielding a distribution function for each radiomic feature. IBSI intensity histogram features were then computed from each distribution function, generating a total of 2082 radiomic features. Finally, the radiomic features, the number of lesions, patient characteristics and clinical characteristics that varied depending on the predicted endpoint were merged to form the feature vectors (see Figure 2).
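The aggregation from a variable number of lesions to one fixed-length vector per patient can be sketched as follows. This is a simplified stand-in, not the MUW engine: it computes only a handful of first-order statistics rather than the full IBSI intensity histogram feature set, and the input format (one dict per lesion with a `"volume"` key) and the feature names are hypothetical. Note that these particular statistics do not depend on lesion order; the volume ordering is kept only to mirror the text.

```python
import statistics
from typing import Dict, List


def patient_feature_vector(lesions: List[Dict[str, float]]) -> Dict[str, float]:
    """Aggregate per-lesion features into one fixed-length patient vector.

    lesions: one dict per lesion, each with a 'volume' key plus radiomic
    features (hypothetical format). All lesions must share the same keys.
    """
    ordered = sorted(lesions, key=lambda les: les["volume"], reverse=True)
    out: Dict[str, float] = {"n_lesions": float(len(ordered))}
    for name in sorted(ordered[0]):
        # distribution of this feature across the volume-ordered lesions
        values = [les[name] for les in ordered]
        out[f"{name}_mean"] = statistics.fmean(values)
        out[f"{name}_median"] = statistics.median(values)
        out[f"{name}_min"] = min(values)
        out[f"{name}_max"] = max(values)
        out[f"{name}_range"] = max(values) - min(values)
        out[f"{name}_variance"] = statistics.pvariance(values)  # 0 for one lesion
    return out
```

Every patient, whether with one lesion or many, obtains the same set of keys, which is the property that makes the vectors usable as machine learning input.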
Cross-validation and machine learning scheme
Monte Carlo (MC) cross-validation, data preprocessing and ensemble machine learning were performed using the Dedicaid Automated Machine Learning platform (Dedicaid GmbH, Vienna). First, an MC cross-validation scheme randomly assigned the patients of each cohort to training and validation sets at a ratio of 80:20, repeated 100 times, resulting in 100 unique folds per cohort. Equal subsampling of validation samples ensured balanced representation within the validation sets (see Supplemental Material, Sections 4YOS, 3YOS, GPR, TG). All folds then underwent preprocessing by feature range normalization, feature redundancy reduction, feature selection and sample balancing. Feature redundancy reduction was performed by covariance matrix analysis, with redundancy defined as an absolute Pearson correlation coefficient greater than 0.85 [28]. Feature selection followed the curse-of-dimensionality rule, which limits the number of selected features relative to the training sample count S in each fold. To avoid class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was employed to synthesize new minority-class instances between existing minority-class samples [29].
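The Dedicaid platform is proprietary, so the following plain-Python sketch only illustrates generic versions of the steps described above: Monte Carlo splitting, min-max normalization fitted on the training fold, a greedy |r| > 0.85 redundancy filter, and SMOTE-style interpolation. All function names are hypothetical. We interpret "equal subsampling" as drawing the same number of validation cases from each class, and the feature selection step is omitted because its exact rule is not specified here.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def mc_splits(labels: Sequence[int], n_folds: int = 100, val_fraction: float = 0.2,
              seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Monte Carlo CV: n_folds unique random train/validation index splits.
    Assumption: 'equal subsampling' = same number of validation cases per class."""
    rng = random.Random(seed)
    classes = sorted(set(labels))
    per_class = max(1, round(val_fraction * len(labels) / len(classes)))
    seen, folds, tries = set(), [], 0
    while len(folds) < n_folds:
        tries += 1
        if tries > 100 * n_folds:
            raise RuntimeError("too few distinct splits for this cohort size")
        val: List[int] = []
        for c in classes:
            idx = [i for i, y in enumerate(labels) if y == c]
            val += rng.sample(idx, min(per_class, len(idx)))
        key = frozenset(val)
        if key in seen:  # discard duplicates so that all folds are unique
            continue
        seen.add(key)
        train = [i for i in range(len(labels)) if i not in key]
        folds.append((train, sorted(val)))
    return folds


def fit_minmax(X: Matrix) -> List[Tuple[float, float]]:
    """Per-feature (min, max), fitted on the training fold only."""
    return [(min(col), max(col)) for col in zip(*X)]


def apply_minmax(X: Matrix, params: List[Tuple[float, float]]) -> Matrix:
    """Scale features to [0, 1] using training-fold ranges; constant features map to 0."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, params)]
            for row in X]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0


def drop_redundant(X: Matrix, threshold: float = 0.85) -> List[int]:
    """Greedy filter: keep a feature only if |r| <= threshold with every kept feature."""
    cols = list(zip(*X))
    keep: List[int] = []
    for j, col in enumerate(cols):
        if all(abs(pearson(col, cols[k])) <= threshold for k in keep):
            keep.append(j)
    return keep


def smote(minority: Matrix, n_new: int, k: int = 5, seed: int = 0) -> Matrix:
    """SMOTE: interpolate between a minority sample and one of its k nearest
    minority neighbours (Euclidean). Needs at least two minority samples."""
    rng = random.Random(seed)
    synthetic: Matrix = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = [m for m in minority if m is not x]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform position along the segment x -> nn
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic
```

Within each fold, the normalization parameters and the kept feature indices would be derived from the training indices only and then applied unchanged to the validation indices, and SMOTE would be applied to the training set only, so that no validation information leaks into preprocessing.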
Machine learning (ML) was performed using a mixed, stacked ensemble learning scheme to build predictive models of 4-year overall survival, 3-year overall survival, tumor grade and growth pattern risk (models abbreviated as M4OS, M3OS, MTG and MGPR, respectively; see Supplemental Tables 3a–d and 4a–d). Kaplan-Meier survival analysis was performed on the M4OS and M3OS predictions using the lifelines Python package.
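In lifelines, this analysis is typically run with the `KaplanMeierFitter` class. To show what that estimator computes, the sketch below re-implements the Kaplan-Meier product-limit estimator in plain Python. The function name is hypothetical, and the split into predicted risk groups described after the block is illustrative.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(durations: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate S(t), reported at each distinct event time.

    durations: follow-up times (e.g. months); events: 1 = death, 0 = censored.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # group all observations sharing the same time point
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk  # conditional survival at time t
            curve.append((t, surv))
        at_risk -= deaths + censored  # censored cases leave the risk set after t
    return curve
```

To compare predicted risk groups, one would compute a separate curve for patients predicted as non-survivors and as survivors by M4OS (or M3OS), using their observed follow-up times and death indicators.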