This work explores whether machine learning (ML) applied to high-dimensional CT radiomics can predict the β-catenin mutation status of HCC patients. Our results indicate that CT radiomics combined with different ML classifiers (the extra trees classifier and the CatBoost classifier) is potentially useful for predicting whether an HCC harbors a β-catenin mutation.
Radiomics is a technique that applies data-characterization algorithms to radiographic medical images in order to extract a large number of features [23]. CT-based radiomics analysis has been used to predict the survival of patients with metastatic colorectal cancer [24]. Radiomics can also be used to predict the response of individual HER2-amplified colorectal cancer liver metastases, as well as the biomarkers of molecular subtype prognosis [25].
Because radiogenomics can reveal the relationship between imaging features and genomic features [26], it can serve as a bridge between imaging and genomics. Our current study may therefore have important practical and clinical implications.
The β-catenin mutation in HCCs may promote immune escape and might affect responsiveness to therapeutic procedures [27]. Evaluating the genetic mutations of liver cancer could prove impractical if implemented for every patient. Radiomic features derived from CT texture analysis might nevertheless provide potential biomarkers for predicting the β-catenin mutation status of HCCs, once such biomarkers have been validated in larger datasets. Moreover, we anticipate that forthcoming research involving larger datasets, different feature selection algorithms, and supporting ML schemes could yield new biomarkers and models.
In our analysis, the radiomics parameters in each ML-based model were similar. We used cross-validation and testing on unseen data to optimize model performance. A total of 18 types of classifiers were selected during the feature selection process. When less data are available, several experiments with various ML classifiers might be needed to find the best ML scheme.
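As an illustration of the kind of classifier comparison described above, the following is a minimal sketch that scores two candidate classifiers with 10-fold cross-validation. It is not the pipeline used in this study. The feature table is synthetic, the fold stratification and the AUC scoring metric are assumptions, and scikit-learn's `GradientBoostingClassifier` is used as a stand-in for CatBoost to avoid an extra dependency.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for a radiomics feature table (NOT the study data):
# 120 "patients", 50 features, roughly 25% in the positive (mutated) class.
X, y = make_classification(
    n_samples=120, n_features=50, n_informative=8,
    weights=[0.75, 0.25], random_state=0,
)

# Stratified folds keep the class ratio similar in every fold. This is an
# assumption; the text only states that 10-fold cross-validation was used.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

candidates = {
    "extra_trees": ExtraTreesClassifier(n_estimators=200, random_state=0),
    # Stand-in for CatBoost (assumption made for this sketch only).
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

results = {}
for name, model in candidates.items():
    # One AUC value per held-out fold.
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    results[name] = scores
    print(f"{name}: AUC = {scores.mean():.3f} +/- {scores.std():.3f}")
```

With a small and imbalanced cohort, the spread of the per-fold scores is often as informative as their mean, which is one reason several classifiers may need to be tried before settling on a scheme.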
One well-known challenge in the field of radiomics is the interpretation of the features selected during model development, even when they have been validated [28]. Regarding radiomics of HCCs for identifying β-catenin mutation status [29], the selected features might represent information associated with pathological stage or differentiation grade, both of which are correlated with β-catenin mutation status [30].
This study has several limitations. First, because of its retrospective design, it provides a lower level of evidence. Second, ML-based classifiers risk overfitting when trained on a small and imbalanced patient population. We sought to reduce this expected overfitting by applying data augmentation techniques to increase the number of labeled samples, a common approach for mitigating overfitting in ML-based classification. Third, although 3D segmentation could represent radiomics information more effectively, we used only the largest 2D slice and its adjacent upper and lower slices for CT radiomics, because most previous clinical research on HCCs has been based on a single segmentation or on segmentations of a few slices. Fourth, we derived the imaging data from TCGA-LIHC on The Cancer Imaging Archive (TCIA) website, which includes patients from different centers and sources scanned with different image acquisition protocols, as is the case in standard clinical practice. To minimize this variability, all image samples underwent the normalization and pixel rescaling procedures described in the methods section. This technique has been shown to reduce both variability and bias. Fifth, we used the same dataset for training, validation, and testing, which could be viewed as a source of bias; we therefore implemented a 10-fold cross-validation procedure to minimize it. Independent external datasets are needed to validate the performance of the classifiers in any further exploration. Sixth, we included only the portal phase images in the analysis because they are widely available. Further research is warranted for unenhanced CT and arterial phase-enhanced CT.
Seventh, we evaluated only the β-catenin mutation status because the corresponding patient group had sufficient imaging data that satisfied our criteria and offered clinical usefulness with an effective prognostic value in this study. Eighth and finally, all radiogenomic studies share a common problem, namely the possibility of a discrepancy between the data present on imaging studies and the small sample used for genomic analysis [31].
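The slice selection and intensity rescaling mentioned among the limitations can be sketched as follows. This is a minimal illustration under stated assumptions, not the preprocessing actually applied in this study: the function names `select_slices` and `rescale_to_unit`, the intensity window of -100 to 300 HU, and the toy mask are all hypothetical.

```python
import numpy as np

def select_slices(mask, n_adjacent=1):
    """Return indices of the slice with the largest lesion area and its neighbours.

    mask: 3D boolean array shaped (slices, rows, cols).
    """
    # Lesion area per slice = number of segmented pixels in that slice.
    areas = mask.reshape(mask.shape[0], -1).sum(axis=1)
    center = int(np.argmax(areas))
    # Clamp the range so neighbours never fall outside the volume.
    lo = max(center - n_adjacent, 0)
    hi = min(center + n_adjacent, mask.shape[0] - 1)
    return list(range(lo, hi + 1))

def rescale_to_unit(image, window=(-100.0, 300.0)):
    """Clip intensities to a window (hypothetical values) and rescale to [0, 1]."""
    lo, hi = window
    clipped = np.clip(np.asarray(image, dtype=np.float64), lo, hi)
    return (clipped - lo) / (hi - lo)

# Toy segmentation: 5 slices of 4x4 pixels with lesion areas 0, 2, 6, 3, 0.
mask = np.zeros((5, 4, 4), dtype=bool)
mask[1, 0, :2] = True
mask[2, :2, :3] = True
mask[3, 0, :3] = True

chosen = select_slices(mask)  # [1, 2, 3]
scaled = rescale_to_unit(np.array([-200.0, -100.0, 100.0, 300.0, 500.0]))
print(chosen, scaled)
```

Clipping before rescaling puts every scan on the same fixed intensity range, which is one simple way to limit differences between acquisition protocols.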
In conclusion, CT radiomics based on machine learning is shown to be a feasible and potentially successful method for predicting β-catenin mutation status in HCC patients. Because enhanced CT images are acquired routinely, we cautiously propose that this radiogenomics approach be evaluated in larger, prospective trials as a future clinical decision support tool.