In the present study, three predictive models (SVM, random forest, and logistic regression) were constructed based on a four-gene signature to distinguish effective from ineffective TACE in HCC patients. The random forest model had the best AUC of the three, with AUC values of 100% and 85.2% in the training and testing datasets, respectively. All models performed well in both datasets, with high accuracy and AUC values. The four-gene signature was identified by combining traditional bioinformatic analyses with feature selection methods from machine learning. Interestingly, the expression of all four genes was highly correlated with the prognosis of HCC patients (Fig. 7): significantly higher expression of LIN28B and S100A9 was related to a worse prognosis, whereas significantly higher expression of IFIT1 and SPARCL1 was related to a better prognosis. Notably, the most predictive single gene for effective versus ineffective TACE in HCC patients was SPARCL1, with an AUC value of 84.2% (Fig. 5F).
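As an illustration of how an AUC value such as those reported above can be read, the following minimal sketch computes AUC as the probability that a randomly chosen responder scores higher than a randomly chosen non-responder. The function name `auc_from_scores` and the toy expression values are hypothetical and are not taken from the study; the study's own software and settings are not stated here.

```python
from typing import Sequence


def auc_from_scores(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative sample counts as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one sample from each class")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical expression values for a single gene (NOT data from the study).
# Label 1 = effective TACE, label 0 = ineffective TACE.
toy_expression = [2.1, 3.4, 3.0, 4.0, 2.8, 0.9]
toy_labels = [0, 1, 0, 1, 1, 0]

toy_auc = auc_from_scores(toy_expression, toy_labels)
print(f"toy single-gene AUC: {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why a training AUC of 100% alongside a lower testing AUC is usually interpreted with reference to the held-out value.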
During the last two decades, the incidence of HCC has increased significantly, but mortality has not decreased[15]. TACE is widely performed worldwide as an effective treatment for inoperable HCC, but its efficacy varies greatly. At present, the six-and-twelve score is used to stratify recommended TACE candidates by prognosis[16]. However, the risk of selection bias is unavoidable in observational studies. Therefore, predictive biomarkers are urgently needed to predict the effectiveness of TACE and to outline an individualized treatment plan for HCC patients.
In this study, four genes were identified for constructing the predictive model. These genes are involved in several cancer-related activities. S100A9 belongs to the S100 family, is mainly expressed in neutrophils and monocytes, and plays an important role in regulating inflammation and innate immunity[17]. Furthermore, S100A9 is up-regulated in several solid tumors, including colorectal, prostate, breast, and liver cancers[18]. Importantly, overexpression of S100A9 is positively correlated with poor differentiation, tumor invasion, metastasis, and poor clinical outcomes in these cancers, indicating its key role in mediating tumor progression.
SPARCL1 is a potential tumor suppressor gene in most tumors[19]. Its downregulation is thought to be driven by epigenetic modifications, including DNA methylation. In addition, SPARCL1 regulates cell viability, migration, invasion, cell adhesion, and drug resistance. The downregulation of SPARCL1 is associated with increased mortality in patients with liver cancer[20]. However, the mechanism linking SPARCL1 to HCC remains unclear.
The IFIT gene family consists of four genes. Their products perform various cellular functions by mediating protein-protein interactions and forming multi-protein complexes with cellular and viral proteins through different TPR motifs[21]. IFIT proteins are involved in cancer progression and metastasis. The expression of IFIT family members (IFIT1, IFIT2, IFIT3) is decreased in HCC[22], suggesting that IFIT1, IFIT2, and IFIT3 may be involved in the progression of HCC. Silencing IFIT1 or IFIT3 can reduce the expression of IL-17 and IL-1β and reduce the migration ability of HCC cells[23].
Lin28 is a major regulator of the let-7 microRNA family[24]. Lin28B is overexpressed in human hepatoma cells and clinical samples[25, 26] and contributes to tumor development by promoting malignant transformation[30, 31], promoting tumor-associated inflammation[27, 28], reprogramming metabolism, conferring immortality, and enabling evasion of immune destruction[29]. Cheng et al. reported that Lin28B was associated with high tumor grade, large tumor volume, AJCC stage, Barcelona Clinic Liver Cancer stage, and recurrence. In addition, clinical and epidemiological studies have shown that Lin28B is related to HCC susceptibility and to the overall survival of patients[30, 31, 32].
However, several limitations of the current study should be considered. Firstly, our study focused only on samples from the GSE104580 dataset, and the number of patients available in the Gene Expression Omnibus (GEO) database is relatively small. More patients and clinical information should be collected to further validate the stability of the model. Secondly, some genes might have been excluded because of our rigorous screening criteria. Thirdly, although our study provides evidence that four novel genes are significantly related to the survival of HCC patients, more experiments will be needed to validate, or even correct, these findings and to confirm the KEGG pathway analysis and GO enrichment results.
In conclusion, we constructed a four-gene predictive model by performing logistic regression analysis and 5-fold cross-validation based on datasets from GEO. Its stability and accuracy were further assessed with three independent models. The proposed approach yielded stable results with high accuracy and low bias and achieved superior prediction performance. Further analysis suggested that the genes in the predictive model are involved in several cancer-related biological processes. This predictive model provides new insight into predicting the effectiveness of TACE and has potential prognostic and therapeutic implications for HCC.
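To make the 5-fold cross-validation step concrete, the sketch below shows one common way of partitioning samples into five folds while keeping the ratio of responders to non-responders similar in each fold. Stratification, the function name `stratified_k_fold`, the random seed, and the toy labels are all assumptions made for illustration; the text does not state whether the folds in this study were stratified.

```python
import random
from typing import List, Sequence, Tuple


def stratified_k_fold(
    labels: Sequence[int], k: int = 5, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs with class balance preserved."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]

    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Deal this class's indices round-robin so every fold gets a similar share.
        for position, index in enumerate(members):
            folds[position % k].append(index)

    splits = []
    for test_fold in range(k):
        test_idx = sorted(folds[test_fold])
        # Every sample outside the held-out fold is used for training.
        train_idx = sorted(
            i for f, fold in enumerate(folds) if f != test_fold for i in fold
        )
        splits.append((train_idx, test_idx))
    return splits


# Hypothetical labels (NOT data from the study): 10 responders, 10 non-responders.
cv_labels = [1] * 10 + [0] * 10
cv_splits = stratified_k_fold(cv_labels, k=5, seed=0)

for fold_number, (train_idx, test_idx) in enumerate(cv_splits, start=1):
    n_pos = sum(cv_labels[i] for i in test_idx)
    print(f"fold {fold_number}: {len(train_idx)} train, {len(test_idx)} test, {n_pos} responders held out")
```

Each sample is held out exactly once, so a model fitted on the four training folds can be scored on the remaining fold and the five scores averaged.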
To the best of our knowledge, this study is the first to use machine learning techniques to predict the effectiveness of TACE in patients with HCC from gene expression. TACE is currently a preferred treatment for patients with advanced HCC, but its efficacy varies greatly, and the five-year survival rate is still low. Therefore, it is crucial to predict the therapeutic effectiveness of TACE before the procedure and to establish personalized treatment strategies to manage the prognosis of HCC patients. Thus, our predictive model based on a four-gene signature may have good prospects in clinical practice.