Identification of differentially expressed genes related to autophagy and prognosis
We first identified 58 genes that were abnormally expressed in tumors compared with normal tissues, of which 4 genes had abnormally low expression and 54 genes had abnormally high expression (Fig. 1A and B). A total of 43 prognosis-related genes were screened using univariate Cox regression analysis (Fig. 1C). The intersection of the two gene sets resulted in 23 overlapping genes (Fig. 1D and Table. S1).
The overlapping genes were all highly expressed in tumor tissues (Fig. 1E), and they were all unfavorable factors for prognosis (Fig. 1F). Among them, BIRC5, PEA15 and RAB24 were the most significant genes. CAPN10 and SPNS1 were the genes with the greatest impact on prognosis. To explore the interactions and coexpression relationships between the overlapping genes, we generated a protein interaction network. MAPK3, SQSTM1, CASP8 and HSP90AB1 were the core genes in the network and showed research potential (Fig. 1G). In the coexpression network, the lines between genes represent coexpression relationships, and red represents a positive correlation. IKBKE and PRKCD had the strongest coexpression relationship (Fig. 1H).
The established model has satisfactory predictive power
We used LASSO regression analysis and multivariate Cox regression analysis to construct prognostic models. Lasso regression analysis filtered out 12 genes(ATG10, ATIC, BIRC5, CAPN10, FKBP1A, GAPDH, HDAC1, PRKCD, RHEB, SPNS1, SQSTM1 and TMEM74) (Fig. 2A and B). These 12 genes are closely related to the patient's prognosis. Based on these 12 genes, a prognostic model can be constructed to predict the patient's survival. The risk score was calculated by multiplying the expression of each gene in the model by the corresponding coefficient and then adding the sum. The formula for calculating the risk score obtained by LASSO regression is as follows:
Risk score = EXATG10*0.265 + EXATIC*0.216 + EXBIRC5*0.065 + EXCAPN10*0.034 + EXFKBP1A*0.114 + EXGAPDH*0.019 + EXHDAC1*0.230 + EXPRKCD*0.018 + EXRHEB*0.185 + EXSPNS1*0.189 + EXSQSTM1*0.094 + EXTMEM74*0.147 (EX, expression).
The prognostic model established by multivariate Cox regression analysis included 10 genes. The results showed that ATIC, HDAC1, HSP90AB1, MAPK3, RHEB, SPNS1 and SQSTM1 were independent factors affecting prognosis (Fig. 2C). The formula for calculating the risk score from this model is as follows:
Risk score = EXATIC*0.535 + EXCASP8*-0.400 + EXFKBP1A*0.391 + EXHDAC1*0.422 + EXHSP90AB1*-0.321 + EXMAPK3*-0.563 + EXPRKCD*0.216 + EXRHEB*0.415 + EXSPNS1*0.972 + EXSQSTM1*0.252.
We use data from ICGC database to verify the model established based on TCGA data. The patients in the TCGA group and the ICGC group were divided into a high-risk group and a low-risk group according to risk score. The results show that the risk score is related to the patient's disease stage and grade (Table. 1). Furthermore, the difference in the prognosis of the two groups was calculated. The results showed that the survival difference between the high-risk group and the low-risk group was more obvious for the model established by LASSO regression analysis than for the model established by Cox regression analysis (Fig. 2D-G).
ROC risk curves were drawn and the area under the curve (AUC) values were calculated to evaluate the predictive performance of the model. For the LASSO regression model, the 1-, 2-, and 3-year AUC values for TCGA patients were 0.768, 0.714 and 0.696, respectively (Fig. 2H). The ICGC group was used to test the efficacy of the model, and the AUC values were 0.745, 0.761 and 0.739, respectively (Fig. 2I). The AUC values of the model established by multivariate Cox regression analysis were slightly lower than those of the model established by multivariate COX regression (Fig. 2J and K). Therefore, the model established by LASSO regression analysis had better predictive performance, and the model was further verified and analyzed. The distribution of risk scores among patients in the TCGA cohort was symmetrical (Fig. 2L). The distribution of patient survival statuses showed that the survival time of patients in the high-risk group was shorter than that of patients in the low-risk group (P < 0.01, Fig. 2M). When ICGC patient data were used for verification, the same distribution was observed (Fig. 2N and O). To show the clustering of risk scores more clearly, we conducted PCA and t-SNE analysis. The results showed that in both the TCGA and ICGC groups, patients in the high-risk and low-risk groups were clearly distributed in different clusters (Fig. 2P-S).
The risk score is an independent prognostic factor and is related to immune cells and immune function
We used univariate and multivariate Cox regression analyses to determine whether the risk score was an independent prognostic indicator. The results showed that in the TCGA and ICGC groups, disease stage and risk score were independent factors affecting prognosis (P < 0.001, Fig. 3A-D). We further performed an enrichment analysis of genes with differential expression (logFC > 1, adjusted P < 0.05). The results of the GO analysis showed that the genes were mainly involved in cell cycle-related functions, including nuclear division and chromatic segregation (Fig. 3E and F). The results of the KEGG pathway enrichment analysis showed that the autophagy-related genes were closely related to the cell cycle (Fig. 3G and H). In addition, we also found that the autophagy-related genes were also related to immune function in the GO analysis and KEGG analysis. Related GO terms included humoral immune response, chemokine production, IL-17 signaling pathway and cytokine activity.
The differences in immune cells and functions were quantified and analyzed. Between the high-risk group and the low-risk group in the TCGA cohort, there were significant differences in the activation degree of immune cells related to cellular immunity, including macrophages, natural killer (NK) cells, T helper 1 (Th1) cells, Th2 cells and regulatory T (Treg) cells. In terms of immune-related functions, CCR and the type II IFN response, which are also related to cellular immunity, were significantly differentially activated between the high-risk and low-risk group. In addition, immune terms related to antigen presentation, such as APC costimulation and MHC class I, were also significantly differentially enriched between the two groups (Fig. 3I and J). In the ICGC cohort, the degree of activation of macrophages and Th2 cells was obviously different between the high-risk and low-risk group, which was the same as the result in the TCGA group. The changes in immune function in the ICGC group were the same as those in the TCGA group, and significant enrichment of MHC class I and the type II IFN response was found in both cohorts (Fig. 3K and L).
For the key gene ATIC in the model, we divided patients into a high expression group and a low expression group according to the expression level of ATIC. In the TCGA cohort, the expression of ATIC was related to the infiltration of CD8 + T cells, macrophages and mast cells (Fig. 3M). ATIC was related to immune checkpoints and type I and type II IFN response immune functions (Fig. 3N). In the ICGC cohort, ATIC was still associated with CD8 + T cells and NK cells but not with macrophages (Fig. 3O). In the ICGC cohort, ATIC was related to immune checkpoints and IFN response, which is consistent with the results in TCGA (Fig. 3P).
Based on the differential genes between the high-risk group and the low-risk group, we generated a connectivity map to identify small molecules with therapeutic potential using online tool cMAP (Table S2). Among them, meclofenamic acid, UNC-0321, fananserin and icariin showed great therapeutic potential. Doxorubicin and erlotinib are used in the treatment of liver cancer. For high-risk patients, these two drugs may have a better effect (Fig. 3Q and R).
ATIC is highly expressed in liver cancer tissues and associated with a poor prognosis
The results of the multivariate Cox regression analysis showed that ATG10, ATIC, HDAC1, RHEB, SPNS1 and TMEM74 had a significant effect on prognosis. Among them, the abnormally high expression of ATIC and HDAC1 was the most obvious harmful factor affecting prognosis (P < 0.05, hazard ratio > 1). The immunohistochemical results in The Human Protein Atlas database (https://www.proteinatlas.org) verified that the expression of ATIC, HDAC1, RHEB, SPNS1 and TMEM74 in liver cancer tissues was significantly higher than that in normal liver tissues (Fig. 4A and B).
Since the hazard ratio of ATIC was greater than 1 and it had a large coefficient in the established model, we decided to functionally verify the effects of ATIC. According to follow-up and gene expression data from TCGA, high expression of ATIC was closely related to poor prognosis (P < 0.001, Fig. 4C). Compared with the normal liver cell line LO2, the HepG2 and Huh7 cell lines showed significantly higher expression of ATIC (Fig. 4D). We collected tumor samples and paracancerous samples of 52 liver cancer patients from the the Biological Repositories, Zhongnan Hospital of Wuhan University and verified that ATIC was significantly high expressed in liver cancer tissue samples (P < 0.001, Fig. 4E).
We analyzed the association between ATIC and the clinicopathological characteristics of liver cancer patients. As shown in Table 2, it was observed that ATIC expression was associated with lymph node invasion (P = 0.048) and prognosis (Fig. S1). However, the expression of ATIC was independent of age (P = 0.393), gender (P = 0.760) and portal vein tumor thrombosis (PVTT, P = 0.405) (Table 2). We also analyzed the relative risk of ATIC in the prognosis of liver cancer. The results of Cox regression analysis showed that, compared with low-expressing liver cancer patients, high ATIC expression was closely related to the poor prognosis of liver cancer patients (P = 0.002). Multivariate Cox regression analysis further confirmed that the expression of ATIC was significantly correlated with the prognosis of liver cancer (P = 0.006) (Table 3).
Table 1
Correlation between risk score and clinicopathologic characteristics of patients from TCGA and ICGC database.
characteristics | TCGA cohort | ICGC cohort |
| High risk | Low risk | χ² | High risk | Low risk | χ² |
No. of patients | 182 | 183 | | 115 | 116 | |
Age | | | 0.120 | | | 0.061 |
≥ 65 y | 67 | 82 | | 81 | 68 | |
༜65y | 115 | 101 | | 34 | 48 | |
gender | | | 0.882 | | | 0.480 |
Male | 122 | 124 | | 87 | 83 | |
Female | 60 | 59 | | 28 | 33 | |
Stage | | | 0.010 | | | 0.006 |
I | 69 | 101 | | 13 | 23 | |
II | 48 | 36 | | 45 | 60 | |
III | 51 | 32 | | 43 | 28 | |
IV | 1 | 3 | | 14 | 5 | |
Unknown | 13 | 11 | | 0 | 0 | |
T | | | 0.004 | | | |
T1 | 74 | 106 | | NA | NA | |
T2 | 54 | 37 | | NA | NA | |
T3 | 45 | 33 | | NA | NA | |
T4 | 9 | 4 | | NA | NA | |
Unknown | 0 | 3 | | NA | NA | |
N | | | 0.615 | | | |
N0 | 128 | 120 | | NA | NA | |
N1 | 2 | 2 | | NA | NA | |
Unknown | 52 | 61 | | NA | NA | |
M | | | 0.712 | | | |
M0 | 134 | 129 | | NA | NA | |
M1 | 1 | 2 | | NA | NA | |
Unknown | 47 | 52 | | NA | NA | |
Grade | | | < 0.001 | | | |
Grade 1 | 17 | 38 | | NA | NA | |
Grade 2 | 77 | 98 | | NA | NA | |
Grade 3 | 76 | 42 | | NA | NA | |
Grade 4 | 10 | 2 | | NA | NA | |
Unknown | 2 | 3 | | NA | NA | |
Table 2
Correlation between ATIC expression and clinicopathologic characteristics of liver cancer patients
Characteristics | ATIC | P/χ2 value |
Low | High |
Age(y) | 62.12 ± 10.667 | 59.77 ± 8.887 | 0.393 |
Gender | | | 0.760 |
Male | 19 | 18 | |
Female | 7 | 8 | |
HBV infection | | | 0.734 |
Yes | 21 | 20 | |
No | 5 | 6 | |
AFP | | | 0.780 |
≥ 400 | 15 | 14 | |
< 400 | 11 | 12 | |
Tumor diameter (cm) | 3.819 ± 1.828 | 4.735 ± 1.617 | 0.062 |
TNM classification | | | 0.061 |
I | 2 | 0 | |
II | 13 | 7 | |
III | 8 | 13 | |
IV | 3 | 6 | |
PVTT | | | 0.405 |
Yes | 12 | 15 | |
No | 14 | 11 | |
Lymphatic invasion | | | 0.048 |
Yes | 7 | 14 | |
No | 19 | 12 | |
PVTT, Portal vein tumor thrombosis;HBV, hepatitis B virus; AFP, Alpha-fetoprotein. |
Table 3
Univariate and multivariate Cox-regression analysis of various prognostic parameters in patients with liver cancer
| Univariate analysis | Multivariate analysis |
| P | HR | 95% CI | P | HR | 95% CI |
TNM classification | < 0.001 | 7.021 | 3.686–13.372 | 0.001 | 4.407 | 1.870-10.383 |
Tumor diameter | 0.003 | 1.386 | 1.115–1.721 | | | |
PVTT | 0.001 | 4.286 | 1.798–10.217 | 0.006 | 5.969 | 1.686–21.128 |
AFP | 0.001 | 5.048 | 1.990-12.807 | 0.029 | 4.297 | 1.157–15.967 |
ATIC | 0.002 | 1.593 | 1.186–2.139 | 0.006 | 2.099 | 1.243–3.544 |
HR, Hazard ratio; CI, confidence interval. |
We used HepG2 and Huh7 cell lines for the next experiment. After transfection, the mRNA and protein expression levels of ATIC were significantly decreased (Fig. 4F). The results of westernblot are consistent with the results of PCR, showing that ATIC is highly expressed in liver cancer tissues (Fig. 4G) and liver cancer cell lines(Fig. 4H). siRNA not only reduced the mRNA level of ATIC, but also reduced the protein level (Fig. 4I).