1. Six ARGs and five autophagy-related lncRNAs were screened out from the training cohort
There were 507 ccRCC patients with raw expression data and corresponding clinical follow-up information extracted from TCGA project in the present study which were randomly divided into a training dataset (n = 255) and a testing dataset (n = 252). A total of 222 ARGs were integrated from HADb database. Considered as the criteria of FDR < 0.05 and |log2(FC)|>1, 45 differentially expressed genes were finally obtained between tumor and non-tumor tissues including 36 up-regulated and 9 down-regulated ARGs (Fig. 2). In addition, we identified 467 autophagy-related lncRNAs by co-expression networks (corFilter = 0.6, P < 0.001). Univariate Cox regression analyses were utilized to single out the prognostic ARGs and lncRNAs in the training group, resulting in 12 ARGs (P < 0.001) and 79 autophagy-related lncRNAs (P < 0.001) associated with the ccRCC patients’ overall survival. Lasso regression analyses of those screened genes (Fig. 3A, B, log(lambda.min)= -2.9642) and lncRNAs (Fig. 3C, D, log(lambda.min)= -2.4841) above were performed for further multivariate Cox regression model to avoid overfitting the prognostic problems. Six ARGs (EIF4EBP1, IFNG, BID, BIRC5, CX3CL1 and RAB24, Fig. 4A) and five autophagy-related lncRNAs (AC093278.2, AC010326.3, AC099850.3, AC016773.1 and AC009549.1, Fig. 4B) were finally identified to retain their prognostic significance (Table 1, Table 2). There were five ARGs (EIF4EBP1, IFNG, BID, BIRC5, and RAB24) and four autophagy-related lncRNAs (AC010326.3, AC099850.3, AC016773.1 and AC009549.1) associated with shorter survival which were regarded as prognostic risky factors whereas the remaining one ARG (CX3CL1) and one autophagy-related lncRNA (AC093278.2) with negative coefficient of multivariate Cox regression analyses may be protective factors due to the close association between their high expression and longer patients’ survival.
Table 1
Six ARGs and five autophagy-related lncRNAs were selected by multivariate Cox regression
Type | Gene name | Coefficient | HR | HR.95L | HR.95H | P value |
gene | EIF4EBP1 | 0.006 | 1.006 | 0.999 | 1.013 | 0.114 |
| IFNG | 0.119 | 1.126 | 1.002 | 1.266 | 0.047 |
| BID | 0.113 | 1.119 | 1.013 | 1.238 | 0.027 |
| BIRC5 | 0.093 | 1.097 | 1.016 | 1.184 | 0.018 |
| CX3CL1 | -0.012 | 0.988 | 0.974 | 1.002 | 0.100 |
| RAB24 | 0.049 | 1.051 | 0.999 | 1.104 | 0.054 |
lncRNA | AC093278.2 | -0.123 | 0.884 | 0.801 | 0.976 | 0.014 |
| AC010326.3 | 0.101 | 1.107 | 0.966 | 1.268 | 0.143 |
| AC099850.3 | 0.101 | 1.107 | 1.051 | 1.165 | 0.000 |
| AC016773.1 | 0.340 | 1.406 | 0.924 | 2.137 | 0.111 |
| AC009549.1 | 0.079 | 1.083 | 0.991 | 1.183 | 0.079 |
HR, hazard ratio; lncRNAs, long non‑coding RNAs |
Table 2
List of the six ARGs and five autophagy-related lncRNAs associated with prognosis in ccRCC patient
ID | Gene name | Protein name | Location | Positive correlation | Negative correlation |
1978 | EIF4EBP1 | Eukaryotic translation initiation factor 4E-binding protein 1 | Chr8 | IGFBP4 VKORC1 PPP1R14BP3 | RBL2 TSTD2 INIP |
3458 | IFNG | Interferon gamma | Chr12 | - | - |
637 | BID | BH3-interacting domain death agonist | Chr22 | TMSB10 ARPC3 KIAA0930 PDCD5 RGS19 | WLS DCAF11 L2HGDH DYNLL2 TSPYL1 |
332 | BIRC5 | Baculoviral IAP repeat-containing protein 5 | Chr17 | - | - |
6376 | CX3CL1 | Fractalkine | Chr16 | - | - |
53917 | RAB24 | Ras-related protein Rab-24 | Chr5 | FAM193B APBB3 ZNF692 SPPL2B ANKZF1 | DPH3 EIF4G2 CLTC TM9SF2 YWHAB |
ENSG00000261269 | AC093278.2 | - | Chr5 | LDB2 GIMAP8 | PPIF ATP6V0E2 |
ENSG00000269867 | AC010326.3 | - | Chr19 | - | - |
ENSG00000265303 | AC099850.3 | - | Chr17 | - | - |
ENSG00000207009 | AC016773.1 | - | Chr4 | TEPSIN GSDMB CHKB NOL12 | PCYOX1 CHMP3 SERINC1 CCDC47 |
ENSG00000270607 | AC009549.1 | - | Chr11 | - | - |
Chr, Chromosome; lncRNAs, long non‑coding RNAs |
2. Construction of a six-gene prognostic risk model and a five-lncRNA prognostic risk model in the training cohort
Two prognostic risk models were respectively constructed by integrating the expression values of the genes and lncRNAs selected above and corresponding estimated regression coefficient derived from multivariate Cox regression analysis. The formula of risk score for each patient was calculated as follows(six-gene prognostic risk model): risk score = (0.0057 × expression level of EIF4EBP1) + (0.1189 × expression level of IFNG) + (0.1128 × expression level of BID) + (0.0925 × expression level of BIRC5) + (-0.012 × expression level of CX3CL1) + (0.0493 × expression level of RAB24). The five-lncRNA prognostic risk model were built as follows: risk score = (-0.1233 × expression level of AC093278.2) + (0.1014 × expression level of AC010326.3) + (0.1012 × expression level of AC099850.3) + (0.3404 × expression level of AC016773.1) + (0.0794 × expression level of AC009549.1). All the 507 ccRCC patients in the training dataset and the testing dataset for two models were divided into high-risk and low-risk groups according to the median values of the OS-related prediction model, respectively.
3. A comparison was made of the predictive ability between the two prognostic risk model for training dataset and testing dataset
To compare the performance of prognostic risk model in predicting the clinical outcome of ccRCC patients, the K-M survival curves were plotted to analyze the different survival time between the high-risk and low-risk groups. K-M survival analyses indicated that low-risk group (5-year survival rate 83.1%) had a lower mortality rate than high-risk group (5-year survival rate 52.9%) in the six-gene prognostic risk model (Fig. 5A, P = 4.138e-07). Patients in the high-risk group had significantly shorter survival (5-year survival rate 49.4%) than those in the low-risk group (5-year survival rate 85.9%) in the five-lncRNA prognostic risk model (Fig. 6A, P = 4.546e-09). ROC curves analyses were carried out to show prognostic effectiveness for predicting the 3-year and 5-year survival. The 3- and 5-year of AUC for training dataset in the six-gene prognostic risk model were respectively 0.708 and 0.73.8 (Fig. 5C, D), while those in the five-lncRNA prognostic risk model were 0.740 and 0.757 (Fig. 6C, D). The preliminary analyses demonstrated the five-lncRNA model had a better competitive performance than the another in the training dataset.
With the increase in the risk scores, Fig. 5G and Fig. 6G showed the distribution of the risk score, the number of patients in two risk groups and prognostic expression profiles in the training dataset. The expression levels of ARGs (EIF4EBP1, IFNG, BID, BIRC5, and RAB24) and lncRNAs (AC010326.3, AC099850.3, AC016773.1 and AC009549.1) were up-regulated and the expression levels of remaining ARG (CX3CL1) and lncRNA (AC093278.2) were down-regulated.
To further validate the prognostic power of the two models, patients in the testing dataset were marked as a high-risk group and a low-risk group by the same cutoff derived from the training dataset. In consistent with the findings in the training dataset, K–M survival analysis (Fig. 5B, 6B) and ROC curves analysis (Fig. 5E, F; 6E, F) compared with the training dataset were summarized in Table 3. The heatmap of expression profiles, the risk scores and distribution and patients’ survival status in the TCGA database were also performed in testing dataset (Fig. 5H, 6H). The partial analyses demonstrated the six-gene model had a better competitive performance than the latter in the testing dataset.
Table 3
Predictive performances of two prognosis prediction models based on six ARGs and five autophagy-related lncRNAs
Type | Gene model | | | lncRNA model | | |
| K-M analysis P value | AUC (3 year) | AUC (5 year) | K-M analysis P value | AUC (3 year) | AUC (5 year) |
Training | 4.138e-07 | 0.708 | 0.738 | 4.564e-09 | 0.74 | 0.757 |
dataset | | | | | | |
Testing | 1.125e-03 | 0.684 | 0.701 | 2.485e-03 | 0.67 | 0.693 |
dataset | | | | | | |
K-M, Kaplan-Meier; lncRNAs, long non‑coding RNAs; AUC, area under curve |
4. Comparison of the relationships between clinical parameters and two prognosis‑related prediction model
Age, gender, grade, tumor stage, T-staging and M-staging were combined with models to evaluate whether the prognostic power of the two models were independent of other clinical characters.
The univariate Cox regression analyses demonstrated six-gene prognostic risk model (HR = 1.019, 95% CI = 1.010–1.029, Fig. 7A) and five-lncRNA prognostic risk model (HR = 1.022, 95% CI = 1.011–1.032, Fig. 7B) were all available independent prognostic indicator (P < 0.001). Six-gene prognostic risk model (P = 0.002, HR = 1.014, 95% CI = 1.005–1.023, Fig. 7C) had similar prognostic efficiency with another model (P = 0.002, HR = 1.016, 95% CI = 1.006–1.027, Fig. 7D) after adjustment for the same clinical variables by multivariate Cox regression analyses. In addition, time-dependent ROC curves were plotted and calculated by clinical characters. The results revealed six-gene prognostic risk model (Fig. 7E) had a higher AUC of 0.765 than the five-lncRNA prognostic risk model (Fig. 7F) at an AUC of 0.759. All results above demonstrated a similar competitive performance of the two prognostic risk model for survival prediction.
5. Comparison of the different status between high-risk and low-risk in two prognostic risk model
In order to further explore these differences in statuses, PCA were performed for all genes extracted from TCGA, ARGs and autophagy-related lncRNAs and six genes and five lncRNAs selected finally under the analyses of two models, respectively. Figure 8 indicated the divider line between the red dots and the green dots became increasingly obvious with the small number of candidate genes selected. Four typical autophagy-related gene sets were employed for gene functional enrichment analysis to identify the low-risk group is enrich in autophagy (P < 0.01) except for ‘kegg regulation of autophagy’ in the lncRNA model (P = 0.4). The higher the normalized enrichment score (NES) score is obtained, the greater the enrichment of the gene set. The results showed that the two gene sets (go positive regulation of autophagy and kegg regulation of autophagy) had a greater enrichment in the low-risk group for six-gene prognostic risk model while another two gene sets (go selective autophagy and go regulation of autophagy) indicated a better enrichment in the five-lncRNA prognostic risk model (Fig. 9).
6. Further functional analysis of the six-gene prognostic risk model and five-lncRNA prognostic risk model
To explore the primary mechanism of the gene model, GO enrichment and KEGG analysis were performed according to the differentially expressed ARGs. The ARGs were found significantly related to top GO biological processes according to the results of DAVID, including regulation of endopeptidase activity, regulation of peptidase activity, autophagy and process utilizing autophagic mechanism (Fig. 10, P < 0.05). The heatmap of the relationship between ARGs and GO enrichment analysis was also displayed. Moreover, 5 KEGG signaling pathways functionally involved ARGs, consisting of human cytomegalovirus infection, shigellosis, autophagy-animal, HIF-1 signaling pathway and Kaposi sarcoma-associated herpesvirus infection (Fig. 11, P < 0.05). In addition, the expression correlation between prognostic ARGs and lncRNAs and all genes extracted from ccRCC in TCGA were examined. Correlation circle maps were plotted based on genes which had both positive correlation and negative correlation genes (Fig. 12, Table 2). Color intensity is proportional to the correlation coefficients. These results suggested that the two models might affect autophagy-related biological processes and pathways involved in ccRCC.