3.1 Identification of methylation-driven genes (MDGs) in RCC
We collected 485 samples with 450K methylation data from TCGA cohort, including 319 ccRCC samples and 166 precancerous samples. Transcriptome files were obtained from 611 samples, including 541 tumors and 70 normal samples. We then extracted and normalized the methylation and gene expression information of ccRCC using the “limma” package. MethyMix was utilized to conduct correlation analysis, where a mixed model was constructed and Wilcoxon rank test was calculated for identifying differential methylation with |logFC| > 0, P༜0.05, |Cor| > 0.3. A total of 175 driven genes were selected via the screening procedure (Additional Table S1). We illustrated the top hypermethylated and top hypomethylated genes in Fig. 1.
3.2 Functional Pathway Analysis Of 175 Mdgs
Firstly, we exhibited the clustering heatmap of 175 MDGs in tumors and normal tissues using “pheatmap” package in Fig. 2A. To further investigate the molecular mechanisms and underlying biological processes that the drivers may participate in, we conducted the functional pathway analysis based on ConsensusPathDB database. The results revealed that these 175 driven genes were mainly involved in oxygen metabolism process, one carbon metabolism, inflammatory response pathway, cell communication, as well as glucose metabolism pathway (Fig. 2B, Table 2).
Table 1
Clinical characteristics of all 618 ccRCC patients included in this study.
Variables | TCGA | ICGC | | (N = 537) | (N = 91) | Age (Mean ± SD) | 60.59 ± 12.14 | 60.47 ± 9.97 | Follow-up (y) | 3.12 ± 2.23 | 4.14 ± 1.73 | Status | | | | Alive | | 367 (68.34) | 61 (67.03) | Dead | | 170 (31.66) | 30 (32.97) | Gender | | | | Male | | 346 (64.43) | 52 (57.14) | Female | | 191 (35.57) | 39 (42.86) | AJCC-T | | | | T1 | | 275 (51.21) | 54 (59.34) | T2 | | 69 (12.85) | 13 (14.28) | T3 | | 182 (33.89) | 22 (24.18) | T4 | | 11 (2.05) | 2 (2.20) | AJCC-N | | | | N0 | | 240 (44.69) | 79 (86.81) | N1 | | 17 (3.17) | 2 (2.20) | Unknow | | 280 (52.14) | 10 (10.99) | AJCC-M | | | | M0 | | 426 (79.33) | 81 (89.01) | M1 | | 79 (14.71) | 9 (9.89) | Unknow | | 32 (5.96) | 1 (1.10) | Pathological stage | | | | I | | 269 (50.09) | - | II | | 57 (10.61) | - | III | | 125 (23.28) | - | IV | | 83 (15.46) | - | Unknow | | 3 (0.56) | - | Grade | | | | G1 | | 14 (2.61) | - | G2 | | 230 (42.83) | - | G3 | | 207 (38.54) | - | G4 | | 78 (14.53) | - | Unknow | | 8 (1.49) | - | |
Data are shown as n (%). |
Table 2
Functional analysis from ConsensusPathDB database of methylated-driven genes
Enriched pathway | p-value | q-value | source | external_id |
Biological oxidations | 0.006187155 | 0.0783234 | Reactome | R-HSA-211859 |
One Carbon Metabolism | 0.00238886 | 0.0783234 | Wikipathways | WP241 |
Inflammatory Response Pathway | 0.003841911 | 0.0783234 | Wikipathways | WP453 |
Tight junction | 0.005985322 | 0.0783234 | KEGG | path:hsa04530 |
glucose Metabolism | 0.000273727 | 0.02965375 | Wikipathways | WP1495 |
Cell-Cell communication | 0.006663174 | 0.0783234 | Reactome | R-HSA-1500931 |
Cell-cell junction organization | 0.003138643 | 0.0783234 | Reactome | R-HSA-421270 |
Beta3 integrin cell surface interactions | 0.008638311 | 0.0783234 | PID | integrin3_pathway |
Generation of second messenger molecules | 0.007099411 | 0.0783234 | Reactome | R-HSA-202433 |
Tight junction interactions | 0.000184328 | 0.02965375 | Reactome | R-HSA-420029 |
Trans-sulfuration and one carbon metabolism | 0.003209727 | 0.0783234 | Wikipathways | WP2525 |
Glutathione synthesis and recycling | 0.007746806 | 0.0783234 | Reactome | R-HSA-174403 |
Nicotinate and Nicotinamide Metabolism | 0.007746806 | 0.0783234 | SMPDB | SMP00048 |
γ-glutamyl cycle | 0.005689606 | 0.0783234 | HumanCyc | PWY-4041 |
O2/CO2 exchange in erythrocytes | 0.006681952 | 0.0783234 | Reactome | R-HSA-1480926 |
sarcosine oncometabolite pathway | 0.006681952 | 0.0783234 | SMPDB | SMP02313 |
IRF3-mediated induction of type I IFN | 0.006681952 | 0.0783234 | Reactome | R-HSA-3270619 |
Nicotinamide salvaging | 0.005689606 | 0.0783234 | Reactome | R-HSA-197264 |
folate transformations | 0.004771254 | 0.0783234 | HumanCyc | PWY-2201 |
superpathway of choline degradation to L-serine | 0.001868254 | 0.0783234 | HumanCyc | PWY66-414 |
glycine betaine degradation | 0.000900954 | 0.073202476 | HumanCyc | PWY-3661-1 |
3.3 Construction of a prognostic MDGs model and molecular-classification of patients based on the signature
Since we have extracted the 175 drivers in 485 samples, we ultilized univariate Cox regression method to select 52 prognostic epi-drivers that were significantly associated with OS (P༜0.01). The stepwise selection methods in multivariate Cox model revealed that 38 MDGs that were screened out with the most robust combinations. To further validate the MDGs and simplify variables, we integrated 450K methylation data and transcriptome profiles to conduct correlation analysis in GSE61441, GSE105260 and GSE105261. With thresholds of FDR༜0.05 and |Fold Change|≥1 to deal with methylation data, we detected a list of 59 hypomethylated and upregulated genes by overlapping 145 hypomethylated genes (1049 in GSE61441 and 1102 in GSE105260) and 619 upregulated genes in GSE105261. Meanwhile, a total of 155 hypermethylated and downregulated genes were screened out by overlapping 156 hypermethylated genes (734 in GSE61441 and 897 in GSE105260) and 639 downregulated genes in GSE105261. Therefore, we finally identified 5 hub prognostic MDGs (TAGLN2, PDK2, HHLA2, HOXA2, XAF1) by integrating across three GSE series and TCGA cohort (Fig. 4). Besides, we constructed a representative volcano plot for GSE105261 in Fig. 4C. We conducted multivariate Cox regression model to obtain respective coefficients (βi ), where βi represented the weight of each variables. Then, five genes can be calculated as a prognostic index with the following formula: (0.64916 * expression value of TAGLN2–0.17096 * expression value of PDK2–0.18242 * expression value of HHLA2 + 0.64882 * expression value of HOXA2 + 0.43195 * expression value of XAF1). With the median risk value as the cutoff, 319 ccRCC in the methylated data samples that matched the clinical information were divided into low- (n = 159) and high-risk group (n = 158). The distribution of vital status of ccRCC patients in two groups were shown in Fig. 4A-B, and the heatmap of the 5 MDGs was exhibited in Fig. 4C. To assess the predictive value of the risk signature, ROC curve was drawn and the AUC was 0.713 for 3-year OS prediction (Fig. 4D). Kaplan-Meier survival curve analysis of patients revealed that the prognosis was poor in high-risk group, and the difference across the two groups showed statistically significant with log-rank test of P༜0.001 (Fig. 4E). Furthermore, we ultilized another ICGC-RCC cohort as a validation dataset and we found the AUC was 0.769 (Fig. 4F). Patients with high risk suffered from worse clinical outcomes compared with that in low-risk group (Fig. 4J). We thus summarized that the risk model based on 5 MDGs signature has certain accuracy in assessing the prognosis of ccRCC patients.
Given that the 5 MDGs signature was proved to be well associated with prognosis, we considered whether the 5 genes could be ultilized as the molecular markers for guiding classifications of patients. We totally collected 618 patients from the TCGA-KIRC and ICGC-RCC cohort and normalized the sequencing data for following analysis. Besides, we used the ConsensusClusterPlus package and iterated 1000 times (K = 1: 10) to stabilize the classification categories, obtaining the optimized classification of the samples (Fig. 5A and 5B). We observed that the classifications reached the best stable status with k = 5 in Fig. 5C. Kaplan-Meier analysis suggested that MDGcluster classifications correlated significantly with survival outcomes when k = 5, among which the MDGcluster4 (N = 35) has the best prognosis, and MDGcluster5 (N = 105) has the worst prognosis (Fig. 5D). Sankey diagram also illustrated the underlying associations among MDGcluster, risk score levels and vital outcomes (Fig. 5E). Given the differential survival outcomes between MDGcluster4 and MDGcluster5, we intended to figure out the underlying mechanisms between two groups. We performed the differential analysis between MDGcluster4 and MDGcluster5 to identify totally 162 DEGs with FDR < 0.001 (Table S2). Meanwhile, GO functional analysis also revealed that the top 400 genes were mainly enriched in mitochondrion organization, notch developmental processes and cell cycle biological items (Fig. 5F). The GSEA analysis also indicated that the notch signaling pathway, wnt signaling pathway and EMT pathway were significantly up-regulated in MDGcluster5 than MDGcluster4, partially contributing to the poor prognosis of MDGcluster5 (Fig. 5J).
3.4 Establishment Of A Comprehensive Nomogram
To figure out that whether the MDGs risk score could be ultilized as an independent predictor, we combined with other clinical risk variables associated with prognosis based on the univariate Cox regression results, including age, tumor grade, pharmaceutical therapy, as well as pathological stages. Besides, we conducted the multivariate Cox regression method to integrate clinical features with the MDGs signature. Based on the multi-variate analysis consisting of the above variables, the risk score maintained a significant and independent factor for OS prediction (P = 0.001122) in the TCGA cohort (Table 3). Hence, a MDGs-nomogram was accordingly constructed to predict the 1-, 3-, 5-year OS and the calibration graphs for the 3-, 5-year OS rate revealed the well curve fitting between observed outcomes and predicted survival (Fig. 6A-B). In addition, our integrated MDGs-nomogram showed the superior predictive efficiency (AUC of 3-year: 0.842, AUC of 5-year: 0.862), compared with traditional independent variables such as TNM stage (AUC of 3-year: 0.759, AUC of 5-year: 0.717) in Fig. 6D.
Table 3
Univariate and multivariate Cox regression analysis of clinical characteristics for 319 ccRCC patients.
Hazard ratio (95%CI) P value Hazard ratio (95%CI) P value |
Age, years | | | | |
< 50 50–59 60–69 70–79 ≥ 80 | 1.086 (0.5079–2.321) 1.849 (0.911–3.753) 2.170 (1.054–4.469) 6.372 (2.743–14.8) | 0.832 0.089 0.036 1.65e-05 | 1.2448 (0.578–2.682) 1.793 (0.873–3.683) 2.380 (1.142–4.959) 4.961 (2.095–11.746) | 0.576 0.112 0.021 0.000271 |
Gender Female Male | 1.073 (0.711–1.618) | 0.738 | | |
Race Black or Africa Asian White | 4.020 (0.507–31.853) 1.373 (0.691–21.727) | 0.188 0.366 | | |
Radiation Yes No | 1.266 (0.311–5.137) | 0.742 | | |
Pharmaceutical No Yes | 0.1894 (0.128–0.2803) | <2e-16 | 0.361 (0.232–0.563) | 7.03e-06 |
Tumor grade Low High | 3.3335 (2.063–5.387) | 8.78e-07 | 1.686 (1.004–2.829) | 0.048 |
Pathlogical stage Stage I&II Stage III&IV | 4.417 (2.873–6.789) | 1.27e-11 | 2.718 (1.686–4.380) | 4.04e-05 |
AJCC-T T1&2 T3&4 | 3.08 (2.072–4.58) | 2.72e-08 | | |
AJCC-N N0 N1&Nx | 1.043 (0.705–1.543) | 0.835 | | |
AJCC-M M0 M1&Mx | 3.312 (2.229–4.921) | 3.11e-09 | | |
Risk score Low High | 3.285 (2.137–5.049) | 5.85e-08 | 2.1927 (1.367–3.517) | 0.001122 |
Abbreviations: HR, hazard ratio; 95%CI, 95% confidence interval; AJCC, American Joint Committee on Cancer. |
3.5 Mapping of Kaplan-Meier curves of driver genes and joint survival analysis
We performed the prognostic survival analysis of 5 drivers based on the “survival” R package, where log-rank test of P༜0.05 was used as a meaningful statistical value. We found methylation levels of 3 MDGs were associated with the prognosis of OS (TAGLN2, HHLA2, XAF1), while no significant difference was observed in PDK2 and HOXA2 (Fig. 7A-E). What is more, the joint survival analysis showed that combination of methylation and expression levels had a highly significant correlation with survival outcomes in ccRCC. The hypomethylation with high expression level of TAGLN2, HOXA2 and XAF1 correlated with lower survival rate, while the hypomethylation with high expression level of PDK2 and HHLA2 showed the higher survival rate (Fig. 7F-J).
Given that we have already identified the 5 hub MDGs associated highly with survival prognosis of KIRC, we intended to further detect the significant methylation loci harboring these hazard aberrant methylated drivers. We extracted the β value of methylation data according the 5 gene symbols from the 874 files and merged into a matrix. A list of 187 methylation sites were screened out and the univariate Cox analysis revealed that 32 key methylation loci were associated with survival outcomes in Table 4 (P༜0.05).
3.6 Enriched pathways and co-expression analysis of the 5 MDGs signature
The GSEA was conducted to investigate potential pathways that associated with the 5 MDGs signature. The result revealed that there were six MDGs related pathways enriched significantly in high-risk group with P༜0.05 and the top four were mainly involved in glycolysis, sucrose metabolism, P53 signaling pathways and cytochrome p450 crosstalk (Fig. 8A-D). Besides, we further extracted the five-MDGs signature related genes with filter cutoff value of correlation coefficient = 0.5, P = 0.001. A total of 70 genes were then selected and put into “DAVID 6.8” (https://david-d.ncifcrf.gov/)20 to perform the functional enriched pathway analysis, including gluconeogenesis, fructose metabolism, HIF-1 signaling pathway, carbon metabolism, as well as central carbon metabolism in cancer (Fig. 8E). In accordance with GSEA results, the functional pathway analysis also indicated that the 5 MDGs signature were mainly associated with oxygen metabolism, glycometabolism, even the HIF-1 signaling pathway.