Identification of Differentially Expressed DNA methylase Modification Regulators in PCa
First, we used cbioportal to find mutations in eight genes in prostate cancer.(Fig. 1A). Subsequently, We conducted a differential expression analysis of 8 DNA methylase regulatory genes in PCa (n = 502) and adjacent tissues (n = 52) (Fig. 1B-C). Specifically, the expression levels of DNMT1, DNMT3A, DNMT3B, and TET3 were remarkably higher in tumor samples than those in normal tissues. Moreover, a correlation analysis was performed to further understand the intrinsic association between 8 DNA methylase regulatory genes. This shows that the correlation between TET1 and TET2 is the most significant. The TET1 expression level is most likely to be negative correlation with TET2(Fig. 1D). At the same time, we examined how eight genes differed between the groups with high and low Gleason scores(Fig. 1E). We found that DNMT1, DNMT3A, and DNMT3B had significant elevated. In conclusion, DNA methyltransferase have a relatively higher status than DNA demethylase among the DNA methylase in prostate cancer.
Use of Consensus Clustering Based on DNA methylase Modification Regulators to Identify Two PCa Patient Clusters with Distinct Clinical pathological characteristics
To further investigate the clinical relevance of 8 DNA methylase modification regulators, we clustered PCa patients into subgroups according to their gene expression patterns.
The 8 genes were further utilized for cluster analysis. The cluster effect was best when PCa patients were clustered into two subgroups, and the subgroup internal consistency and stability were good (Fig. 2A). In order to verify the result of the clustering, we further analyzed these two clusters by t-SNE. The t-SNE plot showed significant distinction between cluster 1 and cluster 2 (Fig. 2B). We then assessed whether there were significant differences in RFS and clinical parameters between these two clusters. A significantly better RFS was observed in cluster 1, compared to that in cluster 2 (p < 0.01) (Fig. 2C). Moreover, the expression level of most DNA methylase modification regulators of cluster 2 was higher than that of cluster 1. Compared with cluster 1, cluster 2 was significantly associated with gleason score (p = 0.011), and higher T stage (p < 0.05) (Fig. 2D). Thus, the results of consensus clustering suggested a close association between the expression pattern of DNA methylase modification regulators and PCa progression.
To further interpret the clustering results from the perspective of fundamental biological processes, we performed GO and KEGG analyses on genes that are differentially expressed between cluster 1 and cluster 2. According to the results of the GO analysis, upregulated genes were primarily enriched in biological processes, such as” motor protein”. Simultaneously, down-regulated gene enrichment in malignancy-related biological processes ,for exeample “Vascular smooth muscle contraction”,” cellular response to zinc ion”and” negative regulation of growth”(Supplementary Fig. 1A-B). The ESITIMATE algorithm showed that Cluster 2 had a lower immune score, stromal score, ESITIMATE score and higher tumour purity. (Supplementary Fig. 1C) Subsequently, various algorithms were used to analyse the differences in immune infiltration between the two clusters. We found significant differences in immune cells and immune-related functions between the two clusters. The TIMER algorithm showed that Cluster 2 was associated withmore immune cell infiltrates, such as CD8 + T cells, B cells naive ,neutrophils, and NK cells activated. The CIBERSORT algorithm revealed similar results (Supplementary Fig. 1D). We also found that Cluster 2 was related to lower expression of many HLA and MHC molecules (Supplementary Fig. 1E).
Establishment of a Prognostic Risk Model Based on the Expression Level of DNA methylase Regulator Genes
Considering the strong association between DNA methylase regulators and the prognosis of PCa patients, we applied a univariate Cox regression analysis on the expression levels of eight genes. The results showed that five out of eight regulators were significantly correlated with RFS (p < 0.05) (Fig. 3A). Among these 5 regulators, DNMT1, DNMT3B, DNMT3L were considered as risky genes, with HR > 1. Ultimately, three optimal genes (DNMT1, DNMT3B, DNMT3L) were selected for the establishment of the risk model for PCa, and the corresponding coefficients from the LASSO algorithm(Fig. 3B). The formula for calculating the risk score was as follows: risk score = (1.28 * expression value of DNMT3B) + (1.4 * expression value of DNMT1) + (4.67 * expression value of DNMT3L). A dot pot was used to display the survival status of each patient (Fig. 3C).
To explore the prognostic role of this three-gene signature model, PCa patients were separated into low- and highrisk groups based on the median risk score. Survival analysis demonstrated a worse RFS in patients with a high-risk score relative to patients with a low-risk score (Fig. 3D, p = 3.22e − 07). We then performed a ROC curve analysis and assessed the area under this curve (AUC) of 0.734 for the 5-year RFS, respectively, which showed good predictive power for survival outcomes (Fig. 3E). To evaluate the prognostic value of the three-gene signature for survival prediction in other datasets, we used the 32 PCa patients data for validation. A total of 32 PCa patients were divided into highrisk (n = 16) and low-risk (n = 16) groups according to the cutoffvalue ofthe 502 PCa cases and 52 normal adjacent tumor tissues cohort. Similar with results in the training cohort, the survival analysis demonstrated that PCa patients in the low-risk group had markedly better RFS compared to high-risk patients (Fig. 3F, p = 0.015). The AUCs for 3-year RFS was 0.783 indicating that this prognostic model could predict RFS of PCa patients with a good accuracy (Fig. 3G).
The expression of three prognostic genes in the high- and lowrisk groups was displayed in a heatmap (Supplementary Fig. 2A). After excluding cases with incomplete clinical information, 349 cases were eligible for Cox regression analysis. Univariate analysis revealed that the three-gene risk score and gleason score were significantly related to the RFS ofpatients with PCa (Supplementary Fig. 2B, p < 0.001). In order to evaluate whether the three-gene risk signature was independent from other clinicopathologic characteristics as a prognostic factor for PCa, we additionally conducted multivariate Cox regression analyses, which demonstrated that both risk score and gleason score were independently correlated with RFS for patients with PCa (Supplementary Fig. 2C, AUC = 0.794/0.739). These results demonstrated that the three-gene risk signature was able to predict prognosis independently of age, histological grade, gleason score and pathological stage. Then, we explored the relationship between clusters ,risk score and various clinical features (Supplementary Fig. 2D).Based on this, We found that high Gleason scores were strongly associated with cluster 2 and high-risk groups. Also, T and N stage showed positive correlation with Gleason score.
By comparing the genetic differences between the high and low risk groups in the model, we found that the differences between the two groups were significant(Fig. 4A). We subsequently conducted GSEA between the low and high risk score groups based on Hallmark and found that the high risk score group was significantly enriched in immune related Hallmark pathways, including allograft rejection and IL6/JAK/STAT3 signalling (Fig. 4B).
The burden of both copy number gain and loss at the focal level was higher in the high-risk group (Fig. 4C). It appears that focal copy number alterations strongly contribute to the difference in immune infiltration in prostate cancer. To investigate the differences in genomic mutations between the high- and low-risk groups, we downloaded simple nucleotide variation data. TP53 (17%), TTN (15%), SPOP (13%), MUC16 (8%) and MUC17 (8%) were the top 5 genes with the highest mutation frequencies in the high-risk group, while SPOP (10%), TTN (10%), MUC16 (7%), MLL2 (6%) TP53 (6%) and SYNE (6%) were the top 5 genes in the low-risk group (Figs. 4D-E). By comparing the TMB of the two groups, we found that the high-risk group had higher tumor mutation burden. (Fig. 4F).
After the above study, we have found that prostate cancer is closely related to immunity. Therefore, the relationship between the signature and tumour immune microenvironment was further investigated. We explore the relationship between model genes and immune cells(Fig. 5A-C). Furthermore, our study delved into the examination of HLA and MHC expression levels between the high-risk and low-risk groups. Our findings revealed a notable increase in MHC expression within the high-risk group, whereas the fluctuation of HLA expression was found to be more evenly distributed between the two groups (Fig. 5D-E). Immune cells that predominantly infiltrated included B cells, CD8 + T cells, and Plasma cells (Fig. 5F).Through the ssGSEA algorithm, the high-risk group had higher immune cell infiltration and more immune-related functions or pathways than the low-risk group (Fig. 5G-H).
Prediction of the Chemotherapy Response and Screening of Small-Molecule Drugs
We used two different algorithms to find drugs that are sensitive in high-risk groups to provide the basis for individualized precision therapy for prostate cancer. GDSC was used topredict the chemotherapy response ofthe common chemotherapy agents between the two groups (Fig. 6A-B).The sensitivity of many chemotherapeutic agents (docetaxel and bicalutamide) was higher in the high-risk group. At the same time, we took a different algorithm. Two different approaches were adopted to identify candidate agents with higher drug sensitivity in high PPS(prognosis-associated signature) score patients. The analyses were performed using CTRP and PRISM-derived drug response data, respectively. First, differential drug response analysis between PPS score-high (top decile) and PPS score-low (bottom decile) groups was conducted to identify compounds with lower estimated AUC values in PPS score-high group (log2FC > 0.10). Next, Spearman correlation analysis between AUC value and PPS score was used to select compounds with negative correlation coefficient (Spearman’s r < − 0.30 for CTRP or − 0.35 for PRISM). These analyses yielded five CTRP-derived compounds (including birinapant, PF − 184, etoposide, clofarabine and fluvastatin) and six PRISM-derived compounds (including LY2606368, alvocidib and MK − 1775). All these compounds had lower estimated AUC values in PPS score-high group and a negative correlation with PPS (Fig. 6C- D). We provide the basis for personalized treatment in the future.