RNA-seq transcriptome data of m5C-related regulators in COAD
Based on RNA-seq transcriptome data of COAD from TCGA database, the expression of 13 m5C-related regulators between tumor tissues and adjacent mucosa was compared (Figure 2). With the exceptions of TET1 and TET3, the expression levels of the other 11 factors were significantly different in the tumor tissues and the adjacent mucosal tissues. Compared with the adjacent mucosa, the expression of NSUN3 (P < 0.001) and TET2 (P < 0.001) in the tumor group was significantly downregulated. The expression of ALKBH1 (P = 0.036), ALYREF (P < 0.001), NOP2 (P < 0.001), NSUN2 (P < 0.001), NSUN4 (P < 0.001), NSUN5 (P < 0.001), NSUN6 (P < 0.001), NSUN7 (P = 0.006), and YBX1(P < 0.001) were significantly upregulated in tumor tissues compared with the adjacent mucosa.
Correlation and interaction of m5C-related regulators in COAD
The correlations between the m5C-related regulators were analyzed using the “corrplot” package in R and their interrelationships were retrieved from the STRING database (https://string-db.org/). The expression levels of the seven “writers” were correlated with each other, except for NSUN2 and NSUN7, NSUN5 and NSUN7, NSUN2 and NSUN3, and NSUN5 and NSUN6. There were also close and complicated relationships between each regulator in the protein-protein interaction (PPI) network. We also found that the expression of TET family genes (TET1, TET2, TET3) were highly related to each other and had little correlation with ALKBH1. However, the TET family was associated with ALKBH1 in the PPI network and had interrelationships with the “writer” genes via ALKBH1. In addition, there was evidence supporting the interaction between the “reader” genes ALYREF and YBX1 in the PPI network. The expression of these genes was also positively associated with each other (Figure 3).
CNVs and SNPs of m5C-related regulators in COAD
Regarding CNVs, we found that 10 of the 13 m5C-related regulators were significantly different between the tumor tissue and the adjacent mucosa from 825 samples with CNV data. Furthermore, it was found that CNVs affect the expression of m5C-related regulators. The highest frequency of CNVs occurred in the “writer” gene NSUN5 (24.47%), followed by the “eraser” gene ALKBH1 (19.53%). The “eraser” gene TET3 had the lowest CNV frequency (2.35%) (Table 2). The “writer” genes NOP2, NSUN2, NSUN5, and NSUN7, the “eraser” genes TET2 and ALKBH, and the “reader” gene ALYREF displayed a significant difference in expression due to CNVs (Figure 4).
Regarding SNPs, we found that all of the m5C-related regulators had missense mutations, and missense mutations were the highest frequency mutation in 399 COAD cases with available sequencing data. Among them, the m5C “eraser” gene TET2 had the highest frequency of mutation events (96/399), followed by TET3 and TET1 (both 39/399). In addition, the “writer” genes NSUN2 and NSUN7, the “eraser” gene TET2, and the “reader” gene ALYREF displayed significant differences in expression levels due to SNPs. Next, we evaluated the effect of SNPs on patient prognosis, but no difference was observed due to the relatively few numbers of mutations (Figure 5).
Consensus clustering of patients with COAD and the survival rate of subgroups
Based on the expression levels of 13 m5C-related regulators, a consistent clustering analysis of patients with COAD was performed, and they were clustered into two subgroups because there was minimal interference between the two subgroups (Figure 6A, B, C, D).
PCA showed that the RNA expression levels in patients with COAD in clusters I and II were specific (Figure 6E). Nevertheless, there were many overlapping areas between each cluster on the whole, indicating that the clusters had something in common; however, there were no significant differences between the two subgroups. Notably, the overall survival rate of cluster I and cluster II when analyzed using the Kaplan-Meier method was significantly different (Figure 6F). Specifically, the 5-year survival rate was 57.4% in cluster I and 66.7% in cluster II.
Prognostic value of m5C-related regulators in COAD prognosis
To evaluate the prognostic value of these 13 m5C-related regulators in COAD, univariate Cox regression analysis was used to identify m5C-related regulators that were highly correlated with the OS in patients with COAD, and two regulators with prognostic significance (P < 0.05) were found: NSUN6 and ALYREF. Specifically, ALYREF was considered a protective factor with HR < 1 in patients with COAD and NSUN6 was considered as a risk factor with HR > 1 (Figure 7A). To further evaluate the prognostic significance of these two m5C-related regulators, LASSO Cox regression analysis was performed and it was revealed that NSUN6 (Coef=0.300256795278519) and ALYREF (Coef=0.00796895949684636) could serve as powerful prognostic factors in COAD (Figure 7B, C, D).
Based on NSUN6 and ALYREF, a risk signature was constructed and the risk score was calculated.Using the median risk score as the demarcation value, patients with COAD (n = 525) were classified into two groups, namely the high-risk and low-risk groups. To test the prognostic role of the two gene risk signatures. Survival and ROC curve analyses were conducted; the low-risk group had significantly longer survival time than the high-risk group (Figure 7E). In particular, compared with the 46.4% 5-year survival rate in the high-risk group, that of the low-risk group was 78.7%. The area under the curve (AUC) value in the time-dependent ROC curve was 0.754, suggesting good prediction performance of the survival model (Figure 7F).
Relationship between the risk score, the expression of the two selected m5C-related regulators, and clinicopathological characteristics in COAD
The expression of NSUN6 and ALYREF and the distribution of clinicopathological characteristics in the high-risk and low-risk groups are displayed as a heatmap (Figure 8A). Evident differences between the two groups according to stage T (P < 0.05) and fustat (P < 0.01) were observed.
To examine whether the risk score was an independent prognostic factor, univariate and multivariate Cox regression analyses were conducted. This revealed that the risk score was significantly associated with OS in univariate analysis, in addition to age at diagnosis, pathological stage, and TNM stage (P < 0.05). However, only the age at diagnosis and risk score were correlated with OS (P < 0.05) in the multivariate Cox regression analysis (Figure 8B,C).
Biological functional analysis of differentially expressed genes between different subgroups and m5C-related regulatory gene
As we clustered the patients with COAD into cluster Ⅰ and cluster Ⅱ, genes that were significantly upregulated (fold change > 1 and p < 0.05) or downregulated (fold change < 1 and p < 0.05) between the high-risk group and low-risk group were identified using the “edgeR” package in R. GO and KEGG pathway analysis were used for biological functional analysis.
With regard to GO analysis, the differentially expressed genes were associated with immune-related biological processes, such as “antigen binding” and “immunoglobulin receptor binding,” and pre-mRNA-related biological processes, such as “pre-mRNA 5'-splice site binding” and “pre-mRNA binding.”(Figure 9A). KEGG pathway analysis results were correlated with immune-related pathways, including “complement and coagulation cascades” and “NOD-like receptor signaling pathway,” and RNA-related pathways, including “RNA transport” and “spliceosome.” Moreover, cancer-related pathways were enriched, such as “transcriptional misregulation in cancer” and “MAPK signaling pathway” (Figure 9B).
Next, we used GSEA to predict the functional difference between clusters I and II. The results showed that cluster I had a worse OS and lower 5-year survival rate associated with malignancy-associated pathways, including the ATP-binding cassette transporter (NES = 1.79, normalized P = 0.006) and phosphatidylinositol signaling system (NES = 1.63, normalized P = 0.03) (Figure 9C,D).
Furthermore, as NSUN6 and ALYREF were shown to be important regulators of m5C in our study, GSEA was performed to investigate the potential biological processes associated with NSUN6 and ALYREF in COAD pathogenesis. GSEA suggested that increased expression of NSUN6 and ALYREF is involved in various biological functions in RNA processing, such as spliceosome, RNA polymerase, and RNA degradation. Upregulation of these genes was associated with malignancy-associated pathways, such as the cell cycle (Figure 9E,F).