Differentially expressed RBPs between colon cancer and normal tissues
The colon cancer dataset was downloaded from TCGA and contained 389 tumor and 39 normal colon tissue samples. The differentially expressed RBPs were identified using R software packages (3.64). A total of 1542 RBPs [17] were included in the analysis, and 490 RBPs met the screening criteria of this study (P < 0.05, |log2FC| > 0.5), consisting of 323 upregulated and 167 downregulated RBPs. The expression distribution of these differentially expressed RBPs and their relationships with each other are displayed in Figure 1.
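The screening criteria above can be illustrated with a short Python sketch. It is not the authors' R workflow: the `DEResult` record, the `classify_rbps` helper, and the toy values are hypothetical, and only the two thresholds (P < 0.05, |log2FC| > 0.5) come from the text.

```python
from typing import NamedTuple


class DEResult(NamedTuple):
    """One row of a differential-expression table (hypothetical layout)."""
    gene: str
    log2fc: float
    p_value: float


def classify_rbps(results, p_cutoff=0.05, lfc_cutoff=0.5):
    """Split genes into up- and downregulated lists.

    A gene passes only if P < p_cutoff AND |log2FC| > lfc_cutoff;
    the sign of log2FC then decides the direction.
    """
    up, down = [], []
    for r in results:
        if r.p_value < p_cutoff and abs(r.log2fc) > lfc_cutoff:
            (up if r.log2fc > 0 else down).append(r.gene)
    return up, down


# Toy values for illustration only; they are not taken from the study.
toy = [
    DEResult("GENE_A", 1.2, 0.001),   # passes, upregulated
    DEResult("GENE_B", -0.8, 0.01),   # passes, downregulated
    DEResult("GENE_C", 0.3, 0.0001),  # significant, but fold change too small
    DEResult("GENE_D", 2.0, 0.2),     # large fold change, but not significant
]
up, down = classify_rbps(toy)
print(up, down)
```

Both conditions must hold at once, which is why a highly significant gene with a small fold change (GENE_C) is excluded.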
GO and KEGG pathway enrichment analysis of the differentially expressed RBPs and their downstream regulator
To study the function and mechanism of the RBPs, we divided the differentially expressed RBPs into two groups, upregulated and downregulated, and performed functional enrichment analysis on each group using an R software package. In the biological process (BP) category, the upregulated RBPs were enriched in RNA ribosome biogenesis, rRNA metabolic process, nucleic acid phosphodiester bond hydrolysis, RNA localization, RNA phosphodiester bond hydrolysis, tRNA metabolism, RNA 3'-terminal processing, and RNA modification (Figure 1c). The downregulated RBPs were enriched in negative regulation of cellular amide metabolism, negative regulation of translation, RNA splicing, mRNA catabolic process, tRNA metabolism, cytoplasmic ribonucleoprotein granule, and ribonucleoprotein granule (Figure 1e). In the cellular component (CC) category, the upregulated RBPs were enriched in preribosome; nuclear part; cytoplasmic ribonucleoprotein granule; ribonucleoprotein granule; spliceosomal complex; small-subunit processome; preribosome, large subunit precursor; 90S preribosome; exoribonuclease complex; and cytoplasmic exosome (RNase complex) (Figure 1c), whereas the downregulated RBPs were enriched in cytoplasmic ribonucleoprotein granule, ribonucleoprotein granule, nuclear speck, P-body, spliceosomal complex, ribosome, cytoplasmic stress granule, ribosomal subunit, catalytic step 2 spliceosome, and U2-type spliceosomal complex (Figure 1e). In the molecular function (MF) category, the upregulated RBPs were enriched in RNA catalytic activity, nuclease activity, ribonuclease activity, tRNA catalytic activity, endonuclease activity, ribonucleoprotein complex binding, mRNA 3'-UTR binding, RNA helicase activity, tRNA binding, and RNA methyltransferase activity (Figure 1c).
In the MF category, the downregulated RBPs were enriched in RNA catalytic activity, mRNA 3'-UTR binding, translation regulator activity, double-stranded RNA binding, translation repressor activity, mRNA 3'-UTR AU-rich region binding, and mRNA regulatory element binding (Figure 1e). KEGG pathway enrichment analysis showed that the upregulated RBPs were enriched in ribosome biogenesis in eukaryotes, RNA transport, spliceosome, mRNA surveillance pathway, RNA degradation, ribosome, RNA polymerase, and DNA replication (Figure 1d), while the downregulated RBPs were enriched in RNA transport, spliceosome, ribosome, TGF-β signaling pathway, and RNA degradation (Figure 1f).
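The text does not state which statistical test the R package used to call a term enriched. A common choice for GO and KEGG over-representation analysis is a one-sided hypergeometric test, sketched below in Python under that assumption. The function name and the toy counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided over-representation p-value, P(X >= k).

    N: genes in the background set
    K: background genes annotated to the term
    n: genes in the query list (e.g. upregulated RBPs)
    k: query genes annotated to the term
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the hypergeometric probabilities from k up to the largest possible overlap.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy counts for illustration only: 20 background genes, 5 annotated to a term,
# 4 genes in the query list, 2 of which carry the annotation.
p = hypergeom_enrichment_p(k=2, n=4, K=5, N=20)
print(round(p, 4))
```

In practice the p-values from many terms would also be adjusted for multiple testing. That step is omitted here.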
Protein-protein interaction (PPI) network construction and key module selection
To further investigate the roles of the differentially expressed RBPs in colon cancer, we used the STRING database to construct the PPI network (Figure 1a), and Cytoscape software was used to sort and analyze the data (Figure 2b). The network was then processed using the MCODE tool to identify possible key modules, and three key modules were obtained (Figure 2c-f).
Selection of prognosis-related RBPs
Univariate Cox regression analysis was used to investigate the prognostic significance of the differentially expressed RBPs identified above, and 5 prognosis-associated candidate RBPs were identified (Figure 3a). This five-gene signature comprised PNLDC1, NSUN6, NOL3, PPARGC1A and LRRFIP2. Of these, PPARGC1A and LRRFIP2 were protective factors, while the rest were risk factors. This five-gene signature is of great importance for the prognostic evaluation of colon cancer. In addition, these 5 candidate hub RBPs were analyzed by multivariate Cox regression to investigate their impact on patient survival time and clinical outcomes, and all five RBPs were found to be independent predictors in patients with colon cancer (Figure 3b).
Prognosis-related genetic risk score model construction and analysis
The five RBPs identified from the multivariate Cox regression analysis were used to construct the predictive model. The model was defined as a linear combination of the expression levels of the five genes, each weighted by its coefficient from the multivariate Cox regression, as follows:
Risk score = (0.6708 × ExpPNLDC1) + (1.3922 × ExpNSUN6) + (0.7246 × ExpNOL3) + (-0.6708 × ExpPPARGC1A) + (-1.5486 × ExpLRRFIP2)
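As a worked illustration of the formula above, the following Python sketch computes the risk score for a few hypothetical patients and splits them into two groups. The coefficients are those reported in the formula. The expression values and the helper names are invented for illustration. The median cutoff is an assumption, since the text does not state how the low-risk and high-risk threshold was chosen.

```python
from statistics import median

# Coefficients as reported in the risk score formula.
COEFFICIENTS = {
    "PNLDC1": 0.6708,
    "NSUN6": 1.3922,
    "NOL3": 0.7246,
    "PPARGC1A": -0.6708,
    "LRRFIP2": -1.5486,
}


def risk_score(expression):
    """Linear combination of expression levels weighted by the Cox coefficients."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def split_by_median(scores):
    """Assumed grouping rule: scores above the cohort median are 'high' risk."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical expression values for illustration only.
patients = {
    "P1": {"PNLDC1": 1.0, "NSUN6": 1.0, "NOL3": 1.0, "PPARGC1A": 1.0, "LRRFIP2": 1.0},
    "P2": {"PNLDC1": 2.0, "NSUN6": 0.0, "NOL3": 0.0, "PPARGC1A": 0.0, "LRRFIP2": 0.0},
    "P3": {"PNLDC1": 0.0, "NSUN6": 0.0, "NOL3": 0.0, "PPARGC1A": 0.0, "LRRFIP2": 1.0},
}
scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
print(scores, groups)
```

Because PPARGC1A and LRRFIP2 carry negative coefficients, higher expression of either gene lowers the score, consistent with their description as protective factors.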
We then conducted a survival analysis to assess the predictive ability of the 5 RBPs. A total of 337 patients with colon cancer were divided into low-risk and high-risk subgroups according to the risk score. The higher the risk score, the worse the clinical prognosis. The K-M OS curves of the two groups, based on the five genes, were significantly different (p=8.432e−08; Figure 3c). The area under the curve (AUC) of the time-dependent ROC curve was calculated to evaluate the prognostic capacity of the five-gene signature. The higher the AUC, the better the model performance. The AUCs of the five-gene prognostic model were 0.720 and 0.725 for three-year and five-year survival, respectively, indicating that the predictive model had high sensitivity and specificity (Figure 3d, 3e). In addition, the expression heat map, the survival status of patients, and the risk score of the five-RBP signature in the low- and high-risk subgroups are shown in Figure 3f-h.
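The K-M curves mentioned above are built with the Kaplan-Meier product-limit estimator. The following Python sketch shows that estimator on a toy cohort. It is a generic illustration, not the authors' code, and the follow-up times and event flags are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate of the survival function.

    times:  follow-up time for each patient
    events: 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t.
        n_at_risk -= removed
    return curve


# Hypothetical follow-up data for illustration only.
curve = kaplan_meier(times=[1, 2, 2, 3, 4], events=[1, 1, 0, 0, 1])
print(curve)
```

Running the estimator separately on the low-risk and high-risk groups gives the two curves that a log-rank test then compares.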
Verifying the prognostic model in the GEO dataset
To verify its reliability for patients with colon cancer, the prognostic model was evaluated in an independent dataset. The five-gene model was assessed in the GEO microarray dataset GSE75500 using the same method. The OS of colon cancer patients in the high-risk group was lower than that in the low-risk group (p=3.17e−3; Figure 3i). Time-dependent ROC analyses of the prognostic model yielded AUCs of 0.691 at three years and 0.624 at five years (Figure 3j, 3k), and the risk score of the five-RBP signature in the low- and high-risk subgroups is displayed in Figure 3l-n. Together, these results demonstrate that the five-RBP prognostic model was capable of predicting OS in colon cancer patients.
A nomogram based on the five RBPs
To establish a quantitative method for predicting the prognosis of colon cancer patients, we integrated the five-RBP signature based on the multivariate Cox analysis into a nomogram that might help doctors make clinical decisions for patients (Figure 4a). We also assessed the prognostic significance of different clinical characteristics from TCGA using Cox regression analysis. The univariate and multivariate analyses showed that tumor stage and risk score were independent prognostic factors correlated with OS (Figure 4b, 4c).
Validation of the relationship between the expression of five RBPs and prognosis
To further explore the prognostic value of the five RBPs in colon cancer, we drew Kaplan-Meier survival curves for PNLDC1, NSUN6, NOL3, PPARGC1A and LRRFIP2 to investigate the relationship between the hub RBPs and OS. The log-rank test showed that all five RBPs were associated with OS in colon cancer patients (Figure 4d-h). To determine the relationship between the expression of each gene and clinical characteristics, we analyzed the expression of the proteins encoded by these five RBPs using clinical samples in the human protein map database. NOL3 was significantly increased and LRRFIP2 was decreased in colon cancer tissue compared with normal colon tissue (Figure 5), whereas there was no significant difference in PNLDC1 or NSUN6 protein expression between tumor and normal colon tissues (Figure 5). PPARGC1A was not found on the website.