3.1 Screening of differentially expressed RBPs in rectal neoplasms
This research was carried out according to the flowchart shown in Figure 1. Download the gene names of 2005 RBPs from RBPTD (http://www.rbptd.com/). Next, download the RNA sequencing data of 177 samples from the TCGA database and the corresponding clinical information (including 167 rectal cancer samples and 10 normal samples). Then use the limma R software package to handling the original data and screen the differentially expressed RBPs (102 up-regulated RBPs and 34 down-regulated RBPs) according to FDR<0.05 and |log2fold change (FC)|≥1 and the names of RBPs(Figure 2A,2B). Then download the GSE87211 data set (including 203 rectal cancer samples and 160 normal tissue samples) from the GEO database, which is the same as the TCGA differential gene screening. Then use the limma R software package for preprocessing, and use FDR<0.05 ; |log2fold change (FC)|≥1 to select the differentially expressed RBPs (including 32 up-regulated and 35 down-regulated) by combining RBPs gene names. (Figure 2C,2D).
3.2 PPI Network Construction and functional enrichment analysis of differentially expressed RBPs genes
The biological function of differentially expressed RBPs in rectal cancer. Take the intersection of the differentially expressed genes screened by TCGA and GEO (including 44 up-regulated RBPs and 32 down-regulated RBPs), as shown in Figure 3A. According to the information of co-up-regulated and co-down-regulated RBPs in the STRING database, a PPI network with 73 nodes and 394 edges was constructed using Cytoscape software (Figure 3B). Subsequently, the MOCODE plug-in in Cytoscape was used to screen out the two compulsory modules with the highest scores from the PPI network; the cytoHUbba plug-in was used to select hub genes (P<0.05), and all hub genes existed in modules ≥1. Therefore, these two key cluster modules represent the critical biological roles of PPI networks.
Use DAVID (https://david.ncifcrf.gov/) to perform GO and KEGG function enrichment analysis on the 76 RBPs genes selected to discover the biological functions of the differential genes. KEGG analysis identified that differentially expressed RBPs are enriched in the biological pathways related to cell cycle, cysteine and methionine metabolism, and microRNA in cancer. The results of molecular function analysis confirmed that the differentially expressed RBPs were significantly enriched in RNA phosphodiester bonds, RNA binding catalytic activity, RNA synthesis, chromatin binding, etc. In terms of cell composition, different RBPs are mainly enriched in nucleoplasm, cytoplasm and nucleolus. In addition, we also found that differentially expressed RBPs are also enriched in RNA hydrolysis, IFN-α biosynthesis forward regulation, IFN-β synthesis forward regulation, rRNA transcription (Figure. 4)
3.3 Selection of RBPs related to prognosis
We aimed at identifying 73 differential expression hub RBPs in the PPI network. In order to study the correlation between these RBPs and prognostic significance, we used univariate Cox regression analysis to screen out 13 RBPs with a significant overall prognosis. (Figure 5A). Next, we divided the 166 samples with clinical information of TCGA-READ into a training set (82 samples) and test set (84 samples). And using multivariate Cox regression analysis to analyze their impact on patient survival time and clinical prognosis(Table 2.), it was found that six hub RBPs are independent predictors of rectal cancer patients (Figure 5B).
3.4 Construction of a genetic risk scoring model related to prognosis
Construct a risk ratio prediction model for the six hub RBPs identified by multivariate cox regression analysis(Table 3). Calculate the risk score of each patient according to the following formula:
To evaluate the predictive ability of the model. In the training set (n = 82), we risk score based on the median of 82 samples are divided into high-risk and low-risk groups and group survival analysis. The results showed that high-risk group OS is far lower than the low-risk group (Figure 6A). Secondly, we further conducted a time-dependent ROC analysis on the prognostic ability of six hub RBPs. The area under the ROC curve of the constructed RBPs risk scoring model is 0.780, indicating that it has a good predictive ability(Figure 6B). Figure 6C shows the expression heat map of the risk score, patient survival status, and six RBPs in the high-risk group and the low-risk group. Finally, we performed univariate and multivariate Cox regression analysis. The results show that the forecasting model showed moderate independent predictive power.
3.5 Prognostic model performance verification
To validate the predictive value of six key RPBs prognostic models, we use the internal test set (n = 84), the complete data set (n = 166), to assess the predictive ability of the training set. Survival analysis of the two test sets showed that patients in the high-risk group had a worse prognosis than those in the low-risk group (Figure 7A,8A), which was consistent with the results of the training set. Time-dependent ROC curve analysis shows that the AUC of the internal test set and the complete set are 0.911 and 0.798, respectively. (Figure 8B, 9B). Figure 7C and 8C showed the survival status of patients with risk scores in the prognostic model and the expression heat map of 6 RBPs. Finally, univariate and multivariate Cox analyses involving clinical factors and risk scores showed good independent predictive power in both test sets. In summary, the reliability of the prognostic model is explained (Table 4, Table 5).
3.6 Construction and verification of prediction nomogram
To establish a method for quantitatively predicting the survival probability of patients with rectal cancer, we combined six independent prognostic markers of RBPs to construct a nomogram to predict the five-year OS rate of patients (Figure 9A). Besides, the ROC curve analysis showed that t n addition, ROC curve analysis showed that AJCC staging (AUC=0.890), tumor status (AUC=0.798), tumor residual (AUC=0.618) and patient age (AUC=0.615) were all lower than the risk score of the model (AUC= 0.962),as shown in Figure 9B.
3.7 Analysis of hub genes related to RBPs, clinical features and biological functions
Based on the proportional hazard regression model analysis, six RBPs(PANCSIN2, EXO1, TOP2A, NXT1, RUVBL1, and WDR4) are associated with prognosis. The expression of these 6 RBPs in different risk groups in the TCGA data set is shown in Figure 10. We observed significant differences between the low-risk and high -risk subgroups in clinical characteristics such as risk grade, tumor status, tumor residual, and tumor stage. We analyzed 6 RBPs for different clinical characteristics. The results showed that the expression levels of NXT1, TOP2A and WDR4 were significantly different in different risk groups. In different tumor stages and grades, significant differences were found in the expression levels of TOP2A and PACSIN2. The expression of PACSIN2 in different tumor states and tumor invasion blood vessels is different.
To further analyze the potential biological functions of the six RBPs genes, we use GSEA analysis in groups of risk scores. The results showed that the high-risk group was enriched in pancreatic β cells, blood coagulation, myogenesis, and epithelial-mesenchymal transition pathways(Figure 11).
3.8 External verification of the prognostic performance and expression of hub RBPs
To further determine the relationship between the expression of these Hub RBPsin rectal cancer and the prognosis, we obtained the immunohistochemical results from the human protein atlas database. Compared with normal rectal tissues, the expression of TOP2A, RUVBL1, PACSIN2, and WDR4 in rectal cancer increased significantly. At the same time, we combined two external data (GES20482, GSE90627) with sample numbers greater than 100. It was found that PANCSIN2, TOP2A, RUVBL1, WDR4 had significant differences in expression in normal tissues compared with rectal cancer tissues. We further demonstrated the expression of 5 TFs screened in clinical specimen tissues (Figure 12). The results confirmed that it is consistent with our analysis. EXO1, TOP2A, RUVBL1, WDR4 have higher expression levels in normal tissues, and lower expression levels in tumor tissues. However, PACSIN2 and NXT1 were not statistically different between normal tissues and tumor tissues.