Identification of Differentially Expressed RBPs in OSCC
The workflow of this study is illustrated in Figure 1. RNA-sequencing data for OSCC and corresponding clinical information were downloaded from the TCGA database. In total, 331 OSCC samples and 32 normal oral samples were analysed. The Limma R software package was applied to preprocess these data and to identify the differentially expressed RBPs [11]. A total of 1542 RBPs were analysed in this study, and 257 met our inclusion criteria (P<0.05, |log2FC)| >1.0), which consisted of 121 downregulated and 136 upregulated RBPs. The expression of these differentially expressed RBPs is shown in Figure 2.
Functional Enrichment Analysis of the Differentially Expressed RBPs
The identified RBPs were divided into two groups, the upregulation group and the downregulation group, to explore their potential functions and mechanisms. We carried out GO and pathway analyses for these differentially expressed RBPs. The results showed that the upregulated RBPs were significantly enriched in defense response to virus and regulation of mRNA metabolic process (Table 1, Figure 3A), while the downregulated RBPs were considerably enriched in mRNA processing, RNA splicing, and RNA catabolic process (Table 1, Figure 3B). Regarding the CC category, the upregulated RBPs were enriched in cytoplasmic ribonucleoprotein granule, ribonucleoprotein granule, and spliceosomal complex (Table 1, Figure 3A), while the downregulated RBPs were enriched in the ribosome, ribosomal subunit, and mitochondrial matrix (Table 1, Figure 3B). The MF analysis indicated that the upregulated differentially expressed RBPs were mainly enriched in catalytic activity acting on RNA, nuclease activity, and helicase activity (Table 1, Figure 3A), whereas the downregulated RBPs were notably enriched in mRNA binding, catalytic activity acting on RNA, and nuclease activity (Table 1, Figure 3B). Additionally, the results of KEGG pathway analysis showed that the upregulated RBPs were enriched in the RNA transport, spliceosome, and mRNA surveillance pathways (Table 1, Figure 3C), while the downregulated RBPs were mainly enriched in the RNA transport, ribosome, and mRNA surveillance pathways (Table 1, Figure 3D).
PPI Network Construction and Key Module Screening
To better understand the potential roles of these identified RBPs in OSCC, we performed a PPI network using the STRING database and Cytoscape software. This PPI network included 228 nodes and 1384 edges in total (Figure 4A). We further analysed the coexpression network to detect potential critical modules using the MCODE plug-in and identified the key modules (Figure 4B). Of these, the first essential module, module 1, contained 49 nodes and 316 edges, the second module, module 2, consisted of 8 nodes and 28 edges, module 3 consisted of 13 nodes and 43 edges, module 4 consisted of 6 nodes and 12 edges, the next module consisted of 4 nodes and 6 edges, and the last module consisted of 16 nodes and 29 edges (Supplementary Figure 1A-F). The functional analysis showed that the RBPs in these key modules were mainly enriched in mitochondrial translation, mitochondrial gene expression, and cellular protein complex disassembly. More details are provided in Supplementary Table 1.
Identification of Prognosis-related RBPs
Of the differentially expressed RBPs from the PPI network, 20 prognosis-associated candidate hub RBPs were identified by univariate Cox regression analysis (Supplementary Table 2). The candidate RBPs were analysed by multivariate Cox regression to determine their impact on patient survival time and clinical outcomes. Ten hub RBPs, ZC3H12D, OAS2, INTS10, ACO1, PCBP4, RNASE3, PTGES3L-AARSD1, RNASE13, DDX4 and PCF11, were identified as independent predictors in OSCC (Supplementary Table 3).
Prognosis-related Genetic Risk Score Model Construction and Validation
The ten hub RBPs obtained by multivariate Cox regression analysis were used to construct a prognostic model. To assess the predictive ability of this model, we further carried out a risk score analysis in the training cohort. The risk score of each patient was calculated according to the following formula: RS=ZC3H12D*-1.505+OAS2*-0.006+INTS10*-0.135+ACO1*0.108+PCBP4*-0.122+RNASE3*2.757+ PTGES3L-AARSD1*2.827+RNASE13*-11.340+DDX4*8.283+PCF11*-0.204.
A total of 221 OSCC patients were divided into low-risk and high-risk groups according to the median RS. The results showed that patients in the low-risk group had significantly longer OS times than those in the high-risk group (Figure 5A). The time-dependent ROC analysis was further conducted to estimate the prognostic ability of the identified RBPs. From the chart, we can see that the area under the ROC curve (AUC) of the risk score model was 0.781 (Figure 5B), which suggested that our model has high prognostic accuracy. The expression heat map of the ten prognostic RBPs is illustrated in Figure 5C, and the RS and survival status for the low- and high-risk groups are displayed in Figure 5D. Moreover, the results from our internal validation cohort, using the same formula to assess the predictive capability of the ten-gene prognostic model, showed that patients with lower risk scores had better OS (Figure 6A-D). These results indicated that the prognostic model has significant predictive capability.
The Prognostic Value of Different Clinical Parameters
To assess the prognostic significance of different clinical features, we performed a univariate Cox regression analysis of OSCC patients in the training set and internal validation set. The results showed that stage, grade, age and RS were correlated with the OS of OSCC patients in both cohorts (P<0.05) (Figure 7A, B), whereas only stage and RS were independent prognostic factors for OS in the multiple regression analysis of both cohorts (P<0.05) (Figure 7C, D).
Development and Validation of a Nomogram Based on the Hub RBPs
We established a nomogram for predicting the 1-, 2- and 3-year OS based on the multivariate analysis results (Figure 8A). The points for each independent prognostic factor listed in the nomogram were summed the total points value, which is projected on the bottom scale and used to determine the 1-, 2- and 3-year OS probabilities for each patient. We plotted the calibration curves, which indicated that there was no apparent deviation from the ideal line, with optimal accordance achieved between the nomogram-predicted outcomes and the actual survival rates in both the training and validation cohorts (Figure 8B, C).