1. Overview of AS events in COAD
Comprehensive AS events in seven splicing types: Exon Skip (ES), Mutually Exclusive Exons (ME), Retained Intron (RI), Alternate Promoter (AP), Alternate Terminator (AT), Alternate Donor site (AD), and Alternate Acceptor site (AA)(Fig. 1A). Overall, 22,451 AS events were detected, including 74 ME in 74 genes, and 1,661 RI in 1,187 genes, 1,682 AD in 1, 309 genes, 1,999 AA in 1,574 genes, 4,696 AT in 2,770 genes, 3,972 AP in 2,325 genes, and 8,367 ES in 4,430 genes(Fig. 1B and 2A).
2. AS events associated with survival time
The results of univariate Cox regression analysis showed that 2,004 AS events in 1439 genes have significantly relationship with OS(P < 0.05), including 10 ME in 10 genes, and 158 RI in 149 genes, 105 AD in 104 genes, 140 AA in 134 genes, 517 AT in 330 genes, 448 AP in 323 genes, and 586 ES in 518 genes(Fig. 2B), top 20 of each AS type were presented in Fig. 3. As depicted in the plot, one gene may contain more than one type of AS events.
3. DEGs with survival AS events
A total of 6455 DEGs were indentified from TCGA COAD cohort, including 4558 genes were significantly up-regulated and 1897 genes were significantly down-regulated (Fig. 4,p < 0.01, fold-change > 1). To indicate the DEGAS, we overlap the 6445 DEGs and 1439 genes with survival AS events. As a result, 211 DEGs were identified that occurred survival associated AS events at the same time. The results were showed in Venn diagram (Fig. 5A). Principal Component Analysis (PCA) was performed for DEGAS to characterize the DEGAS comprehensively (Fig. 5B-5H). Multivariate Cox regression analysis was performed to evaluate the prognostic effects of DEGAS. In COAD, AD, RI, AT, AA, ES, and AP had an area under the curve (AUC) > 0.65, and ES had the most satisfied performance (AUC: 0.98), while the ME had the lowest performance (AUC = 0.595, Fig. 5I and 5J).
4. Construction and analysis of PPI network
To explore the interactions among DEGAS, a String database was mapped for the most significant prognostic genes with a score > 0.4 using Cytoscape (Fig. 6A). Different type of AS events was shown by different color. The results indicated that the majority of DEGAS showed protein-protein interactions and participate in different biological processes. Top 30 outstanding DEGAS were identified through PPI analysis by DMHC algorithm in Cytoscape software, which were relatively more important that connected more nodes in the PPI network (Fig. 6D).
5. Functional enrichment analysis
In this study, GO and KEGG analysis of the 211 DEGAS was provided by Metascape, which may improve biological understanding of these genes(Fig. 6B, 6C). The enrichment results of GO analysis of DEGs with survival AS events performed by Metascape were mainly enriched in chordate embryonic development, response to inorganic substance, ncRNA processing, regulation of cell-cell adhesion, neurotransmitter metabolic process, response to oxygen levels, deoxyribonucleoside metabolic process, nuclear division, tRNA metabolic process, regulation of growth, cellular response to hormone stimulus, response to estradiol, carbohydrate derivative transport, cellular response to acid chemical, fatty acid metabolic process, positive regulation of cell cycle, kinase activity, protein methyltransferase activity, regulation of cellular response to stress, oxidoreductase activity. KEGG analysis were mainly enriched in purine metabolism (hsa00230), fatty acid metabolism (hsa01212), cocaine addiction (hsa05030), ovarian steroidogenesis (hsa04913), longevity regulating pathway-multiple species (hsa04213), cell cycle (hsa04110), HTLV-I infection (hsa05166), RNA degradation (hsa03018), focal adhesion (hsa04510), cytokine-cytokine receptor interaction (hsa04060),hippo signaling pathway (hsa04390).
6. Construction of the Prognostic Models
We obtained 30 hub DEGAS from PPI network, and put them into multivariable Cox regression analysis. Regression coefficients from the multivariable Cox analysis was used to construct a risk score and develop a prognostic model to predict overall survival. 12 genes (EZH2, coefficient =-3.36021; NEDD4L, coefficient =-3.59895; PRMT1, coefficient = 12.4729; LAS1L, coefficient = 5.054636; PBK, coefficient = 6.15113; KIF2C, coefficient =-16.7546; NCAPD3, coefficient =-9.33565; TRAIP, coefficient =-3.85844; PUS7, coefficient =-3.51065; COL1A1, coefficient = 4.254234; NUSAP1, coefficient =-3.00095; MKI67, coefficient = 1.669178) were included in the signature. The general information of 12 genes was shown in Table 1. The risk score was defined as the sum of the expression of each gene multiplied by its coefficients. The risk score was calculated for each patient. We used the median risk score as the cutoff value to classify patients into low risk and high risk. The gene expression, survival status of each patient was ranked by the risk score values of prognostic signature (Fig. 7A). Kaplan-Meier curve was plotted to indicate the correlation between risk score and overall survival of COAD patients (Plog−rank=2.188e-06, Fig. 7B). Time-dependent ROC curve showed an attractive performance of the signature for OS prediction (P < 0.05, AUC = 0.782, Fig. 7C).
Table 1
General information of the 12 genes in the prognostic risk score.
Gene stable ID | Gene name | Gene type | Chromosome | Gene start (bp) | Gene end (bp) |
ENSG00000142945 | KIF2C | protein coding | 1 | 44739704 | 44767767 |
ENSG00000183763 | TRAIP | protein coding | 3 | 49828595 | 49856584 |
ENSG00000126457 | PRMT1 | protein coding | 19 | 49676166 | 49688450 |
ENSG00000091127 | PUS7 | protein coding | 7 | 105456501 | 105522271 |
ENSG00000001497 | LAS1L | protein coding | X | 65512582 | 65534806 |
ENSG00000137804 | NUSAP1 | protein coding | 15 | 41332841 | 41381050 |
ENSG00000069399 | MKI67 | protein coding | 19 | 44742620 | 44760044 |
ENSG00000148773 ENSG00000108821 ENSG00000106462 ENSG00000049759 ENSG00000151503 | PBK COL1A1 EZH2 NEDD4L NCAPD3 | protein coding protein coding protein coding protein coding protein coding | 10 17 7 18 11 | 128096659 50184096 148807374 58044226 134150113 | 128126423 50201649 148884344 .58401540 134225461 |
The results of univariable and multivariable Cox regression showed that the risk score has significance in predicting overall survival of ccRCC patients [HR in univariable analysis = 1.051(1.035–1.067), HR in multivariable analysis = 1.042 (1.026–1.060), P < 0.001, Table 2,Fig. 7D,7E]. The details of risk score together with other clinical elements by ROC curve analysis were shown(Fig. 7G). The PCA analysis of all COAD patients was performed to display the characteristic of each patient (Fig. 8A).
Table 2
Univariate and multivariate Cox regression analyses of the gene signature and overall survival of ccRCC patients.
| | N | Univariate analysis | | Multivariate analysis |
HR | 95% CI | P | | HR | 95% CI | P |
age | <=60 | 260 | 1 | | < 0.001 | | 1 | | 0.045 |
| > 60 | 247 | 1.694 | 1.236–2.322 | | | 1.384 | 1.008–1.906 | |
gender | female | 174 | 1 | | 0.787 | | 1 | | 0.543 |
| male | 333 | 0.957 | 0.694–1.318 | | | 1.106 | 0.799–1.532 | |
stage | I/II | 307 | 1 | | < 0.001 | | 1 | | < 0.001 |
| III/IV | 200 | 1.637 | 1.495–1.792 | | | 3.522 | 2.510–4.941 | |
Risk score | low | 254 | 1 | | < 0.001 | | 1 | | < 0.001 |
| high | 253 | 1.089 | 1.060–1.118 | | | 2.911 | 2.017–4.202 | |
7. Splicing-factor-regulated network construction
By analyzing level 3 mRNA-seq data from TCGA, a total of 28 splicing factors associated with survival time of patients were identified(Fig. 8B).