Overview of AS events and related genes in GC cohort
AS events can be divided into seven types, as illustrated in Fig.1A: Exon skip (ES), Alternate promoter (AP), Alternate terminator (AT), Mutually exclusive exons (ME), Retained intron (RI), Alternate acceptor site (AA) and Alternate donor site (AD). After strict filtering, 19698 AS events from 11579 parent genes were identified in 364 GC patients, including 7189 ESs in 3562 genes, 4310 APs in 2474 genes, 3487 ATs in 1979 genes, 1664 AAs in 1310 genes, 1542 ADs in 1174 genes, 1391 RIs in 985 genes and 106 MEs in 104 genes (Fig.1B). In addition, the UpSet plot indicated that a single gene can harbor several types of AS events (Fig.1C).
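The per-type tallies and the gene-level overlaps summarized by an UpSet plot can be sketched as follows. This is a minimal illustration only: the gene names and records are hypothetical toy values, not data from this cohort, and the function name `summarize` is an assumption introduced here.

```python
from collections import Counter, defaultdict

# Hypothetical toy records of (parent gene, AS type); not data from the study.
events = [
    ("GENE1", "ES"), ("GENE1", "ES"), ("GENE1", "AP"),
    ("GENE2", "AT"),
    ("GENE3", "ES"), ("GENE3", "RI"), ("GENE3", "AA"),
]

def summarize(events):
    """Return events per type, genes per type, and genes per type combination."""
    events_per_type = Counter(as_type for _, as_type in events)

    # Collect the set of AS types observed for each parent gene.
    types_by_gene = defaultdict(set)
    for gene, as_type in events:
        types_by_gene[gene].add(as_type)

    genes_per_type = Counter(t for types in types_by_gene.values() for t in types)

    # Each gene belongs to exactly one combination of types,
    # which corresponds to one bar of an UpSet plot.
    combos = Counter(tuple(sorted(types)) for types in types_by_gene.values())
    return events_per_type, genes_per_type, combos

events_per_type, genes_per_type, combos = summarize(events)
```

Because one gene can contribute to several types, the per-type gene counts may sum to more than the number of distinct parent genes.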
Identification of survival-related AS events in GC cohort
Univariate Cox regression analysis identified a total of 1318 AS events from 957 parent genes as prognostic, including 464 ESs in 377 genes, 352 APs in 229 genes, 200 ATs in 133 genes, 79 AAs in 79 genes, 128 ADs in 121 genes, 81 RIs in 75 genes and 14 MEs in 14 genes (Fig.1B). Moreover, the intersections among these seven types of AS events are shown in the UpSet plot (Fig.1D), demonstrating that a single gene can harbor several types of prognostic AS events.
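For readers unfamiliar with the screening step, a univariate Cox model for a single AS event can be sketched as below. This is a simplified illustration, not the software used in the study: it assumes the PSI value is the only covariate, uses Newton-Raphson on the Breslow partial likelihood, and runs on hypothetical toy data. The function name `univariate_cox` is introduced here for illustration.

```python
import math

def univariate_cox(times, events, x, n_iter=25, tol=1e-9):
    """Fit h(t|x) = h0(t) * exp(beta * x) for one covariate.

    Newton-Raphson on the Breslow partial likelihood.
    Returns (beta, hazard_ratio, two-sided Wald p-value).
    """
    beta, info = 0.0, 0.0
    for _ in range(n_iter):
        score, info = 0.0, 0.0
        for i, (t_i, d_i) in enumerate(zip(times, events)):
            if not d_i:
                continue  # censored patients contribute only through risk sets
            s0 = s1 = s2 = 0.0
            for t_j, x_j in zip(times, x):
                if t_j >= t_i:  # patient j is still at risk at time t_i
                    w = math.exp(beta * x_j)
                    s0 += w
                    s1 += w * x_j
                    s2 += w * x_j * x_j
            mean = s1 / s0
            score += x[i] - mean            # first derivative of the log-likelihood
            info += s2 / s0 - mean * mean   # negative second derivative
        if info <= 0:
            break  # no information (e.g. constant covariate); stop iterating
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info) if info > 0 else float("inf")
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return beta, math.exp(beta), p_value

# Hypothetical toy cohort (not study data): follow-up time, death indicator, PSI.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 0]
psi = [0.9, 0.3, 0.8, 0.5, 0.2, 0.4]
beta, hazard_ratio, p_value = univariate_cox(times, events, psi)
```

A positive coefficient (hazard ratio above 1) marks an event whose higher PSI is associated with shorter survival; in practice a dedicated survival package would be used and the p-values screened against a chosen threshold.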
Bioinformatics analysis of survival-associated AS events
To elucidate the potential influence of OS-associated AS events on the corresponding proteins, the 957 parent genes of the 1318 prognostic AS events were submitted to bioinformatics analyses, including GO (Gene Ontology), KEGG (Kyoto Encyclopedia of Genes and Genomes) and PPI (Protein-Protein Interaction) analyses. As a result, 4 terms in biological process, 8 terms in cellular component and 10 terms in molecular function were highlighted by GO analysis (Fig.2A). Moreover, 11 of the 23 significantly enriched KEGG pathways appear to be implicated in oncogenic processes, including Basal cell carcinoma, Autophagy, Proteoglycans in cancer, ECM-receptor interaction, Gastric cancer, Hepatocellular carcinoma, Focal adhesion, EGFR tyrosine kinase inhibitor resistance, Cell cycle, HIF-1 signaling pathway and Wnt signaling pathway (Fig.2B). To further explore the significance of these parent genes, a PPI network incorporating 373 nodes and 960 edges was constructed (Fig.2C). The key module, composed of 25 nodes and 297 edges, was identified with the CytoHubba tool (Fig.2D). The parent genes/proteins in the key module were mainly ribosomal proteins (RPS5, RPS6, RPLP0, etc.) and ribonucleoproteins (HNRNPC, HNRNPR, SNRNP70, etc.).
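Enrichment of a GO term or KEGG pathway is commonly assessed with an over-representation test. The sketch below assumes a one-sided hypergeometric test, which is a standard choice but is not stated in the text above; apart from the 957 parent genes, all numbers (background size, pathway size, overlap) are hypothetical.

```python
from math import comb

def enrichment_p(n_background, n_in_term, n_selected, n_overlap):
    """One-sided hypergeometric p-value: P(overlap >= n_overlap).

    n_background: genes in the background universe
    n_in_term:    background genes annotated to the term or pathway
    n_selected:   genes in the list being tested
    n_overlap:    selected genes annotated to the term or pathway
    """
    total = comb(n_background, n_selected)
    upper = min(n_in_term, n_selected)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical example: 957 parent genes tested against a 150-gene pathway
# in an assumed background of 20000 genes, with an assumed overlap of 20.
p = enrichment_p(20000, 150, 957, 20)
```

With many terms tested at once, the resulting p-values would normally be adjusted for multiple testing before terms are reported as enriched.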
Construction of the prognostic signature using AS events for GC patients
To reduce the number of variables in the prognostic models, lasso and multivariate Cox regression were performed. After lasso regression filtering, the 20 candidate variables (if available) in each splicing type were reduced to 20 in AA, 19 in AD, 17 in AP, 16 in AT, 17 in ES, 13 in ME, 15 in RI and 17 in all types combined (Fig.3A-H). The selected AS events were then further screened by multivariate Cox regression to construct the final prognostic models, which contained 15 AA, 14 AD, 9 AP, 10 AT, 13 ES, 8 ME, 10 RI and 11 mixed events, respectively (Fig.4A-H).
Using the median risk score as the cutoff, GC patients were classified into high- and low-risk groups. Kaplan-Meier curves were used to compare survival between the two groups. In every AS-based prognostic model, whether built from AA, AD, AP, AT, ES, ME or RI events or from all types combined, patients in the high-risk group had poorer OS than those in the low-risk group (Fig.5A-H).
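The stratification step can be sketched as follows. The sketch assumes the usual form of such a signature, in which the risk score is the sum of each selected event's PSI value weighted by its multivariate Cox coefficient; the coefficients and PSI values are hypothetical, and assigning scores equal to the median to the low-risk group is an arbitrary choice made here.

```python
from statistics import median

def risk_score(psi_values, coefficients):
    """Linear predictor: sum of coefficient * PSI over the selected AS events."""
    return sum(c * p for c, p in zip(coefficients, psi_values))

def split_by_median(scores):
    """Label each patient 'high' if the score exceeds the median, else 'low'."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time."""
    curve, surv = [], 1.0
    for t in sorted({u for u, d in zip(times, events) if d}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, d in zip(times, events) if u == t and d)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve

# Hypothetical two-event model and four patients (not study data).
coefficients = [1.2, -0.8]
psi = [[0.9, 0.1], [0.2, 0.7], [0.6, 0.5], [0.4, 0.9]]
scores = [risk_score(row, coefficients) for row in psi]
groups = split_by_median(scores)
```

Calling `kaplan_meier` separately on the follow-up data of each group gives the two curves that are compared in a plot such as Fig.5A-H.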
ROC curves were generated to assess the predictive accuracy of the eight AS prognostic models. As illustrated in Fig.5I, the risk score of the AD model showed the greatest predictive power with an AUC of 0.804, followed by the AA, AP, AT and RI models and the model not stratified by AS type. The performance of the prognostic signatures with AUC > 0.7 was further tested in predicting survival status. As the risk score calculated by any of these models (AA, AD, AP, AT, RI and mixed events) increased, more patients had died and fewer were alive (Fig.6A-F).
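As a reminder of what the AUC measures, the sketch below computes it as the probability that a randomly chosen deceased patient has a higher risk score than a randomly chosen living patient. It treats survival status as a plain binary label and ignores censoring and follow-up time, so it is a simplification and may differ from the ROC method actually used; the scores are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5).

    labels: 1 for patients who died, 0 for patients who are alive.
    """
    dead = [s for s, y in zip(scores, labels) if y == 1]
    alive = [s for s, y in zip(scores, labels) if y == 0]
    if not dead or not alive:
        raise ValueError("both outcome classes are required")
    wins = sum(
        1.0 if d > a else 0.5 if d == a else 0.0
        for d in dead
        for a in alive
    )
    return wins / (len(dead) * len(alive))

# Hypothetical risk scores and survival status (not study data).
example_auc = auc([0.9, 0.2, 0.8, 0.3], [1, 1, 0, 0])
```

An AUC of 0.5 corresponds to a score with no discriminative ability and 1.0 to perfect separation of the two outcome groups.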
To determine whether the final models were independent prognostic factors for GC, the AS predictive models, along with age, gender, grade and clinical stage, were subjected to univariate and multivariate Cox regression analyses. Although the AA, AD, AP, AT, ES, ME, RI and mixed-event models all showed predictive performance (Fig.7A), only the risk scores calculated from the AA, AD, AT, ES and RI models were independent prognostic indicators (Fig.7B).
Interactive analysis of splicing factors and AS events
The regulatory network was built in Cytoscape from the expression of SF genes and the PSI values of AS events. As shown in Fig.8A, prognostic AS events, including 20 risky (red nodes) and 14 favorable ones (green nodes), were positively or negatively modulated by the key SF genes (blue nodes). Notably, the same SF could regulate different AS events, and the same AS event could be regulated by different SFs. Moreover, the majority of adverse AS events were positively associated with SFs (red lines), whereas most favorable AS events were negatively associated with SFs (green lines). For the same gene and the same splicing type, an SF may play different or even opposite roles, producing different isoforms. For example, QKI expression was positively correlated with the AT event SEPT11-69618 but negatively correlated with the AT event SEPT11-69616 (Fig.8B, C).
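The edges of such a network can be derived from pairwise correlations between SF expression and event PSI values, as sketched below. The correlation measure (Pearson), the cutoff of 0.5, and all names and values are assumptions made for illustration; the text above does not specify them.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def build_edges(sf_expr, as_psi, min_abs_r=0.5):
    """Return (SF, AS event, sign, r) for every pair with |r| >= min_abs_r."""
    edges = []
    for sf, expr in sf_expr.items():
        for event, psi in as_psi.items():
            r = pearson(expr, psi)
            if abs(r) >= min_abs_r:
                sign = "positive" if r > 0 else "negative"
                edges.append((sf, event, sign, r))
    return edges

# Hypothetical expression and PSI profiles across five patients (not study data).
sf_expr = {"SF_A": [1, 2, 3, 4, 5]}
as_psi = {
    "EVENT_1": [0.1, 0.2, 0.3, 0.4, 0.5],  # rises with SF_A
    "EVENT_2": [0.9, 0.7, 0.6, 0.4, 0.1],  # falls as SF_A rises
    "EVENT_3": [0.5, 0.1, 0.6, 0.1, 0.5],  # essentially uncorrelated
}
edges = build_edges(sf_expr, as_psi)
```

The resulting edge list, with its sign column, is the kind of table that can be imported into Cytoscape and styled so that positive and negative associations are drawn in different colors.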