Overall mapping rate after genome alignments
Reads from the RNA-seq of each esophageal clinical case were mapped to human genome sequences. The results are shown in Table 1, where the total alignment number of the sequenced fragments ranged from 15 to 39 million, and the comparison rate ranged between 83%-92%, suggesting the high reliability of these RNA-seq data.
Changes in AS patterns between ESCC tissue/matched normal tissue and two cell lines
From the results of splicing analysis, Using MISO, a total of 45,439 AS events were detected. Of these, 6,019 AS events were significantly differently spliced in ESCC tissues as compared to normal tissues, with a cut-off of BF> 5, |ΔΨ|> 0.2. These significant AS events included a number of inclusion and exclusion reads > 10, accounting for 13.25% of the total number of AS events, suggesting a relative high splicing index in ESCC (Figure 1A). After filtering out the AS events with inconsistent ΔΨ trends in the paired samples, there were 5,150 differential splicing events, including 2,324 exon skipping (SE) events, 1,014 intron retention (RI) events, and 693 mutually exclusive exon (MXE) events, 592 alternative 3’ splice site (A3SS) events, and 527 alternative 5’ splice site (A5SS) events. The AS results of ESCC tissue samples showed that the proportion of SE events that could be detected were the most abundant at about 33.6%, RI events at 6.9%, MXE 6.8%, A3SS 8.5%, and A5SS 6.4% of the total events. The percentage of SE events in differential splicing events was still the highest as 4.8%, while RI events comprising 2.5%, MXE events comprising 1.5%, A3SS events comprising 1.2%, and A5SS events comprising approximately 1.1%. Individual differences were detected during MISO analysis which were more obvious in the differential splicing events (Figure 1B). From the volcano plot, it could be observed that the ΔΨ values and log2(BF) values in detectable AS events and the differentially spliced events were symmetrically distributed (Figure 1C), Also, it could be seen from theand bar plotthat the ΔΨ trend did not have a significant difference between AS event type (Figure 1D).
On the other hand, a total of 32,891 AS events were detected in the SHEE/SHEEC cell lines, of which a total of 928 AS events were significantly differentially spliced, accounting for approximately 2.8% of the total detectable AS events (Figure 1E). The threshold for judging the differences in AS events in the SHEE/SHEEC cell lines was the same as applied in ESCC AS event filtration. In the detectable AS events, the proportion of SE was the largest, at 55.9%, with RI at 9.2%, MXE at 10.9%, A3SS accounting for 13.6%, and A5SS at 10.3%. Of the differential splicing events, the number of SE, RI, MXE, A3SS, and A5SS events were 455, 131, 152, 96, and 94, accounting for 49%, 14.1%, 16.4%, 10.3%, and 10.1% of the total number of differential splicing events, separately (Figure 1F). The distribution of both total and differential AS is similar in ESCC tissues and ESCC cell lines.
Re-annotation of AS events in paired ESCC tissues
To identify the genes and transcripts representing the differential AS events and to filter out incorrect AS events, as described in the methods, we re-annotated all differentially splicing events found in at least 3 paired samples. The number of filtered significant AS events changed across the different categories of AS due to the re-annotation, but the event trend had minimal changes (Figure 2A). For SE events, there were a total of 222 and 151 significant alternatively spliced events before and after re-annotation, reducing the number of AS events to approximately 68% of all events. There were a total of 134 RI significant AS events before re-annotation, and 115 after re-annotation, with a reduction of 14.2%. A total of 62 significant MXE events were identified before re-annotation and only 6 remained after re-annotation, reducing the total number to 9.7%. 57 significant A3SS events were found before re-annotation while 42 remained after re-annotation. 35 out of 52, 67.3%, significant A5SS events were identified after the re-annotation. Among these above results, SE events are the most prevalent in ESCC. The number of MXE events was reduced the most, because the two exons that should have been mutually exclusive in the MISO official MXE event annotation were observed to exist in the same transcript. Thus, a total of 349 significant AS events including five types were identified after re-annotation in 15 matched pairs of ESCC and normal tissue samples, of which transcripts/genes representing 169 AS events could be found both in the Ensembl and RefSeq annotations, whereas 114 could only be identified in the RefSeq annotations, and 66 AS events only in the Ensembl annotations (Figure 2B). Through we found different annotations from these two popular databases, the majority of our annotation was found in both databases. We have also calculated the fractions of AS events representing protein-coding mRNA and ncRNA (non-coding RNA) (Figure 2C). The percentage for ncRNA ranged from 10% to more than 20% in total events, suggesting alternative splicing is prevalent for ncRNA and its critical role in cancer [31]. The top 10 most prevalent splicing events that occurred in the paired samples are shown in Table 2.To better illustrate the different splicing patterns, HNRNPC, CALD1, KIAA1217, TPM1, VCL and ZNF207 were selected from the top genes and their splicing patterns are visualized by Sashimi plots (Figure 3). These included representative examples for MXE, A5SS, RI and SE splicing types, which indicates the splicing diversity in ESCC.
Overlapping AS events in ESCC tissues and SHEE/SHEEC cell lines
A total of 37 significant AS events were identified in both ESCC tissues and cell lines, 15 of which showed the same trend of ΔΨ values. Among these 15 AS events, there were 4 SE events, 7 RI events, 2 MXE events, and 2 A5SS events (Supplementary Table 1). Though the overlapping rate is quite low, we presume this might reflect the consensus and difference between clinical samples and cultured cell lines.
Classification of AS events according to representative transcript types
After the re-annotation of the 349 AS events from ESCC tissues, 312 events represented at least one protein-coding transcript, which accounted for 89.4% of the total AS events. The remaining 37 AS events did not represent any protein-coding transcripts, including protein coding transcripts, lncRNAs, and microRNAs (Supplementary Table 2). Among the 235 AS events re-annotated by Ensembl transcript annotations, 200 events represented at least one protein-coding transcript, which accounted for 85.1% of the total AS events, and the remaining 35 AS events represented non-protein-coding transcripts. Of the 283 variable splicing events re-annotated by RefSeq transcript annotations, 266 events represented at least one protein-coding transcript, which accounted for 94% of the total AS events, and the remaining 17 AS events represented non-protein-coding transcripts. These results suggest that splicing of ncRNA transcripts could be just as prevalent as protein-coding transcripts.
Functional enrichment revealed the potential roles of AS transcripts
To identify the biological processes and pathways affected by splicing abnormalities in ESCC, we performed the enrichment analyses. A total of 270 protein-coding genes were selected for functional and pathway enrichment analysis from the results of alternative splicing events after re-annotation of differential AS events with gene annotations from the Ensembl and RefSeq databases. A total of 234 genes were enriched in 543 biological processes. There were 382 biological processes satisfying the threshold of P < 0.05, and 39 genes were enriched in 12 pathways, of which 10 pathways satisfied P < 0.05. Interestingly, there were 4 KEGG pathways and 20 biological processes that potentially related to invasion and metastasis of cancer cells, such as adherens junction, ECM-receptor interaction, focal adhesion, regulation of cell mobility, regulation of cell migration and cell-matrix adhesion (Figure 4). These results indicated that AS in ESCC might involve multiple cellular functions to promote tumor progression.
Splicing regulatory network in ESCC
To find the regulators of the splicing alterations in ESCC, we constructed a splicing regulatory network by calculating the FPKM values of splicing factors and PSI values of AS events. Filtered with a cut-off of P < 0.05 after Spearman correlation analysis, 5,999 splice factor-splicing event pairs and 5,511 splice factor-target gene pairs were chosen to build a splicing regulatory network. This network involved 81 splicing factors and 223 differential AS events (Figure 5). Most of splicing factors could regulate dozens of target genes. On the other hand, one target gene could be processed by several splicing factors, suggesting the co-operation between these factors. Next, Splicing factors that were differentially expressed in ESCC and matched normal tissues were measured in two public microarray datasets (GSE53624 and GSE23400) and the RNA-seq dataset generated from our laboratory [16]. From these analyses, the splicing factor SF3B4 was significantly differentially expressed in ESCC with a 1.5 fold-change in three different datasets (Table 3).
For the splicing factors in the network, SF3B4 (splicing factor 3b subunit 4) caused us great interest. The splicing regulatory network for SF3B4 (splicing factor 3b subunit 4) and its target genes (with significant Spearman correlation) were also illustrated, which contained 92 target genes and 102 abnormal AS events (Figure 6A), as some genes generated more than one AS event. Moreover, SF3B4 was found to be a survival-related gene in ESCC. In TCGA dataset, ESCC patients with a lower expression level of SF3B4 had a longer survival time (Figure 6B). The PSI value of SRSF5’s RI event was found to have the smallest p-value with Spearman correlation analysis with the FPKM of SF3B4 (Figure 6C). To understand the potential functions of SF3B4, functional enrichment analysis was also performed to identify the biological processes for SF3B4’s target genes, resulting in a dot plot of 9 significant biological processes, which involves cell-cell adhesion, NF-KB signaling and regulation of translation (Figure 6D). Many evidences indicated AS are often linked to the occurrence of cancer driver mutations in genes encoding either core components or regulators of the splicing machinery [32]. However, we searched cBioPortal database (http://www.cbioportal.org/), but no mutation for SF3B4 in current TCGA ESCC clinical samples was found (data not shown).
SF3B4 affects AS after its knockdown
To validate the regulatory role of SF3B4 on AS events in tumor samples, we have downloaded two RNA-seq datasets with SF3B4 knockdown, and calculated the inclusive levels of AS events with IncLevelDifference between control shRNA and SF3B4 knockdown (methods described in Supplementary file 1). From the 102 SF3B4 regulated AS events described above. We selected 36 AS events with top Spearman test P-value and top correlation (1 > |Rho| > 0.5), and named them as the top SF3B4 regulated AS events in ESCC samples. Subsequently, we tried to sieve out the overlapping results between the top SF3B4 regulated AS events in ESCC and significant AS events (FDR < 0.05) in either of the two validation datasets. Six overlapping AS events regulated by SF3B4 in tumor samples were finally verified, including 1 SE event (ATP2B4) and 5 RI events in 4 genes (SRSF5, PHF6, SNHG12, KIAA1217). It is interesting to point out that their Rho values are all negative, indicating SF3B4 represses intron retention in these transcripts. These results also suggest that SF3B4 could regulate AS in a pan-cancer manner, including in ESCC. The detailed information of the top 36 SF3B4 regulated AS events in ESCC are provided in Supplementary Table 3, with the 6 verified AS events indicated in the validated cell lines.