STfusion verification in HeLa cells
To verify the accuracy of STfusion, we applied it to HeLa cancer cells. P. Wu et al. [31] experimentally identified nine chimeric RNAs in HeLa cells. In addition, the cis-SAGe SLC45A3-ELK4 was detected in HeLa cells by Zhang et al. [32], and the trans-splicing fusion event VMP1-RPS6KB1 by L. Wu et al. [33]. For these 11 fusion transcripts, the number of poly(A) tails per parental gene was considered. If the concept is correct and a chimeric transcript caused by a cis-SAGe mechanism or a chromosomal rearrangement is transcribed, the 5´ gene should not have a poly(A) tail, whereas the 3´ gene should have one.
STfusion verification was performed using the sequenced mRNA and the poly(A) tail counts produced by TAIL-seq, together with the published counts of poly(A) tails per gene [13] (Tables 1 and 2).
STfusion verification for gene fusions and cis-SAGe
LHX6-NDUFA8, SLC2A11-MIF, and SLC45A3-ELK4 are confirmed cis-SAGe events in HeLa cells [31,32]. The parental genes SLC45A3 and ELK4 were not listed as having poly(A) tails in Chang et al. [13]. However, an in-depth search of the sequenced mRNA published in the same paper identified five poly(A) tails attached to the 3´ gene (Tables 1 and S13). TXNDC9-LYG1, in contrast, did not seem to follow the proposed hypothesis. An inversion on Chr2:87-111 megabase pairs (Mbp) was identified by Breakdancer [34] (Table S14) and experimentally confirmed by Landry et al. [35]. Both parental genes are located within this region.
The results shown in Table 1 confirm our assumption that a fusion transcript caused by a cis-SAGe mechanism or a chromosomal rearrangement lacks a poly(A) tail at the 5´ gene and instead has an elevated number of poly(A) tails attached to the 3´ gene.
For two fusion transcripts, GFOD2-ENKD1 and MFSD7-ATP5I, no poly(A) tails were detected. Consistently, no tails were reported by Chang et al. [13] or found in an in-depth search of the TAIL-seq data.
Table 1. Poly(A) tail occurrences of the parental genes of cis-SAGe and gene fusions in HeLa cells
cis-SAGe/gene fusion (5´ gene - 3´ gene) | Poly(A) tails, 5´ gene | Poly(A) tails, 3´ gene
FOXRED2-TXN2 | 0 | 340
LHX6-NDUFA8 | 0 | 218
SLC2A11-MIF | 0 | 842
SLC45A3-ELK4 | 0 | 5
TXNDC9-LYG1 | 103 | 0
UBE2Q2-FBXO22 | 0 | 97
Distinction among fusion transcripts caused by trans-splicing
In a fusion transcript caused by trans-splicing, both parental genes are transcribed and polyadenylated. Accordingly, poly(A) tails were observed for both parental genes (Table 2).
The transcription-induced chimera VMP1-RPS6KB1 is assumed to occur via trans-splicing [33] in HeLa cells, and STfusion confirmed this event. The fusions TINF2-NEDD8 and DHRS13-FLOT2 were experimentally confirmed [31], and both parental genes of each fusion were polyadenylated. These data suggest that both fusions are caused by a trans-splicing event. The latter fusion transcript, DHRS13-FLOT2, has been suggested to be transcription-induced [36] because no genetic cause could be identified.
Table 2. Poly(A) tail occurrences of the parental genes of fusion transcripts assumed to occur via trans-splicing in HeLa cells
Trans-splicing fusion transcript (5´ gene - 3´ gene) | Poly(A) tails, 5´ gene | Poly(A) tails, 3´ gene
DHRS13-FLOT2 | 64 | 71
TINF2-NEDD8 | 35 | 69
VMP1-RPS6KB1 | 51 | 48
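The decision rule behind Tables 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration, not the STfusion implementation: the class `PolyATails`, the function `classify_fusion`, and the category labels are hypothetical names introduced here, and the rule assumes a simple zero versus non-zero comparison of the poly(A) tail counts. The counts are taken from Tables 1 and 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolyATails:
    """Poly(A) tail counts for the two parental genes of a fusion (hypothetical container)."""
    fusion: str
    five_prime: int
    three_prime: int


def classify_fusion(t: PolyATails) -> str:
    """Assumed zero/non-zero rule derived from the hypothesis in the text."""
    if t.five_prime == 0 and t.three_prime == 0:
        return "undetermined"  # no tails detected for either parental gene
    if t.five_prime == 0:
        return "cis-SAGe or rearrangement"  # only the 3' gene is polyadenylated
    if t.three_prime == 0:
        return "does not follow hypothesis"  # e.g. TXNDC9-LYG1, located in an inversion
    return "trans-splicing"  # both parental genes are polyadenylated


# Counts as listed in Tables 1 and 2.
OBSERVED = [
    PolyATails("FOXRED2-TXN2", 0, 340),
    PolyATails("LHX6-NDUFA8", 0, 218),
    PolyATails("SLC2A11-MIF", 0, 842),
    PolyATails("SLC45A3-ELK4", 0, 5),
    PolyATails("TXNDC9-LYG1", 103, 0),
    PolyATails("UBE2Q2-FBXO22", 0, 97),
    PolyATails("DHRS13-FLOT2", 64, 71),
    PolyATails("TINF2-NEDD8", 35, 69),
    PolyATails("VMP1-RPS6KB1", 51, 48),
]

CLASSIFICATION = {t.fusion: classify_fusion(t) for t in OBSERVED}

if __name__ == "__main__":
    for fusion, label in CLASSIFICATION.items():
        print(f"{fusion}: {label}")
```

A real analysis would probably need a tolerance for background counts rather than a strict comparison with zero; the strict rule is used here only because it reproduces the grouping of the two tables.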
STfusion and C-scores applied to clinical tissue samples
Spatial transcriptomics data published by Berglund et al. [19] were used to localise the cis-SAGe SLC45A3-ELK4. In this study, 12 tissue sections taken from a patient with prostate cancer were analysed; each section harboured epithelial areas annotated as healthy, inflamed, prostatic intraepithelial neoplasia (PIN, a precancerous lesion), cancerous with a Gleason score (Gs) of 3 + 3, or cancerous with a Gs of 3 + 4. The Gs is a grading system used to classify the aggressiveness of prostate cancer; grades range from 1 (appears healthy) to 5 (appears abnormal). The total Gs combines two grades, one for the dominant and one for the minor area [37]. The tissues harbour the cis-SAGe SLC45A3-ELK4 (Tables S15-S17), which contributes to cell proliferation in prostate cancer [32]. Two fusion variants were identified in the bulk RNA-sequenced tissue sections: SLC45A3-ELK4 exon 4-exon 2 and SLC45A3-ELK4 exon 5-exon 2 (Figure S1, Table S18).
Fusion transcript localisation using STfusion and C-scores
The C-score measures the fold change in the number of poly(A) tails on a parental gene compared to the sample mean expression of that gene. A high C-score indicates that the occurrence of the cis-SAGe SLC45A3-ELK4 is likely. This difference is caused by the chimeric mRNA and by elevated expression of the 3´ gene ELK4, which is defined by the promoter of the 5´ UTR of the 5´ gene SLC45A3. A low C-score, in contrast, represents a large number of poly(A) tails for the 5´ gene SLC45A3 compared to the sample mean SLC45A3 expression, which indicates that the occurrence of the cis-SAGe is very unlikely. The C-score thus mirrors the likelihood of cis-SAGe absence or occurrence in a spot as well as the expression levels of SLC45A3 or the cis-SAGe SLC45A3-ELK4, respectively.
The spatial distribution of the C-scores per sample was compared to the activity maps of the transcriptomic factors identified in the clinical tissue samples analysed with spatial transcriptomics (Figures 2 and S2). The predicted occurrence of the cis-SAGe SLC45A3-ELK4 in the 12 tissue sections was dominant in the centre or the periphery of diseased areas; the predicted absence of the cis-SAGe was dominant in normal glands.
For the joint STD analysis of the three cancerous samples shown in Figure 2, the activity of the transcriptomic factor “PIN” overlapped with the area of the predicted cis-SAGe in sample 1.2. The areas were almost identical in size and form. In sample 3.3, the cis-SAGe occurred only occasionally in the periphery of the PIN area. For the transcriptomic factor “Cancer”, the cis-SAGe occurred intensely at its activity centre in samples 2.4 and 3.3, and at its periphery in all three samples. No spots with higher SLC45A3 expression, and thus with predicted absence of the cis-SAGe, were found in the areas marked as cancerous. Normal glands were dominated by absent cis-SAGe. However, a few spots with high cis-SAGe intensity occurred elsewhere, often in the direct vicinity of spots with high SLC45A3 expression, which indicates cis-SAGe absence.
The joint STD analysis of the 12 tissue samples identified a cancerous area in sample 3.2 (Figures 2 and S2), which overlapped with the areas of the predicted cis-SAGe.
Figure 2. Factor activity maps compared to cis-SAGe presence. Activity maps of the transcriptomic factors “Normal glands”, “PIN glands”, “Inflammation”, and “Cancer” compared to the predicted occurrence of SLC45A3-ELK4. The transcriptomic factors were from a joint analysis of the cancer samples 1.2, 2.4 and 3.3, and from the joint 12-sample analysis for sample 3.2. In the tissue sections, normal glands were dominated by cis-SAGe absence. Further, there was a clear overlap between the activity of the “PIN glands” factor (red-orange squares) and the occurrence of SLC45A3-ELK4 (red dots) in sample 1.2. There was an obvious overlap with the centre of the “Cancer” factor in samples 2.4 and 3.3.
To statistically test the coherence of disease areas and cis-SAGe occurrence, Spearman and Pearson correlation coefficients (ρ) were calculated (Tables 3 and S19). The strongest correlations of cis-SAGe occurrence were with the PIN area in sample 1.2 (ρ Pearson = 0.31, p = 1.11E-10) and with the cancerous areas in sample 2.4 (ρ Pearson = 0.25, p = 1.07E-07), sample 3.3 (ρ Pearson = 0.14, p = 1.55E-03), and sample 3.2 (ρ Pearson = 0.12, p = 4.32E-03).
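The per-sample correlation between C-scores and factor activities can be illustrated with a short, dependency-free Python sketch. It is not the code used for Table 3: the function names are hypothetical, the toy vectors are invented for illustration, and p-values are omitted because they require a t-distribution (in practice, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` return both the coefficient and the p-value). The helper `t_statistic` only shows how a coefficient and the number of spots combine into the test statistic.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def t_statistic(r, n):
    """Test statistic for a correlation r over n spots (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Invented toy values: C-score and "PIN glands" activity for six spots.
c_scores = [1.4, 0.9, 0.2, -0.3, -0.8, -1.2]
pin_activity = [0.70, 0.55, 0.30, 0.10, 0.12, 0.02]

if __name__ == "__main__":
    print("Pearson:", round(pearson(c_scores, pin_activity), 3))
    print("Spearman:", round(spearman(c_scores, pin_activity), 3))
    # Pearson coefficient and spot count reported for PIN glands in sample 1.2.
    print("t:", round(t_statistic(0.31, 406), 2))
```

Comparing both coefficients is informative because Pearson reacts to a few spots with extreme values whereas Spearman depends only on rank order, which may explain why the two can diverge for the same sample and factor.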
The areas with active transcriptomic factors (“Normal glands”, “PIN glands”, “Inflammation”, and “Cancer”) were further analysed with respect to the share of spots with predicted presence or absence of the fusion transcript (Figure 3). In normal glands, the cis-SAGe was predominantly absent. In the cancerous areas of samples 1.2 and 3.3, the share of spots with mild occurrence (0 < C-score < 1) was increased compared to the other factors, whereas in the PIN areas of the same samples, the share of spots with strong occurrence (C-score > 1) was increased.
Figure 3. Fraction of spots with fusion transcript occurrence (C-score > 0) and absence (C-score ≤ 0) for the factors shown in Figure 2. The factor activity threshold was set to 20%. The C-score thresholds separating mild from strong occurrence and mild from strong absence were set to 1 and -1, respectively.
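The binning behind Figure 3 can be sketched as follows. This is an illustrative Python example under stated assumptions, not the original analysis code: the spot records, the function names, and the handling of C-scores that fall exactly on a threshold (1 is counted as mild occurrence, -1 as mild absence) are assumptions, and the example values are invented.

```python
from collections import Counter

FACTOR_ACTIVITY_THRESHOLD = 0.20  # 20% factor activity, as stated for Figure 3
CATEGORIES = ("strong absence", "mild absence", "mild occurrence", "strong occurrence")


def c_score_category(c_score):
    """Map a C-score to one of four bins (boundary handling is an assumption)."""
    if c_score > 1:
        return "strong occurrence"
    if c_score > 0:
        return "mild occurrence"
    if c_score >= -1:
        return "mild absence"
    return "strong absence"


def category_fractions(spots, factor, threshold=FACTOR_ACTIVITY_THRESHOLD):
    """Fraction of spots per C-score bin among spots where `factor` is active."""
    active = [s for s in spots if s["activity"].get(factor, 0.0) >= threshold]
    if not active:
        return {}
    counts = Counter(c_score_category(s["c_score"]) for s in active)
    return {category: counts[category] / len(active) for category in CATEGORIES}


# Invented toy spots: C-score plus per-factor activity (0 to 1).
spots = [
    {"c_score": 1.8, "activity": {"PIN glands": 0.60, "Cancer": 0.05}},
    {"c_score": 0.4, "activity": {"PIN glands": 0.30, "Cancer": 0.40}},
    {"c_score": -0.2, "activity": {"PIN glands": 0.25, "Cancer": 0.10}},
    {"c_score": -1.5, "activity": {"PIN glands": 0.05, "Cancer": 0.02}},
]

pin_fractions = category_fractions(spots, "PIN glands")

if __name__ == "__main__":
    for category, fraction in pin_fractions.items():
        print(f"{category}: {fraction:.2f}")
```

Restricting the denominator to spots above the activity threshold keeps the fractions comparable between factors that cover areas of very different size.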
Table 3. Sample-wide correlation of C-score and factor activities per sample shown in Figure 2.
Sample | Factor | # spots | ρ Spearman | ρ Pearson | p-value for ρ Spearman | p-value for ρ Pearson
Sample 1.2 | Normal glands | 406 | -0.07 | -0.09 | 1.37E-01 | 6.41E-02
Sample 1.2 | PIN glands | 406 | 0.08 | 0.31 | 1.05E-01 | 1.11E-10
Sample 1.2 | Cancer | 406 | -0.02 | -0.01 | 6.87E-01 | 7.76E-01
Sample 2.4 | Normal glands | 451 | -0.10 | -0.13 | 4.19E-02 | 6.20E-03
Sample 2.4 | PIN glands | 451 | 0.14 | 0.02 | 3.15E-03 | 6.21E-01
Sample 2.4 | Cancer | 452 | 0.26 | 0.25 | 3.74E-08 | 1.07E-07
Sample 3.3 | Normal glands | 500 | -0.02 | -0.04 | 6.51E-01 | 3.53E-01
Sample 3.3 | PIN glands | 500 | -0.03 | 0.00 | 4.91E-01 | 9.59E-01
Sample 3.3 | Cancer | 500 | 0.20 | 0.14 | 4.16E-06 | 1.55E-03
Sample 3.2 | Inflammation | 560 | 0.17 | 0.11 | 1.32E-04 | 1.50E-02
Sample 3.2 | Normal & PIN | 560 | -0.17 | -0.04 | 8.28E-05 | 3.86E-01
Sample 3.2 | Cancer | 560 | 0.16 | 0.12 | 2.61E-04 | 4.32E-03
Differential expression and pathway annotation for cis-SAGe occurrence
The combination of spatial transcriptomics data and STfusion using C-scores offers new possibilities to explore fusion transcript occurrence, differences in cis-SAGe transcription levels and their spatial relation in clinical tissue samples.
Areas with absent and present SLC45A3-ELK4 fusion transcripts were compared with regard to differentially expressed and co-expressed genes and enriched pathways (Figure 4). In areas without fusion transcripts, pathways related to higher transcriptional stress were activated (protein processing in the endoplasmic reticulum and lysosome). The pathways focal adhesion and regulation of actin cytoskeleton were highly active in the areas with cis-SAGe occurrence and are known to play a crucial role in cancer cell motility and invasion [38,39]. Phosphatidylinositol-3-kinase (PI3K)-AKT signalling is linked to treatment resistance [40,41].
Figure 4. Differential expression and pathway annotation of cis-SAGe occurrence. Areas with absent and present fusion transcripts in sample 3.3 were compared. Besides normal glands, the sample harboured an area annotated as PIN and a large area annotated as cancerous, some parts of which were annotated as aggressively cancerous (Gs 3 + 4). (A) Significantly differentially expressed genes (FDR, q < 0.1) are shown. (B) Significantly differentially expressed genes were submitted to PathwAX using the KEGG database. Enriched pathways are presented.
To summarise the results, the proposed method, STfusion, identified fusion transcripts caused by a cis-SAGe mechanism or chromosomal rearrangement. It also distinguished these fusion transcripts from those caused by a trans-splicing event. Applying STfusion and the C-score to clinical tissue samples analysed with spatial transcriptomics demonstrated the spatial distribution of the fusion transcripts within the tissue section. Further, the fusion transcript was linked to the disease areas (Inflammation, PIN, and cancer).