Fusion transcript detection using spatial transcriptomics

doi:10.21203/rs.2.19314/v1

Technical advance

This work is licensed under a CC BY 4.0 License

Journal Publication

Published 04 Aug, 2020 in BMC Medical Genomics

You are reading this older preprint version

Background: Fusion transcripts are involved in tumourigenesis and play a crucial role in tumour heterogeneity, tumour evolution and cancer treatment resistance. However, fusion transcripts have not been studied at high spatial resolution in tissue sections due to the lack of full-length transcripts with spatial information. New high-throughput technologies such as spatial transcriptomics measure the transcriptome of tissue sections at an almost single-cell level. While this technique does not allow for direct detection of fusion transcripts, we show that they can be inferred using the relative poly(A) tail abundance of the involved parental genes.

Method: We present a new method STfusion, which uses spatial transcriptomics to infer the presence and absence of poly(A) tails. A fusion transcript lacks a poly(A) tail for the 5´ gene and has an elevated number of poly(A) tails for the 3´ gene. Its expression level is defined by the upstream promoter of the 5´ gene. STfusion measures the difference between the observed and expected number of poly(A) tails with a novel C-score.

Results: We verified the ability of STfusion to predict fusion transcripts on HeLa cells with known fusions. STfusion and the C-score applied to clinical prostate cancer data revealed the spatial distribution of the cis-SAGe SLC45A3-ELK4 in 12 tissue sections with almost single-cell resolution. The cis-SAGe occurred in the centre or periphery of inflamed, prostatic intraepithelial neoplastic, or cancerous areas, and occasionally in normal glands.

Conclusions: STfusion detects fusion transcripts in cancer cell lines and clinical data, and distinguishes chimeric transcripts from chimeras caused by trans-splicing events. With STfusion and the use of C-scores, fusion transcripts can be localised in clinical tissue sections at an almost single-cell level.

Epigenetics & Genomics

Fusion transcript detection

Spatial Transcriptomics

gene fusion

cis-SAGe

oncogene

A fusion transcript is a merger of fragments from different transcripts. This molecule is translated into a chimeric protein that possesses either a new or an adapted function. Many fusion transcripts are classified as oncogenes, which have the potential to cause cancer, and also contribute to tumourigenesis as driver mutations. Mitelman et al. [1] estimated that 20% of cancer morbidity is caused by such fusions. Currently, 21,477 fusion transcripts have been identified; almost all can be found in neoplastic cells [2]. The majority of chimeric transcripts were detected in recent years by deep-sequencing technologies and the application of bioinformatics tools [2]. The occurrence of fusion transcripts is used as a cancer biomarker [3] as well as a cancer treatment target [4].

Fusion transcripts, also termed chimeric transcripts or chimeric RNA, are usually detected at the RNA level. If the underlying cause is known, a fusion transcript is further specified as either a genetic chimera (i.e., a gene fusion) or a transcription-induced chimera. Gene fusions are caused by a genetic mutation (a deletion, inversion or translocation event) of the DNA sequence of both parental genes. The mutated parental gene DNA sequences are transcribed into a hybrid messenger RNA (mRNA). Transcription-induced chimeras, in contrast, are caused by an abnormal mechanism, either cis-splicing or trans-splicing [5], and the parental gene sequences remain intact.

Cis-splicing fusion transcripts are termed cis-splicing of adjacent genes (cis-SAGe) or read-through transcripts. cis-SAGe result from neglected gene boundaries: the DNA sequences of two adjacent genes are read and transcribed into a single hybrid mRNA transcript. cis-SAGe have not been identified exclusively in cancer cells. Despite intense research on clinical tissues and cancer cell lines that harbour a cis-SAGe, no convincing genetic mutation has been found behind their production, and thus a molecular mechanism is suspected. Chimeric transcription of cis-SAGe requires two genes on the same strand within 30 kilobase pairs (kbp) [6]. Chwalenia et al. [7] summarised features of cis-splicing chimeras: (i) active transcription of the 5´ gene, (ii) multi-exonic neighbouring parental genes, (iii) absence of interstitial DNA deletion, (iv) presence of transcripts between the two neighbouring genes, (v) presence of CTCF binding sites between parental genes and (vi) induction by CTCF knockdown. They further suggest that cis-SAGe are an additional element of biological processes that increases the diversity of gene products. This supposition is consistent with the fact that cis-SAGe are also found in healthy tissue samples. Different cis-SAGe fusion variants, which are characterised by different fusion points of the parental genes, have been observed. Among the cis-SAGe detected so far, the dominant fusion point is at the second exon of the 3´ gene [7].

Transcription-induced fusion transcripts that result from trans-splicing are hybrids of two mRNA transcripts of separately transcribed genes. These molecules are rare, but if they occur, they can contribute to neoplastic transformation due to their pro-proliferative effects [8].

The poly(A) tail is a 100–250 nt sequence of adenine nucleotides. The poly(A) tail is not part of the DNA sequence; it is attached to the 3´ end of the transcript post-transcriptionally. A poly(A) signal in the 3´ untranslated region (UTR) defines the point of poly(A) tail synthesis. Poly(A) polymerases, which are RNA polymerases, synthesise the poly(A) tail and attach it to the pre-mRNA. Cleavage and polyadenylation specificity factor (CPSF) is one representative protein that recognises the poly(A) signal in the pre-mRNA sequence (in eukaryotes, often AAUAAA). Although the function of the poly(A) tail is not completely known, it increases mRNA stability during export from the nucleus to the ribosome, protects the mRNA from degradation and regulates its half-life. The poly(A) tail shortens with mRNA age. Further, the poly(A) tail, together with its counterpart at the 5´ end (the 5´ cap), initiates protein translation [9]. A recent study by Park et al. [10] investigated its regulatory role in the somatic cell cycle. Cell cycle dysregulation is a hallmark of cancer.

Chimeric RNA can be detected experimentally (e.g. fluorescence in situ hybridisation [FISH] or Southern blot) or computationally. The latter is based on RNA-sequencing (RNA-seq) data and the application of software tools that identify the chimeric RNA. The tools search for encompassing reads, i.e., read pairs with each read aligned to one of the parental genes, and spanning reads, i.e., partially aligned reads that span the fusion point. Sensitivity and specificity depend on the tool as well as read length, read quality score and the number of reads that support each fusion transcript [11, 12]. The results are often compared to known fusion transcripts and then categorised into potential gene fusions, cis-SAGe or trans-splicing fusion transcripts. This search strategy, however, provides no information about the transcription direction and is limited in terms of spatial information.
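The two kinds of supporting reads described above can be illustrated with a minimal sketch. It is not how any particular fusion-detection tool works; the names `Gene`, `genes_hit` and `classify_pair` are hypothetical, each read is assumed to be given as a list of aligned (start, end) segments, and both parental genes are assumed to lie on the same chromosome with toy coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Toy gene model: a name and an inclusive coordinate interval."""
    name: str
    start: int
    end: int


def genes_hit(segments, genes):
    """Return the names of the genes overlapped by any aligned segment of a read."""
    return {
        g.name
        for g in genes
        for (seg_start, seg_end) in segments
        if seg_start <= g.end and seg_end >= g.start
    }


def classify_pair(read1, read2, gene5, gene3):
    """Classify a read pair as 'spanning', 'encompassing' or 'other'.

    spanning:     one read is split and aligns partly to each parental gene,
                  i.e. it crosses the fusion point.
    encompassing: each read aligns to a different parental gene.
    """
    genes = (gene5, gene3)
    both = {gene5.name, gene3.name}
    hits1, hits2 = genes_hit(read1, genes), genes_hit(read2, genes)
    if hits1 == both or hits2 == both:
        return "spanning"
    if hits1 and hits2 and (hits1 | hits2) == both:
        return "encompassing"
    return "other"


# Toy coordinates, not real genome positions.
gene5 = Gene("SLC45A3", 100, 200)
gene3 = Gene("ELK4", 300, 400)
encompassing_example = classify_pair([(150, 180)], [(320, 350)], gene5, gene3)
spanning_example = classify_pair([(180, 200), (300, 320)], [(350, 380)], gene5, gene3)
```

Neither category carries spatial information, which is the limitation that motivates the approach presented in this study.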

In this study, we present a novel method to detect fused mRNA molecules using the poly(A) tail presence at the 3´ gene and its absence at the 5´ gene. Applying this method, and the C-score, to clinical tissue sections analysed with spatial transcriptomics allowed the detection of the cis-SAGe SLC45A3-ELK4 at an almost single-cell level. The cis-SAGe clearly overlapped with diseased areas within the tissue samples. Further, we emphasise that increased cis-SAGe correlates with elevated levels of transcriptional stress.

Fusion transcript detection using STfusion and the poly(A) tail presence

During the correct transcription of two adjacent genes, the poly(A) tail is attached to each of the genes. The poly(A) tail serves as a proxy for the transcription level of each gene.

In the case of a fusion transcript caused by a cis-SAGe mechanism or chromosomal rearrangement, however, the poly(A) tail is absent from the 5´ gene. In this case, the 5´ gene expression level defines the fusion transcript expression. The poly(A) tail is attached to the 3´ end of the parental 3´ gene. In the proposed method, the number of sequenced poly(A) tails that can be mapped to the 3´ gene, together with the absence of poly(A) tails at the 5´ gene, is used to indicate a fusion transcript. The number of poly(A) tails aligned to the 3´ gene mirrors the expression level of the fusion transcript (Figure 1).

STfusion verification in HeLa cells

In order to verify STfusion, sequenced mRNA from HeLa cancer cells was analysed. The occurrence of poly(A) tails aligned to the parental genes of experimentally confirmed fusion transcripts was tracked (Tables 1, 2, S1-S11).

The sequenced mRNA data produced by TAIL-seq [13], and the results published in the same paper, were used. This technique sequences the 3´ end of the mRNA and thus includes the poly(A) tail sequence. Chang et al. [13] applied TAIL-seq to HeLa cells and found that 4,000 genes have a poly(A) tail.

Additionally, for parental genes of the known HeLa fusion transcripts that had no poly(A) tails according to the results by Chang et al. [13], the raw RNA-seq data from the same publication were aligned and analysed in depth. Only reads that contain a poly(A) tail sequence were of interest; these are found in the file that contains the second read (read 2, read length 230 bp). First, the potential poly(A) tail sequence nucleotides were removed. Second, the reads were further trimmed by 64 nucleotides (nt) to remove additional adapter sequences. The trimming was performed with Cutadapt [14]. The resulting reads were aligned with Tophat2 [15] and STAR [16] to the human assembly GRCh38 Ensembl (release 84) [17]. Uniquely mapped reads (Tophat2 mapping quality ≥ 10, STAR mapping quality = 255) were kept. Finally, reads that were aligned to a reference sequence with multiple adenine or thymine nucleotides in the direct vicinity were removed, because in that case the removed sequence, assumed to be a poly(A) tail, is actually part of the genome. All reads that mapped to the 3´ UTR of the parental genes were considered (Table S12).
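The last two filtering steps can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the function names are hypothetical, the flanking reference sequence is assumed to have been extracted beforehand, and the run-length cut-off `min_run=6` is an assumed value because the text does not state how many adenine or thymine nucleotides were considered "multiple".

```python
def is_uniquely_mapped(aligner, mapq):
    """Apply the mapping-quality thresholds quoted in the text."""
    if aligner == "tophat2":
        return mapq >= 10
    if aligner == "star":
        return mapq == 255
    raise ValueError(f"unknown aligner: {aligner}")


def longest_at_run(sequence):
    """Length of the longest homopolymer run of A or of T in a sequence."""
    best = run = 0
    previous = None
    for base in sequence.upper():
        if base in "AT":
            run = run + 1 if base == previous else 1
        else:
            run = 0
        previous = base
        best = max(best, run)
    return best


def keep_read(aligner, mapq, reference_flank, min_run=6):
    """Keep uniquely mapped reads whose genomic flank is not A/T-rich.

    min_run is an ASSUMED cut-off; a long genomic A or T run next to the
    alignment suggests the trimmed 'tail' was templated by the genome.
    """
    if not is_uniquely_mapped(aligner, mapq):
        return False
    return longest_at_run(reference_flank) < min_run
```

For example, `keep_read("star", 255, "GCGTACGT")` keeps the read, while `keep_read("tophat2", 50, "GGAAAAAAAACC")` discards it because the flank contains a genomic run of eight adenines.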

STfusion applied to clinical tissue samples

Spatial transcriptomics data, transcriptomic factors and activity maps

Spatial transcriptomics [18] (The Spatial Transcriptomics method, Suppl.), a novel technology, allows one to obtain expression levels throughout tissue samples while maintaining spatial information. Spatial transcriptomics opens new possibilities for the investigation of altered expression levels, especially under modified conditions (e.g., cancerous cells within tissue samples). In a recent publication, Berglund et al. [19] showed that the cells in the centre, periphery and vicinity of prostate cancer areas develop a unique expression pattern that is clearly differentiated from areas with healthy cells of a similar type. Thus, this technique can provide insights into cancer progression.

Spatial transcriptomics produces very rich expression-level data throughout a tissue sample. In order to identify hidden patterns of gene expression that characterise cell types, spatial transcriptomics deconvolution (STD) was developed by Maaskola et al. [20]. This method, based on negative binomial regression, reveals unique expression profiles across tissue sections that represent different cell types, microenvironments or tissue components. For each identified expression profile, i.e., transcriptomic factor, the method provides a spatial activity map that represents where the transcriptomic factor is active in the tissue sample. For example, the transcriptomic factor that represents “cancerous epithelial” cells exhibits a unique expression pattern that reveals genes strongly or differentially expressed compared to another transcriptomic factor. The activity map for the transcriptomic factor “cancerous epithelial” shows where the expression pattern is active within a tissue sample.

Berglund et al. [19] applied spatial transcriptomics to 12 tissue sections obtained from a patient diagnosed with prostate cancer. The spatial transcriptomics data comprised the expression levels of 5,053 protein-coding genes in 1,007 spots in each of the tissue sections. Further, the 12 tissue sections were analysed with STD in three different joined approaches: (i) cancerous samples 1.2, 2.4 and 3.3, (ii) samples 3.1 and 4.2 with inflamed and early-state cancerous areas, and (iii) the 12 tissue sections joined. The spatial transcriptomics data, STD transcriptomic factors and activity maps were used in this paper to localise the cis-SAGe SLC45A3-ELK4, link it to diseased areas, calculate differentially expressed genes and perform pathway annotation (Figures 2-4, S2 and S3).

Fusion transcript localisation using STfusion and C-scores

In spatial transcriptomics, the poly(A) tail of a transcript is captured and measured as a proxy for the expression level of a gene in a tissue sample at an almost single-cell level. However, for a gene that is abnormally transcribed, as is the case for a fusion transcript, the number of poly(A) tails deviates from the expected value. This deviation is measured.

For each parental gene, the ratio (R) of the gene expression in each spot divided by the sample mean expression was calculated. To avoid division by 0, a pseudo-count of 1 was added to both dividend and divisor. The C-score of a spot is the maximum value of both ratios and indicates the presence or absence of the fusion transcript. In the case of absence, the C-score level mirrors the 5´ gene expression. In the case of a fusion transcript, the C-score level mirrors the fusion transcript expression level.

See Formulas 1 and 2 in Supplementary Files.
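Because Formulas 1 and 2 are only available in the Supplementary Files, the following is a hedged sketch of the calculation as described in the text, not a reproduction of the authors' formulas. The function names are hypothetical. In particular, the sign convention (a positive score when the 3´ ratio dominates, a negative score when the 5´ ratio dominates) is an assumption, inferred from the later use of both MSD and -MSD as thresholds.

```python
def ratio(spot_count, sample_mean):
    """Ratio R: spot expression over sample mean, with a pseudo-count of 1."""
    return (spot_count + 1.0) / (sample_mean + 1.0)


def c_scores(counts_5p, counts_3p):
    """Per-spot C-score from the counts of the 5' and 3' parental genes.

    ASSUMPTION: the larger of the two ratios is reported, signed positive
    when it belongs to the 3' gene (fusion likely) and negative when it
    belongs to the 5' gene (fusion unlikely).
    """
    mean_5p = sum(counts_5p) / len(counts_5p)
    mean_3p = sum(counts_3p) / len(counts_3p)
    scores = []
    for c5, c3 in zip(counts_5p, counts_3p):
        r5, r3 = ratio(c5, mean_5p), ratio(c3, mean_3p)
        scores.append(r3 if r3 >= r5 else -r5)
    return scores


# Toy counts for four spots (hypothetical values).
slc45a3_counts = [0, 4, 0, 0]   # 5' gene
elk4_counts = [3, 0, 0, 1]      # 3' gene
example_scores = c_scores(slc45a3_counts, elk4_counts)
# example_scores == [2.0, -2.5, 0.5, 1.0]
```

In this toy example both sample means are 1, so the first spot (no SLC45A3 tails, three ELK4 tails) receives the highest positive score, while the second spot (four SLC45A3 tails, no ELK4 tails) receives a negative one.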

The proposed poly(A) tail detection method, STfusion using C-scores, was applied to clinical tissue samples analysed with spatial transcriptomics. The level of the C-score mirrors an abnormally high amount of poly(A) tails on the 3´ gene ELK4 and predicts the cis-SAGe SLC45A3-ELK4 (Figures 2, 3 and S2).

The C-score distribution contained peaks between 0.35 and 0.8 (Figure S4) caused by a parental gene expression of 0 in a spot and a parental gene mean expression in a sample below 1. To avoid a bias towards these low C-scores, the minimal detectable signal (MSD) threshold was set to the C-score frequency maximum. For spots with C-scores below the MSD threshold, the C-score was set to 0.
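One possible reading of this thresholding step is sketched below. It is an assumption-laden illustration: the function names are hypothetical, the frequency maximum is taken as the most common absolute C-score after rounding to two decimals (the rounding precision is an assumed parameter), and scores whose magnitude does not exceed the MSD are set to 0.

```python
from collections import Counter


def msd_threshold(scores, decimals=2):
    """Most frequent absolute C-score after rounding (ASSUMED definition).

    Ties are resolved toward the smaller value so the result is deterministic.
    """
    counts = Counter(round(abs(s), decimals) for s in scores)
    return min(counts, key=lambda value: (-counts[value], value))


def apply_msd(scores, msd):
    """Set C-scores whose magnitude does not exceed the MSD to 0."""
    return [0.0 if abs(s) <= msd else s for s in scores]


# Toy scores (hypothetical); 0.5 arises when a spot count is 0 and the mean is 1.
toy_scores = [0.5, 0.5, 0.5, -0.5, 2.0, -2.5, 1.0, 0.8]
toy_msd = msd_threshold(toy_scores)          # 0.5
toy_filtered = apply_msd(toy_scores, toy_msd)
```

With these toy values the four low scores are zeroed and only the spots with a clear signal in either direction remain.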

Differentially expressed genes and pathway annotation

Spots with fusion transcript presence and absence were compared to investigate differentially expressed and co-expressed genes and activated pathways (Figures 4 and S3). Spots were chosen solely according to their C-score, i.e., the likelihood and expression level of the cis-SAGe, regardless of their annotation as stroma or epithelium.

To assign a spot to the group ‘occurrence’ or ‘absence’, C-score thresholds were applied:

See Formula 3 in Supplementary Files.

The spatial transcriptomics data, with read counts for the 5,053 protein-coding genes across the spots, were checked for quality. Spots with a log-library size more than three median absolute deviations below the median log-library size were removed. Low-abundance genes with a read count of zero or close to zero among the spots were removed. The resulting data set was normalised per tissue sample using the R package “scater” [21]. The optimal pool size was calculated with the R package “scran” [22]. Genes with a very low standard deviation (sd) for the normalised expression levels among the chosen spots (sd < 10% of the expression mean) were removed.
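The spot-level filter can be sketched in a few lines. The study used R packages for this step; the Python version below is only an illustration with hypothetical function names. It assumes all library sizes are positive and scales the median absolute deviation by 1.4826, the default constant of R's `mad()` function; whether the authors used this scaling is not stated.

```python
import math
import statistics


def low_library_size_mask(library_sizes, n_mads=3.0, scale=1.4826):
    """Return a keep-mask: False for spots whose log-library size lies more
    than n_mads scaled MADs below the median log-library size.

    ASSUMPTIONS: library sizes are > 0; `scale` mirrors R's default mad().
    """
    logs = [math.log(size) for size in library_sizes]
    median_log = statistics.median(logs)
    mad = scale * statistics.median([abs(x - median_log) for x in logs])
    lower_bound = median_log - n_mads * mad
    return [x >= lower_bound for x in logs]


# Toy library sizes (hypothetical): the last spot is a clear low outlier.
toy_sizes = [1000, 1100, 900, 1050, 950, 10]
toy_keep = low_library_size_mask(toy_sizes)
```

Only the lower tail is filtered, matching the description above; spots with unusually large libraries are retained.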

The fold change per gene was calculated as the gene expression mean of spots with C-scores > MSD (occurrence) divided by the gene expression mean of spots with C-scores < -MSD (absence). Differentially expressed genes were calculated with a two-sample t-test (confidence level 0.95) [23]. P-values were corrected for multiple testing with the Benjamini-Hochberg procedure [24]. Significantly differentially expressed genes (false discovery rate [FDR], q-value < 0.1) were submitted to PathwAX [25] on the Kyoto Encyclopaedia of Genes and Genomes (KEGG) database [26].
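The fold change and the multiple-testing correction can be illustrated with a short sketch. The function names are hypothetical, the t-test itself is not reimplemented (p-values are assumed to come from a separate two-sample t-test), and the sketch assumes that both spot groups are non-empty and that the mean expression in the absence group is non-zero.

```python
def fold_change(expression, scores, msd):
    """Mean expression in 'occurrence' spots (C-score > MSD) divided by the
    mean expression in 'absence' spots (C-score < -MSD) for one gene."""
    occurrence = [e for e, s in zip(expression, scores) if s > msd]
    absence = [e for e, s in zip(expression, scores) if s < -msd]
    return (sum(occurrence) / len(occurrence)) / (sum(absence) / len(absence))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        qvalues[index] = running_min
    return qvalues


# Toy data (hypothetical values).
toy_fc = fold_change([4, 6, 1, 1, 9], [2.0, 1.5, -2.0, -3.0, 0.0], msd=0.5)
toy_q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
significant = [q < 0.1 for q in toy_q]
```

In the toy example the fold change is 5.0, and three of the four hypothetical genes pass the q-value < 0.1 cut-off used in the text.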

Fusion transcript detection in bulk sequenced RNA from prostate cancer tissue samples

Bulk-sequenced mRNA from each of the 12 tissue sections was used to confirm the cis-SAGe SLC45A3-ELK4. The sequenced reads were aligned using two aligners to increase the possibility of identifying the cis-SAGe. The alignments of fastq reads were performed using Tophat2 (b2-sensitive and otherwise default parameters) [15] and STAR [16], both against the human assembly GRCh38 Ensembl (release 84) [17]. Conversion from sam to bam format and indexing were done using Samtools [27].

Fusioncatcher [28] using Blat [29], STAR and Bowtie2 [30] was applied to the aligned RNA-seq data to confirm the cis-SAGe. Additionally, the alignments were searched for encompassing reads, i.e., read pairs with each of the reads mapped to one of the parental genes, and for spanning reads that cover the fusion points identified with Fusioncatcher. The search was performed using Samtools (Tables S15 and S16).

STfusion verification in HeLa cells

To verify the accuracy of STfusion, we applied it to HeLa cancer cells. P. Wu et al. [31] experimentally identified nine chimeric RNAs in HeLa cells. Also detected in HeLa cancer cells were the cis-SAGe SLC45A3-ELK4 by Zhang et al. [32] and the trans-splicing fusion event VMP1-RPS6KB1 by L. Wu et al. [33]. For these 11 fusion transcripts, the number of poly(A) tails per parental gene was considered. If the concept is correct and a chimeric transcript caused by a cis-SAGe mechanism or a chromosomal rearrangement is transcribed, the 5´ genes should not have a poly(A) tail, but the 3´ genes will have one.

STfusion verification was performed using the sequenced mRNA and the number of poly(A) tails produced by TAIL-seq, and the published results of counted poly(A) tails per gene [13] (Tables 1 and 2).

STfusion verification for gene fusions and cis-SAGe

LHX6-NDUFA8, SLC2A11-MIF, and SLC45A3-ELK4 are confirmed cis-SAGe events in HeLa [31,32]. The parental genes SLC45A3 and ELK4 were not listed as having poly(A) tails in Chang et al. [13]. With an in-depth search in the sequenced mRNA published in the same paper, five poly(A) tails attached to the 3´ genes were identified (Tables 1 and S12). However, TXNDC9-LYG1 did not seem to follow the proposed hypothesis. An inversion on Chr2:87-111 megabase pairs (Mbp) was identified by Breakdancer [34] (Table S14) and experimentally confirmed by Landry et al. [35]. Both parental genes are located within this region.

The results shown in Table 1 confirm our assumption that a fusion transcript caused by a cis-SAGe mechanism or a chromosomal rearrangement lacks a poly(A) tail at the 5´ gene and instead has an elevated number of poly(A) tails attached to the 3´ gene.

Distinction among fusion transcripts caused by trans-splicing

In the case of a fusion transcript caused by trans-splicing, both parental genes were transcribed and poly-adenylated. Poly(A) tails for both parental genes were observed (Table 2).

The transcription-induced chimera VMP1-RPS6KB1 is assumed to occur via trans-splicing [33] in HeLa cells. Indeed, this event was confirmed with STfusion. The fusions TINF2-NEDD8 and DHRS13-FLOT2 were experimentally confirmed [31], but both parental genes were polyadenylated. These data suggest that these fusions are caused by a trans-splicing event. The latter fusion transcript, DHRS13-FLOT2, is suggested to be transcription-induced [36], because no genetic cause could be identified.
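The decision logic used in this verification can be summarised in a small sketch. It applies only to fusion transcripts that are already experimentally confirmed, since two polyadenylated genes on their own do not indicate any fusion. The function name, the returned labels and the example counts are hypothetical.

```python
def classify_confirmed_fusion(tails_5p, tails_3p):
    """Interpret poly(A) tail counts of the parental genes of a CONFIRMED fusion.

    tails_5p / tails_3p: number of poly(A) tails mapped to the 5' and 3'
    parental gene, respectively.
    """
    if tails_5p == 0 and tails_3p > 0:
        # 5' gene lacks a tail, 3' gene carries it.
        return "cis-SAGe or chromosomal rearrangement"
    if tails_5p > 0 and tails_3p > 0:
        # Both genes are transcribed and polyadenylated separately.
        return "trans-splicing"
    return "inconclusive"


# Hypothetical counts for illustration only.
label_a = classify_confirmed_fusion(0, 5)
label_b = classify_confirmed_fusion(12, 7)
```

A pair with no tails on the 3´ gene is reported as inconclusive, because the hypothesis makes no prediction for that pattern.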

STfusion and C-scores applied to clinical tissue samples

Spatial transcriptomics data published by Berglund et al. [19] were used to localise the cis-SAGe SLC45A3-ELK4. In this study, 12 tissue sections taken from a patient with prostate cancer were analysed; each section harboured epithelial areas annotated as healthy, inflamed, prostatic intraepithelial neoplasia (PIN, a precancerous lesion), cancerous with a Gleason score (GS) 3 + 3, or cancerous with GS 3 + 4. The GS is a grading system used to classify the aggressiveness of prostate cancer; the scale ranges from 1 (appears healthy) to 5 (appears abnormal). The total GS is a combination of two scores, one each for the dominant and minor area [37]. The tissues harbour the cis-SAGe SLC45A3-ELK4 (Tables S14-S16), which contributes to cell proliferation in prostate cancer [32]. Two fusion variants were identified in the bulk RNA-sequenced tissue sections: SLC45A3-ELK4 exon 4-exon 2 and SLC45A3-ELK4 exon 5-exon 2 (Figure S1, Table S17).

Fusion transcript localisation using STfusion and C-scores

The C-score measures the fold change in the number of poly(A) tails on the parental gene compared to the parental gene sample mean expression. A higher C-score indicates that the occurrence of the cis-SAGe SLC45A3-ELK4 is likely. This difference was caused by the chimeric mRNA and elevated 3´ gene ELK4 expression, which is defined by the promoter of the 5´ UTR of the 5´ gene SLC45A3. A low C-score, however, represents a large number of poly(A) tails for the 5´ gene SLC45A3 compared to the sample mean SLC45A3 expression. This indicates that the occurrence of the cis-SAGe is very unlikely. The C-score mirrored the likelihood of cis-SAGe absence or occurrence in a spot as well as the expression levels of SLC45A3 or the cis-SAGe SLC45A3-ELK4, respectively.

The C-scores distribution per sample was compared to the activity maps of the transcriptomic factors identified in the clinical tissue samples analysed with spatial transcriptomics (Figures 2, 3 and S2). The predicted occurrence of the cis-SAGe SLC45A3-ELK4 in the 12 tissue sections was dominant in the centre or the periphery of diseased areas; the predicted absence of the cis-SAGe was dominant in normal glands.

For the joined STD analysis of the three cancerous samples shown in Figure 2, the active transcriptomic factor “PIN” overlapped with the area of the predicted cis-SAGe in sample 1.2. The areas were almost identical in size and form. In sample 3.3, the cis-SAGe occurred only occasionally in the periphery of the PIN area. Concerning the transcriptomic factor “Cancer”, the cis-SAGe occurred intensely in the centre of sample 2.4, and in the periphery of the cancerous areas in all three samples. There were no spots with higher SLC45A3 expression, and thus no predicted absence of the cis-SAGe, in the areas marked as cancerous. Normal glands were dominated by absent cis-SAGe. However, there were a few spots with higher fusion transcript occurrence, often in the direct vicinity of high SLC45A3 expression.

For the joined STD analysis of the two samples harbouring inflammation and early-stage cancer, samples 3.1 and 4.2, the cis-SAGe occurred occasionally in the periphery of the inflamed areas. In sample 3.1, the cis-SAGe occurred strongly in two spots of the early-stage cancer area. In sample 4.2, the cis-SAGe was strongly present in the periphery of the cancer area.

Differential expression and pathway annotation for cis-SAGe occurrence

The combination of spatial transcriptomics data and STfusion using C-scores offers new possibilities to explore fusion transcript occurrence, differences in cis-SAGe transcription levels and their spatial relation in clinical tissue samples.

Areas with absent and present SLC45A3-ELK4 fusion transcripts were compared with regard to differentially expressed and co-expressed genes and enriched pathways (Figure 4). In areas without fusion transcripts, the activated pathways were related to higher transcriptional stress (protein processing in the endoplasmic reticulum and lysosome). The pathways focal adhesion and regulation of actin cytoskeleton were highly active in the areas with cis-SAGe occurrence and are known to play a crucial role in cancer cell motility and invasion [38,39]. Phosphatidylinositol-3-kinase (PI3K)-AKT signalling is linked to treatment resistance [40,41].

To summarise the results, the proposed method, STfusion, identified fusion transcripts caused by a cis-SAGe mechanism or chromosomal rearrangement. It also distinguished these fusion transcripts from those caused by a trans-splicing event. Applying STfusion and the C-score to clinical tissue samples analysed with spatial transcriptomics demonstrated the spatial distribution of the fusion transcripts within the tissue section. Further, the fusion transcript was linked to the centre or periphery of diseased areas (inflammation, PIN, cancer).

We propose a novel computational method to detect fusion transcripts. It is based on poly(A) tail presence or absence. We showed that a fusion transcript that lacks the poly(A) tail at the 5´ parental gene contains one at the 3´ gene. The number of poly(A) tails attached to the 3´ gene indicates the expression level of the fusion transcript defined by the 5´ gene promoter region. The novel method was verified on chimeric transcripts in HeLa cells caused by an incorrect cis-splicing of adjacent genes (cis-SAGe) mechanism or a chromosomal rearrangement. Fusion transcripts caused by trans-splicing, in which both genes are polyadenylated, cannot be detected with the proposed method. However, our method helps to identify trans-splicing fusions among fusion transcripts identified with alternative methods.

The proposed method, STfusion, and the use of C-scores were applied to clinical tissue sections analysed with spatial transcriptomics. The tissue samples harbour areas annotated as inflammation, prostatic intraepithelial neoplasia and prostate cancer with different Gleason scores. Spatial transcriptomics, which uses the poly(A) tail as a proxy for expression levels, offers the opportunity to measure unexpected expression levels of the parental genes of a chimeric transcript at an almost single-cell level. Fusion transcripts caused by cis-SAGe of the parental genes SLC45A3 and ELK4 were confirmed in the bulk-sequenced RNA of the tissue sections. The identification of this fusion transcript in clinical tissues that harbour cancerous cells, and a spatial correlation to the diseased areas, were lacking. With the proposed method, we localised this fusion transcript in 12 tissue sections at an almost single-cell level and detected a high variance of cis-SAGe expression in healthy and diseased areas, as reported earlier [42]. We showed the spatial expansion of the fusion transcript in the clinical samples and linked the fusion transcript occurrence to areas annotated as diseased. Very high cis-SAGe expression levels were observed in the periphery or centre of the diseased areas. Occasionally, the cis-SAGe occurred in normal glands. Very high SLC45A3 expression increases the likelihood of the cis-SAGe SLC45A3-ELK4 occurrence. This observation indicates that SLC45A3-ELK4 occurrence is an early and local event in prostate cancer development. It appears to commence with higher SLC45A3 expression and to continue with high cis-SAGe expression.

Differentially expressed genes between areas with and without the fusion transcript were calculated and activated pathways inferred. The observed activated pathways in areas of cis-SAGe occurrence correlated with disease progression. Further, we observed pathways related to higher transcriptional stress during the switch from high 5´ gene expression to high cis-SAGe expression.

The cause of a cis-SAGe occurrence has not yet been identified. Genetic rearrangement can be excluded, and thus epigenetic changes are the primary focus. There is a DNA motif sequence with (CCA)_n repetitions downstream of the intra-exonic fusion point of the 5´ gene SLC45A3 that is linked to the i-Motif, a non-canonical DNA structure. In more acidic environments, the motif sequence folds reversibly into an intramolecular intercalated cytosine tetraplex and thus can serve as a molecular switch [43–45]. This switch might be involved in passing over the stop codon of the terminal exon and thus the poly(A) tail signal for the 5´ gene.

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Availability of data and materials

The spatial transcriptomics datasets analysed during the current study are available in the Spatial Research repository doi:10.1038/S41467-018-04724-5. These data sets were derived from the following public domain resources: https://www.spatialresearch.org/resources-published-datasets/10-1038-s41467-018-04724-5/.

The TAIL-seq reads of HeLa cells analysed during the current study are available in the NCBI Gene Expression Omnibus (GEO) database (accession number GSM1242325). These datasets were derived from the following public domain resources: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1242325.

Competing interests

The authors declare that they have no competing interests.

Funding

Not applicable

Authors' contributions

SF and ES conceptualised and designed the project, performed the data analysis and wrote the manuscript. All authors read and approved the final manuscript.

Acknowledgements

Not applicable

nt Nucleotide

MSD Minimal detectable signal

STD Spatial Transcriptomics deconvolution

R Ratio

sd Standard deviation

cis-SAGe cis-splicing of adjacent genes

GS Gleason score

PI3K Phosphoinositide 3-kinase

FDR False discovery rate

FOSB FBJ murine osteosarcoma viral oncogene homolog B

Mitelman F, Johansson B, Mertens F. The impact of translocations and gene fusions on cancer causation. Nat Rev Cancer [Internet]. Nature Publishing Group; 2007 [cited 2019 Apr 3];7:233–45. Available from: http://www.nature.com/articles/nrc2091
Mertens F, Johansson B, Fioretos T, Mitelman F. The emerging complexity of gene fusions in cancer. Nat Rev Cancer [Internet]. Nature Publishing Group; 2015 [cited 2018 Nov 30];15:371–81. Available from: http://www.nature.com/articles/nrc3947
Sanguedolce F, Cormio A, Brunelli M, D’Amuri A, Carrieri G, Bufo P, et al. Urine TMPRSS2: ERG Fusion Transcript as a Biomarker for Prostate Cancer: Literature Review. Clin Genitourin Cancer [Internet]. Elsevier; 2016 [cited 2019 Mar 13];14:117–21. Available from: https://www.sciencedirect.com/science/article/pii/S1558767315003316?via%3Dihub
Zhou J, Liao J, Zheng X, Shen H. Chimeric RNAs as potential biomarkers for tumor diagnosis. BMB Rep [Internet]. 2012 [cited 2019 Mar 12];45:133–40. Available from: http://koreascience.or.kr/journal/view.jsp?kj=E1MBB7&py=2012&vnc=v45n3&sp=133
Li Z, Qin F, Li H. Chimeric RNAs and their implications in cancer. Curr Opin Genet Dev. 2018;48:36–43.
Jia Y, Xie Z, Li H, Li H. Intergenically Spliced Chimeric RNAs in Cancer. Trends in Cancer [Internet]. Elsevier; 2016 [cited 2019 Feb 26];2:475–84. Available from: http://dx.doi.org/10.1016/j.trecan.2016.07.006
Chwalenia K, Facemire L, Li H. Chimeric RNAs in cancer and normal physiology. Wiley Interdiscip Rev RNA [Internet]. John Wiley & Sons, Ltd; 2017 [cited 2019 Mar 14];8:e1427. Available from: http://www.ncbi.nlm.nih.gov/pubmed/28589684
Li H, Wang J, Ma X, Sklar J. Gene fusions and RNA trans-splicing in normal and neoplastic human cells. Cell Cycle [Internet]. 2009 [cited 2019 Apr 4]; Available from: https://www.tandfonline.com/action/journalInformation?journalCode=kccy20
Guydosh NR, Green R. Translation of poly(A) tails leads to precise mRNA cleavage. RNA J [Internet]. Cold Spring Harbor Laboratory Press; 2017 [cited 2019 Mar 12];23:749–61. Available from: http://www.ncbi.nlm.nih.gov/pubmed/28193672
Park J-E, Yi H, Kim Y, Chang H, Kim VN. Regulation of Poly(A) Tail and Translation during the Somatic Cell Cycle. Mol Cell [Internet]. 2016 [cited 2019 Mar 11];62:462–71. Available from: http://dx.doi.org/10.1016/j.molcel.2016.04.007
Carrara M, Beccuti M, Cavallo F, Donatelli S, Lazzarato F, Cordero F, et al. State of art fusion-finder algorithms are suitable to detect transcription-induced chimeras in normal tissues? BMC Bioinformatics [Internet]. BioMed Central; 2013 [cited 2019 Mar 26];14:S2. Available from: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-S7-S2
Kumar S, Vo AD, Qin F, Li H. Comparative assessment of methods for the fusion transcripts detection from RNA-Seq data. Sci Rep [Internet]. Nature Publishing Group; 2016 [cited 2019 Apr 23];6:21597. Available from: http://www.nature.com/articles/srep21597
Chang H, Lim J, Ha M, Kim VN. TAIL-seq: Genome-wide Determination of Poly(A) Tail Length and 3′ End Modifications. Mol Cell [Internet]. Cell Press; 2014 [cited 2019 Mar 8];53:1044–52. Available from: https://www.sciencedirect.com/science/article/pii/S109727651400121X?via%3Dihub
Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal [Internet]. 2011 [cited 2019 Mar 14];17:10. Available from: http://journal.embnet.org/index.php/embnetjournal/article/view/200
Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol [Internet]. BioMed Central; 2013 [cited 2019 Mar 14];14:R36. Available from: http://genomebiology.biomedcentral.com/articles/10.1186/gb-2013-14-4-r36
Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics [Internet]. 2013 [cited 2019 Mar 14];29:15–21. Available from: http://www.ncbi.nlm.nih.gov/pubmed/23104886
Zerbino DR, Achuthan P, Akanni W, Amode MR, Barrell D, Bhai J, et al. Ensembl 2018. Nucleic Acids Res. 2018;
Ståhl PL, Salmén F, Vickovic S, Lundmark A, Navarro JF, Magnusson J, et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science [Internet]. 2016;353:78–82. Available from: http://www.sciencemag.org/cgi/doi/10.1126/science.aaf2403
Berglund E, Maaskola J, Schultz N, Friedrich S, Marklund M, Bergenstråhle J, et al. Spatial Maps of Prostate Cancer Transcriptomes Reveal an Unexplored Landscape of Heterogeneity. Nat Commun. 2018;
Maaskola J, Bergenstråhle L, Jurek A, Navarro JF, Lagergren J, Lundeberg J. Charting Tissue Expression Anatomy by Spatial Transcriptome Deconvolution. bioRxiv. 2018;
McCarthy DJ, Campbell KR, Lun ATL, Wills QF. Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics [Internet]. Oxford University Press; 2017 [cited 2019 Mar 14];33:btw777. Available from: https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btw777
Lun ATL, McCarthy DJ, Marioni JC. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. F1000Research [Internet]. 2016 [cited 2019 Mar 14];5:2122. Available from: https://f1000research.com/articles/5-2122/v2
Welch BL. The generalisation of student’s problems when several different population variances are involved. Biometrika [Internet]. Oxford University Press; 1947 [cited 2019 Mar 14];34:28–35. Available from: https://academic.oup.com/biomet/article-lookup/doi/10.1093/biomet/34.1-2.28
Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J R Stat Soc Ser B. 1995;57:289–300.
Ogris C, Helleday T, Sonnhammer ELL. PathwAX: a web server for network crosstalk based pathway annotation. Nucleic Acids Res. 2016;44:W105–9.
Kanehisa M, Sato Y, Furumichi M, Morishima K, Tanabe M. New approach for understanding genome variations in KEGG. Nucleic Acids Res [Internet]. Oxford University Press; 2019 [cited 2019 Mar 14];47:D590–5. Available from: https://academic.oup.com/nar/article/47/D1/D590/5128935
Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics [Internet]. Oxford University Press; 2011 [cited 2018 Aug 4];27:2987–93. Available from: https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btr509
Nicorici D, Şatalan M, Edgren H, Kangaspeska S, Murumägi A, Kallioniemi O, et al. FusionCatcher – a tool for finding somatic fusion genes in paired-end RNA-sequencing data. bioRxiv [Internet]. Cold Spring Harbor Laboratory; 2014 [cited 2019 Mar 14];011650. Available from: https://www.biorxiv.org/content/10.1101/011650v1
Kent WJ. BLAT---The BLAST-Like Alignment Tool. Genome Res [Internet]. 2002 [cited 2019 Mar 14];12:656–64. Available from: http://www.ncbi.nlm.nih.gov/pubmed/11932250
Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol [Internet]. BioMed Central; 2009 [cited 2018 Aug 4];10:R25. Available from: http://genomebiology.biomedcentral.com/articles/10.1186/gb-2009-10-3-r25
Wu P, Yang S, Singh S, Qin F, Kumar S, Wang L, et al. The Landscape and Implications of Chimeric RNAs in Cervical Cancer. EBioMedicine [Internet]. Elsevier; 2018 [cited 2019 Mar 11];37:158–67. Available from: http://www.ncbi.nlm.nih.gov/pubmed/30389505
Zhang Y, Gong M, Yuan H, Park HG, Frierson HF, Li H. Chimeric transcript generated by cis-splicing of adjacent genes regulates prostate cancer cell proliferation. Cancer Discov. 2012;2:598–607.
Wu L, Zhang X, Zhao Z, Wang L, Li B, Li G, et al. Full-length single-cell RNA-seq applied to a viral human cancer: applications to HPV expression and splicing analysis in HeLa S3 cells. Gigascience [Internet]. Oxford University Press; 2015 [cited 2019 Mar 28];4:51. Available from: https://academic.oup.com/gigascience/article-lookup/doi/10.1186/s13742-015-0091-4
Chen K, Wallis JW, McLellan MD, Larson DE, Kalicki JM, Pohl CS, et al. BreakDancer: An algorithm for high-resolution mapping of genomic structural variation. Nat Methods [Internet]. 2009 [cited 2018 Aug 16];6:677–81. Available from: https://www.nature.com/nmeth/journal/v6/n9/abs/nmeth.1363.html
Landry JJM, Pyl PT, Rausch T, Zichner T, Tekkedil MM, Stütz AM, et al. The Genomic and Transcriptomic Landscape of a HeLa Cell Line. G3 Genes|Genomes|Genetics [Internet]. G3: Genes, Genomes, Genetics; 2013 [cited 2018 Aug 4];3:1213–24. Available from: http://www.ncbi.nlm.nih.gov/pubmed/23550136
Huang R, Kumar S, Li H. Absence of Correlation between Chimeric RNA and Aging. Genes (Basel) [Internet]. Multidisciplinary Digital Publishing Institute; 2017 [cited 2019 Mar 29];8:386. Available from: http://www.mdpi.com/2073-4425/8/12/386
Epstein JI, Zelefsky MJ, Sjoberg DD, Nelson JB, Egevad L, Magi-Galluzzi C, et al. A Contemporary Prostate Cancer Grading System: A Validated Alternative to the Gleason Score. Eur Urol [Internet]. European Association of Urology; 2016;69:428–35. Available from: http://dx.doi.org/10.1016/j.eururo.2015.06.046
Devreotes P, Horwitz AR. Signaling networks that regulate cell migration. Cold Spring Harb Perspect Biol [Internet]. Cold Spring Harbor Laboratory Press; 2015 [cited 2019 Sep 3];7:a005959. Available from: http://www.ncbi.nlm.nih.gov/pubmed/26238352
Yamaguchi H, Condeelis J. Regulation of the actin cytoskeleton in cancer cell migration and invasion. Biochim Biophys Acta [Internet]. Elsevier; 2007 [cited 2018 Aug 28];1773:642–52. Available from: https://www.sciencedirect.com/science/article/pii/S0167488906001558
Edlind M, Hsieh A. PI3K-AKT-mTOR signaling in prostate cancer progression and androgen deprivation therapy resistance. Asian J Androl [Internet]. 2014 [cited 2019 Sep 3];16:378. Available from: http://www.ncbi.nlm.nih.gov/pubmed/24759575
Crumbaker M, Khoja L, Joshua AM. AR Signaling and the PI3K Pathway in Prostate Cancer. Cancers (Basel) [Internet]. Multidisciplinary Digital Publishing Institute (MDPI); 2017 [cited 2019 Mar 13];9. Available from: http://www.ncbi.nlm.nih.gov/pubmed/28420128
Ren G, Zhang Y, Mao X, Liu X, Mercer E, Marzec J, et al. Transcription-Mediated Chimeric RNAs in Prostate Cancer: Time to Revisit Old Hypothesis? OMICS A J Integr Biol [Internet]. Mary Ann Liebert, Inc.; 2014 [cited 2019 Feb 26];18:615–24. Available from: http://www.liebertpub.com/doi/10.1089/omi.2014.0042
Kaushik M, Kaushik S, Roy K, Singh A, Mahendru S, Kumar M, et al. A bouquet of DNA structures: Emerging diversity. Biochem Biophys Reports [Internet]. Elsevier; 2016 [cited 2019 Jun 14];5:388–95. Available from: https://www.sciencedirect.com/science/article/pii/S2405580816300024
Li T, Famulok M. I-Motif-Programmed Functionalization of DNA Nanocircles. J Am Chem Soc [Internet]. American Chemical Society; 2013 [cited 2019 Sep 3];135:1593–9. Available from: https://pubs.acs.org/doi/10.1021/ja3118224
Zemánek M, Kypr J, Vorlíčková M. Conformational properties of DNA containing (CCA)n and (TGG)n trinucleotide repeats. Int J Biol Macromol [Internet]. Elsevier; 2005 [cited 2019 Apr 1];36:23–32. Available from: https://www.sciencedirect.com/science/article/pii/S0141813005000541?via%3Dihub
Rickman DS, Pflueger D, Moss B, VanDoren VE, Chen CX, De la Taille A, et al. SLC45A3-ELK4 Is a Novel and Frequent Erythroblast Transformation-Specific Fusion Transcript in Prostate Cancer. Cancer Res [Internet]. 2009 [cited 2018 Nov 30];69:2734–8. Available from: http://cancerres.aacrjournals.org/
Makkonen H, Jääskeläinen T, Pitkänen-Arsiola T, Rytinki M, Waltering KK, Mättö M, et al. Identification of ETS-like transcription factor 4 as a novel androgen receptor target in prostate cancer cells. Oncogene [Internet]. 2008 [cited 2019 Mar 14];27:4865–76. Available from: http://www.ncbi.nlm.nih.gov/pubmed/18469865
Östman A, Hellberg C, Böhmer FD. Protein-tyrosine phosphatases and cancer. Nat Rev Cancer. 2006.
Chmelar R, Buchanan G, Need EF, Tilley W, Greenberg NM. Androgen receptor coregulators and their involvement in the development and progression of prostate cancer. Int J Cancer. 2007.
Catalona WJ, Richie JP, Ahmann FR, Hudson MA, Scardino PT, Flanigan RC, et al. Comparison of Digital Rectal Examination and Serum Prostate Specific Antigen in the Early Detection of Prostate Cancer: Results of a Multicenter Clinical Trial of 6,630 Men. J Urol. 2017;
Eidelman E, Twum-Ampofo J, Ansari J, Siddiqui MM. The Metabolic Phenotype of Prostate Cancer. Front Oncol [Internet]. Frontiers; 2017 [cited 2019 Mar 13];7:131. Available from: http://journal.frontiersin.org/article/10.3389/fonc.2017.00131/full
Fennelly C, Amaravadi RK. Lysosomal biology in cancer. Methods Mol Biol. 2017.
Aderem A. Phagocytosis and the Inflammatory Response. J Infect Dis. 2003;

Table 1. Poly(A) tail occurrences of the parental genes of cis-SAGe and gene fusions in HeLa cells

For two fusion transcripts, GFOD2-ENKD1 and MFSD7-ATP5I, no poly(A) tails were detected for either parental gene. Consistent with this, Chang et al. [13] detected no tails for these genes, and an in-depth search of the TAIL-seq data found none either.

cis-SAGe/gene fusion (5´ gene - 3´ gene)	Poly(A) tails, 5´ gene	Poly(A) tails, 3´ gene
FOXRED2-TXN2	0	340
LHX6-NDUFA8	0	218
SLC2A11-MIF	0	842
SLC45A3-ELK4	0	5
TXNDC9-LYG1	103	0
UBE2Q2-FBXO22	0	97

Table 2. Poly(A) tail occurrences of the parental genes of fusion transcripts assumed to arise via trans-splicing in HeLa cells

Trans-splicing fusion transcript (5´ gene - 3´ gene)	Poly(A) tails, 5´ gene	Poly(A) tails, 3´ gene
DHRS13-FLOT2	64	71
TINF2-NEDD8	35	69
VMP1-RPS6KB1	51	48
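
The contrast between Tables 1 and 2 can be made concrete with a small sketch. It applies a simple presence/absence rule to the tabulated counts: a pair whose 5´ gene has no poly(A) tails while the 3´ gene has some is labelled fusion-like, and a pair in which both parental genes carry tails is labelled trans-splicing-like. This rule is an assumption made for illustration only and is not the C-score used by STfusion; the names `TailCounts` and `tail_pattern` and the label strings are hypothetical.

```python
from typing import NamedTuple


class TailCounts(NamedTuple):
    five_prime: int   # poly(A) tails detected for the 5' parental gene
    three_prime: int  # poly(A) tails detected for the 3' parental gene


def tail_pattern(counts: TailCounts) -> str:
    """Label a parental gene pair by its poly(A) tail pattern.

    Illustrative presence/absence rule only; this is NOT the C-score.
    """
    if counts.five_prime == 0 and counts.three_prime > 0:
        return "fusion-like"          # 5' gene lacks tails, 3' gene has them
    if counts.five_prime > 0 and counts.three_prime > 0:
        return "trans-splicing-like"  # both parental genes keep their tails
    return "unclear"                  # any other pattern is left unlabelled


# Counts copied from Table 1 (cis-SAGe and gene fusions in HeLa cells).
TABLE_1 = {
    "FOXRED2-TXN2": TailCounts(0, 340),
    "LHX6-NDUFA8": TailCounts(0, 218),
    "SLC2A11-MIF": TailCounts(0, 842),
    "SLC45A3-ELK4": TailCounts(0, 5),
    "TXNDC9-LYG1": TailCounts(103, 0),
    "UBE2Q2-FBXO22": TailCounts(0, 97),
}

# Counts copied from Table 2 (assumed trans-splicing in HeLa cells).
TABLE_2 = {
    "DHRS13-FLOT2": TailCounts(64, 71),
    "TINF2-NEDD8": TailCounts(35, 69),
    "VMP1-RPS6KB1": TailCounts(51, 48),
}

labels = {pair: tail_pattern(c) for pair, c in {**TABLE_1, **TABLE_2}.items()}

for pair, label in labels.items():
    print(f"{pair}\t{label}")
```

Under this rule, five of the six pairs in Table 1 are labelled fusion-like and all three pairs in Table 2 are labelled trans-splicing-like. TXNDC9-LYG1 shows the opposite pattern (tails for the 5´ gene only), so the sketch leaves it unlabelled rather than guessing.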

Editorial decision: Major revision (02 Feb, 2020)
Review #3 received at journal (16 Jan, 2020)
Review #2 received at journal (16 Jan, 2020)
Review #1 received at journal (16 Jan, 2020)
Reviewer #3 agreed at journal (31 Dec, 2019)
Reviewer #2 agreed at journal (30 Dec, 2019)
Editor assigned by journal (28 Dec, 2019)
Reviewers invited by journal (28 Dec, 2019)
Reviewer #1 agreed at journal (28 Dec, 2019)
Submission checks completed at journal (18 Dec, 2019)
Editor invited by journal (18 Dec, 2019)
First submitted to journal (22 Nov, 2019)
