1. Successful neuron isolation for circRNA detection by RNA-seq
Currently, there are no available neuronal circRNA data in C. elegans, mainly due to challenges in obtaining enough neuron samples from the tiny worms which have no obviously compartmentalized "brain" tissue. The most common method to obtain neuron cells from C. elegans is by the "labeling-dissociation-sorting" method (Fig. 1a) [30-36], in which target neurons are labeled by fluorescent protein and, after mild dissociation of the worms, labeled neurons are collected by fluorescence-activated cell sorting (FACS). This method can obtain target neurons in high purity and is used to detect gene expression in single neurons to all the neurons. However, due to the low efficiency of the dissociation [31, 32], total RNA obtained from sorted cells is limited. Hence this method is only used for mRNA detection, either by microarray or RNA-seq [30, 31, 36-38]. By optimization of previous protocols [32], I aim to improve the final total RNA yield to hundreds of nanograms for circRNA detection by RNA-seq with ribosomal RNA depletion.
Here, using a strain (NW1229) with pan-neuronal green fluorescent protein (GFP) expression, I found that by shortening the time of SDS-DTT treatment (from 2 min to 1.5 min) and washing (from 5 x 1.0 min to 5 x 40 sec) as well as increasing the time of mechanical disruption (from 10 min to 15 min) (Figure S1A), cell yield could be improved. After dissociation, the cell suspension was stained with propidium iodide (PI) to label dead/damaged cells and then subjected to FACS. GFP positive singlet cells were sorted (Fig. 1b). The majority of sorted cells showed GFP fluorescence when observed under a confocal microscope (Fig. 1c). Some neuron cells kept short neurites after sorting (Figure S1B). Consistent with previous findings, neurites can grow out after culture of sorted neuron cells (Figure S1C) [31, 32]. To further confirm the effectiveness of sorting, the levels of two marker genes (myo-3 and unc-64) were quantified by digital droplet PCR (ddPCR). As expected, the neural syntaxin unc-64 was highly enriched in the sorted cells, whereas the muscle gene myo-3 was depleted (Fig. 1d).
Using this optimized protocol (see "Methods"), 200 - 500 ng total RNA was obtained from cells sorted from ~ 1.5 - 5 million L1 worms (the sort group). RNA samples from dissociated worms before sorting were also prepared for comparison (the whole group, Fig. 1a). For RNA-seq, 150 ng total RNA from three independent trials of the sort group and the whole group was used as input for library preparation with ribosomal RNA removal and first-strand cDNA synthesis using random hexamers. More than 45 million 150 nt paired-end reads were obtained for each sample. Differentially expressed genes between the two groups were analyzed by DESeq2 [39]. Consistent with the ddPCR results (Fig. 1d), myo-3 was significantly depleted, while unc-64 was significantly enriched in the sort group compared with the whole group (Figure S1D). The significantly upregulated genes (Table S4) in the sort group were searched in WormExp [40] (https://wormexp.zoologie.uni-kiel.de/wormexp/) to identify whether these genes overlap with previous results of neuronal genes. As expected, the resulted top three datasets were all pan-neural enriched genes determined by microarray analysis of sorted neurons (Figure S1E) [37, 38], indicating the RNA-seq results from sorted samples successfully revealed the gene expression pattern in the neurons.
2. circRNAs are highly expressed in the neurons
circRNA annotation from the RNA-seq data was carried out using the DCC pipeline [41]. Prior to filtering, 6475 circRNAs were identified with at least one back-spliced junction (BSJ) read across six samples, with 4786 novel circRNAs when compared with circRNAs of C. elegans in two databases: circBase [42] and CIRCpedia v2 [43] (Figure S2A). The results were filtered with a cutoff of at least three BSJ reads in each group, which gave 1452 circRNAs derived from 1010 genes and 29 not-annotated loci (Fig. 2a, Table S5). Of the filtered circRNAs, the majority of the identified BSJ reads were from exon-to-exon joining (Figure S2B). The filtered circRNAs were compared with a published dataset of circRNAs in aging worms [18], which showed 450 overlapped circRNA (Figure S2C). The novel circRNAs identified in my dataset were mainly from the sorted group, suggesting that sequencing from sorted neuron samples was helpful to identify circRNAs that may not be detected using whole-worm samples. Gene ontology (GO) enrichment analysis of the circRNA-producing genes showed that terms related to the neuronal functions were significantly enriched, including neurogenesis, synaptic signaling, etc. (Fig. 2b).
Two strategies were used to validate the annotated circRNAs by DCC: 1) Amplification of the BSJ sequences by RT-PCR using divergent primers followed by Sanger-sequencing (Fig. 2c). Eighteen out of 19 selected circRNAs, including seven novel circRNAs, were confirmed with the BSJ sequences (Figure S2D). 2) Enrichment quantification by RT-qPCR after RNase R treatment. Since there are no ends in circRNAs, they often show resistance to degradation after treatment with RNase R. As expected, while two linear mRNAs (pmp-3 and cdc-42) were depleted after RNase R treatment, all the circRNAs were enriched (Fig. 2d). The resistance to RNase R was also confirmed by northern blot, which showed that while linear transcript was not detected after RNase R treatment, circRNA from Y20F4.4 was still detected (Fig. 2e). Together, these results show that circRNA annotation is of high accuracy.
Of the 1452 circRNAs, more circRNAs (1125/1452) were found in the sort group, with 539 identified in both groups and 586 only in the sort group (Fig. 2a). Next, the abundances of circRNAs in the sort group and the whole group were compared to check whether circRNAs were highly expressed in the neurons of C. elegans or not. TPM (transcripts per million reads) values were used for comparison. The principal component analysis (PCA) plot of circRNA TPM showed a clear separation between the two groups (Figure S2E), suggesting different circRNA profiles between them. For all the circRNAs in both groups, circRNAs in the sort group showed significantly higher TPM values than in the whole group (Fig. 2f, p < 2.2e-16, paired Wilcoxon test), indicating circRNAs were enriched in the sort group. The same trend was also observed for the shared 539 circRNAs in both groups (Fig. 2g, p = 4.7e-14, paired Wilcoxon test).
Next, differentially expressed circRNAs between the sort and the whole group were analyzed, trying to identify neuron-enriched circRNAs. Using BSJ read numbers as input for DESeq2 and filtering with adjusted p value < 0.05, 31 circRNAs were found significantly enriched, and 35 circRNAs were significantly depleted in the sort group (Figure S2F, Table S6). I asked whether these circRNAs were also derived from neuronal genes or not. The fold changes of circRNAs between the sort and the whole group were plotted against the fold changes of their cognate linear mRNAs. Here, a cutoff of baseMean (given by DESeq2) bigger than 3 was used, which contained 268 circRNAs, including all the significantly differentially expressed circRNAs (Fig. 2h). The results showed a strong positive correlation (Fig. 2h, Pearson's correlation coefficient R = 0.72, p < 2.2e-16), which suggested that at the L1 stage of C. elegans, neuronal circRNAs were more likely to be derived from neuronal genes. When all circRNAs were considered, they still showed a moderately strong positive correlation (Figure S2G, Pearson's correlation coefficient R = 0.51, p < 2.2e-16).
3. RCMs are required for circRNA production
Next, features of circRNA-flanking introns were analyzed. Similar to previous findings [17, 18], introns that flank circRNA-producing exons were much longer than average, and much more RCMs were identified when compared with flanking introns of non-circular exon controls (exon 2 and exon 8) (Fig. 3a and 3b). Lengths of the best-matched RCMs in circRNA introns were also much longer than those in introns flanking control exons (Figure S3A). Although a previous study showed that RCMs could predict the existence of circRNAs [17], the role of RCMs in circRNA formation has not been experimentally confirmed in C. elegans. Here, six circRNA genes with RCMs in flanking introns were chosen, and one RCM in each gene was deleted using CRISPR-Cas9 (See "Methods", Fig. 3c and Figure S3B). Two guide RNAs (gRNAs) that bracket the target RCM were used for each deletion. A 60-nt single-stranded oligo DNA (ssODN) was used as the repair template, with 30 nt homologous sequences in each end (Figure S3C). Which RCM selected for deletion depends on its position in that intron and the existence of highly specific gRNA sites around RCM sequences. For example, in glr-2, the downstream RCM extends to the 3' splice site, so the RCM in the upstream was chosen for deletion (Figure S3C). The coordinates and lengths of deleted sequences are summarized in Table 1. The gRNA sequences and recombinant single-strand oligo DNAs used for RCM deletions are listed in Table S3. As expected, all the circRNAs were either undetectable or reduced to an extremely low level after the removal of one RCM in the flanking introns (Fig. 3d and Figure S3C), proving that RCMs in C. elegans vigorously promote, if not required for, circRNA formation. Of note, in some of the chosen genes, the linear mRNA levels were altered in RCM deletion mutants.
Table 1. Positions and lengths of deleted RCM sequecnes in circRNA genes
Gene name
|
Deleted coordinates
|
Upstream/Downstream
|
Deleted length (bp)
|
glr-2
|
chrIII: 7142139 - 7142523
|
Upstream
|
385
|
gpa-1
|
chrV: 11176808 - 11177313
|
Upstream
|
506
|
unc-75
|
chrI: 11592753 - 11593798
|
Upstream
|
1046
|
arl-13
|
chrI: 2066176 - 2066626
|
Upstream
|
451
|
iglr-3
|
chrI: 2088411 - 2089756
|
Downstream
|
1276
|
Y20F4.4
|
chrI: 2034765 - 2035642
|
Downstream
|
878
|
4. RCMs promote both back-splicing and exon-skipping
circRNA production has been correlated with exon-skipping that skips the circularized exon(s) [25, 27-29]. In the RNA-seq data of this study, reads mapped to the skipped junctions can be identified in some circRNA genes (Figure S4A and S4B), suggesting the existence of skipped transcripts. For zip-2 and Y20F4.4, RT-PCR using primers that bracket circRNA-producing exon(s) gave two bands, of which the longer ones were full-length transcripts and the shorter ones were confirmed to be the skipped transcripts (Fig. 4a and Figure S4E). For some other circRNA genes, the skipped transcripts could be amplified in a two-round PCR, in which the corresponding skipped transcripts were gel-cut purified after first-round PCR, which were used as templates for a second-round PCR (Figure S4C and S4D). In total, skipped transcripts were confirmed in 6 out of 7 chosen circRNA genes (Figure S4E).
Previous studies have shown that conserved complementary sequences in introns are associated with exon-skipping [44]. Complementary sequences in different introns regulate mutually exclusive splicing [45-47]. Then whether RCMs are also required for exon skipping is checked. In Y20F4.4, the skipped transcript was strongly reduced after removing the upstream RCM (Fig. 4b and Figure S5A). In arl-13, the skipped transcript was downregulated in the downstream RCM-deleted mutant (Fig. 4c and Figure S5B). In zip-2, two pairs of perfectly matched RCMs, 7 nt and 13 nt in length, respectively, were identified (Fig. 4d and Figure S5C). Deletions of the RCMs in intron 1 or intron 4 were achieved by CRISPR-Cas9 (Figure S5C). Canonical splicings of intron 1 and intron 4 were not affected by RCM deletions (Fig. 4e, Exon 1-2 & Exon 4-5). However, although the circRNA and skipped transcript can be detected in the RCM-deleted strains, their production seemed not as efficient as in wildtype strain (Fig. 4e, circ & skip). Quantification of the levels of the three transcripts of zip-2 (circular, skipped, full-length linear) showed that while full-length linear zip-2 was only slightly affected, the production of both the circRNA and the skipped transcript was dramatically reduced in both RCM-deleted mutant strains (Fig. 4f and 4g). Together, these findings suggest that RCMs in the flanking introns of circRNA-producing exon(s) also promote the skipping of these exon(s).
5. RCM sequences in zip-2 are highly conserved across several nematode species
Previous studies suggest that competing RNA secondary structures formed by base-pairing between introns that regulate mutually exclusive splicing are evolutionally conserved [48, 49]. I then checked whether RCM sequences in zip-2 are conserved or not. Ortholog genes of zip-2 exist in five nematode species (C. elegans, C. brenneri, C. briggsae, C. japonica, and C. remanei). These zip-2 genes have similar gene structures (Figure S6). Sequences in the upstream introns and downstream introns of these zip-2 genes were aligned. Of the two pairs of RCMs in zip-2 of C. elegans, the 13-nt RCMs are highly conserved across the five nematode species, both in the upstream introns and the downstream introns (Fig. 5a and 5b). Using available splicing data on WormBase, transcripts that skip exons bracketed by the conserved RCMs were found in all these zip-2 genes (Figure S6), suggesting the conserved RCMs possibly promote the conserved exon-skipping in all these zip-2 genes.
6. RCMs do not promote exon-skipping through back-splicing, neither the other way
Current knowledge suggests that RCMs promote circRNA formation by bringing the splicing sites for back-splicing in close proximity. Since the correlated back-splicing and exon-skipping use the same pair of introns, it is possible that RCMs also bring the splice sites for exon-skipping together. In principle, the y-shaped intermediate of back-splicing could be further spliced to form the corresponding skipped transcripts. Moreover, a previous study has shown that circRNA can be produced through a lariat intermediate produced by exon-skipping [26].
Whether RCMs promote exon-skipping first or back-splicing first? There are three possibilities: 1). RCMs promote back-splicing first; 2). RCMs promote exon-skipping first; 3). RCMs promote both back-splicing and exon-skipping at the same time (Fig. 7). In order to clarify the three possibilities, the four splice sites (ss) and two branch points (BP) in intron 1 and intron 4 of zip-2 were mutated one by one. The 5'ss in intron 1, BP, and 3'ss in intron 4 are used for exon-skipping; hence these sites are named skip-5'ss, skip-BP, and skip-3'ss, respectively. Similarly, BP and 3'ss in intron 1 and 5'ss in intron 4 are named circ-BP, circ-3'ss, and circ-5'ss, respectively (Fig. 6a). For ss mutation, the conserved AG or GT nucleotides were deleted, and some possible cryptic splice sites nearby were mutated (Figure S7A and S7B). For BP mutation, since there is little information about BP sites in C. elegans [50], all A nucleotides were changed to G nucleotides in the 3' half of intron 1 and intron 4, without disturbing the RCM sequences. (Figure S7A and S7B).
The results showed that mutation of ss and BP for exon-skipping sufficiently abolished zip-2-skip. However, circ-zip-2 was still produced in these mutant strains (Fig. 6b). For mutations of ss/BP required for back-splicing, circ-3'ss mutation produced a circRNA using a noncanonical AA site [51] (Fig 6b, Figure S7B, and S7C). circ-5'ss and circ-BP mutation both blocked circRNA formation, but the skipped product can still be detected (Fig. 6b). These results suggest that in zip-2, exon-skipping is not absolutely required for back-splicing and vice versa. RCMs can promote both exon-skipping and back-splicing directly at the same time.