Read distribution and coverage
A read distribution comparable to the one achieved with Ion Torrent sequencing was achieved for all mitochondrial protein-coding transcripts and rRNAs[22], however the coverage for individual exons was significantly more uniform, indicating that the drops in coverage observed for exonic regions in Ion Torrent sequencing were in fact artifacts resulting from the library preparation process (Fig. 1). There was also notably higher relative coverage in intragenic regions of mitochondrial transcription units. The majority of reads were mapped to the large subunit rRNA (74,17% compared to 61.23% in Ion Proton RNA-seq data). The relative abundance of rns reads, as well as reads for all coding transcripts was lower for Nanopore sequencing (Table 1, Fig. 2). This is primarily due to increased representation of reads corresponding to tRNAs, demonstrating that two-stage mapping, along with in vitro polyadenylation, which allows for overcoming the 200bp threshold of MinION software (Supplementary Fig. 1), more than compensates for the previously reported issue of insufficient tRNA coverage[11] and provides a straightforward method for obtaining a comprehensive view o organellar tRNAs and other short transcripts. However it also underscores the degree to which the mapping strategy affects all the subsequent stages of transcriptome analysis.
Transcription units and transcript ends
The Candida albcans mitochondrial genome contains 8 policistronic transcription units (TUs)[22]. Nanopore sequencing enabled (both through direct observation and generation of consensus reads by RATTLE) the detection of long reads covering multiple genes within these transcripts, such as NAD6-NAD1 in TU01, ATP8-ATP6 in TU04, NAD2-NAD3 in TU05 and NAD4L-NAD5 in TU07, as well as multiple cases of reads covering coding transcripts along with adjacent tRNAs (Fig. 3A). Additionally, it was found that TU02 is considerably longer at the 3' end than previously reported, reaching as far as the end of an unidentified transcript at position 8940. The 5' and 3' ends observed for some mature transcripts have also been found to differ from those reported previously (Fig. 3B). Most of said changes, summarized in Table 2 are relatively small and may be due to artifacts from library preparation procedure for Ion Torrent sequencing. A significant 3’ extension of more than 100nt was also found for TU03, however its heterogeneity makes it impossible to define a specific 3’ end position. Of particular interest are the 5’ ends of TU05 and TU06, which have been found to be8 and 6 nucleotides downstream of putative transcription start sites, resoectively[22]. Most reads are in fact an additional 2–3 nucleotides shorter, and no reads corresponding in length to putative transcription start sites have been detected. If this is a consequence of posttranscriptional processing, it would not involve exonucleolytic truncation by CaPet127, as 5’ ends corresponding to the predicted promoter + 1 sites were not detected in ΔCaPET127 strain either. It should be noted however, that 5' end mapping by direct Nanopore direct RNA sequencing is also prone to artifacts, primarily 3' coverage bias, unless accompanied by 5' adapter ligation[39, 40]. Thus these relatively small changes at the 5' ends will require further verification.
Qualitative differences were also observed between the WT and deletion strains (Fig. 4). CaPet127 is a 5'→3' exonuclease — thus, changes at the 5' ends of transcripts were to be expected in ΔCaPET127 strain. This was indeed found for the TU08 and TU06/COB transcripts, where discrete isoforms with 5' extensions can be detected (Fig. 4A). These elongated transcripts are present along with properly truncated ones. This was to be expected for TU08, as the downstream tRNA genes would be excised from the transcript during the process of maturation. However, 5' elongation is also observed for TU06/COB, although at a relatively low level of approximately 3% (compared to TU08, where at least 40% of reads have 5’ elongations). This could suggest the presence of alternative transcription start sites which produce longer transcript that are subsequently truncated by CaPet127. However, a more parsimonious and indeed more likely explanation for the observed TU08 5’ extension might be that it is co-transcribed with TU07 (as transcripts overlapping both TUs were also observed). Thus what was previously believed to be two separate transcription units might in large part originate from single primary transcript, with endonucleolytic cleavage at the 3’ end of tS(GCU) in early stages of maturation producing two secondary transcripts. One might speculate if such an expression system, where TU08 might be transcribed both as a part of longer transcript along with TU07 and from its own promoter contributes to regulatory the flexibility of C. albicans mitochondria. Likewise, for TU06/COB the 5’ extensions might be an result of a low level readthroughs from TU5. However in this case any physiological significance of such events is far less likely.
The ΔCaDSS1 strain displayed a stronger phenotype and thus a higher level of intermediate transcripts corresponding to intragenic regions (also outside of established transcription units) was observed, which is in line with previously published results[13]. Since CaDSS1 is a 3'→5' exoribonuclease, long readthroughs were also observed (like in TU02, TU03 and TU04) (Fig. 4B). A short transcript of approximately 300nt was also detected at the 3' end of TU06, directly downstream of rns. It is present both in WT and deletion strains; however, in deletion strains, reads covering both said transcript and rns were also detected (Fig. 4C). This suggests that it is a product of a readthrough from rns, that is subsequently endonucleolytically excised (the presence of unexcised transcripts in deletion strains being a secondary effect of mitochondrial dysfunction).
Isoform analysis of intron-containing transcripts
The C. albicans mitochondrial transcriptome includes three intron-containing transcripts: rnl, COB and COX1. Unlike rnl, where mature rRNA makes up the majority of transcript species, for COB and COX1 the levels of mature isoforms appear to be underestimated. These results might be a consequence of lower coverage of these transcripts and in the case of COX1 also of transcript complexity and a very short exon 3 (of mere 10nt), and thus all the results from this section should be treated as semiquantitative. Relatively high levels of three different partially spliced isoforms of the COX1 transcript, missing either intron 1 (19.27%), intron 4 (10.88%) or intron 3 (5.33%) were detected. The percentages of COB transcripts retaining either intron 1 (7.47%) or intron 2 (9.08%) were also comparable. This finding indicates that splicing of these transcripts isn’t strictly a sequential process. For rnl on the other hand, the abundance of transcripts retaining intron 2 was nearly 40x higher than of those retaining intron 1, implying a considerable degree of sequentiality.
Changes in the relative abundances of transcript isoforms were observed between the WT strain and deletion strains. A decrease in mature transcript levels and an increase in unspliced transcript levels were very pronounced in ΔCaDSS1 strain, with ΔCaPET127 exhibiting intermediate levels of mature and unspliced transcripts between those of the WT and ΔCaDSS1 strains. This result is in line with previous studies indicating that ΔCaDSS1 has considerably more severe mitochondrial defects, as well as Northern blot, NGS and reverse transcription analyses for ΔCaDSS1 and ΔCaPET127 indicating, respectively: lower levels of mature mitochondrial transcripts, higher intron coverage and an increase in retention[13, 14]. Changes in levels of spliced ”intermediate” isoforms were also observed for each gene (Table 3). These findings are broadly consistent in deletion strains when compared to WT strain, indicating that splicing impairment is mainly a secondary effect of general mitochondrial dysfunction. However, some differences between the deletion strains were also observed. The relative abundance of isoforms retaining intron 2 in COB and introns 3 and 4 in COX1 was decreased in ΔCaPET127 but increased in ΔCaDSS1, and COB intron 1 and COX1 introns 1 and 2 were retained at higher relative levels in ΔCaPET127, which suggests that 5’/3’ exonucleolytic RNA processing might affect the splicing of adjacent introns.
Modification detection, identification and quantification
Comparing sequencing data from native RNA (be it from WT or deletion strains) with in vitro transcribed RNA of the same sequence allows for the detection of putative posttranscriptional modification sites (as IVT RNA contains no such modifications). This was conducted for C. albicans mitochondrial transcriptome via the MoP2 pipeline, with IVT transcribed RNAs covering all mitochondrial coding sequences, rRNAs and tRNAs serving as unmodified reference. Due to relatively low coverage for most coding transcripts, MoP2 analysis reliably indicated putative modification sites only for rnl, rns and COX2 transcripts (Fig. 5A). 1 out of 8 detected sites, found in rnl, corresponded to a splice junction. This might indicate the presence of a modification involved in or resulting from RNA splicing, but may also be an artifact of misalignment during the mapping step. However, all the sites mentioned above are detected repeatedly and for a wide range of mapping parameters. Additionally, two putative modification sites found towards the 3’ end of COX2 transcript correspond to Asn tRNA, which is a part of COX2 pre-mRNA. No qualitative differences were detected between the WT strain and deletion strands in regard to modifications described above.
Further analyses were conducted on partially ribodepleted WT RNA, which reduced the percentage of rRNA reads from 96–98% to 61–66% (Supplementary Table 6). Here, a number of individual putative modification sites were detected on coding transcripts (Fig. 5B). In the case of COX1 transcripts, all sites found were located in or around positions corresponding to splice sites. A considerable number of said sites are found towards the 3’ ends of transcripts. For NAD2 and NAD4 modifications were detected for cotranscribed tRNAs, respectively at the 5' and 3' ends of transcripts. Additionally, when entire transcription units (or large parts thereof) are being analyzed, clusters of modifications corresponding to tRNAs are detected (Fig. 5C), hinting at a high stoichiometric level of modifications in tRNAs relative to coding transcripts (MoP2 pipeline is not conducive for analyzing individual short and densely modified transcripts, such as tRNAs). This analysis yielded a general image of the modification landscape for mitochondrial transcripts in terms of modified positions. However, with the notable exception of pseudouridines (further elaborated on later in this section), it does not provide information on specific modification types. Attempts to identify three types of common RNA modifications, m6A (N6-methyladenosine), Ψ (pseudouridine) and m5C (5-methylcytidine) are described below.
FTO is a m6A RNA demethylase. Thus, a comparison between native WT RNA, FTO-treated WT RNA and in vitro transcribed RNA should enable the identification of m6A sites in mitochondrial transcripts. Data from FTO-treated RNA was analyzed with Nanopolish-based CHEUI pipeline[38], with untreated WT RNA and in vitro transcribed RNA serving as controls. Eight putative m6A positions were detected following a three-way comparison and filtering (see Materials and Methods). However, only one, detected in a k-mer 303–312 of rnl corresponded to a k-mer detected by MoP2 comparison of WT and FTO treated samples. This k-mer also had the smallest detected stoichiometry difference in FTOvsIVT comparison among filtered sites and nearly identical results for WTvsIVT and WTvsFTO comparisons. K-mer 1395–1404 in rns did correspond to the highest scoring raw k-mer in the MoP2 analysis, even though no k-mer in WT/FTO comparison in MoP2 analysis passed the pipeline’s filtering. For COB k-mer 855 a similar situation was observed, with an overlapping MoP2 k-mer scored 4th on the raw k-mer list (Table 4 and Supplementary Tables 3 and 4). A subsequent LC-MS and dot blot analyses found relatively low levels of m6A in untreated WT sample (Supplementary Figs. 2, 3 and 4). This suggests a small number of m6A positions in C. albicans mitochondrial transcriptome and/or low m6A methylation level. This also makes the estimation of FTO-mediated demethylation efficiency problematic. Thus, the putative m6A sites listed in Table 3 will require independent experimental verification.
A separate analysis of tRNA transcripts revealed a high level of U-to-C miscalling, indicative of the presence pseudouridine (Ψ)[35, 41](Table 5). Majority of uracils in position ~54 were miscalled as cytosines indicating the presence of pseudouridines in respective reads, both in pre-tRNAs and mature tRNAs, which suggests that the modification takes place prior to tRNA excision from parent transcript. Other uracils, mainly in positions 25-41 were also modified, but to a lesser extent (Fig. 6). The miscall levels for these positions were also much lower for pre-tRNAs than for mature tRNAs, indicating that they are primarily modified followingtRNA excision (Fig. 7). These results are in line with results obtained in vitro for bacterial tRNAs[42].
Low levels (of approximately 20%) of A-to-G and G-C miscalls, indicative of inosine[43] and m7G[44] modifications respectively, have also been detected in a number tRNA sites (Fig. 6, Supplementary Table 8). However, the relatively low level of these miscalls along with lower repeatability within and among tRNA transcripts means that these results warrant further experimental verification.
m5C modifications can be detected via bisulfite sequencing, wherein unmodified cytosines are converted to uracils, while m5C positions are not. Bisulfite conversion of mitochondrial RNA has previously been reported to occur with relatively low efficiency[45]. This was also the case with Candida albicans mtRNA, which necessitated the use of in vitro transcribed RNA as control and limited the analysis only to positions with high levels of m5C methylation. Bisulfite sequencing also served as an additional control for pseudouridine detection, as U-to-C miscalls due to Ψ modification are not affected by bisulfite conversion. Putative pseudouridine modifications of tRNAs at position ~ 54 were in fact the ones with strongest signals in ELIGOS and NanoRSM analyses. Due to fragmentation occurring during the bisulfite conversion, as well as spurious alignments that necessitated the exclusion of BWA short read mapping, both native and synthetic RNA were mapped either with minimap2/BWA (with long read minimap2 mapping and subsequent mapping of short reads with minimap2 and BWA with standard settings) or with short read mapping with minimap2. The list of positions thus detected is shown in Table 6. Three putative m5C sites were detected: in tyrosine, glutamine and serine tRNAs. The site in tY(GUA) was detected with high confidence both with standard and short read mapping, for both ELIGOS and NanoRSM analyses. The tS(GCU) site was detected by both algorithms with short read mapping and and by ELIGOS with standard mapping. The tQ(UUG) site was detected only by NanoRSM with short read mapping. This is almost certainly not an exhaustive list of all m5C sites present in the mitochondrial transcriptome, due to a combination of incomplete bisulfite conversion - which necessitated strict filtering of putative sites - and insufficient coverage of some mitochondrial transcripts. Likewise, only pseudouridine sites with sufficient coverage were included in the list, thus the absence of some previously detected modifications should not cast doubt on their presence (as all tRNA pseudouridine sites at position ~ 54 were detected whenever the coverage was sufficient).