RNAs are highly structured macromolecules with various post-transcriptional modifications, including 3’ polyadenylation, splicing, and chemical modifications. Since the late 1950s, more than 300 types of chemical modifications (epitranscriptomes) have been identified10, 11, adding another layer of complexity to RNA biology. These modifications control a wide range of cellular and viral processes and are associated with more than 100 human diseases11–13. Studying these modifications, however, has been slow and laborious due to technical limitations inherent in the sequencing of “true” (native) RNAs1.
HIV-1 has a significantly higher number of chemical modifications on its RNAs than typical cellular transcripts6, 14. However, the evolutionary benefits and HIV-1-specific roles of these modifications in viral replication and various RNA functions remain unclear and sometimes even controversial, showing both pro- and anti-viral effects depending on the virus type, replication stage or tested cell type3, 9, 15–18. Most RNA modification studies to date have relied on indirect analyses of the phenotypic effects of perturbing host effectors (known as writers, erasers, and readers)1–9, neglecting the potential site-specific and context-dependent roles of chemical modifications12, 19–24. Although studies using short-read sequencing have mapped several common modifications onto the HIV-1 genome, including m6A, 5-methylcytosine, 2’-O-methylation and N4-acetylcytidine, they have only provided low-resolution and population-average values of modifications of a given type for fragmented RNAs1–9. The site-specific roles of individual modifications and their ensembles on the same RNA strand remain largely unknown.
Nanopore direct RNA sequencing (DRS) is a powerful tool that can analyze individual strands of native RNAs as they continuously pass through nanopores25–27. This unique technology allows for a simultaneous evaluation of key features of RNAs at the single-molecule level, including RNA sequences, chemical modifications, splicing isoforms, 3’ polyadenylation, and absolute quantitation and profiling of a heterogeneous pool of RNA transcripts27–30. DRS is also free from the experimental biases associated with current short-read sequencing methods2, 31. However, DRS faces challenges when analyzing long RNA molecules and RNAs that cause motor enzyme stalls25, 32–36 and when analyzing chemical modifications at the single-RNA level. Here, we present several technical innovations, including full-length DRS and read-level binary classification methods, that maximize the potential of the DRS technology for the study of HIV-1 RNA biology. We demonstrate dominant and site-specific m6A modifications at three distinct locations on the RNA genome and characterize their functional significance in regulating HIV-1 replication at the individual RNA level.
Multiplex reverse-transcription improves the DRS of full-length HIV-1 RNA.
DRS of long and complex RNA molecules, such as premature transcripts (13–18 Kb), cellular Xist and mitochondrial mRNAs, plant and virus RNAs, has been challenging25, 32–36. In our study, initial conventional DRS procedures failed to generate more than 13 reads (0.01% recovery) of full-length HIV-1 sequences in 8 out of 9 runs (Fig. 1a,b and Extended Data Fig. 1). Given that HIV-1 RNAs are 2–30 times more modified than typical cellular mRNAs4, 14 and have complex secondary and tertiary structures37, 38 – features known to stall reverse-transcription31, 39, 40 (RT, an optional step of DRS known to improve the read throughput26, 36) – we reasoned that inefficient linearization of HIV-1 RNA by RT might be the cause of the failure. To alleviate this, we established a multiplex RT using oligonucleotide primers specific to different parts of the HIV-1 genome to improve full-length DRS (Fig. 1a,b and Extended Data Fig. 1). Under optimized conditions using 111 different primers, we generated a total of 810, 1,797, and 2,655 reads of full-length unspliced (US; ~ 9 Kb), partially spliced (PS; ~ 4 Kb) and completely spliced (CS; ~ 2 Kb) intracellular HIV-1 RNAs, respectively, as well as a total of 3,985 reads of full-length virion RNAs (Supplementary Table 1). The improved DRS with multiplex RT enabled a comprehensive analysis of individual reads of HIV-1 RNAs in virions and in virus-producing cells.
Unexpected simplicity in the modification landscape points to the site-specificity of m6As.
To identify modification sites on a whole-genome scale, we used a two-step signal-refinement process that involved the use of in vitro transcribed (IVT) HIV-1 RNAs as a non-modified RNA control (see Supplementary Note 1 for details). With our optimized conditions, the Tombo analysis41 generated highly reproducible per-read modification signals (p values) in our repeated experiments (Extended Data Fig. 2). The results from Tombo and other tested software tools, including Eligos242, Nanocompore43, and xPore44, consistently revealed a small number of prominent modification signals on the 3’ side of the HIV-1 genome (Fig. 1c and Extended Data Fig. 4).
To identify the most likely modification sites, we compared the modification signals generated by three different tools – Tombo, Eligos2, and Nanocompore41–43, each analyzing different aspects of DRS signals, such as ionic current levels, base-calling error rates, and dwell time45. We selected the top 149, 167, and 156 modification signals, respectively, from these tools and cross-compared them (Supplementary Note 1.6). Despite the high reproducibility of all three tools (Extended Data Fig. 4), we identified only seven common peaks, reflecting the variable detection efficiencies when using different DRS signal features45. Notably, among the seven peaks, five were located at or adjacent to the known m6A motifs (DRACH: D = A/G/U, R = A/G, H = A/C/U). One DRACH peak at position A17 was excluded because we found the signals near the end of the reads to be inherently unstable (see Supplementary Note 1.2.3). The remaining four DRACH sites (A8079, A8110, A8975, and A8989) consistently exhibited strong modification signals across all our tests (Supplementary Note 1) and were highly conserved among HIV-1 subtype B (Fig. 1e). These sites also coincided with the major m6A peaks in previous short-read sequencing studies6, 9. Considering 242 DRACH sites present in the HIV-1 genome, the predominant modification signals in these few DRACH sites suggest their strong site-specificity in terms of m6A functions or its installation.
Moreover, the 25 common modification peaks detected by both Tombo and Eligos2 were also significantly enriched near the DRACH sites (Fig. 1e) and in the m6A-reader binding sites (Extended Data Fig. 5a)6, 9. In contrast, we did not find any notable associations between these common peaks and other modifications, such as the 2’-O-methylation, 5-methylcytosine, or N4-acetylcytidine sites2, 4, 5, 7, 8(Extended Data Fig. 5a). Given these modifications are several-fold more frequent or at least as common as m6As on the HIV-1 genome6, 14, the results suggest they may be less site-specific than m6As. The signals from non-site-specific modifications are likely diluted in this population-based analysis, making them insignificant or undetectable.
DRS-detected dominant m6As are confirmed at the single-nucleotide resolution.
To evaluate the existence of the four most probable m6As, we first analyzed HIV-1 virion RNA after in vitro treatment with an m6A eraser, ALKBH546. All of the four m6A sites showed varying levels of signal reduction in the Tombo, Eligos2, Nanom6A47 and dwell time48 analyses (Fig. 2a and Extended Fig. 5b). Next, we introduced a point mutation to each of the four most probable m6A sites (A8079G, A8110G, A8975C, and A8989T) of an HIV-1 provirus plasmid (pNL4-3). All mutants showed a complete absence of modification signals when compared to IVT RNAs with identical mutations (Fig. 2b). Lastly, we confirmed the two prospective m6As at positions A8975 and A8989 by oligonucleotide liquid chromatography coupled to tandem mass spectrometry (oligonucleotide LC–MS/MS) (Fig. 2c)49. As the significant modification signals on A8079 site were consistent in all our tests, we selected A8079, A8975, and A8989 to investigate their site-specific roles. m6A at A8110, however, remains unclear.
Knocking out all three dominant m6As, but not individual m6A, affects HIV-1 fitness.
The functions of m6A modifications are believed to be determined by host effectors that catalyze, recognize and remove such modifications (known as writers, readers and erasers, respectively)12. Mounting evidence also suggest that these modifications have site-specific and context-dependent roles, controlling local RNA-protein interactions by modulating the RNA structures where the interactions occur12, 19–24. m6As play important roles in regulating various aspects of RNA biology, including RNA structure, splicing, translation, metabolism, and translocation within cells12 and promote HIV-1 replication in general3, 6, 8, 9, 17. However, our current understanding is primarily based on indirect analyses of the phenotypic effects of perturbing m6A writers, readers, or erasers in host cells, which overlook the potential site-specific roles of the modifications. Some findings remain controversial, showing inconsistent results depending on the replication stages, cell types, and assays used in the studies3, 9, 15–18.
To directly analyze the functions of m6As on HIV-1 RNA, we generated m6A-knockout viruses using site-directed mutagenesis and evaluated key steps of viral replication in WT host cells (Fig. 3a). Although the m6As were effectively removed (Fig. 2b), none of the single mutations resulted in significant reductions in any of the tested replication steps, including total HIV-1 RNA production, US RNA production, viral protein expression (Gag, Vif and envelope gp41), virion production (extracellular p24), and infection of reporter cells (Fig. 3 and Extended Data Fig. 6). However, the triple mutation of all three m6A sites significantly reduced US RNA levels (Fig. 3d). It is known that the loss or reduction of US RNA results in a drastic reduction in viral fitness50 because US RNAs are essential for producing the structural proteins (Gag/Gag-Pol) and genomic RNA. As expected, all subsequent steps, including p24 Gag production, virion release, and viral infectivity, were also significantly reduced (Fig. 3b,e,f).
The triple m6A mutation induces over-splicing of HIV-1 RNA, lowering US RNA levels.
Given the critical importance of RNA splicing in HIV-1 replication, particularly in controlling US RNA levels50, we investigated the roles of the three m6As in HIV-1 alternative splicing (Fig. 4 and Extended Data Fig. 7). HIV-1 produces over 50 different forms of spliced RNA, an extraordinarily high level of alternative splicing17, 51. While DRS can effectively disentangle complex RNA isoforms28, 32, 36, 52, analysis of HIV-1 RNAs has remained impractical due to poor full-length sequencing. Here, using the new multiplex RT method, we were able to reproducibly generate full-length reads of approximately 2 Kb of CS, 4 Kb of PS, and 9 Kb of US RNA, with recovery rates of 54.1%, 31.5% and 34.9%, respectively, (Fig. 4a,b). We successfully assigned 94.8% of these full-length reads to 186 exon combinations, including 53 major isoforms53, without any notable ambiguity (Fig. 4a). The read counts were generally consistent with the densitometric quantification of PCR amplicons of the CS and PS isoforms (Fig. 4c and Supplementary Table 2).
Regarding total HIV-1 RNA production, we observed no significant differences between the WT HIV-1 and triple-mutants (Fig. 4d-i). As expected from the molecular biology tests described above, the fraction of US RNA was significantly lower in the triple mutants than in the WT (Fig. 4d-ii). Since cells rarely tolerate unspliced or incompletely spliced transcripts, HIV-1 must heavily suppress its RNA splicing in order to maintain sufficient levels of US RNA50. However, triple mutants showed a significantly increased usage of D1 donor (Fig. 4d-iii, which occurs in all spliced RNA), A7 acceptor (Fig. 4d-iv, which occurs for all CS), and all other donors and acceptors (Fig. 4d-v). Consequently, the “over-splicing” by the triple mutants significantly reduced US RNAs while relatively increasing the CS portion (Fig. 4d-vi). Single-mutant viruses also showed an increase in CS RNA but maintained a considerably higher level of US RNA than did the triple mutants (Fig. 4d-vi).
Triple mutations reduced HIV-1 protein translation without affecting poly(A) tail lengths.
In addition to its role in RNA splicing, m6A has been associated with RNA translation, metabolism, and 3’ polyadenylation3, 54. Both viral Vif and envelope proteins are mainly translated from PS RNA. Although the PS RNA levels (Fig. 4d-vi) and mRNAs for Vif and envelope (Fig. 4e) were maintained at relatively similar levels among cells producing any mutants, intracellular Vif and envelope gp41 proteins were significantly reduced by triple mutations, but not by any of the single mutations (see Fig. 3c), indicating that inefficient translation of Vif and envelope mRNAs by triple mutants compared to those by single mutants or WT. The length of 3’ poly(A) tails is known to have significant implications for RNA translation and metabolism30. Our analysis confirmed that there were no notable differences in poly(A) tail lengths between the WT and mutant RNA isoforms in this regard (Fig. 4f and Extended Data Fig. 8). These results support the regulatory roles of these m6As in viral RNA translation.
Read-level binary classification identifies RNA subspecies with distinct m6As.
To investigate the functions of the three m6As at the single-RNA-molecule level, it is crucial to determine the presence of m6As accurately and without bias for each read and at different sites. We have developed new read-level binary classification methods that are specific to each of the three m6As (m6Arp models; see Supplementary Note 2 for details). These methods are based on the consistent read-level p-values surrounding these sites (Fig. 5a-ii and Extended Data Fig. 9e). The generation of per-read p-values is highly reproducible under our optimized conditions (r2 > 0.999 with > 25K IVT reads), and the p-values remained consistent in repeated experiments (Extended Data Fig. 2c,d).
Our pre-trained m6Arp models showed an area under the Receiver Operating Characteristics curve (AUROC) ranging from 0.95 to 0.97, with false positive rates of 8.40–10.80% and false negative rates of 8.90–12.40% for the three m6As (Supplementary Table 3). Our models outperformed Nanom6A47 and m6Anet55, which are k-mer-based methods optimized for whole transcriptome analysis (Fig. 5a-iii). Moreover, the performance of our models, which evaluate m6A presence one read at a time, is unaffected by the number of reads or the data composition of test samples (i.e., data sparsity and imbalance problems)44, 45, 55, 56. These features of our models enabled us to accurately determine RNA subspecies with distinct ensembles of the three m6As (subspecies A-H) (Fig. 5a-iv), and to compare these RNA subspecies in various settings.
The three m6As are more densely present on HIV-1 mRNAs than on genomic RNA.
Our models also demonstrated a strong linearity of quantification (r2 > 0.9982) (Extended Data Fig. 9g). Consistent with a recent report demonstrating reduced m6A levels in the genomic RNAs15, we found the stoichiometry of the three m6As were significantly higher on translating mRNAs (CS and PS RNA) than that on genomic (virion) RNA (Fig. 5b). The estimates from other tools, including Nanom6A, m6Anet and Tombo, were consistent with our findings (Extended Data Fig. 9h). These m6As were most frequently detected on CS, showing average 88.5% (±1.8 SD), 78.7% (±3.6 SD) and 47.2% (±3.6 SD) of m6A modifications at A8079, A8975, and A8989, respectively.
Interestingly, read-level analyses of RNA subspecies revealed that virtually all CS and PS reads had at least one of these m6As (subspecies A-G, collectively accounting for 97.5% and 96.3% of all CS and PS reads, respectively), while the fraction dropped to 82.1% in virion RNA (Fig. 5c). Moreover, RNA subspecies with multiple m6As (subspecies A-D) accounted for a predominant portion in the CS and PS RNAs (80.7% and 76.4%, respectively), whereas the portion was substantially lower in the virion RNAs (47.2%). US RNAs (consisting of both translating mRNA and genomic RNA types57, 58; see Fig. 4b) showed a mixed character of mRNAs and virion RNAs as expected. The group H (lacking the three m6As) was mostly not spliced and highly enriched in genomic RNAs. These results, therefore, further support the important roles of these m6As in splicing and translation. Having a relatively lower number of m6As on the genomic RNA may be favored during the virion packaging and during viral RT where m6As are reported to be inhibitory9, 15.
Given the genetic and structural differences between HIV-1 mRNAs and genomic RNAs57, 58, the differential levels of the three m6As between these two types of RNAs may be controlled co-transcriptionally59, 60. A recent report also suggests that m6As on the genomic RNA are selectively demethylated by a Gag-FTO complex prior to virion packaging15. While the exact modes of action remaining unclear, the fine-tuning of these m6As by HIV-1 emphasizes the critical importance of these m6As in viral replication.
Redundant roles of each of the three m6As in regulating RNA splicing.
We analyzed splicing patterns of these RNA subspecies to evaluate the roles of each of these three m6As and their ensembles. Consistent with splicing patterns on a total population scale, all WT subspecies showed substantially lower donor (e.g. D1, D4, A5) and acceptor (e.g. A7) usages than the triple mutants (Fig. 6a), pointing to the suppressive roles of these m6As. Among the subspecies A-G, however, there were only moderate differences in splicing patterns and donor or acceptor usages. Having at least one of the three m6As, regardless of the position or the number of m6As installed, was sufficient for these RNAs to control splicing events and produce all major splicing isoforms (Fig. 6a).
Having multiple of these m6As on individual RNAs minimizes the risk of mutational loss.
Given the functional redundancy of these m6As, we then asked why HIV-1 maintains excessive m6As on its RNAs. First, having multiple m6As may be necessary to sustain normal levels of viral replication. In such a scenario, our single mutations would show either a selection for RNAs with multiple m6As or an increase in m6A stoichiometry at intact DRACH sites to compensate for the loss of a dominant m6A. However, we found indistinguishable or only moderate differences in m6A stoichiometry (Fig. 6b,c) and m6A landscape (Extended Data Fig. 10a) between the single mutants and the WT HIV-1. Like WT RNAs, all RNA subspecies of single mutants, regardless of m6A status, showed no significant bias toward the utilization of certain splicing donors or acceptors (Extended Data Fig. 10b-d).
Given that there is no clear evidence of adaptive selection or changes in m6A functions, a more favorable scenario may be that HIV-1 spreads the risk of losing m6A functions61. HIV-1 may tolerate these multiple redundant m6As to minimize the risk of unpredictable random mutagenesis (10− 5 to 10− 3 mutations/bp/cycle for HIV-162) that may knockout a major m6As, analogous to bet-hedging in evolutionary biology reducing risks against unpredictably fluctuating environments61. Interestingly, all the single mutants maintained at least one of the three m6As in most of their CS (91.5–95.8%) and PS (89.1%-95.0%) RNAs, levels comparable to the WT HIV-1 (Fig. 6d). Despite causing substantial loss of m6A at the population level, single mutations had only a marginal effect on overall viral fitness. The loss of all three, however, eroded the potential of RNA communities to sustain their control over splicing and translation, and adversely affected the various stages of the HIV-1 life cycle (Fig. 3).
Perspective
In this study, we made significant strides in understanding the HIV-1 epitranscriptome through technological innovations enabling a full-length and individual-RNA-level analysis of long and complex RNAs. Our analysis revealed that HIV-1 maintains multiple functionally redundant m6As at a few positions near the 3’ end of its RNAs to control splicing and translation. Given the high mutation rates of HIV-162, this strategy may be necessary to minimize the risk of losing the crucial m6As during viral replication. Moreover, we discovered that the three m6As are significantly more densely installed in translating mRNAs than in genomic RNAs. HIV-1 may need to fine-tune m6A levels on genomic RNAs to reduce the potential inhibitory effects of the m6As during virion packaging and reverse transcription9, 15 while maintaining a certain level of m6As to evade innate immune responses in the new host63, 64.
We show here that a full-length, single-molecule-level analysis using DRS can provide new opportunities to untangle the complexity of the RNA biology. With the full-length DRS data, we demonstrated a systemic and comprehensive analysis of key features of complex HIV-1 RNA isotypes, including epitranscriptomic modifications, alternative splicing and poly(A) tails from the same molecules, otherwise impossible with conventional assays. Our data reveal remarkable epitranscriptomic variations among HIV-1 RNAs, which adds another layer of complexity to HIV-1’s already diverse genetics and structural characteristics17, 37, 50, 51, 65. Our discoveries, in turn, generate new questions about the functions and roles of HIV-1’s versatile RNA molecules. The new methods and analytical standards presented herein, can serve as a useful reference for future investigations of RNAs of interest and various RNA viruses in the ever-expanding RNA virosphere66.