Deep mutational scanning of LINE-1 and OR4K15 ribozymes. Our previous work indicates that deep mutational scanning can yield co-variation signals that allow highly accurate inference of base-pairing structures (24). Here, the same technique was applied to the original LINE-1 ribozyme (146 nucleotides, LINE-1-ori) and OR4K15 ribozyme (140 nucleotides, OR4K15-ori). These original sequences were obtained from randomly fragmented human genomic DNA in selection-based biochemical experiments (12). Thus, their functional and structural regions are unknown. In this method, high-throughput sequencing directly measures the relative cleavage activity (RA) of each variant in the mutant libraries of the two ribozymes as the ratio of cleaved to uncleaved sequence reads of the mutant, divided by the same ratio for the wild-type sequence. The relative activities of single mutations at each position indicate how sensitive the cleavage activity is to mutations at that position. Figure 1A shows, for each sequence position, the average RA of all LINE-1-ori mutants carrying a single mutation at that position. Mutations at most positions in LINE-1-ori did not lead to large changes in RA. In other words, these sequence positions are unlikely to contribute to the specific structure required for the ribozyme to be functional. The activity of the LINE-1-ori ribozyme is sensitive only to mutations in two short segments (54-71 and 83-99), where the RA can be reduced to nearly zero. Thus, the two terminal ends and the central region between bases 72 and 82 are not essential for LINE-1 self-cleaving activity. In Figure 1B, a similar distribution of RA was observed for OR4K15-ori. Only sequence positions inside two short segments (70-84 and 101-116) are sensitive to mutations, which suggests that only these regions are required to form the functional RNA structure of the OR4K15 self-cleaving ribozyme. 
Thus, we have identified the contiguous functional regions of LINE-1 ribozymes (54-99, LINE-1-rbz) and OR4K15 ribozymes (70-116, OR4K15-rbz).
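The RA definition above (the cleaved-to-uncleaved read ratio of a mutant, divided by the same ratio for the wild type) and the per-position averaging shown in Figures 1A and 1B can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the input layout and the handling of zero read counts are assumptions.

```python
def relative_activity(cleaved_mut, uncleaved_mut, cleaved_wt, uncleaved_wt):
    """RA = (cleaved/uncleaved reads of the mutant) / (cleaved/uncleaved reads of the wild type).

    Hypothetical helper: variants with zero uncleaved reads, or a wild type
    with zero cleaved reads, are rejected rather than given a pseudocount.
    """
    if uncleaved_mut == 0 or uncleaved_wt == 0 or cleaved_wt == 0:
        raise ValueError("read counts do not allow a finite RA")
    mutant_ratio = cleaved_mut / uncleaved_mut
    wild_type_ratio = cleaved_wt / uncleaved_wt
    return mutant_ratio / wild_type_ratio


def mean_ra_per_position(single_mutant_ra):
    """Average RA over all single mutants at each position.

    single_mutant_ra maps (position, mutant_base) -> RA (assumed layout).
    """
    by_position = {}
    for (position, _base), ra in single_mutant_ra.items():
        by_position.setdefault(position, []).append(ra)
    return {pos: sum(vals) / len(vals) for pos, vals in by_position.items()}
```

A position whose averaged RA stays close to 1 tolerates mutation, whereas a value near 0 marks a position inside one of the mutation-sensitive segments.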
The deep mutation data was further employed to search for the base pairs inside these two ribozymes by using CODA analysis. Figures 1C and 1D highlight the base pairs within the region 54-99 of LINE-1-ori (LINE-1-rbz) and the region 70-116 of OR4K15-ori (OR4K15-rbz), respectively. The CODA result for LINE-1-ori shows four base pairs (A14U34, A16U32, G17C31, C18G30) in one possible stem region, and another two base pairs (A4U43, A6A41) in the second possible stem region. Due to the lower mutational coverage, the CODA result for OR4K15-ori shows fewer base pairs than LINE-1-ori. Only two base pairs (U1A47, U3A45) in one possible stem region, and one lone base pair (A13U34) have been observed. The low mutation coverage for OR4K15-ori was due to the mutational bias of error-prone PCR (Supplementary Figures S1, S2, S3 and S4).
A few but strong co-variation signals from CODA analysis can be expanded by MC simulated annealing as demonstrated previously (24). Several contiguous base-paired regions were detected (Figures 1C and 1D). LINE-1-rbz has two highly reliable stem regions with the length of 6 nt, including one non-Watson-Crick pair A6A41. For OR4K15-rbz, the functional region has two reliable stem regions with the lengths of 5 nt and 4 nt, respectively, including one non-Watson-Crick pair G14U33. Most of the base pairs discovered by MC simulated annealing are non-AU pairs, whose covariations are difficult to capture by error-prone PCR because of mutational biases. In addition, these two stems are consistent with the single mutation profiles. For example, two single mutations G14A and U33C showed high RA, 1.55 and 1.82, respectively, suggesting the high possibility of a G14U33 pair at these positions, because these two separate mutations change the wobble GU pair to the Watson-Crick AU and GC pairs, respectively, and lead to a more stable structure and higher activity. If G14U33 is true, a natural extension of the stem region is G15C32 for another canonical Watson-Crick pair. This pairing region was missed by the CODA+MC result. There are some other base pairs that appeared in low probabilities after MC simulated annealing. They are considered as false positives because of low probabilities and lack of support from the deep mutational scanning results. The appearance of false positives is likely due to the imperfection of the experiment-based energy function employed in current MC simulated annealing.
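The argument about G14A and U33C can be made explicit with a small classification sketch. It only encodes the standard Watson-Crick and GU wobble pairing rules; the helper names are hypothetical and nothing here reproduces the CODA+MC procedure.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_type(base_5p, base_3p):
    """Classify a candidate RNA base pair as Watson-Crick, wobble, or other."""
    pair = (base_5p, base_3p)
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in WOBBLE:
        return "wobble"
    return "other"


def apply_mutation(base, mutation):
    """Apply a point mutation written like 'G14A' to the wild-type base."""
    if mutation[0] != base:
        raise ValueError("mutation does not match the wild-type base")
    return mutation[-1]


# Wild-type G14/U33 is a wobble pair; either single mutation gives a Watson-Crick pair.
wild_type = pair_type("G", "U")
after_g14a = pair_type(apply_mutation("G", "G14A"), "U")
after_u33c = pair_type("G", apply_mutation("U", "U33C"))
```

Both single mutants map to a Watson-Crick pair, which matches the reasoning in the text that each mutation can stabilize the stem independently.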
Further validation of base pairing information by DMS of LINE-1-rbz. To further improve the signals, we used the contiguous functional segment (54-99, LINE-1-rbz) for a second round of deep mutational scanning (Supplementary Figure S5). The second round is necessary because the deep mutational scanning of the original LINE-1 ribozyme covers only 18.5% of double mutations (Supplementary Table S1). This low coverage is not sufficient for accurate inference of the complete secondary structure. The second round employed a chemically synthesized doped library with a doping rate of 6%, rather than a mutant library generated by error-prone PCR, which is biased toward sequence positions with A/T nucleotides (Supplementary Tables S1, S2 and Figures S1, S2, S3 and S4). In addition, to amplify the signals of cleaved RNAs after in vitro transcription of the LINE-1-rbz mutant library, we selectively captured cleaved RNAs by employing RtcB ligase and a 5′-desbiotin, 3′-phosphate modified linker, because they react only with the 5′-hydroxyl termini that exist only in cleaved RNAs (25). The captured active mutants were further enriched by streptavidin-based selection after ligation. These technical improvements lead to less biased mutations in terms of both sequence positions (Supplementary Figures S1 and S2) and mutation types (Supplementary Figures S3 and S4). More importantly, they achieve 99.3% and 99.9% coverage of single and double mutations, respectively, within the contiguous functional segment (Supplementary Table S1).
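To put the coverage percentages in context, the size of the single- and double-mutant space can be counted directly. The sketch below assumes that coverage is computed over substitution variants only (three alternative bases per mutated position); the source does not state the exact convention, and the function names are hypothetical.

```python
from math import comb


def possible_variants(length, order):
    """Number of distinct substitution variants with exactly `order` mutated positions."""
    return comb(length, order) * 3 ** order


def coverage(observed, length, order):
    """Fraction of the possible variants of a given order that were observed."""
    return observed / possible_variants(length, order)


# Positions 54-99 inclusive span 46 nucleotides; LINE-1-ori has 146.
rbz_singles = possible_variants(46, 1)    # 138
rbz_doubles = possible_variants(46, 2)    # 9315
ori_doubles = possible_variants(146, 2)   # 95265
```

Under this counting, the double-mutant space of the 146-nucleotide sequence is roughly ten times larger than that of the 46-nucleotide segment, which is one reason why narrowing to the functional region makes near-complete coverage attainable.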
Consistent secondary structure from DMS of LINE-1-ori and LINE-1-rbz. We performed the CODA analysis (24) based on the relative activities of 45,925 and 72,875 mutation variants obtained for the original sequence and the functional region of the LINE-1 ribozyme, respectively. CODA detects base pairs by locating position pairs for which the activity of a double mutant shows a large covariation-induced deviation from an independent single-mutation model. Figure 2A shows the distribution of relative activity (RA’, measured in the second round of mutational scanning) of all single mutations for the LINE-1-rbz ribozyme, which is consistent with the distribution of RA in the functional region of the LINE-1-ori ribozyme (Figure 1A and Supplementary Figure S5). Moderate or high relative activity values (RA’ > 0.5) at the outer ends of the stem regions (18C, 30G, C46) suggest that these nucleotides might not be necessary for the function of LINE-1-rbz, whereas RA’ values at 1G were affected by the transcription efficiencies of different nucleotides downstream of the T7 promoter.
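The idea of a deviation from an independent single-mutation model can be illustrated with a simplified sketch. It assumes a multiplicative null model, in which the expected activity of a double mutant is the product of the two single-mutant activities; the actual CODA model (24) may differ, and all names below are hypothetical.

```python
def covariation_deviation(ra_double, ra_single_i, ra_single_j):
    """Observed double-mutant RA minus the RA expected if the two mutations act independently.

    Assumption: independence is modelled as a product of single-mutant RAs.
    """
    expected = ra_single_i * ra_single_j
    return ra_double - expected


def rank_position_pairs(observations):
    """Rank position pairs by their largest positive deviation.

    observations maps (i, j) -> list of (ra_double, ra_single_i, ra_single_j),
    one tuple per measured base combination at positions i and j.
    """
    scores = {
        pair: max(covariation_deviation(*obs) for obs in obs_list)
        for pair, obs_list in observations.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# A compensatory double mutant: each single mutation is disruptive, the double is active.
example = covariation_deviation(0.9, 0.05, 0.1)
```

A large positive deviation means the double mutant is far more active than independence predicts, which is the signature expected when a second mutation restores a disrupted base pair.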
Figure 2B shows that data from both the original sequence and functional region of the ribozyme reveal the existence of two stem regions. The latter clearly shows two stems with lengths of 6 nt and 5 nt, respectively, due to nearly 100% coverage of single and double mutations for the functional region. Moreover, it suggests a non-canonical pair 6A41A at the end of the first stem, although the signal is weak. Such a weak signal is expected because the non-canonical pairs are typically less stable than standard base pairings (26). There are some additional weak signals in the LINE-1-rbz result. Most of these signals are for the base pairs at a few sequence distances apart (|i-j|<6, local base pairs). We found that some of them are caused by a high relative activity (RA’) of the double mutant and are not likely to be true positives because the corresponding single mutations are not very disruptive (RA’ > 0.5). The consistency between base pairs inferred from deep mutational scanning of the original sequences and that of the identified functional regions confirmed the correct identification of functional regions for LINE-1 and OR4K15.
Probabilistic CODA results were further refined by Monte Carlo simulated annealing (CODA+MC). In the resulting base pairs (Figure 2C, upper triangle), most of the local false positives were removed. However, the noncanonical AA pair was removed as well, likely because the energy function does not account for noncanonical pairs. A newly added 18C30G pair is a natural extension of the second stem. Two consecutive base pairs (7G37C, 8C36G) with relatively low signals were also added in the CODA+MC result. These two base pairs are likely false positives because they do not appear in all the models generated by Monte Carlo simulated annealing and have a low probability of 0.52. Taking all the information together leads to a confident secondary structure for the LINE-1-rbz ribozyme, which contains two stem regions (P1, P2), two internal loops, and a stem-loop (Figure 2D left).
The consistency between the LINE-1-rbz and LINE-1-ori results allowed us to use the OR4K15-ori data directly to infer the final secondary structure of the functional region of OR4K15. The secondary structures of the two ribozymes (Figure 2D) are similar to each other. Both ribozymes have two stems (P1, P2), two internal loops and a stem-loop region. In both ribozymes, cleavage activity is insensitive to mutations in the stem-loop region (SL2) (Figures 1A, 1B). The internal loop regions (L1) of LINE-1-rbz and OR4K15-rbz are nearly identical, except that U38 in LINE-1-rbz corresponds to C8 in OR4K15-rbz (Figure 3A). Given the conserved core regions of the LINE-1-rbz and OR4K15-rbz ribozymes, we named them lantern ribozymes because both are shaped like a Chinese lantern.
Consensus sequence of two lantern ribozymes. The consensus sequences (Figure 2E) for LINE-1-rbz and OR4K15-rbz were generated by R2R (27) based on the multiple sequence alignments of 1394 and 621 mutants, respectively, with RA ≥ 0.5 from deep mutational scanning. Figure 2E overlays the consensus sequences onto the secondary structure models from the CODA+MC analysis. The self-cleavage of the two lantern ribozymes occurs within a conserved CA dinucleotide located inside the longer strand of the internal loops (L1) linking P1 and P2. The internal loops are the regions with the highest conservation, which suggests that they play important roles in cleavage activity. Neither stem region (P1, P2) is highly conserved at the sequence level because of covariations. For example, in Figure 2E (right), 1U47A can be replaced by 1C47G, and 14G33U becomes 14C33G after double mutations. Covariation of all CG pairs was missing in OR4K15-rbz due to the mutational biases. The stem-loop regions show the lowest sequence identity, consistent with the insensitivity of catalytic activity to mutations in these regions (Figures 1A and 1B), providing additional support for the secondary structure models obtained here.
The secondary structure of OR4K15-rbz is a circular permutation of the LINE-1-rbz self-cleaving ribozyme. Removing the non-functional loop regions (Figures 1A and 1B) allows us to recognize that the secondary structure of OR4K15-rbz is a circularly permuted version of that of LINE-1-rbz. As shown in Figure 3A, after permutation, OR4K15-rbz has two conserved internal loops, one identical to that of LINE-1-rbz and the other differing from that of LINE-1-rbz by one base. The only mismatch, U38C in L1, has an RA’ of 0.6, suggesting that the mismatch is not disruptive to the functional structure of the ribozyme. Because these are the simplest ribozymes reported so far, it is important to know whether they share motifs with known ribozyme families.
Here we used pattern-based similarity search (RNAbob, http://eddylab.org/software.html) to search the structural patterns against the known ribozyme families in Rfam database (Supplementary Figure S6). The structural patterns of two lantern ribozymes were derived from our deep mutational scanning results. We obtained three hits with identical motifs all from the twister sister ribozyme family (Supplementary Table S3). Figure 3A further shows the comparison with two previously published twister sister structures (28, 29). The two internal loops of LINE-1-rbz with 5 and 6 nucleotides, respectively, differ from the catalytic internal loops of the four-way junctional twister sister ribozyme only by one base in each loop (Figure 3A) (28). Thus, it is possible to use the structure of twister sister ribozyme to build a homology model.
Homology modelling of the lantern ribozymes. To obtain structure models of the lantern ribozymes, we used template-based homology modelling with the twister sister ribozyme structure as the template. Figure 3B shows the homology-modelled structure of the two-stranded LINE-1-core (internal loops plus stems) built from the more similar four-way junctional twister sister ribozyme (PDB ID: 5Y87). The structure of the four-way junctional twister sister ribozyme revealed the internal loops L1 as the catalytic region, involving a guanine–scissile phosphate interaction (G5–C62-A63), continuous stacking interactions, additional pairings and hydrated divalent Mg2+ ions. Figures 3B, 3C and Supplementary Figure S7 show that the internal loops L1 of the LINE-1-core model can be built in the same way as in the twister sister ribozyme, with A11 at the cleavage site directed inwards. As shown in Figure 3C and Supplementary Figure S8, stem P1 of LINE-1-rbz is extended by the Watson-Crick pair U9A39, which is part of the G7(U9A39) base triple, and by the Watson-Crick pair C8G40. Meanwhile, stem P2 of LINE-1-rbz is extended through the trans non-canonical pair A12G36 and the trans sugar edge-Hoogsteen pair A11C37. In addition to the base-triple interaction in LINE-1-rbz, G7 is also H-bonded to both non-bridging O atoms of the phosphate connecting C37 and U38 (Figure 4A). This, along with the H-bond interaction between the proR O atom of the phosphate and A39 N6, generates a stable square array of H-bonds.
We also modelled the structure of OR4K15-core in the same way as LINE-1-core. As shown in Supplementary Figure S9A and S9B, the model structure of L1 in OR4K15-core is similar to L1 in the LINE-1-core model, with stems P1 and P2 extended by additional base-pairing interactions in the tertiary structure of OR4K15-core. The base triple interaction of G37(A9U39) also exists in OR4K15-core, with the difference that G37 is H-bonded to A9 N6, rather than U39 (Supplementary Figure S9C). Thus, the square array of H-bonds could not be observed as there was no direct interaction between A9 and the phosphate connecting C7 and C8. This modelling result suggests that the bond between A9 (A39 in LINE-1-core) and the phosphate may not be important, consistent with the fact that the replacement of AU pair by a non-canonical UU pair did not disrupt cleavage activity (RA’ = 0.8 for A39U in LINE-1-rbz).
Like the two twister sister ribozyme structures (28, 29), the nucleotide U38 in the LINE-1-core model (or C8 in the OR4K15-core model) is extruded from the helix (Figure 3C, Supplementary Figure S9A). The corresponding nucleotides, A8 in the three-way junctional twister sister ribozyme (29) and A7 in the four-way junctional twister sister ribozyme (28), are involved in the tertiary contact with the stem loop SL4 (Figure 3A). Because the lantern ribozymes lack the extra stem loops of the twister sister ribozyme, the key contacts for U38 of LINE-1-core (C8 in OR4K15-core) are the H-bond interaction with A11 N6 (A41 N6 in OR4K15-core) and the interactions between the phosphate and G7 (G37 in OR4K15-core). The lack of involvement of the nucleobase (U38 in LINE-1-core, C8 in OR4K15-core) in the modelled structures is compatible with the mismatches U38C and U38A found in the comparison, as changes of the nucleobase did not have a large influence on the H-bonded interactions involving U38 (C8 in OR4K15-core) (Figure 3A). Thus, it is not entirely clear whether the model structure of the lantern ribozymes built from twister sister will remain stable in the absence of any tertiary interactions with the stem loop SL2 or whether it will deviate significantly from the twister sister structure.
Differences in mutational responses in the L1s between lantern and the twister sister ribozymes. To address the above question, we further investigated the mutational effects on the internal loops L1. Most nucleotides in the L1 region of LINE-1-rbz are relatively conserved, as single mutations showed RA’ < 0.2 according to the deep mutational scanning results (Figure 2A). The conservation is consistent with the stacking and hydrogen-bonding interactions in the catalytic region. By comparison, mutations of C62 (C54 in the three-way junctional twister sister ribozyme) at the cleavage site did not substantially change the cleavage activity in previous studies (28, 29). In contrast, single mutations of the corresponding nucleotide in LINE-1-rbz (C10) had relative activities as low as 0.07 in the deep mutational scanning result. To further confirm this result, we performed cleavage assays on several mutations at the cleavage site of LINE-1-core. As shown in Figure 4C, mutations of C10 led to either partial (C10U, C10G) or complete (C10A) loss of cleavage activity, whereas the corresponding mutations in the four-way junctional twister sister ribozyme showed pronounced (C62U, C62A) or somewhat reduced (C62G) cleavage activity (28, 29). More interestingly, A11U in LINE-1-core showed partial cleavage activity, whereas the corresponding mutation A63U in the four-way junctional twister sister ribozyme showed complete loss of cleavage activity (28, 29). These different mutational effects indicate that LINE-1-core may not adopt exactly the same structure and catalytic mechanism as the twister sister ribozyme. In other words, lantern ribozymes may not simply be a minimal version of twister sister ribozymes.
We further examined the interactions around the cleavage site C10-A11 in LINE-1-rbz (C40-A41 in OR4K15-core). The corresponding nucleotides in twister sister ribozymes (C62-A63 in the four-way junctional twister sister ribozyme, C54-A55 in the three-way junctional twister sister ribozyme) adopt either a splayed-apart or a base-stacked conformation, with the scissile phosphate H-bonded to G5 (four-way junctional twister sister ribozyme) or C7 (three-way junctional twister sister ribozyme). In our modelled structures (Figure 4B, Supplementary Figure S9D), the bases at the cleavage site C10-A11 in LINE-1-core (C40-A41 in OR4K15-core) are splayed apart, with A11 (A41 in OR4K15-core) directed inwards and C10 (C40 in OR4K15-core) directed outwards. In addition, H-bonded interactions between C10 (C40 in OR4K15-core) and the two nucleotides 3’ of C10 could be found (Figure 4B, Supplementary Figure S9D). These structural differences may partially explain why mutations of C10 were detrimental to catalysis in lantern ribozymes. Moreover, we did not observe interactions between C10 (C40 in OR4K15-core) and G36/C37 (G6/C7 in OR4K15-core), although they were important for anchoring the cytosine in twister sister ribozymes. This may be due to the replacement of the non-canonical pair G5U64 (G6U57 in the three-way junctional twister sister ribozyme) by A12G36 (G6A42 in OR4K15-core) in lantern ribozymes. Thus, the stability of the current model structure based on the twister sister template is uncertain.
Activity confirmation and biochemical analysis of LINE-1-core and OR4K15-core. To confirm the cleavage activity of the core region (without the stem-loop linker), we used the segments containing the cleavage site (54-71 for LINE-1, 101-116 for OR4K15) as the substrate strands and the remaining segments (83-99 for LINE-1, 70-84 for OR4K15) as the enzyme strands (Figure 5A); together they make up LINE-1-core and OR4K15-core. When the substrate strand was mixed with the enzyme strand, most of the substrate strand was cleaved in the presence of Mg2+ (Figure 5B). When Mg2+ or the enzyme strand was omitted from the reaction, no cleavage could be observed even after 24 h of incubation (Supplementary Figure S10). This result further confirms that the RNA cleavage in lantern ribozymes is a catalytic reaction, not a spontaneous one. Moreover, the result confirms that the stem loop SL2 regions in LINE-1-rbz and OR4K15-rbz (Figure 2) do not participate in the catalytic activity.
The core constructs of the two lantern ribozymes contain 35 and 31 nucleotides, respectively. They are the shortest and the second shortest self-cleaving ribozymes reported so far. In all previously reported self-cleaving ribozyme families, the self-cleaving reaction occurs through an internal phosphoester transfer mechanism, in which the 2′-hydroxyl group of the -1 nucleotide (relative to the cleavage site) attacks the adjacent phosphorus, resulting in the release of the 5′ oxygen of the +1 nucleotide (2, 12, 30–32). We confirmed that both LINE-1-core and OR4K15-core employ the same cleavage mechanism, because the analog RNAs that lacked the 2′ oxygen atom in the -1 nucleotide (dC10 for LINE-1-core, dC40 for OR4K15-core) were unable to cleave (Figure 5B). We also investigated whether lantern ribozymes can cleave when Mg2+ is replaced by Co(NH3)63+ in the reaction. Co(NH3)63+ is isosteric with Mg(H2O)62+, but it cannot directly participate in catalysis because its ammine ligands cannot readily dissociate. In Figure 5C, the total loss of cleavage activity with Co(NH3)63+ in the absence of Mg2+ indicates that the lantern ribozymes require the binding of divalent cations not only for structure folding, but also for catalysis.
As shown in Figure 5D, we further examined the dependence of the lantern ribozymes’ cleavage activity on metal ions. At a concentration of 1 mM, ribozyme cleavage can be observed with Mg2+, Mn2+, Co2+ and Zn2+ but little or none with Ca2+, Cu2+, Ba2+, Ni2+, Na+, K+, Li+, Cs+ or Rb+, indicating that direct participation of specific hydrated divalent metal ions is required for self-cleavage. More interestingly, we found that the lantern ribozymes have an equivalent or even higher cleaved fraction with Mn2+ than with Mg2+. For LINE-1-core, the cleaved fractions were ~57% for Mg2+ and ~74% for Mn2+. For OR4K15-core, the cleaved fractions were ~9% for Mg2+ and ~79% for Mn2+. We further characterized the cleavage rates of the two ribozymes under different concentrations of Mg2+ and Mn2+ by using a fluorescence resonance energy transfer (FRET)-based method, as shown in Supplementary Figures S11 and S12. We used the first-order rate constant (kobs) to represent the efficiency of the cleavage reaction. The kobs of LINE-1-core was ~0.04 min-1 when measured in 10 mM MgCl2 and 100 mM KCl at pH 7.5. Cleavage activities of the two lantern ribozymes were highly dependent on the concentration of divalent metal ions. The steep increase in the rate constants of the two lantern ribozymes plateaued at Mg2+ concentrations above 100 mM, whereas for Mn2+ it plateaued at 10 mM. Consistent with the PAGE result, LINE-1-core showed higher cleavage rates than OR4K15-core when they were incubated with Mg2+. With Mn2+, however, the cleavage rates of OR4K15-core were lower than those of LINE-1-core only at concentrations below 20 mM, and higher otherwise. Thus, there may be important differences between OR4K15-core and LINE-1-core that explain this different ion preference. A detailed explanation for this difference may require high-resolution structure determination of the lantern ribozymes.
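For readers who want to relate kobs to a time course, a first-order cleavage reaction can be sketched as below. The single-exponential form with a plateau (f_max) and the log-linear estimate of kobs are assumptions made for illustration; the source states only that a first-order rate constant was used, and the function names are hypothetical.

```python
import math


def fraction_cleaved(t_min, k_obs, f_max=1.0):
    """Fraction cleaved at time t (min) for a first-order reaction with plateau f_max."""
    return f_max * (1.0 - math.exp(-k_obs * t_min))


def half_life(k_obs):
    """Time (min) needed to reach half of the plateau for a first-order reaction."""
    return math.log(2.0) / k_obs


def estimate_k_obs(times, fractions, f_max=1.0):
    """Least-squares slope through the origin of -ln(1 - f/f_max) versus t.

    Points at or above the plateau are skipped because the logarithm is undefined there.
    """
    num = 0.0
    den = 0.0
    for t, f in zip(times, fractions):
        if f >= f_max:
            continue
        y = -math.log(1.0 - f / f_max)
        num += t * y
        den += t * t
    if den == 0.0:
        raise ValueError("need at least one usable time point with t > 0")
    return num / den
```

With the kobs of ~0.04 min-1 reported for LINE-1-core in 10 mM MgCl2, `half_life(0.04)` gives a half-time of about 17 minutes under this model.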