Development of Mobius Assembly for Plant Systems (MAPS)
MAPS is an extension of our molecular cloning framework Mobius Assembly26 to enable transformation and expression of transgenes in plant systems. MAPS has a universal acceptor vector (mUAV) at Level 0 to house a standard part in the Phytobrick format. Level 0 parts are combined at Level 1, and up to four Level 1 constructs are then combined to make Level 2 constructs. MAPS follows a linear cloning strategy until Level 2 and then iterates between two cloning levels (Level 1 and Level 2) for quadruple augmentation of cloning units each time (Fig. 1a). Using the rare cutter AarI (PaqCI), as opposed to frequently used restriction enzymes that recognize shorter sequences (e.g., BsmBI or BpiI), reduces the need for removing internal restriction sites (i.e., domestication).
Initially, we developed pGreen-based vectors but encountered issues with large constructs, consistent with previously reported instability34,35. To address this, we created a new small plant binary vector called pMAP, based on the pLX architecture (Fig. 1b). This vector is suitable for transient expression in protoplasts, cell culture, and tissues/organs, as well as for stable whole-plant transformation. We devised a new origin of replication by fusing the pWKS1 and pUC19 Ori sequences. The pWKS1 Ori, derived from Paracoccus pantotrophus DSM 1107236, is functional in Rhizobium but not in E. coli, so we fused it with the minimal stable pUC Ori from pUC19. pMAP also has two Left Border (LB) sequences instead of one. Rhizobium-mediated gene transfer proceeds from the Right Border (RB) to the LB, and having two LBs suppresses backbone transfer to the plant genome37. Another feature of pMAP is that terminators flank the LB and RB sequences to isolate transgene activity from the plasmid backbone. Colour markers, antibiotic selections, and restriction enzymes used for Mobius Assembly at different levels are summarized in Fig. 1c.
The MAPS vector toolkit consists of a core set of pMAP cloning/destination vectors (Level 1 and Level 2 Acceptor Vectors, four variations Α-Δ for each level), which carry the fusion origin of replication and therefore replicate in both E. coli and Rhizobium. The mUAV and the seven Auxiliary plasmids are also included, as described for the original Mobius Assembly kit26. The MAPS toolkit also contains a selection of plant promoters, terminators, antibiotic resistance genes, and visible reporter genes (bioluminescent and fluorescent proteins). All MAPS plasmids are listed in Supplementary Fig. S1 and Supplementary Tables S1 and S2, and are available through Addgene (https://www.addgene.org/browse/article/28211394/).
Based on user feedback, we made two specific improvements to the Mobius Assembly vectors. A few users of the original Mobius Assembly kit reported that the chromoprotein selection had been lost in their clones. Upon investigation, we observed independent transposon insertion events in the promoter of the chromoprotein genes, which we hypothesize are a response to the stress imposed by chromoprotein production (Supplementary Fig. S2). To solve the problem, we replaced the chromogenic marker protein (spisPINK) with a red fluorescent protein (mScarlet-I). We also noticed that some Mobius Assembly clones showed growth retardation in the selection media and identified plasmid dimerization as the cause of this instability. Therefore, we introduced a 240-bp cer domain, which recognizes dimers and triggers recombination, helping to keep the plasmids in the monomeric state (Supplementary Fig. S3)38.
Combinatorial DNA libraries are crucial for part characterization, as well as in applications such as biosynthetic pathway optimization39, but their manual construction is time-consuming and resource-intensive. To aid combinatorial library construction, we developed the 'MethylAble' feature in Mobius Assembly, which allows standard part variants (Level 0) to be introduced at specific sites in single or multi-gene constructs (Supplementary Fig. S4). MethylAble exploits the DNA methylation sensitivity of BsaI to mask its recognition sites by cytosine methylation during Level 1 cloning. We designed an amilCP expression cassette with divergent and convergent BsaI recognition sites, where CpG methylation blocks BsaI digestion only at divergent sites, allowing insertion of Level 0 parts into premade Level 1 constructs. Correct constructs show a purple color from amilCP until the Level 0 parts replace the cassette. As a proof of concept, the MethylAble protocol was used to build a library of the three inducible promoters (see below), each of which was combined with the 14 terminator parts, making 42 constructs in total (Supplementary Table S3). MethylAble presents a novel strategy to create construct libraries and can be implemented in any Golden Gate framework that uses BsaI, not only in Mobius Assembly.
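The size of the proof-of-concept library follows directly from the full promoter-by-terminator cross. The sketch below only enumerates construct labels; the promoter names are taken from the text, whereas the terminator labels are placeholders (the actual 14 terminators are listed in the supplementary table), and the function name `build_library` is an illustrative assumption rather than part of any MAPS software.

```python
from itertools import product

# Inducible promoters named in the text.
inducible_promoters = ["pOp6", "lexA", "alcSynth"]

# Placeholder labels: the real terminator names are not enumerated here.
terminators = [f"terminator_{i:02d}" for i in range(1, 15)]


def build_library(promoters, terminators, cds="NLuc"):
    """Return one promoter:CDS:terminator label per combination."""
    return [f"{p}:{cds}:{t}" for p, t in product(promoters, terminators)]


library = build_library(inducible_promoters, terminators)
print(len(library))  # 3 promoters x 14 terminators
```

Each label corresponds to one Level 1 construct in which only the terminator position is varied for a given promoter.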
Designing, building, and testing MAPS promoter/terminator standard parts
To select new ‘constitutive’ promoter and terminator parts, we chose ubiquitously expressed genes that are likely to have strong expression in different tissue types (Supplementary Table S2). The promoters and terminators were characterized with a transient gene expression assay based on Arabidopsis mesophyll protoplasts and PEG transformation. We optimized parameters throughout a published protocol40 to improve transformation efficiency and reproducibility. We consistently reached up to 70% transformation efficiency (Supplementary Fig. S5), which was high enough to adapt the assay to plate reader measurements in a 96-well format.
To evaluate promoter/terminator activity levels, we used a dual luciferase system with the highly sensitive nano luciferase (NLuc) as the reporter and firefly luciferase (FLuc) for normalization41. To account for possible batch-to-batch differences in overall protoplast transformation efficiency, we included the FLuc gene (UBQ10-FLuc:UBQ5) in each construct and calculated the NLuc/FLuc ratio as RLU (Relative Light Unit).
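The normalization described above can be sketched as a simple ratio. The raw readings below are hypothetical values chosen only to show that two batches with different transformation efficiencies give the same RLU; the function name `relative_light_units` is likewise an assumption for illustration.

```python
def relative_light_units(nluc, fluc):
    """Normalize the NLuc signal to the FLuc signal from the same sample."""
    if fluc <= 0:
        raise ValueError("FLuc signal must be positive to normalize")
    return nluc / fluc


# Hypothetical raw luminescence counts (NLuc, FLuc) for the same construct.
# batch_B was transformed half as efficiently, so both signals are halved.
wells = {"batch_A": (52000.0, 400.0), "batch_B": (26000.0, 200.0)}

rlu = {name: relative_light_units(n, f) for name, (n, f) in wells.items()}
print(rlu)
```

Because both reporters sit on the same construct, a drop in transformation efficiency scales the numerator and denominator together and cancels in the ratio.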
For promoter testing, seventeen promoters drove NLuc expression, with termination by either the NDUFA8 or the HSP terminator (Promoter:NLuc:HSP/NDUFA8). The UBQ10 promoter exhibited by far the highest expression activity among the promoters, followed by MAS (Fig. 2a,b). The HSP terminator increased gene expression for all promoters except TUB9. Two of the newly isolated promoters, UBQ11 and UBQ4, matched or exceeded the activity of the 35S and OCS promoters. Furthermore, the newly isolated promoters ACT7, TUB2, TUB9, APT1, ACT2 and LEC2 outperformed the commonly used NOS promoter. The FAD2 and NDUFA8 promoters had the lowest expression.
For terminator evaluation, NLuc expression was driven by a strong (UBQ10) or weak (NDUFA8) promoter and one of the 14 terminators (UBQ10/NDUFA8:NLuc:Terminators). Depending on the paired terminator, luciferase expression varied by 5.3-fold for the UBQ10 promoter and 6.3-fold for the NDUFA8 promoter (Fig. 2d,e). For the strong UBQ10 promoter, the FAD2 terminator had the highest activity (547.9 RLU), while the NOS terminator had the lowest (103.9 RLU). For the weak NDUFA8 promoter, the HSP terminator led to the highest expression (0.327 RLU) and APT1 to the lowest (0.052 RLU).
Since it was surprising to see such a wide range of gene expression levels produced by different terminators under the same promoter, we extended terminator characterization to the three chemically inducible systems popularly used in plant sciences: the dexamethasone (Dex), estradiol, and ethanol inducible systems42–44. They are all based on two-component mechanisms involving at least two transcriptional units. The exogenously applied chemical activates a transcription factor, which in turn transactivates the downstream target genes. The target genes are driven by specific promoters that contain binding sites for the transactivator (pOp6, lexA, and alcSynth); hence the promoters cannot be changed, while the terminators can be.
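As a worked check of the fold ranges reported for the constitutive promoters, the sketch below divides the highest by the lowest RLU value for each promoter. Only the extreme values quoted in the text are used; the helper name `fold_range` is an illustrative assumption.

```python
def fold_range(values):
    """Return the ratio of the largest to the smallest value."""
    values = list(values)
    lowest, highest = min(values), max(values)
    if lowest <= 0:
        raise ValueError("values must be positive")
    return highest / lowest


# Extreme RLU values quoted in the text (terminator: RLU).
reported = {
    "UBQ10": {"FAD2": 547.9, "NOS": 103.9},
    "NDUFA8": {"HSP": 0.327, "APT1": 0.052},
}

for promoter, rlu in reported.items():
    print(f"{promoter}: {fold_range(rlu.values()):.1f}-fold")
```

Although the absolute RLU values differ by roughly three orders of magnitude between the two promoters, the relative spread introduced by the terminators is of similar size.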
Interestingly, the different terminators resulted in more uniform NLuc expression, with a 2.9- and 2.2-fold range in expression levels for the Dex and estradiol inducible promoters (pOp6-35S and lexA-35S), respectively (Fig. 2h,i). For the Dex system, RLU ranged from 37.9 with the UBQ5 terminator to 111.6 with the 35S terminator. For the estradiol promoter (lexA), the E9-RbcS and 35S terminators had RLU counts of 11.6 and 25.1, respectively. In contrast, the ethanol inducible promoter (alcSynth) showed a much wider range, with the HSP terminator driving sevenfold higher expression than the LEC2 terminator (Fig. 2j). Both the Dex and estradiol systems showed a basal expression of around 10 RLU, with Dex inducing a ~11-fold and estradiol a ~3-fold activation. The ethanol system had high basal expression (i.e., it was leaky), leading to only a 30% increase in luminescence upon chemical induction.
Promoter-coding sequence-terminator interactions in gene regulation
Reflecting on the observed promoter-terminator interactions, we investigated whether changing the coding sequence also influences promoter/terminator activity. Fluorescent proteins are an alternative visible reporting system to luciferases. However, using fluorescent proteins in a plant chassis can be challenging because plants emit red and green-range autofluorescence from their chloroplasts, and stress-induced blue-range autofluorescence from their cytoplasm45. Over the years, fluorescent proteins with improved brightness and expression dynamics have been developed, but their quantitative efficacy has not been comprehensively characterized in protoplasts. Therefore, we screened fluorescent proteins from four spectral classes (green, red, yellow, and blue) to examine their compatibility with a protoplast system using a plate reader or microscopy (Supplementary Table S4). Generally, expression of fluorescent proteins was detected 6 hours after transformation and plateaued at around 15 hours (Supplementary Fig. S6). Informed by this screening, we selected the two brightest fluorescent proteins from different spectra: sfGFP and mScarlet-I. sfGFP was the main reporter, while mScarlet-I was used as the normalizing gene (similar to FLuc above) and expressed using the UBQ10 promoter and UBQ5 terminator.
Initially, promoters were evaluated with the HSP terminator for sfGFP expression. Unlike with the luciferase reporters, not all promoters drove expression strong enough to be detected with a plate reader (Fig. 2c). The highest expression was driven by the UBQ10 and MAS promoters, followed by the 35S promoter. Readouts from the rest of the promoters could not be distinguished from the background autofluorescence. Therefore, the UBQ10 and MAS promoters were analyzed in combination with all the MAPS terminators (Fig. 2f,g). The HSP terminator was the strongest with both promoters (9.0 and 12.1 RFU for UBQ10 and MAS, respectively). The FAD2 terminator was on the high-expression side for both promoters, along with the UBQ10 promoter-35S terminator and MAS promoter-NOS terminator combinations. The RbcS2b terminator resulted in the lowest expression with both promoters. Overall, a 3-fold expression difference was observed across terminators with the UBQ10 promoter, and the difference was 6-fold with the MAS promoter.
Our part characterization revealed that, although promoter choice can dominate gene expression strength in some cases (e.g., pOp6 and NDUFA8) (Fig. 3a), the activation of a transcriptional unit (TU) by a promoter, terminator, or coding sequence is not independent or additive, but combinatorial and even synergistic (Fig. 3b). A clear example of such all-part interactions is the NOS terminator, which in the luciferase system gave weak expression with UBQ10, strong with NDUFA8, medium with lexA, weak with pOp6, and mid-level with alcSynth. When we switched to a fluorescence-based reporter, the combination of the NOS terminator with the UBQ10 and MAS promoters drove medium and strong expression, respectively. Among the 14 terminators characterized, only four exhibited stable behaviours with both coding sequences: FAD2 and HSP were consistently on the strong side, while LEC2 and RbcS2b tended towards the weak side.
Dissecting the mechanism of the combinatorial gene regulation
Next, we sought insights into how the promoters, coding sequences, and terminators interact in gene regulation. The interactions may regulate gene expression by changing transcript abundance or through post-transcriptional modifications affecting translation. To distinguish these two possibilities, we performed qPCR to examine how the transcript (mRNA) level correlates with the reporter readout. Because transforming enough protoplasts for RNA extraction is laborious, we selected key constructs to test. The HSP and FAD2 terminators were chosen for their consistently strong expression regardless of the promoter and reporter protein sequence. For consistently weak expression, the NOS and RbcS2b terminators were selected for NLuc, whereas the RbcS2b, APT1 and E9-RbcS terminators were chosen for sfGFP. The NOS terminator was chosen because its relative strength varies the most depending on the partnering promoter or coding sequence. Similarly, the 35S terminator was selected for its variability in strength, although it tends to be on the strong side.
The qPCR assay revealed that the consistently strong FAD2 and HSP terminators have significantly higher mRNA levels; similarly, the weak E9-Rbcs2b terminator had significantly lower mRNA levels than the other terminators (Fig. 4a). Generally, NLuc displayed a linear correlation between the measured reporter expression levels and mRNA levels (Fig. 4b), while sfGFP exhibited a looser correlation (Fig. 4c). Surprisingly, the 35S terminator yielded higher reporter expression in both reporter combinations compared to its mRNA levels (Fig. 4b,c). Taken together, the transcript level analysis suggests that the combinatorial gene regulation is partially explained by transcript abundance and likely to involve post-transcriptional (post-mRNA formation) processes in some constructs. The 35S terminator for both NLuc and sfGFP suggested a post-transcriptional enhancer effect, while NLuc:NOS, together with sfGFP:RbcS2b and sfGFP:APT1 constructs indicated post-transcriptional repression.
Transcriptional regulation is mediated by specific DNA sequences that recruit functional proteins to activate or repress processes from transcriptional initiation to stable mRNA formation. Surprised by how effective terminators are in controlling gene expression, we investigated the nucleotide sequence features possibly influencing terminator strength, such as the GC content and presence of likely functional sequences (e.g., the canonical poly-A signal AAUAAA and UGUA motifs) (Fig. 5, Supplementary Fig. S7).
A GC content of approximately 30% has been shown to be optimal for synthetic terminator function, and the average GC content of natural Arabidopsis terminators is about 32.5%46. Among the MAPS terminators, we found no clear correlation between strength and GC content; consistently or predominantly strong terminators [HSP (29.1%), FAD2 (36.5%), and 35S (32%)] had a similar range of GC content to weak or variable terminators [RbcS2b (30%), E9-RbcS (32.8%), APT1 (35%), and NOS (38.7%)] (Fig. 5a). One apparent feature of the strong terminators is that the GC content drops within a 50 bp window located around their dominant poly-A signal and then rises again. The presence of the canonical AAUAAA poly-A signal tends to enhance terminator strength46.
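A local dip in GC content of the kind described above can be located with a sliding-window scan. The following is a minimal sketch, not the analysis pipeline used for Fig. 5a: the 50 bp window matches the text, but the 10 bp step, the function names, and the toy sequence are all assumptions made for illustration.

```python
def gc_percent(seq):
    """Return the GC content of a sequence as a percentage."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def windowed_gc(seq, window=50, step=10):
    """Return (0-based start, GC%) for every full window along the sequence."""
    return [
        (start, gc_percent(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Hypothetical toy sequence: a 50 bp AT-only stretch between GC-only flanks.
toy = "GC" * 30 + "AT" * 25 + "GC" * 30

profile = windowed_gc(toy)
lowest = min(profile, key=lambda item: item[1])
print(lowest)  # window start and GC% of the GC-poorest window
```

Applied to a real terminator, the start coordinate of the GC-poorest window could then be compared with the position of the dominant poly-A signal.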
We also searched for sequence motifs that may be linked to the consistently high strength of the HSP and FAD2 terminators. The presence of the UGUA motif around 30–40 bp upstream of the RNA cleavage site is thought to enhance cleavage and thus increase terminator strength in the 35S terminator46. The HSP terminator has two UGUA motifs, at positions 119 bp and 206 bp, while FAD2 lacks the motif. Interestingly, the XSTREME motif discovery tool identified four motifs that putatively have functional roles a priori (Fig. 5b,c,e). One of them is a 15-bp motif (CAAAUGUUUUGUGUC) found at around 145 bp in both the HSP and FAD2 terminators, which corresponds to the transcript cleavage site. Three other motifs - CUCAUUAUGUUA, UUGUUGUGUUAUGAC, and UUUUUCUAAUAUUA - were found at similar locations in both terminators but around 10–20 bp apart. Only the CUCAUUAUGUUA motif is present in the UBQ5 terminator, at 167 bp (Fig. 5b). The effects of the UUUUU motif are complicated; it decreases terminator strength in maize protoplasts, especially when UUUUU motifs surround a UGUA motif, while U-rich sequences increase terminator strength in tobacco leaves46. Many of these AT/U-rich motifs reside inside the 50 bp low-GC domains described above.
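Positions of short known motifs such as UGUA or AAUAAA can be listed with a plain string scan. This sketch is a simple exact-match search and is not a substitute for the motif discovery performed with XSTREME; the toy sequence and the function name `find_motif` are hypothetical.

```python
import re


def find_motif(seq, motif):
    """Return 1-based start positions of all (overlapping) exact matches.

    The input may be DNA or RNA; T is converted to U before searching.
    """
    rna = seq.upper().replace("T", "U")
    pattern = re.compile(f"(?={re.escape(motif)})")  # lookahead allows overlaps
    return [match.start() + 1 for match in pattern.finditer(rna)]


# Hypothetical 20-nt toy sequence given as DNA, not a real terminator.
toy_terminator = "GGTGTACCAATAAAGGTGTA"

print(find_motif(toy_terminator, "UGUA"))    # two occurrences
print(find_motif(toy_terminator, "AAUAAA"))  # canonical poly-A signal
```

Distances between the reported positions and a known cleavage site can then be computed by simple subtraction.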
We examined similar potentially functional features in the ‘outlier terminators’ – the terminators performing stronger or weaker than expected for their transcript levels (i.e., likely post-transcriptionally regulated) (Fig. 4a,b). The GC content of outlier terminators is moderate and varies between 30.5–35.0% (Fig. 5a). RbcS2b is the only outlier terminator that does not possess the canonical poly-A signal. The 35S terminator has three repeats of the UGUA motif (x3 starting from 96 bp separated by UU). The UUUUU motif is present in the NOS, RbcS2b and APT1 terminators (Supplementary Fig. S6). Therefore, no distinct sequence signatures were identified to differentiate the terminators that are primarily regulated transcriptionally from the outliers.
We then experimentally probed how the sequence features identified above influence terminator activity by generating deletion series of the HSP and FAD2 terminators. Five terminator variants were made to sequentially delete possible regulatory elements, such as poly-A signals, putative destabilization signals, and Musashi binding elements (Fig. 5c-f). Statistical analysis showed a significant difference only between the full-length terminator (240 bp) and FAD2 Sequence 5, which is the shortest variation (70 bp) with no poly-A site, as well as the deletion series Sequence 1 (200 bp). The rest of the HSP series, as well as the FAD2 series, showed no statistically significant difference in reporter expression (Fig. 5c-f). This result suggests that short sequences (30–50 bp) at the 5’ end of terminators might determine gene expression strength, and that the functional elements of terminators remain unresolved, especially in plant contexts.
We also investigated a sequence feature likely to enhance promoter activity. UBQ10 is by far the strongest promoter in the MAPS toolkit (Fig. 2,3), and its TSS region is predicted to be unstructured and single-stranded (Fig. 6). In a survey of the Arabidopsis genome, translation efficiency was found to be higher for transcripts with an unstructured 5’ UTR47. To test whether the strength of the UBQ10 promoter depends on the open loop structure of its 5’ UTR, which consists mostly of the TSS, three mutated versions were created. The first version, Mutated Control (MUTC), carries point mutations that preserve the predicted loop structure; it is therefore expected to behave similarly to the original UBQ10 TSS sequence. Two other versions, a tight stem-loop structure Mutated 1 (MUT1) and a slightly more branched but more loosely stemmed Mutated 2 (MUT2), were also designed with point mutations (Fig. 6a,b). TSSPlant and Softberry software were used to confirm that the mutations do not interfere with the identified motifs or introduce new motifs compared to the WT.
When the TSS variants were used to drive sfGFP with five different terminators (HSP, FAD2, 35S, NOS, and RbcS2b), the results revealed no statistically significant difference in expression between WT and MUTC, as expected (Fig. 6c). However, MUT2 exhibited a reduction in expression by approximately 20% for the HSP and FAD2 terminators. Additionally, MUT1 resulted in a 60% decrease in expression with 35S, while resulting in a 30% increase with the NOS terminator. There was no difference between the three mutated versions and the WT for the RbcS2b samples, where the WT expression is already very weak (Fig. 6c). With the strong terminators, reporter expression was reduced in MUT1 and MUT2 compared to WT and MUTC variants, suggesting that conversion from open loop to stem-loop decreases gene expression.
To gain insights into how promoters, coding sequences, and terminators interact post-transcriptionally, we studied RNA folding. RNA is single-stranded and extensively forms secondary and tertiary structures via hydrogen bonds bringing together (nearly) complementary sequences. Such 2D and 3D structures (e.g., G-quadruplex) can strongly influence translation and protein expression48,49. The combination-dependent regulatory function among the three TU parts may be explained by direct physical interactions through RNA folding.
Using the RNAfold software50, the folding energy of the whole mRNA sequence was calculated for the transcript species we selected for the qPCR analysis above. The transcription start site (TSS) was identified with the TSSPlant software51, and the promoter sequence downstream of the TSS was incorporated, along with the protein-coding and terminator sequences. The lower the folding energy, the tighter the transcript folds and the less likely translation is to occur. No direct correlation was found between the predicted transcript folding energy and mRNA expression levels or between the folding energies and reporter gene expression (Fig. 4d,e). We therefore proceeded to examine the RNA folding structure and local interactions among the nucleotide sequences.
We then visually examined the RNA folding structures in 2D. RNA secondary structure formation was predicted using the ViennaFold2.0 software50, in which the whole transcript (identified as described above) was used as the input. Within the transcript, the strong UBQ10 and MAS promoters are likely to have little interaction with the other parts: no interaction with the HSP and RbcS2b terminators and minimal to medium interactions with the FAD2 terminator were predicted by the software (Fig. 7). On the contrary, NOS, a highly variable terminator, may have strong interactions with the other parts (UBQ10:NLuc) when it drives weak reporter expression. Interestingly, there was no apparent correlation between inducible promoter-terminator structures and reporter output, except for pOp6:NLuc:NOS (Supplementary Fig. S8). In strong and consistent terminators like HSP, we tended to find loops bigger than 20 bp (Fig. 7). When the variable strength-terminator NOS is paired with strong promoters (UBQ10 and MAS), it also may form a large loop structure, though not when combined with other promoters (e.g., NDUFA8) (Fig. 6, Supplementary Fig. S9). RNA sequence-mediated cross-part interactions among the promoters, coding sequences, and terminators could explain the combinatorial gene regulation.