1. Designing and building a versatile toolkit for metabolic engineering
1.1 Modular architecture of the toolkit. To unlock the full potential of CRISPR/Cas9 technology for metabolic engineering, we developed a toolkit with a new structure based on 7 individual modules, which expands on previously established GG assembly systems23,37,38,43. Each module comprises a specific type of molecular operation that modifies the final vector. Each operation is either an in vitro one-pot GG reaction or an in vivo recombination in E. coli. Each step requires two days to prepare constructs for the next modification step or for yeast transformation (Section 1.1 of Supplementary Manual). Some operations, e.g. GG assembly and marker excision, can be combined in a single step. The modules have been designed to be used in different orders and combinations, which allows the maximum degree of freedom while reducing the number of steps required to obtain a desired construct (Fig. 1).
The Lvl0 (level 0) Module is formed by a collection of individual genetic elements, which can be easily expanded. In the Exp (Expression) Module these parts can be combined into a variety of integrative overexpression vectors with the desired combinations of promoters, genes, terminators, HAs, and selectable markers. The Pro (Promoter) Module enables the characterisation of new promoters in yeast and is fully compatible with the Exp Module. The Del (Deletion) Module allows the single-step assembly of marker-based constructs for gene disruption. The Int (Integration) Module implements the fast redirection of pre-assembled overexpression constructs to alternative genome loci via HA exchange. The MEx (Marker Excision) Module was designed to allow the generation of a marker-free version of any integration construct using a dedicated assembly host, EcoCre. The separate Cas Module supports the simplified recombineering-based assembly of Cas9-helpers using the assembly host EcoRed. The resulting helpers can cut the yeast genome at any defined position, enabling marker-free integration.
To verify the functionality and compatibility of these new modules we have arranged an initial example toolkit for metabolic engineering of Y. lipolytica. The basic set comprises 147 plasmids, two E. coli and three Y. lipolytica strains (Tables S8 and S9 of Supplementary Manual).
1.2 (Lvl0 Module and Exp Module) Assembly of basic parts and overexpression constructs. These two modules are the central part of the toolkit and consist of a modified three-level system adapted from the S. cerevisiae MoClo toolkit23. To permit the interchangeability of parts between labs, restriction sites and overhangs were designed to be compatible with previous Y. lipolytica GG toolkits37,38,43. The basic set of Lvl0 plasmids contains 40 different parts, including promoters, genes, terminators, selectable markers, HAs, as well as vector backbones with E. coli fluorescent reporters.
The Exp Module is the only module that may include multiple assembly steps. Depending on the strategy employed, an empty vector must be chosen with the preferred marker, integration locus (i.e. HAs), level and sublevel, as explained below. Each empty Lvl1 vector can be used to assemble a single TU using Lvl0 plasmids with the promoter, gene, and terminator of interest. Each assembled Lvl1 cassette can be used either to overexpress a single gene or to assemble a Lvl2 plasmid containing multiple TUs. The sublevel of the Lvl1 plasmids (1, 2 and 3) determines the position of the TU in the subsequent Lvl2 assembly. The sublevel of the Lvl2 plasmid (2 and 3) indicates the number of TUs that may be assembled on it. The provided toolkit has the capacity to assemble up to three TUs together, but this number may be further increased in the future if needed37.
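The sublevel rule described above can be illustrated with a short Python sketch. It is not part of the toolkit: the function name `order_tus_for_lvl2` and the TU labels are hypothetical, and the requirement that a Lvl2.n vector receives exactly the Lvl1 sublevels 1 to n is our reading of the text rather than a documented constraint.

```python
def order_tus_for_lvl2(lvl1_cassettes, lvl2_sublevel):
    """Return TU names in their Lvl2 order.

    lvl1_cassettes maps the Lvl1 sublevel (1, 2 or 3) to a TU name.
    lvl2_sublevel is 2 or 3, the number of TUs the Lvl2 vector accepts.
    """
    if lvl2_sublevel not in (2, 3):
        raise ValueError("Lvl2 sublevel must be 2 or 3")
    # Assumption: positions must be exactly 1..n with no gaps or extras.
    expected = set(range(1, lvl2_sublevel + 1))
    if set(lvl1_cassettes) != expected:
        raise ValueError(
            f"Lvl2.{lvl2_sublevel} needs Lvl1 sublevels {sorted(expected)}, "
            f"got {sorted(lvl1_cassettes)}"
        )
    # The Lvl1 sublevel fixes the position of each TU in the Lvl2 assembly.
    return [lvl1_cassettes[pos] for pos in sorted(lvl1_cassettes)]
```

For example, `order_tus_for_lvl2({2: "TU_b", 1: "TU_a", 3: "TU_c"}, 3)` returns the TUs ordered by their Lvl1 sublevel, while a set of cassettes with a missing position is rejected.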
Each Lvl1 or Lvl2 plasmid contains specific 500-bp HAs allowing integration of the overexpression cassettes at a defined genome locus. Initially, 50 intergenic regions in Y. lipolytica W29 were selected, of which 16 were functionally characterised and used for the assembly of a further 80 empty Lvl1 and Lvl2 vectors of different sublevels (Sections 1.3 and 2.1 of Supplementary Manual). The loci on each chromosome have been numbered and named accordingly as IntA1, IntA2, etc. An extra set of vectors has been supplied with Zeta sequences that can be used for random integration44. In order to keep a standard nomenclature, we designed a naming system that abbreviates all the required information for each vector (Fig. S7 and S9 of Supplementary Manual). For example, pE8US1.1 is an empty Lvl1.1 vector with HAs for integration at locus 8 on chromosome E, containing the URA3 marker and a spectinomycin resistance marker.
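As an illustration of the naming system, the following Python sketch decodes the names of empty vectors. The pattern is inferred only from names quoted in this text (pE8US1.1, pZUA2.3, pC2US1.1, pD12US1.1); the full scheme is defined in Fig. S7 and S9 of the Supplementary Manual and may contain cases not covered here. The function name `parse_vector_name` is hypothetical, and marker letters other than U (URA3) and S (spectinomycin) are deliberately left undecoded.

```python
import re

# Pattern inferred from example names in the text; an assumption, not the
# official grammar of the naming system.
NAME_RE = re.compile(
    r"^p(?P<site>Z|[A-F]\d+)"   # Zeta flanks, or chromosome letter + locus number
    r"(?P<yeast>[A-Z])"         # yeast selectable marker code
    r"(?P<bact>[A-Z])"          # bacterial resistance code
    r"(?P<level>[12])\.(?P<sublevel>[1-3])$"
)

# Only the codes explained in the text are decoded.
YEAST_MARKERS = {"U": "URA3"}
BACTERIAL_MARKERS = {"S": "spectinomycin"}


def parse_vector_name(name):
    """Decode an empty Lvl1/Lvl2 vector name such as 'pE8US1.1'."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised vector name: {name!r}")
    site = m.group("site")
    if site == "Z":
        target, chromosome, locus = "Zeta", None, None
    else:
        target, chromosome, locus = f"Int{site}", site[0], int(site[1:])
    yeast, bact = m.group("yeast"), m.group("bact")
    return {
        "target": target,
        "chromosome": chromosome,
        "locus": locus,
        "yeast_marker": YEAST_MARKERS.get(yeast, yeast),
        "bacterial_marker": BACTERIAL_MARKERS.get(bact, bact),
        "level": int(m.group("level")),
        "sublevel": int(m.group("sublevel")),
    }
```

Calling `parse_vector_name("pE8US1.1")` yields target `IntE8`, chromosome `E`, locus `8`, markers `URA3` and `spectinomycin`, level `1` and sublevel `1`, matching the description above.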
1.3 (Pro Module) Assembly of new promoters. The Pro Module consists of a single empty vector, pProUA-mScarlet. This vector contains the hrGFP gene under the regulation of the inserted promoter, which permits fluorescence-based activity assays. Consequently, any promoter placed on this vector becomes immediately available for both assembly of TUs and integration in Y. lipolytica for functional characterisation (Section 2.2). Using the promoter of ALK1 in Y. lipolytica as an example, the observed assembly efficiency of the pProUA-ALK1 plasmid was 100% (18/18) (Table S17 of Supplementary Experiments).
1.4 (Del Module) Assembly of gene disruption constructs. The Del Module is based on a single vector designed for the one-pot assembly of pDel-series plasmids that allow gene disruptions. The empty vector pDelUK-RG contains the URA3 cassette flanked by RFP (mScarlet) and sfGFP genes designed for bacterial expression. During the GG assembly both reporters are replaced by PCR-amplified HAs flanking the gene that needs to be removed. The correct clones can be visually screened by the absence of both fluorophores (Fig. S17 of Supplementary Manual). As an example, the pDelUK-AAT1 vector was assembled for the disruption of the AAT1 gene. Among the GFP/RFP-negative colonies, 50% (9/18) showed the correct structure by restriction analysis (Supplementary Experiments).
1.5 (Int Module) Exchange of HAs between the constructs. The toolkit was designed to allow switching between different HAs with already-assembled TUs. This feature was achieved by introducing two Type IIS AarI sites separating the vector backbone with HAs from the central section harbouring TUs. This allows reversible GG part exchange between empty and assembled overexpression vectors, which can be selected by altered antibiotic resistance and dropout of sfGFP (Fig. S19 of Supplementary Manual). To demonstrate the efficiency of HA exchange, we transferred three TUs from pZUA2.3-HPD1-ARO4-ARO7 (with Zeta integration flanks) to the empty vector pE8US1.1 for integration at the IntE8 locus. All transformants (100%, 8/8) selected by spectinomycin resistance and GFP-negative phenotype carried the plasmid pE8US-HPD1-ARO4-ARO7.
In order to make the HAs from the Del Module available for the integration of overexpression constructs, the URA3 marker on the pDel-series was flanked by two AarI sites, which enable drop-out of this marker. Performing a GG reaction enables the irreversible transfer of TUs from Lvl1 and Lvl2 vectors into assembled pDel-series plasmids bearing HAs of a target gene (Fig. S20 of Supplementary Manual). Using the constructs pE8US-HPD1-ARO4-ARO7 and pDelUK-AAT1, we assembled the plasmid pDelUK-AAT1::HPD1-ARO4-ARO7 with an efficiency of 58.3% (7/12).
1.6 (MEx Module) Excision of the yeast selectable marker. The MEx Module permits a quick switch from marker-free to marker-based integration without additional GG steps. This is achieved by the transformation of any required GG reaction mixture (i.e. from Exp, Del or Int Modules) into two different E. coli strains. One of these strains is a regular transformation host (e.g. DH5alpha), while the other is the EcoCre strain, engineered to overexpress Cre recombinase (Section 1.6 of Supplementary Manual). Since the marker sequence is flanked by Lox66 and Lox71 sites45, this part is instantly eliminated in EcoCre cells. Assembling both versions of a plasmid in a single step allows us to immediately revert to a marker-based approach in cases where marker-free integration is not efficient. The URA3 excision using the EcoCre strain with the plasmids pC2US1.1-hrGFP, pD12US1.1-hrGFP, and pE8US1.1-hrGFP showed an efficiency of 100% (24/24) (Supplementary Experiments).
1.7 (Cas Module) Re-encoding of gRNA on the Cas9-helper plasmid. The Cas9-helper is an episomal vector that provides nourseothricin (Nat) resistance and expresses both Cas9 and the guide RNA (gRNA). This module enables the assembly of a new gRNA using a single 90-base oligonucleotide encoding the 20 bases required for sequence recognition. This oligonucleotide is co-transformed with the empty helper pCasNA-RK into the recombineering E. coli strain EcoRed. Since pCasNA-RK contains a counter-selectable cassette (rpsL-kanR), desired recombinants can be isolated by streptomycin resistance. The assembly efficiency of Cas9-helpers was tested using three randomly generated 20-base recognition sequences, which resulted in the plasmids pCasNA-Rdm1, pCasNA-Rdm2 and pCasNA-Rdm3 (Table S17 of Supplementary Experiments). The percentage of correct clones was 79.4% (27/34). The toolkit contains a set of pre-assembled Cas9-helpers for marker-free integration into 16 standard loci (Table S8 of Supplementary Manual).
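The design of the 90-base oligonucleotide can be sketched in Python as follows. Only the total length (90 bases) and the recognition sequence length (20 bases) come from the text. The symmetric 35-base flanks are an assumption made for illustration, and the flank sequences are written as runs of N because the actual homology to pCasNA-RK is not given here. The function name `build_grna_oligo` is hypothetical.

```python
OLIGO_LENGTH = 90         # from the text
PROTOSPACER_LENGTH = 20   # from the text
# Assumption: the remaining 70 bases are split into two equal flanks.
ARM_LENGTH = (OLIGO_LENGTH - PROTOSPACER_LENGTH) // 2

# Placeholders only: real flanks must match the helper plasmid sequence.
LEFT_ARM = "N" * ARM_LENGTH
RIGHT_ARM = "N" * ARM_LENGTH


def build_grna_oligo(protospacer, left_arm=LEFT_ARM, right_arm=RIGHT_ARM):
    """Embed a 20-base recognition sequence between two flanking arms."""
    seq = protospacer.upper()
    if len(seq) != PROTOSPACER_LENGTH or set(seq) - set("ACGT"):
        raise ValueError("recognition sequence must be 20 bases of A, C, G or T")
    oligo = left_arm + seq + right_arm
    if len(oligo) != OLIGO_LENGTH:
        raise ValueError(f"oligo is {len(oligo)} bases, expected {OLIGO_LENGTH}")
    return oligo
```

With real flank sequences substituted for the placeholders, the function returns a 90-base string in which the recognition sequence occupies the central 20 positions.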
2. Application of the toolkit for yeast engineering
2.1 Marker-free gene disruption and overexpression. The efficiency of marker-free knockouts was assessed using the ARO8 and ARO9 genes. A Y. lipolytica Ku70-mutant strain was co-transformed with different combinations of Cas9-helper and marker-free gene disruption cassette. Three alternative gRNA sequences were tested for each gene. The disruption efficiencies in separate experiments varied between 50% and 100%, while the overall average was 77.8% (28/36) (Section 6.2 of Supplementary Experiments).
To check the efficiency of marker-free integration of overexpression constructs, the loci IntC2, IntD12, and IntE8 were selected. Accordingly, three marker-free Lvl1 plasmids harbouring the hrGFP gene under the TEF1 promoter were co-transformed with the corresponding Cas9-helpers into a prototrophic Ku70-mutant Y. lipolytica strain. Several fast-growing colonies, selected by Nat resistance, from each transformation experiment were verified by colony PCR and assayed for green fluorescence. The efficiency of marker-free integration varied between 44.4% and 88.9%, with an average of 64% (32/50) (Supplementary Experiments).
2.2 Promoter library characterisations. To test the new screening system, several promoter libraries were generated. As a source of promoters, we chose Y. lipolytica ribosomal genes encoding proteins of the large (38 genes) and small (26 genes) subunits, as well as 29 other genes expected to be highly expressed (Section 8.1 of Supplementary Experiments). In addition, a library of 43 hybrid promoters was designed by combining the upstream regions of genes from several different yeast species, including Candida hispaniensis, Kluyveromyces lactis, Kluyveromyces marxianus, Komagataella phaffii, Ogataea polymorpha, S. cerevisiae, and Y. lipolytica (Section 8.2 of Supplementary Experiments). All tested promoters were assembled as pPro-series plasmids and integrated at the IntC2 locus of a Ura– Ku70-mutant strain using a Cas9-helper and Nat selection. Transformants were first verified by uracil prototrophy and then confirmed by PCR. Strong promoters were visually screened by biomass fluorescence. The relative strength of each promoter was measured using either a plate reader or flow cytometry during the exponential growth phase in two different media (Fig. 2). We identified a variety of promoters of different strengths, which significantly expands the number of characterised promoters for Y. lipolytica. Interestingly, we identified 7 promoters stronger than TEF: 5 hybrid and 2 ribosomal promoters.
2.3 Metabolic pathway engineering. To prove the utility of our enhanced CRISPR/Cas9 methodology, we decided to create an HGA-producing Y. lipolytica strain through rational engineering. Under alkaline conditions HGA is spontaneously oxidised and self-polymerises to form pyomelanin, which is an excellent constituent of natural sunscreens and cosmetics46. In addition, HGA is the biochemical precursor of two different families of high-value molecules, plastoquinols and tocopherols47. Despite its high commercial potential, the available technologies for the production of HGA and pyomelanin rely on the biotransformation of expensive aromatic amino acids48. Aromatics are biosynthesised through the shikimate pathway, which starts from two intermediates of central metabolism, phosphoenolpyruvate and erythrose-4-phosphate. Tyrosine shares with HGA the common intermediate 4-hydroxyphenylpyruvate, a direct precursor for both molecules (Fig. 3a).
First, we selected several genes encoding putative aromatic aminotransferases as targets for engineering. This activity draws carbon flow away from HGA and leads to the accumulation of tyrosine and phenylalanine. Furthermore, these two amino acids are involved in the feedback inhibition of several steps of the shikimate pathway49,50. Most organisms possess two types of aromatic aminotransferases, which show similarity either to the Aro8 protein of S. cerevisiae or to TyrB of E. coli, which is closely related to Aat1 of Aspergillus sp51,52. In Y. lipolytica, we observed two paralogues of each type, named ARO8 and ARO9, and AAT1 and AAT2, respectively. Using the marker-free integration approach, we disrupted three (ARO8, ARO9, and AAT2) of the four putative aminotransferase genes in a Ku70-mutant and isolated strains with all possible combinations of these three deletions (Fig. S29 of Supplementary Experiments). However, marker-free disruption of the fourth aminotransferase gene (AAT1) in the triple mutant was not successful despite numerous attempts. Notably, the same deletion of the AAT1 gene worked well in the parental strain. This result suggested that the inactivation of the fourth gene adversely affected the viability of Y. lipolytica. In order to increase the selection pressure, we switched to a marker-based construct containing the URA3 gene. In accordance with our assumptions, application of the auxotrophic marker allowed us to isolate a strain with all four gene disruptions, as well as other strains with combinations of these four deletions. A slow-growth phenotype was always observed in transformation experiments combining deletions of the AAT1 and AAT2 genes (Fig. S31 of Supplementary Experiments). Hence, these results suggest an epistatic interaction (also known as synthetic enhancement18) between these two genes.
Next, we selected three overexpression targets with the potential to enhance the de novo synthesis of HGA: ARO4, ARO7 and HPD1. We chose the mutated S. cerevisiae genes ARO4K229L and ARO7G141S, which encode feedback-resistant enzymes known to enhance the metabolic flux through the shikimate pathway53,54. The third gene, HPD1, encodes a 4-hydroxyphenylpyruvate dioxygenase of Y. lipolytica55. All three genes were assembled with the strong TEF1 promoter in Lvl1 vectors and combined in a Lvl2 cassette. Attempts to integrate all three TUs together using a marker-free approach did not result in the isolation of the desired transformants regardless of the recipient strain used, i.e. with or without aminotransferase gene deletions. Transformation with each of the three genes separately using a marker-free approach allowed us to overexpress the ARO7G141S and HPD1 genes, while no colonies were isolated with ARO4K229L (Section 7.1 of Supplementary Experiments). At the same time, a control experiment using a marker-based construct enabled ARO4K229L overexpression; however, correct clones were always smaller than incorrect transformants (Fig. S32 of Supplementary Experiments). Due to the growth defects caused by both ARO4K229L overexpression and the ∆aat1∆aat2 double deletion, we decided to assemble these two modifications together using combined CRISPR/Cas9 and marker-based selection. Both the triple (∆aro8∆aro9∆aat2) and quadruple (∆aro8∆aro9∆aat2∆aat1) mutant strains were co-transformed with the marker-based construct (pE8US-HPD1-ARO4-ARO7) and the corresponding Cas9-helper (pCasNA-IntE8). We first selected for the helper plasmid in liquid medium with Nat, and then for the integrative construct on solid medium without uracil. This method enabled the isolation of both triple and quadruple aminotransferase mutant derivatives overexpressing the three required genes, designated S946 and S948, respectively.
Finally, we decided to investigate and inactivate the degradation pathway of HGA. However, this pathway was still unknown in Y. lipolytica. In most organisms, including fungi, the HGA degradation pathway starts with the activity of homogentisate 1,2-dioxygenase56. Detailed analysis of the Y. lipolytica W29 genome and resequencing of the selected region allowed us to identify an ORF missed during the previous whole-genome analysis. We designated this ORF HMG2; it encodes a protein that is highly similar to homogentisate 1,2-dioxygenase from other species (GenBank accession number MZ387986). This gene is the last unique sequence on chromosome D and is followed by repetitive elements. Interestingly, next to this gene we also identified two genes encoding a putative fumarylacetoacetate hydrolase and a putative glutathione S-transferase, which together with HMG2 are suggested to be involved in HGA degradation in other fungi57. Furthermore, it is well documented that Y. lipolytica frequently produces mutants secreting a brown pigment58,59, which could be pyomelanin formed from components of rich media. Given the position of the HMG2 gene, we anticipated that such mutants might be associated with spontaneous truncation of this telomeric region. Therefore, we decided to induce such a truncation artificially using a Cas9-helper that cuts inside the HMG2 gene. Indeed, transformation of a strain with a wild-type background induced intensive formation of brown pigment on rich media (Fig. S34 of Supplementary Experiments). Following this, we induced similar truncations in the strains S946 and S948, which led to the creation of S997 and S987, respectively. We found that both strains were able to synthesise HGA and pyomelanin de novo on media with 9% glucose and 2% citrate as sole carbon sources, respectively (Fig. 3c and 3d).
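The efficiencies reported throughout this section are simple ratios of correct clones to screened clones. The Python sketch below reproduces that arithmetic and, as an addition not used in the original analysis, estimates a 95% Wilson score interval to indicate the uncertainty associated with small sample sizes. The function names are hypothetical.

```python
from math import sqrt


def _check_counts(correct, total):
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")


def efficiency_percent(correct, total):
    """Percentage of correct clones among all screened clones."""
    _check_counts(correct, total)
    return 100.0 * correct / total


def wilson_interval(correct, total, z=1.959964):
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    _check_counts(correct, total)
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half
```

For example, `efficiency_percent(32, 50)` gives 64.0, the average marker-free integration efficiency reported above, and `wilson_interval(32, 50)` returns the corresponding lower and upper bounds as fractions.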
After 14 days of incubation in minimal medium with 9% glucose, the strain S997, a derivative of the triple aminotransferase mutant, was the best producer with 373.8 mg/L of HGA, while S987 produced 339.1 mg/L of HGA.