Metabolite profiling of A. chinensis in different tissues and fruit structures
We applied matrix-assisted laser desorption/ionization (MALDI)-mass spectrometry imaging (MSI) to explore the spatial distribution of triterpenoid and coumarin metabolites in epicarps, sarcocarps, and seeds of fruits (Fig. 1A). MALDI-MSI analysis showed that fraxin (m/z 409.0543 [C16H18O10 + K]+) mainly accumulated in the epicarp, while triterpenoid saponins such as aesculiside A (m/z 1129.484 [C54H88O23 + K]+), aesculioside G (m/z 1183.531 [C56H88O24 + K]+), escin Ia (m/z 1169.515 [C55H86O24 + K]+), escin V (m/z 1157.515 [C54H86O24 + K]+), and isoescin IIb (m/z 1139.505 [C54H84O23 + K]+) mainly accumulated in the seeds (Figs. 1B and 1C, Table S1). Additionally, LC-MS was performed to profile metabolites not only in stems, flowers, leaves, epicarps, sarcocarps, and seeds but also across the developmental stages of these tissues, as indicated by collection times. The contents of aesculin and fraxin were significantly higher in leaves and flowers than in epicarps, sarcocarps, and seeds (p < 0.05) (Fig. 1D, Table S2-3). The content of protoescigenin, the main aglycone of aescin, was the highest in stems and seeds collected in October. The contents of escin Ia, escin Ib, and total triterpenoids were the highest in seeds collected in October, followed by seeds collected in September, August, and July, and then by flowers collected in May (p < 0.05) (Fig. 1E, Table S2-3), whereas escin Ia and escin Ib were hardly detected in leaves collected from March to October. These data indicate that the accumulation of these compounds depends on the tissue and on the developmental stage of seeds and flowers.
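The potassium-adduct assignments quoted above can be sanity-checked by recomputing the theoretical m/z from each molecular formula. The following is a minimal sketch, not the workflow used in this study: the function names are hypothetical, the isotope masses are standard monoisotopic values, and only the formulas and reported m/z values are taken from the text.

```python
import re

# Standard monoisotopic masses (u); these constants are not from the text.
MONO = {"C": 12.0, "H": 1.00782503, "O": 15.99491462, "K": 38.96370649}
ELECTRON = 0.00054858  # mass of the electron removed to form a cation

def mono_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C16H18O10'."""
    return sum(MONO[element] * int(count or 1)
               for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula))

def mz_k_adduct(formula):
    """Theoretical m/z of the singly charged [M + K]+ ion."""
    return mono_mass(formula) + MONO["K"] - ELECTRON

def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

# Formulas and reported m/z values as quoted in the text.
reported = {
    "fraxin": ("C16H18O10", 409.0543),
    "escin Ia": ("C55H86O24", 1169.515),
}

for name, (formula, observed) in reported.items():
    theoretical = mz_k_adduct(formula)
    print(f"{name}: theoretical {theoretical:.4f}, reported {observed}, "
          f"error {ppm_error(observed, theoretical):+.1f} ppm")
```

For both ions the recomputed value lies within a few ppm of the reported m/z, which is the order of agreement usually expected for high-resolution instruments.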
Genome assembly and gene annotation of A. chinensis
The size of the A. chinensis genome was predicted to be 481.90 Mb using flow cytometry (Figure S1). A genome survey of A. chinensis based on the 17-mer frequency of Illumina short reads estimated the genome size at 504.28 Mb, with a small heterozygous peak and an obvious repetitive peak, suggesting a low level of heterozygosity (about 0.37%) (Figure S2). In addition, long-read sequencing using Oxford Nanopore Technologies (ONT) yielded 34 Gb of data with ~68× coverage and an N50 length of 11.74 kb (Table S4). After error correction and trimming, the filtered ONT reads were assembled into 656 contigs with a total size of 470.02 Mb, an N50 length of 2.05 Mb, and a longest contig of 8.58 Mb, covering 97.50% of the estimated nuclear genome size (Table S5). The assembled contigs contained many sequencing errors because of the low accuracy of ONT sequencing. Accordingly, this contig-level assembly was polished three times using Illumina short reads, and 97.4% (1,573 out of 1,614) of plant single-copy orthologs were identified using Benchmarking Universal Single-Copy Orthologs (BUSCO) estimation, indicating a high degree of completeness in the polished genome of A. chinensis (Table S6). In addition, Hi-C chromosome conformation capture sequencing generated 344,838,272 raw paired-end reads, of which 58.05% (189,382,086) were mapped to the contig assembly as unique paired-end reads. Among these unique reads, 174,576,440 were captured to guide the pseudo-chromosome assembly. A total of 461.08 Mb (98.09%) of the assembled genome was anchored to 20 pseudo-chromosomes (2n = 40) (Fig. 2A, Figure S3). Table S5 lists the detailed characteristics of the A. chinensis genome.
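For readers unfamiliar with the assembly statistics quoted above, the sketch below shows how an N50 value is defined and recomputes two of the reported percentages from the figures in the text. It is illustrative only: the function name and the toy contig lengths are hypothetical, and the real contig lengths (Table S5) are not reproduced here.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() needs at least one non-zero length")

# Hypothetical toy contig lengths, used only to illustrate the definition.
toy_contigs = [8, 5, 4, 3, 2, 2, 1]
print("toy N50:", n50(toy_contigs))

# Sizes in Mb as quoted in the text.
flow_cytometry_estimate = 481.90
contig_assembly = 470.02
anchored_to_chromosomes = 461.08

print(f"assembly / estimated genome size: {contig_assembly / flow_cytometry_estimate:.2%}")
print(f"anchored / assembly: {anchored_to_chromosomes / contig_assembly:.2%}")
```

Small differences in the last decimal place relative to the reported percentages are expected, because the published values were presumably computed from unrounded base counts.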
Approximately 59.14% (276,695,955 bp) of the A. chinensis genome was annotated as repetitive DNA (Table S5), in line with the transposable element (TE) content of reported Sapindaceae genomes, namely Xanthoceras sorbifolium (56.39%) and Dimocarpus longan (52.87%). Of these repetitive elements, 24.37% (114,021,841 bp) were long terminal repeat (LTR) retrotransposons, of which 98.1% belonged to the Gypsy superfamily (32.7%) and Copia superfamily (65.4%) (Table S7). Moreover, 36,557 protein-coding genes were identified by integrating ab initio gene predictions, homologous protein searches, and de novo assembled transcripts from RNA-seq reads. Of these, 35,790 (97.9%) genes could be located on the 20 pseudo-chromosomes. The identified orthologs covered 95.4% of the embryophyta BUSCOs, indicating that the annotated genome is largely complete (Table S8). We further identified orthologous groups of proteomes from A. chinensis and 13 other Rosids species, including Arabidopsis thaliana, Brassica rapa, Citrus clementina, Cucumis sativus, D. longan, Glycine max, Gossypium raimondii, Malus domestica, Populus trichocarpa, X. sorbifolium, Theobroma cacao, and Vitis vinifera, and obtained a total of 21,299 orthologous groups covering 497,631 genes. We then compared the genomes of these plant species to identify gene families that are significantly expanded in A. chinensis or unique to A. chinensis (Figure S4). Functional prediction showed that the expanded gene families are especially enriched in KEGG pathways of secondary metabolism, such as terpenoid biosynthesis (KO00900: terpenoid backbone biosynthesis, KO00909: sesquiterpenoid and triterpenoid biosynthesis) and phenylpropanoid biosynthesis (KO00940) (Figure S5).
Phylogenomic dating and whole-genome duplication analysis
A total of 139 single-copy genes from 13 Rosids species were selected to construct a high-confidence phylogenetic tree. The phylogenetic trees from both concatenated nucleotide and protein sequences of single-copy genes supported a close relationship of A. chinensis with other sequenced Sapindaceae species, X. sorbifolium Bunge and D. longan Lour. (Fig. 2B). In these analyses, A. chinensis was recovered as a sister lineage to X. sorbifolium, and the two together formed a sister group to D. longan. Molecular dating of the tested lineages using the nucleotide sequences of single-copy genes and fossil age calibrations inferred that the families Sapindaceae and Rutaceae of Sapindales diverged approximately 36.3 MYA, with a 95% confidence interval (CI) of 21.8 to 50.4 MYA. The split between A. chinensis and X. sorbifolium occurred approximately 32.5 MYA, with a 95% CI of 17.5 to 45.4 MYA (Fig. 2B).
Intragenomic collinearity analysis identified at least one whole-genome duplication (WGD) event in the A. chinensis genome (Fig. 2D). Collinearity analyses between A. chinensis and X. sorbifolium, and between A. chinensis and C. clementina, showed that two paralogous segments in A. chinensis corresponded to one orthologous region in X. sorbifolium and C. clementina, respectively (Fig. 2D, Fig. 2E, Figure S6). These results suggest that a species-specific WGD event occurred in A. chinensis after its divergence from the common ancestor of A. chinensis and X. sorbifolium. In addition, the distributions of synonymous substitutions per synonymous site (KS) for paralogous genes and anchor pairs in collinear regions of A. chinensis showed a clear peak at approximately 0.24 and a minor peak around 1.75, suggesting that the A. chinensis genome might have experienced two WGD events (Fig. 2C, Figure S7). Previous studies have suggested that no WGD event occurred in C. clementina (30) and D. longan (31) after the ancestral gamma triplication (γ-WGT) event. Our analysis confirmed that only one KS peak was detected in C. clementina and X. sorbifolium, at approximately 1.5 and 1.75, respectively. This ancestral KS peak, shared by the A. chinensis, C. clementina, and X. sorbifolium genomes, represents the γ-WGT event. The KS distribution of orthologs between A. chinensis and X. sorbifolium showed one KS peak at around 0.25, slightly larger than the KS value of paralogs in A. chinensis. This again suggests that the recent WGD event in A. chinensis happened after the divergence between A. chinensis and X. sorbifolium. Using the divergence time and mean KS value of orthologs between A. chinensis and X. sorbifolium, between A. chinensis and C. clementina, and between A. chinensis and V. vinifera, we estimated that the species-specific WGD event in A. chinensis (Aα) occurred at around 30.8 ± 1.33 MYA (Fig. 2B).
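The dating logic described above can be illustrated with a simple molecular-clock calculation. The sketch below is a simplification under stated assumptions: it assumes a constant synonymous substitution rate, uses only the A. chinensis and X. sorbifolium comparison (the only pair for which both the KS peak and the divergence time are quoted in the text), and the function names are hypothetical. The reported estimate of 30.8 ± 1.33 MYA was derived from three species pairs, so the single-pair value is expected to differ slightly.

```python
def substitution_rate(ks_orthologs, divergence_mya):
    """Synonymous substitutions per site per million years, per lineage.
    The factor 2 accounts for both lineages accumulating substitutions
    since their split."""
    return ks_orthologs / (2.0 * divergence_mya)

def event_age(ks_peak, rate):
    """Age (MYA) of a duplication whose paralogs peak at ks_peak."""
    return ks_peak / (2.0 * rate)

# Values quoted in the text.
ks_orthologs = 0.25    # KS peak, A. chinensis vs X. sorbifolium orthologs
divergence_mya = 32.5  # estimated split of the two species
ks_paralogs = 0.24     # recent KS peak of A. chinensis paralogs

rate = substitution_rate(ks_orthologs, divergence_mya)
age = event_age(ks_paralogs, rate)
print(f"rate = {rate:.5f} substitutions/site/Myr; WGD age = {age:.1f} MYA")
```

Because the paralog KS peak is slightly smaller than the ortholog KS peak, the inferred duplication is slightly younger than the species split, consistent with a WGD specific to A. chinensis.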
Further, we identified 3,358 genes that retained duplicates from the recent Aα event in A. chinensis. The functional annotation and GO enrichment analyses showed that the retained duplicates might be related to the responses of A. chinensis to various stimuli (e.g., biotic stimulus, acid chemical, and stress) and regulation of these responses (Table S9).
The origin of large and diverse taxonomic lineages can be related to ancestral polyploidy events (32). WGD is a major evolutionary force driving phenotypic diversity, speciation, and domestication (33, 34). Examples of WGD-derived plant phenotypic and metabolic diversity linked to species-specific gene expansion include oil biosynthesis in wild olive trees, ursane triterpene synthesis in loquat (Eriobotrya japonica), camptothecin production in camptotheca tree (Camptotheca acuminata), and triptolide content in thunder god vine (Tripterygium wilfordii) (35–37). To investigate the impact of the A. chinensis-specific WGD event on aescin and aesculin biosynthesis, we systematically calculated KS for each duplicated paralogous gene pair, focusing on the upstream pathway genes involved in terpene biosynthesis (i.e., AACT, HMGS, HMGR, MVA, MVK, PMK, MVD, IDI, DXS, DXR, MCT, CMK, MDS, HDS, HDR, FPS, SQS, and SQE) and coumarin biosynthesis (PAL, C4H, C3H, 4CL, and COMT) (Table S10). We found that the Aα WGD led to retention of duplicates only in the terpene pathway, suggesting that metabolic flux may have shifted toward triterpenoid metabolism, resulting in aescin production.
Biosynthetic gene cluster and weighted gene co-expression network analyses (WGCNA) for discovering the escin Ia pathway
Aescins are triterpene saponins, biosynthesized from the 30-carbon intermediate 2,3-oxidosqualene (Figure S8) by the sequential actions of multiple enzymes, including oxidosqualene cyclases (OSCs), cytochrome P450 monooxygenases (CYPs), glycosyltransferases (UGTs), and acyltransferases (BAHDs) (38, 39). The pentacyclic triterpene aescin skeleton is derived from the dammarenyl cation through D and E ring expansion en route to β-amyrin. The β-amyrin scaffold further undergoes site-specific oxidation catalyzed by P450s, forming diverse non-glycosylated aglycones, which are collectively referred to as protoaescigenin (18). Acylation and glycosylation of protoaescigenin contribute to the structural diversification of aescins. After automated annotation of the whole genome and manual revision based on characteristic domains, we identified 21 OSCs, 162 CYP450s, 81 BAHDs, and 173 UGTs in the A. chinensis genome (Tables S11-14). All these P450s have been named and are shown in Table S12. To find gene clusters involved in triterpenoid biosynthesis in the A. chinensis genome, we searched for biosynthetic gene clusters (BGCs) containing skeleton-forming OSCs and/or tailoring enzymes known to act in such metabolism. This led to the discovery of four BGCs containing OSCs and one BGC containing CYP716 genes, designated AcClusters I-V, which are thus implicated in triterpenoid synthesis in the A. chinensis genome (Figure S9). Syntenic gene analysis of A. chinensis paralogs revealed that AcCluster II is physically syntenic with AcCluster I (Fig. 3A, Figure S9). Of these, AcCluster I is located on chromosome 15 and contains two OSC homologs (AcOSC6 and partial AcOSC9), six CYPs (AcCYP716A274, AcCYP716A278, AcCYP716BX1, AcCYP716BX3, AcCYP716BX6, and AcCYP716A276), and five BAHDs (AcBAHD1-AcBAHD5) within a 350-kb region, harboring possible catalytic steps including 2,3-oxidosqualene cyclization, oxidation, and acylation.
RNA-seq data further showed that one OSC (AcOSC6), two CYPs (AcCYP716A278 and AcCYP716BX1), and three BAHD genes (AcBAHD1, AcBAHD3, and AcBAHD5) were abundantly expressed in A. chinensis seeds, in agreement with high aescin accumulation in the seeds (Fig. 3B, Table S15). AcCluster II is located on chromosome 8 and consists of two CYP450 genes (AcCYP716A275 and AcCYP716BX) and one cellulose synthase-like gene (AcCSL1), which putatively participate in protoaescigenin formation via hydroxylation and glucuronidation (Table S15). These three genes also exhibit a seed-specific expression pattern and are co-expressed with the candidate genes localized in AcCluster I (Fig. 3B, Table S15). We reconstructed the phylogenetic relationship of AcCSL1 with characterized and related genes from different plants available in the literature (40). AcCSL1 clusters with a clade of characterized proteins belonging to the CslM subfamily, which are involved in glucuronidation at the C-3 position of oleanane-type triterpenoids, suggesting a similar function in A. chinensis (Figure S10, S11). Further examination revealed no glycosyltransferases in the regions surrounding either AcCluster I or AcCluster II. We therefore performed WGCNA (Figure S11) to retrieve potential genes encoding glycosyltransferases that catalyze the conversion of the glucuronosyl moiety of escin Ia to a diglucoside, and identified nine putative UDP-glycosyltransferases (Table S16) that may attach glycosyl chains to protoescigenin to form escin Ia.
Biochemical identification of AcOSC6, AcCYP716A278, AcCYP716A275, AcCSL1, and AcBAHD3
Heterologous overexpression of full-length AcOSC6 (β-amyrin synthase, AcBAS) in Nicotiana benthamiana, which does not naturally generate β-amyrin, led to the production of β-amyrin in plants, as measured by GC-MS/MS and confirmed by comparing mass spectral fragmentation with an authentic standard (Fig. 3C, 3D and Figure S12). The function of AcOSC6 matches its phylogenetic relationships, as it clusters with other OSCs that exhibit BAS activity (Figure S13). To further explore whether seed-specific CYP716 genes from AcClusters I and II are able to catalyze the early steps of the escin Ia pathway, we co-expressed CYP716A278 or CYP716A275 with AcOSC6, together with AstHMGR to increase metabolic flux to terpenoids, using the N. benthamiana infiltration system. Indeed, both exhibited detectable activity toward β-amyrin, creating new peaks at 14.9 and 13.6 min, respectively. According to the matching MS fragmentation pattern at 14.9 min, the new compound produced by AstHMGR/AcOSC6/AcCYP716A278 is 21β-hydroxy-β-amyrin, which is known to be produced by AstHMGR/AsbAS1/GmCYP72A69 (41) (Fig. 3C, 3E and Figure S12). AcCYP716A278 and GmCYP72A69 were also each transferred into the engineered β-amyrin-producing yeast strain Y1-20-6 (42), giving the same catalytic activity as observed in N. benthamiana (Figure S14). Co-expression of CYP716A275 and AcOSC6 in N. benthamiana yielded a peak at 13.6 min corresponding to 16α-hydroxy-β-amyrin, as similarly verified by comparison with the known product of AsbAS1/GmCYP716Y1 (43) (Fig. 3C, 3F and Figure S12). Several CYP families have been reported to act in triterpenoid oxidation, including CYP93, CYP716, and CYP72 (44–46). In Medicago truncatula, CYP716A subfamily members are involved in catalyzing the conversion of β-amyrin to 28-hydroxy-β-amyrin (47–49). Here, we showed that AcCYP716A275 catalyzes the regio-specific C-16 oxidation of β-amyrin, consistent with the broader function of the CYP716A subfamily in triterpenoid oxidation.
To date, two CYP72A subfamily members in the CYP72 clan, oat AsCYP72A475 and soybean GmCYP72A69, have been characterized as C-21 triterpene oxidases based on their activity towards oleanane-type triterpenoids (34, 50). GmCYP72A69 oxidizes soyasapogenol B and β-amyrin at position C-21, while AsCYP72A475, as a triterpene C-21 hydroxylase, hydroxylates 12,13β-epoxy and 16β-hydroxy-β-amyrin at the C-21 site. However, we found that a CYP716A subfamily member, rather than a CYP72A subfamily member, carries out C-21 hydroxylation for escin Ia biosynthesis in A. chinensis. This indicates convergent evolution of C-21 oxidation of oleanane-type triterpenoids through independent evolution within the CYP716(A) and CYP72(A) (sub)families. Identical compounds produced by distant species may originate from different enzymes, but more often arise via the same pathway, regardless of whether these enzymes are homologous (51).
The attachment of a hydrophilic carbohydrate fragment to the triterpenoid skeleton enhances its pharmaceutical properties and water solubility (52–54). Recent reports have identified a series of cellulose synthase-derived glycosyltransferases (CSyGTs) that transfer the glucuronic acid moiety to the C-3 position of triterpenoid aglycones in the legumes Glycyrrhiza uralensis, G. max, and Lotus japonicus and in spinach (Spinacia oleracea) (40, 55). To examine whether AcCSL1 functions as a glucuronic acid transferase, in vivo substrate-feeding and in vitro yeast assays were performed with the previously characterized GmCSyGT1 as a positive control. The substrate-feeding experiment, carried out with recombinant expression in N. benthamiana, demonstrated that both GmCSyGT1 and AcCSL1 acted on protoescigenin, leading to observation of a peak with a molecular ion at m/z 681.3, which equals the molecular weight of [protoescigenin + COOH] plus glucuronic acid (176) (Fig. 3C, 3G). Enzymatic assays with recombinant yeast microsomes containing AcCSL1 or GmCSyGT1 gave the same results (Figure S15, S16).
Pharmacodynamic studies have shown that acylation at the C-21 and C-22 positions with diangeloyl groups increases the cytotoxicity of aescins (56). A few BAHDs have been identified as major acyltransferases for acylation of thalianol-derived tricyclic triterpenoids in Arabidopsis, tetracyclic cucurbitacins in cucurbits, and pentacyclic triterpenoids in spinach and Boswellia trees (55, 57, 58). To dissect the activity of the transcribed BAHDs (AcBAHD1, AcBAHD3, and AcBAHD5) in AcCluster I towards the substrate protoescigenin with acetyl-CoA as the donor, we performed in vitro enzyme assays using recombinant proteins from E. coli and confirmed that AcBAHD3 could utilize acetyl-CoA as an acyl donor to acetylate protoescigenin, as shown by MS with a molecular ion at m/z 593.4, which equals the molecular weight of [protoescigenin + COOH]− (551.3) plus 42 (Fig. 3H). AcBAHD3 thus provides a second example of a BAHD family member that targets acylation of pentacyclic triterpenes, supplementing a previous report that the BAHD family member BsAT1 from Boswellia serrata was able to catalyze C3α-O-acetylation of α-boswellic acid (αBA), βBA, and 11-keto-βBA, thus forming all the major C3α-O-acetyl-BAs (3-acetyl-αBA, 3-acetyl-βBA, and 3-acetyl-11-keto-βBA) (58). Phylogenetic reconstruction further confirmed that BsAT1 is the closest ortholog of AcBAHD3 and of AcBAHD1, 2, and 5, which are placed in the clade IIIa BAHDs, representing acyltransferases that act on distinct acceptors but predominantly utilize acetyl-CoA as the donor (Figure S17). Our results demonstrate that the AcCluster I and AcCluster II BGCs contribute to the biosynthesis of barrigenol-type triterpenoids (BATs), and we therefore name them the BAT BGC.
Evolution and organization of barrigenol-type triterpenoid biosynthetic gene clusters
Plant BGCs originate from the assembly of genes after gene duplication, neofunctionalization, and genomic relocation, not via horizontal gene transfer from microbes (59). The abundance of genomes now available can provide crucial clues to the mechanisms by which genes encoding enzymes from a common biosynthetic pathway are assembled into BGCs (9). Comparative genomics in combination with functional characterization indicates that early arising BGCs underwent dynamic changes of auxiliary enzymes to generate structurally diverse triterpenoids within and between Arabidopsis or between cucurbits (59). Yet, little is known about the evolutionary trajectories of triterpenoid BGCs in other plant clades.
Our functional data demonstrated that the β-amyrin synthase (BAS), CYP716, BAHD IIIa, and CSL genes distributed in AcCluster I and AcCluster II encode crucial enzymes catalyzing BAT saponin biosynthesis. The mean KS value of 17 paralogous gene pairs localized in the syntenic blocks of AcCluster I and AcCluster II was 0.27, which is similar to the KS value of the paralogs in A. chinensis more generally (0.24), suggesting that the duplication of AcCluster I and AcCluster II might have originated from the A. chinensis-specific WGD event, Aα. Syntenic analysis showed that AcCluster I and AcCluster II are conserved among Hippocastanoideae species, including A. chinensis, Acer yangbiense, and X. sorbifolium, in accordance with the high accumulation of BAT saponins in these species (60–62). The CYP716 genes, which are specifically enriched in A. chinensis, Acer yangbiense, and X. sorbifolium, are presumably responsible for the diversity of BAT biosynthesis in these species.
Further large-scale analysis of the evolutionary dynamics of this BAT BGC among 21 published plant genomes, including an early-diverging angiosperm, monocots, and eudicots, was conducted to examine how this syntenic region may have evolved (Fig. 4A). An approximately 1-Mb segment of 64 genes (Fig. 4B, Fig. 4C) in the early-diverging angiosperm Amborella trichopoda, including one cycloartenol synthase (CAS) gene and one BAHD IIIb gene, was traced as the ancestral region. Syntenic regions are also present in monocots (Oryza sativa and Zostera marina), but these lineages contain only one CAS gene and not the BAT biosynthesis-related BAS, CYP716, CSL, and BAHD genes. The CYP716A member first appeared in the corresponding region of species within Ranunculales (i.e., Papaver somniferum). However, the syntenic region is absent from all the examined Superasterids species. After the split from Superasterids, almost all the syntenic segments in Superrosids retain at least one OSC gene, the exceptions being Cruciferae representatives such as A. thaliana and B. rapa. Remarkably, a sizeable species-specific tandem duplication of ten complete and ten partial OSC genes was found in the conserved syntenic region of the V. vinifera genome. Complex insertions, deletions, duplications, and rearrangements of OSC, CYP716, BAHD, and CSL genes were observed in the syntenic regions of Superrosids species. The BAS/CYP716/BAHD gene cluster was always found in the syntenic region of Superrosids species, but dynamic duplication and re-organization occurred (Fig. 4A). We propose that the complete BAT biosynthetic gene cluster BAS/CYP716/BAHD/CSL was first assembled in Hippocastanoideae species, i.e., the BAT-producing plants (A. chinensis, Acer yangbiense, and X. sorbifolium).
In addition, the BGC underwent further dynamic evolution during speciation, such as the tandem gene duplication of CYP716A and CYP716BX and the Aesculus-specific duplication of AcCluster I and AcCluster II from the Aα WGD event observed here.
To further examine the evolutionary origins of OSCs, CYPs, CSLs, and BAHDs, we generated a series of phylogenetic trees using the maximum likelihood method based on multiple alignments of protein sequences from the aforementioned species used for synteny analysis (Fig. 4D). The phylogenetic trees based on the syntenic regions of Sapindales species, including A. chinensis, Acer yangbiense, X. sorbifolium, D. longan, and C. clementina, along with V. vinifera, confirmed conservation of the BASs, CYP716s, CSLs, and BAHD IIIa genes related to triterpenoid biosynthesis. In addition, gene copy numbers and phylogenetic relationships revealed that species-specific tandem duplication events of CYP716 family members in A. chinensis and A. yangbiense led to the expansion of CYP716A and CYP716BX subfamily members. The phylogenetic tree from the kingdom-wide identification of cellulose-synthase superfamily genes suggested that AcCSL genes and their collinear genes from A. yangbiense and X. sorbifolium, as well as the functionally verified CSLs that catalyze the conjugation of glucuronic acid to the triterpenoid backbone, share a close phylogenetic relationship, with sequence identities of 44.7–55.2%. These results indicate that the CSL genes localized in the BAT gene clusters of A. chinensis, A. yangbiense, and X. sorbifolium were presumably recruited to catalyze glucuronidation before their speciation.
Here, we propose evolutionary trajectories for the birth, death, and evolution of the BAT BGC among angiosperms based on ancestral state reconstruction (Figure S18). (1) Birth in early-diverging angiosperms: The ancestral BGC contains one CAS and one BAHD IIIb in A. trichopoda; (2) Death in monocots: The region containing CAS and other non-BAT functional genes is retained, but with loss of BAHD IIIb; (3) Evolution in specific early-diverging eudicots: Evolution of BAS and addition of CYP716; (4) Death in Superasterids: The BGC was lost in this lineage; (5) Assembly of the BAT BGC in Hippocastanoideae: Formation of this functional BGC (BAS/CYP716/CSL/BAHD) leading to BAT production; (6) Re-organization and diversification of the BAT BGC: Dynamic evolution of this BGC in a species-specific manner via tandem duplication and WGD (Fig. 4E).
Coumarin biosynthesis pathway and identification of UGT synthase for aesculin and fraxin
Coumarins (1,2-benzopyrones) are produced via the phenylpropanoid pathway and are widely distributed in higher plants (63, 64). These compounds are mainly glucoconjugated and accumulate in the vacuoles. The relevant genes identified in this study from the A. chinensis genome include seven phenylalanine ammonia-lyases (PALs), three cinnamic acid 4-hydroxylases (C4Hs), five cinnamic acid 3-hydroxylases (C3Hs), four 4-coumaric acid:CoA ligases (4CLs), five feruloyl-CoA 6'-hydroxylases (F6'Hs), three caffeic-acid/5-hydroxyferulic-acid O-methyltransferases (COMTs), four caffeoyl-coenzyme A O-methyltransferases (CCoAOMTs), and five scopoletin 8-hydroxylases (S8Hs) (Figure S19, Table S17).
Nineteen putative AcUGTs were identified from phylogenetic analysis based on homology to known UGTs involved in flavonoid and coumarin biosynthesis (Table S19). These were cloned and expressed in E. coli to produce recombinant proteins, and their enzymatic activities were investigated using esculetin and fraxetin as acceptors and UDP-glucose as the donor substrate. AcUGT71A47 and AcUGT92G7 were found to utilize fraxetin as an acceptor substrate (Fig. 5A, Figure S19), whereas AcUGT84A56 and AcUGT92G7 were found to utilize esculetin (Fig. 5B, Figure S19). MS data revealed that all these enzymes could catalyze the addition of one glucose, as indicated by an increase of 162 Da in the molecular weight of the products relative to the corresponding aglycones. The enzymatic products of AcUGTs with esculetin or fraxetin as the substrate were confirmed to be aesculin or fraxin, respectively, by comparing mass spectra and retention times with those of authentic standards.
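The 162 Da increase used above as evidence of mono-glucosylation corresponds to the addition of one hexose with loss of water (C6H10O5). The following minimal sketch recomputes that shift for both aglycone and glucoside pairs. It is illustrative only: the function names are hypothetical, the isotope masses are standard monoisotopic values, and the formulas for esculetin, aesculin, and fraxetin are standard literature formulas that are not stated in the text (only that of fraxin is).

```python
import re

# Standard monoisotopic masses (u); these constants are not from the text.
MONO = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}

def mono_mass(formula):
    """Monoisotopic mass of a simple CHO formula such as 'C9H6O4'."""
    return sum(MONO[element] * int(count or 1)
               for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula))

def glucosylation_shift(aglycone, glucoside):
    """Mass gained when the aglycone is converted to its glucoside."""
    return mono_mass(glucoside) - mono_mass(aglycone)

# (aglycone formula, glucoside formula); assumed standard formulas.
pairs = {
    "esculetin -> aesculin": ("C9H6O4", "C15H16O9"),
    "fraxetin -> fraxin": ("C10H8O5", "C16H18O10"),
}

for label, (aglycone, glucoside) in pairs.items():
    print(f"{label}: +{glucosylation_shift(aglycone, glucoside):.4f} Da")
```

Both pairs differ by the same C6H10O5 unit, so a nominal shift of 162 Da is what a single glucose transfer would be expected to produce for either substrate.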
De novo production of aesculin in E. coli
Feeding experiments were carried out to determine the optimal glucosyltransferase for the conversion of p-coumaric acid to aesculin (Table S19, S20). First, we generated strains XC-1 and XC-2 by transforming plasmids pCS-84 (for expression of UGT84A56) and pCS-92 (for expression of UGT92G7) into E. coli BW25113, respectively. These were then used in feeding experiments with esculetin. After 48 h, XC-1 produced 23.0 ± 0.2 mg/L of aesculin, but XC-2 produced only 9 ± 1 mg/L of aesculin (Fig. 5E), suggesting that UGT84A56 exhibits a higher catalytic efficiency than UGT92G7 in E. coli. Thus, UGT84A56 was used for aesculin biosynthesis in subsequent experiments. We constructed strains XC-3 and XC-4 by transforming plasmids pZE-F6H-4CL, for expression of F6H and 4CL both from Arabidopsis thaliana, and pCS-84-HpaBC, for expression of UGT84A56 and HpaBC from E. coli, into BW25113, respectively. These were then used in feeding experiments with p-coumaric acid. After 48 h, strain XC-3 produced 103 ± 2 mg/L of aesculin (Fig. 5F).
After achieving biosynthesis of aesculin from p-coumaric acid, we proceeded to express the entire aesculin pathway in E. coli. In previous work, over-expressing ppsA and tktA, as well as the feedback inhibition-resistant variants of tyrA and aroG, improved tyrosine production. To ensure efficient expression, we constructed plasmid pET-648T, which contains the UGT84A56 gene and three other key genes, F6’H, 4CL1, and TAL from Rhodotorula glutinis. First, we generated strain XC-4 by transforming plasmids pCS-TPTA-HpaBC and pET-648T into E. coli BW25113 and used it for de novo production of aesculin as a preliminary test. However, the expected intermediates caffeic acid and esculetin were not detected (data not shown), and the cell culture turned black. This may be because HpaBC can convert tyrosine to L-Dopa, which is unstable and readily oxidized to melanin, and because the supply of the precursor PEP or chorismate is insufficient. Thus, to enhance PEP flux and avoid inhibiting cell growth, we engineered strain BW-1 by deleting genes pykA/pykF in E. coli BW25113 and further generated strain XC-5 by transforming plasmids pCS-TPTA-HpaBC and pET-648T into BW-1. Notably, 72 h after induction, 36 ± 5 mg/L of aesculin and its intermediate product esculetin were produced (Fig. 5G, Figure S20).