Determination of contents of saponin monomers
The raw dried extracts obtained from the roots, stems, and leaves of P. grandiflorus were measured using high performance liquid chromatography with evaporative light scattering detection (HPLC-ELSD). The contents of four saponin monomers in three organs with three duplicates were expressed as the mean with standard deviation (Fig. 1). The contents of platycodin D, platycodin D3, and polygalacin D were higher in the root than in aerial parts. In addition, the sum of mean saponin contents in the root was also higher than that in the stem and leaf.
P. grandiflorus transcriptome analysis via RNA-Seq and PacBio Iso-Seq
To obtain a comprehensive transcriptome profile of P. grandiflorus, BGISEQ-500 and PacBio Sequel platforms were combined. Nine RNA samples from different organs (stem, leaf, and root, each in triplicate) were sequenced separately using a BGISEQ-500 HiSeq platform, with total raw reads in three tissues produced (145.49 M, 143.74 M, and 140.15 M from stems, leaves, and roots, respectively). After data filtering, a total of 132,610 clean reads were obtained, and the number of total clean reads from stems, leaves, and roots was 131.57 M, 130.47 M, 127.65 M, respectively (Fig. 2a, Table S1). To obtain comprehensive coverage of the P. grandiflorus transcriptome, a full-length transcriptome was generated using full-length cDNAs from pooled poly (A) RNA of three organs normalised and subjected to SMRT sequencing using the PacBio Sequel platform. Altogether, 717,560 polymerase reads were generated, and 23,173,163 subreads (32.64 Gb) were obtained from one SMRT cell after filtering, with a mean length of 1,408.61 bp and N50 length of 1,992 bp. A total of 685,102 reads of insert (ROIs) were obtained after data processing, with a mean length of 2,219 bp, mean read quality of 0.95, and 28 passes. Of these, 547,395 ROIs containing two primers and poly (A) tails were classified as full-length non-chimeric (FLNC) reads with a mean length of 1,341 bp (Fig. S1a). Furthermore, FLNC reads were clustered and corrected using the Interactive Clustering and Error Correction (ICE) algorithm and Quiver program. As a result, 180,602 full-length consensus isoforms were obtained, with a mean length of 1,449 bp and quality of 0.98 (Fig. S1b), of which, the high-quality isoforms were further combined, and redundant sequences were removed using CD-Hit. A total of 173,354 non-redundant transcripts were identified (Fig. 2b), with lengths ranging from 199 bp to 15,360 bp, N50 of 1,924 bp, N90 of 818 bp, and a GC content of 42.49% (Table 1).
Table 1
Statistics of non-redundant transcripts
Total number
|
Total length (bp)
|
N50 (bp)
|
N90 (bp)
|
Max length (bp)
|
Min length (bp)
|
Sequence GC (%)
|
173,354
|
252,336,203
|
1924
|
818
|
15,360
|
199
|
42.49%
|
Functional annotation of full‑length P. grandiflorus transcriptome
For a comprehensive annotation of the P. grandiflorus transcriptome, 173,354 non-redundant transcripts were functionally annotated against the Nr, Nt, KEGG, SwissProt, Pfam, GO, and KOG databases using Blast and Blast2GO. Among them, 146,868 transcripts (84.72%) were successfully matched to at least one of the seven databases with an achievement ratio ranging from 22.62 to 84.72% (Fig. 2c, Table 2). A total of 139,018 transcripts of P. grandiflorus were analysed to be homologues by aligning sequences to the Nr database, of which the most significant homology species was Cynara cardunculus var. scolymus (11.64%), followed by Helianthus annuus (11.38%), Vitis vinifera (9.87%), and Daucus carota subsp. sativus (5.48%) Furthermore, the remaining 61.63% sequences were mapped to other plants (Fig. 2d).
To explore the main biological processes in P. grandiflorus, 111,280 transcripts were mapped to the KEGG database and classified into five categories, including cellular processes, environmental information processing, genetic information processing, metabolism, and organismal systems (Fig. 3a, Table S2), where "transport and catabolism," "signal transduction," "translation," "global and overview maps," and "environmental adaptation" were the most abundant subcategories, respectively. Notably, 1,765 transcripts of P. grandiflorus were enriched in "metabolism of terpenoids and polyketides"; specifically, 493 transcripts were associated with "terpenoid backbone biosynthesis" (ko00900), followed by 268 transcripts and 106 transcripts involved in "sesquiterpenoid and triterpenoid biosynthesis" (ko00909) and "diterpenoid biosynthesis" (ko00904) pathways, respectively.
In addition, the KOG annotation demonstrated that 109,686 transcripts were assigned to 25 functional clusters, and "general function prediction only" (24,091 transcripts) was the largest category, followed by "signal transduction mechanisms" (16,154 transcripts) and "posttranslational modification, protein turnover, chaperones" (13,603 transcripts) (Fig. 3b, Table S3).
Based on the Nr annotation results, 77,618 transcripts were assigned to the GO database and classified into three functional categories, biological process, molecular function, and cellular component, using the Blast2GO program (Fig. 3c, Table S4). The cellular component was the largest cluster comprising 164,035 transcripts, whose major terms were "cell" and "cell part," followed by biological process containing 111,975 transcripts and molecular function with 91,034 transcripts. Under the biological process and molecular function categories, "metabolic process" and "catalytic activity" were the most abundant terms, respectively, indicating that the study might provide insight into the genes involved in secondary metabolite synthetic pathways.
Table 2
Functional annotation results from seven public databases
Databases
|
Nr
|
Nt
|
Swiss-prot
|
KEGG
|
KOG
|
Pfam
|
GO
|
Intersection
|
Overall
|
Number
|
139,018
|
119,342
|
108,337
|
111,280
|
109,686
|
91,636
|
77,618
|
39,248
|
146,868
|
Percentage (%)
|
80.19
|
68.84
|
62.49
|
64.19
|
63.27
|
52.86
|
44.77
|
22.64
|
84.72
|
Identification of differentially expressed genes
To explore the variation in gene abundance and expression profiles among the stem, leaf, and root (leaf vs stem, stem vs root, and leaf vs root), clean reads from RNA-Seq were aligned to reference transcripts, and the expression level calculated using RSEM. There were 50,957 differentially expressed genes (DEGs) derived from the leaf vs stem analysis, including 34,124 that were upregulated and 16,833 downregulated. Root and stem tissue comparison resulted in 37,213 DEGs, where 14,091 were upregulated, and 23,122 downregulated. Additionally, 57,203 DEGs were identified in the leaf vs root comparison, of which 35,109 were upregulated, and 22,094 were downregulated (Fig. 4a). Moreover, among these DEGs, 14,029 common DEGs were identified in three tissues that might be vital for metabolism among the three organs. Comparing DEGs in diverse groups resulted in 8,233 specific DEGs in leaf vs root, followed by 5,700 in leaf vs stem and 3,009 in stem vs root, indicating that the difference between leaf and root was the greatest (Fig. 4b). For a thorough exploration of differential transcripts, KEGG and GO enrichment analyses were performed using the Phyper function. In KEGG analysis, 15,894 and 23,334 DEGs were identified in stem vs root and leaf vs root comparisons and mapped to 137 and 138 pathways, respectively, where the most enriched pathway for either comparison was "phenylpropanoid biosynthesis" (Fig. 4c, d).
There were 60,906 DEGs in the leaf vs stem comparison, mainly associated with "plant-pathogen interaction" (Fig. S2a). Moreover, 137 and 121 DEGs were identified and separately matched to "terpenoid backbone biosynthesis" and "sesquiterpenoid and triterpenoid biosynthesis", respectively, in stems compared with roots. There were 219 and 154 DEGs annotated to the above pathways in the leaf vs root comparison. The most expansive GO term among the different comparison groups was "cellular component" (Fig. S2b, c, d).
Identification of transcription factors
Transcription factors (TFs) play vital roles in the regulation of secondary metabolic processes. A total of 3,469 genes encoding TFs were identified through alignment with the Plant TF database and classified into 57 TF families, including 743 upregulated and 1,675 downregulated genes in the leaf vs root comparison and 878 upregulated and 1,168 downregulated genes in the stem vs root comparison (Table 3 and Table S5). A total of 313 genes encoded the mTERF family, accounting for the largest proportion, followed by 281, 262, and 259 genes encoding the MYB, bHLH, and AP2-EREBP families. We further annotated TFs with the KEGG functional database and obtained eight FHA TFs involved in terpenoid and polyketide metabolism; 14 zf-HD TFs, 5 MYB TFs, 2 RWP-RK TFs, and 1 C2C2-CO-like TF participated in the biosynthesis of other secondary metabolites. Among them, 4 zf-HD TFs, 2 RWP-RK TFs, and 1 MYB TF showed the highest expression in the roots, which are thought to be closely related to triterpene synthesis (Fig. S3, Table S6).
Table 3
Classification and number of TF families identified in the DEG database
TF Family
|
Number of genes
|
Number of upregulated genes
|
Number of downregulated genes
|
Leaf vs Root
|
Stem vs Root
|
Leaf vs Root
|
Stem vs Root
|
mTERF
|
313
|
94
|
27
|
67
|
54
|
MYB
|
281
|
57
|
129
|
87
|
98
|
bHLH
|
262
|
65
|
134
|
74
|
100
|
AP2-EREBP
|
259
|
16
|
174
|
71
|
120
|
GRAS
|
205
|
24
|
112
|
65
|
70
|
C3H
|
165
|
16
|
109
|
16
|
81
|
WRKY
|
131
|
22
|
68
|
56
|
29
|
NAC
|
111
|
9
|
64
|
1
|
32
|
Trihelix
|
103
|
21
|
58
|
15
|
51
|
C2H2
|
98
|
30
|
38
|
23
|
41
|
G2-like
|
91
|
14
|
52
|
19
|
27
|
SBP
|
84
|
21
|
27
|
23
|
16
|
C2C2-GATA
|
79
|
19
|
36
|
18
|
34
|
TCP
|
78
|
52
|
13
|
24
|
23
|
ARF
|
69
|
2
|
53
|
12
|
37
|
FAR1
|
68
|
8
|
27
|
5
|
28
|
Tify
|
65
|
15
|
36
|
28
|
19
|
C2C2-Dof
|
61
|
3
|
36
|
7
|
30
|
HSF
|
60
|
11
|
33
|
14
|
21
|
Other
|
886
|
244
|
449
|
253
|
257
|
Total number
|
3469
|
743
|
1675
|
878
|
1168
|
Putative genes involved in triterpenoid saponin biosynthesis
Triterpenoid saponins are the primary curative components of P. grandiflorus. As previously described, many known enzymes participate in the synthesis of triterpenoid saponins, such as AACT, HMGS, DXS, and SQS. Although the biosynthetic pathway of platycodins has been widely studied, exhaustive information on the genes encoding relevant key enzymes is still limited. Platycodins are oleanane-type triterpenoid saponins synthesised from oleanolic acid through several modifications by CYP450 (cytochrome P450 monooxygenase) and GTs. Subsequent modification pathways catalytically produce various platycodins (Fig. 5a).
The synthesis of platycodins predominantly comprised "terpenoid backbone biosynthesis" (map00900) and "sesquiterpenoid and triterpenoid biosynthesis" (map00909) pathways based on KEGG enrichment analysis (Fig. S4). Divergent gene expression has a vital influence on platycodin biosynthesis. After screening of FPKM > 1, a total of 113 genes encoding 18 key enzymes, 96 of which were further identified as DEGs, were considered as putative genes related to triterpenoid saponin synthesis in P. grandiflorus (Table 4 and Table S7).
DEGs distributed in MVA and MEP upstream pathways were greater than those in downstream pathways. In our transcriptome dataset, 22 DEGs encoding six enzymes relevant to the MVA pathway were identified, including nine PgAACT genes, eight PgHMGR genes, one PgHMGR gene, three PgMK genes, and one PgMDC gene. Moreover, 34 DEGs encoding seven enzymes involved in the MEP pathway were confirmed, including nine PgDXS genes, one PgDXR gene, two PgMCT genes, two PgCMK genes, seven PgMDS genes, six PgHDS genes, and seven PgHDR genes. The MEP pathway demonstrated that nearly all DEGs showed the highest expression levels in leaves and the least in roots. In comparison with the MEP pathway, highly expressed DEGs were concentrated in the root and stem in the MVA pathway. Research has suggested that the biosynthesis of triterpenoids and sesquiterpenoids occurs via the MVA pathway, whereas monoterpenoids and diterpenoids are involved in the MEP pathway. Additionally, based on our transcriptome data, only one transcript for genes encoding PgHMGS and PgMDC identified as DEGs was obtained, and the expression level was the highest in the stem, followed by the root or leaf. For DEGs of PgHMGR, the expression levels of most transcripts in the stem and leaf were higher than those in the root, yet PgHMGR1 and PgHMGR3 showed their highest expression in the roots. As genes encoding key enzymes of the MVA pathway, PgAACT and PgMK displayed higher expression in stems or roots than in leaves. Among them, PgAACT2, PgAACT3, and PgMK1 all had the highest expression levels in the roots, especially for PgAACT3, which exhibited much higher expression than other PgAACTs, presumably vital for platycodin biosynthesis.
Most of the 13 PgFPPS DEGs were expressed at higher levels in leaves, except for PgFPPS9 and PgFPPS13 whose expression was the highest in the root and stem, respectively. IDI is a crucial enzyme that catalyses the conversion between the common precursors IPP and DMAPP. There were 11 PgIDIs found to be DEGs among the three tissues, and most were highly expressed in the stem and root; notably, PgIDI6 and PgIDI7 displayed the highest expression in the roots. Pertaining to “sesquiterpenoid and triterpenoid biosynthesis”, 14 DEGs were found encoding PgSQS, PgSQE, and Pgβ-AS. Of the eight candidate genes of PgSQS, only PgSQS1 was identified as the DEG with the highest expression in the stems.
Additionally, differential transcripts of PgSQE and Pgβ-AS had similar expression trends, with the highest expression in the leaves. β-AS is a significant modification enzyme for 2,3-oxidized squalene for oleanolic acid formation, and Pgβ-AS1 was expressed at the highest level in the roots. Statistically, there were nine genes with the highest expression in the root, which were considered key genes for triterpenoid saponin biosynthesis in P. grandiflorus (Fig. 5b).
Table 4
Number of genes encoding key enzymes (FPKM > 1)
Enzyme Abbreviation
|
EC number
|
Gene number
|
AACT
|
2.3.1.9
|
14
|
HMGS
|
2.3.3.10
|
5
|
HMGR
|
1.1.1.34
|
5
|
MK
|
2.7.1.36
|
3
|
PMK
|
2.7.4.2
|
1
|
MDC
|
4.1.1.33
|
4
|
DXS
|
2.2.1.7
|
5
|
DXR
|
1.1.1.267
|
1
|
MCT
|
2.7.7.60
|
2
|
CMK
|
2.7.1.148
|
2
|
MCS
|
4.6.1.12
|
8
|
HDS
|
1.17.7.1, 1.17.7.3
|
7
|
HDR
|
1.17.7.2, 1.17.7.4, 1.17.1.4
|
6
|
IDI
|
5.3.3.2
|
13
|
FPPS
|
2.5.1.1, 2.5.1.10
|
20
|
SQS
|
2.5.1.21
|
8
|
SQE
|
1.14.14.17
|
8
|
β-AS
|
5.4.99.39
|
1
|
Characterisation of the β-amyrin synthase
β-AS, belonging to the oxidosqualene cyclase (OSC) family, plays a vital role in regulating oleanane-type triterpenoid saponin biosynthesis. β-AS catalyses the conversion of 2,3-oxidosqualene into β-amyrin, the precursor of oleic acid. To date, five types of OSCs have been isolated and identified from ginseng, including β-AS, dammarenediol synthase (DS), cycloartenol synthase (CAS), lupeol synthase (LUS), and lanosterol synthase (LS). From the screening, three full-length genes encoding putative β-AS proteins (isoform_10598, isoform_71380, and isoform 102853) in P. grandiflorus were identified and characterised using the transcript library.
Domain analysis revealed that three β-AS isoforms contained one catalytic acid site and a conserved squalene cyclase domain belonging to the ISOPREN-C2-like superfamily. The ISOPREN-C2-like superfamily contains class II terpene cyclases, including squalene cyclase and 2,3-oxidosqualene cyclase (OSC) (Fig. S5). Ultimately, we selected two complete sequences encoding β-AS from ginseng (OSCPNY1 and OSCPNY2) to perform homology analysis with three isoforms. The comparison of five amino acid sequences indicated that their consistency was 74.04%, and three conserved regions (motif QW, DCTAE, and MWCYCR) belonging to β-AS were also identified (Fig. 6d) [29–30]. The complete alignment result is shown in Fig. S6. The isoform_10598 has a mutation site, "L," in the DCTAE motif, and isoform_71380 has two mutation sites, "L" and "F," in the MWCYCR motif. Three isoforms were further selected to create 3D constructure models based on human OSC complexed with lanosterol (PDB ID: 1w6k.1.A) (Fig. 6a, b, c) [31].
The phylogenetic tree was divided into four branches, LUS, LS, CAS, and β-AS, where the typical genes were MtLUS, VrLS, CrCAS and GaβAS from Medicago truncatula, Vigna radiata var. radiata, C. reinhardtii, and G. arboreum, respectively (Fig. 7, Table S8). The results revealed that three isoforms were homologous to β-AS from other plants, consistent with our previous structure analysis and indicating that the three isoforms were β-AS genes from P. grandiflorus. Additionally, isoform_71380 clustered together with β-AS from G. arboreum, and isoform_102853 clustered together with β-AS from Panax ginseng. These two sub-branches and isoform_10598 were further clustered.
qRT-PCR validation of DEGs from the RNA-seq analysis
To validate the reliability of transcriptome analysis data, 20 DEGs related to triterpenoid saponin biosynthesis were verified using qRT-PCR with β-actin as the reference; all primers used are listed in Table S9. The results of RNA-seq and qRT-PCR revealed a high-ranked consistency, indicating that the RNA-seq data are dependable and accurate (Fig. 8). More importantly, PgAACT2, PgHMGR1, and Pgβ-AS1 displayed higher expression in roots than the tested genes.