Accumulation of isoflavonoids and triterpenoid saponins in different organs of A. mongholicus
Eight constituents, four from isoflavonoid biosynthesis and four from triterpenoid saponin biosynthesis, were quantified in roots (AR), stems (AS) and leaves (AL) of A. mongholicus by comparison with standard compounds on the basis of their retention times and MS fragmentation patterns. As shown in Fig. 1, the four isoflavonoids and four triterpenoid saponins were mainly distributed in the roots.
Transcriptome analysis of A. mongholicus by a combined sequencing approach
To obtain as many high-quality unigenes as possible and to distinguish the root, stem and leaf transcriptomes of A. mongholicus, we adopted a hybrid sequencing strategy that combined SMRT and NGS technologies. First, nine mRNA samples from three organs (roots, stems and leaves) were sequenced on the DNBSEQ platform. The raw data were filtered to remove low-quality sequences and reads with ambiguous N bases. After quality filtering, the total numbers of clean reads from the roots, stems and leaves of A. mongholicus were 132.46 M, 133.18 M and 129.68 M, respectively (Table 1). Second, full-length transcripts were reconstructed using the PacBio Sequel platform. In total, 14.21 GB of subreads were assembled, with a mean length of 1826 bp and an N50 length of 2191 bp (Additional file 1: Figure S1a). A total of 643,812 reads of insert (ROIs) were generated, of which 305,998 were identified as full-length non-chimeric (FLNC) reads with a mean length of 1619 bp (Additional file 1: Figure S1b). We then applied the Iterative Clustering for Error Correction (ICE) algorithm combined with the Quiver program for sequence clustering and removed redundant sequences with the CD-HIT program, yielding 121,107 non-redundant transcripts with an N50 of 2124 bp (Table 2; Additional file 1: Figure S1c).
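The N50 values reported for the subreads and the non-redundant transcripts summarize a length distribution: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of all bases. The following is a minimal sketch of that calculation, not the pipeline used in this study; the function name `n50` and the toy lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half of all bases
        if running >= half_total:
            return length


# Hypothetical transcript lengths (bp), not data from this study
toy_lengths = [500, 1000, 1500, 2000, 2500, 3000]
print(n50(toy_lengths))  # prints 2500
```

Because long sequences contribute more bases, N50 is usually larger than the mean length, as is the case for the values reported above.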
Functional annotation of the full-length A. mongholicus transcriptome
To obtain putative functional annotations for the A. mongholicus transcriptome, the 121,107 transcripts were searched against seven databases (NR, NT, Swissprot, KEGG, KOG, Pfam and GO) (Table 3). In total, 36,812 (30.40%) transcripts were annotated in all seven databases and 104,756 (86.50%) transcripts were annotated in at least one of them. Homologous species were analyzed by aligning the sequences to the NR database (Fig. 2a); 30.31%, 16.13%, 8.67% and 8.01% of the transcripts were mapped to genes of Cicer arietinum, Medicago truncatula, Trifolium subterraneum and Glycine max (all Leguminosae), respectively.
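The two summary figures above correspond to an intersection (annotated in every database) and a union (annotated in at least one database) of the per-database hit lists. A minimal sketch of that bookkeeping follows; the database contents and transcript IDs are hypothetical, and only the final two lines use numbers from the text.

```python
# Hypothetical annotation hits: database name -> set of annotated transcript IDs
annotations = {
    "NR": {"t1", "t2", "t3", "t4"},
    "Swissprot": {"t1", "t2", "t4"},
    "KEGG": {"t1", "t3", "t4"},
}


def annotation_summary(hits_by_db, total_transcripts):
    """Count transcripts annotated in all databases and in any database."""
    hit_sets = list(hits_by_db.values())
    in_all = set.intersection(*hit_sets)
    in_any = set.union(*hit_sets)
    return {
        "all": (len(in_all), 100 * len(in_all) / total_transcripts),
        "any": (len(in_any), 100 * len(in_any) / total_transcripts),
    }


summary = annotation_summary(annotations, total_transcripts=5)
print(summary)

# Percentages recomputed from the counts reported in the text
pct_all = round(100 * 36812 / 121107, 2)
pct_any = round(100 * 104756 / 121107, 2)
print(pct_all, pct_any)
```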
In total, 74,953 transcripts were assigned GO terms in three major categories: biological process, cellular component and molecular function (Fig. 2b). The most highly represented terms in these three categories were “cellular process”, “cell” and “catalytic activity”, respectively. In total, 77,196 transcripts were assigned to KOG and classified into 25 functional clusters (Fig. 2c). Among them, “general function prediction” was the largest cluster, followed by “signal transduction mechanisms” and “posttranslational modification, protein turnover, chaperones”. Furthermore, 78,294 transcripts from A. mongholicus were mapped to KEGG pathways in five functional categories (Cellular Processes, Environmental Information Processing, Genetic Information Processing, Metabolism, Organismal Systems) (Fig. 2d). “Global and overview maps”, “carbohydrate metabolism” and “translation” were the most represented pathways. Among the mapped transcripts, 1156 were annotated to “phenylpropanoid biosynthesis”, 398 to “flavonoid biosynthesis” and 170 to “isoflavonoid biosynthesis”. More importantly, 479 transcripts were annotated to “terpenoid backbone biosynthesis” and 173 to “sesquiterpenoid and triterpenoid biosynthesis”.
Identification of transcription factors
Transcription factors (TFs) play transient and spatially regulated roles in plant development and stress responses. We annotated 2455 TFs belonging to 53 TF families in the A. mongholicus transcriptome dataset. Among them, the C3H (219 genes), GRAS (189 genes), AP2-EREBP (178 genes), MYB (177 genes) and bHLH (176 genes) families had the most members (Table 4). TFs of the MYB, bHLH, AP2-EREBP and bZIP families have been reported to play vital roles in regulating isoflavonoid and terpene biosynthesis in many plant species [23-27]. After removing TFs whose expression was extremely low across the three organs (FPKM < 1), we identified 72 MYBs, 53 bHLHs, 64 AP2-EREBPs and 11 bZIPs. Of the 72 MYB genes, more than half were mainly expressed in roots. Of the 53 bHLH genes, 18 had their highest expression in roots, 16 in stems and 19 in leaves. Of the 64 AP2-EREBP genes, almost all were highly expressed in roots. In addition, 5 of the 11 bZIP genes were enriched in roots.
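The filtering and organ assignment described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in this study: a TF is dropped when its FPKM is below 1 in every organ, and the remaining TFs are assigned to the organ with the highest FPKM. The TF names and FPKM values are hypothetical.

```python
# Hypothetical mean FPKM values per organ for a few TF transcripts
tf_fpkm = {
    "MYB_a": {"root": 12.0, "stem": 3.5, "leaf": 0.8},
    "bHLH_b": {"root": 0.2, "stem": 0.4, "leaf": 0.9},  # below 1 in every organ
    "bZIP_c": {"root": 1.1, "stem": 4.0, "leaf": 9.3},
}

MIN_FPKM = 1.0  # cut-off from the text; how it is applied here is an assumption


def filter_and_assign(fpkm_by_tf, min_fpkm=MIN_FPKM):
    """Drop TFs with FPKM < min_fpkm in all organs; map the rest to their peak organ."""
    peak_organ = {}
    for tf, by_organ in fpkm_by_tf.items():
        if max(by_organ.values()) < min_fpkm:
            continue  # extremely lowly expressed everywhere
        peak_organ[tf] = max(by_organ, key=by_organ.get)
    return peak_organ


peaks = filter_and_assign(tf_fpkm)
print(peaks)
```

Counting the values of `peaks` per organ would give tallies of the kind reported for the bHLH family (roots, stems, leaves).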
Identification of organ-specific transcripts and differentially expressed genes
A total of 58,896 transcripts were found to be simultaneously expressed in roots, stems and leaves (Additional file 2: Figure S2a). In addition, 6087 transcripts exhibited organ-specific expression, with 2124, 1897, and 2066 transcripts specifically expressed in roots, stems and leaves, respectively.
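Shared and organ-specific transcripts of this kind are typically derived with set operations on the lists of transcripts expressed in each organ. A minimal sketch follows; the transcript IDs are hypothetical, the expression criterion that defines each set is not specified here, and only the final line uses counts from the text.

```python
# Hypothetical sets of expressed transcript IDs per organ
expressed = {
    "root": {"t1", "t2", "t3", "t6"},
    "stem": {"t1", "t2", "t4"},
    "leaf": {"t1", "t3", "t5"},
}


def shared_and_specific(expressed_by_organ):
    """Return transcripts expressed in every organ and those unique to each organ."""
    organs = list(expressed_by_organ)
    shared = set.intersection(*expressed_by_organ.values())
    specific = {}
    for organ in organs:
        # Union of everything expressed in the other organs
        others = set().union(*(expressed_by_organ[o] for o in organs if o != organ))
        specific[organ] = expressed_by_organ[organ] - others
    return shared, specific


shared, specific = shared_and_specific(expressed)
print(shared, specific)

# The organ-specific counts reported in the text add up to the stated total
reported_total = 2124 + 1897 + 2066
print(reported_total)  # prints 6087
```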
Differentially expressed genes (DEGs) between organs (AL vs. AS, AR vs. AL, and AR vs. AS) were investigated, and the results are shown in Fig. 3a. The AR vs. AL comparison revealed 25,970 DEGs, of which 12,799 were up-regulated and 13,171 were down-regulated (Table 5). Moreover, this comparison had the largest number of comparison-specific DEGs (4765) (Fig. 3a), suggesting a larger biological difference between roots and leaves. In AL vs. AS, 19,923 DEGs were identified, of which 12,313 were up-regulated and 7610 were down-regulated. A total of 18,951 DEGs were identified in the AR vs. AS comparison, of which 12,829 were up-regulated and 6122 were down-regulated. In addition, 6035 genes were differentially expressed in all three comparisons, suggesting that these genes play an important role in the metabolic differences among the organs of A. mongholicus.
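As a quick arithmetic check on the figures above, the up- and down-regulated counts in each comparison should sum to the total number of DEGs. The sketch below uses only the numbers given in the text; the dictionary layout and function name are illustrative choices.

```python
# DEG counts as reported in the text for each pairwise comparison
deg_counts = {
    "AR vs. AL": {"total": 25970, "up": 12799, "down": 13171},
    "AL vs. AS": {"total": 19923, "up": 12313, "down": 7610},
    "AR vs. AS": {"total": 18951, "up": 12829, "down": 6122},
}


def inconsistent_comparisons(counts):
    """Return the comparisons whose up + down counts do not equal the total."""
    return [
        name
        for name, c in counts.items()
        if c["up"] + c["down"] != c["total"]
    ]


print(inconsistent_comparisons(deg_counts))  # prints []
```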
KEGG (Fig. 3) and GO (Additional file 2: Figure S2b-d) enrichment analyses of the DEGs between organs were performed to further characterize the identified transcripts. Across the comparisons, the most broadly represented metabolism classes were “carbohydrate metabolism”, “amino acid metabolism”, “energy metabolism”, “lipid metabolism” and “biosynthesis of other secondary metabolites”.
Identification of DEGs involved in isoflavonoid biosynthesis
Isoflavonoids are found almost exclusively in leguminous plants. The isoflavonoid biosynthetic pathway in A. mongholicus was inferred from the pathways characterized in other legumes (Fig. 4) [28-31]. A total of 53 candidate genes with an FPKM value greater than 1, encoding nine enzymes involved in isoflavonoid biosynthesis, were identified, and this study focused on the 44 of them that were DEGs (Additional file 3: Table S1).
Isoflavonoids are synthesized from L-phenylalanine through the central phenylpropanoid pathway, and the upstream part of the isoflavonoid pathway is shared with flavonoid biosynthesis. In this upstream part, 34 DEGs were identified in A. mongholicus, including 12 AmPALs, two AmC4Hs, three Am4CLs, five AmCHSs and 12 AmCHIs (Fig. 4). Close to half of the AmPALs were more highly expressed in roots and stems than in leaves. AmPAL6 exhibited its highest expression in roots. AmPAL10, AmPAL3 and AmPAL11 showed higher expression in stems than in roots and leaves. AmPAL4, AmPAL1 and AmPAL7 were more highly expressed in leaves than in the other two organs. AmC4H1 showed its highest expression in stems, and AmC4H2 was expressed predominantly in stems and rarely in leaves. Am4CL1 and Am4CL2 had their highest expression in stems, whereas Am4CL3 was highly expressed in leaves. In roots, Am4CL1 was expressed at a higher level than Am4CL2 and Am4CL3. AmCHS2 and AmCHS3 were mainly expressed in roots, where their expression levels were higher than those of the other AmCHSs. AmCHS4 and AmCHS5 had their highest expression in stems, followed by leaves and roots. More than half of the 12 AmCHI transcripts showed their highest expression in roots, whereas AmCHI1, AmCHI12, AmCHI6 and AmCHI9 were expressed predominantly in stems. In general, genes of the phenylpropanoid and flavonoid biosynthesis pathways showed higher relative expression in roots and leaves.
Isoflavonoid biosynthesis is a branch of the flavonoid pathway. The candidate transcripts in the isoflavonoid branch of A. mongholicus comprised one AmIFS, five AmHIDs, one Am4’-OMT and three AmIF7GTs. As shown in Fig. 4, all the candidate transcripts were examined in detail and displayed differential expression among roots, stems and leaves. Notably, the vast majority of candidate transcripts related to isoflavonoid biosynthesis were more highly expressed in roots than in stems and leaves, consistent with the isoflavonoid contents measured in A. mongholicus. This correspondence between transcript expression and isoflavonoid accumulation suggests that these genes have important functions in isoflavonoid biosynthesis in A. mongholicus.
Analysis of DEGs in pathways related to triterpenoid saponin biosynthesis
A total of 58 candidate genes encoding 16 enzymes related to triterpenoid saponin biosynthesis were retained after removing extremely lowly expressed genes (FPKM < 1). Among them, 44 were DEGs (Additional file 4: Table S2) and are the key candidates discussed in this study.
The MVA pathway is the earlier-characterized, classical pathway of terpenoid biosynthesis. In the MVA pathway, 23 DEGs were identified (Fig. 5). Half of the candidate transcripts in the MVA pathway displayed their highest expression in the roots of A. mongholicus. Only one transcript each of AmHMGCS, AmMVK and AmPMK was identified as a DEG. These three transcripts shared the same pattern across the three organs, with the highest expression in stems, followed by roots or leaves. Of the six AmAACTs, AmAACT6 and AmAACT4 displayed higher expression in roots than in the other two organs. However, AmAACT3 and AmAACT1, especially AmAACT3, were expressed in roots at higher levels than the other AmAACTs, suggesting that AmAACT3 also plays an important role in the MVA pathway of A. mongholicus. HMGR is a key enzyme in the MVA pathway, and 11 AmHMGRs were identified as DEGs. Notably, six AmHMGRs were more highly expressed in roots than in stems and leaves. Among the AmHMGRs, AmHMGR5 was the most highly expressed in roots, implying that this transcript plays a vital role in the MVA pathway in roots. All three AmMVDs showed their highest expression in roots, followed by stems and leaves.
In our transcriptomic dataset, 13 DEGs were found to be involved in the MEP pathway of A. mongholicus, including three AmDXSs, two AmDXRs, two AmispDs, one AmispE, one AmgcpE and four AmispHs. As shown in Fig. 5b, almost all DEGs in the MEP pathway displayed their highest expression in leaves and their lowest in roots. Both the MEP and MVA pathways generate IPP and its isomer DMAPP, the precursors of terpenoids. Triterpenoids and sesquiterpenoids are synthesized via the MVA pathway, whereas monoterpenoids, diterpenoids and tetraterpenoids are synthesized via the MEP pathway. Our results therefore suggest that the MVA pathway is mainly responsible for the biosynthesis of triterpenoid saponins in A. mongholicus.
The biosynthesis of triterpenoid saponins in A. mongholicus mainly involves the “terpenoid backbone biosynthesis” and “sesquiterpenoid and triterpenoid biosynthesis” pathways. Nine DEGs encoding three enzymes were identified in “sesquiterpenoid and triterpenoid biosynthesis” (Fig. 5d). Two candidate AmFDPS transcripts were found, but only one was differentially expressed. AmFDPS showed similar expression across the three organs, with the highest level in stems. SQS and SQE play pivotal roles in the biosynthesis of the carbon ring skeleton of triterpenoid saponins. Three AmSQEs exhibited higher expression in roots than in stems and leaves. The two AmSQSs, AmSQE1 and AmSQE2, had the highest expression levels in stems, over roots and leaves.
To verify the gene expression levels, five genes were randomly selected for real-time quantitative PCR (RT-qPCR). The expression trends of these five genes in the transcriptome data were consistent with the RT-qPCR results, supporting the reliability of the transcriptome data (Additional file 5: Figure S3).