Sequence analysis of the soybean GmMYB3a gene
The expression patterns of key genes involved in soybean isoflavone biosynthesis are governed by transcription factors (TFs). Using a rigorous evolutionary analysis comparing the MYB transcription factor families between soybean and Arabidopsis, we identified the MYB transcription factor family in soybean (Fig. 2A).
A detailed analysis of the expression profiles revealed that members of this family were abundantly expressed during the later stages of embryonic development, which aligns with the temporal pattern of isoflavone accumulation in soybeans. This concordance provided further evidence supporting an association between the MYB transcription factor family and isoflavone synthesis.
Within this family, we discovered an MYB protein encoded by GLYMA_20G013000, designated GmMYB3a. The open reading frame of GmMYB3a spanned 672 bp. Structural prediction of its amino acid sequence indicated the presence of two SANT domains (also known as MYB domains) located at amino acid positions 12–62 and 65–113, respectively (Fig. 2B). These domains are characteristic of the typical R2R3-MYB structures. Notably, GmMYB3a exhibited high homology with Arabidopsis AtMYB3, AtMYB4, and AtMYB7, suggesting conserved functional roles across species. Furthermore, a comprehensive phylogenetic analysis comparing GmMYB3a with previously identified MYB transcription factors involved in isoflavone biosynthesis in soybean revealed a strong phylogenetic relationship. This high degree of homology indicated a potential functional role for GmMYB3a in isoflavone biosynthesis.
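The sequence figures above can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming that the reported 672 bp open reading frame includes the stop codon and that the domain coordinates are 1-based and inclusive; the labels R2 and R3 for the two SANT domains are added here for illustration only.

```python
# Values reported in the text; the R2/R3 labels are an illustrative assumption.
ORF_LENGTH_BP = 672
SANT_DOMAINS = {"R2": (12, 62), "R3": (65, 113)}


def protein_length(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acid residues encoded by an ORF of the given length."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # The stop codon, if counted in the ORF, does not encode a residue.
    return codons - 1 if includes_stop else codons


def domain_length(start: int, end: int) -> int:
    """Length of a domain given 1-based, inclusive coordinates."""
    return end - start + 1


residues = protein_length(ORF_LENGTH_BP)
domain_lengths = {name: domain_length(*span) for name, span in SANT_DOMAINS.items()}
print(residues, domain_lengths)
```

Under these assumptions the ORF encodes 223 residues, and both predicted domains fall within the N-terminal half of the protein.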
Analysis of the expression pattern of the soybean GmMYB3a gene
We conducted subcellular localization experiments to investigate the functional aspects of the GmMYB3a gene. Leaves transformed with empty GFP vector exhibited intense green fluorescence across the nucleus, cytoplasm, and cell membrane. Conversely, leaves specifically transformed with GmMYB3a displayed prominent green fluorescence, primarily within the nucleus. These findings provided compelling evidence that GmMYB3a is primarily localized in the nucleus (Fig. 3C).
We performed qRT-PCR analysis across various soybean tissues (including the roots, stems, leaves, flowers, immature embryos, cotyledonary nodes, and hypocotyls) to comprehensively assess the expression patterns of GmMYB3a. GmMYB3a was expressed in all seven tissues examined, with a markedly higher expression level observed in immature embryos than in other tissues. Notably, the roots, flowers, and hypocotyls exhibited considerable expression, whereas the lowest expression levels were detected in the stems (Fig. 3D). Importantly, qRT-PCR data aligned with the expression profiles of related genes, thus corroborating the reliability of our observations and suggesting a potential association between GmMYB3a and isoflavone synthesis.
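The text does not state how relative expression was calculated from the qRT-PCR data. A common choice is the 2^-ΔΔCt method, sketched below as an assumption rather than as the method used here; the Ct values and the choice of stems as the calibrator tissue are toy inputs, not measurements from this study.

```python
def relative_expression(ct_target: float, ct_ref: float,
                        ct_target_cal: float, ct_ref_cal: float) -> float:
    """Relative expression by the 2^-ddCt method.

    ct_target / ct_ref: Ct of the target and reference gene in the sample.
    ct_target_cal / ct_ref_cal: the same two Ct values in the calibrator tissue.
    """
    delta_sample = ct_target - ct_ref
    delta_calibrator = ct_target_cal - ct_ref_cal
    return 2.0 ** -(delta_sample - delta_calibrator)


# Toy Ct values (hypothetical): (target gene Ct, reference gene Ct) per tissue.
toy_ct = {
    "stem": (25.0, 18.0),             # used as calibrator in this sketch
    "immature_embryo": (22.0, 18.0),
}

cal_target, cal_ref = toy_ct["stem"]
fold = {
    tissue: relative_expression(t, r, cal_target, cal_ref)
    for tissue, (t, r) in toy_ct.items()
}
print(fold)
```

With these toy values, a target Ct that is three cycles lower than in the calibrator corresponds to an eight-fold higher relative expression.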
Generation and screening of GmMYB3a transgenic plants
We treated plants carrying the pTF101.1-GmMYB3a and RNAi-GmMYB3a constructs with glufosinate-ammonium to validate the successful establishment of overexpressing and silenced plant lines. The leaves of the ‘Williams82’ soybean control changed from green to yellow, whereas the leaves of the GmMYB3a transgenic plants remained largely unchanged, owing to their herbicide resistance (Fig. 4D). Subsequently, we performed PCR analysis on the GmMYB3a T3 generation lines using bar gene-specific primers. We observed a distinct bar gene band of approximately 552 bp in the overexpressing and silenced lines (Fig. 4A-B). Finally, bar test strip analysis confirmed the positive status of the three overexpressing (OE-3a-1, OE-3a-2, and OE-3a-3) and three silenced lines (RNAi-3a-1, RNAi-3a-2, and RNAi-3a-3), as evidenced by the presence of two distinct bands on the test strip (Fig. 4C).
GmMYB3a leads to increased isoflavone content in transgenic soybean
We analyzed the isoflavone content in the OE-GmMYB3a and RNAi-GmMYB3a transgenic soybean lines and compared it with that of the WT soybean line throughout the T2, T3, and T4 generations. The isoflavone content of both the OE-3a and RNAi-3a lines differed significantly from that of the WT. In the T2 generation, the OE-3a lines (OE-3a-1, OE-3a-2, and OE-3a-3) exhibited approximately 1.27-, 1.31-, and 1.33-fold higher isoflavone contents, respectively, than the WT. Conversely, the RNAi-3a lines (RNAi-3a-1, RNAi-3a-2, and RNAi-3a-3) displayed decreased isoflavone contents of approximately 0.64-, 0.69-, and 0.69-fold of the WT level, respectively (Fig. 5A).
In the T3 generation, OE-3a lines (OE-3a-1, OE-3a-2, and OE-3a-3) exhibited a substantial increase in isoflavone content, reaching 1.76, 1.69, and 1.84-fold higher levels than that of the WT, respectively. In contrast, the RNAi-3a lines (RNAi-3a-1, RNAi-3a-2, and RNAi-3a-3) showed a significant decrease in isoflavone content, dropping to 0.71, 0.62, and 0.76-fold of the WT level, respectively (Fig. 5B).
A detailed breakdown of the isoflavone components revealed that the OE-GmMYB3a lines showed varying degrees of increase in daidzin, glycitin, genistin, and daidzein levels, whereas the genistein level decreased. Conversely, the RNAi-GmMYB3a lines exhibited decreased daidzin, glycitin, genistin, and daidzein contents (Fig. 7A-D) and a corresponding increase in genistein content (Fig. 7E). Based on these findings, the OE-3a-3 and RNAi-3a-3 lines were selected for further in-depth analysis.
Subsequent examination of the isoflavone content in embryos during various developmental stages revealed significant changes in the isoflavone components of OE-3a-3 and RNAi-3a-3 compared with those of the WT. Overall, the isoflavone content increased as development progressed, particularly during the transition from R7 to R8, during which each component accumulated rapidly (Fig. 6). In the T4 generation, the OE-3a lines (OE-3a-1, OE-3a-2, and OE-3a-3) displayed a significant increase in isoflavone content, reaching 1.34-, 1.4-, and 1.33-fold higher levels than that of the WT, respectively. Conversely, the RNAi-3a lines (RNAi-3a-1, RNAi-3a-2, and RNAi-3a-3) exhibited a significant decrease in isoflavone content, dropping to 0.76-, 0.73-, and 0.72-fold of the WT level, respectively (Fig. 5C).
Collectively, our comprehensive analysis spanning multiple generations provided compelling evidence that OE-3a lines exhibited markedly higher isoflavone content than the WT, whereas RNAi-3a lines displayed considerably lower levels. These findings underscored the pivotal role of GmMYB3a in the positive regulation of soybean isoflavone biosynthesis, thereby modulating the soybean isoflavone profile.
Statistics of differentially expressed genes in embryos of control and GmMYB3a overexpressing plants
We quantified the isoflavone content in soybean plants at the R5, R6, R6−, and R7 embryonic stages and in R8 mature seeds using our prior HPLC assessments. The transgenic and recipient plants exhibited a consistent trend of isoflavone accumulation, with concentrations gradually increasing as development progressed (Fig. 6). Notably, a marked acceleration in accumulation was observed from the R7 embryonic stage onward. We conducted RNA sequencing (RNA-seq) analysis on eight samples encompassing R7 embryos and R8 mature seeds of GmMYB3a-overexpressing soybean plants (OE) and their respective empty vector-transformed controls (CK) to elucidate the regulatory network underlying embryo development mediated by GmMYB3a overexpression in soybeans.
Our analysis revealed intricate patterns of gene expression changes, providing insights into the molecular processes governed by GmMYB3a overexpression. We identified differentially expressed genes (DEGs) using stringent criteria: a two-fold change in gene expression and a corrected p-value < 0.05. Specifically, we identified a total of 23 479 DEGs between R7 embryos and R8 mature seeds in the OE samples, comprising 10 777 upregulated and 12 702 downregulated genes (Fig. 8C). Similarly, we observed 22 385 DEGs in the CK samples, of which 10 100 were upregulated and 12 285 were downregulated (Fig. 8D). Furthermore, comparative analysis between the OE and CK samples at the R7 embryo stage revealed 2638 DEGs, including 1752 upregulated and 886 downregulated genes (Fig. 8A). In R8 mature seeds, we detected 775 DEGs, with 326 upregulated and 449 downregulated genes (Fig. 8B).
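The DEG criteria stated above (two-fold change, corrected p-value < 0.05) can be illustrated with a short sketch. The function name, the inclusive handling of the two-fold boundary, and the three toy records are assumptions made for illustration; the tallies checked at the end are the counts reported in the text, and the comparison labels are shorthand introduced here.

```python
import math


def classify_deg(fold_change, padj, min_fold=2.0, max_padj=0.05):
    """Return 'up', 'down', or None for one gene.

    fold_change is the expression ratio (treatment / control); the two-fold
    boundary is treated as inclusive here, which is an assumption.
    """
    if padj >= max_padj or fold_change <= 0:
        return None
    log2fc = math.log2(fold_change)
    threshold = math.log2(min_fold)
    if log2fc >= threshold:
        return "up"
    if log2fc <= -threshold:
        return "down"
    return None


# Toy records (hypothetical gene IDs and values, not data from this study).
toy_genes = {
    "gene_a": (4.0, 0.001),   # strongly up, significant
    "gene_b": (0.5, 0.010),   # two-fold down, significant
    "gene_c": (3.0, 0.200),   # large change but not significant
}
calls = {gene: classify_deg(fc, p) for gene, (fc, p) in toy_genes.items()}
print(calls)

# Reported tallies from the text: (upregulated, downregulated, total).
reported = {
    "OE_R7_vs_R8": (10777, 12702, 23479),
    "CK_R7_vs_R8": (10100, 12285, 22385),
    "OE_vs_CK_R7": (1752, 886, 2638),
    "OE_vs_CK_R8": (326, 449, 775),
}
consistent = {name: up + down == total for name, (up, down, total) in reported.items()}
print(consistent)
```

The consistency check confirms that the upregulated and downregulated counts sum to the reported total in each of the four comparisons.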
We conducted a Gene Ontology (GO) enrichment analysis of DEGs using a significance threshold of padj < 0.05 to further elucidate the regulatory network mediated by GmMYB3a during soybean embryogenesis. Specifically, we analyzed DEGs from R7 embryos (P3_R7 vs. WT_R7) and R8 mature seeds (P3_R8 vs. WT_R8). In R7 embryos, photosynthesis was the biological process significantly enriched among DEGs. Cellular components were primarily associated with thylakoids and their parts, whereas the most prominent molecular function was related to calcium ion binding (Fig. 9A). Conversely, R8 mature seeds exhibited a distinct pattern, with significant enrichment in cofactor metabolic processes. Regarding cellular components, DEGs were primarily found in photosystem I reaction centers, lipid droplets, monolayer-surrounded lipid storage bodies, chloroplasts, and plastids (Fig. 9B). Notably, DEGs were remarkably enriched in nutrient reservoir activity.
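The enrichment tool used for the GO analysis is not named in the text. One widely used approach is a one-sided hypergeometric test per term followed by Benjamini-Hochberg adjustment, which yields the padj values to which a 0.05 threshold is applied. The sketch below illustrates that approach under this assumption; all gene counts and raw p-values are toy numbers, and only the term names are taken from the text.

```python
from math import comb


def hypergeom_sf(k, population, annotated, drawn):
    """P(X >= k): chance of seeing at least k annotated genes among `drawn`
    DEGs sampled without replacement from `population` background genes."""
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    hits = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy inputs (hypothetical counts): background genes, DEGs, and per-term
# (genes annotated in background, annotated genes among DEGs).
BACKGROUND, N_DEGS = 1000, 100
toy_terms = {
    "photosynthesis": (40, 15),
    "calcium ion binding": (50, 6),
}
raw_p = {t: hypergeom_sf(k, BACKGROUND, n, N_DEGS) for t, (n, k) in toy_terms.items()}
padj = dict(zip(raw_p, benjamini_hochberg(list(raw_p.values()))))
enriched = [term for term, p in padj.items() if p < 0.05]
print(padj, enriched)
```

Exact integer binomials keep the sketch dependency-free; for genome-scale term lists, a statistics library would usually be preferred.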
We performed Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis to investigate the metabolic processes involving DEGs in R7 embryos (P3_R7 vs. WT_R7) and R8 mature seeds (P3_R8 vs. WT_R8). DEGs from both R7 embryos and R8 mature seeds displayed significant enrichment in secondary metabolism. Specifically, the DEGs in R7 embryos included three genes involved in phenylalanine metabolism; four genes pertaining to the biosynthesis of phenylalanine, tyrosine, and tryptophan; three genes participating in flavonoid biosynthesis; ten genes involved in phenylpropanoid biosynthesis; and two genes associated with isoflavonoid biosynthesis (Fig. 10A). In contrast, the DEGs in R8 mature seeds included three genes involved in phenylalanine metabolism; six genes related to phenylalanine, tyrosine, and tryptophan biosynthesis; and three genes involved in flavonoid biosynthesis (Fig. 10B). These findings provided insights into the regulatory mechanisms orchestrated by GmMYB3a during soybean embryogenesis.
Combined analysis of RNA-seq and isoflavone-targeted metabolism
We constructed a comprehensive molecular network diagram to elucidate the role of MYB transcription factors in the regulation of isoflavone biosynthesis by combining RNA-seq with targeted isoflavone metabolomic analyses. Furthermore, we conducted a heatmap analysis to visualize the differential expression patterns of genes associated with the isoflavone synthesis pathway, along with key enzyme genes involved in this process. We found that GmMYB3a may exert its regulatory influence on isoflavone content by modulating the expression of genes, including GmCHS7, GmCHS8, GmCHI1B2, GmCYP98A2, GmCYP93A1, GmIF7GT, GmIF7MaT, GmIFS1, GmIFS2, and GmHIDH (Fig. 11A-B).
GmMYB3a binds to the promoters of GmCHS7 and GmCHS8
Based on the results of RNA-seq and targeted metabolomic analyses, we identified GmCHS7, GmCHS8, GmCHI1B2, GmCYP98A2, GmCYP93A1, GmIF7GT, GmIF7MaT, GmIFS1, GmIFS2, and GmHIDH as pivotal candidate genes. We performed Y1H and LUC assays to gain a deeper understanding of the regulatory function of GmMYB3a in isoflavone biosynthesis. The Y1H assays were conducted by coexpressing GAD-GmMYB3a with fusion constructs encompassing GmCHS7-pro::LacZ, GmCHS8-pro::LacZ, GmCHI1B2-pro::LacZ, GmCYP98A2-pro::LacZ, GmCYP93A1-pro::LacZ, GmIF7GT-pro::LacZ, GmIF7MaT-pro::LacZ, GmIFS1-pro::LacZ, GmIFS2-pro::LacZ, and GmHIDH-pro::LacZ in the yeast strain EGY48. Notably, yeast cells harboring GmMYB3a and GmCHS7 or GmCHS8 exhibited a distinct blue coloration upon screening on selective medium (SD/−Trp/−Ura) supplemented with X-gal, indicating a positive interaction. Conversely, the negative control remained unaltered in the selective medium (Fig. 12E). These findings demonstrated that GmMYB3a selectively interacted with the promoters of GmCHS7 and GmCHS8 (Fig. 12A-D) but did not bind to the promoters of GmCHI1B2, GmCYP98A2, GmCYP93A1, GmIF7GT, GmIF7MaT, GmIFS1, GmIFS2, or GmHIDH (Fig. 12E).
Next, LUC assays showed that GmMYB3a activates the 1083 bp region of GmCHS7-below (992 bp-1984 bp) and represses the promoter of GmCHS8 (Fig. 12F).
Screening and validating the interacting proteins of soybean GmMYB3a
We validated the specific interactions between GmMYB3a and other TFs using a Y2H assay. We cloned the ORF sequence of GmMYB3a into the pGBKT7 vector and assessed its self-activation activity. Notably, our findings revealed that both constructs displayed significant self-activation capabilities, enabling yeast cells to proliferate on SD/-Trp and SD/-Trp/-Ade media, while also catalyzing X-α-gal degradation (Fig. 13A).
We conducted a hybridization screen using pGBKT7-GmMYB3a in a preconstructed soybean yeast nuclear library and identified 74 potentially interacting proteins associated with robust growth and pronounced blue coloration. After rigorous plasmid extraction, transformation into Escherichia coli, and PCR verification, we selected 19 candidates for subsequent sequencing analyses. Among them, we identified two distinct gene sequences: GmRPS6 (GLYMA_18G151800) and GmMAPK1 (GLYMA_11G150452).
To validate these interactions, we established well-defined experimental groups, including one positive (pGBKT7-53 + pGADT7-T), two negative (pGBKT7-Lam + pGADT7-T and pGBKT7 + the putative interacting protein), and an empty vector (pGBKT7) control, as well as an experimental group (pGBKT7-GmMYB3a + the putative interacting protein). These yeast cells were systematically cultured on SD-Trp/-Leu + X-α-Gal and SD-Trp/-Leu/-His/-Ade + X-α-Gal + AbA media. All control and experimental groups displayed robust growth on SD-Trp/-Leu + X-α-Gal media, with the positive control and experimental groups exhibiting blue colonies. In contrast, empty vector controls failed to proliferate (Fig. 13B). Notably, the positive control and experimental group exhibited robust growth and blue coloration on SD-Trp/-Leu/-His/-Ade + X-α-Gal + AbA media, whereas the second negative control displayed only weak growth (Fig. 13C). These findings confirmed the physical interactions between GmMYB3a and both GmRPS6 and GmMAPK1.
To further strengthen our findings, we employed a LIC assay that unequivocally demonstrated that GmRPS6 and GmMAPK1 physically interacted with GmMYB3a. This compelling evidence suggested that GmRPS6 and GmMAPK1 cooperate with GmMYB3a to regulate isoflavone biosynthesis in soybeans (Fig. 13D-E).