Identification and Classification of PtbHLH Proteins in P. thomsonii genome
A total of 219 putative PlbHLHs were defined from P. thomsonii based on the methods in “Materials and methods”. Sequence analysis indicated that the PlbHLH proteins varied from 91 amino acids to 799 amino acids, with molecular weight ranging from 10356.7 to 89787.63 kDa, the theoretical isoelectric point values ranging of 4.29-9.61. A total of 197 PlbHLH proteins localized in nucleus, while 14 PlbHLH proteins were located in chloroplast, one in endoplasmic reticulum (PlbHLH199), four in mitochondrion, two in plasma membrane (PlbHLH57/169), one in peroxisome (PlbHLH196) (Supplementary Table S2).
Phylogenetic Analysis
To explore the evolutionary relationship and the classification of PlbHLHs, a neighbor-joining phylogenetic tree was constructed with conserved sequences of 219 PlbHLHs and 167 AtbHLHs. Based on this phylogenetic analysis and the previous reported classification of the AtbHLHs, we classified the PlbHLH family with 219 members into 13 subfamilies, designated as I to XIII (Fig. 1). This result corroborated the previously proposed classification of the bHLH family members (Pires and Dolan 2010). As is shown in Fig. 1, group I was the largest subfamily with 36 PlbHLH members, followed by IV with 34 members. In contrast, Group II, the smallest, has only two members (PlbHLH76/93), followed by group VI with four PlbHLHs (PlbHLH22/23/49/50).
Chromosomal Distribution and Synteny Analyses of PlbHLH gene Family
Based on their chromosomal locations, PlbHLH proteins were assigned from PlbHLH1 to PlbHLH219 (Supplementary Table S2). All the PlbHLH genes were distributed on the 11 chromosomes (chr) unevenly. The number of PlbHLH genes on each chr varied from 12 to 44. Chr11 had only 12 PtbHLH genes, both of chr6 and chr9 had 15 PlbHLH members. In contrast, chr1 contain the largest number, 44, of PlbHLH genes followed by 23 on chr4. Most members are closely tandem distributed on chromosomes. In addition, PlbHLH219 was positioned near the telomeric regions on chr11 (Fig. 2). According to a previous study, if two or more genes are present within 200 kb, the elements are considered a tandem repeat event (Holub 2001). It is shown that 35 PlbHLHs were identified to be tandem duplicated genes which were distributed on each chr. The duplicated genes were PlbHLH4/5, 12/13, 18/19, 25/26, 29/30, 33/34/35, 36/37/38, 39/40, 47/48, 54/55, 59/60, 62/63, 67/68, 70/71, 84/85/86, 87/88, 90/91/92, 102/103, 105/106, 114/115/116, 121/122, 125/126, 136/137, 138/139, 144/145, 161/162, 163/164/165/166, 173/174, 175/176, 180/181, 182/183, 197/198, 202/203, 213/214, 216/217 (Fig. 2; Supplementary Table S2). Basic local alignment search tool for proteins (BLASTP) and MCScanX were performed to construct the collinearity of the PlbHLH gene family and identify the possible relationship and potential duplication events between them. Intrachromosomal duplications of the PlbHLH gene family were observed in the P. thomsonii genome (Supplementary Fig. S1). In detail, 30 pairs of PlbHLH genes duplicated tandemly on the 11 chromosomes (Supplementary Table S3). Together, the results suggested that both of tandem and segmental duplications are key factors driving bHLH gene family expansion in P. thomsonii (Guo et al. 2013).
To further illustrate the potential evolutionary patterns of the PlbHLH gene family, a comparative orthologous analysis was performed between P. thomsonii and the other two plant species, A. thaliana and soybean (Glycine max), which belong to the Brassicaceae and leguminosae families, respectively (Fig. 3). The orthologous gene pairs between P. thomsonii and A. thaliana and P. thomsonii and Glycine max were 153 and 584, respectively (Supplementary Tables S4 and S5). These results revealed that the identified orthologous events of PlbHLH-GmbHLH were greatly more than those of PlbHLH-AtbHLH based on the close evolutionary relationship between P. thomsonii and Glycine max. An extensive level of synteny conservation and increased number of orthologous events of PlbHLH-GmbHLH indicated that PlbHLH genes in P. thomsonii shared a similar structure and function with GmbHLH genes in Glycine max.
To further investigate the driving force behind the duplication of PlbHLH gene pairs in P. thomsonii, Ka/Ks (non-synonymous/synonymous substitution ratio) calculation of the duplicated bHLH gene pairs was performed to determine whether a selective pressure acted on the bHLH genes. Interestingly, all the Ka/Ks values of orthologous PlbHLH gene pairs were less than 1, indicating that these genes were subjected to purifying selection with limited functional divergence during evolution after duplication events (Supplementary Tables S4 and S5).
Conserved Motifs, Domain and Gene structure Characterization
To support the phylogenetic analysis, motifs of PlbHLH family members was investigated. The results showed that 20 motifs were found in the PlbHLH gene family (Fig. 4a). In details, motif 1 was presented in all the PlbHLH members. All the PlbHLH members contained motif 2 except PlbHLH192. We found that motif 11 was only presented in PlbHLH51/192/133, and PlbHLH161-166. Motif 2 and Motif 9 were repeated twice in PlbHLH136/137/190/191/215, and PlbHLH19/20/121/122, respectively. Motif 19 was only presented in PlbHLH54/55/156/185, and repeated twice in PlbHLH54/55. The members that phylogenetically clustered in the same subfamily often showed a similar motif pattern. For example, the closely clustered pair PlbHLH24/45/47/48/136/137/1901/191/167 in subfamily I had identical motifs 1-3, 5, 6 and 14. All the 12 members of group IV contained motifs 2, 13 and 15 (Supplementary Table S6). Noticeable, all the PlbHLHs contained the conserved HLH domain except PlbHLH1/2/18/29/35/36/37/39/52/53 (Fig.4b).
Exon-intron structural diversity, an important part in the evolution of gene families, provides additional evidence supporting phylogenetic groupings (Shiu and Bleecker 2003). In order to better understand the structural features of PlbHLHs, the intron structure was detected based on their evolutionary relationships. As shown in Fig. 4c, the number of introns in PlbHLH members varied from 0 (PlbHLH8/9/16/17/46/61/104/113/119/120/122/135/177/205) to 16 (PlbHLH137). Twenty PlbHLH genes contain one intron. PlbHLHs genes within the same subfamily appeared to share similar intron-exon gene structures. A total of 185 PlbHLH genes contain more than two introns. Nevertheless, the number of exons and introns varied greatly among other subgroups (Fig. 4c).
Tissue Expression Pattern of PlbHLH Genes
In order to investigate the possible function of PlbHLH genes, hierarchical cluster analysis was performed to assess the tissue-specific transcript abundance of each PlbHLH gene, represented by FPKM, using the expression patterns of 219 PlbHLH genes corresponding to root, stem and leaf of P. thomsonii (Supplementary Fig. S2). The results indicated that five PlbHLH genes (PlbHLH13/15/78/129/155) showed no expression in the three tissues, in particular, 20 PlbHLHs (PlbHLH3/16/17/18/40/46/48/59/77/86/93/125/131/138/139/162/163/174/177/195) showed high expression level in the three tissues. High expression in root and stem, but low expression in leaf tissue was prevalent among in 13 PlbHLHs (PlbHLH25/31/34/47/53/100/115/121/126/127/165/195/196). Four PlbHLHs (PlbHLH60/63/67/145) showed high expression level in stem (Supplementary Table S7). In order to identify the highest expressed PlbHLH genes which were deemed as the root-specific expressed, the transcriptional abundance of PlbHLH genes in root with more than twice than stem and leaf were selected, consequently, 41 PlbHLH genes were screened (Fig. 5; Supplementary Table S8).
Identification of PlbHLH Genes Related to Pueraria Biosynthesis
In P. thomsonii, pueraria was produced in root, and pueraria content were much higher in the root than in stem and leaf (Chen and Zhang 2001; Han et al. 2015). MeJA is a signaling molecule associated with biosynthesis of flavonoids (Shelton et al. 2012; Zhou and Memelink 2016; Du et al. 2021), for instance, the pueraria yield could be increased to 25 times by different concentration MeJA treatment (Goyal and Ramawat 2008). In view of this, MeJA treatment analysis was performed to narrow down the candidate structural genes. The RNA-seq result of Shang et al. (2022) and results of the qRT-PCR detection indicated PlCHS1 (Pmon008G00376), PlCHR3 (Pmon003G00389), PlIFS1 (Pmon003G02838), PlIFS2 (Pmon003G02830) and PlUGT155 (Pmon006G01850) were up-regulated expressed after treatment with MeJA within 24 h (Fig. 6).
The regulation of isoflavone is at the transcriptional level (Ferreyra and Rius 2012; Li 2014). The positive regulators often exhibit a similar expression pattern to structural genes in biosynthetic pathway (Zhang et al. 2008; Xiang et al. 2019; Zhu et al. 2019). Using qRT-PCR detection, we found that the expression of 19 PlbHLH genes were not influenced by MeJA treatment within 24 h (Supplementary Fig. S3). Compared with untreated root, nine PlbHLH genes were down-regulated in response to MeJA treatment at 12 h and 24 h (Supplementary Fig. S3). In contrast, seven PlbHLH genes (PlbHLH6/56/73/79/94/149/196), were up-regulated in response to MeJA within 24 h of treatment compared with the untreated root (Fig. 7). Importantly, the expression of PlCHS1, PlCHR3, PlIFS1, PlIFS2, and PlUGT155 was positively correlated with the expression of the seven PlbHLH genes. Together, the seven bHLH genes were the candidate genes which acts regulator of puerarin biosynthesis in P. thomsonii.