Phenotype variation in parents and the RIL population
The trait of flowering time of each RIL population individual and two parents were all recorded in two years (Harbin in 2018 and 2019) (Supplementary Fig. 1 and Supplementary Table 1). As shown in Supplementary Fig. 1, a transgressive segregation was observed in this population, which indicates the polygenic control of the soybean flowering time. Correlation analysis was conducted on the two-year data of all traits recorded of the RIL population, the result shows that R1 were significantly correlated in two years. The absolute value of skewness of the mean value of R1 trait in the RIL population across each year was <1, indicating an approximately normal distribution. Therefore, the R1 data of these two years were trustingly used to detect QTLs.
Analysis of sequencing data and construction of a genetic linkage map
Both of the parental cultivar SN4 and ZK168 were resequenced at a higher coverage level of 14,7X and 8.1X individually, to detect SNP markers and call variations between parents. For SN4 and ZK168, a total of 16,182,987,000 and 8,878,524,900 bases were identified, individually. The Q30 ratio of SN4 and ZK168 was 94.66% and 91.35% while the GC content of SN4 and ZK168 was 40.41% and 35.85%, respectively (Supplementary Table 2). Finally, a total of 2942 polymorphic SNP markers were obtained and used in linkage map construction of 20 Chromosomes (Supplementary Figure 2). The genetic distance between the markers was estimated in cM using the QTL IciMapping software (Supplementary Table 3).
QTL mapping of flowering time
We proceeded QTL identification with the two-year phenotypic data of the RIL population. The threshold of the LOD scores for evaluating the statistical significance of QTL effects is 3.43 and 3.56 of two year, respectively. Three QTLs were detected by the ICIM method named as qR1-L, qR1-G and qR1-B2 (Fig.1 and Supplementary Table 4). Within these QTLs, qR1-L was located on chromosome 19, physical length from 45,564,991 to 48,271,467 of the reference genome Wm82.a2.v1. The LOD scores of qR1-L were 6.72 and 4.10 for each year, could explain 12.1–17.36% of the observed PV (phenotype contribution). qR1-G was located on chromosome 18 with physical length from 56,210,047 to 57,296,740. The LOD scores of qR1-G ranged from 2.78 to 3.65, could explain 4.6% –8.52% of the observed PV. qR1-B2 was located on chromosome 14, physical length from 34,291,536 to 39,844,217. The LOD scores of qR1-B2 ranged from 3.57 to 4.39. The LOD of qR1-L and qR1-B2 were higher than the LOD threshold in both two year whereas the LOD of qR1-G was only greater than the LOD threshold in 2019. The LOD value of qR1-G in 2018 was 2.78, not exceeds the threshold, but due to the LOD value was higher than 2.5, we also considered it as a credible QTL. In addition, two major QTLs of R1 trait with the max LOD score of 6.12 and 8.57, respectively, on chromosome 4 were also detected, since these QTLs only could be detected in 2019 but could not detected in 2018, we considered them as unstable QTLs without further analyze the candidate genes of them (Supplementary Table 5).
Candidate Gene Prediction of qR1-G
qR1-G within the physical interval of 56,210,047 - 57,296,740 on chromosome 18. The position interval of the QTL involved 83 genes could be functionally annotated by GO analysis (Supplementary Table 6). Among them, 49, 18 and 79 were functionally annotated to the categories of cellular components, molecular functions and biological processes, individually (Supplementary Figure 3). A single gene Glyma.18G281400 was found to be related with transcription regulation activity. In addition, we found that the interval includes 15 genes with nonsynonymous mutations between parents based on the resequencing data (Supplementary Table 7). All the 15 genes functional annotations were shown on Supplementary Table 8. Especially, Glyma.18G281400 not only has variation between the parents but also the single gene related to transcription regulation activity.
Glyma.18G281400 is an AP2-EREBP (ethylene-responsive element-binding protein) transcription factor. EREBPs with the AP2 domain play a role in the regulation of plant development (Zhao et al., 2006; Wang et al., 2014; Kuluev et al., 2015). Previous research has found that two AP2 domain-encoding genes, SCHLAFMÜTZE and SCHNARCHZAPFEN, repress flowering in Arabidopsis thaliana (Schmid et al., 2003). Therefore, Glyma.18G281400 was considered as a candidate gene for qR1-G and was conducted the correlation analysis of flowering time. At first, we analyze the variation in the Glyma.18G281400 coding sequence using the 1295-accession panel, then defined seven haplotypes with three disparate nonsynonymous variation alleles. However, the variation existence in two parents of Glyma.18G281400 was rare in the natural variations (Fig. 2a and Supplementary Table 7), while, H3 (Haplotype 3) with nine bases inserted after the 665th bp and a SNP (Single nucleotide polymorphism) at the 430th bp of coding region was associated with flowering time (Fig. 2). The 424-accession panel flowering time analysis between H3 and H7 showed no significant difference in Guangzhou 2019, while there were significant differences in the other regions (Harbin 2019, Wuhan 2019, Zhengzhou 2019, Zhengzhou 2018, Hefei 2018) (Fig. 2b-g). This suggests that Glyma.18G281400 may be associated with flowering time in the high latitude and middle latitude rather than the low latitude regions in China.
Candidate Gene Prediction of qR1-B2
qR1-B2 within the physical interval of 34,291,536 - 39,844,217 was located on chromosome 14. We compared coding sequence between the parents in the QTL interval and found 23 genes have differences on the exons (Supplementary Table 9). Among them, 20 genes were annotated genes, and GO analysis was carried out of these genes (supplementary table 10). 5 were functionally annotated to the categories of cellular components, 18 were functionally annotated to molecular functions and 14 were functionally annotated to biological processes (Supplementary Figure 4). Three genes of them were related to transcription regulation activity, namely, Glyma.14G159400, Glyma.14G160600 and Glyma.14G161900. Among these genes, Glyma.14G159400 is a transcription factor encoding a jumonji (jmjC) domain-containing protein. Previous report has elaborated that a jmjc domain-containing protein encoding gene, Se14, plays a key role under long-day suppression of flowering in rice (Takayuki et al., 2014). Therefore, we presumed the Glyma.14G159400 as a candidate gene of qR1-B2.
In order to verify whether the gene Glyma.14G159400 is responsible for qR1-B2, we conducted a correlation analysis between the genotypes and flowering time phenotypes. There are nine haplotypes in Glyma.14G159400 of soybean 1295-accession panel, and we found that the allele of Glyma.14G159400 from SN4 (3206th bp-T) was also consist in generally natural varieties (Fig. 3a). The origins of haplotypes of Glyma.14G159400 were analyzed, the most common haplotype H9 (3206th bp-C type) was possibly originated from H6 (3206th bp-T type), and the proportion of wild was greatly reduced in the transform from H6 to H9. Then the wild type fade away in the process of further differentiation from H9 (Fig. 3b). Additionally, in all the six locations, 3206th bp-T type subgroup showed significantly later flowering time than 3206th bp-C type (Fig. 3c-3h). These indicate that the mutation in codon 3206th bp of Glyma.14G159400 could lead to a variation of flowering time in soybean and implying a key role of Glyma.14G159400 in control of flowering time across diverse genetic backgrounds and environmental conditions.
Candidate Gene Prediction of qR1-L
qR1-L was detected within the physical interval of 45,564,991 - 48,271,467 on chromosome 19. The well-known flowering-related gene E3 was located in this range, but sequence comparison showed that there was no difference in the E3 gene between two parents. Meanwhile, the growth period gene Dt1, which has been reported to regulate flowering time in soybean recently (Yue et al., 2021) was located closely to the interval, but no sequence difference was found between the parents. There may be a new gene regulates the variation of flowering time for qR1-L. Analysis of resequencing data revealed that, there were 131 genes with non-synonymous mutations or frameshift variations between two parental cultivars (Supplementary Table 11). Among them, 87 genes have been annotated by GO (Supplementary Table 12). The result of GO analysis showed that, 24 were functionally annotated to the categories of cellular components, 78 were functionally annotated to molecular functions and 47 were functionally annotated to biological processes (Supplementary Fig. 5). Thereinto, Glyma.19G200800, named GmNF-YA21, was the only one gene related to transcription regulation activity by GO analysis. GmNF-YA21 is a homologous gene of Arabidopsis NF-YA10, previous reports have demonstrated that NF-YA can regulate flowering time in Arabidopsis (Wenkel et al., 2006; Xu et al., 2014; Siriwardana et al., 2016). Consequently, we presumed the GmNF-YA21 as the candidate gene of qR1-L.
Analysis of variation in the GmNF-YA21 coding sequence using the 1295-accession panel defined eight GmNF-YA21 haplotypes, including five distinct nonsynonymous variation alleles (Fig. 4a). In the eight haplotypes, the non-synonymous mutation caused by the G-T mutation at the 202nd base was consisted with the parent variation of the population. Thereafter, we compared the GmNF-YA21 proteins to find it homologous genes in soybean, Arabidopsis thaliana (https://phytozome.jgi.doe.gov/pz/portal.html) as well as several legumes (https://www.legumeinfo.org), and further reproduced the phylogenetic tree (Supplementary Fig. 6). Four GmNF-YA10 genes were clustered into the clade, which was further subdivided into two groups (Supplementary Fig. 6a). The nonsynonymous variation (the 68th Amino acid changes from Ala to Ser) caused by the 202nd bp mutation was compared in the GmNF-YA21 homologous genes, and the result shows that the locus was all S (Ser) in the homologous genes of different legumes, soybean and Arabidopsis thaliana while A (Ala) merely in the GmNF-YA21 (Supplementary Fig. 6b). Combined with the results of protein homologous alignment and haplotypes origin analysis, we found that all the haplotypes originated from the H2 by a SNP and the common H8 was differentiated from H7 (Supplementary Fig. 6b and Fig. 4b). Subsequently, we examined the variations of GmNF-YA21 associated with flowering time in soybean 424-accession panel at six field sites in different latitudes in China (Lu et al., 2020). Significant associations were identified between the 202nd base allele variation in the coding region in flowering time (Fig. 4d-j). The result shows that GmNF-YA21 is related to flowering time and the 202nd-G allele leads to early flowering while the 202nd-T allele contributes to late flowering phenotype. Given that GmNF-YA10 is presumed to be a flowering related gene.
We next examined how the distributions of the major GmNF-YA10 alleles within the subset of Chinese accessions in different geographic latitudes. Within both landraces and improved cultivars, the proportion of the early flowering allele 202nd-G gradually increased from low latitude to high latitude (Fig. 4c). We further conducted analysis for the 202nd-G type and 202nd-T type in wilds, landraces as well as cultivars using the 1295-accession panel data. The result shows that early flowering allele 202nd-G gradually increased from wild to improve cultivars, indicating strongly favored in landraces and subsequently widely utilized in modern breeding (Fig. 4d). Taken together, these results mainly suggested that GmNF-YA10 is definitely a latitudinal adaption gene, and the early flowering allele (202nd-G allele) of GmNF-YA21 is already been used in north region breeding.
Previous study has identified that NF-YA can directly bind the distal CCAAT box in the FT promoter and are positive regulators of flowering in Arabidopsis thaliana (Siriwardana et al., 2016). In order to further verify the gene function of GmNF-YA21 in soybean, we performed transient transfection assays to experimental verify the relationship between GmNF-YA21 and GmFT2a/GmFT5a. The result shows that the LUC activity driven by GmFT2a/GmFT5a promoter was suppressed by both GmNF-YA21-SN4 type and GmNF-YA21-ZK168 type, indicating that GmNF-YA21 possesses the ability to repress GmFT2a/GmFT5a expression (Fig. 5). Even though both SN4 and ZK168 type can inhibit the expression of GmFT2a/GmFT5a, there are significant differences in the inhibition degree of GmFT2a/GmFT5a between the two types (Fig 5b and Fig.5d). The inhibition effect of SN4-type was stronger than that of ZK168-type, which was also consistent with 424 accessions flowering time association analysis (Fig. 5 and Fig. 4e-4j). The result suggested that, although a fine-mapping is still need, GmNF-YA21 is the most likely candidate gene responsible for qR1-L. Furthermore, GmNF-YA21 affects flowering by inhibiting the expressing of two florigen genes, GmFT2a and GmFT5a.