Phenotypic analysis and Segregation analysis
In 2017, the phenotypic data of 1168 individuals in the backcross population were obtained. The ratio of four-seeded pods to total pods ranged from 0 to 40.54%, with an average of 9.13%, and the standard deviation was 5.61%. The average ratio of four-seeded pods to total pods of SN14 (the recurrent parent) was 18.39% (the average value of 10 standard lines), with a standard deviation of 0.07%. Statistical analysis showed that the ratio of four-seeded pods to the number of pods per plant ranged from 35.00%~41.00%; there were 512 individuals whose ratio of four-seeded pods was within 8%~25% and 124 individuals without four-seeded pods (Table 1, Figure. 1A).
There were 113 individuals in the R1 fine mapping population (line 337; line 1160), and the ratio of the number of four-seeded pods to the number of pods per plant ranged from 0.00 to 37.84%, with a standard deviation of 8.38% (Figure. 1B). There were 137 individuals in the R2 population (line 18, line 108, line 554, line 614), and the ratio of the number of four-seeded pods to the number of pods per plant ranged from 0.00 to 43.59%, with a standard deviation of 7.77% (Figure. 1C). There were eighty-four individuals in the R3 population (line 271; line 345; line 712) and the ratio of the number of four-seeded pods to the number of pods per plant ranged from 0.00 to 57.89%, with a standard deviation of 8.34%. These results are consistent with a normal distribution and are suitable for phenotypic and genotype association analyses (Figure. 1D, Table 1).
Table 1
Statistics table of the phenotypes of the preliminary mapping population and secondary mapping population
Population name
|
Population
|
Quantity
|
Average (%)
|
SD
|
Variance
|
Skewness
|
Min (%)
|
Max (%)
|
CSSL860-13/SN14
|
F2
|
1168
|
8.15
|
6
|
36.07
|
1.00
|
0.00
|
40.54
|
R1
|
F3
|
113
|
14.04
|
8.38
|
70.25
|
0.32
|
0.00
|
37.84
|
R2
|
F3
|
137
|
10.91
|
7.77
|
60.43
|
0.85
|
0.00
|
43.59
|
R3
|
F3
|
84
|
14.47
|
8.34
|
69.56
|
1.64
|
0.00
|
57.89
|
a SD: Standard deviation of the phenotypic trait |
b Skewness: Skewness of the phenotypic trait |
c Min (cm): The minimum of the phenotypic data from the population d Max (cm): The maximum of the phenotypic data from the population
|
The results show that the optimal model of the F2 population was 2MG-AD, indicating that there are two major genes in the population. The variance of the main gene is 8.58, and the heritability of the main gene is 0.75; the Smirnov test (nW2) result was 0.00, and the probability of the Kolmogorov test (Dn) was 1. The optimal model of R1 was 2MG-A, indicating that there are two major genes in this population. The variance of the main gene was 57.44, and the heritability of the main gene is 0.81; the Smirnov test (nW2) result was 0.73, and the probability of the Kolmogorov test (Dn) was 1. The optimal model of R2 was 2MG-AD, indicating that there are two major genes in the population. The variance of the main gene is 37.31, and the heritability of the main gene is 0.61; the Smirnov test (nW2) result was 0.73, and the Kolmogorov test (Dn) was 1. The optimal model of R3 was 2MG-EEAD, indicating that there are two major genes in the population. The variance of the main gene is 47.51, and the heritability of the main gene is 0.01; the Smirnov test (nW2) result was 0.71, and the Kolmogorov test (Dn) was 1 (Table 2).
Table 2
Separation and analysis of the four-seeded-pod trait in the F2 population and F3 population
Population name
|
Population
|
Model
|
AIC
|
Major
Gene
Var
|
Heritability
(Major Gene)
|
P(nW2)
|
P(Dn)
|
CSSL860-13/SN14
|
F2
|
2MG-AD
|
5968.00
|
8.58
|
0.75
|
0.00
|
1
|
R1
|
F3
|
2MG-A
|
800.87
|
57.44
|
0.81
|
0.96
|
1
|
R2
|
F3
|
2MG-AD
|
954.53
|
37.31
|
0.61
|
0.73
|
1
|
R3
|
F3
|
2MG-EEAD
|
36.59
|
47.51
|
0.01
|
0.71
|
1
|
QTL analysis of the F2 population
The resequencing results of the population parents (CSSL860-13 and SN14) showed that there were heterozygous segments on Gm03, Gm06, Gm08 and Gm14. The heterozygous fragments are 0.26 Mb, 0.21 Mb, 4.28 Mb, 5.86 Mb, and 4.50 Mb, respectively. ZYD00006 homozygous fragments were present in Gm07 and Gm13, and the homozygous fragments were 1.28 Mb and 1.95 Mb, respectively (Supplementary Table A4, Figure. 2).
Sixty individuals (F2) were screened for genotypic analysis; among them, 30 individuals had a high ratio of four-seeded pods to the number of pods per plant (ranging from 20.00–40.54%), and another 30 individuals did not have four-seeded pods. Sixty-five pairs of SSR markers were screened from within the ZYD00006 homozygous fragment and the heterozygous fragment, and the intervals were approximately 0.13–0.41 Mb. SN14 and ZYD00006 were used for identifying polymorphic SSR markers, and 29 pairs of SSR markers with polymorphisms between the parents were identified. Sixteen pairs of SSR markers with polymorphisms in five heterozygous fragments, of Gm03 (3957733–4224480 bp), Gm06 (48413650–48625035 bp), Gm08 (3398336–7680873 bp), Gm08 (11853745–17718597 bp) and Gm14 (8151693–12658057 bp) were used to determine the genotype of 60 individuals (F2) (Figure. 3). The results showed that there was no significant correlation between the genotype of individuals with a high ratio and the genotype of the individuals with a low ratio. Single-marker analysis showed that the P values of all the 16 SSR markers were greater than 5%, which were not significant, and the possibility of the five heterozygous segments associated with the NFSP could be excluded (Table 3).
Table 3
Single-marker analysis of the F2 population
Linkage group
|
Marker
|
P value
|
Significance
|
Linkage group
|
Marker
|
P value
|
Significancea
|
Gm03
|
SSR_03_0222
|
0.504
|
——
|
Gm13
|
SSR_13_0853
|
1.000
|
——
|
Gm03
|
SSR_03_0235
|
0.778
|
——
|
Gm13
|
SSR_13_0854
|
0.318
|
——
|
Gm03
|
SSR_03_0239
|
1.000
|
——
|
Gm13
|
SSR_13_0864
|
1.000
|
——
|
Gm06
|
SSR_06_1716
|
0.369
|
——
|
Gm13
|
SSR_13_0867
|
1.000
|
——
|
Gm06
|
SSR_06_1718
|
0.320
|
——
|
Gm13
|
SSR_13_0868
|
0.307
|
——
|
Gm08
|
SSR_08_0363
|
0.816
|
——
|
Gm13
|
SSR_13_0877
|
1.000
|
——
|
Gm08
|
SSR_08_0398
|
0.491
|
——
|
Gm07
|
SSR_07_1183
|
0.788
|
——
|
Gm08
|
SSR_08_0617
|
0.215
|
——
|
Gm07
|
SSR_07_1212
|
1.000
|
——
|
Gm08
|
SSR_08_0652
|
0.413
|
——
|
Gm07
|
SSR_07_1224
|
0.000
|
***
|
Gm08
|
SSR_08_0758
|
1.000
|
——
|
Gm07
|
SSR_07_1248
|
0.000
|
***
|
Gm08
|
SSR_08_0881
|
1.000
|
——
|
Gm07
|
SSR_07_1254
|
0.000
|
***
|
Gm08
|
SSR_08_0895
|
0.337
|
——
|
Gm07
|
SSR_07_1257
|
0.000
|
***
|
Gm08
|
SSR_08_0923
|
0.567
|
——
|
Gm07
|
SSR_07_1260
|
0.135
|
——
|
Gm08
|
SSR_08_0941
|
1.000
|
——
|
Gm07
|
SSR_07_1262
|
1.000
|
——
|
Gm14
|
SSR_14_0603
|
0.803
|
——
|
Gm07
|
SSR_07_1358
|
1.000
|
——
|
Gm14
|
SSR_14_0616
|
0.465
|
——
|
|
|
|
|
a Significance is indicated by *, **, ***, and **** at the levels of 5%, 1%, 0.1%, and 0.01%, respectively. |
Six SSR markers with polymorphisms in the homozygous fragments of Gm13 (23349075-24544199 bp) were used to determine the genotype of 60 individuals (F2), and a genotypic map was created (Figure. 4). The results showed that there was no significant correlation between the genotype of the individuals with a high ratio and the genotype of the individuals with a low ratio. Single-marker analysis showed that the P values of all six SSR markers were greater than 5%, which was not significant, and the possibility that the Gm13 homozygous fragment (23349075-24544199 bp) was associated with the NFSP was excluded (Table 3).
Nine SSR markers with polymorphisms in the homozygous fragment of Gm07 (36116118-37399738 bp) were used to determine the genotype of 60 individuals (F2), and a genotypic map was created (Figure. 4). The results showed that there was a significant correlation between the genotype of the individuals with a high ratio and the genotype of the individuals with a low ratio. Single-marker analysis showed that four SSR markers, SSR-07-1224, SSR-07-1248, SSR-07-1254 and SSR-07-1257, have high significance. The interval coincides with Gm07 (36731771-37409044 bp), and the distance is 0.67 Mb. (Table 3).
QTL analysis of the F3 population
Seven SSR markers with polymorphisms, SSR-07-1224, SSR-07-1235, SSR-07-1248, SSR-07-1253, SSR-07-1254, SSR-07-1256 and SSR-07-1257, were screened for genotypic analysis of 13 subpopulations (F3). Single-marker analysis showed that SSR-07-1224, SSR-07-1235, SSR-07-1248, SSR-07-1253, and SSR-07-1254 are highly significant in the R1 (line 337, line 1160) subpopulation (Table 4), and the QTL interval was thus shortened to 656 kb.
Table 4
Single-marker analysis of the R1 population
Linkage group
|
Marker
|
P value
|
Significance
|
Gm07
|
SSR_07_1224
|
0.000
|
***
|
Gm07
|
SSR_07_1235
|
0.001
|
***
|
Gm07
|
SSR_07_1248
|
0.001
|
***
|
Gm07
|
SSR_07_1253
|
0.000
|
****
|
Gm07
|
SSR_07_1254
|
0.000
|
***
|
Gm07
|
SSR_07_1256
|
0.723
|
——
|
Gm07
|
SSR_07_1257
|
0.493
|
——
|
Based on the above mapping results, nine SSR markers with polymorphisms, SSR-07-1225, SSR-07-1226, SSR-07-1228, SSR-07-1233, SSR-07-1241, SSR-07-1242, SSR -07-1244, SSR-07-1246 and SSR-07-1247, were added, and a total of sixteen SSR markers were analyzed for the remaining 11 subpopulations. Among them, R2 (line 18, line 108, line 554, line 614) has significance in the interval from SSR-07-1224 to SSR-07-1248, and single-marker analysis showed that SSR-07-1224, SSR-07-1225, SSR-07-1226, SSR-07-1228, SSR-07-1233, SSR-07-1235, SSR-07-1248 were highly significantly (Table 5). The QTL interval was thus shortened to 546 kb.
Table 5
Single-marker analysis of the R2 population
Linkage group
|
Marker
|
P value
|
Significance
|
Gm07
|
SSR_07_1224
|
0.000
|
***
|
Gm07
|
SSR_07_1225
|
0.000
|
****
|
Gm07
|
SSR_07_1226
|
0.000
|
****
|
Gm07
|
SSR_07_1228
|
0.000
|
****
|
Gm07
|
SSR_07_1233
|
0.000
|
****
|
Gm07
|
SSR_07_1235
|
0.000
|
****
|
Gm07
|
SSR_07_1241
|
0.000
|
****
|
Gm07
|
SSR_07_1242
|
0.000
|
****
|
Gm07
|
SSR_07_1244
|
0.000
|
***
|
Gm07
|
SSR_07_1246
|
0.000
|
***
|
Gm07
|
SSR_07_1247
|
0.000
|
***
|
Gm07
|
SSR_07_1248
|
0.000
|
****
|
Gm07
|
SSR_07_1253
|
0.085
|
——
|
Gm07
|
SSR_07_1254
|
0.409
|
——
|
Gm07
|
SSR_07_1256
|
0.061
|
——
|
Gm07
|
SSR_07_1257
|
0.355
|
——
|
To shorten the QTL interval further, the remaining population was analyzed. The results showed that R3 (line 271, line 345, line 712) has a significant mapping interval (SSR-07-1224~SSR-07-1235). Single-marker analysis showed that SSR-07-1224, SSR-07-1225, SSR-07-1226, SSR-07-1228, SSR-07-1233, and SSR-07-1235 were significant (Table 6), and the QTL interval was therefore shortened to 321 kb. Based on the above results, as shown in Figure. 5, the QTL associated with the number of four-seeded pods located on Gm07 ranged from 36731771 bp to 37052836 bp, and the QTL (QNFSP07-1) interval was delimited to 321 kb.
Table 6
Single marker analysis of the R3 population
Linkage group
|
Marker
|
P value
|
Significance
|
Gm07
|
SSR_07_1224
|
0.000
|
****
|
Gm07
|
SSR_07_1225
|
0.034
|
*
|
Gm07
|
SSR_07_1226
|
0.000
|
****
|
Gm07
|
SSR_07_1228
|
0.000
|
****
|
Gm07
|
SSR_07_1233
|
0.000
|
***
|
Gm07
|
SSR_07_1235
|
0.025
|
*
|
Gm07
|
SSR_07_1241
|
1.000
|
——
|
Gm07
|
SSR_07_1242
|
1.000
|
——
|
Gm07
|
SSR_07_1244
|
1.000
|
——
|
Gm07
|
SSR_07_1246
|
1.000
|
——
|
Gm07
|
SSR_07_1247
|
1.000
|
——
|
Gm07
|
SSR_07_1248
|
1.000
|
——
|
Gm07
|
SSR_07_1253
|
1.000
|
——
|
Gm07
|
SSR_07_1254
|
0.150
|
——
|
Gm07
|
SSR_07_1256
|
1.000
|
——
|
Gm07
|
SSR_07_1257
|
1.000
|
——
|
Candidate gene mining in QNFSP07-1 for soybean NFSP
The marker sequences of QNFSP07-1 were obtained using SoyBase (http://www.soybase.org/), and candidate genes were detected and annotated based on a BLAST search of publicly available resources, and 27 genes were identified (Supplementary Table A5). Fourteen genes had amino acid mutations according to the SNP sequences of the parental SN14 and ZYD00006 sequences via resequencing. Glyma.07G200400 and Glyma.07G201000 were not expressed. Glyma.07G200200 encodes an HSP20-like protein of the chaperone superfamily, whose members are involved in the drought stress response (Gugger et al., 2017). Glyma.07G201500 contains the DUF679 protein domain and decreases downstream gene expression during flower senescence (Gao et al., 2018). Glyma.07G199700 contains the ERF protein 9 domain, and the amino acids valine and glutamate, which are retained in the ERF protein domain, play an important role in drought and low temperature stress responses (Sakuma et al., 2002). Glyma.07G199800 contains a MAC/perforin domain, which directly interacts with the cell membrane of migrating neurons (Gorbushin et al., 2016). Glyma.07G200800 contains a zf-Di19 superfamily domain, which is related to the plant drought stress response (Lu et al., 2007). Glyma.07G201300 was annotated as being associated with the glucose-6-phosphate/phosphoric acid transporter; the homologous gene in Arabidopsis is AT5G46110, which is downregulated gene in response to VVZFPL overexpression, and overexpression of VVZFPL affects plant photosynthesis and slows plant growth (Kobayashi et al., 2012). Glyma.07G200900 was annotated as an OB-folding protein that according to KEGG pathway analysis (pathway_id k07466); this protein is related to litchi flowering defects (Lu et al., 2014). Glyma.07G201200 was annotated as homeobox 1, and the homologous gene in Arabidopsis is AT1G28420, which is a membrane-tethered transcription factor (MTTF) belonging to the bZIP family (Chen et al., 2008). Previous studies have shown that members of the bZIP gene family are TFs (transcription factors); bZIPs are ubiquitous within the genome of green plants, and they affect organ and tissue differentiation and regulate cell elongation (Smitha et al., 2017; Corrêa et al., 2008). Glyma.07G201400 was annotated as a cysteine-tRNA synthase, and the homologous gene in Arabidopsis is AT2G31170, which is a regulatory gene encoding aminoacyl-tRNA synthase. Studies have shown that interruption of chloroplast translation affects gametophytes in the seed during embryogenesis, and interruption of mitochondrial translation leads to ovule abortion and cessation of male-female gametogenesis (Berg et al., 2005). Glyma.07G201600 was annotated as encoding the HAESA-like protein 1, and the homologous gene in Arabidopsis is AT1G28440. The HAESA coreceptor SERK1 is a positive regulator of the flower abscission pathway, which allows high-affinity sensing of peptide hormones by binding to the Arg-His-Asn motif in IDA. This sequence pattern is conserved among various plant peptides, indicating that plant peptide hormone receptors share common ligand binding patterns and activation mechanisms. Leucine-rich repeat receptor kinases (LRR-RKs), HAESAs and the peptide hormone IDA control flower abscission (Santiago et al., 2016). Combinations of the results of previous studies have shown that the development of ovules, embryos, pollen, stamens and flowers has a great influence on the number of pods and seeds (Kurdyukov et al., 2014; Jiang et al., 2015). Glyma.07G200900 and Glyma.07G201200 have higher expression in pod (Figure. 6A), and Glyma.07G200900 and Glyma.07G201200 have a positive regulatory effect in the Hr stage (Figure. 6B). Glyma.07G200900, Glyma.07G201200 were selected as candidate genes in this study, and additional studies were carried out.
Candidate gene expression analysis
The expression of two candidate genes, Glyma.07G200900, Glyma.07G201200 at different growth and development stages of soybean (florescence, axillary bud, COT, EM, MM, and LM stages) was measured via qRT-PCR. The selected materials included line 953 (high ratio of four-seeded pods to the number of pods per plant), line 32 (low ratio of four-seeded pods to the number of pods per plant) and SN14 (control). The results showed that the expression of the two candidate genes in the cell division stage (axillary bud, COT, and EM stages) was higher than that in the cell expansion stage (MM and LM stages). The highest expression occurred in the EM stage, followed by the COT stage, axillary bud stage, florescence stage and MM stage. The lowest expression occurred in the LM stage, indicating that the expression decreased with the development of organs (Figure. 7). The candidate genes may affect cell division rather than cell expansion and affect ovule number by regulating cell division, which is related to the formation of four-seeded pods in soybean. The expression of the two candidate genes in the three experimental materials was the highest in the EM stage and lowest in the LM stage. The expression was associated with a high ratio of the number four-seeded pods to the number of pods per plant (higher than that of SN14) and a low ratio of the number four-seeded pods to the number of pods per plant, indicating that these two candidate genes have a positive regulatory effect on the NFSP of soybean. Therefore, Glyma.07G200900, Glyma.07G201200 can be used as candidate genes for the number of four-seeded pods of soybean, laying a foundation for future gene cloning and functional verification.
Haplotype analysis of candidate genes
253 germplasm resources (Supplementary Table A6) were used for haplotype analysis, which differed significantly in their NFSP and met the requirements for haplotype analysis of the candidate genes. Whole-genome resequencing of the ninety-two germplasm resources was carried out in 2019, and the phenotype statistics are shown in Supplementary Table A7.
We selected the full-length sequences of four candidate genes and their upstream 3000 bp promoter region sequence and 2000 bp downstream in the Phytozome database as analysis objects, and DNAsp 5.0 was used to detect haplotype polymorphisms. The results showed that Glyma.07G201200 has 2 excellent haplotypes, while Glyma.07G200900 has only one excellent haplotype in the germplasm resource. ANOVA was used to analyze the differences between the phenotypes of the excellent haplotypes, significant differences in NFSP was showed only one excellent haplotype of Glyma.07G200900 (Supplementary Table A8).
Glyma.07G201200 has two excellent haplotypes: HA-1and HA-2. There are nine SNP differences between HA-2, the promoter region sequence (V186, V581, V2356, V2625, V2656, V2713, V2720, V2787, V2940), four SNP differences in the CDS region (V8713, V10660, V12131, V12448), and one difference in the genomic intron region (V13150). The change in SNP V8713 resulted in an amino acid change (D>E) and the change in SNP V12131 resulted in an amino acid change (K>Q), potentially leading to changes in the structure and function of the gene (Figure. 8A). There are multiple SNP differences between HA-1 and HA-2, of which SNPV2713 lead to EECCRCAH1 element according to PLACE. With respect to NFSP based on phenotypic values and haplotypes, ANOVA showed that there were significant differences between HA-1 and HA-2, and HA-2 was the haplotype with a high number of four-seeded pods. HA-1 was the haplotype with a low number of four-seeded pods, with P values of 0.046 (Figure. 8B.)