Natural variation in KRN within the association panel
KRN was measured within our association panel, which included 639 maize inbred lines, in XX, BJ and GZL in 2011. The results showed that KRN was normally distributed under each environment, and the KRNs among environments were highly positively correlated, ranging from 0.73 between XX and BJ to 0.79 between XX and GZL (Figure 1a). The KRN exhibited high broad-sense heritability (H2 = 0.90, Table 1), which was similar to the results of previous studies [14, 16]. Comparing the KRN among the different environments, we found that the KRN showed the smallest average (13.69), minimum (8.60) and maximum (20.60) values in XX, where all accessions were summer-planted in June. With increasing latitude, where the accessions were spring-planted in May, the average KRN increased (14.65 in BJ and 14.59 in GZL). The largest range (max - min) in KRN appeared in GZL (12.60), which had the most hours of daylight (Table 1). Based on the previous results, our association panel could be divided into five subgroups: Reid, tangsipingtou (TSPT), lvdahonggu (LRC), Lancaster and P. The KRN statistical analysis results of various subgroups are shown in Table S1. There was no significant difference in KRN among the five subgroups (Figure 1b). We found that the Reid subgroup had the highest value (14.65) for KRN, and the Lancaster subgroup had the lowest value (14.26) for KRN. The LRC subgroup showed the largest variation, ranging from 8.8 to 19.51 (Table S1). These results indicated that KRN was a quantitative trait and that the phenotypic variation among the tested inbred lines in the association panel was beneficial for dissecting the genetic architecture of KRN.
Table 1. Phenotypic variance in KRN for 639 maize inbred lines in three environments.
Env
|
Mean
|
Min
|
Max
|
SD
|
CV (%)
|
H2
|
|
|
XX
|
13.69
|
8.60
|
20.60
|
2.02
|
14.76
|
0.90
|
|
BJ
|
14.65
|
9.20
|
21.00
|
1.69
|
11.56
|
|
GZL
|
14.59
|
8.60
|
21.20
|
2.00
|
13.69
|
|
BLUP
|
14.31
|
9.17
|
20.01
|
1.61
|
11.27
|
|
|
Env, Environment; XX, Xinxiang; BJ, Beijing; GZL, Gongzhuling; Max, maximum; Min, minimum; SD, standard deviation; CV, coefficient of variation; H2, broad-sense heritability.
QTNs for KRN identified by different methods
Single-locus analysis for KRN (MLM)
Based on the MaizeSNP50 BeadChip, we obtained 42,667 high-quality SNPs distributed on 10 maize chromosomes. Under the P < 0.0001 and P < 0.001 thresholds, 3/55, 3/45, 1/24, and 3/51 KRN-associated QTNs were found in XX (Figure 2a), BJ (Figure 2b), GZL (Figure 2c) and BLUP (Figure 2d), respectively. To account for overcorrection in this model, the P < 0.001 thresholds were selected to identify KRN-associated QTNs. Finally, 177 QTNs were found to be associated with KRN, and the proportion of phenotypic variance explained (PVE) by these individual QTNs ranged from 1.84 to 4.01% (Table S2).
Multiple-locus analysis for KRN
Using different multiple-locus models, we identified different numbers of significant QTNs for KRN in XX, BJ, GZL and together with BLUP across all locations. These QTNs were unevenly distributed on 10 chromosomes, with the most QTNs on Chr. 1 and the fewest on Chr. 8. Specifically, 15 (FASTmrEMMA)-177 (mrMLM) QTNs for XX, 11 (FASTmrEMMA)-30 (ISIS EM-BLASSO) QTNs for BJ, 12 (FASTmrEMMA)-55 (mrMLM) QTNs for GZL and 11 (FASTmrEMMA)-106 (mrMLM) QTNs for BLUP were identified by the six different methods (Table S3). Comparative analysis of the GWAS results among different statistical approaches showed that FASTmrEMMA detected the fewest QTNs in all the environments, while mrMLM detected the most QTNs in all the environments, except for BJ (Table S3). The QTN overlap analysis among the seven methods indicated that the majority of the association QTNs were specifically identified by only one method, followed by the QTNs codetected by two and three methods (Figure S1a and Table S4). Overall, ISIS EM-BLASSO, which detected the third largest number of QTNs, identified the most codetected QTNs, followed by FASTmrMLM (Table S4). The comparative analysis of the GWAS results among the different environments showed that the majority of the significant QTNs were specifically identified in only one environment (Figure S1b). The low repeatability of associated QTNs among various environments suggested that the genetic loci controlling KRN were environmentally sensitive, although the KRN was similar and consistently showed high heritability across all locations.
Annotation and expression of candidate genes for KRN
To obtain reliable significant QTNs and predict the candidate genes for KRN, only the QTNs simultaneously identified by at least three methods (either single-locus or multi-locus) and in at least two environments were used for the next analysis. Finally, seven QTNs controlling KRN were obtained (Table 2). The seven QTNs were located on chromosomes 1, 2, 3, 5, 9, and 10, and the PVE by these QTNs ranged from 1.06% to 5.21%. Based on the LD of the association panel (Figure S2), 49 genes around the QTNs (200 kb upstream and downstream) were obtained, and their expression varied widely in different maize tissues (Figure 3a and Table S5). For example, Zm00001d016760, which encodes the abscisic acid stress ripening6 protein, is highly expressed in the roots, Zm00001d031426, which encodes serine/threonine-protein kinase, and Zm00001d043298, which encodes a P-loop containing nucleoside triphosphate hydrolase superfamily protein, are highly expressed in tassels and anthers. Among the 49 genes, 22 genes were differentially expressed in different spike development mutants (Table S6), which suggested that these 22 genes might be involved in ear development in maize.
Table 2. Significant KRN-associated QTNs codetected in at least two environments and by at least three models.
SNP
|
Chr
|
Pos
|
Single-locus GWAS (MLM)
|
|
Multi-locus GWAS
|
LOD
|
PVE (%)
|
|
LOD
|
PVE (%)
|
Methods1
|
PZE-101124566
|
1
|
156580056
|
3.44
|
3.00
|
|
4.60-11.63
|
1.91-3.02
|
1, 2, 3, 4, 5, 6, 7
|
PZE-101144585
|
1
|
187526525
|
3.13
|
2.00
|
|
4.39-5.95
|
1.84-3.51
|
1, 3, 4, 5, 7
|
PZE-102176259
|
2
|
219023013
|
3.32
|
3.00
|
|
3.41-4.17
|
1.06-2.04
|
1, 2, 3, 4, 7
|
PUT-163a-110967306-138
|
3
|
191981941
|
3.28
|
2.56
|
|
8.17-11.77
|
1.62-3.37
|
1, 2, 5, 6
|
PZE-105114980
|
5
|
171187130
|
/
|
/
|
|
4.35-8.20
|
1.15-2.29
|
2, 3, 5,6,7
|
PZE-109047930
|
9
|
79941271
|
4.61
|
4.00
|
|
5.73-10.40
|
2.43-5.21
|
1, 2, 3, 5, 6, 7
|
PZE-110106563
|
10
|
146944098
|
3.61
|
3.00
|
|
3.65-5.25
|
1.18-2.38
|
1, 2, 3, 4, 5, 6, 7
|
Methods1: Numbers 1 to 7 represent different GWAS methods: 1: MLM; 2: mrMLM; 3:
FASTmrMLM; 4: FASTmrEMMA; 5: pLARmEB; 6: pKWmEB; 7: ISIS EM-BLASSO.
Interestingly, we found that Zm00001d026540, which encodes auxin response factor29 (ARF29), had a higher expression in SAM and ears than in other tissues. Candidate gene association mapping showed that five SNPs (two SNPs in the gene and three SNPs in the gene upstream) around ARF29 were significantly related to KRN (Figure 3b and Table 3). ARF29 can bind the Bif1 (which is related to SAM development and final KRN) promoter by recognizing the TTTCGG motif [40, 41]. The S10_147122969 SNP, located within the gene body, was significantly associated with KRN. Two alleles for this SNP (A/T) were present in this panel, with the A allele having a higher KRN. Cytokinins also play an important role in the development of immature spikes and the formation of final KRN [42]. For example, UB3 regulates KRN by the cytokinin pathway and CLAVATA-WUSCHEL pathway [42]. In this study, CKO4 (Zm00001d043293, encodes cytokinin oxidase protein) was detected, and candidate gene association mapping for CKO4 was also conducted. The results showed that two SNPs located upstream of CKO4 were significantly associated with KRN (Figure 3c and Table 3). The S3_191837578 SNP had two alleles (T/G), and the T allele was associated with a higher KRN but had a lower allele frequency. This suggested that this allele may not be widely used in maize breeding.
Table 3. Candidate gene association analysis.
Gene ID
|
SNP
|
Chr
|
Pos
|
LOD
|
PVE
|
Allele
|
Frequency
|
ARF29
|
S10_147122969
|
10
|
147122969
|
4.57
|
8.97%
|
A/T
|
127/99
|
S10_147121954
|
10
|
147121954
|
4.44
|
8.98%
|
G/A
|
94/90
|
S10_147126021
|
10
|
147126021
|
3.88
|
7.58%
|
T/A
|
161/27
|
S10_147123193
|
10
|
147123193
|
3.33
|
5.30%
|
A/C
|
119/110
|
S10_147141311
|
10
|
147141311
|
3.17
|
4.92%
|
C/G
|
211/21
|
CKO4
|
S3_191837578
|
3
|
191837578
|
4.64
|
7.85%
|
G/T
|
177/45
|
S3_191841761
|
3
|
191841761
|
4.67
|
6.99%
|
T/G
|
236/16
|
Whole-genomic prediction of KRN
We first analyzed the LD blocks of all markers using the threshold value r2 > 0.2 and obtained 27,688 tagSNPs in our association panel. Then, we randomly selected different numbers of tagSNPs, from 5 to 27,000, in the whole genome to calculate the prediction accuracies for the KRN of the inbred lines. The results showed that the prediction accuracies increased as the number of tagSNPs increased (Figure 4a and Table S7). More specifically, the prediction accuracies sharply increased when the tagSNPs increased from 5 to 500 and then slowly grew when the tagSNPs increased from 400 to 2000. After 2000, the prediction accuracies maintained a consistently high level. Although a large number of tagSNPs was used to predict the KRN, the prediction accuracies were still less than 0.5. The effects of training population size on the prediction accuracy were also conducted based on the marker number 14000 (approximately 50% of the total tagSNPs). In the association panel, the prediction accuracies improved with increasing training population size. When the training population size increased from 50% to 90%, a slight increase was observed in the prediction accuracy (Figure 4b and Table S7).
To better understand the genetic architecture of KRN and improve its predictive ability, we ranked the 27,688 tagSNPs according to their significance to KRN obtained by MLM to obtain the top tagSNPs. We found that these top tagSNPs had a higher prediction accuracy (ranging from 0.60 for the top 100-tagSNPs to 0.74 for the top 700-tagSNPs) than these randomly selected tagSNPs (ranging from 0.22 for 100 random tagSNPs to 0.33 for 700 random tagSNPs) (Figure 4c and Table S7).
The tagSNPs representing the significant QTNs detected by different models based on the BLUP were collected and used to calculate the prediction accuracies for KRN in our association panel. The results showed that these tagSNPs identified by different methods had different prediction accuracies ranging from 0.43 (FASTmrEMMA) to 0.60 (ISIS EM-BLASSO) (Figure 4d and Table S8). We also found that the tagSNPs associated with KRN identified by the same method showed different prediction accuracies in diverse environments (Table S8). To explore whether using the codetected QTNs in different GWAS methods could increase the prediction accuracies for KRN, we selected the common QTNs identified by at least two, three, four, five and six methods to conduct the predictions. The results showed that only the common QTNs identified by at least two methods (common ≥ 2) could maintain predictability at a high level; other common QTNs had no advantage in predicting KRN, which may be due to the fewer QTN numbers (Table S8).
Additionally, to improve the prediction ability, we put the KRN-related tagSNPs detected by seven methods together in a single environment (204 in XX, 87 in BJ, 118 in GZL and 167 in BLUP), namely, M-total tagSNPs, to conduct the KRN prediction. As a result, we found that the prediction accuracies were improved sharply and reached 0.74 in XX, 0.66 in BJ, 0.75 in GZL and 0.75 in BLUP (Figure 4d and Table S8). These predictabilities were much higher than those of the single method in each environment (Table S8). Then, we collected the tagSNPs associated with KRN from all methods and all environments, namely, E-M-total tagSNPs, and obtained 439 tagSNPs in total. However, there was only a slight increase in prediction accuracies (ranging from 0.68 in BJ to 0.79 in BLUP for the 439 tagSNPs) when we used the much higher number of E-M-total tagSNPs compared to the fewer M-total tagSNPs (Figure 4d and Table S8).