For the field trials, sugarcane aphid population number, damage, plant height, and flowering time were recorded for two years in 2019 and 2020. In 2019 and 2020, the first alate aphids (winged aphids) were observed on sorghum on July 11 and July14, respectively. Aphid population increased gradually in both years before the population collapsed in the second half of August each year. Aphid counts and damage ratings were started at peak of the aphid population growth, recorded as first ratings on August 14, 2019, and August 11, 2020, while the second ratings were taken on August 28, 2019 and August 18, 2020, respectively. The sugarcane aphid damage on the SAP panel was rated twice in each year and the ratings were averaged across both years (“D1” and “D2” for the first and second rating dates, respectively).
General statistical summary
Aphid damage on plants was rated using a 1-9 scale, with 1 describing plants that showed no aphid-induced plant damage and 9 describing plants that had heavy aphid infestation up the flag leaf and 80% of the leaves displayed aphid damage40. There were 5 resistant lines that scored between 1.0-3.0 and 12 susceptible lines that scored 8.0-9.0 scale (Supplementary Table S1-a). There were two susceptible checks (N98 & N109B) and two resistant checks (No. 5-Gambela and SC110). The average aphid damage score over two seasons for the susceptible checks ranged from 2.3 to 6.3, while for the resistant checks aphid damage rating ranged from 1.0-3.2 (Supplementary Table S1-b).
Most SAP accessions were susceptible to aphid damage, with a mean damage score of 3.8 for D1 and 5.9 for D2 (Supplementary Table S1-c). Aphid induced plant damage for D1 and D2 ranged from 1.0-7.0 and 2.5-9.0, respectively. The average aphid count over the two seasons for the first-rating dates (AC1) ranged from 48 to 1500 aphids leaf-1, while the average of the second-rating dates aphid count (AC2) ranged from 0 to 1166 aphids leaf-1 (see Methods and abbreviations for details). For the SAP lines grown in the greenhouse assessed for aphid-induced plant damage (GHD) at the seedling stage, the average damage rating was 8.5 and the range varied from 5.0-9.0.
Remotely sensed imagery from a UAS-borne sensor was used to calculate three vegetation reflectance indices, namely, Normalized Difference Red-Edge (NDRE), Normalized Difference Vegetation Index (NDVI), and Soil Adjusted Vegetation Index (SAVI). Wall-to-wall coverage of the index values was used to score aphid damage more accurately across the entire field. Only the UAS-based data collected on August 25, 2020, was used for analysis as it had a high correlation to manual phenotyping and high heritability values. The NDRE value ranged from 0.12 (high aphid damage) to 0.51 (low or no aphid damage) with a mean of 0.28. The NDVI scores ranged from 0.16 to 0.78 with a mean of 0.46, while the SAVI score ranged from 0.07 to 0.41 with a mean of 0.16 (Supplementary Table S1-c).
The average plant height for the two growing seasons was 106 cm, ranging from 39 to 285 cm (Supplementary Table S1-c). Flowering time ranged from 41 to 97 days after planting, with an average of 59 days after planting.
Correlation analysis
The correlation between reflectance indices from 2020 and average aphid damage ratings from 2019 and 2020 were calculated. A significant and negative linear relationship was observed between aphid damage for the first-rating date in 2020 and UAS based reflectance indices with NDRE (r = -0.41), NDVI (r = -0.38), and SAVI (r = -0.35) (Supplementary Table S1-d). Similarly, the second-rating date for visual aphid damage in 2020 also showed a relatively strong negative correlation with NDVI (r = -0.48), SAVI (r = -0.50), and NDRE (r = -0.52). The average first-rating dates (D1) for aphid damage for 2019 and 2020 were also negatively correlated with NDVI (r = -0.43), SAVI (r = -0.41), and NDRE (r = -0.44). A relatively strong negative linear relationship was observed between the second-rating dates (D2) for aphid damage for 2019 and 2020 with NDVI (r = -0.55), SAVI (r = -0.56), and NDRE (r = -0.58). The correlation between reflectance indices and the first average aphids count (AC1) ratings showed a negative linear relationship, whereas the second aphid count ratings (AC2) showed a positive relationship. A relatively strong positive linear relationship was observed between AC1 and D1 (r = 0.62), as well as AC1 and D2 (r = 0.51) (Supplementary Table S1-d), suggesting that the higher the aphid population, the higher the aphid damage.
Heritability
High broad-sense heritability values were observed for plant height (H2 = 0.74) and days to anthesis (H2 = 0.69) (Supplementary Table S1-e). Aphid damage ratings for D1 and D2 had moderate heritability values (H2 = 0.31 and 0.34, respectively), whereas aphid damage ratings from the greenhouse experiment had lower heritability (H2 = 0.22). Of the aphid traits analyzed, the aphid count data had the lowest heritability at H2 = 0.16 for AC1 and 0.04 for AC2. Moderate to high heritability was observed for the drone data recorded on August 25, 2020. Heritability values for NDVI, NDRE, and SAVI for August 25, 2020, were 0.68, 0.56, and 0.36, respectively.
GWAS Marker-Trait Associations (MTA)
Genome-wide association was performed using two complementary methods. First, we performed single-marker regression with a generalized linear model (GLM), using genetic principal coordinates to correct for population structure in TASSEL41. Significant hits were identified by using 1000 permutations to establish empirical p-values, with empirical p≤0.01 considered as significant. In a complementary analysis, we used Fixed and random model Circulating Probability Unification (FarmCPU), a model-selection algorithm that includes both fixed and random effects components42. Preliminary analyses found individual runs of FarmCPU to be unstable, so the algorithm was rerun100 times with 90% subsampling to identify the most robust hits, scored by their “resample model inclusion probability” (RMIP)43 ( see Methods for details). The RMIP values of 0.05 or greater were considered significant44. Genomic regions identified by both methods—meaning low empirical p-values and high RMIPs—are high-confidence hits; those identified by only one are still significant but of lower confidence. For those traits that did not pass through significant thresholds and also showed weak signals in the above two methods, GWAS was also performed with FarmCPU in GAPIT package from R software without any iterations. GWAS analysis identified a total of 200 MTAs for different aphid resistance-related traits, flowering time, and plant height which are discussed below (Supplementary Table S2-a).
AC1-Aphid Count first rating dates and AC2-Aphid Count second rating dates
FarmCPU-RMIP model selection analysis identified 11 loci for AC1(Fig. 1, Supplementary Table S2-a). Three loci each were located on SBI-07 and SBI-09, while the remaining loci were distributed with a single SNP on SBI-01, SBI-02, SBI-06, SBI-08, and SBI-10. Among these markers, S8_11781182 showed a strong association (RMIP=0.41) with AC1. All SNPs that were significantly associated with AC1 explained 55% of total phenotypic variation (Supplementary Table S3). Two markers, S7_58778565 and S7_58839694 significantly associated with AC1, are located close (~479 kb) to the gene Sobic.007G151100 on SBI-07, which encodes a protein that is similar to 12-oxo-phytodienoic acid reductase, a gene expressed in response to sugarcane aphid attacks45. There were no significant associations identified for AC2 in either GLM or RMIP methods.
D1-First aphid damage ratings
FarmCPU-RMIP identified nine total SNPs with two SNPs each on SBI-07 and SBI-10 and a single SNP on SBI-02, SBI-04, SBI-05, SBI-06, and SBI-08 (Fig. 1, Supplementary Table S2-a). GLM analysis did not identify any significant MTAs for aphid damage D1 at permutation p-value ≤ 0.01. Among nine loci detected here using RMIP, the MTA on SBI-06 (S6_334458) showed the highest RMIP (13%) value (Supplementary Table S2-a). The total phenotypic variance explained by all SNPs associated with the first aphid damage rating dates-D1 was 32% (Supplementary Table S3).
D2-Second aphid damage ratings
Use of the RMIP method detected 14 SNPs and GLM method detected 6 SNPs associated with aphid damage D2 (Fig. 1; Supplementary Table S2-a). These 20 MTAs were detected on all chromosomes except SBI-02 and SBI-04, with the strongest associations on SBI-01, SBI-06, SBI-07, SBI-08, and a cluster of SNPs on the tail end of SBI-09 with higher RMIP and p values. The portion of phenotypic variance explained by significantly associated SNPs was 63% (Supplementary Table S3). GLM analysis detected one MTA on SBI-06 (S6_3050581) approximately 298 kb away from the RMES1 locus. The S6_53836196 marker is located ~99 kb away from Sobic.006G184300 gene on SBI-06 that encodes a phloem protein 2 gene known to be induced due to sugarcane aphid infestation46. RMIP identified two SNPs, S9_56657642, and S9_56678671, which were < 1Mb away from WRKY transcription factor SbWRKY86, Sobic.009G238200 (S9_57628850 to 57630763), which was differentially expressed in a gene expression study45. The SNP S8_11781182 showed a strong association with damage D2 in both GLM and RMIP analyses.
Greenhouse evaluation of aphid damage-(GHD)
There were no significant associations found for aphid damage in the greenhouse, except on SBI-07 using the FarmCPU RMIP method (Fig. 1). The marker on SBI-07, S7_2494675 was identified with RMIP value= 0.09 (Supplementary Table 2-a).
GWAS analysis for reflectance indices to assess SCA damage (with and without flowering time as a covariate)
Analysis of UAS reflectance data was performed both with and without flowering time as a covariate in case plant maturity had a significant impact on the observed spectra.
A total of 6 SNPs were associated with NDRE when flowering time was used as a covariate in the model selection analysis. Three of these SNPs were located on SBI-06, while a single SNP was found on SBI-01, SBI-02, and SBI-05. No significant SNP associated with NDVI was detected when the flowering time was used as a covariate. A single SNP marker (S9_3054933) was significantly associated with SAVI when flowering time was used as a covariate (Fig. 2, Supplementary Table S2-a).
We also did analysis without using flowering time as a covariate, and the RMIP analysis identified 20 SNPs significantly associated with NDRE. Of these, four SNPs were found on SBI-05, three on SBI-03, SBI-06, and SBI-08, two on SBI-01, SBI-02, and SBI-10, and a single SNP on SBI-07 (Supplementary Fig. S1 and Supplementary Table S2-a). There were four markers, namely, S1_5101469, S2_61431704, S6_58686833, and S6_60922873, that were significantly associated with NDRE and found in either case with or without using flowering time as a covariate and the rest 16 SNPs were new loci when flowering time was not used as covariate. Some of the important notable ones were S6_32034639 marker was found ~366 kb of the sugarcane aphid resistance QTL, qtlMs-6.2, located between 31,020,371 to 32,400,770 bp27. Among several loci identified, SNP S8_59192389 associated with NDRE is approximately10 kb away from the calmodulin, (CAM) gene, Sobic.008G159100 (SBI-08: 59203243 to 59204321), a differentially expressed gene family observed by Kiani & Szczepaniec45.
A total of 8 SNPs were associated with NDVI without using flowering time as a covariate, with two SNPs on SBI-01, SBI-02, and SBI-05, while a single SNP each on SBI-03 and SBI-06 (Supplementary Fig. S1; Supplementary Table S2-a). The S6_3383155 showed a strong association (RMIP=0.23) with NDVI and was located 631 kb away from RMES1 locus.
A single SNP marker (S9_3054933) associated with SAVI on SBI-09 was also found in either case with or without using flowering time as a covariate. NDRE associated markers explained 21-54% of the phenotypic variance, NDVI associated markers accounted for 41% of phenotypic variance and the single marker associated with SAVI accounted for 6% of the phenotypic variance (Supplementary Table S3). SNPs on SBI-05 at 3,230,535 bp and 3,282,948 bp were remarkably closer to each other (52.4 kb) and associated with NDRE and NDVI respectively. A common SNP on SBI-05 at 63,115,845 bp was associated with NDRE and NDVI.
Plant Height
FarmCPU-RMIP identified 12 SNPs with a RMIP ≥ 0.05, and these were located on all chromosomes except SBI-04, SBI-05 and SBI-10 (Fig. 3, Supplementary Table S2-a). The most SNPs were found on chromosome SBI-09 with 4 SNPs that colocalized to the Dw1 locus (Sobic.009G229800) (19 kb to 131 kb away). There were two SNPs identified on SBI-06 that were close to the Dw2 locus, about 863 kb to 1.58 Mb away. There was one SNP, S7_59614079 that was located within the Dw3 locus and was 3 kb away from the start position for this gene. There were two SNPs on chromosome SBI-01 and single SNPs on chromosomes SBI-02, SBI-03 and SBI-08. The strongest association of SNP with plant height was identified on SBI-09 (S9_57172609; p = 1.41 x 10-15; RMIP=0.48). The GLM results identified 78 markers on chromosomes SBI-06 and SBI-09 that colocalized to the Dw1 and Dw2 loci with markers as close as 9kb (S9_57051085) to 1.5 Mb from these loci. Markers on other chromosomes were not detected. These SNPs explained 74 % of phenotypic variance (Supplementary Table S3). All these loci were detected in previous studies compiled in Sorghum QTL Atlas by Mace et al.47 except two locations. The two SNPs, S2_6406243 on SBI-02 and S8_45638928 on SBI-08 were found to be new loci associated with plant height in sorghum from this study.
Maturity/Flowering time
GLM analysis identified two SNPs that are located on SBI-03 in close proximity (S3_4423241, and S3_4423302) with high confidence (post-permutation p-value ≤ 0.01) (Fig. 3, Supplementary Table S2-a). FarmCPU model selection identified 17 SNPs across all chromosomes except SBI-05 and SBI-10, including a highly associated SNP at the end of SBI-03, a cluster of SNPs on SBI-06, and a strongly associated SNP at the end of SBI-09. These SNPs explained 63 % of the total phenotypic variance for maturity (Supplementary Table S3). One MTA on SBI-01 (S1_7584419) was ~1.5 Mb away from the known maturity gene Ma5 (PHYC)48, 49, while S6_40987299 was 0.7 Mb away from major maturity locus Ma1 (SbPRR37)50. The other loci identified in this study are also found in the vicinity of QTLs reported in Sorghum QTL Atlas and from other studies47, 51, with the ones on SBI-03 having the most significant associations. Some of the loci were not detected in previous studies compiled in the QTL atlas of Mace et al.47 and are reported here as novel loci identified in this study as they were very far from the known maturity loci (Supplementary Table S2-a). There were 6 new loci and these were at least 10 Mb away from known maturity loci or at least 100-200 kb away from the known existing QTLs. There were two loci on SBI-01 (S1_30327933 and S1_30462377 that were 30 Mb away from Ma3), two loci on SBI-06 (S6_50581550 and S6_50582469, 10 Mb away from Ma1), and one single locus on SBI-02 (S2_2488178, 65 Mb away from Ma2) and on SBI-07 (S7_2559101).
FarmCPU analysis without iteration (single-run)
For traits that did not yield any MTAs by GLM or RMIP analysis, we noticed that running FarmCPU with default parameters (a single run with no resampling) still generated a few hits near known candidate genes. These hits were apparently not robust enough to be picked up in the resampling analysis, and we include them here for the sake of completion and under the assumption that they probably indicate real signal that simply fell below our statistical threshold (Fig. S2 & Supplementary Table S2-a).
AC2 Aphid count for second-rating dates using FarmCPU single-run
Single-run FarmCPU analysis identified a total of 7 SNPs significantly associated with the second rating dates of aphid count (AC2); three SNPs closely (<100 kb apart) located on SBI-04, one SNP on SBI-05 and three on SBI-06 were associated with AC2. Three SNPs, namely, S6_3381010, S6_3381051, and S6_3381094, were remarkably close to each other and (Fig. S2 & Supplementary Table S2-a) were approximately 629 kb away from the Sb06g001650 gene encoding for the RMES1 locus.
Greenhouse evaluation of aphid damage (GHD) rating using FarmCPU single-run
A total of 13 SNPs with a significant association to GHD ratings were identified using FarmCPU single-run analysis. Three SNPs were detected on SBI-01, two each from SBI-02 SBI-03, and SBI-09, and single SNP each detected on SBI-04, SBI-05, SBI-06, and SBI-07 (Fig. S2 & Supplementary Table S2-a). The total phenotypic variance explained by all SNPs of FarmCPU single-run for GHD rating was 30% (Supplementary Table S3). The RMIP and GLM analysis detected none or low phenotypic variation as only one significant SNP associated with GHD rating was identified (Supplementary Table S3). The S7_2494675 marker, significantly associated with GHD, was consistently found across RMIP and FarmCPU single-run data analyses. FarmCPU single-run analysis also detected a marker (S4_53265404 on SBI-04) associated with greenhouse aphid damage at 23 kb away from Sobic.004G180500 gene that encodes for 12-oxo-phytodienoic acid.
Genomic regions/clusters commonly identified for aphid-related traits
Using FarmCPU and GLM analyses, overlapping genetic regions (within 500 kb) were consistently identified for aphid population number, aphid damage, and reflectance indices. These clusters found across different traits were sorted based on their chromosomal position (Supplementary Table S2-b). We found a few regions on SBI-02, SBI-03, SBI-05, SBI-08 and SBI-10 that were consistently found across different traits. The SNPs, S8_11781182 on SBI- 08, and S10_ 2507813 on SBI-10 were consistently identified and associated with AC1 and plant damage D2 respectively (Supplementary Table S2-a & b). It is also noteworthy to observe that SNP S8_11408106 associated with aphid damage D1 was 373 kb away from this common genomic region, S8_11781182 on SBI-08. Additionally, three common SNPs, S2_61431704, S3_19558428, and S5_63115845 on SBI-02, SBI-03 and SBI-05 respectively were associated with the reflectance indices, NDRE and NDVI consistently. Another SNP associated with NDRE, S5_63115742 was remarkably close within 103 base pairs of consistently identified region S5_63115845 for NDRE and NDVI. This study identified two markers (S6_3050581 and S6_3383155) on SBI-06 that were associated with D2 and AC2 found in proximity (298-629 kb away) of RMES1 locus (Sb06g001650). The markers associated with D1 (S6_179752) and AC1 (S6_334458) were 154 kb apart from each other and 2.3 Mb away from RMES1 locus (Supplementary Table S2-b). The greenhouse GHD (S2_551409) and NDVI (S2_887719) associated SNPs on SBI-02 were closely located within 336 kb distance. The SNP marker associated with NDRE (S3_19929613) was within 371 kb from a marker, S3_19558428, consistently identified on SBI-03. There were three traits, GHD (S7_2494675), flowering (S7_2559101) and AC1(S7_2888228) that were within 64-393 kb which may show that flowering time and aphid resistance genes were found in a cluster on SBI-07. Aphid damage, D1(S8_11408106) was remarkably close (373 kb) to the common genomic region S8_11781182 that associated with damage D2 and AC1. The D1 SNP S8_11408106 is located 124 kb away from a gene, Sobic.008G075700, encoding LRR. SNPs associated with flowering time (S8_39703004) and NDRE (S8_39703090) were within 86 base pairs in another location. SNPs associated with plant height (S9_56530072) and D2 (S9_56705459) were within 174 kb on SBI-09. Plant height (S9_57804067) and flowering loci (S9_57914426) were identified within 109 kb on SBI-09.
Identification of differentially expressed known SCA genes within 200 kb of MTAs
There are two studies that have done in-depth analysis of differential gene expression in sorghum upon sugarcane aphid infestation45, 46. Our gene analysis was done exclusively to identify the candidate genes around MTAs identified in this study using genes expressed in these previously conducted SCA differential gene expression studies45, 46. The genes from these two studies were searched if they were found within 200 kb distance upstream and downstream of the markers identified in our study. The genes that were detected two times due to their association with closely linked markers were removed and only unique genes were retained.
Of the 9,291 unique genes identified by Kiani and Szczepaniec,45 that were differentially expressed in resistant and susceptible lines after exposure to the sugarcane aphid, 561 (6%) unique genes were detected within 200 kb distance of the 74 markers associated with sugarcane aphid-related traits (Supplementary Table S4). Of these, eight genes encoding leucine-rich repeats (LRR) were found on SBI-05, SBI-06, and SBI-07. The SNP marker, S6_334458, associated with aphid damage (D1), is 5 kb away from Sobic.006G001900 encoding a LRR gene that was upregulated during aphid infestation in the previous study45. Four genes near NDRE associated SNPs on SBI-03, SBI-06 and SBI-08 were similar to lipoxygenase (LOXs), and two genes on SBI-05 and SBI-07 encoded flavonol biosynthesis genes (Supplementary Table S4). The NDVI marker, S1-7442679, was found within ~99 kb of the Sobic.001G0955 gene that encodes a WRKY1 transcription factor.
Additionally, we pooled gene datasets obtained from Tetreault et al.46 that were detected within a 200 kb distance of MTAs. We analyzed genes that were induced in both resistant and susceptible plants upon sugarcane aphid infestation using 16 gene-expression modules in this study. Out of 15074 genes evaluated, 911 (6%) unique genes were within 200 kb region of sugarcane aphid associated SNPs (Supplementary Table S5). The SNP markers associated with aphid damage D2, S6_3050581 is 147 kb away from Sobic.006G018900 encoding protein similar to avr9/Cf-9 rapidly elicited response protein. S2_61431704 associated with NDRE is 339 base pairs away from Sobic.002G222800 encoding protein similar to Putative Avr9 elicitor response protein. The analysis identified twenty-five LRR, four LOXs, four CAM dependent protein kinases, two WRKY transcription factors, and one flavonol 3-O-glucosyltransferase.
In total, twenty-nine LRR, four CAM proteins, four lipoxygenase (LOXs), two WRKY, and two flavanol biosynthesis genes were identified from the above two studies (all supplementary files of Kiani and Szczepaniec,45 and 16 selected modules of Tetreault et al.46 (Supplementary Table S4 and S5). When all the modules from the Tetreault et al.46 study were used for candidate gene analysis, one 12-oxophytodienoic acid reductase gene (Sobic.004G180500) was found very close within 23 kb from S4_53265404. The Sobic.007G151100 encoding 12-oxophytodienoic acid reductase was also found within 483 kb to a marker, S7_58778565, associated with aphid count (AC1). We also found three LRR genes on SBI-05 and SBI-07, and one WRKY1 gene on SBI-01, which were commonly found in the studies by Kiani and Szczepaniec,45 and Tetreault et al.46.
We also analyzed the number of genes and gene frequency on individual sorghum chromosomes found near our SNPs associated with different aphid traits, by using these two datasets45, 46. The highest gene frequency distribution was observed on SBI-01 followed by SBI-06 (Supplementary Fig. S3-a & c). The majority of these genes were captured by markers associated with NDRE and D2 aphid traits (Supplementary Fig. S3-b & d).
The markers in the current study that showed more than 10 genes were considered as genic-rich regions as they may reveal a cluster of genes around them. Our result indicated that a total of 59 markers were associated with more than 10 genes found in the above two gene expression data45, 46. These clusters ranged from 11-32 candidate genes for each of these markers within 200 kb that might have promising genic-rich regions/clusters related to sugarcane aphid resistance (Supplementary Table S6).
Candidate gene analysis near consistently identified markers
The candidate gene analysis was focused near common and consistent markers identified between different traits on SBI-02, SBI-03, SBI-05, SBI-08 and SBI-10 using previously known SCA genes45, 46(Supplementary Table S4 and S5). A total of 25, 3, 1, 1, and 32 genes were found near these consistently identified markers, S2_61431704, S3_19558428, S5_63115742, S8_11781182 and S10_2507813, respectively. Some of the noted ones were Sobic.005G158000; oxidative stress response genes-similar to Cytochrome P450 71E1 found on SBI-05 and Sobic.008G077600- similar to Diacylglycerol kinase 1, found on SBI-08 as they were narrowed down to single genes from the previous gene expression studies around these markers. S2_61431704 marker was close to Avr9 elicitor response protein found in both studies. Among 32 genes found near S10_2507813, several LRR repeats were detected and one of them, Sobic.010G030000 was within 77 kb.