Temporal phenomic predictions from unoccupied aerial systems can outperform genomic predictions

doi:10.21203/rs.3.rs-954708/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

A major challenge of genetic improvement and selection is to accurately predict the individuals with the highest fitness in a population without direct measurement. Over the last decade, genomic predictions (GP) based on genome-wide markers have become reliable and routine. Now phenotyping technologies, including unoccupied aerial systems (UAS, also known as drones), can characterize individuals with a data depth comparable to genomics when used throughout growth. This study demonstrated, for the first time, that the prediction power of temporal UAS phenomic data can match or exceed that of genomic data. UAS data containing red-green-blue (RGB) bands over fifteen growth time points and multispectral (RGB, red-edge and near infrared) bands over twelve time points were compared across 280 unique maize hybrids. In cross validation of untested genotypes in tested environments (CV2), temporal phenomic prediction (TPP) outperformed GP (0.80 vs 0.71); TPP and GP performed similarly in the three other cross validation scenarios. Genome-wide association mapping using the area under temporal curves of vegetation indices (VIs) revealed that 24.5 percent of the 241 discovered loci (59 loci) were associated with multiple VIs, explaining up to 51 percent of grain yield variation, less than GP and TPP predicted. This suggests that TPP, like GP, integrates small-effect loci well, improving plant fitness predictions. More importantly, temporal phenomic prediction appeared to work successfully on unrelated individuals, unlike genomic prediction.

Plant Molecular Biology and Genetics

Agronomy

Agricultural Engineering

high throughput phenotyping

phenomic prediction

genomic prediction

plant breeding

To improve genetic gain, plant breeders must phenotype more plants repeatedly during growth, which allows higher selection intensity, accuracy, and statistical power (1–3). Phenomic data of high quality and quantity are essential to develop widely applicable prediction models (e.g., phenomic predictions) that predict yield across growing environments and conditions in the near future (4). To date, few phenomic data sets, approaches and applications have been reported, especially in a breeding context.

Organismal fitness, such as terminal grain yield in crops, is a cumulative response to genetics (G), the environment (E), management (M) and their integrated GxExM interactions throughout growth. To predict the cumulative fitness of an individual organism without measuring it directly, proxies such as genetic markers are used to link measurements of relatives and predict fitness as breeding values. Traditional best linear unbiased prediction (BLUP) derived breeding values (5) were modified by (6), who combined genotypic marker data of parental inbreds with yield data of related single cross hybrids to predict the yield performance of single cross hybrids, an approach known as genomic BLUP (GBLUP). However, prediction accuracies dropped dramatically when predicting the yield of hybrids derived from unknown (previously untested) parental lines (7, 8). Various genomic statistical models have been developed since the traditional GBLUP approach with the advent of genomic technology (9–11). These methods have been applied extensively as genome-wide marker-facilitated selection, also known as genomic selection (GS), in plants (12). Predicting the performance of previously untested genotypes in both tested and untested environments remains the central problem in plant breeding selections, and new approaches to addressing this challenge are needed. Genomic selection to estimate genotype fitness, as measured by terminal grain yield, relies on manually collected phenotype data, which are resource intensive to collect. Cumulative complex traits are often not accurately predicted in GS because of (i) the different interplays of genes on phenotype throughout different growth stages, (ii) different effect sizes of the same genetic markers on the phenotype of complex traits at different growth stages and (iii) different sources of phenotypic variation of the complex traits at different growth stages (13–21).

Tools that can inexpensively evaluate individuals throughout growth, as they interact with their environment, would therefore be a valuable addition to predicting an organism’s fitness. Unoccupied aerial systems (UAS) are now able to provide these insights, frequently evaluating individuals throughout growth. However, to date, fitness predictions from UAS alone have not been compared to the standard method of genomic prediction.

To evaluate fitness prediction with UAS-based phenomic tools, the breeding value of each hybrid must be estimated; this can be done from temporal VIs and structural measurements (canopy height) collected throughout growth. Correlations of temporal VIs with yield and flowering times, as well as machine learning models, can be used to investigate predictive ability for fitness traits (yield and flowering times). Phenomic predictions made from temporal vegetation indices and canopy height can then be compared with traditional genomic predictions. Ultimately, identifying the major causal loci underlying the success of phenomic predictions for complex traits can help explain the biology of organismal fitness over growth. Here we report phenomic data-driven selection for complex traits in maize breeding. We conducted UAS surveys with multispectral and RGB sensors to collect image-based temporal predictors throughout maize growth stages. We compared phenomic prediction accuracy to that of genomic prediction, explored temporal shifts in image-based phenotypic variation explained by genome-wide markers, and conducted association mapping using temporal image-based phenotypes to identify biologically important loci.

Using the Genomes to Fields (G2F) initiative’s 2017 germplasm, 280 unique maize hybrids were grown under optimal management (OM) and 230 were grown under stressed management (SM; no irrigation, low fertilizer) near College Station, Texas. Two replications were used in a randomized complete block design, with each hybrid grown as two consecutive row plots.

UAS surveys and image processing

A Phantom 3 Professional rotary-wing UAS, equipped with a 12-megapixel red-green-blue (RGB) DJI FC300X camera, was flown 25 meters above the ground (TPP_RGB). Additionally, a Tuffwing UAS equipped with a MicaSense RedEdge-MX multispectral camera was flown 120 meters above the ground (TPP_Multi). Images were collected with 80% forward and side overlap in both surveys. Raw images were processed in Agisoft Metashape Professional software (https://www.agisoft.com/) to generate the 3D point clouds and orthomosaics (SI appendix, Table S1) (22).

Phenomic data extraction pipeline

Environmental Systems Research Institute, Inc. (ESRI) shapefiles were constructed using the R/UAStools::plotshpcreate function (23) and applied to each survey’s respective orthomosaic (.tif files) and 3D point clouds (.las or .laz files) to extract plot-level image-based phenotypes. Vegetation indices (SI appendix, Table S2) for each flight date were extracted using the FIELDimageR package (24) for each UAS survey (SI appendix, SI Materials and Methods). Plot-based 99th percentile temporal plant heights (canopy height measurement; CHM) were extracted from 3D point clouds following the methods of (16) (SI appendix, SI Materials and Methods).
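As a rough illustration of plot-level extraction, the sketch below computes one RGB vegetation index and a 99th percentile height for a single plot. It is not the UAStools or FIELDimageR pipeline used in the study. The function names, the nearest-rank percentile rule, and the toy pixel and height values are all assumptions, and the red chromatic coordinate is taken in its common form R / (R + G + B) because the index definitions in Table S2 are not reproduced here.

```python
import math

def red_chromatic_coordinate(pixels):
    """Mean RCC = R / (R + G + B) over the (R, G, B) pixels of one plot."""
    ratios = []
    for red, green, blue in pixels:
        total = red + green + blue
        if total > 0:  # skip fully black pixels to avoid dividing by zero
            ratios.append(red / total)
    if not ratios:
        raise ValueError("plot contains no usable pixels")
    return sum(ratios) / len(ratios)

def percentile_height(heights, percent=99):
    """Nearest-rank percentile of point-cloud heights (metres) for one plot."""
    if not heights:
        raise ValueError("plot contains no points")
    ordered = sorted(heights)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical plot: three RGB pixels and 200 point-cloud heights.
plot_pixels = [(50, 100, 50), (60, 120, 20), (40, 80, 80)]
plot_heights = [i / 100 for i in range(1, 201)]  # 0.01 m to 2.00 m

rcc = red_chromatic_coordinate(plot_pixels)
chm = percentile_height(plot_heights)
print(f"RCC = {rcc:.3f}, 99th percentile height = {chm:.2f} m")
```

Repeating such a calculation for every plot and flight date would give one value per plot, index, and time point, which is the shape of data the later prediction steps assume.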

Experimental design and nested model for phenomic data

To analyze the temporal VIs and CHM, a custom nested design was applied to the raw data of each VI and CHM for each row plot in OM and SM, in which the experimental design terms and maize hybrids were treated as nested within flight times (SI appendix, SI Materials and Methods). The resulting estimates for hybrids (pedigrees) nested within flight times were used to predict grain yield (GY), days to anthesis (DTA), days to silking (DTS), and plant height (PHT) within and between the trials.

Machine learning based phenomic prediction models

Manually collected phenotypes (GY, DTA, DTS and PHT) were predicted from the TPP_RGB and TPP_Multi image-based phenotypes using linear, elastic net, ridge, lasso, and random forest regressions. Prediction models were trained using a random sample of 70% of the common maize hybrids (tested genotypes) across the two management environments. The remaining 30% were used as the validation dataset (untested genotypes). Models were trained using the OM trial (tested environment), while the SM trial served as the untested environment. Four cross validation schemes (CVs) were conducted as follows: (i) tested genotypes in tested environment (CV1), (ii) untested genotypes in tested environment (CV2), (iii) tested genotypes in untested environment (CV3), and (iv) untested genotypes in untested environment (CV4) (25). Phenomic prediction models and prediction steps are available in the SI appendix, SI Materials and Methods.
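The four cross validation cells described above can be sketched as a simple partition of genotype-by-environment records. This is only an illustration under stated assumptions: the hybrid labels, the random seed, and the function names are hypothetical, the 70/30 split and the OM/SM roles follow the text, and the authors' actual resampling procedure is described in their SI rather than here.

```python
import random
from collections import Counter

def split_genotypes(genotypes, train_fraction=0.7, seed=0):
    """Randomly split genotypes into tested (training) and untested sets."""
    shuffled = sorted(genotypes)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return set(shuffled[:n_train]), set(shuffled[n_train:])

def cv_scheme(genotype, environment, tested_genotypes, tested_environment="OM"):
    """Return the cross validation cell (CV1-CV4) for one plot record."""
    known_genotype = genotype in tested_genotypes
    known_environment = environment == tested_environment
    if known_genotype and known_environment:
        return "CV1"  # tested genotype, tested environment (training cell)
    if known_environment:
        return "CV2"  # untested genotype, tested environment
    if known_genotype:
        return "CV3"  # tested genotype, untested environment
    return "CV4"      # untested genotype, untested environment

# Ten hypothetical hybrids, each grown in both management environments.
hybrids = [f"H{i:03d}" for i in range(1, 11)]
tested, untested = split_genotypes(hybrids)
records = [(h, env) for h in hybrids for env in ("OM", "SM")]
counts = Counter(cv_scheme(h, env, tested) for h, env in records)
print(dict(counts))
```

Because the model is fitted only on the CV1 cell, the other three cells differ in whether the genotype, the environment, or both are new to the model.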

Association mapping for phenomic data

The image-based vegetation indices and Weibull_CHM were converted to cumulative area under curve (AUC) values and used as trait data in a genome-wide association study (GWAS) (SI appendix, SI Materials and Methods). Association mapping was conducted using 158 maize hybrids and 101,100 genotyping by sequencing (GBS) SNP markers, implementing three multiple loci test methods: (i) fixed and random model circulating probability unification (FarmCPU) (26), (ii) multiple loci mixed model (MLMM) (27), and (iii) Bayesian-information and linkage-disequilibrium iteratively nested keyway (BLINK) (28) (SI appendix, SI Materials and Methods). Linkage disequilibrium (LD) estimates were used to identify candidate genes within LD blocks (\(R^{2}\) ≥ 0.8) of colocalized SNPs (SI appendix, Fig. S1).
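One simple way to collapse a temporal VI trajectory into a single AUC trait is the trapezoidal rule, sketched below. The integration rule is an assumption, since the authors' exact AUC procedure is given only in their SI, and the flight days and index values are hypothetical.

```python
def area_under_curve(days, values):
    """Trapezoidal area under a VI trajectory sampled on the given flight days."""
    if len(days) != len(values) or len(days) < 2:
        raise ValueError("need at least two matching day/value pairs")
    points = sorted(zip(days, values))  # order the observations by flight day
    area = 0.0
    for (day0, value0), (day1, value1) in zip(points, points[1:]):
        area += (day1 - day0) * (value0 + value1) / 2.0
    return area

# Hypothetical trajectory of one vegetation index for one hybrid.
flight_days = [20, 35, 50, 65]        # days after planting (assumed)
vi_values = [0.30, 0.34, 0.36, 0.32]  # plot-level index values (assumed)

auc = area_under_curve(flight_days, vi_values)
print(f"AUC = {auc:.2f}")
```

The resulting single value per hybrid and index can then serve as the trait in an association scan, in place of one trait per flight date.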

Genomic prediction for phenomic data

Genome-wide prediction was applied to 540 image-based phenotypes (35 VIs and CHM at each of 15 flight times) of the 158 maize hybrids in TPP_RGB using 153,252 SNPs. The temporal genomic prediction model is explained in the SI appendix, SI Materials and Methods.

Phenomic prediction versus genomic prediction

GBS marker data (GP) and two sets of phenomic data (TPP_RGB and TPP_Multi) were used to conduct genomic prediction and phenomic prediction for maize grain yield (GY). A total of 118 G2F maize hybrids were used to compare predictive ability between the genomic and phenomic data sets. Four cross validation schemes were applied as explained in the “Machine learning based phenomic prediction models” section. Additional details regarding phenomic prediction versus genomic prediction are available in the SI appendix, SI Materials and Methods.

Variance decomposition and repeatability estimates demonstrate UAS sensor-based phenotypes were genetically stable

Variance component decomposition of the 89 sensor-based VIs (35 RGB and 54 multispectral) demonstrated that UAS sensor-based data were statistically repeatable and biologically meaningful, with a genetic basis. The rotary-wing UAS equipped with an RGB (3 band, 12 MP) sensor and flown at 25 m produced ~1 cm pix⁻¹ image resolution and had higher repeatability than the Tuffwing platform equipped with a multispectral (5 band, 3.8 MP) sensor flown at 120 m (~8 cm pix⁻¹). The main source of phenotypic variation for both platforms was the temporal flight component (\(\beta_{i}\) component in Eq. 1) of the nested design (31-96%), showing the temporal plasticity of maize spectral reflectance signatures throughout the plant’s growth cycle (SI appendix, Figs. S2 and S3). Genetic variance (\(\varOmega_{i(j)}\) component in Eq. 1) was slightly greater for the higher resolution-low altitude RGB (1.5 - 5.2%; temporal repeatability, TR: 0.46 - 0.77) phenotypes than for the lower resolution-high altitude RGB (1.1 - 4.5%; TR: 0.26 - 0.66) and lower resolution-high altitude multispectral (0.5 - 3.4%; TR: 0.28 - 0.62) phenotypes (SI appendix, Figs. S2 and S3). The repeatability estimates over the 35 RGB phenotypes were highly correlated (r=0.71) between the two sensor systems, although repeatability improved by 0.08 on average with the higher resolution-low altitude RGB platform. With that platform, repeatability estimates improved noticeably (by >0.1) for 13 RGB VIs and decreased (by <0.06) for 6 VIs (SI appendix, Figs. S2 and S3). Overall, significant genetic variation was attributed to all VIs on both platforms, making them useful in predictive modeling of important agronomic traits such as grain yield, flowering times, and plant height (SI appendix, Fig. S4).

Temporal correlation

Temporal correlations between the VIs at each UAS survey date and GY showed that 14 of the 35 VIs derived from the higher resolution-low altitude RGB sensor achieved a correlation above 0.50 (up to 0.61) with GY (SI appendix, Fig. S5). However, 14 RGB- and 40 multispectral-derived VIs from the lower resolution-high altitude multispectral sensor achieved correlations above 0.50 (up to 0.70) with GY (SI appendix, Fig. S6). Correlations of sensor-based VIs with GY varied depending on the flight time. High correlations were found for VIs at certain time points in both TPP_RGB and TPP_Multi, demonstrating that temporal VIs tend to synchronize with GY in maize hybrids and are a potential source for predicting yield.
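The per-flight correlations described above can be sketched as a Pearson correlation between one VI and grain yield, computed separately for each flight date. The flight labels and all values below are hypothetical, and the function names are illustrative rather than taken from the authors' scripts.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

def correlation_by_flight(vi_by_flight, grain_yield):
    """Map each flight label to the correlation of its VI values with yield."""
    return {flight: pearson(values, grain_yield)
            for flight, values in vi_by_flight.items()}

# Hypothetical data: one VI measured on four hybrids at two flights.
grain_yield = [8.0, 9.0, 10.0, 11.0]
vi_by_flight = {
    "flight_1": [0.30, 0.32, 0.34, 0.36],  # rises with yield
    "flight_2": [0.40, 0.36, 0.38, 0.34],  # tends to fall with yield
}

for flight, r in correlation_by_flight(vi_by_flight, grain_yield).items():
    print(flight, round(r, 2))
```

In this toy case the same index changes sign between flights, the kind of time dependence that a single-date analysis would miss.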

Phenomic prediction using high dimensional UAS data

Temporal breeding values of each pedigree at each time point in TPP_RGB and TPP_Multi followed unique trajectories (SI appendix, Figs. S7 and S8), visually discriminating low, mid, and high yielding maize hybrids. Phenotype data of VIs at different time points had different discriminative ability for yield. This led us to test the predictive ability of the two phenomic data sets, derived from different sensors and resolutions, using the different prediction models.

The three machine learning models improved prediction accuracy (>90%) for all four agronomic traits (GY, DTA, DTS, and PHT) compared to the linear model when temporal phenotypes in the TPP_RGB and TPP_Multi phenomic data were used as predictors (Fig. 1). The linear models had the highest prediction errors (RMSE and MAE) and lowest R² (SI appendix, Fig. S9). Ridge regression was the highest performing model for predicting GY regardless of the phenomic data set, giving the best prediction performance for untested genotypes in tested environment (CV2), tested genotypes in untested environment (CV3), and untested genotypes in untested environment (CV4) (Fig. 1). Ridge regression best predicted GY for CV2 using the low-resolution multispectral sensor (TPP_Multi), and best predicted GY in the untested environment cross validations (CV3 and CV4) using the high resolution RGB sensor (TPP_RGB; Fig. 1). Furthermore, ridge regression achieved the greatest prediction accuracy for flowering times and plant height using the high resolution RGB UAS (Fig. 1). Prediction accuracies were higher in the most challenging CVs (CV3 and CV4) when TPP_RGB was used to predict GY, DTA, DTS, and PHT. These results demonstrate that adding multispectral bands, at the cost of lower resolution and a more expensive sensor, did not significantly improve model performance.

Variable importance scores of the machine learning models

To understand potential biological causes behind the most accurate predictions, variable importance scores were derived from the prediction models to identify critical predictor/time point combinations for the TPP_RGB and TPP_Multi phenomic data sets (SI appendix, Figs. S10 and S11). VIs and Weibull_CHM at multiple time points contributed differently to the prediction of GY, DTA, DTS and PHT in both phenomic data sets (SI appendix, Figs. S10 and S11). For instance, the TPP_RGB red chromatic coordinate index (RCC) and the TPP_Multi modified nonlinear index (MNLI) at various time points, either before or after flowering, were identified consistently by all machine learning models for all predicted variables and are therefore critical VI/time point combinations (SI appendix, Figs. S10 and S11). This demonstrates the ability of machine learning models to identify important image-based phenotypes for future UAS surveying efforts and provides foundational insight into the biological importance of image-based phenotypes within a plant’s growth cycle.

Genome wide association mapping results

To gain further insight into the biological significance of successful predictions, GWAS peaks were identified using area under curve values (SI appendix, Fig. S12) of each high resolution VI and Weibull_CHM in the TPP_RGB phenomic data set (SI appendix, Fig. S13). A total of 241 GWAS peaks were identified across the 36 temporal image-based phenotypes in TPP_RGB. Five genomic regions had significant loci for VIs and candidate genes of relevant interest (SI appendix, SI Results). Two genomic regions were identified as hotspots (fourth bin in chr2 and eighth bin in chr4), with GWAS peaks belonging to 24 VIs discovered across the three GWAS models (SI appendix, Fig. S14 and Dataset S1). A 15 kb genomic distance around the GWAS peaks was scanned to determine candidate genes based on the calculated LD decay (SI appendix, Fig. S1). LD patterns of both hotspots were visualized along with six candidate genes, with functions described in SI appendix, Fig. S14.

A hotspot was identified at 36,828,844 bp on chromosome 2 (chr2_1) by the excessive red, modified green red, normalized difference, normalized green red difference, and visible atmospherically resistant indices, consistently across the three GWAS models, explaining 8-13% of phenotypic variation (Dataset S1). The chr2_1 peak is inside GRMZM2G023204 (chr2:36,827,859..36,829,876; B73 RefGen_v4), a gene with a putative protein kinase domain that catalyzes the function of protein kinases. Another candidate gene (~4 kb away from chr2_1) is GRMZM2G021560 (pebp25; chr2:36,779,809..36,782,444; B73 RefGen_v4), a member of the phosphatidylethanolamine-binding proteins (PEBPs) that regulate floral transitions (29); GRMZM2G021560 was also found to be expressed at the early vegetative stage (e.g., third leaf stage) (30). By integrating GWAS with temporal phenotypes (TPP_RGB), loci controlling the temporal VIs were found to explain the phenotypic variation of multiple VIs, revealing pleiotropic effects of these loci. Additional candidate genes for other hotspots are discussed in the SI appendix.

Genomic prediction results of temporal phenomic data

Genomic prediction of temporal VIs showed that each high-resolution VI in TPP_RGB varied across time points in how well it could be predicted in cross validation (Fig. 2). Flowering was the most (and in a few cases the least) predictable stage by genomic markers for many VIs, likely because of differential emergence of tassels (Fig. 2). Surprisingly, time points prior to flowering in some cases had similar or higher prediction accuracy than those at flowering time (Fig. 2). Overall, sensor-based VIs were predictable at different time points using whole genome markers, but with different estimated phenotypic effect sizes (Fig. 2). This demonstrates that genetic markers had changing estimated effect sizes, revealing the plasticity of temporal VIs, which are more informative for monitoring the interactions between the genetic background of plants and their growing environments across plant growth.

Genomic prediction vs phenomic prediction

The grain yield (GY) prediction ability of the phenomic and genomic approaches was compared using both phenomic data sets (TPP_RGB and TPP_Multi) and the genomic data (genomic prediction, GP). For untested genotypes in tested environment (CV2), low resolution multispectral (TPP_Multi) outperformed (\(\bar{r}\) = 0.80) both genomic prediction (\(\bar{r}\) = 0.71) and high resolution RGB (TPP_RGB; \(\bar{r}\) = 0.72) (Fig. 3). For untested genotypes in untested environment (CV4), genomic prediction and high resolution RGB phenomic selection gave similar prediction accuracies (\(\bar{r}\): 0.53-0.55), while the low resolution multispectral sensor-based HTP gave a lower prediction accuracy (Fig. 3). Overall, the phenomic prediction platforms used in this study predicted better than (CV2), or equivalently to (CV1 and CV3), genomic prediction, depending on which of the four cross validation schemes is evaluated (Fig. 3). However, genomic prediction outperformed phenomic prediction when predicting known genotypes in unknown environments (CV3). Combining both UAS measures (TPP_RGB and TPP_Multi) using ridge regression did not further improve prediction accuracies (data not shown).

Field-based high-throughput phenotyping technologies, such as drones, are able to provide phenome-wide measurements of plants in much the same way that high-throughput sequencers have provided genome-wide data. Uniquely, phenotyping technologies can screen high numbers of plots repeatedly through the growing period resulting in not only high spatial resolution but also high temporal resolution, helping dissect how different genotypes respond to their environments to maximize fitness in near real-time (SI appendix, Figs. S7 and S8).

As new temporal phenomic markers are difficult to measure and validate independently, one of the first approaches to evaluating phenomic marker utility is to examine heritability/repeatability values over different replicates and environments. This approach is not needed for genomic markers, which do not vary over replicates and environments and theoretically have a repeatability near 1, but which are also unable to capture environmental interaction in real time. Temporal repeatability (Eq. 3) of VIs was moderate, above ~0.5 for TPP_RGB (SI appendix, Fig. S2) and between 0.26 and 0.66 for TPP_Multi (SI appendix, Fig. S3). Temporal repeatability relied on variation across plant development, which is biologically more meaningful than genotypic variation treated as static at every time point. Temporal variation captured by drones assesses temporal genotypic variation jointly over time via the nested design (Eq. 2). Previously, repeatability has only been calculated between different vegetation indices/CHM and yield at a single time point (16, 31–37), disregarding the temporal genotypic variation occurring across plant growth. Furthermore, previous studies used either one or a limited number of time points and analyzed each time point separately.

High dimensional and temporal resolution phenomic data used in predictive plant breeding, integrated with high throughput genotyping data, revealed underlying genetic causes for many important temporal VI features. For instance, GWAS identified specific loci controlling more than one VI, indicating pleiotropy (SI appendix, Figs. S13 and S14, Dataset S1). In addition, genomic prediction of temporal VI phenotypes showed that the estimated effect of each marker varied through time, causing different prediction accuracies for temporal phenotypes of the same VI (Fig. 2). Therefore, instead of depending on discrete genome-wide markers as predictors for yield, temporal phenotype data shaped by temporal marker effects could better predict certain scenarios (e.g., untested genotypes in tested environment). Predicting grain yield of untested genotypes in a tested environment is an important scenario for public breeding programs because lines developed in public breeding programs are mostly targeted to specific environments. Figure 3 shows that TPP predicted grain yield better than GP in CV2, indicating that TPP can be a better solution for genetic gain in public breeding programs. In addition, the predictive ability of TPP for untested genotypes in untested environments (CV4) was in the same range as that of GP (Fig. 3). This is also an important proof of concept that TPP can be used as widely as GP. Genomic prediction methods have been developed over more than a decade, and phenomic prediction methods can likewise be improved. Further optimization and improvement of this approach will likely benefit from the integration of novel crop growth models, as genomic prediction has (38).

Phenomic data can predict yield and flowering times via machine learning regressions

Shrinkage-based models, previously shown to be the best performing prediction models across different hyperparameters, have been adapted for predicting both yield (15, 31, 39) and flowering times (15) when different reflectance bands were used as predictors. Machine learning models with different regularization parameter settings predicted yield and flowering times (Fig. 1) more accurately than linear prediction models (15, 34). This suggests that temporal variation in VIs does not have a simple linear relationship to the predicted variables. The poorer performance of the linear models also reflects their tendency to overfit as the number of predictors increases and when collinearity between predictors fluctuates, as in phenomic data. Such linear models are also unable to explain non-linear relationships between predictors and predicted variables.

Tuning the regularization parameters of the ridge, lasso and elastic net prediction models is a good way to deal with model overfitting when high dimensional phenomic data are used in prediction. Tuned regularization parameters in ridge, lasso and elastic net models shrink coefficients, so these models predict test data more reliably than linear models. For example, the pedigree within flight combination (\(\varOmega_{i(j)}\) component in Eq. 2) was found to be statistically significant for all VIs and CHM (SI appendix, Fig. S2), indicating a temporal interaction among pedigrees across flight times because of fluctuating temporal phenotype values of VIs (SI appendix, Figs. S7 and S8). Nevertheless, a general trend showed that high- and low-yielding pedigrees segregate according to the temporal phenotypes of VIs. This reverse correlation of temporal breeding values of the pedigrees through time supports the existence of nonlinear relationships, which are problematic for a linear model to capture. Because it learns from multiple decision trees, the random forest model accounts for non-linearity while limiting overfitting.
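The shrinkage effect of a ridge penalty can be illustrated with a small closed-form fit. The sketch below is not the authors' implementation: it covers ridge only (not lasso or elastic net), solves the penalized normal equations with a hand-written Gaussian elimination so that it needs no external libraries, and uses hypothetical values for two correlated vegetation index predictors and grain yield. All function names are illustrative.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def ridge_fit(features, target, penalty):
    """Return (intercept, coefficients) minimising RSS + penalty * sum(coef**2)."""
    n, p = len(features), len(features[0])
    col_means = [sum(row[j] for row in features) / n for j in range(p)]
    y_mean = sum(target) / n
    xc = [[row[j] - col_means[j] for j in range(p)] for row in features]
    yc = [y - y_mean for y in target]
    # Penalised normal equations on centred data: (X'X + penalty * I) b = X'y.
    gram = [[sum(xc[i][j] * xc[i][k] for i in range(n)) + (penalty if j == k else 0.0)
             for k in range(p)] for j in range(p)]
    xty = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    coef = solve_linear_system(gram, xty)
    intercept = y_mean - sum(c * m for c, m in zip(coef, col_means))
    return intercept, coef

def ridge_predict(intercept, coef, features):
    """Predict the target for each row of features."""
    return [intercept + sum(c * v for c, v in zip(coef, row)) for row in features]

# Hypothetical data: two correlated VI predictors and grain yield for five hybrids.
features = [[0.30, 0.31], [0.32, 0.33], [0.34, 0.33], [0.36, 0.37], [0.38, 0.38]]
target = [8.0, 9.1, 9.9, 11.2, 11.8]

for penalty in (0.0, 1.0):
    intercept, coef = ridge_fit(features, target, penalty)
    print(penalty, [round(c, 3) for c in coef])
```

With a penalty of zero this reduces to ordinary least squares; raising the penalty pulls the coefficients toward zero. Because the toy predictors are on a small scale, even a penalty of 1 shrinks them heavily, which is one reason predictors are usually standardized and the penalty tuned by cross validation.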

Phenomic prediction accuracy reached up to ~0.80 for grain yield and flowering time (Fig. 1), higher than previously reported prediction accuracies (31–33, 35–37). (31) showed that using raw reflectance bands instead of ratios (e.g., vegetation indices) performed better in prediction models. (34) further reported that using all bands simultaneously, rather than VIs alone, increased prediction accuracy. However, the reflectance bands used in past studies were derived from five to nine time points, a lower temporal dimension than the data generated in our study. This suggests that predictors derived from additional time points could play an important role in increasing the prediction ability of the models, more so than the choice between raw reflectance bands and vegetation indices.

Genomic prediction for temporal traits can vary depending on the time points of growth

Genomic prediction was applied to the TPP_RGB phenomic data to identify temporal marker effects and their prediction accuracies for each VI and Weibull_CHM through time (Fig. 2). The results demonstrated that genomic markers could predict an individual’s VI or Weibull_CHM value through cross validation using other individuals at the same stage, and that certain stages and VIs are under stronger genetic determination and are more heritable.

Temporally varying marker effects on the phenotype of VIs resulted in phenotypes at different timepoints of VIs and Weibull_CHM having different correlations with yield (SI appendix, Figs. S5 and S6) as well as different prediction abilities for dependent variables (Fig. 2). A dynamic pattern of marker effects as shown here has so far been overlooked in genomic prediction/selection of yield. (4) underlined that predicting the candidate genotype using the phenotype information collected from across multiple environments may be more accurate than using the genetic markers in the prediction model. Similarly, instead of predicting grain yield fitness by whole genome marker effect approaches such as RR-BLUP and GBLUP, including the temporal phenotypic variation occurring across growth into prediction models can result in more accurate fitness prediction as phenomic data already contain temporal marker effects. This study also showed that specific loci can explain different phenotypic variance across more than one derived VI (SI appendix, Figs. S13 and S14, Dataset S1) signifying pleiotropic effects of certain markers for the VIs. These pleiotropic effects have various associations with developing young tissues, inflorescence, and yield.

Phenomic prediction can perform similarly to or outperform genomic prediction

Phenomic data (TPP_Multi and TPP_RGB) predicted grain yield as well as genomic data using ridge regression (Fig. 3), but results differed depending on the cross validation scheme. TPP_RGB contained 35 VIs derived from only RGB bands and Weibull_CHM at fifteen time points (525 phenomic features), resulting in an accuracy of 0.71, the same as the accuracy of GP with its 153,252 segregating whole genome markers. However, when TPP_Multi, which contains the 89 VIs derived from the multispectral bands and Weibull_CHM at twelve time points (1068 phenomic features), was used to predict yield, prediction accuracy reached 0.80, substantially higher than both GP and TPP_RGB for the untested genotypes in tested environment scheme (CV2) (Fig. 3). Moreover, in the most challenging cross-validation scheme, untested genotypes in untested environment (CV4), GP, TPP_RGB and TPP_Multi performed approximately equally, with prediction accuracies around 0.50 ± 0.05 (Fig. 3). These empirical findings suggest, for the first time, that increasing temporal as well as spectral information can predict fitness substantially better than genomic prediction. This also suggests that temporal and continuous phenomic data can be better predictors than discrete genomic data in the prediction and selection of high yielding genotypes. In the only two previous phenomic prediction studies reported to date, (2) used 3,076 NIRS bands at a single time point, while (40) used 1,050 NIRS bands on grain samples. (40) then showed that these NIRS bands outperformed genomic selection using 84,259 SNP markers in wheat. Overall, phenomic selection is an emerging approach that may remove the yearly cost of genotyping required by genomic prediction/selection. Adding a temporal component to phenomic prediction has many known and yet to be discovered advantages.

In summary, this study demonstrated the predictive capability of phenomic data for complex traits in maize, which matched that of the genomic markers frequently applied in plant selection over the past 20 years. UAS surveys over the experimental field plots supplied temporal traits as predictors to facilitate the selection of untested genotypes in untested environments. Growing more plants and measuring them accurately are critical steps to increase selection intensity and accuracy, resulting in higher genetic gain over time. This study showed that screening more plants and measuring them through repeated UAS flights across plant growth may result in greater genetic gain than genomic selection when phenomic prediction/selection is applied routinely.

Acknowledgements

The authors acknowledge Dr. Sorin Popescu and Dr. Lonesome Malambo for implementing the RGB flights, and Dr. Dale Cope for implementing the multispectral flights over the maize breeding nurseries multiple times and across two years. The authors thank the Genome to Field project, which collaborated with many researchers at many research institutions and made this research possible. The authors would also like to acknowledge David Rooney, Jacob Pekar, and Stephan Labar for their technical field support, and graduate, undergraduate, and high school students for their dedicated work in the field. A.A. was supported by a fellowship from the Republic of Turkey, Ministry of National Education and Ministry of Agriculture and Forestry.

Author contributions

A.A.: conceptualization, data curation, formal analysis, investigation, methodology, original draft, review and editing (lead), supervision, validation, visualization; S.C.M.: conceptualization, funding acquisition, methodology, project administration, supervision, resources, review and editing (supporting); S.L.A.: conceptualization, data curation, formal analysis, investigation, methodology, review and editing (supporting)

Competing Interest

The authors declare that no competing interests exist.

J. L. Araus, S. C. Kefauver, M. Zaman-Allah, M. S. Olsen, J. E. Cairns, Translating high-throughput phenotyping into genetic gain. Trends in plant science 23, 451–466 (2018).
H. M. Lane, S. C. Murray, High Throughput can produce better decisions than high accuracy when phenotyping plant populations. Crop Science.
Y. Shi et al., Unmanned aerial vehicles for high-throughput phenotyping and agronomic research. PloS one 11, e0159781 (2016).
R. Bernardo, Predictive breeding in maize during the last 90 years. Crop Science (2021).
C. R. Henderson, Best linear unbiased estimation and prediction under a selection model. Biometrics, 423–447 (1975).
R. Bernardo, Prediction of maize single-cross performance using RFLPs and information from related hybrids. Crop Science 34, 20–25 (1994).
R. Bernardo, Best linear unbiased prediction of maize single-cross performance. Crop Science 36, 50–56 (1996).
R. Bernardo, Best linear unbiased prediction of the performance of crosses between untested maize inbreds. Crop Science 36, 872–876 (1996).
J. B. Endelman, Ridge regression and other kernels for genomic selection with R package rrBLUP. The plant genome 4 (2011).
T. H. Meuwissen, B. J. Hayes, M. E. Goddard, Prediction of total genetic value using genome-wide dense marker maps. Genetics 157, 1819–1829 (2001).
J. C. Whittaker, R. Thompson, M. C. Denham, Marker-assisted selection using ridge regression. Genetics Research 75, 249–252 (2000).
R. Bernardo, J. Yu, Prospects for genomewide selection for quantitative traits in maize. Crop Science 47, 1082–1090 (2007).
A. Adak et al., Validation of Functional Polymorphisms Affecting Maize Plant Height by Unoccupied Aerial Systems (UAS) Discovers Novel Temporal Phenotypes. G3 Genes| Genomes| Genetics (2021).
A. Adak et al., Unoccupied aerial systems discovered overlooked loci capturing the variation of entire growing period in maize. The Plant Genome, e20102 (2021).
A. Adak et al., Temporal Vegetation Indices and Plant Height from Remotely Sensed Imagery Can Predict Grain Yield and Flowering Time Breeding Value in Maize via Machine Learning Regression. Remote Sensing 13, 2141 (2021).
S. L. Anderson et al., Prediction of maize grain yield before maturity using improved temporal height estimates of unmanned aerial systems. The Plant Phenome Journal 2, 1–15 (2019).
J. A. Bac-Molenaar, D. Vreugdenhil, C. Granier, J. J. Keurentjes, Genome-wide association mapping of growth dynamics detects time-specific and general quantitative trait loci. Journal of experimental botany 66, 5567–5580 (2015).
M. T. Campbell et al., A comprehensive image-based phenomic analysis reveals the complex genetic architecture of shoot growth dynamics in rice (Oryza sativa). (2017).
M. J. Feldman et al., Time dependent genetic analysis links field and controlled environment phenotypes in the model C4 grass Setaria. PLoS genetics 13, e1006841 (2017).
B. Ward et al., High-throughput 3D modelling to dissect the genetic control of leaf elongation in barley (Hordeum vulgare). The Plant Journal 98, 555–570 (2019).
R. Wu, Z. Wang, W. Zhao, J. M. Cheverud, A mechanistic model for genetic machinery of ontogenetic growth. Genetics 168, 2383–2394 (2004).
S. Murray et al., G2F Maize UAV Data, College Station, Texas 2017. CyVerse Data Commons. doi 10 (2019).
S. L. Anderson, S. C. M. II, R/UAStools:: plotshpcreate: Create multi-polygon shapefiles for extraction of research plot scale agriculture remote sensing data. Frontiers in plant science 11 (2020).
F. I. Matias, M. V. Caraza-Harter, J. B. Endelman, FIELDimageR: an R package to analyze orthomosaic images from agricultural field trials. The Plant Phenome Journal 3, e20005 (2020).
X. Li, T. Guo, Q. Mu, X. Li, J. Yu, Genomic and environmental determinants and their interplay underlying phenotypic plasticity. Proceedings of the National Academy of Sciences 115, 6679-6684 (2018).
X. Liu, M. Huang, B. Fan, E. S. Buckler, Z. Zhang, Iterative usage of fixed and random effect models for powerful and efficient genome-wide association studies. PLoS genetics 12, e1005767 (2016).
V. Segura et al., An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations. Nature genetics 44, 825–830 (2012).
M. Huang, X. Liu, Y. Zhou, R. M. Summers, Z. Zhang, BLINK: a package for the next level of genome-wide association studies with both individuals and markers in the millions. GigaScience 8, giy154 (2019).
O. N. Danilevskaya, X. Meng, Z. Hou, E. V. Ananiev, C. R. Simmons, A genomic and expression compendium of the expanded PEBP gene family from maize. Plant physiology 146, 250–264 (2008).
X.-h. SONG et al., Integrating transcriptomic and proteomic analyses of photoperiod-sensitive in near isogenic maize line under long-day conditions. Journal of Integrative Agriculture 18, 1211–1221 (2019).
F. M. Aguate et al., Use of hyperspectral image data outperforms vegetation indices in prediction of maize yield. (2017).
R. J. Galán et al., Early prediction of biomass in hybrid rye based on hyperspectral data surpasses genomic predictability in less-related breeding material. Theoretical and Applied Genetics 134, 1409–1422 (2021).
M. R. Krause et al., Aerial high-throughput phenotyping enabling indirect selection for grain yield at the early-generation seed-limited stages in breeding programs. Crop Science, 3096–3114 (2020).
O. A. Montesinos-López et al., Predicting grain yield using canopy hyperspectral reflectance in wheat breeding data. Plant methods 13, 1–23 (2017).
J. Rutkoski et al., Canopy temperature and vegetation indices from high-throughput phenotyping improve accuracy of pedigree and genomic selection for grain yield in wheat. G3: Genes, Genomes, Genetics 6, 2799-2808 (2016).
J. Sun et al., High-throughput phenotyping platforms enhance genomic selection for wheat grain yield across populations and cycles in early stage. Theoretical and Applied Genetics 132, 1705–1720 (2019).
G. Wu, N. D. Miller, N. De Leon, S. M. Kaeppler, E. P. Spalding, Predicting Zea mays flowering time, yield, and kernel dimensions by analyzing aerial images. Frontiers in plant science 10, 1251 (2019).
C. D. Messina et al., Leveraging biological insight and environmental variation to improve phenotypic prediction: Integrating crop growth models (CGM) with whole genome prediction (WGP). European Journal of Agronomy 100, 151–162 (2018).
K. Kismiantini, O. A. Montesinos-López, J. Crossa, E. P. Setiawan, D. U. Wutsqa, Prediction of count phenotypes using high-resolution images and genomic data. G3 Genes| Genomes| Genetics (2021).
R. Rincent et al., Phenomic selection is a low-cost and high-throughput method based on indirect predictions: proof of concept on wheat and poplar. G3: Genes, Genomes, Genetics 8, 3961-3972 (2018).

SupportingInformation.docx
SI appendix
DatasetS1.xlsx
