Figure 2 provides a graphical summary of the methods employed in this article. The methods are described below in the order in which they appear.
Cotton germplasm and experimental design
A field experiment was conducted in College Station, TX, between April and September 2023 to evaluate upland cotton (G. hirsutum) BC5 backcross-inbred lines containing small proportions of the genome of the wild Hawaiian cotton G. tomentosum (Nutt. ex Seem.): 12 chromosome substitution lines, 35 chromosome segment substitution lines, two pure upland lines, a check, and a "filler". Greenhouse-grown, three-week-old seedlings of the 49 unique genotypes were mechanically space-transplanted into a randomized complete block design with 10 rows (ca. 200-cm spacing) of 10 plants (ca. 180-cm spacing), with the outer rows and end hills serving as non-experimental "border", thus yielding 48 spaced transplants in each of the five blocks. Each genotype was replicated approximately five times, for a total of 240 individual plants. Due to poor germination or other environmental causes, five plants perished early in the growing season, leaving 235 individual plants.
G. tomentosum is a wild allotetraploid species native to the Hawaiian Islands, where it is referred to as Ma’o (43). Several G. tomentosum traits are desirable for introgression into cultivated cotton species, including its characteristic heat tolerance; resistance to pests and diseases such as fleahoppers, tarnished plant bug, bollworm, and boll rot (44), as well as thrips and jassids (45); and desirable agronomic traits including fiber quality, length, and fineness (46, 47). Whole-genome sequencing and comparison revealed higher genetic similarity between G. hirsutum and G. tomentosum than between G. hirsutum and other wild allotetraploid relatives, such as G. mustelinum and G. darwinii (48).
High-throughput phenotyping of cotton fiber quality trial
Figure 2A – After transplanting the individual plants to the field, UAS flights were conducted two or three times each week, totaling 46 flights across the growing season, of which 14 late-season flights were used to measure senescence. RGB images were captured with a DJI Phantom 4 Pro V2.0 equipped with a 1-inch, 20 MP CMOS sensor and a mechanical shutter (SZ DJI Technology Co., Ltd., Shenzhen, China). The DJI GS Pro application was used for mission planning, with all missions conducted at 20 m above ground level, with 90% forward and 80% side overlap, a flight speed of 1.0 m/s, and a 2.0-s shutter interval, producing a ground sampling distance of ca. 5 mm/pixel. Geotagged images were orthorectified and stitched with Agisoft Metashape Version 2.0.2 (Agisoft LLC, St. Petersburg, Russia), and orthomosaics (Fig. 2A) were generated via the program’s structure-from-motion with multi-view stereo (SfM-MVS) workflow. Ground control points (GCPs) were recorded using an Emlid Reach M2 UAV RTK Kit (Emlid Tech Kft., Budapest, Hungary).
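As a rough sanity check, the stated ground sampling distance can be recomputed from the camera geometry. The sketch below (Python, although the paper's analyses were done in R) uses the nominal published specifications of the Phantom 4 Pro V2.0 camera (13.2-mm sensor width, 8.8-mm focal length, 5,472-pixel image width); these values are assumptions taken from DJI's spec sheet, not from the text.

```python
# Approximate ground sampling distance (GSD) for the flight parameters above.
# Sensor figures are nominal DJI Phantom 4 Pro V2.0 specs (assumed, not from the text).
def gsd_mm_per_px(sensor_width_mm: float, focal_mm: float,
                  altitude_m: float, image_width_px: int) -> float:
    """GSD (mm/pixel) = sensor width x altitude / (focal length x image width)."""
    return (sensor_width_mm * altitude_m * 1000.0) / (focal_mm * image_width_px)

gsd = gsd_mm_per_px(sensor_width_mm=13.2, focal_mm=8.8,
                    altitude_m=20.0, image_width_px=5472)
print(round(gsd, 2))  # ~5.5 mm/pixel at 20 m, close to the stated 5 mm/pixel
```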
The procedure used to generate orthomosaics was as follows: 1) RGB images were imported into a Metashape project; 2) photos were aligned using reference preselection, with the key point limit set to 40,000 and the tie point limit set to 4,000; 3) initial bundle adjustment was performed with the f, cx/cy, k1, k2, k3, p1, and p2 distortion parameters selected; 4) GCPs were imported as a .csv file and manually tagged in six raw images per GCP; 5) all images were unchecked in the reference pane, the GCPs were integrated into the point cloud with the “update” button, and camera alignment was optimized using all available distortion parameters; 6) the dense point cloud and digital elevation model (DEM) were built using default settings; 7) the orthomosaic was constructed using the DEM as the surface.
Single-plant image extraction
Figure 2B – Using the UAStools R package (49), a shapefile was generated in which a bounding box was placed over each plant. Boxes were named according to the convention “Genotype_ID”, where Genotype was a four-digit code corresponding to one of the 49 unique genotypes and ID was a three-digit code that distinguished individual plants of the same genotype. Of the 46 flights conducted across the growing season, 14 were deemed to capture the senescence window, occurring on 24, 28, and 31 July; 4, 8, 11, 14, 16, 18, 21, 24, and 28 August; and 1 and 5 September 2023. These orthomosaics were loaded into QGIS (50) along with the shapefile, and manual corrections to bounding box dimensions and locations were made as necessary.
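The grid-and-naming logic can be sketched in a few lines. The actual shapefile was built with the UAStools R package; the standard-library Python toy below only illustrates the layout and the “Genotype_ID” convention. The spacings follow the text (ca. 200 cm between rows, 180 cm between plants), while the origin, box size, and the genotype/ID codes are hypothetical placeholders.

```python
# Simplified sketch of the plot-grid bounding boxes (the paper used UAStools in R).
# Spacings are from the text; origin, box dimensions, and name codes are made up.
def make_plot_boxes(n_rows=10, n_plants=10, row_spacing=2.0, plant_spacing=1.8,
                    box_w=1.8, box_h=1.6, origin=(0.0, 0.0)):
    boxes = {}
    for r in range(n_rows):
        for p in range(n_plants):
            x = origin[0] + r * row_spacing
            y = origin[1] + p * plant_spacing
            # "Genotype_ID": four-digit genotype code + three-digit plant ID
            # (dummy codes derived from grid position for illustration only)
            name = f"{r:04d}_{p:03d}"
            boxes[name] = (x, y, x + box_w, y + box_h)  # (xmin, ymin, xmax, ymax)
    return boxes

boxes = make_plot_boxes()
print(len(boxes))  # 100 boxes for the full 10 x 10 grid
```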
Single-plant temporal image extraction
Figure 2C – The st_read function within the sf R package (51) was used to import the shapefile into the R environment. The fieldCrop_grid function from FIELDimageR.Extra (52) iterated through each bounding box within the shapefile and extracted one image per bounding box in TIF format (it is also possible to extract JPEGs), producing 240 images per flight. As five plants did not survive after being transplanted, effectively 235 images were obtained per flight, leading to a total data set size of 3,290 images (235 images per flight × 14 flights). Using a loop in R, images were renamed to the format “YYYYMMDD-Genotype_ID.tif” such that each image in the data set had a unique file name, where YYYYMMDD is the ISO 8601 date format of the drone flight. Images were separated into folders named according to the genotype and ID such that senescence could be scored rapidly in order of flight date for individual plants within each folder.
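The renaming and per-plant foldering step (done with a loop in R in the paper) can be sketched as follows. This Python version assumes a hypothetical on-disk layout in which each flight's extracted images sit in a folder named by ISO 8601 basic date (e.g., "20230724") with files named by "Genotype_ID"; both the layout and the paths are assumptions for illustration.

```python
import shutil
from pathlib import Path

# Sketch of the renaming/foldering step (an R loop in the paper). Extracted
# images are assumed to arrive as "<YYYYMMDD>/<Genotype_ID>.tif"; each is
# copied to "<Genotype_ID>/<YYYYMMDD>-<Genotype_ID>.tif" so senescence can be
# scored per plant in flight-date order. Paths and layout are hypothetical.
def organize_images(extract_dir: Path, out_dir: Path) -> int:
    n = 0
    for flight_dir in sorted(p for p in extract_dir.iterdir() if p.is_dir()):
        date = flight_dir.name  # e.g. "20230724"
        for tif in flight_dir.glob("*.tif"):
            plant = tif.stem    # e.g. "1001_003" -> "Genotype_ID"
            dest = out_dir / plant
            dest.mkdir(parents=True, exist_ok=True)
            shutil.copy(tif, dest / f"{date}-{plant}.tif")
            n += 1
    return n
```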
Senescence scoring
Figure 2D – Within each folder (i.e., for each plant), 14 senescence scores were assigned in succession according to flight date and recorded in tabular format (3,290 total senescence scores). A scoring system with six categories was implemented, where 0 = 0% senescence (completely green), 1 = 20%, 2 = 40%, 3 = 60%, 4 = 80%, and 5 = 100% senescence (completely brown). Examples of plants representing each score are shown in Fig. 3. Three distinct temporal phenotypes were observed: senescence progressed toward plant death (Fig. 3A); stay-green occurred, and plants maintained vigor until the end of the season (Fig. 3B); or plants presented an initial drop in vigor but displayed resiliency and a resurgence of vigor (Fig. 3C). The transient display of intermediate senescence stages led to an imbalanced data set dominated by scores of 0, 1, and 5, with notably fewer examples of 2, 3, and 4 (Table 1).
Table 1
Distribution of senescence scores belonging to each category. Since cotton is a perennial, many plants displayed either stay-green or a resurgence in vigor after an initial period of senescence, leading to a low number of scores of 3 (60% senescence) and 4 (80% senescence).
Score | 0     | 1   | 2   | 3   | 4   | 5
Count | 1,129 | 939 | 448 | 196 | 109 | 469
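The counts in Table 1 quantify the imbalance directly. One common remedy for such imbalance is inverse-frequency class weighting during training; the paper does not state that weighting was applied, so the sketch below is illustrative only, using the published counts.

```python
# Class counts from Table 1 (senescence scores 0-5). Inverse-frequency class
# weights are one common mitigation for imbalance; the paper does not state
# that weighting was used, so this is an illustrative calculation only.
counts = {0: 1129, 1: 939, 2: 448, 3: 196, 4: 109, 5: 469}
total = sum(counts.values())  # 3,290 scored images in all
weights = {k: total / (len(counts) * v) for k, v in counts.items()}
print(total)                  # 3290
print(round(weights[4], 2))   # rarest class (score 4) gets the largest weight
```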
Data preparation and partitioning
Figure 2E – All TIFs were imported into R via the readImage function within the EBImage package (53) and resized to uniform dimensions of 163×163 pixels, the approximate average dimensions of the raw images. The array_reshape function from the reticulate R package was used to transform each image into a 163×163×3 array, thereby splitting the red, green, and blue color channels. A loop was constructed to successively run 200 iterations of training and evaluating the CNN. With each iteration, the data set was randomly split into 80% and 20% partitions at the level of individual plants, such that all images of a given plant fell in the same partition: 188 plants (2,632 images) formed the training set and 47 plants (658 images) formed the testing set. Training and testing image sets were each converted to 4D arrays, where the first dimension corresponded to the number of images (either 2,632 or 658) and the remaining dimensions corresponded to 163×163×3. Pixel values within each 4D array were normalized from [0, 255] to [0, 1] by dividing by 255. Vectors of ground truth senescence scores for training and testing images were one-hot encoded using the to_categorical function in the keras R package (54).
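At the level of array shapes, the preprocessing amounts to the following. The paper performed these steps in R with EBImage, reticulate, and keras; the NumPy sketch below uses a small dummy batch of random arrays in place of the real 2,632 training / 658 testing images, with dimensions as stated in the text.

```python
import numpy as np

# Shape-level sketch of the preprocessing (done in R in the paper). A dummy
# batch of 8 random "images" stands in for the real 2,632-image training set;
# image dimensions (163 x 163 x 3) follow the text.
rng = np.random.default_rng(0)
n, n_classes = 8, 6

# Normalize pixel values from [0, 255] to [0, 1]
x = rng.integers(0, 256, size=(n, 163, 163, 3)).astype("float32") / 255.0

# One-hot encode ground-truth scores, equivalent to keras' to_categorical
scores = rng.integers(0, n_classes, size=n)
y = np.eye(n_classes, dtype="float32")[scores]

print(x.shape, y.shape)  # (8, 163, 163, 3) (8, 6)
```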
CNN parameters and model training
Figure 2F – A CNN was parameterized using the keras_model_sequential function with the following settings:
- The first 2D convolutional layer had 32 filters of dimensions 3×3 and used the ReLU activation function, followed by a max-pooling layer with a 2×2 pool size.
- The second and third 2D convolutional layers had 64 and 128 filters, respectively, both with 3×3 kernels and ReLU activation. Each was followed by a max-pooling layer with a 2×2 pool size.
- After the three convolutional layers, the output of the final max-pooling layer was flattened to transform the 2D feature maps into a 1D vector.
- A dense (fully connected) layer with 128 units and ReLU activation followed the flattening layer.
- A dropout layer with a rate of 0.5 followed the first dense layer to reduce overfitting.
- A final dense layer with six units served as the output layer, where each unit corresponded to one of the six senescence categories. A softmax activation function was used in this layer to obtain probability values summing to 1 for each prediction.
- The model was compiled with a categorical cross-entropy loss function, root-mean-square propagation (RMSprop) as the optimizer with a learning rate of 0.0001, and accuracy as the performance metric.
- Using the fit function from keras, the model was trained for 50 epochs with a batch size of 32. The validation_split parameter was set to 0.2 to monitor overfitting, meaning the model was trained on 80% of the initial 80% training partition, or 64% of the original data set, i.e., approximately 2,106 training images.
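The feature-map sizes implied by this architecture can be traced by hand. The sketch below assumes the keras defaults of "valid" (no) padding and stride-1 convolutions with stride-2 pooling, which the text does not state explicitly.

```python
# Trace feature-map sizes through the architecture above, assuming keras
# defaults: "valid" padding, stride-1 3x3 convolutions, stride-2 2x2 pooling.
# (These defaults are assumptions; the text does not specify them.)
def trace(size=163, blocks=((3, 32), (3, 64), (3, 128))):
    for kernel, filters in blocks:
        size = size - kernel + 1   # valid convolution shrinks each side
        size = size // 2           # 2x2 max pooling halves each side
    return size, filters, size * size * filters

side, depth, flat = trace()
print(side, depth, flat)  # 163 -> 80 -> 39 -> 18 per side; 18*18*128 = 41472
```

Under these assumptions, the flattened vector feeding the 128-unit dense layer would have 41,472 elements.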
Model evaluation
Figure 2G – With each iteration of the CNN, the model was evaluated on the unseen set of images via the evaluate function in keras. The overall accuracy was calculated as the sum of the diagonal of the confusion matrix divided by the total number of testing images, given by \(\frac{\text{True positives}}{658}\), with 658 being the number of testing images. For each iteration, the following metrics were saved: 1) a matrix of the accuracy, validation accuracy, loss, and validation loss for each epoch after subjecting the 20% validation data set to the model; 2) the average validation accuracy and loss across all epochs; 3) the confusion matrix. Both the confusion matrices and the averages of validation accuracy and loss were subsequently averaged across all 200 iterations. The average confusion matrix was used for calculations of precision, recall, and F1 scores. Where TP, FP, and FN denote true positives, false positives, and false negatives, respectively, precision is given by \(\frac{TP_{\text{Class }X}}{TP_{\text{Class }X} + FP_{\text{Class }X}}\) (correct class X predictions divided by all class X predictions), recall is given by \(\frac{TP_{\text{Class }X}}{TP_{\text{Class }X} + FN_{\text{Class }X}}\) (correct class X predictions divided by all class X instances in the data set), and the F1 score is the harmonic mean of precision and recall, given by \(\frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}\). Precision, recall, and F1 scores were calculated for each senescence category.
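These per-class formulas can be computed directly from a confusion matrix. The sketch below assumes rows index the true class and columns the predicted class (an orientation the text does not specify), and uses a made-up 3×3 matrix purely for illustration.

```python
import numpy as np

# Per-class precision/recall/F1 from a confusion matrix, matching the formulas
# above. Assumption: rows = true class, columns = predicted class (orientation
# is not specified in the text). The 3x3 example matrix is made up.
def per_class_metrics(cm: np.ndarray):
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp   # predicted as class X, but actually another class
    fn = cm.sum(axis=1) - tp   # class X instances predicted as something else
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()   # diagonal over the total test count
    return precision, recall, f1, accuracy

cm = np.array([[8, 1, 1],
               [2, 6, 2],
               [0, 1, 9]])
p, r, f1, acc = per_class_metrics(cm)
print(round(acc, 3))  # (8 + 6 + 9) / 30 = 0.767
```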