Rice genotypes
One hundred rice genotypes were subjected to high-throughput phenomic analysis. Their names, variety types, and the years in which they received the Variety Certificate in China are listed in Table S1. All genotypes were purchased from domestic seed companies. On May 22, 2023, the genotypes were sown at the experimental farm of the China National Rice Research Institute in Fuyang, Hangzhou, China. Ten days after sowing, the seedlings were transplanted into plastic buckets (18 cm high, 20 cm in diameter). Each genotype was planted in 3 buckets (i.e., 3 replicates per genotype) with 2 seedlings per bucket, for a total of 6 plants per genotype. The potting soil was paddy field clay loam, with about 4 kg of dry soil per bucket. The genotypes were imaged using RGB imaging, visible and near-infrared (VNIR) hyperspectral imaging, short-wave infrared (SWIR) imaging, and fluorescence imaging at four growth stages: tillering, jointing, grain filling, and 20 days after grain filling.
Platform for high-throughput phenomic analysis
The high-throughput phenomic analysis was conducted on a PlantScreen™ System platform designed and constructed by Photon Systems Instruments (PSI, Czech Republic) (https://psi.cz/). The platform is located at the China National Rice Research Institute in Fuyang, Hangzhou, China (Fig. S1A). It consists of an RGB imaging station, a chlorophyll fluorescence imaging station, and VNIR and SWIR imaging stations (Fig. S1B, S1C). All high-throughput rice images and data were obtained using this platform and its accompanying software: PlantScreen™ Data Analyzer (Version 3.3.16.3), Hyperspectral Analyzer (Version 1.0.0.14), Morpho Analysis (Version 1.0.12.0), and FluorCam10 (Version 1.0.0.1806).
RGB imaging
Top-view and side-view RGB images of the 100 rice varieties were acquired at the four growth stages. The side-view and top-view RGB cameras are mounted on robotic arms and capture images at 12-megapixel resolution. Each camera is supplemented with a calibrated LED-based light source to ensure homogeneous illumination. Fisheye correction, white balance calibration, and color correction were applied to the raw images to reduce errors. The corrected images were then masked and background-subtracted so that only the rice plants remained. Color segmentation was then applied to the plant pixels. The RGB values of each pixel within the rice plant area were extracted as the dataset for color segmentation. A k-means clustering algorithm partitioned the pixels into 9 clusters based on their Euclidean distance in RGB color space. The cluster centroids represent the base hues used for the segmentation, and each original pixel color was approximated by its nearest centroid. The 9 hues are RGB(34,38,22), RGB(45,54,13), RGB(45,55,36), RGB(57,71,46), RGB(59,71,20), RGB(72,84,58), RGB(73,86,36), RGB(90,98,58), and RGB(110,111,90) (Fig. 1). The detailed color segmentation method of the PlantScreen™ System is described in Awlia et al. (2016). This step yielded 18 color parameters in total: 9 hues each for the side-view and top-view images. The suffixes "side" and "top" distinguish side-view and top-view parameters in this study. The RGB parameters obtained at the four growth stages for the 100 rice varieties are listed in Tables S2 to S5, respectively.
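The nearest-centroid step of the color segmentation can be illustrated with a minimal Python sketch. It is not the PlantScreen™ implementation: it takes the nine hues listed above as fixed centroids, and it assumes that each color parameter is the share of plant pixels assigned to a hue, which the text does not state. The function names and the example pixels are hypothetical.

```python
from collections import Counter

# The nine hue centroids reported in the text (RGB triplets).
HUES = [
    (34, 38, 22), (45, 54, 13), (45, 55, 36),
    (57, 71, 46), (59, 71, 20), (72, 84, 58),
    (73, 86, 36), (90, 98, 58), (110, 111, 90),
]

def nearest_hue(pixel, hues=HUES):
    """Return the index of the centroid closest to `pixel` in RGB space."""
    return min(
        range(len(hues)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(pixel, hues[i])),
    )

def hue_fractions(plant_pixels, hues=HUES):
    """Fraction of plant pixels assigned to each hue (assumed output format)."""
    counts = Counter(nearest_hue(px, hues) for px in plant_pixels)
    total = len(plant_pixels)
    return [counts.get(i, 0) / total for i in range(len(hues))]

# Hypothetical plant pixels, with the background already masked out.
pixels = [(35, 40, 20), (44, 55, 14), (108, 110, 92), (36, 37, 23)]
fractions = hue_fractions(pixels)
```

Minimizing the squared distance gives the same assignment as minimizing the Euclidean distance, so the square root is omitted.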
In addition, binary images were derived from the RGB images and used to calculate morphological parameters. Five parameters were obtained from the side-view images: AREA, PERIMETER, COMPACTNESS, WIDTH, and HEIGHT. Nine parameters were obtained from the top-view images: AREA, PERIMETER, ROUNDNESS, ROUNDNESS2, ISOTROPY, COMPACTNESS, ECCENTRICITY, RMS (Rotational Mass Symmetry), and SOL (Slenderness of Leaves) (Fig. 1). Area is calculated by counting the pixels (PX) of the binary image. Perimeter is determined by counting the edge pixels of the binary image and can be converted to millimeters (MM). Area and perimeter can therefore be expressed in either pixels or millimeters; we use the suffixes -PX and -MM (e.g., perimeter-PX and perimeter-MM) to indicate the unit. Because the PX and MM values give the same results in correlation and regression analyses, we omit the MM results and report only the PX data in the present study. Once area and perimeter have been obtained from the binary image, the other morphometric rosette parameters can be derived: ROUNDNESS, ROUNDNESS2, ISOTROPY, ECCENTRICITY, COMPACTNESS, RMS, and SOL (PlantScreen™ analyzer, PSI) (Pavicic et al. 2017). These parameters were originally designed for Arabidopsis and may not be well suited to rice, but because the PlantScreen™ System outputs them automatically, we also analyzed them as phenotypic traits to assess whether they are relevant to rice. Fig. S2 illustrates the calculation of perimeter and eccentricity in rice. The parameters are described in detail in Pavicic et al. (2017) and are briefly defined here:
$$\text{ROUNDNESS}=\frac{4\pi\times\text{AREA}}{\text{PERIMETER}^{2}}$$
$$\text{ROUNDNESS2}=\frac{4\pi\times\text{CONVEX\_HULL\_AREA}}{\text{CONVEX\_HULL\_PERIMETER}^{2}}$$
$$\text{ISOTROPY}=\frac{4\pi\times\text{POLYGON\_AREA}}{\text{POLYGON\_PERIMETER}^{2}}$$
$$\text{RMS}=\frac{2\sqrt{(0.5\times\text{MAJOR\_AXIS\_LENGTH})^{2}-(0.5\times\text{MINOR\_AXIS\_LENGTH})^{2}}}{\text{MAJOR\_AXIS\_LENGTH}}$$
$$\text{COMPACTNESS}=\frac{\text{AREA}}{\text{CONVEX\_HULL\_AREA}}$$
$$\text{SOL}=\frac{\text{SKELETON\_PERIMETER}^{2}}{\text{AREA}}$$
$$\text{ECCENTRICITY}=\frac{\text{AREA}(\text{CIRCLE ONLY})+\text{AREA}(\text{CONVEX HULL ONLY})}{\text{AREA}(\text{INTERSECTION})}$$
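As an illustration of how area, perimeter, and the simpler ratios above can be obtained from a binary image, here is a minimal Python sketch. It is not the PlantScreen™ code. The edge-pixel rule (a foreground pixel with at least one 4-neighbor that is background or outside the image) is an assumption, since the text only says that edge pixels are counted, and all function names are hypothetical.

```python
import math

def area_and_perimeter(mask):
    """Count foreground pixels and edge pixels in a binary mask (rows of 0/1).

    Assumed rule: an edge pixel is a foreground pixel with at least one
    4-neighbor that is background or lies outside the image.
    """
    rows, cols = len(mask), len(mask[0])
    area = perimeter = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            area += 1
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if not (0 <= rr < rows and 0 <= cc < cols) or not mask[rr][cc]:
                    perimeter += 1
                    break  # count each edge pixel once
    return area, perimeter

def roundness(area, perimeter):
    """ROUNDNESS = 4 * pi * AREA / PERIMETER^2 (equals 1 for an ideal circle)."""
    return 4 * math.pi * area / perimeter ** 2

def compactness(area, convex_hull_area):
    """COMPACTNESS = AREA / CONVEX_HULL_AREA."""
    return area / convex_hull_area

def slenderness_of_leaves(skeleton_perimeter, area):
    """SOL = SKELETON_PERIMETER^2 / AREA."""
    return skeleton_perimeter ** 2 / area

# Hypothetical 6 x 6 image containing a filled 4 x 4 square.
mask = [[1 if 1 <= r <= 4 and 1 <= c <= 4 else 0 for c in range(6)]
        for r in range(6)]
area_px, perimeter_px = area_and_perimeter(mask)
```

The convex hull, skeleton, and fitted-ellipse quantities needed for the remaining parameters require additional image-processing steps that are not sketched here.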
Fluorescence imaging
Chlorophyll fluorescence imaging quantifies how efficiently rice plants use absorbed light energy for photosynthesis and how much of it they dissipate as heat. The fluorescence imaging station of the PlantScreen™ System uses a high-sensitivity CCD camera with a multi-color LED light panel. It uses pulse-modulated, short-duration flashes to measure minimal fluorescence accurately and saturating light pulses to measure maximal fluorescence. The system has two types of actinic light for light-adapted and quenching analyses. Thirteen parameters were obtained for the 100 rice varieties at the four growth stages:
F0: Minimum fluorescence in the dark-adapted state.
Fm: Maximum fluorescence in the dark-adapted state.
Fv: Variable fluorescence in the dark-adapted state, calculated as Fm - F0.
QY_max: Maximum PSII quantum yield, calculated as Fv/Fm.
Fm_Lss: Steady-state maximum fluorescence in the light.
Ft_Lss: Steady-state fluorescence in the light.
F0_Lss: Steady-state minimum fluorescence in the light, calculated as F0 / ((Fv/Fm) + (F0/Fm_Lss)).
Fv_Lss: Variable fluorescence in the light-adapted state, calculated as Fm_Lss - F0_Lss.
Fq_Lss: Difference in fluorescence between Fm_Lss and Ft_Lss in the light-adapted state, calculated as Fm_Lss - Ft_Lss.
Fv/Fm_Lss: PSII quantum yield of the light-adapted sample in the steady-state, calculated as Fv_Lss/Fm_Lss.
QY_Lss: Steady-state PSII quantum yield, calculated as Fq_Lss/Fm_Lss.
qP_Lss: Coefficient of photochemical quenching in the steady-state, calculated as Fq_Lss/Fv_Lss.
Size_FC: The top view rice area under the fluorescence imaging system.
The "Lss" indicates that the parameters are measured in the light steady-state (Abdelhakim et al. 2024).
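The derived parameters in the list above follow from four measured signals (F0, Fm, Fm_Lss, and Ft_Lss). The short Python sketch below applies the formulas exactly as stated in the list. The function name and the input values are hypothetical and are not measurements from this study.

```python
def fluorescence_parameters(f0, fm, fm_lss, ft_lss):
    """Derive the calculated fluorescence parameters from the measured signals."""
    fv = fm - f0                          # Fv = Fm - F0
    qy_max = fv / fm                      # QY_max = Fv / Fm
    f0_lss = f0 / (qy_max + f0 / fm_lss)  # F0_Lss = F0 / ((Fv/Fm) + (F0/Fm_Lss))
    fv_lss = fm_lss - f0_lss              # Fv_Lss = Fm_Lss - F0_Lss
    fq_lss = fm_lss - ft_lss              # Fq_Lss = Fm_Lss - Ft_Lss
    return {
        "Fv": fv,
        "QY_max": qy_max,
        "F0_Lss": f0_lss,
        "Fv_Lss": fv_lss,
        "Fq_Lss": fq_lss,
        "Fv/Fm_Lss": fv_lss / fm_lss,
        "QY_Lss": fq_lss / fm_lss,
        "qP_Lss": fq_lss / fv_lss,
    }

# Hypothetical signal values in arbitrary units.
params = fluorescence_parameters(f0=100.0, fm=500.0, fm_lss=250.0, ft_lss=150.0)
```

With these inputs, QY_max is 0.8 and QY_Lss is 0.4.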
VNIR and SWIR hyperspectral imaging
The PlantScreen™ System uses two hyperspectral cameras: (1) a visible and near-infrared (VNIR) camera, which measures light in the 380–900 nm wavelength range at high spectral resolution, and (2) a short-wave infrared (SWIR) camera, which measures light in the 900–1700 nm wavelength range at high spectral resolution. These cameras capture hundreds of narrow spectral bands for each image pixel, providing detailed information about the optical properties of the imaged samples. In this study, rice plants were imaged from the top at the four growth stages. To separate the rice plants from the background, the VNIR images were processed using the formula 1.2 * (2.5 * (R740 - R672) - 1.3 * (R740 - R556)), where R740, R672, and R556 are the reflectance at 740 nm, 672 nm, and 556 nm, respectively. For the SWIR images, the background was removed using the formula (R960 - R1450) - (R960 - R1200). Pixels with values below a threshold were treated as background, and those above the threshold formed the rice plant imaging area (the rice mask). Several spectral vegetation indices were then calculated from the hyperspectral data using the PlantScreen™ Data Analyzer software, including PSRI (Plant Senescence Reflectance Index), NDVI (Normalized Difference Vegetation Index) and its variant NDVI2, PRI (Photochemical Reflectance Index), SIPI (Structure Insensitive Pigment Index), OSAVI (Optimized Soil-Adjusted Vegetation Index), and MCARI1 (Modified Chlorophyll Absorption in Reflectance Index 1):
$$\text{PSRI}=\frac{\text{R680}-\text{R500}}{\text{R750}}$$
$$\text{NDVI}=\frac{\text{R800}-\text{R670}}{\text{R800}+\text{R670}}$$
$$\text{NDVI2}=\frac{\text{R400}-\text{R670}}{\text{R400}+\text{R670}}$$
$$\text{PRI}=\frac{\text{R531}-\text{R570}}{\text{R531}+\text{R570}}$$
$$\text{SIPI}=\frac{\text{R790}-\text{R450}}{\text{R790}+\text{R650}}$$
$$\text{OSAVI}=\frac{(1+0.16)\times(\text{R800}-\text{R670})}{\text{R800}-\text{R670}+0.16}$$
$$\text{MCARI1}=1.2\times[2.5\times(\text{R800}-\text{R670})-1.3\times(\text{R800}-\text{R550})]$$
For each index, summary statistics were computed pixel by pixel over the area defined by the rice mask: the average, standard deviation, median, minimum, and maximum. These statistics are indicated by the suffixes "-avg", "-stddev", "-median", "-min", and "-max" appended to the name of the hyperspectral parameter. Additionally, a water index, WATER1, was calculated from the SWIR data as R1440 / R960.
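The masking, index calculation, and per-mask summary can be sketched in a few lines of Python. This is an illustration, not the PlantScreen™ Data Analyzer code: the reflectance values and the threshold of 0.3 are hypothetical (the text gives no threshold value), and the use of the population standard deviation is an assumption.

```python
import statistics

def vnir_mask_value(r740, r672, r556):
    """Background-separation formula for VNIR images, as given in the text."""
    return 1.2 * (2.5 * (r740 - r672) - 1.3 * (r740 - r556))

def ndvi(r800, r670):
    """NDVI = (R800 - R670) / (R800 + R670)."""
    return (r800 - r670) / (r800 + r670)

def water1(r1440, r960):
    """WATER1 = R1440 / R960 (computed from SWIR data)."""
    return r1440 / r960

def summarize(values):
    """Summary statistics matching the -avg, -stddev, -median, -min, -max suffixes."""
    return {
        "avg": statistics.fmean(values),
        "stddev": statistics.pstdev(values),  # population SD is an assumption
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }

# Hypothetical per-pixel reflectance, keyed by wavelength in nm.
pixels = [
    {556: 0.10, 670: 0.05, 672: 0.05, 740: 0.45, 800: 0.50},  # leaf-like
    {556: 0.12, 670: 0.06, 672: 0.06, 740: 0.40, 800: 0.44},  # leaf-like
    {556: 0.20, 670: 0.22, 672: 0.22, 740: 0.25, 800: 0.26},  # soil-like
]
THRESHOLD = 0.3  # hypothetical value

# Keep only pixels above the threshold, then summarize NDVI over the mask.
plant = [p for p in pixels if vnir_mask_value(p[740], p[672], p[556]) > THRESHOLD]
ndvi_stats = summarize([ndvi(p[800], p[670]) for p in plant])
```

In this toy example the soil-like pixel falls below the threshold, so the statistics are computed over the two leaf-like pixels only.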
Traits related to rice yield and grain quality measured at the mature stage
At maturity, the aboveground portion of the rice plants (grown in plastic buckets) was harvested and separated into leaves, stem sheaths, and panicles. The plant materials were then dried at 105°C for 120 minutes and further dried at 80°C until they reached constant weight. The dry weights of the leaves, stem sheaths (shoot weight), and panicles were measured. The total dry matter weight was calculated as the sum of the leaf, stem sheath, and panicle weights. The number of effective panicles was also recorded. These five yield-related traits were measured for 100 rice varieties (Table S6), with three replicates (i.e., three plastic buckets) per variety.
After harvesting, the rice grains were dehulled using a rice huller (Otake-FC2R, Japan) to obtain the brown rice percentage, which was averaged across the three replicates. The brown rice was then milled using a precision rice mill (Puyun 2299, China) to determine the milled rice percentage. Approximately 30 grams of milled rice from each variety were analyzed using a rice grain appearance and quality tester (China Wanshen Testing Technology Co., Ltd) to measure the head rice percentage, chalky rice percentage, and degree of chalkiness. Each variety was tested in three replicates, with 30 milled rice grains per replicate. Head rice is defined as grains with a length of at least 3/4 the length of an intact milled rice grain. Chalkiness degree (%) refers to the percentage of the head rice area that is chalky. Chalky rice percentage (%) is the percentage of grains with chalkiness out of the total head rice. These five grain quality-related traits were measured for 91 rice varieties (no data available for 9 varieties, Table S7).
Regression analysis
To identify the best subset of predictors among the 88 available parameters (traits) from RGB, fluorescence, VNIR, and SWIR imaging, we performed best subset selection regression using the "regsubsets" function of the leaps package in R. The goal was to predict rice yield and quality accurately with as few traits as possible. We called "regsubsets" with the argument 'nbest = 8', which returns the 8 best subsets of predictors for each subset size. The number of predictors to include, from 1 to 88, can be chosen according to practical needs.
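The idea behind best subset selection can be shown with a small brute-force sketch in Python. It is not the "regsubsets" implementation used in this study: it simply fits an ordinary least squares model to every combination of predictors and keeps, for each subset size, the `nbest` combinations with the lowest residual sum of squares. Brute-force enumeration is practical only for a handful of predictors. The data and function names are hypothetical.

```python
from itertools import combinations

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def rss(rows, y, subset):
    """Residual sum of squares of an OLS fit (with intercept) on `subset` columns."""
    design = [[1.0] + [row[j] for j in subset] for row in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    return sum((yi - sum(b * v for b, v in zip(beta, d))) ** 2
               for d, yi in zip(design, y))

def best_subsets(rows, y, max_size, nbest=1):
    """For each subset size, return the `nbest` column subsets with the lowest RSS."""
    n_cols = len(rows[0])
    result = {}
    for size in range(1, max_size + 1):
        ranked = sorted(combinations(range(n_cols), size),
                        key=lambda s: rss(rows, y, s))
        result[size] = ranked[:nbest]
    return result

# Hypothetical data: the response depends on columns 0 and 2 only.
rows = [(1, 5, 2), (2, 3, 1), (3, 6, 4), (4, 2, 3), (5, 4, 6), (6, 1, 5)]
y = [1 + 2 * r[0] + 3 * r[2] for r in rows]
selected = best_subsets(rows, y, max_size=2, nbest=1)
```

Here the best two-predictor subset is columns 0 and 2, which generated the response.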
Deep learning neural network model
We built a neural network model using the Keras library in R to predict rice yield and quality. First, we randomly split the phenotypic data into training (80%) and testing (20%) sets using the "sample" function in R; other proportions can be used if needed. The model was a simple feed-forward neural network with two dense layers: a hidden layer with 10 units and the ReLU activation function, followed by a single-unit output layer. We trained the model for 500–2500 epochs with a batch size of 32 and a validation split of 0.2 (see Data S1 for details). We evaluated the model on the test data by generating predictions with the "predict()" function and calculating the mean absolute error (MAE). Finally, we compared the performance of the neural network model with that of a linear regression model.
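The three building blocks of this workflow (random 80/20 split, forward pass of the two-layer network, and MAE) can be sketched in plain Python. This is not the Keras code in Data S1 and it does not train the model. The linear output unit is an assumption, since the text does not state the output activation, and the weights, inputs, and function names are hypothetical.

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=0):
    """Randomly split `rows` into training and testing subsets."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    cut = int(round(train_fraction * len(rows)))
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test

def forward(x, w1, b1, w2, b2):
    """Hidden dense layer with ReLU, then a single output unit (assumed linear)."""
    hidden = [max(0.0, sum(w * v for w, v in zip(weights, x)) + b)
              for weights, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2

def mean_absolute_error(y_true, y_pred):
    """MAE = mean of |observed - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical sample identifiers, split 80/20.
train, test = train_test_split(list(range(100)))

# Hypothetical weights for a network with 2 inputs and 10 hidden units.
w1 = [[1.0, -1.0] for _ in range(10)]
b1 = [0.0] * 10
w2 = [0.1] * 10
b2 = 0.5
prediction = forward([3.0, 1.0], w1, b1, w2, b2)

mae = mean_absolute_error([1.0, 2.0, 3.0], [2.0, 2.0, 5.0])
```

A linear regression baseline can be evaluated with the same MAE function, which keeps the two models directly comparable on the test set.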