Phenotypic variability
Genetic divergence based on agro-morphological variability in the RIL lines
The progeny populations of two wide crosses, BWF (Badshabhog x O. rufipogon) and CWF (Chenga x O. rufipogon), each comprising 100 RIL lines, were evaluated in the present study following the DUS guidelines (PPV&FR Act 2001, Govt. of India). The RIL lines of both BWF and CWF differed highly significantly (p<0.001) for the 15 quantitative traits, as revealed by ANOVA (Tables 2-3). An unexpectedly wide range of phenotypic variation was recorded among the RIL lines of BWF (Table 4, Figures 3-7). The shortest plant height, only 60 cm, was recorded in line BW98, which also had a small flag leaf (length 15.17 cm, width only 5.07 mm) and small, shattering seeds (Table 4, Figures 3-7; Supplementary Table S1). In contrast, the tallest plants were observed in line BW97, which also had shattering seeds. Grain pericarp colour in both RIL populations (BWF, CWF) varied from white, brown, red, and greenish to black, with distinctive grain quality parameters, viz., ASV, GT, GC, and aroma (Figure 8; Supplementary Tables S1-S2). Considering plant height (121.07 cm to 142.14 cm), maturity time (125 to 140 days), grain per panicle (190.40 to 387.14), 1000 grain weight (21.34 g to 27.20 g), and single plant yield (32.48 g to 61.89 g), the BWF RIL lines BW6, BW17, BW18, BW23, BW24, BW25, BW26, BW77, BW84, BW85, BW88, BW90, BW91, BW94, and BW95 can be considered promising aromatic black rice lines, along with one white-grained line, BW99 (Table 4). In CWF, the promising RIL lines CW1, CW11, CW16, CW20, CW23, CW78, CW79, CW90, CW94, CW95, CW96, CW97, CW98, and CW99 can be considered for further evaluation before release to farmers' fields (Table 5, Figures 3-7). The presence of variability is a prerequisite for any breeding program and for germplasm characterization.
In this context, a previous study reported that grain weight is one of the most important traits for determining phenotypic variability in the rice landraces of the Majuli Islands, Assam, India (Mudhale et al. 2024).
In our breeding lines (BWF, CWF), a wide range of phenotypic diversity was observed for the 15 quantitative agro-morphological traits and the grain quality parameters. Some of the RIL lines (BWF, CWF) showed transgressive segregation for 1000 GrWt, grain length, panicle length, panicle weight, and other traits (Tables 4-5, Figures 3-7; Supplementary Tables S1-S2). Our present results are therefore consistent with previous studies and emphasize that prior knowledge of the agro-morphological characteristics of rice germplasm is fundamental to plant breeders (Mudhale et al. 2024; Bordoloi et al. 2024).
Genetic variability parameters (Heritability, Genetic advance, GAM)
To investigate the genetic diversity of genetic resources, several indicators are frequently used, namely genotypic variance (GV), phenotypic variance (PV), genotypic coefficient of variation (GCV), phenotypic coefficient of variation (PCV), broad sense heritability (H%), genetic advance (GA), and genetic advance as percentage of mean (GAM); these were accordingly estimated in our breeding lines (BWF, CWF) (Tables 6-7). The 15 quantitative traits studied indicated the presence of substantial variability for yield and its allied traits, which provides more opportunity to utilize these traits in further rice improvement programs. The magnitude of PCV was higher than that of GCV for all the traits studied, indicating that the environment has some influence on their phenotypic expression. However, the differences between GCV and PCV were small (BWF, CWF), indicating a close correspondence between phenotype and genotype, a limited environmental effect, and a larger role of genetic factors in the expression of these traits (Tables 6-7). High PCV and GCV values suggest that a great amount of genetic diversity exists in the RIL lines (BWF, CWF), consistent with previous results (Pathak et al. 2019; Gogoi et al. 2024). Heritability in the broad sense was classified as high (>60%), moderate (40–60%), and low (<40%), whereas the levels of GA were ranked as low, moderate, and high, with corresponding ranges of <10%, 10–20%, and >20%, respectively. Some of the traits showed high heritability together with high genetic advance, indicating additive gene effects in our present study (Tables 6-7). The broad sense heritability for agro-morphological traits ranged from 73.20% (PnL) to 99.87% (HD) in BWF lines and from 76.58% (GB) to 99.72% (KL) in CWF (Tables 6-7).
Broad sense heritability was found to be high for heading date (99.87%), maturity time (98.81%), 1000 grain weight (98.41%), panicle weight (91.98%), plant height (91.55%), yield per plant (90.01%), panicle length (73.20%), and grain per panicle (80.85%) in BWF lines (Table 6). Similar results were also demonstrated by previous researchers (Pathak et al. 2019; Gogoi et al. 2024). Broad sense heritability was high (>60%) for all the characters studied (Tables 6-7), which indicates little environmental influence in our breeding lines and is consistent with an earlier study by Mondal et al. (2024). High heritability coupled with high GA (>20%) is crucial for effective selection of a trait in any breeding program. In our present study, high heritability coupled with high GA was recorded for plant height, panicle length, grain per panicle, thousand grain weight, tillering, and single plant yield in the BWF RIL lines, suggesting additive gene action for these traits, so selection based on them could contribute largely to the improvement of rice (Table 6). High heritability (>90%) in combination with a high genetic advance as percent of mean (GAM) was observed for FLW, panicle weight, grain length, and kernel length in the CWF RIL lines (Table 7). This signifies that these traits are governed by additive gene action and can be introgressed into cultivars for germplasm enhancement through selection. The present results are consistent with earlier studies with respect to the genetic parameters studied (Ahmad et al. 2015; Roy and Shil 2020B; Faysal et al. 2022; Gogoi et al. 2024; Bordoloi et al. 2024). Our results support the proposition that significant genetic gain can be achieved in improving varieties by utilizing novel genes of neglected wild rice to restore the genetic diversity and allelic variation lost during domestication (Siddiq and Vemireddy 2021; Eizenga et al. 2024).
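The parameters discussed above can be illustrated with a minimal sketch that assumes the commonly used formulas: GCV and PCV as the square root of the respective variance divided by the grand mean, broad sense heritability as GV/PV, and GA as k times the phenotypic standard deviation times heritability. The selection differential k = 2.06 (5% selection intensity), the function names, and the input values are assumptions for illustration only and are not taken from Tables 6-7. The classification thresholds follow the ranges stated in the text.

```python
import math


def variability_parameters(genotypic_var, phenotypic_var, grand_mean, k=2.06):
    """Return GCV, PCV, heritability (%), GA and GAM for one trait.

    k = 2.06 is the standardized selection differential at 5% selection
    intensity (an assumption; the text does not state the value used).
    """
    gcv = math.sqrt(genotypic_var) / grand_mean * 100
    pcv = math.sqrt(phenotypic_var) / grand_mean * 100
    h2 = genotypic_var / phenotypic_var            # broad sense, as a fraction
    ga = k * math.sqrt(phenotypic_var) * h2        # genetic advance, trait units
    gam = ga / grand_mean * 100                    # genetic advance as % of mean
    return {"gcv": gcv, "pcv": pcv, "h2_percent": h2 * 100, "ga": ga, "gam": gam}


def classify_heritability(h2_percent):
    """High (>60%), moderate (40-60%), low (<40%), as used in the text."""
    if h2_percent > 60:
        return "high"
    if h2_percent >= 40:
        return "moderate"
    return "low"


def classify_gam(gam):
    """Low (<10%), moderate (10-20%), high (>20%), as used in the text."""
    if gam > 20:
        return "high"
    if gam >= 10:
        return "moderate"
    return "low"


# Hypothetical variance components for a single trait (not study data).
params = variability_parameters(genotypic_var=90.0, phenotypic_var=100.0,
                                grand_mean=100.0)
print(params)
print(classify_heritability(params["h2_percent"]), classify_gam(params["gam"]))
```

In this hypothetical case PCV exceeds GCV only slightly, which is the pattern interpreted in the text as a limited environmental effect.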
Correlation among the agro-morphological traits data
Based on correlation analysis, yield per plant (PY) was positively correlated with grain per panicle (0.78), panicle weight (0.66), 1000 grain weight (0.57), and grain length (0.57) in BWF (Figure 9). The highest positive correlation (0.88) was observed between kernel length (KL) and grain length (GL), followed by grain length and 1000 grain weight (0.79), and grain length and panicle weight (0.61). Yield per plant was negatively correlated with maturity time in days (MT) (-0.26), followed by heading date (-0.23). Heading date (HD) was negatively correlated with three traits: 1000 grain weight (-0.22), grain breadth (-0.23), and kernel breadth (-0.03). In the CWF RIL lines, single plant yield was positively correlated with panicle length (0.26), panicle weight (0.32), grain per panicle (0.63), and grain breadth (0.23) (Figure 9). The highest positive correlation (0.76) was found between heading date (HD) and maturity time in days (MT), followed by that between grain length and kernel length (0.61). The correlation analysis therefore indicates that grain per panicle, panicle length, panicle weight, 1000 grain weight, and grain length are the most important traits to be considered in the production of high-yield breeding lines.
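As an illustration of how such pairwise coefficients are obtained, the following minimal sketch computes a Pearson correlation coefficient. It assumes that Pearson's r was the coefficient used, which the text does not state explicitly, and the trait values are hypothetical and are not data from the BWF or CWF populations.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for five lines (illustration only).
grain_per_panicle = [190, 240, 280, 330, 387]
plant_yield = [32, 41, 45, 55, 62]
maturity_days = [140, 136, 133, 128, 125]

r_grpn_py = pearson_r(grain_per_panicle, plant_yield)
r_mt_py = pearson_r(maturity_days, plant_yield)
print(round(r_grpn_py, 2), round(r_mt_py, 2))
```

A positive coefficient, as for grain per panicle here, means the two traits increase together; a negative one, as for maturity time, means that later lines tend to yield less.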
Character Association and Path Coefficient Analysis
Correlation analysis describes only the relationship between two traits, whereas path coefficient analysis partitions each association into direct effects and indirect effects acting via other attributes. Additionally, by separating the genotypic (G) and phenotypic (P) correlation coefficients into direct and indirect effects, path analysis gives insight into the true impact of the independent factors on yield (the dependent factor) (Saleh et al. 2020). In our current investigation, single plant yield (PY) was treated as the resultant (dependent) variable and PH, FLL, FLW, PnL, PnWt, GrPn, GL, GB, KL, KB, GrWt, Till, HD and MT as the causal (independent) variables, as illustrated in the path coefficient diagram (Figure 10; Tables 8-9). The partitioning of the genotypic correlation coefficients with PY into direct (bold) and indirect path coefficients (PC) in the 100 CWF and 100 BWF RIL lines is also summarized (Supplementary Tables S3-S4). In BWF lines, the residual effect was 0.1743, indicating that 82.57% of the variability was explained by the 14 characters studied (Table 8). In BWF, the positive direct impact on plant yield came mainly from GL (0.5977**), GB (0.4916**), GrWt (0.5787**), PnWt (0.656**), PnL (0.4275**), and GrPn (0.797**) (Table 8; Figure 10). Thus, direct selection based on these characters would be appropriate for increasing plant yield (PY) in BWF. In CWF lines, on the other hand, plant height (PH) had a significant positive correlation (SPC) with FLL (r = 0.6869**, 0.5932**), FLW (r = 0.4341**, 0.3983**), PnL (r = 0.5034**, 0.3677**), and PnWt (r = 0.2258*, 0.1543**) at the genotypic and phenotypic (G and P) levels, respectively (Table 9; Supplementary Table S4), suggesting that yield might be improved by regulating plant height.
An SPC was found between panicle length (PnL) and panicle weight (r = 0.5073**, 0.4600**), grain per panicle (0.3652**, 0.3348**), grain weight (0.3042**, 0.2378**), GL (0.2207**, 0.1754**), GB (0.3658**, 0.2967**), and KL (0.3301**, 0.2441**) at the G and P levels, respectively, indicating that PnL and plant yield could be increased by improving these traits. However, a negative correlation (NC) was observed between grain per panicle and HD (r = -0.4501**, -0.4171**) and MT (r = -0.3705**, -0.3582**) at both the genotypic and phenotypic levels. These results indicate an association among the traits: lines with later heading and maturity tended to bear fewer grains per panicle (Table 9). Our results agree with those of Nguyen et al. (2023), who found that MT had a negative indirect impact on grain yield per plant via HD. In CWF, a positive direct impact on plant yield was detected through KL (0.5259**), GrWt (0.5113**), PnWt (0.6469**), PnL (0.4043**), and GrPn (0.7323) (Figure 10). Thus, direct selection based on these characters would be effective for increasing plant yield (PY). The residual effect was 0.1364, indicating that 86.36% of the variability was explained by the 14 characters studied. The remaining 13.64% reflects other contributors to yield that were not considered in the present investigation (Table 9). Our findings corroborate earlier results showing that these traits had strong positive direct effects on yield (Surekha et al. 2016; Jeke et al. 2021; Nguyen et al. 2023). The positive direct effects of various traits on grain yield reported in the present research also agree with the findings of Faysal et al. (2022). Therefore, the present study suggests that GrPn, PnWt, GrWt, and PnL, which are the main yield components of these genotypes, should be given high priority in selection in future breeding programs.
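The partitioning described above can be sketched as follows, assuming the standard formulation in which the direct effects p solve R_xx p = r_xy, where R_xx is the correlation matrix among the causal traits and r_xy holds their correlations with yield. The function name and the two-trait example are hypothetical. The residual is computed here as sqrt(1 - R^2), a common convention; the text does not state which formula was used for the reported residual effects, so the two should not be compared directly.

```python
import numpy as np


def path_analysis(r_xx, r_xy):
    """Partition correlations with yield into direct and indirect effects.

    r_xx : correlation matrix among the causal (independent) traits
    r_xy : correlations of each causal trait with the dependent trait
    """
    r_xx = np.asarray(r_xx, dtype=float)
    r_xy = np.asarray(r_xy, dtype=float)
    direct = np.linalg.solve(r_xx, r_xy)        # direct path coefficients
    # indirect[i, j] = effect of trait i on yield acting via trait j
    indirect = r_xx * direct[np.newaxis, :]
    np.fill_diagonal(indirect, 0.0)
    r_squared = float(direct @ r_xy)            # variability explained
    residual = float(np.sqrt(max(0.0, 1.0 - r_squared)))
    return direct, indirect, r_squared, residual


# Hypothetical two-trait example (not values from Tables 8-9).
r_xx = [[1.0, 0.5],
        [0.5, 1.0]]
r_xy = [0.7, 0.6]
direct, indirect, r_squared, residual = path_analysis(r_xx, r_xy)

# Each correlation is recovered as direct effect plus the sum of indirect effects.
reconstructed = direct + indirect.sum(axis=1)
print(direct, indirect, r_squared, residual)
```

The reconstruction check at the end is the defining property of the method: for every causal trait, the direct effect plus all indirect effects equals its correlation with yield.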
Dendrogram construction from agro-morphological traits data in BWF and CWF
The genetic relationship among the 100 RIL lines of each population (BWF, CWF) was assessed by constructing a dendrogram to visualize their closeness. The dendrogram was constructed from a matrix of average taxonomic distances based on Euclidean estimates, using the UPGMA method, for the 15 quantitative agro-morphological traits in both RIL populations (BWF and CWF). The 100 BWF RIL lines, together with three reference lines (the parents Badshabhog and wild rice Oryza rufipogon, and one local black control, Chakhao), were broadly grouped into three clusters, each closely associated with one of the reference lines. Cluster I comprises 40 RIL lines and is closely associated with the parental line Badshabhog; cluster II is related to the control black rice and consists of 53 lines; and cluster III, consisting of 10 RIL lines, is associated with the wild rice parental line O. rufipogon (Figure 11A). The dendrogram based on the 15 agro-morphological traits of the 100 CWF RIL lines likewise showed three clusters (Figure 11B). Cluster I consists of twenty-six RIL lines along with the parental line Chenga; cluster II comprises 59 RIL lines along with the local control black; and cluster III incorporates sixteen RIL lines close to wild rice, although O. rufipogon itself is placed in a separate clade outside cluster III, indicating high genetic dissimilarity from the RIL lines. Our present findings are consistent with the earlier studies of Ahmad et al. (2015) and Bordoloi et al. (2024).
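A minimal sketch of this type of clustering is given below, using SciPy's average-linkage (UPGMA) implementation on Euclidean distances. Standardizing each trait before computing distances, cutting the tree into a fixed number of clusters, the function name, and the small data matrix are all assumptions for illustration; the software and settings actually used for Figure 11 are not stated in this section.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def upgma_clusters(trait_matrix, n_clusters=3):
    """Cluster lines (rows) on quantitative traits (columns) by UPGMA."""
    x = np.asarray(trait_matrix, dtype=float)
    # Standardize traits so that units (cm, g, days) do not dominate distances.
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    tree = linkage(z, method="average", metric="euclidean")  # UPGMA
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return tree, labels


# Hypothetical lines scored for plant height (cm) and flag leaf length (cm).
traits = [[60, 15], [62, 16],
          [120, 30], [122, 31],
          [180, 45], [182, 46]]
tree, labels = upgma_clusters(traits, n_clusters=3)
print(labels)
```

The returned linkage matrix can be passed to `scipy.cluster.hierarchy.dendrogram` to draw the tree.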
Mahalanobis D2 test for genetic diversity analysis in BWF and CWF
Mahalanobis distances (D2) separated the 100 BWF RIL lines into seven Tocher's clusters (Supplementary Table S5). Each of the fifteen traits contributed to the overall genetic divergence in the 100 BWF RIL lines (Table 10). The contribution towards the total variation was highest for GrPn (19.00), followed by 1000 GrWt (13.90), KB (13.60), GB (10.50), GL (8.80), PH (8.50), PnWt (5.30), KL (5.00), and PY (4.30). The average intra- and inter-cluster distances for the seven clusters indicate the degree of divergence within and between the groups (Table 11). The largest inter-cluster distance in BWF lines was found between clusters IV and VII (25865.50). Based on the Mahalanobis distance (D2) matrix, the 100 CWF RIL lines were grouped into eleven Tocher's clusters, seven multi-genotypic and four mono-genotypic (Supplementary Table S6). The contribution of each of the fifteen traits to the overall genetic divergence in CWF is displayed in Table 12. The contribution towards the total variation was highest for 1000 GrWt (24.32), followed by the other traits (Table 12). Cluster XI had the highest PY (34.95 g), with maximum contributions from GrWt, GL, KB, PH, GB, PnL, GrPn, and PnWt. Clusters X (33.65 g) and VIII (32.01 g) had the next highest PY (Table 12). The average inter-cluster distances were greater than the average intra-cluster distances, suggesting that the CWF lines possess a high degree of genetic diversity (Table 13). The average intra- and inter-cluster distances for the eleven clusters indicate the degree of divergence within and between the groups (Table 13). The largest inter-cluster distance (25817.49) was found between cluster II (CW27, CW71, CW95, CW23, CW16, CW84, CW85, and CW81) and cluster IV (CW44, CW48, CW26, CW31), which therefore contain the most divergent genotypes.
According to the D2 cluster matrix, cluster VII, comprising RIL genotypes CW20 and CW34, had the largest intra-cluster distance (2942.63). Maximum heterosis would be expected from a cross between genotypes from clusters II and IV, which had the greatest genetic distance (25817.49). Genotypes with the largest genetic distance in yield-attributing parameters would result in complementation of gene effects in a hybridization program. Our present study substantiates the earlier explanation that, although natural wild rice accessions are regarded as poor agronomic performers, the recovery of cultivars widely adaptable to diverse challenging environments is higher when wild relatives of rice are used in the crossing program (Sanchez et al. 2013). The present results are consistent with earlier analyses showing that high inter-cluster distances exist among breeding lines and cultivars (Faysal et al. 2022; Mondal et al. 2024).
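To make the distance measure concrete, the sketch below computes the Mahalanobis D2 statistic between two trait vectors and averages it over all pairs of genotypes drawn from two clusters, which is how average inter-cluster distances are commonly tabulated. The covariance matrix, the genotype values, and the function names are hypothetical; in practice the covariance is usually the error variance-covariance matrix from the multivariate analysis, which is not given in this section.

```python
import numpy as np


def mahalanobis_d2(vec_a, vec_b, cov):
    """D2 = (a - b)' S^-1 (a - b) for two trait vectors and covariance S."""
    diff = np.asarray(vec_a, dtype=float) - np.asarray(vec_b, dtype=float)
    return float(diff @ np.linalg.solve(np.asarray(cov, dtype=float), diff))


def average_inter_cluster_d2(group_a, group_b, cov):
    """Mean D2 over all pairs of genotypes taken from two clusters."""
    values = [mahalanobis_d2(a, b, cov) for a in group_a for b in group_b]
    return sum(values) / len(values)


# Hypothetical covariance for two traits and two small clusters.
cov = [[4.0, 0.0],
       [0.0, 1.0]]
cluster_a = [[0.0, 0.0], [2.0, 0.0]]
cluster_b = [[4.0, 1.0], [6.0, 1.0]]

d2_single = mahalanobis_d2([2.0, 1.0], [0.0, 0.0], cov)
d2_between = average_inter_cluster_d2(cluster_a, cluster_b, cov)
print(d2_single, d2_between)
```

Because each trait difference is scaled by its variance, traits measured on large numeric scales do not dominate the distance, unlike with raw Euclidean distance.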
Principal component analysis (PCA)
Principal component analysis (PCA) is a powerful multivariate statistical tool for identifying the minimum number of components that explain the maximum share of the total variability and for ranking genotypes based on PC scores. Plant breeders often measure many variables, some of which may not have sufficient discriminatory power for germplasm evaluation, characterization, and management. In such cases, PCA may reveal the patterns and eliminate redundancy in the data sets. PCA was performed on the phenotypic diversity of 15 morphological traits in the 100 BWF RIL and 100 CWF RIL lines (Tables 14-15). Principal components (PCs) were considered important when the eigen value was greater than one and the PC explained at least 3-5% of the variation in the data, as visualized in the scree plot (Figure 12). Significant variables were indicated by high positive loading values (Tables 14-15, Figure 12). Out of fifteen, only four PCs exhibited an eigen value greater than one and together explained 73.74% of the cumulative variability among the traits studied, indicating significant variability in the BWF RIL lines (Table 14). The first four PCs explained 35.52, 17.43, 13.09 and 7.68% of the variability among the BWF RIL lines, respectively (Table 14; Supplementary Tables S7 & S9). The traits panicle weight (0.350), 1000 grain weight (0.346), grain length (0.335), grain breadth (0.324), kernel length (0.332) and single plant yield (0.356) contributed positively to PC1. In contrast, heading date (-0.1384) and maturity time in days (0.1242) contributed negatively to PC1. PC1 and PC2 in the biplot diagram explained the dispersion and nature of diversity for both variables and breeding lines (BWF and CWF) in our present study (Figure 9).
The vectors in the first quadrant, viz., plant height and flag leaf length, strongly correlated among themselves and loaded on the PC2 (Table 14). The vectors in the second quadrant- productive tillers, grain per panicle, and single plant yield, were highly correlated variables loaded on PC4. Similarly, the vectors in the fourth quadrant, heading date (HD) and maturity time in days (MT) were highly correlated variables and loaded on PC2. The traits heading date and maturity time were negatively correlated with single plant yield (PY). The RIL lines (BW9, BW19, BW81, BW44, BW86, BW88) projected on the vectors of grain per panicle, grain weight, GL, KL and single plant yield (PY) were close to them, demonstrating a positive interaction (Figure 9). Comparing the 100 RIL lines of BWF based on PCA biplot analysis, the RIL lines BW18, BW23, BW24, BW25, BW44, BW52, BW77, BW83, BW88, BW90, and BW99 were superior for panicle weight, 1000 grain weight, single plant yield, grain length, grain breadth, and kernel length. Hence, these results of PCA will be of greater benefit to the breeder to identify parents and the selection of characters for future hybridization programs for varietal improvement.
The PC analysis of yield and yield-contributing traits of 100 CWF RIL lines generated six PCs, and the first six components together explained more than 71.90 % of the total variation in CWF RIL lines (Table 15; Figure 9). PC1, PC2, PC3, PC4, PC5 and PC6 accounted for 20.026, 14.267, 11.569, 9.680, 8.394, and 7.961%, respectively, of the total variability in CWF RIL lines. In PC1, the traits GrPn (0.384) and PH (0.390) contributed positively and accounted for 20.026 % of the variation as a whole (Figure 9). The vectors in the first quadrant, viz., FLL, GrWt, GL, GB, KB, strongly correlated among themselves and loaded on the PC2 (Table 15). The vectors in the second quadrant, PY, GrPn, PnWt, and PnL were highly correlated variables loaded on PC5. Similarly, the vectors in the fourth quadrant, PH, Till, and KL, were highly correlated variables and loaded on PC5. Comparing the 100 CWF RIL lines based on PCA biplot analysis, the RIL lines CW1, CW11, CW16, CW26, CW29, CW32, CW36, CW40, CW41, CW44, CW57, CW79, CW81, CW84, CW85, CW93, CW96, and CW98 were superior for PnWt, PnL, GrWt, PY, GL, GB, and KL (Table 15; Figure 9; Supplementary Table S8 & S10). The genetic diversity of breeding lines (BWF and CWF) was clarified, and component traits contributing to variability were broken down through the combination of principal component analysis; this could provide the framework for a well-run hybridization program (Supplementary Tables S7-S10). Previous studies also found similar results that the first five principal components (PCs) exhibited eigen values of more than 1.00 and explained 85.87% variability (Ahmad et al. 2015; Bordoloi et al. 2024).
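The eigen value criterion used above can be sketched as follows, assuming the PCA was run on the correlation matrix of the traits, in which case the eigen values sum to the number of traits and each PC's share of variability is its eigen value divided by that number. The function name and the random data matrix are hypothetical; only the four BWF percentages in the last step are taken from the text.

```python
import numpy as np


def pca_from_correlation(trait_matrix):
    """PCA on the trait correlation matrix (lines in rows, traits in columns)."""
    x = np.asarray(trait_matrix, dtype=float)
    corr = np.corrcoef(x, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)   # ascending order
    order = np.argsort(eigenvalues)[::-1]              # sort descending
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]              # columns are loadings
    explained = eigenvalues / eigenvalues.sum() * 100
    retained = int(np.sum(eigenvalues > 1.0))          # eigen value > 1 rule
    return eigenvalues, eigenvectors, explained, retained


# Hypothetical data: 100 lines scored for 5 traits (random, illustration only).
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 5))
eigenvalues, loadings, explained, retained = pca_from_correlation(data)
print(np.round(explained, 2), np.round(np.cumsum(explained), 2), retained)

# Under the correlation-matrix assumption, the percentages reported for the
# first four BWF PCs imply these eigen values for 15 traits.
reported_bwf = [35.52, 17.43, 13.09, 7.68]
implied = [p / 100 * 15 for p in reported_bwf]
print(implied)
```

Under this assumption, all four implied values exceed one, in line with the statement that four PCs were retained for BWF.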
Selection of newly developed innovative aromatic black rice lines with purple leaf
The most novel genetic change that we observed in the progeny populations (BWF and CWF) was the appearance of rice lines with black pericarp. Wide phenotypic variation was detected in grain colour, which ranged from white, light brown, reddish brown, brown, deep brown, reddish, red, blackish red, greenish, blackish brown, and black to deep black (Figures 3-7; Supplementary Tables S1-S2), broadly showing 9:6:1 and 9:3:4 ratios in some of the generations. This supports the view that grain colour is polygenically inherited and controlled by many genes or quantitative trait loci (QTL), possibly involving as yet unidentified genes (Oikawa et al. 2015; Ham et al. 2015; Devi et al. 2020; Pham et al. 2024). In the present study, we observed many breeding lines in the CWF cross with purple leaf colouration together with black pericarp and black husk colour (Figure 2). The purple leaf trait has been inherited from the F3 generation onward, suggesting that it was newly acquired by the breeding lines, since the parental lines were devoid of this trait. The evolutionary history of anthocyanin biosynthesis genes reveals that the purple-leaf trait was negatively selected during the domestication of rice. The reason for this negative selection (mutant alleles of OsC1 and the Rb gene and normal Rd) may be that anthocyanin in rice leaves reduces the efficiency of photosynthesis, in turn reducing yield (Xia et al. 2021). However, anthocyanins in various plant tissues are crucial for diverse biological functions, including protection from UV damage, hormonal regulation, and defense against pathogens, insects, herbivores and other biotic and abiotic stresses (Chalker-Scott 1999; Steyn et al. 2002; Ithal et al. 2004; Landi et al. 2015; Zaidi et al. 2019).
Physicochemical properties, Biochemical tests and Nutritional Facts of Breeding Lines
The physicochemical properties and sensory-based aroma test results of the black rice breeding lines are summarized in Figure 8 and Supplementary Tables S1-S2. Amylose content ranged from 11.31% in CW97 (black non-scented) to 19.13% in BW57 (red non-scented), with 14.46% in BW23 (black scented) and 13.72% in CW16 (black scented) (Figure 8; Table 16). Higher amylose content generally results in a slower digestion rate, leading to a lower glycemic index compared to rice varieties with lower amylose content. TPC ranged from 154.859 mg GAE/100 g on a dry weight basis in BW99 (white, non-scented) to 520.016 mg GAE/100 g in CW40 (red, scented), which was quite high compared to some of the parental lines. The present results are consistent with the previous reports of Bhuvaneswari et al. (2020) and Idrishi et al. (2024). The promising black rice breeding lines also contained a comparatively high amount of anthocyanin pigment, 261 mg/100 g in BW23 (black-scented), in comparison to the control black (259 mg/100 g) and other parental lines. Our present investigation is consistent with the findings of other researchers (Gogoi et al. 2024; Bhubaneswari et al. 2024; Lap et al. 2024). The DPPH (2,2-diphenyl-1-picrylhydrazyl) free radical scavenging activity in the breeding lines ranged from 70.88% to 78.43% in pigmented rice and from 11.89% to 19.06% in non-pigmented rice genotypes (Table 16). Our findings on DPPH antioxidant activity support the earlier view that black rice, including our breeding lines (BWF, CWF), has higher levels of total phenols, flavonoids, and anthocyanins than white rice, indicating greater antioxidant activity along with high grain quality (Table 16; Figure 8) (Roy and Reddy 2017; Bhuvaneswari et al. 2020; Zhu et al. 2024; Gogoi et al. 2024).
The nutritional composition of the black rice breeding lines, including the parental lines, is summarized in Table 17, which indicates that our breeding lines are nutritionally enriched, containing high amounts of quality protein, fiber, various minerals (magnesium, manganese, phosphorus, calcium, sodium, zinc, and iron), PUFAs (polyunsaturated fatty acids), and MUFAs (monounsaturated fatty acids). Both PUFAs and MUFAs play crucial roles in human health and have been associated with a decreased risk of cardiovascular diseases (CVDs), and our present result is consistent with the earlier report (Bordoloi et al. 2024).
Moreover, HR-LCMS-QTOF analysis detected anthocyanin compounds, mainly petunidin 3-O-glucoside, among the 41 metabolites identified in our black rice breeding lines (BW23 and CW16) (Table 18; Figure 13C-D). The most common metabolites identified were catechin, oryzanol, gallic acid, caffeic acid, quinic acid, quercetin, 3,5-dihydroxybenzoic acid, rutin, luteolin 4'-O-glucoside, heptadecatrienoic acid, PAB/4-Aminobenzoic acid, kaempferol 7-O-glucoside, peganine, maritimetin, mitoxantrone, methyl 2-(10-heptadecenyl)-6-hydroxybenzoate, zinnimidine, azafrin, tubulosine, and other metabolites having medicinal properties (Table 18). Similar metabolite patterns were reported by previous studies in pigmented rice varieties (Bhuvaneswari et al. 2020; Zhu et al. 2024). Total amino acid content in the rice grain of our breeding lines was estimated quantitatively by the HR-LCMS-QTOF method and ranged from 8.76 g/100 g (BW23) to 8.81 g/100 g (CW16) on a dry weight basis, with the following amino acids detected: aspartic acid, alanine, arginine, cysteine, glutamate, glycine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, proline (hydroxyproline), serine, threonine, tyrosine, glutamine, and valine (Table 18 and Figure 13A-B). In the present study, glutamic acid was the most abundant amino acid (1650 mg/100 g, detected in BW23) and methionine the least abundant (70 mg/100 g, detected in CW16). The amino acid profile thus showed that the rice grain is high in glutamic and aspartic acids, while methionine is the limiting amino acid, similar to other analyses (Carcea 2021). Protein content was 8.76 g/100 g in BW23 and 8.81 g/100 g in CW16, which is quite high relative to the parental lines (Table 17). Comparatively higher amounts of total protein have been observed in aromatic rice than in non-aromatic rice (Liyanaarachchi et al. 2021; Tyagi et al. 2022).
Proteins containing amino acids such as lysine, leucine, isoleucine, and threonine are considered high-quality proteins (Huang et al. 2019; Liyanaarachchi et al. 2021; Tyagi et al. 2022; Jayaprakash et al. 2022). Our results showed that both black rice breeding lines (BW23 and CW16) are nutritionally enriched due to the presence in the endosperm of high-quality proteins containing these amino acids (Table 19) and of other important nutraceuticals, i.e., oryzanol, anthocyanin, catechin, iron, and zinc (Tables 17-19) (Ahmad et al. 2015; Huang et al. 2019; Liyanaarachchi et al. 2021; Tyagi et al. 2022; Jayaprakash et al. 2022). Therefore, our breeding lines can be considered as Super Food or Panacea. Glutelin is considered a high-quality protein because of its richness in lysine, leucine, isoleucine, and threonine; it is present in PB-II (protein body II) in the rice endosperm, which was visualized in the ultrastructural anatomy of the caryopsis using SEM (Figure 14). The grain protein prolamin is present in PB-I, located in the endoplasmic reticulum (ER) (Jayaprakash et al. 2022).
Genetic characterization of Kala4 gene through PCR amplification in BWF and CWF
The Kala4 gene (bHLH TF) is chiefly responsible for black pericarp development in cultivated rice; its promoter region was rearranged through a LINE1 insertional mutation that includes an 11.02 kb genomic segment inserted within the LINE1 transposon (Figure 15). The 11.02 kb genomic segment insertion at the Kala4 promoter has been considered the key regulatory rearrangement responsible for black pericarp development in Asian cultivated rice (an acquired neo-functionalization trait) (Oikawa et al. 2015; Kim et al. 2021). That report proposed that neo-functionalization of the Kala4 allele through LINE1 insertional mutation occurred first in the genetic background of tropical japonica and then spread to indica and other subspecies of rice (Oikawa et al. 2015). The Kala4 gene is approximately 25.6 kb in size and is composed of 8 exons and 7 introns. In the present study, PCR-amplified products were detected in all the parental and breeding lines when LINE1 insertion-specific primer set 1 was used in the reaction mixture (Table 1), which signified that LINE1 had been inserted into the intron 2 position of the Kala4 gene; otherwise, no PCR product would be possible with primer set 1 (Figure 15A). The 11.02 kb genomic segment insertion was checked with primer set 2, whose PCR product spans junction 1 of either the type I or the type II insertional pattern (Figure 15B). A PCR-amplified product was observed in all six lines with primer set 3, confirming that the intron 2 construct of the Kala4 gene is inserted in the correct position in the BWF and CWF lines (Figure 15C). In our present investigation, we assessed the presence of the Kala4 ORF in the breeding lines based on its chromosomal location (Os4g0557500), with the help of primer set 4 (Table 1).
A PCR product about 700 bp long was detected in all the breeding lines, along with co-amplified bands of about 200-300 bp, indicating that the Kala4 ORF at chromosomal location Os4g0557500 is structurally intact (Figure 15D). PCR profiling based on the fragrance gene BAD2 (Table 1) also indicated that our black rice breeding lines are scented: BWF is heterozygous, whereas CWF is homozygous for the double recessive allele (Figure 15E). In summary, the PCR results obtained with the primers used (Table 1) confirm the presence of the LINE1 transposon and the 11.02 kb genomic segment insertion near the Kala4 promoter. They partially support the earlier view regarding Kala4 rearrangement in black rice varieties (Oikawa et al. 2015), except for the proposed spread of the black trait from tropical japonica to indica. The present study therefore provides direct evidence for the earlier proposed concept of Kala4 genetic rearrangement in the origin of black rice, which was based entirely on genome analysis and not on any breeding experiment (Oikawa et al. 2015). Our results indicate that acquired neo-functionalization of the Kala4 allele (a gain-of-function mutation) occurred in the black rice RIL lines (BWF, CWF). Without such an insertional mutation (LINE1 and the 11.02 kb insertion), it would not have been possible for black pericarp to develop in our breeding lines (BWF and CWF). The LINE1 rearrangement consequently induces the ectopic expression of genes involved in anthocyanin production, which in turn gives rise to black rice.
The subspecies indica/aus originated from a subgroup of O. rufipogon of type OrI and japonica from a subgroup of type OrIII (Huang et al. 2012B; Civan and Brown 2018; Wang et al. 2018; Zhang et al. 2021). Reproductive barriers between japonica and indica rice suggest that they resulted from independent domestication processes or from a single domestication with multiple origins (Choi et al. 2017). This leads to a pertinent question: how could black rice of the indica type have originated from black tropical japonica, whose progenitor is type OrIII? Since japonica originates from a distinct wild rice population (OrIII) and indica from another subgroup (OrI), we do not support the earlier view that neo-functionalization of the Kala4 allele through LINE1 insertional mutation occurred first in the genetic background of tropical japonica and then spread to indica and other subspecies of rice (Oikawa et al. 2015). Instead, we infer that black indica rice must have originated from the wild rice population of type OrI. Based on our classical breeding and genetic evidence, we support the view of a multigeographical independent origin of cultivated rice, in which the indica and japonica subspecies originated from distinct genetic resources of O. rufipogon (OrI and OrIII). On this basis, we propose a novel hypothesis that black rice, primarily of the indica subspecies, originated independently on the Indian subcontinent during the domestication process from the wild rice of India through natural outcrossing, gene flow, and artificial selection. This new information shows how breeding knowledge can contribute to resolving open questions about the evolutionary origin of cultivated rice, the history of rice domestication (single or multiple independent origins), and, more specifically, the origin of black pericarp pigmentation in cultivated rice.
In addition, the high yield potential and good nutritional quality of our RIL lines could help to strengthen global food and nutritional security by 2050.