Population genetic structure and nucleotide diversity
The PCA, admixture, and neighbor-joining were performed to visualize a comprehensive investigation into the genetic separation of four commercial Asian rice subspecies. More than 37% of the genetic variation was explained by the top three PCs (Fig. 1). PC1 separates the Indica/Aus and two japonica varietal groups. The Indica and Aus clustered in two separate clusters at PC2. Finally, the PC3 classified the Temperatejaponica and Tropical japonica subpopulations into two separate groups. Similar results were achieved by admixture analysis (Fig. 2). The admixture analysis at K=2 and K=3 showed both Temperatejaponica and Tropical japonica in one cluster; and also, at K=2, it showed Indica and Aus together in one group each of which was grouped into a cluster by admixture at K=4. Similar to PC3, the admixture at K=4 separated Temperatejaponica and Tropical japonica subpopulations. Same interpretation concluded from the neighbor-joining analysis (Fig. 3).
Study of linkage disequilibrium patterns (mean of r2) in populations can be informative for having an overview of population demography background, due to the effective role of linkage disequilibrium patterns in demographic forces and evolutionary patterns [29]. Our linkage disequilibrium results illustrated a rapid drop at approximately 5 Kb in all groups (Fig. 4). The means of r2 at 300 Kb for Indica, Aus, Temperatejaponica, and Tropical japonica groups were 0.04, 0.05, 0.10 and 0.07, respectively.
In this study, values (in 10Kb windows) were calculated to evaluate genomic diversities in studied subpopulations (Fig. 5). The values were ranged from 5.35E-7 to 0.0015 across all subpopulations. The mean of values demonstrated higher nucleotide diversities in Aus (0.16E-3) and Indica (0.17E-3) in comparison with Temperate japonica (7.97E-5) and Tropical japonica (1.09E-4) groups.
Selective signals and gene ontology
In this study, based on windows located in the top 1% of Z (Fst) values with Tajima’s D values < 0, six scenarios were considered to detect genomic regions underlying selection pressures including: Tropical japonica vs. Temperate japonica, Temperate japonica vs. Indica, Tropical japonica vs. Indica, Indica vs. Aus, Tropical japonica vs. Aus and Aus vs. Temperate japonica.
Tropical japonica vs. Temperate japonica
A total number of 66 windows were identified in the top 1% of Z (Fst) values (Fig. 6) with Tajima’s D values < 0, in the Tropical japonica vs. Temperate japonica scenario (Supplementary data 2). These windows contain 145 genes related to 40 biological processes (Supplementary data 2 and Supplementary information, Table S1); however, these biological processes were not significantly enriched. Additionally, we detected three important candidate genes including: Os05g0405000, Os05g0402700 and Os03g0318500 related to different economical traits, such as temperature response [30], response to stresses [31], starch metabolism, structure [32], root growth, and development. The Glucose-6-phosphate1-dehydrogenase gene is the ortholog of Os03g0318500; the important role of this gene was shown in response to stresses, such as metal toxicity [33], osmotic stress [34] and pathogenesis [35]. The ortholog of Os05g0405000 is PPDK1, which in rice endosperm can play a key role in starch metabolism and structure by regulating starch synthesis-related enzymes expression [32]. The temperature tolerance has been one of the important traits in rice breeding strategies [36, 37]. The Os05g0405000 gene is also related to temperature response. A gene expression study revealed that high temperature can decrease the PPDK enzyme activity [30]. The ortholog of Os05g0402700 gene is Fructose-bisphosphate aldolase 1; Konishi et al (2005) demonstrated the pivotal role of Fructose-bisphosphate aldolase enzyme activity in root growth of rice [38].
Temperate japonica vs. Indica
In Temperate japonica vs. Indica scenario, a total number of 104 windows including 274 genes related to 87 biological processes were screened in the top 1% of Z(Fst) values (Fig. 6) with Tajima’s D values < 0 (Supplementary data 3 and Supplementary information, Table S2). The detected biological processes were not significant similar to Tropical japonica vs. Temperate japonica scenario. Additionally, there were not overlapped genomic regions underlying selection pressures among Tropical japonica vs. Temperate japonica and Temperate japonica vs. Indica scenarios (Fig. 7). In this scenario, we screened some candidate genes such as, Os01g0894000 (Calcineurin-related phosphoesterase-like), and Os10g0430200 (CAD1) related to drought tolerance [39], and lignification development [40] in rice, respectively. A former study revealed that lignification development can be associated with intensification of drought tolerance in rice [41]. The drought tolerance, as a complex trait, plays a key role in saving water and sustainable agriculture development, which have been the objectives of Indica breeding strategies [42], as they are broadly cultivated in tropical regions with warm climate [43].
Tropical japonica vs. Indica
A genome scan for signatures of selection between Tropical japonica and Indica subpopulations showed a total of 87 windows including 213 genes related to 64 biological processes in the top 1% of Z(Fst) values (Fig. 6) with a negative Tajima’s D value (Supplementary data 4 and Supplementary information, Table S3). This scenario had 61 and one overlapped windows with Temperate japonica vs. Indica and Tropical japonica vs. Temperate japonica scenarios, respectively (Fig. 7). The Os08g0559400 (OsCyp3), Os02g0462900, Os02g0332200 (T-complex protein 1 subunit delta), Os02g0782500 (18.6 kDa class III heat shock protein), and Os05g0533900 genes involve in protein folding (GO:0006457) as a significant biological process (q-value = 0.027) in Tropical japonica vs. Indica scenario. This biological process is related to heat shock proteins [44]. The heat shock proteins have a fundamental role in response to abiotic stressful conditions through helping to refold proteins which were damaged by heat stresses [44, 45]. The higher thermo- tolerability of Indica compared to the japonica subspecies was revealed in a comparative former study [46].
Indica vs. Aus
The number of signatures of selection windows in Indica vs. Aus scenario was 17, which included 55 genes related to 27 biological processes in the top 1% of Z(Fst) values (Fig. 6) with a negative Tajima’s D value (Supplementary data 5 and Supplementary information, Table S4). Among 17 detected windows underlying selection pressures, two overlapped windows (chromosome 4: 34.47- 34.57 Mb and chromosome 4: 34.50- 34.60 Mb) were observed in both Indica vs. Aus and Temperate japonica vs. Indica scenarios (Fig. 7). These genomic regions include seven genes (Os04g0676800, Os04g0675700, Os04g0676000, Os04g0676700, Os04g0676800, Os04g0677100, and Os04g0677200). There is a lack of information about biological processes related to these genes except for Os04g0676700, and Os04g0677100. The Os04g0676700 gene participates in response to salicylic acid (GO:0009751). Salicylic acid is a well-known molecule which involves in protection under biotic and abiotic stress responses [47, 48]. The Os04g0677100 involves in proteolysis (GO:0006508) and protein catabolic process (GO:0030163). These GOs can play important roles, such as senescing leaves [49] and releasing trans-membrane proteins [50].
Tropical japonica vs. Aus
A total number of 91 windows containing 254 genes were detected in the top 1% of Z(Fst) values (Fig. 6) with Tajima’s D values < 0 (Supplementary data 6). Our GO analyses showed that these genes are related to 77 biological processes (Supplementary information, Table S5). Among genes underlying selection pressures in this scenario, we detected two candidate genes including Os01g0266600 (Peroxiredoxin-2F or PRXIIF), Os03g0710500 (Heat shock 70 kDa protein BiP2 or BiP2) which are related to regulating root growth [51], folding and secretion of pathogenesis-related (PR) proteins [52] and stress response [53]. In the presence of salicylhydroxamic acid and cadmium, the PRXIIF gene can contribute to regulating root growth of rice [51]. In Arabidopsis, Wang et al (2005) showed the participation of BiP2 gene in folding and secretion of PR proteins during systemic acquired resistance; the PR proteins are classified as a conserved protein family which plays multiple roles in immune responses [54], plant development, and biotic and abiotic stresses [55]. Here, we also detected 6 genes related to phosphorylation (GO:0016310) including Os03g0755000, Os03g0756200, Os03g0686800, Os02g0241600, Os02g0769700, and Os05g0521300. Although this GO term was not significant in our gene ontology analysis, former studies revealed the effective role of phosphorylation in immunity of plants [56, 57]. For example, Matsui et al (2015) indicated that the phosphorylation of pto-interacting protein 1a can be important for activating rice immunity.
Aus vs. Temperate japonica
In Aus vs. Temperate japonica scenario, we screened a total number of 80 windows including 234 genes in the top 1% of Z(Fst) values (Fig. 6) with Tajima’s D values < 0 (Supplementary data 7 and Supplementary information, Table S6). These genes were related to 61 biological functions. However, the biological functions in this scenario were not significant in our gene ontology analysis. Additionally, our findings showed three genes including Os10g0430200, Os08g0463900, Os03g0810800 that are related to lignin biosynthetic process (GO:0009809), thylakoid membrane organization (GO:0010027) and response to water deprivation (GO:0009414), respectively. Lignin as a natural phenolic polymer is an important component of the cell wall structure in plants. The biosynthesis of lignin can widely participate in different traits such as responses to biotic and abiotic stresses through lignin accumulation which results in the accumulation of reactive oxygen species in plants, under such stresses [58]. The thylakoid membrane has an effective role in photosynthesis and electron transportation [59]. The chloroplast thylakoid membranes are dispersed in the stroma, where photochemical reactions take place in green algae and higher plants. Additionally, rice thylakoid membrane complexes can play a pivotal role in growth and crop production [60].
Overrepresented genes and GOs among the selection signature scenarios
In this study, we could not identify any overrepresented genomic region among all adopted selection signature scenarios. Conversely, gene ontology analyses screened four biological processes including proteolysis (GO:0006508), phosphorylation (GO:0016310), protein catabolic process (GO:0030163), and transmembrane transport (GO:0055085) which are underlying selection pressures (Fig. 7). Although, several involved candidate genes were intriguingly different from aforementioned biological processes among scenarios (Supplementary Table S2), which can be due to the different breeding programs or environmental conditions.