Genetic variation of A. ginnala populations
A total of 170 bands were amplified with the sequence-related amplified polymorphism (SRAP) marker, and 100% of the bands were polymorphic. For the simple sequence repeats (SSR) markers, there were 177 polymorphic loci in 179 putative genetic loci, with the percentage of polymorphic bands (PPB) being 98.99%. According to the SRAP markers, the HHG population had the highest genetic diversity, followed by QLY and BYS. Based on the SSR markers, the highest genetic diversity was present in the PQG population, followed by XTS and BYS (Table 1).
At the species level, A. ginnala exhibited a high level of genetic diversity using the two types of markers (I SSR = 0.561, I SRAP = 0.5044; He SSR = 0.384, He SRAP = 0.3366), which was higher than the mean values at the population level (I SSR = 0.086, I SRAP = 0.057; He SSR = 0.056, He SRAP = 0.038) (Table 1).
Through the STURCTURE results, we further analyzed the level of genetic diversity of the different groups. We combined two types of markers, Group I presented a relative genetic diversity, where the Shannon’s Information index I was 0.087SSR and 0.059SRAP, and the expected heterozygosity He was 0.056SSR and 0.039SRAP, respectively (Table 1), following Group III and Group II (Table 1).
To further elucidate the relationships between the populations, a cluster analysis was implemented. The dendrogram constructed from the SRAP markers divided the 19 populations into three main clusters (Fig. 1A). Similar results were obtained with the SSR markers (Fig. 1B). Within these three clusters, eight populations (BDG, HJG, YDS, PQG, QLY, HHG, XTS, and JMLC) from Shanxi Province were gathered into a cluster. Three populations (WCLC, FZL, and TTZ) from Anhui Province formed a group, whereas the eight other populations (SFS, MLG, TBD, TBS, BYSM LJL, LJS, and LTG) from Beijing, Inner Mongolia, and Henan Provinces were clustered together.
To reveal the patterns of genetic distribution of this species, we performed a population structure analysis using STURCTURE. The STRUCTURE results suggested that the best grouping number (K) was based on the ΔK, where all of the populations were divided into three groups, which was consistent with the cluster analysis (Fig. 2). Additionally, there were some admixed individuals among populations, which indicated that gene exchange occurred between them.
With SRAP markers, AMOVA analysis revealed that 88% (ΦST SRAP = 0.88) of the total genetic variation was found between the populations (P = 0.01), whereas the remaining 12% of the total variation occurred within the populations (Table 2). According to the SSR markers, the genetic differentiation between the populations was 84% (ΦST SSR = 0.84, P = 0.01), indicating that 16% of the total variance occurred within the populations. The two markers indicated a high level of interpopulation genetic differentiation and low level of intrapopulation genetic differentiation in A. ginnala (Table 2). Among the different groups, the variation was ~ 40% (ΦST SSR = 0.40, ΦST SRAP = 0.42, P = 0.01) of the total variation, while there was ~ 60% variation within the groups (Table 2).
In the DAPC analysis, which employed one discriminant function to distinguish five principal components (PCs), three groups were present on the two main axes and a scatter plot of the discriminant analysis (Fig. 3). Every cluster was clearly differentiated using the two main DA eigenvalues and were represented according the provinces of origin. The DAPC results were similar to the STRUCTURE results; however, it was possible to assign admixed individuals to multiple clusters [27].
Predicted spatial distribution areas
Within the 19 bioclimative variables, seven variables were selected. They were bio1 (Annual Mean Temperature), bio2 (Mean Diurnal Range), bio3 (Isothermality), bio4 (Temperature Seasonality), bio13 (Precipitation of Wettest Month), bio15 (Precipitation Seasonality), and bio18 (Precipitation of Warmest Quarter). Among these seven bioclimatic variables, the first four (bio1, bio2, bio3, and bio4) were associated with the temperature dimension, while the last three (bio13, bio15, and bio18) were associated with the precipitation dimension. The explanatory percentage of the first two PCs of the temperature dimension was estimated to be higher than that of the precipitation dimension, which suggested that the temperature might be more relevant in explaining the geographic distribution.
Ecological niche modeling (ENM) was further employed to predict the suitable distributions of A. ginnala,as well as to examine the key climatic variables in the prediction. Distribution models showed high discrimination performance (Fig. 4). The cross-validation area under the curve (AUC) value for all models was 0.989, indicating that 98.9% of the records were correctly predicted.
The ranges predicted by the seven bioclimatic variables (bio1, bio2, bio3, bio4, bio13, bio15, and bio18) were roughly consistent with the currently known distributions (Fig. 4A). The large areas of Shanxi and Shaanxi Provinces were high suitable regions for A. ginnala. Some areas of Jiangsu, Anhui, and Hubei were also the most suitable habitat for this species. Aside from these areas, there were small regions Northeast China that were suitable for the growth of A. ginnala.
In LIG, the potential distribution range expanded, compared with the current distribution. The most suitable areas (> 0.50) in Northeast China disappeared (Fig. 4B). The additional most suitable areas still existed and their extent was broader than currently. Compared with the LIG, the most suitable areas (> 0.50) increased significantly during the LGM, particularly in Shanxi and Shaanxi Provinces (Fig. 4 C and D). There were large suitable areas in the southern portions of Shanxi and Shaanxi Provinces. Moreover, some of the most suitable areas appeared in the Ningxia Autonomous Region. In Northeastern China, the most suitable areas were also expanded compared with that of the present day and LIG.
Influence of environmental factors
To assess whether geographic or environmental differences drove genetic divergence between populations, isolation-by-distance (IBD) and isolation-by-environment (IBE) tests were conducted using the Mantel test. The Spearman correlation showed no significant association between geographic and environmental distances (ρ SRAP = –0.1532, P = 0.284; ρ SSR = –0.0133, P = 0.474), or between genetic and geographic distances (ρ SRAP = 0.1943, P = 0.051; ρ SSR = 0.1549, P = 0.125). However, a significant relationship between genetic and environmental distances was found (ρ SRAP = 0.4068, P = 0.001; ρ SSR = 0.2647, P = 0.02). The partial Mantel test did not detect a significant correlation between genetic and geographic distance when conditioning on the environmental effect (ρ SRAP = 0.2187, P = 0.307; ρ SSR = 0.1643, P = 0.101). Nevertheless, there was a significant association between genetic and environmental distances, when the geographic distance was considered as a covariate in the partial Mantel test (ρ SRAP = 0.2187, P = 0.037; ρ SSR = 0.2701, P = 0.014). In terms of the joint effects of geographic and environmental distances, the geographic distance did not have an impact, while the environmental distance did impact the genetic distance significantly (r SRAP2 = 0.1686, β geo = 0.4709, P = 0.052, β env = 0. 3865, P = 0.001; r SSR2 = 0.1344, β geo = 0.2847, P = 0.215, β env = 0. 1728, P = 0.003). These results revealed that the genetic variation, or gene flow, of the populations were linearly correlated with climatic differentiation, but not geographic differentiation.
When conditioned on the geographic distribution, we found that more 40% of the variation (45.71% by SRAP data, 40.42% by SSR data) could be explained by climatic variables (Table 3). The ANOVA further indicated that seven predictors (bio1, bio2, bio3, bio4, bio13, bio15, and bio18) significantly explained the genetic components of the population (P <0.0001), and that bio2 and bio3 had the highest explanatory proportions for predicting the genetic variation of the population. Three bioclimatic variables (bio1, bio2, and bio3) of the temperature dimension had significant F statistics (adjusted R2 = 0.0259, 0.0400, and 0.0386, P < 0.05, Table 3) through SRAP data. According to the SSR data, two temperature variables (bio1 and bio3) had significant F statistics (adjusted R2 = 0.0067 and 0.0065, P < 0.05, Table 4). Only three variables (bio3, bio4, and bio13) were significantly correlated with the ordination axis1, while the other four variables (bio1, bio2, bio15, and bio18) were significantly correlated with axis 2 of dbRDA (Table 3).