3.1 Accuracy analysis of prediction model
When conducting zoning predictions by Maxent model, AUC value (ranging from 0 to 1) could reflect the accuracy of model predictions. A larger AUC value indicated higher precision of model. In this study, AUC value was 0.984 (Fig. 1), indicating that predicted results under different climatic scenarios were highly reliable.
3.2 Dominant habitat factors analysis and current zoning prediction
The relative contributions of 33 habitat factors were analyzed using the Maxent model and jackknife method. As shown in Fig. 2A, elev, bio_4, bio_13, bio_11, and S_clay exhibited relatively high contribution rates, with a cumulative contribution rate reaching 80.5%. Additionally, the permutation importance scores for elev, bio_4, bio_13, and bio_11 were also high (Fig. 2B), indicating that these climatic factors were potential key factors influencing the distribution of wild S. lanceolata, while S_clay may play a more significant role as a soil environmental factor. Single-factor habitat suitability analysis was conducted for elev, bio_4, bio_13, bio_11, and S_clay to select optimal habitat range suitable for wild S. lanceolata distribution. The results revealed that elev, bio_4, bio_13, and bio_11 exhibited relatively regular normal distributions (Fig. 2C). Specifically, the medium suitability and above zones for Elev were 1050.60 ~ 1633.13m, the high suitability zones were 1258.65 ~ 1397.34m, and the optimum value was 1334.93m; for bio_4, the medium suitability and above zones were 1080.44 ~ 1239.28, the high suitability zones were 1121.03 ~ 1186.33, and the optimum value was 1152.80; for bio_13, the medium suitability and above zones were 46.13 ~ 105.40mm, the high suitability zones were 69.61ཞ96.45mm, and the optimum value was 87.51mm; for bio_11, the medium suitability and above zones were − 10.67~-5.16°C, the high suitability zones were − 8.38~-6.97°C, and the optimum value was − 7.68°C. S_clay, as the most influential soil factor, exhibited a medium suitability and above zones of 2.87%~7.80%, a high suitability zones of 2.87%~6.00%, and an optimum value of 4.05%.
Under the current climatic conditions, the zoning prediction results were depicted in Fig. 2D. The high suitability zones of wild S. lanceolata were approximately 164709.45 km2, predominantly distributed in northern part of Ningxia province, central-southern regions of Inner Mongolia, and northern part of Shaanxi (adjacent zones of Ningxia province, Inner Mongolia, and Shaanxi), exhibiting a scattered distribution. Additionally, there were limited distributions in central-northern parts of Shanxi and central regions of Gansu. The medium suitability zones were mainly distributed in the surrounding regions of the high suitability zones, covering an area of approximately 281470.89 km2. The low suitability zones were primarily found in high-altitude cold regions such as the Qinghai-Tibet Plateau and the Tianshan Mountains, covering an area of about 72481.10 km2. The majority of remaining zones in China were classified as low suitability zones, with an area of approximately 9081338.57 km2. It was noteworthy that within the high suitability zones, scattered non suitability zones. These zones were mostly located in hinterlands of Maowusu Desert, Ulan Buh Desert, and Kubuqi Desert, while S. lanceolata mainly inhabited the desert grasslands on the edges of above deserts, and was absent from the central zones of them [4, 17]. It may be attributed to the more extreme habitats in the central regions of these deserts (scarce precipitation, scorching summer temperatures, and dry, cold winters) [17], which surpassed S. lanceolata adaptive capacity to the environments. In summary, wild S. lanceolata can generally survive nationwide, but its high suitability and medium suitability zones were limited. This reflected S. lanceolata strong environmental adaptability, indicating its capability to thrive in most habitats. However, during the long-term natural evolution and selection process, wild S. lanceolata distribution primarily concentrated in high suitability zones.
3.3 Future zoning and development trend prediction
Based on above studies, we further predicted future zoning, which revealed a shift in suitability habitat range (high + medium suitability zones) of S. lanceolata towards the northeast under 4 habitat backgrounds, primarily concentrating in central and northern zones of Inner Mongolia (Fig. 3A, Supplementary Fig. 1). Ningxia Province, as the largest cultivation zone, still predominantly concentrates its high and medium suitability zones in the central and northern regions. However, there is a diminishing trend in comparison to the current climatic backdrop. Conversely, zones in Shaanxi Province and Inner Mongolia Province exhibit an expanding trend. Notably, under 50s-126 climate background, unsuitability zones within original high suitability zones noticeably expands. Conversely, under 90s-126 climate background, unsuitability zones within the same zones decreased while high and medium suitability zones expanded. Under future climate backgrounds with continued high carbon emissions (50s-585 and 90s-585), more pronounced regional shifts in different suitability areas occurred. Besides northeastward migration of high and medium suitability areas, significant changes also occurred in non suitability areas. Specifically, the unsuitable zones within the high suitability zones noticeably decreased, while extensive unsuitable zones emerged to the north and northeast of this zone. This reflects that under a high carbon emission background, climate change may lead to more significant shifts in the distribution of S. lanceolata.
Area analysis under different grading zoning under different climate backgrounds (Fig. 3B, Fig. 3C) revealed that compared to current climate background, except for 50s-126, the total suitability (high + medium suitability) areas exhibited an expanding trend, with medium suitability areas showing a similar trend across all scenarios, with the most significant expansion observed under 90s-585; however, high suitability areas diminished in size across 4 climate backgrounds, with the most pronounced reduction under 50s-126 and the least under 90s-585. Low suitability areas demonstrated a decreasing trend across all scenarios except for 50s-126, with the most substantial reduction notably under 90s-585. Conversely, non suitability areas manifested an expanding trend under 4 climate backgrounds, particularly prominent under 50s-126. In summary, high suitability areas for future S. lanceolata may decrease, yet overall, suitability areas may expand, especially under high carbon emission scenarios, where medium suitability areas significantly enlarged, potentially providing a more substantial environmental foundation for S. lanceolata propagation and producing zones construction.
3.7 Analysis of the Diversity of Rhizosphere Microorganisms in S. lanceolata across Different Suitable Zones and Planting Years
Using amplicon sequencing, we analyzed the characteristics of rhizospheric soil microbiota in high suitability zones, medium suitability zones for S. lanceolata planted in different planting years. The Shannon Index analysis (Fig. 4A) indicates that bacterial α-diversity is highest in the rhizospheric soil of the S. lanceolata's high suitable zones (HS). In contrast, fungal α-diversity in the HS is lowest compared to that of the medium suitable zones (MS). And more abundant bacterial α-diversity was showed in increasing planting years of S. lanceolata. Principal Co-ordinates Analysis (PCoA) demonstrated significant differences in both fungal and bacterial β-diversity between HS and MS I, MS II, and MS III, with PCoA scores primarily located in the third quadrant for HS, the first quadrant for MS III, and the fourth quadrant for MS I and II (Fig. 4B, Supplementary Fig. 2). Further analysis of community composition of rhizosphere soil microorganisms at genus level revealed that dominant fungal taxa in HS were Aspergillus, Acremonium, and Filobasidium, while Gibberella, Acremonium, Alternaria, and Fusarium predominated in MS (Fig. 4C). In terms of bacteria, Arthrobacter and Nocardioides were dominant genera (Supplementary Fig. 3).
Using LefSe analysis, a comparison of significantly different microbial taxa between HS and MS III was conducted. The results revealed enrichment of beneficial bacterial genera such as Mesorhizobium, Agromyces, Streptomyces, and Bacillus, as well as beneficial fungal genera including Acremonium and Penicillium, with higher abundance in HS. Conversely, MS III exhibited enrichment of beneficial bacterial genera such as Arthrobacter, Sphingomonas, and Microvirga, along with higher abundance of harmful fungal genera including Gibberella, Alternaria, Sarocladium, Cladosporium, Paraphoma, Phaeomycocentrospora, and Leptosphaerulina (Fig. 4D, Supplementary Fig. 4). Comparison across different planting years revealed that relative abundance of harmful fungal genera such as Aspergillus, Sarocladium, and Paraphoma increased with increasing planting years, while beneficial fungal genera including Arthrobacter, Nocardioides, Mesorhizobium, and Chaetomium gradually decreased in relative abundance (Fig. 4E, Supplementary Fig. 5).
3.6 Metabolomic analysis of S. lanceolata in different suitability zones.
To further investigate the impact of different suitability habitats on S. lanceolata, representative S. lanceolata from different suitability zones (HW: SM, YCW, and SBZ; MC: QYQ and HSP; LC: LJT and HZ) were subjected to metabolomic analysis. Adequate amounts of all S. lanceolata samples were pooled to prepare quality control (QC) samples. The metabolites within samples were separated and detected using UHPLC-Q-TOF MS, with 11 repeated measurements. As evidenced by Principal component analysis (PCA) (Fig. 5A), QC samples clustered tightly, indicating great reproducibility. Furthermore, samples from HW, including SM, YCW, and SBZ, exhibited minimal variability and similar principal component characteristics. In contrast, significant variability was observed between MC and LC, with relatively less inter groups differences between MC and LC. The acquired data were matched against database established by Shanghai Applied Protein Technology based on standard substances, incorporating information on metabolite retention times, molecular weights (with errors < 25 ppm), secondary fragmentation spectrum, collision energy, etc. A total of 1586 substances were identified, comprising 880 in positive mode and 706 in negative mode (detailed information provided in Supplementary Table 4). These substances were classified into 13 major categories, predominantly encompassing lipids and lipoids molecules, organic acids and derivatives, organic heterocyclic compounds, phenylpropanoids and polyketides, as well as benzene ring type compounds (Fig. 5B). Based on identified substances, systematic cluster analysis was performed on S. lanceolata from different suitability zones. The results (Fig. 5C) revealed that HW formed a distinct cluster, whereas MC formed another, at the same time, MC and LC formed a distinct cluster, aligning with trends observed in principal component analysis. This further reflected discernible differences in metabolite profiles between MC, LC, and HW.
To analyze the differential metabolites between MC and LC compared to HW, OPLS-DA analysis, FC analysis, and T-test analysis were separately conducted for MC vs HW and LC vs HW. Differential metabolites with a VIP > 1, FC < 0.67 or FC > 1.5, and a p < 0.05 were selected. Ultimately, 281 differential metabolites (up: 226; down: 55) were identified from MC vs HW, and 370 differential metabolites (up: 222; down: 148) from LC vs HW. These differential metabolites mainly comprised lipids and lioids molecules, polyketides, organic heterocyclic compounds, organic acids and derivatives (Fig. 5D). Venn diagram (Fig. 5E) revealed 143 commonly shared differential metabolites between two comparison groups, with 30 metabolites showing higher expression in HW compared to MC and LC, including various bioactive secondary metabolites such as Mangostine, β-glycyrrhetinic acid, Arctiin, and Cucurbitacin e, etc. They further reflected HW characteristics. And KEGG pathway enrichment analysis of differential metabolites from MC vs HW and LC vs HW demonstrated shared enrichment in Metabolic pathways, Biosynthesis of secondary metabolites, Carbon metabolism, and C5-Branched dibasic acid, etc. (Fig. 5F, Supplementary Fig. 6). Specifically, MC and LC exhibited advantageous enrichment in multiple metabolic pathways including Biosynthesis of secondary metabolites, while HW showed enrichment in C5-Branched dibasic acid metabolism pathway.
3.4 Near infrared spectrum analysis of S. lanceolata in different suitability zones.
Near infrared spectrum can reflect physical properties and compound characteristics of substance-rich materials, commonly employed for the authentication and quality evaluation of medicinal materials [18]. In this study, Near infrared spectrum analysis was conducted on 19 distinct sources S. lanceolata, resulting in acquisition of their spectral profiles. As illustrated in Fig. 6A, S. lanceolata characteristic spectral information primarily concentrated in long wavelength region of near infrared (4000 cm− 1 ~ 9091 cm− 1), with various sources of S. lanceolata exhibiting similar spectrum and characteristic peaks, albeit with discernible differences in peak intensities, reflecting their comparable chemical compositions while potentially differing in contents. To address the issue of significant information overlap due to broad bandwidths of near infrared spectrum and better elucidate inter samples variances, data preprocessing involving baseline correction, second derivative transformation, gaussian smoothing (7 points), and normalization analysis was applied to the near infrared long wavelength spectrum, followed by hierarchical cluster analysis. The results (Fig. 6B) revealed that samples collected from high suitability zones (ALS, SBZ, GSW, ALZ, SM, ZE, ZTL, and YCW) clustered together, while those from medium and low suitability zones (HSP, MD, QYQ, SYG, TFC, NY, ZJS, HZ, LD, LJT, and YP) formed another cluster. This delineation underscored the differing near infrared spectrum features of S. lanceolata from high, medium and low suitability zones, indicative of variances in S. lanceolata quality and intrinsic component contents across distinct suitability zones.
3.5 Quality Evaluation of S. lanceolata in different suitability zones.
The contents of 7 major active ingredients (MA, TS, TF, DB, QG. DA and PA) were determined in S. lanceolata samples collected from different zones. The results (Fig. 7A) revealed trends in the average contents of TS, TF, DB, and QG among S. lanceolata from different zones, showing HW > MC > LC, with notably higher levels of QG (0.312 mg·kg− 1) and TF (2.772 mg·kg− 1) in HW compared to MC (0.233 mg·kg− 1, 2.270 mg·kg− 1) and LC (0.229 mg·kg− 1, 2.122 mg·kg− 1) (p < 0.05). Moreover, HW exhibited significantly higher DB (0.057 mg·kg− 1) compared to LC (0.042 mg·kg− 1). The average contents of ME and DA in S. lanceolata from different zones followed a trend of LC > MC > HW, in which LC showing significantly higher ME (37.0%) and DA (0.044 mg·kg− 1) compared to HW (32.0% and 0.026 mg·kg− 1). Furthermore, there were no significant differences in the contents of PA and TS.
To further analyze the relationship between zoning and contents of main active ingredients and quality of medicinal materials, principal component analysis was conducted on S. lanceolata 7 indicators from different zones. Based on the contributions of different indicators to different principal component factors, a comprehensive score equation was constructed to comprehensively evaluate S. lanceolata quality from different zones. As Fig. 7B showed, first three principal components had eigenvalues greater than 1 (2.95, 1.46, 1.01), with a cumulative variance contribution rate reaching 77.48% (42.21%, 20.85%, 14.42%), which could represent overall information of all S. lanceolata. PC1 and PC2 were selected to construct a biplot of principal component scores of S. lanceolata from different zones and load matrices of different indicators. The results (Fig. 7C) showed that the principal component scores of 8 S. lanceolata from HW were mainly concentrated in the first and fourth quadrants (ALS, SBZ, GSW, ALZ, SM, ZE, ZTL, and YCW), exhibiting a certain clustering trend, while S. lanceolata from MC and LC were mainly concentrated in the second and third quadrants (HSP, MD, SYG, TF, NY, ZJS, HZ, LD, LJT, and YP). Meanwhile, the loadings of TS, QG, DB, and TF tended to the principal component scores of HW S. lanceolata, indicating that 4 indicators were more reflective of the characteristics of HW S. lanceolata, whereas the loadings of DA, ME, and PA tended to the principal component scores of MC and LC samples, indicating that 3 indicators reflected more characteristics of MC and LC.
Further standardizing all relevant indicators of S. lanceolata, and calculating formulas for the scores of each principal component based on eigenvalues and loadings: F1 = -0.277*X1 + 0.134*X2 + 0.229*X3 + 0.28*X4 + 0.229*X5 − 0.054*X6 − 0.241*X7、 F2 = -0.076*X1 + 0.526*X2 − 0.324*X3 + 0.095*X4 + 0.195*X5 + 0.411*X6 + 0.277*X7、 F3 = 0.103*X1 − 0.026*X2 − 0.321*X3 + 0.305*X4 + 0.343*X5 − 0.704*X6 + 0.401*X7 (X1、 X2、 X3、 X4、 X5、 X6 and X7 represented ME、 TS、 TF、 DB、 QG、 PA and DA respectively). Finally, using variance contribution rate of each principal component as weights, the comprehensive evaluation equation for S. lanceolata was obtained by summing the scores of each principal component multiplied by their corresponding weights. The formula was F = -0.118*X1 + 0.162*X2 − 0.017*X3 + 0.182*X4 + 0.187*X5 − 0.039*X6 + 0.014*X7. Calculating S. lanceolata scores from different zones using it, the results were shown in Fig. 7D. HW generally had higher scores, and overall scores were higher than MC and LC, indicating superior quality of HW.