We obtained a total of 945,434 sequences from 78 crop samples at 26 sampling locations distributed across 8 Chinese provinces (Fig. 1a). Operational taxonomy units (OTUs) were demarcated according to 97% sequence similarity. Finally, a total of 4,473 OTUs were identified, and these were classified into 11 phyla, 26 classes, 69 orders, 151 families, 440 genera, and 1234 species. We detected 1399 OTUs per province on average, with a minimum of 620 OTUs for Chongqing (CQ) and a maximum of 2473 OTUs in Yunnan (YN) (Fig. 1b).
The classification tree for all samples visualized by GraPhlAn is shown in Fig. 1c, as well as the relative abundances of the microorganisms at the genus level is shown in Fig. 1d (Fig. S1 shows the community compositions at different taxonomic levels). A total of 440 genera were identified from the 78 samples, among which the top 13 genera (in terms of relative abundance) accounted for 82.09% of all OTUs for the entire dataset (details in Table S1). Notable genera from among these top 13 included Salmonella (4.3%), which is harmful for human health.
Species abundance distribution (SAD)
Species abundance distribution is a description of the abundance (number of individuals observed) for each different species encountered within a community; this can also be described as a community-level metric denoted by Φ(n) that tells us the probability of a given species having “n” abundance [24]. Thus, SAD is one of the major tools in ecology. In the present study, we applied common SAD models, including Poisson lognormal, log-series, Broken-stick, and Zipf to predict species abundance distributions of crop leaf bacterial communities. The Poisson lognormal model had the best fit for the SAD among all samples, and was able to explain 95.34% of the variation of the crop leaf bacterial SAD, thus outperforming the 84.77% for lognormal, 78.88% for log-series, 84.11% for Zipf, and 10.23% for the Broken-stick model.
Functional prediction with PICRUSt
We employed phylogenetic investigation of communities by reconstruction of unobserved states (PICRUSt) [25] to explore the potential functions of the microorganisms detected in our samples. The abundance distributions showed between 43.10%–50.48% of the predicted pathways from the OTU data were of the metabolism type (Fig. S2). In more detail, the values of the relative abundance for pathways including “carbohydrate metabolism”, “amino acid metabolism”, “energy metabolism”, “metabolism of cofactors and vitamins”, “xenobiotics biodegradation” and “metabolism, nucleotide metabolism, and lipid metabolism” were 8.91%–10.631%, 8.33%–10.27%, 4.60%–9.11%, 3.69%–5.48%, 2.13%–3.90%, 2.75%–3.19% and 2.71%–3.62%, respectively (Fig. S2).
α-Diversity analysis of the crop samples
We next examined microbial diversity and community composition by characterizing α-diversity. First, we confirmed that both the Shannon rarefaction curve and the species accumulation curve indicated sufficient sequencing depth to informatively reveal the community diversity of the samples (Fig. 2a, 2b). The α-diversity indices, including Shannon, Simpson, Chao1, and ACE, were applied to assess the microbial diversity of the samples. In the national scale, the Chao1 and ACE index values were quite high (Table S2), underscoring the high bacterial richness among the samples. Nevertheless, the values of the Shannon and Simpson index were quite low (Table S2), suggesting that the relative abundances of the taxa in a given sample differed greatly with other samples. The box plot of Shannon and Simpson index indicated that microbial diversity was highest in Fujian province and lowest in Henan province (Fig. 2c, 2d).
A national core bacterial community
We next used standard criteria for defining community selection to characterize the core bacterial communities at a country-wide scale (i.e., analysis of the samples from 8 provinces) based on abundance and frequency data (see Methods for details) [26, 27]. This type of analysis can identify the dominant species in each of the in situ environments represented by the 26 sampling sites. To determine the core community, OTUs were defined as abundant when they made up the top 80% of the reads in a sample (when ranked by decreasing OTU abundance) (Fig. 3a). When the cumulative read abundance reached 80%, 245 OTUs were observed. The frequency of each OTU was also considered in the core community selection. There are 24 OTUs present in more than 80% of the samples, and these together comprise 85.4% of the relative abundance for the whole data set (Fig. 3b). Following principles for core community selection, we cataloged a national core bacterial community, which included 0.31% of the OTUs (14 OTUs) yet occupied 31.01% (± 3.94%) of the total relative abundance of the whole data set comprising all 78 samples (Fig. 3c, 3d, Table 1). The national core community consisted of the following genera: Mastigocoleus (9.35% ± 6.89%), Pseudomonas (4.51% ± 2.43%), Salmonella (3.27% ± 6.24%), Leclercia (3.31% ± 2.81%), Methylobacterium (2.46% ± 1.69%), Enterobacter (2.17% ± 2.48%), Atlantibacter (2.14% ± 3.77%), Pantoea (1.93% ± 2.66%), and Sphingomonas (1.88% ± 1.57%).
Among these of the national core bacterial community, Salmonella enterica is a well-studied pathogenic bacterium, and is present in almost all of the samples. The function of Methylobacterium goesingense, Sphingomonas roseiflava and Sphingomonas aurantiaca are unclear. Mastigocoleus testarum is a kind of cyanobacteria previously implicated in biological erosion [28], and Atlantibacter hermannii was reclassified from Escherichia hermannii and Salmonella subterranean [29], but there is little other information available. The remaining 4 genera—Pseudomonas (Pseudomonas oryzihabitans, Pseudomonas straminea), Leclercia (Leclercia adecarboxylata), Enterobacter (Enterobacter soli), and Pantoea (Pantoea agglomerans)—have been implicated to function in processes including nitrogen fixation [30], heavy metal reduction [31], insecticidal activity [32, 33] and degradation of PAHs [34], lignin [35], and indole-3-acetic acid [36] (Table 1).
Provincial core bacterial communities
The core bacterial communities for each province were then determined using the same method as for the national core bacterial community (Table S3), and we prepared lists for beneficial bacteria that are specific to each province (Table S4). For example, in Yunnan province, OTU_4397 was close to Ochrobactrum anthropi, which is related to aniline degradation [37], and heavy metal detoxification [38]. Ochrobactrum lupine (OTU_8084) was core community member in Sichuan and Chongqing, and has been shown to degrade pesticides including chlorothalonil, beta-cypermethrin, and 3-phenoxybenzoic acid [39, 40]. Ochrobactrum pseudintermedium (OTU_8922), specific to Fujian, is known to degrade PAHs [41]. Moreover, Acinetobacter sp. (OTU_12502, OTU_9277, OTU_11315) was among the core bacterial communities of Guizhou, Fujian, Hunan, Chongqing, and Sichuan, and have been shown to degrade PAHs and nicotine and to detoxify heavy metals [42]. A recently reported cellulose-degrading specy, Beijerinckia fluminensis (OTU_2253, OTU_13065), was found in the core bacterial communities of both Fujian and Heilongjiang Province [43].
We also identified pathogenic bacteria among the province core communities. For instance, the black rot causal pathogen Xanthomonas campestris was in the Yunnan core [44]. We also found bacteria with reported links to human diseases, including Kluyvera intermedia [45] in the Chongqing core, Stenotrophomonas maltophilia [46] in the Heilongjiang core, Bordetella petrii and Bordetella hinzii [47, 48] in the Fujian core. In terms of both diversity and abundance of pathogenic bacteria, the Chongqing core community had the largest number (3 species, 8.85%).
β-Diversity analysis
We also conducted β-diversity analysis to characterize relationships among the samples. Two types of clustering analysis—PcoA analysis of unweighted Unifrac distance and UPGMA (unweighted pair-group method with arithmetic means)—both showed that the samples tend to cluster together by province, suggesting a significant effect from sampling location on microbial community composition (Fig. S3). A multi-response permutation procedure was applied to determine the composition difference between each province. More than half of the groups showed significant differences, again indicating that the microbial community composition varies greatly from province to province (Fig. 4a).
Redundancy analysis (RDA) and correlation with environmental factors
We employed RDA to explain the composition difference and to determine the influences of local environmental factors on microbial community composition, including temperature, precipitation, altitude, longitude, and latitude (Table S5). The microbial community composition of the samples in Yunnan province was most strongly affected by altitude, whereas that of Henan province was most strongly affected by latitude. However, none of the single environmental factors showed a significant effect on microbial communities at the national scale (Fig. 4b). Latitudinal‐diversity gradients for all samples were also analyzed, but no obvious influence was detected for latitude or longitude, which is consistent with previous reports of microbial latitudinal diversity gradients [49] (Fig. S4).
Locally specific factors drive community compositions
Our data support that the microbial communities are obviously affected by local environments, but did not exhibit obvious relationships to single ecological factors. Thus, we screened the significantly different species between different sampling locations with LEfSe (Linear discriminant analysis effect size) analysis attempting to unravel key factors influencing microbial community structure. This identified 8 species in Mangdui, Menglijiaojidi, Qujing, and Chuxiong of Yunnan province, Xuchang and Luoyang of Henan province, and Qianxinan of Guizhou province (Fig. 5, Table 2). Among these 8 species, Ochrobactrum anthropi from Mangdui and Pantoea agglomerans from Chuxiong have been previously reported as strains capable of heavy metal detoxification. Leclercia adecarboxylata was previously reported as a PAH-degrading strain. Enterobacter ludwigii from Menglijiaojidi is capable of crop-growth promotion and alkane degradation. Ochrobactrum lupini from Qujing was known to degrade pesticides (chlorothalonil and cypermethrin). The other three species identified in the LEfSe have no previously reported functional associations, but their genera do contain species capable of degrading PAHs and heavy metals detoxication.
We explored the potential local factors which have potentially fostered the high relative abundance of these locally specific contaminant-degrading strains, specifically by measuring heavy metal concentrations of (Cd, Cr, Cu, Pb and Zn) of 6 samples (Mangdui, Menglijiaojidi, Qujing, Chuxiong, Luoyang, and Qianxinan) as well as other 10 samples as references. Compared with other samples (including samples from the same province), Mangdui had the highest the Cd concentration, Chuxiong had the highest Cr concentration, and Luoyang had the highest Pb concentration, while Qujing had the second-highest Cu concentration of any site, and showed significantly higher Zn concentration among the detected samples (Fig. 5). These results are understandable given that Mangdui, Chuxiong, Qujing and Luoyang are areas with abundant complex metal ore and coal mines [50–53]. These LEfSe-based results support that both geographical and industrial factors have substantially driven the microbial community composition in the local environment of the testing sites.