Microbial communities in agricultural soil are essential components of crop and soil health, yet interpreting these high-dimensional datasets has proved challenging and can hinder conceptual and applied advances. Research efforts are turning to machine learning as a way to use microbiomes to predict soil health [38]. Our results showed that fertility source exerts the most pronounced impact on the microbial community, an influence that diminishes with increasing depth. Fertility source plays a pivotal role in shaping the fungal community and strongly affects the co-occurrence patterns within the microbial community, leading to fertility source-sensitive modules dominated by fungi in the 0–30 cm depth range. The influence of tillage on microbial communities is primarily observed at depths of 0–20 cm. Tillage not only augments the dispersal capability of microorganisms but also amplifies stochastic processes, potentially jeopardizing the interactions within the microbiome. In contrast to fertility source and tillage, cover crops have a less pronounced effect on microbial communities and show no clear pattern as depth increases; however, they still influence the co-occurrence patterns of microbial communities to some extent.
3.1 Machine learning-based analysis on community structure and environmental influences
Traditional ordination and correlation analyses, which often assume linearity in microbiome data, may lead to underestimating the influence of environmental variables on community assembly. We hypothesize that the disparities may be due to depth-based linear differences in the prokaryotic community, whereas the fungal community is more influenced by fertility sources, which are not easily described by a linear model. Consequently, more information is lost when reducing the fungal community data to two dimensions using PCoA. We found that the Random Forest (RF) model successfully estimated a large proportion of the variation of microbial community composition and explained nearly 20% more prokaryotic variation, which can be influenced by the environmental factors measured. Of the environmental variables and treatments incorporated into the SHAP summary, organic matter was the most influential environmental variable for the prokaryotic community and the second most influential variable for the fungal community. As soil depth increases, organic matter proportions decline, consequently shaping microbial communities at varying depths.
Among farming practices, fertility source stands out as the most influential treatment factor for the microbial community: it was the most important factor for the fungal community but had a less pronounced effect on the prokaryotic community. A recent review found that specific bacteria respond differently to inorganic and organic fertilizers, but that fertilizer type (inorganic versus organic) had no significant effect on the overall richness and diversity of the bacterial community [39]. In soils between 0–60 cm depth, we found that tillage and cover crops had minimal influence on microbial community structure. Tillage significantly affected only prokaryotic communities, while cover crops affected only fungal communities. This differed somewhat from patterns found with traditional permutational ANOVA, where tillage type had a significant influence on both prokaryotic and fungal communities, although its relative influence was less than 2% (Bier et al. in revision). That analysis agreed, however, that the presence of cover crops significantly, but mildly, affected fungal community composition. One limitation to consider is that our analysis modeled soil properties together with agricultural practices. Since agricultural practices directly and indirectly influence soil properties, this approach could understate the significance of agricultural practices by not accounting for this relationship.
3.2 Tillage enhances stochastic processes shaping shallow microbial communities
In shallow soil (0–20 cm), both stochastic processes and the immigration rate of the community are notably more pronounced than in deep soil (20–60 cm). This pattern may relate directly to tillage practices. Our comparison of neutral community model (NCM) fits between fully tilled and reduced-till soils supports this assertion: in fully tilled soil, both the community immigration rate and the explanatory power of the NCM are higher than in reduced-till soil. Additionally, our machine learning model further demonstrated the limited effect of tillage in soil layers between 20–60 cm. This is because soils are typically tilled only above 25 cm depth, resulting in greater microhabitat heterogeneity among microbial communities below this layer [40].
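The two quantities compared above, the immigration rate and the explanatory power of the NCM, can be illustrated with a minimal sketch. It assumes the Sloan-type formulation commonly used for microbial NCM fits, in which the predicted occurrence frequency of a taxon is the upper tail of a beta distribution evaluated at a detection limit of one read. The function names, the simple grid search, and the toy inputs are illustrative assumptions and not the fitting procedure used in this study.

```python
from math import exp, lgamma, log, log1p


def _beta_cf(a, b, x, max_iter=500, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    c, h = 1.0, d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        for term in (even, odd):
            d = 1.0 + term * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + term / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta function I_x(a, b), i.e. the Beta(a, b) CDF."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def ncm_frequency(p, m, n_reads):
    """Predicted occurrence frequency for mean relative abundance p and immigration rate m."""
    detection_limit = 1.0 / n_reads  # assumption: a taxon is detected at one read
    return 1.0 - beta_cdf(detection_limit, n_reads * m * p, n_reads * m * (1.0 - p))


def fit_ncm(abundance, frequency, n_reads):
    """Grid-search m on a log scale (1e-4 to 1); return (best m, R^2 of the fit)."""
    mean_f = sum(frequency) / len(frequency)
    ss_tot = sum((f - mean_f) ** 2 for f in frequency)
    best_m, best_ss = None, None
    for k in range(81):
        m = 10 ** (-4 + 0.05 * k)
        ss_res = sum(
            (f - ncm_frequency(p, m, n_reads)) ** 2
            for p, f in zip(abundance, frequency)
        )
        if best_ss is None or ss_res < best_ss:
            best_m, best_ss = m, ss_res
    return best_m, 1.0 - best_ss / ss_tot


# Hypothetical inputs: mean relative abundances and occurrence frequencies of six taxa.
abundance = [0.0005, 0.001, 0.005, 0.01, 0.05, 0.1]
frequency = [ncm_frequency(p, 0.1, 1000) for p in abundance]
m_hat, r_squared = fit_ncm(abundance, frequency, 1000)
print(m_hat, r_squared)
```

Under these assumptions, a larger fitted m corresponds to weaker dispersal limitation, and a larger R^2 means that more of the abundance-occurrence relationship is consistent with neutral expectations. A production analysis would normally replace the grid search with nonlinear least squares and report confidence intervals around the prediction.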
Disturbances such as soil tillage can disrupt the physical and chemical structure of soil and change soil moisture content and depth-associated environmental parameters, which are influential for deterministic processes in community assembly. Furthermore, the disturbance of soil macroaggregates and pore networks leads to a homogenization of ecological niches, thereby increasing the influence of stochastic processes on shaping microbial communities in shallow soil layers [7, 41]. Consistent with previous studies, our results indicate that stochastic processes are more important in shaping prokaryotic communities than fungal communities [41]. This observation can be attributed to the fact that these experiments were conducted on cultivated land. Intensive land use, such as cultivation, tends to amplify the influence of stochastic processes on the assembly of soil bacterial communities across extensive depth ranges. In contrast, stochastic processes for fungi appear to be less affected by these disturbances [42].
Higher microbial community immigration rates occurred at shallow depth compared to deeper soil, indicating that the dispersal of most microbial taxa is enhanced at shallower depth. Two factors may contribute to these results. First, tillage practices can mix soil layers and increase the potential for soil erosion and runoff [43], thereby physically relocating microbes and enhancing their migration by reducing dispersal limitation. Within the tillage layer, prior research has found that tillage increased microbial dispersal and thus the homogeneity of soil bacterial communities across microhabitats [44]. Second, wind and water flows are more likely to facilitate dispersal at shallow depths [45, 46], especially if agricultural soils are compacted and infiltrate poorly. Therefore, the hydrological connectivity inherent in shallow soil could be another key factor contributing to this enhanced dispersal ability [36]. Additionally, we found that prokaryotes have greater dispersal ability than fungi, which has been consistently shown in previous work and is attributed to weaker dispersal limitation resulting from their smaller body size and unicellular growth [42, 47, 48]. Dispersal limitation can intensify the priority effect in assembly [42, 49], in which species arriving earlier alter resources or conditions and thereby affect the establishment of subsequent species [50]. As a result, there is increased heterogeneity in fungal [51] and prokaryotic communities in deeper soil layers (20–60 cm) (Bier et al. in revision).
3.3 Biotic factors strongly influence microbial communities across soil depths
The Neutral Community Model (NCM) postulates that all community members have equal chances of birth, death, and relocation, regardless of species [52]. Such a notion might be seen as counterintuitive by many microbiologists because the ability to grow, reproduce, and disperse varies across biotic and abiotic gradients [53, 54]. Indeed, it is well recognized that microbial survival and growth in soils face challenges from persistent abiotic stressors, including limited and fluctuating water availability, a dearth of organic carbon substrates, and acidic conditions [11]. Furthermore, microorganisms are always entwined in complex webs of ecological interactions that can significantly affect the species involved [55]. To compensate for the limitations of the neutral model, we applied the Random Forest (RF) model based on both biotic and abiotic factors [56]. Our findings consistently show that, within each soil depth range, biotic factors outweigh abiotic ones in determining the distribution patterns of both prokaryotic and fungal taxa. This reinforces the notion that interactions among microorganisms are a crucial component shaping microbial communities. Given the shortcomings of the NCM in fully capturing the relationships between co-occurrence and abundance in some microbial communities, we surmise that these gaps may be filled by considering the complex biotic interactions that occur among microorganisms. Prokaryotic interactions are more important than fungal interactions across the entire soil depth range, especially at shallow depth. A similar pattern also occurred in soil microbial communities from grassland mesocosms, where bacterial taxa were more highly and consistently connected than fungi [57].
The proportion of ASVs predicted from abiotic factors is limited for both prokaryotes and fungi, which might be because the abiotic factors we measured had low heterogeneity at the local scale [58]. Our previous analysis emphasized that soil depth is the primary driver of microbial community changes, and the environmental variables we considered are closely associated with soil depth (Bier et al. in revision). Therefore, when we fitted models for individual soil layers, these environmental variables exhibited minimal fluctuations, complicating our ability to discern their environmental filtering effects on microbial communities. Additionally, our study may be limited by the scope of the environmental variables measured, a limitation acknowledged in previous studies [37, 59]. Our focus was primarily on bulk soil physical and chemical properties, which may not sufficiently address the intricate spatial heterogeneity observed in soil microbial communities at the centimeter scale [11, 60]. The predictability fraction of prokaryotes was higher than that of fungi at all depths, while the predictability of both prokaryotes and fungi decreased with depth; these two phenomena are primarily due to increasing community heterogeneity (Bier et al. in revision). This divergence of sample variation occurs without a corresponding increase in sample size, making it challenging for machine learning models to accurately examine the relationships between microbial communities and various factors.
3.4 Soil depth affects soil microbial co-occurrence patterns
To deepen our understanding of microbial interactions across soil depth, we conducted co-occurrence network analyses at each soil depth. Remarkably, only a scant 3% of the edges appeared in more than one network, highlighting the strong influence of soil depth as a structuring mechanism. Additionally, the networks displayed substantial variations in topological properties at different depths. These findings highlight the pronounced heterogeneity of microbial co-occurrence patterns across different soil depths and the potential for a high degree of locally abundant specialist taxa. In a broad soil survey that operationally defined generalist and specialist taxa by their abundance and prevalence, soils contained twice as many generalist taxa as specialists, but the overall percentage was low (1% specialists vs 2% generalists) [61].
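The edge-overlap statistic reported above can be computed in a few lines. The sketch below is illustrative only: the ASV identifiers, the depth labels, and the function name `shared_edge_fraction` are hypothetical, and the denominator is assumed to be the set of distinct edges pooled across all depth-specific networks.

```python
from collections import Counter


def shared_edge_fraction(networks):
    """Fraction of distinct undirected edges that occur in more than one network."""
    counts = Counter()
    for edges in networks.values():
        # frozenset treats (a, b) and (b, a) as the same edge; the set
        # comprehension drops duplicate edges within a single network.
        counts.update({frozenset(edge) for edge in edges})
    if not counts:
        return 0.0
    shared = sum(1 for n in counts.values() if n > 1)
    return shared / len(counts)


# Hypothetical edge lists, one per soil depth.
networks = {
    "0-10 cm": [("ASV1", "ASV2"), ("ASV2", "ASV3"), ("ASV3", "ASV4")],
    "10-20 cm": [("ASV2", "ASV1"), ("ASV5", "ASV6")],
    "20-30 cm": [("ASV7", "ASV8"), ("ASV6", "ASV9")],
}
print(shared_edge_fraction(networks))
```

In this toy input only the ASV1-ASV2 edge recurs across depths, so one of six distinct edges is shared. A low fraction of this kind is what indicates depth-specific co-occurrence structure.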
The co-occurrence network at the 10–20 cm soil depth had notably fewer nodes and edges than the networks in the adjacent soil layers. We speculate that this discrepancy is attributable to the predominance of stochastic processes within the 10–20 cm layer. This is supported by the optimal fit of the Neutral Community Model (NCM) within this depth range. Moreover, the heterogeneity of both prokaryotic and fungal communities was lowest at this depth (Bier et al. in revision). Further, the average degree, edge density, and average clustering coefficient were highest for the network at 30–60 cm. This suggests that a greater percentage of interacting ASVs are present at a depth of 30–60 cm. This phenomenon could be attributed to this depth being beyond the reach of tillage, which is thought to disrupt microbial community structures [62].
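For reference, the three topological properties compared above can be computed from an undirected edge list as follows. This is a minimal sketch rather than the pipeline used in this study: the function names and the toy network are hypothetical, nodes are taken only from the edge list (isolated ASVs are ignored), and nodes with fewer than two neighbours are assigned a clustering coefficient of zero, which is one common convention.

```python
from itertools import combinations


def build_adjacency(edges):
    """Map each node to the set of its neighbours (undirected, no self-loops)."""
    adjacency = {}
    for a, b in edges:
        if a == b:
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def network_summary(edges):
    """Return average degree, edge density, and average clustering coefficient."""
    adjacency = build_adjacency(edges)
    n_nodes = len(adjacency)
    n_edges = sum(len(nb) for nb in adjacency.values()) // 2
    avg_degree = 2 * n_edges / n_nodes if n_nodes else 0.0
    density = 2 * n_edges / (n_nodes * (n_nodes - 1)) if n_nodes > 1 else 0.0
    local = []
    for neighbours in adjacency.values():
        k = len(neighbours)
        if k < 2:
            local.append(0.0)  # convention: undefined clustering counted as zero
            continue
        # Count pairs of neighbours that are themselves connected.
        links = sum(1 for u, v in combinations(neighbours, 2) if v in adjacency[u])
        local.append(2 * links / (k * (k - 1)))
    clustering = sum(local) / n_nodes if n_nodes else 0.0
    return {
        "average_degree": avg_degree,
        "edge_density": density,
        "average_clustering": clustering,
    }


# Hypothetical network: a triangle (A, B, C) with one pendant node D.
summary = network_summary([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
print(summary)
```

Higher values of all three metrics, as observed here for the 30–60 cm network, indicate that a node is on average connected to more partners and that those partners are more often connected to each other.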
3.5 Agricultural practice affects co-occurrence patterns of microbial communities
To explore the impact of different farming practices on microbial co-occurrence patterns, we identified agricultural practice-sensitive ASVs. Notably, far fewer ASVs are sensitive to tillage than to fertility sources and cover crops, and the tillage-sensitive ASVs are primarily concentrated in the 0–20 cm soil depth. The degree distribution of these agricultural practice-sensitive ASVs varies in the co-occurrence network. Specifically, the mean degree of tillage-sensitive ASVs is lower than the overall mean degree of all ASVs. In contrast, the mean degree of ASVs sensitive to fertility sources and cover crops exceeds the overall mean. These results suggest that variations in cover crops and fertility sources contribute to the co-occurrence of specific microbial groups, whereas tillage does not exhibit such an effect. This may be further evidence that tillage practices damage the network of microbial communities [62].
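The degree comparison described above amounts to a ratio of two means. The sketch below illustrates it on a hypothetical network; the ASV names, the function names, and the choice to report a ratio are assumptions made for illustration only.

```python
from collections import defaultdict


def node_degrees(edges):
    """Degree of every node in an undirected edge list (self-loops ignored)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {node: len(nb) for node, nb in neighbours.items()}


def mean_degree_ratio(edges, sensitive_asvs):
    """Mean degree of the sensitive ASVs divided by the mean degree of all ASVs.

    Returns None if none of the sensitive ASVs occur in the network.
    """
    degree = node_degrees(edges)
    in_network = [degree[asv] for asv in sensitive_asvs if asv in degree]
    if not in_network:
        return None
    overall_mean = sum(degree.values()) / len(degree)
    return (sum(in_network) / len(in_network)) / overall_mean


# Hypothetical network: ASV_H is connected to four others, and ASV_A to ASV_B.
edges = [
    ("ASV_H", "ASV_A"), ("ASV_H", "ASV_B"), ("ASV_H", "ASV_C"),
    ("ASV_H", "ASV_D"), ("ASV_A", "ASV_B"),
]
print(mean_degree_ratio(edges, {"ASV_H"}))           # well-connected sensitive ASV
print(mean_degree_ratio(edges, {"ASV_C", "ASV_D"}))  # peripheral sensitive ASVs
```

A ratio above one corresponds to the pattern reported for fertility source- and cover crop-sensitive ASVs, and a ratio below one to the pattern reported for tillage-sensitive ASVs.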
The differences in the effects of fertility source, cover crops, and tillage on microbial co-occurrence networks may stem from the input of organic matter. Fertilizer applications are known to enhance soil nutrient levels [63]. Similarly, cover crops benefit soil health by adding organic carbon through roots, root exudates, and above-ground residues [64, 65]. These specific forms of organic matter input recruit diverse microbial communities, which in turn influence patterns of microbial interactions. Unlike fertility sources and cover crops, tillage does not introduce additional organic matter into the soil, although it does expose deeper layers to the organic matter remaining on the surface from crop residues. Ultimately, though, tillage accelerates the decomposition of organic matter, causing a faster and more uniform release of nutrients throughout the tilled soil layers, which promotes fast-growing microorganisms [7, 66]. In our study, we observed little variation in microbial co-occurrence patterns between fully tilled and reduced-till treatments. Combined with our previous analyses, these findings suggest that reduced tillage frequency mainly influences the stochasticity of microbial processes within the 0–20 cm soil layer and alters the dispersal capabilities of the microbial community.
In co-occurrence networks at soil depths of 0–10 cm, 10–20 cm, and 20–30 cm, we identified modules containing high proportions of ASVs responding according to fertility sources. This suggests that different fertility sources can recruit similarly large groups of unique microbial taxa and affect soil microbial co-occurrence patterns. This observation aligns with results from a study on conventional and organic management systems [8]. However, in contrast to their results [8], we did not identify any module sensitive to tillage intensity. Additionally, we did not observe such clustering with respect to cover crops, potentially indicating that the impact of cover crops on microbial communities is less pronounced, and thus, insufficient to form modules in microbial co-occurrence networks across depth.
Microbial taxa that demonstrate strong interconnections and exert a significant impact on their communities are referred to as hubs. Abiotic factors may directly affect these hub microbes, which then influence the broader microbial community through microbial interactions [67]. Our findings reveal that the relative abundance of a substantial proportion of ASVs within the fertility source-sensitive module can be predicted by other ASVs. Notably, with increasing depth, fungi progressively diminish their presence in the network; nevertheless, they consistently maintain a significant presence among the hub taxa. This observation underscores the pivotal role of fungi in driving the distinct co-occurrence patterns resulting from differences in fertility sources. Considering the limited impact of environmental factors on predicted microbial abundance, we hypothesize that fertility sources initially influence specific fungi, the effects of which subsequently cascade onto the broader microbial community. This phenomenon parallels the concept discussed by others [67]. Archaea, while a small proportion of hub taxa in the legume module, included important functional groups from the families Nitrosotaleaceae and Nitrososphaeraceae. These groups contain ammonia oxidizers that may benefit from nitrogen-fixing legumes, and Nitrososphaeraceae has previously been shown to increase in activity in bulk soil after the pea legume, Pisum sativum, was grown [68].
3.6 Primary biomarkers and hub taxa
The primary biomarker and hub taxa were of varied ecological relevance to the soil ecosystem. When identifying fertility sources using RF models, fungal biomarkers were predominantly taxa enriched under organic farming, while the biomarkers for prokaryotes were more balanced among the fertility sources. This may be due to the broader range of metabolic lifestyles in prokaryotes, whereas fungi are heterotrophic. Additionally, although many of the prokaryotic biomarkers occurred in the modules, none of these were identified as hub taxa, while about 38% of fungal biomarkers were hub taxa. This suggests that the biotic interactions of fungi were more specific to individual fertility sources, while the key prokaryotic taxa responsible for the majority of biotic interactions were not specific to any one fertility source. Instead, the unique prokaryotic biomarkers were driven by abiotic interactions related to physical and chemical changes to soil caused by agricultural management.
Organic-based fertilizers were most strongly associated with the fungal family Chaetomiaceae, the majority of which are saprotrophs (decomposers), and some species of which are used to decompose plant biomass (reviewed in [69]). The family Magnaporthaceae was also strongly associated with organic fertilizers; this family contains species that are found in plant tissues, including roots, and have a necrotrophic lifestyle [70]. These taxa may contribute to the decomposition of the greater organic residues arising from manure and legume fertility sources. A prokaryotic biomarker in the genus Nitrospira was more often found in fields fertilized with manure, which could benefit the nitrite-oxidation metabolism of these bacteria [71]. The fungal taxon Corynespora cassiicola was both a hub taxon and a biomarker of synthetic fertilization. It is a well-known plant pathogen associated with target spot disease in several plant species, including soybean (Glycine max) (reviewed in [72]), suggesting greater susceptibility of plant species in synthetically fertilized fields in conventional agriculture.
RF models employed to discern tillage practices identified till-enriched taxa as the primary biomarkers for both prokaryotes and fungi. The fungal family Didymellaceae marking full tillage is a broad group populated with many species and occurs in diverse ecosystems, but most taxa are plant pathogens [73]. Nectriaceae, a fungal family marking reduced tillage, has around 900 species that use saprotrophic or plant pathogenic lifestyles [74], suggesting an important connection with decomposition and plant health under reduced tillage. Prokaryotic families marking full tillage included Chitinophagaceae, some species of which can decompose chitin [75] which is available in the soil from organisms including fungi and insects and is potentially enhanced by tillage-associated mortality of these groups.
Although AUC scores for cover crops were not strong, several important taxa were identified as biomarkers for fields with and without cover crops. In the family Herpotrichiellaceae, the species Exophiala equina was associated with cover crops and has previously been found in association with root mycorrhizae [76]. The fungal family Dictyosporiaceae, which was a biomarker for fields without cover crops, contains saprotrophs and is globally distributed [77]. A prokaryotic biomarker from the class Dehalococcoidia that was associated with cover crops has also been found to increase in former agricultural fields after afforestation [78]. Pseudaminobacter, the genus of the family Rhizobiaceae that was a marker for cover crops, is commonly found in agricultural soils and is related to bacteria that form root nodules; this genus also contains atrazine-degrading species [79]. Family Oligoflexia, which indicated fields without cover crops, includes many predatory bacteria [80] that may be enhanced when root exudate sources of carbon are limited.
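To put the AUC values mentioned above in context, the sketch below computes the AUC from its pairwise definition: the probability that a randomly chosen positive sample (here, a hypothetical field with cover crops) receives a higher classifier score than a randomly chosen negative sample, with ties counted as one half. The labels and scores are invented for illustration; in practice the scores would be class probabilities from the RF model, and a library routine would normally be used.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve computed by comparing every positive-negative pair."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = cover crop, 0 = no cover crop) and classifier scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(labels, scores))  # 0.75: three of the four pairs are ranked correctly
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, so biomarkers drawn from a model with a modest AUC should be interpreted with corresponding caution.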