Sample collection
In total, 61 samples were collected from sites in different continents, namely Queensland (Oceania-Australia, 1.0 m depth in autumn (Queensland-A), 1.1 m depth in spring (Queensland-S), farmland brown soils), Tianjin (Asia-China, 20.0 m depth, farmland yellow clunamon soils), Zurich (Europe-Switzerland, 8.4 m depth, grassland cinnamon soils), Bremen (Europe-Germany, 1.2 m depth, forest black soils), and Nampula (Africa-Mozambique, 8.0 m depth, desert red soils) (Fig. 1a). These sampling sites covered different climatic conditions, soil types, geological zones and were demonstrated to be affected by distinct environmental factors (Additional file 1: Table S7). Three parallel quadrats were dug to collect subsamples at each sampling site. For each quadrat, five parallel sample pits were drilled. The soil samples collected from the same depth in all five pits were thoroughly mixed to generate a composite sample. All three replicate samples were analyzed separately and averaged to represent site conditions. A portion of the sample was separated and used for chemical analyses, another subsample was stored at -20 °C for further analysis.
Physico-chemical analyses
Soil ammonium, nitrite, and nitrate were extracted from 6 g soil added with 30 ml KCl solution (2 mol L-1) and measured by flow analyzers (Germany, SEAL, AA3). Soil moisture content were detected by oven-drying 10 g fresh soil at 105 °C until they reached constant weight. The pH value was determined by utilizing a pH Analyzer (Mettler Toledo, USA) in a supernatant after preparing the soil and water mixture (1:2.5). Total organic matter was determined using air-dried soil based on the LOI550 method [53]. Soil total nitrogen, total carbon and total sulfur were determined using an elemental analyzer system (Vario EL III, Elementar). Soil metal concentrations were measured by X-ray fluorescence (XRF). Each parameter measured was determined in triplicate samples.
To provide a normalized contamination level of metals, metal toxicity index (TI) was expressed and calculated for each sample (Fig. 1b). TI was calculated using the formula in previous studies [54-55]:
Here is the content of heavy metal i in the sample (mg/kg), is the half-maximal effective concentration for heavy metal i referred to the study of Welp [54]. Details about the heavy metal concentrations and types of each sample were in the Supplementary Information.
DNA extraction and High-throughput quantitative PCR
About 0.25 g freeze-dried and sieved soil was used to extract soil DNA using the PowerSoil kit (MO BIO, Carlsbad, CA, U.S.A.) referring to the instructions. The extracted genomic DNA were checked by 1% agarose gel and its concentration was determined using Nano Drop ND-2000 UV spectrophotometry (Thermo Fisher Scientific Inc., USA).
To better reveal the antibiotic resistance in terrestrial subsurface ecosystems, high-throughput quantitative PCR was performed targeting the 283 ARGs, 12 MGEs, and one 16S rRNA gene, utilizing the WaferGen Smart Chip Real-time PCR system [13] (Additional file 1: Table S8). The 283 primer sets targeted ARGs included the major antibiotic resistance classifications. The major integrons and transposases, such as intI-1, cintI-1, tnpA, were also included. All primer sets included no-template negative controls. Each quantitative PCR had technical triplicates and the efficiencies were all in the range of 1.7-2.3 and an R2 above 0.98.
The standard curve method of quantification was used to determine the absolute 16S rRNA copy numbers in Roche 480 system. For standard calculation, a plasmid standard containing a cloned and sequenced 16S rRNA gene fragment was used to generate calibration curves. Each qPCR was carried out in triplicates with negative controls. The relative copy numbers of ARGs and MGEs generated by the HT-qPCR were transformed into absolute copy numbers by normalization, using the absolute 16S rRNA gene copy number.
Bacterial 16S rRNA gene and metagenomic shotgun sequencing and analysis
The V3-V4 region of the 16S rRNA gene was amplified with 338F and 806R prime set in an Illumina Miseq platform (Majorbio, China) to characterize bacterial communities. Paired end reads were demultiplexed, quality-filtered using QIIME [56] and were grouped into Operational Taxonomic Units (OTUs) using a 97% identity threshold with UPARSE. The taxonomic classification of each OTU was analyzed using RDP Classifier (http://rdp.cme.msu.edu/) against the Silva (SSU115) with confidence threshold of 70% [57].
Metagenomic shotgun library was constructed using TruSeqTM DNA Sample Prep Kit (Illumina, San Diego, CA, USA) and sequenced using an Illumina HiSeq4000 platform (Illumina Inc., San Diego, CA, USA). To obtain the high-quality sequences, the adapter sequences and low-quality reads were filtered from raw sequencing reads using fastp. After quality control, the clean sequences were assembled using Megahit [58]. The open reading frames (ORFs) of assembled contigs were predicted using MetaGene [59] with nucleic acid length ≥ 100 bp selected. The non-redundant gene sequences were constructed using CD-HIT [60].
Identification of ARGs and MRGs
The non-redundant gene sequences constructed from metagenomic data were used to identify ARGs and MRGs. The CARD [61] and BacMet [62] databases were respectively used for the annotations of ARGs and MRGs using BLASTP search [63] with an e-value cutoff ≤ 10-5. The relative abundance of each gene was determined using SOAPaligner [64] with 95% identity. The relative abundance of gene i was calculated by the following formula [65]:
Here
represents the relative abundance of gene i in a sample. represents the reads of gene i in a sample.
represents the total reads of all the genes in a sample.
Statistical analysis
Spearman’s rank correlation and one-way ANOVA analyses were done in SPSS 10.0. Non-metric multidimensional scaling (NMDS) and variation partitioning were performed using R 3.6.2. Procrustes and Mantel tests were conducted in the R environment using the ‘vegan’ package [13]. Redundancy analysis (RDA) was performed based on forward selection in CANOCO 5.0. Network maps based on Spearman’s rank correlations were built with the psych package in R 3.6.2. Only a correlation between two items with the r > 0.8 and the p-value < 0.01 was considered statistically robust and used to construct network in Gephi [66]. Correlation heatmaps were generated using vegan, ggcor, and dplyr packages in R 3.6.2. Linear regression was analysis and generated by Origin 9.0. A p-value < 0.05 indicated statistical significance. Circular stacked barplot were generated in R 3.6.2 using tidyverse and virdis packages. Upset plot were analyzed and generated in R 3.6.2 using UpSetR and yyplot packeages [67]. All bar graphs, box plots, and pie charts were generated by Origin.
Path analysis based on structural equation model (SEM) was used to assess the direct and indirect effects of soil metal contamination on ARG profiles by SPSS AMOS [5, 45]. Theoretical assumptions were made to establish the model, such that (i) heavy metal could directly influence the MGE abundance, bacterial diversity, ARG abundance, and ARG diversity; (ii) heavy metal could indirectly influence ARG abundance and diversity via MGEs and bacteria; (iii) bacteria could indirectly influence ARG abundance and diversity via MGEs. The ARG abundance and diversity, MGE abundance and bacterial diversity were used to fit the model based on maximum-likelihood estimation method. The model fit and its overall goodness-of-fit were indicated by several parameters, namely (i) non-significant Chi square value (p > 0.05); (ii) low RMSEA value (< 0.08); (iii) high values of goodness-of-fit indexes such as CFI, RFI, IFI, and TLI (> 0.9).