Sampling spots
Different hot spots (water and air) of the city of Alicante (331,000 citizens) were sampled (Supplementary Fig. S1). The following water samples were analyzed: 1) Untreated drinking water from one of the main water channel of Crevillente (38°11'23.4"N, 0°58'13.5"W; 11/20/20) that feed Alicante city, 2) Wastewater samples from the largest hospital in Alicante (Hospital General Universitario Dr. Balmis, 38°21'47.9"N, 0°29'08.6"W; 05/14/19), 3) the input of the wastewater treatment plant (WWTP) l'Alacantí Nord, which receives municipal wastewater (38°25'30.8"N, 0°25'10.1"W; 03/26/19, 01/28/20 and 02/19/20), 4) the output or treated wastewater by the aforementioned WWTP (38°25'38.5"N, 0°25'03.5"W; 05/11/16, 01/02820 and 02/19/20) and 5) seawater obtained in a spot nearby the WWTP outlet placed in Campello (38°25'08.6"N, 0°23'16.7"W; 02/27/19 and 11/18/20). Regarding air samples, different places were studied: 1) Roof of the above-mentioned Alicante hospital (07/14/20), 2) main entrance or reception of the hospital (10/22/19), 3) in situ sampling at the above-mentioned WWTP (05/27/20), 4) and two more spots located out of that WWTP but nearby (≈500 m) named as WWTP out 1 (07/11/20) and WWTP out 2 (07/12/20).
Water sample processing and sequencing
In total, 10 water samples were analysed to study the ARG presence in the prokaryotic and viral fractions (Supplementary Table S1). Three different analyses were performed: in silico analysis of the ARG using metagenomics, the study of two selected ARGs (sul2 and tetW) by dPCR in water samples from hospital, WWTP input, WWTP output, and seawater, and the search for the ARG presence in RNA and DNA in viruses from waters of the WWTP input and output.
Water samples from hospital (10 mL), WWTP input (10 mL), WWTP output (10 mL), and untreated drinking waters (106 mL) were filtered using a 0,2 µm filter (Isopore Membrane Filters, Ref. GTTP02500). Those filters were used to perform the DNA extraction from the prokaryotic fraction. Prokaryotic DNA fraction from the seawater sample used in this study was obtained as described [16]. The acid nucleic extraction was performed using MasterPure Complete DNA and RNA purification (Epicentre, Ref. MC85200) for all prokaryotic samples except the untreated drinking water, which was processed with DNAeasy PowerSoil Pro (Qiagen, Ref. 47014) as recommended by the manufacturer. WWTP output samples were the control samples from the Maestre-Carballa (2019) study that were processed as indicated in the paper [17].
Regarding the viral fraction (< 0,2 µm), samples from the input and output of the WWTP and hospital were filtered through a 0,2 µm filter with a syringe (PES membrane, Millipore). The filtered elute water (used volumes at Supplementary Table S1) was then concentrated with tangential ultrafiltration using Vivaflow (100 KDa; Sartorius, Ref. VF20P4) until a final volume of 19 mL, which was again filtered by 0,2 µm as above. The filtered sample was concentrated using Amicon ultra-15 (100 KDa; Millipore, Ref. UFC910008) to a final volume of 200 µL. For the viral ARN samples, 5 L of WWTP input and 5 L of WWTP output were sampled. Both were centrifuged at 10,000 g for 20 minutes (4°C) and the pellet was discarded. The supernatant free of cells was ultraconcentrated employing Vivaflow (100 KDa; Sartorius, Ref. VF20P4) until a final volume of 30 mL. 3% beef extract (Sigma-Aldrich, Ref. B4888-50G) and NaNO3 (2M final concentration; Scharlau, Ref. SO05010500) were added and the pH was adjusted to 5,5. The mix was incubated for 30 minutes at room temperature and the pellet was eliminated after 10 minutes of centrifugation (2500 g). The supernatant pH was then adjusted to 7.5. Polyethylene glycol 6000 (PEG; Sigma-Aldrich, Ref. 81253-250G) at 15% and NaCl (2%; Fisher, Ref. BB358-1) were added to precipitate the viruses and the mix was incubated at 4°C overnight. Viruses were obtained in the pellet after centrifugation (10,000 g, 30 minutes, 4°C) and resuspended in 10 mL of PBS (pH 7.4) (Adriaenssens et al., 2018). SYBR Gold (ThermoFisher Scientific, Ref. S11494) was used to confirm the lack of bacteria in viral fraction samples, and then they were concentrated with Amicon ultra-15 (100 KDa; Millipore, Ref. UFC910008) until a final volume of 200 µL.
Free DNA from the viral samples concentrated with Amicon ultra-15 (100 KDa; Millipore, Ref. UFC910008) was eliminated using 1 ul of Turbo DNase I (Invitrogen, Lituania, Ref. AM107) and 20 µL of DNase buffer at 37°C. After 30 minutes, 4.22 µL of RNase were added and both enzymes were deactivated 30 minutes later at 72°C for 10 minutes. To ensure the proper nucleic acids liberation and protein digestion, the sample was treated with 1% of proteinase K (50 µg/µl, Epicentre, Ref. MPRK092) and 20 ul of TE 10X at 65°C for one hour while shaking. The enzyme was inactivated at 4°C for 5 minutes. The protocol Qiamp MinElute Virus Spin Kit (QIAgen, Ref. 53704) was used to extract the nucleic acids. For the RNA samples, instead of the RNA carrier provided in the kit, 21.25 µL of glycogen was added (20 mg/mL; Thermo Scientific, Ref. R0551) to 200 µL of the AL buffer.
DNA and RNA concentrations were measured with Qubit HS dsDNA (Thermo Fisher Scientific, Ref. Q32854) and HS RNA (Thermo Fisher Scientific, Ref. Q32852) respectively. In the RNA samples, the DNA was digested with Turbo DNase I (Invitrogen, Lithuania, Ref. AM107) for 45 minutes at 37°C to increase the ratio RNA:DNA in the sequencing process.
Metagenomic DNA library preps were carried out with Nextera XT DNA library kit (Illumina, Ref. FC-131-1024) according to the manufacturer´s protocol. All samples were sequenced in a MiSeq Illumina sequencer (2x300), except the untreated drinking water sample that was sequenced in a HiSeq X sequencer (2x150). Sequencing was performed by Macrogen company (Seul, Rep. of Korea).
Metagenomic RNA libraries from the prokaryotic fraction were performed with Illumina stranded total RNA Prep, in which a step of rRNA depletion was included (Ribo-Zero Plus; Illumina, Ref. 20040525). For the RNA viral fraction, the metagenomic library was done using TruSeq Stranded mRNA kit (Illumina, Ref. 20020594) avoiding the step where mRNA is purified, allowing us to analyse all RNA present in the sample. All RNA samples were sequenced in a HiSeq 2500 (2x125). Sequencing was performed at the Genomics Center of CRG (Barcelona, Spain).
Air sample processing and sequencing
Temperature, wind speed, and direction were measured in real-time with a Mingtech Wireless Weather Station. Air samples were taken with a Coriolis Micro (Bertin Technologies) in 10 mL of sterile PBS 1X. Prior to sampling, different parts of the air sampler in contact with air and sampling buffer were autoclaved (e.g. air inlet tube and buffer receptacle). The total air volume sampled ranged from 52.5 to 108 m3 (Supplementary Table S2). As a negative control, a similar volume of air (68 m3) was sampled inside a laminar flow hood in our laboratory and processed identically to the rest of samples to monitor potential DNA contamination from manipulation, reagents, metagenomic DNA library prep, or DNA sequencing. During air sampling, as commonly described for air samplers [18] and to prevent evaporation of sampling buffer, sterile PBS 1X was added up to 10 mL volume each 10 min. After air sampling, the 10 mL PBS 1X buffer containing airborne particles and microorganisms were transported in ice to the laboratory and immediately processed for nucleic acid extraction and SYBR Gold staining. For nucleic acid extractions, samples were filtered through 0,2 µm Isopore Membrane Filters (Millipore, Ref. GTTP02500), and the elute was filtered through 0,02 µm Anodisc filters (Whatman, GE Healthcare Life Sciences, Ref. 6809 − 6002) to retain viral particles. Filters were stored at -80°C until use. Nucleic acids were extracted using DNAeasy PowerSoil Pro kit (Qiagen, Ref. 47014) following manufacturer´s protocol. DNA concentration was measured using a fluorimeter instrument Qubit with an HS dsDNA Qubit kit (Thermo Fisher Scientific, Ref. Q32851). Due to the very low input of extracted DNA, metagenomic libraries were performed strictly following the Nextera XT protocol recommended for ultra-low biomass samples that was published by Rinke and colleagues (2016) [19]. For this step, as recommended in that study, a metagenomic library prep control sample was included and comprised of lambda phage DNA (500 ug/mL; BioLabs, Ref. N3011S) that was processed as the rest of the samples and used for cleaning up the sequencing data for removing potential contaminant reads from exogenous DNA [19]. A whole genome amplification by multiple-displacement amplification (MDA) was only applied for the roof viral air sample since the amount of extracted DNA precluded the metagenomic library preparation. MDA was performed as described in [19–21]. As a negative control during MDA, sterile mQ water was included. For visualization of microbes and particles, the in situ WWTP air sample (54 m3; 7 mL; 10/11/19) was filtered through 0.02 µm Anodisc filters (Whatman, GE Healthcare Life Sciences, Ref. 6809 − 6002). A fragment of the filter was then stained with SYBR Gold (ThermoFisher Scientific, Ref. S11494) and inspected under a Leica epifluorescence microscope SP2.
Bioinformatic analysis
Raw data from air and water samples was quality-filtered using Trimmomatic 0.36 [22] (SLIDINGWINDOW:4:20, MINLEN:36). Then, sequenced samples and blank controls (see above) from air sampling were compared to remove potential contaminant reads from exogenous DNA. For that, reads in samples that matched with blank controls (query coverage ≥ 50% and identity ≥ 80%) were removed from the analysis. Taxonomic classification of reads was performed with Kaiju program [23] using different available databases. Then, the filtered and clean reads were assembled using SPAdes [24] (-meta), and only the obtained contigs > 500 pb were considered for further analysis. From those contigs, ORFs were predicted using Prodigal program [25]. ARGs were annotated by comparing the ARG databases ARG_ANNOT [26], RESFAMS [27], and CARD [28] with both the assembled and unassembled data using blast. Only the best-hits with a bit-score ≥ 70, e-value < 10− 5, and identities ≥ 50% or ≥ 90% (both thresholds were initially compared selecting later the cut-off of ≥ 90% as likely the most reliable) were considered as potential ARG. The housekeeping genes present in the used ARGs databases were not considered in our metagenomic analysis due to the difficulty of determining if they were bona fide ARGs using our thresholds [6] According to the database CARD [28] or the nr database (NCBI), the detected ARG were grouped by the antibiotic they confer resistance to. ARG abundance and normalization were estimated by dividing the total number of ARGs per Gb of metagenome (assembled or unassembled) and if necessary, by volume sample as well. Estimation of shared ARGs in the analyzed samples was carried out using a Venn diagram from the bioinformatics UGent webpage (https://bioinformatics.psb.ugent.be/cgi-bin/liste/Venn/calculate_venn.htpl).
In addition to the physical separation of prokaryotic (> 0,2 µm) and viral fractions (< 0,2 µm), bioinformatics was also used to specifically detect viral contigs as follows. All assembled metagenomes were analysed with VirSorter2 [29], which identified viral contigs (max.score ≥ 0.9) in the > 0,2 µm fraction and < 0,2 µm fraction, that were grouped in the hereafter named "putative viral fraction". Contigs without detecting viral proteins were grouped in the prokaryotic fraction, whose origin could be DNase-resistant DNA, DNA fragments in vesicles, or viruses that were not detected [17]. The contigs with at least one ARG were annotated using Kaiju [23] comparing them with the database nr_euk (12/22/23).
To identify viral RNA contigs, the program hmmsearch (hmmer.org) was used to search for RNA-dependent RNA polymerase (RdRP) profiles. The RdRP hidden Markov models (HMMs) used were downloaded from Pfam [30], from other RNA viruses’ papers [31, 32], or generated by HMMER 3.2.2 (hmmer.org) using the sequences obtained from IMG/VR [33].
We sought to compare the relative frequency of ARGs (unassembled data; grouped by the drug class they confer resistance to) with the human antibiotic consumption in our country region using Spearman's correlation coefficient and its significance with a t-student test using R program v. 4.1.2 (R Core Team, 2007). The antibiotic consumption data for the public hospitals and the whole community (including public hospitals and the private sector) was obtained from the PRAN webpage (Spanish Action Plan on Antimicrobial Resistance; https://www.resistenciaantibioticos.es/es/publicaciones/spanish-action-plan-antimicrobial-resistance) as defined daily doses (DDD) of common antibiotics per 1000 inhabitants and day from the Comunidad Valenciana region (year 2019).
Digital PCR for tetW and sul2
Absolute abundances of two abundant and ubiquitous antibiotic resistance genes (sul2 and tetW) in the analyzed water samples were studied by dPCR. Unfortunately, the minute amount of extracted DNA from air samples precluded the analysis by dPCR. For dPCR primers and probes design, PrimerQuest Tool (https://eu.idtdna.com/pages/tools/primerquest) was used. To obtain the target sequences for both ARGs, first all hits obtained from the assembled data were clustered (95% identity) with CD-HIT program using default parameters. TetW and sul2 clusters were selected, and contigs containing one of those ARG were aligned with each other and their corresponding ARG entry in the ARG databases (gb|AY055428.1|-|20269–21084|sul or AJ222769_gene_p01 for tetW) using MAFFT Alignment v.7.222 [34] available in Genious v. 9.1.3 program. This program generated the consensus sequences, which were used as target sequences to design the sul2 and tetW primers and dPCR probes.
The primer specificity was checked using primer-Blast (NCBI) against the nr database (NCBI). The primer sequences were tetW_F (5'->3') TCCAGTGGCACAGATGTAAAG and tetW_R (5'->3') CTTTAGCGGAGATCACCAAGAT. Regarding sul2, the sequences were sul2_F (5'->3') ATGCGCGCGTCAAAGAA and sul2_R (5'->3') ATCTGCCAAACTCGTCGTTATG. Probe sequences were for sul2 5'/6-FAM/CG CAA TGT G/ZEN/A TCC ATG ATG TCG CC/3IABkFQ/3' and for tetW 5'/6-FAM/AG GTG TAC C/ZEN/G CTC TTT GGC TGT TT/3IABkFQ/3'
Primers were checked with a PCR reaction with the same samples that were used to design them. The reaction mixture included 18,15 µL mili-Q water, 1 µL of each primer (10 µM), 0,75 µL of MgCl2 (50 Mm; Invitrogen, Ref. Y02016), 2,5 µL of Buffer 10x (Invitrogen, Ref. Y020228), 0,5 µL of dNTPs 10 mM (Thermo Fisher Scientific, Ref. 10297018), 0,1 µL of Taq polymerase (Thermo Fisher Scientific, Ref. 10342020) and 1 µL of sample. PCR reaction conditions were: 94°C for 5 minutes, 35 cycles of 94°C for 45 seconds, 55°C for 45 seconds, and 72°C for a minute and a half. A final extension step was included at 72°C for two minutes and then held at 4°C. The PCR products were observed with an electrophoresis gel and sequenced by Sanger.
Digital PCR for sul2 and tetW were performed in a 14,5 µL mixture reaction that included 7,25 µL of MasterMix QuantStudio 3D DIGITAL PCR V2 MMX (ThermoFisher Scientific, Ref. APPA26316), 4,14 µL of mili-Q water, 0,63 µL of each primer (10 µM), 0,35 µL of the probe (10 µM), 2,5 µL of MgCl2 (50 Mm) and 1 µL of the water sample or 1 µL of mQ water for the negative control.
The dPCR mix was loaded into a chip QuantStudio 3D DCPR V2 20K CHIP (12-PACK, ThermoFisher Scientific, Ref. A26316). The dPCR conditions were: 95°C for 10 minutes, 30 cycles of 95°C for 20 seconds, 55°C for 45 seconds, and 60°C for a minute. Another phase of 60°C for 2 minutes and held at 4°C. Chips were incubated in dark conditions and room temperature before reading them with QuantStudio™ 3D Digital PCR Instrument (ThermoFisher Scientific, Ref. 4489084). The obtained data were analysed with QuantStudio™ 3D AnalysisSuite™ software (ThermoFisher Scientific). The quantification of each ARG was calculated by dividing the number of copies of tetW or sul2 by the ng of DNA from the sample obtained. Replicates and serial dilution DNA samples were included. Before the dPCR, a qPCR was conducted with the same conditions as the dPCR, and the product was run in an electrophoresis gel to check the primer's performance and accuracy of the primers and probes (Supplementary Fig. S2).
To verify the proper ARG amplification during the dPCR reaction, the obtained dPCR product was later recovered from the amplified dPCR chip and used as a template in a PCR reaction with the same conditions as above for the dPCR (21,25 µL of MasterMix QuantStudio 3D DIGITAL PCR V2 MMX, ThermoFisher Scientific, Ref. APPA26316, 1 µL of each primer (10 µM), 0,75 µL of MgCl2 (50 Mm; Invitrogen, Ref. Y02016) and 1 µL of the dPCR product). Finally, the PCR product wads analyzed in an electrophoresis gel (Supplementary Fig. 2).
The comparison of the quantification through dPCR and metagenomics for both ARG was performed by contrasting the number of copies of each ARG per ng of DNA with the number of ARG hits (identity ≥ 90%, bit-score ≥ 70 and e-value ≤ 10− 5 with the ARG databases) for both assembled and unassembled data.