Additional Clustering Refiner (ACR) benchmarking with Critical Assessment of Metagenome Interpretation (CAMI) dataset
We verified the ACR method by comparing the obtained results with those of other binning methods using a synthetic metagenome dataset from CAMI45. The gold-standard assemblies of the high-complexity datasets of CAMI were binned with MaxBin219, MetaBAT220, and DAS Tool48. Moreover, each original binning results were refined with ACR to compare their performances. However, since the DAS Tool uses an algorithm that combines results from multiple binners, we used ACR with dereplicated MAGs (referred as dRep_ACR) from multiple binning results, including those of the analyses using the MaxBin2 and MetaBAT2 tools, to compare the number of recovered MAGs obtained. Overall, the numbers of near-complete and medium-quality MAGs recovered by the ACR method were higher than those recovered as per the binning results of the analyses using MaxBin2, MetBAT2, and DAS Tool (Fig. 1c). In particular, the ACR analysis of MetaBAT2 and dRep recovered the largest number of medium-quality MAGs (466 and 447, respectively), while the original binning results of MetaBAT2 and DAS Tool recovered 400 and 394 medium-quality MAGs, respectively (Fig. 1c).
The CAMI data set consisted of genomes with various similarities according to ANI values. To evaluate how ACR solves the problem when closely related genomes are present, we scrutinized a set of genomes called “common strains” with an ANI greater than 95% against other genomes45. A total of 204 gold-standard genomes exhibited ≥ 95% ANI, when compared to other genomes. In the MAG recovery among common strains, ACR improved the binning results of MetaBAT2 and dRep (Fig. 1d, e). Particularly, refined MetBAT2 produced 108 medium-quality MAGs, recovering almost half of the gold-standard genomes. The results of the analysis using dRep_ACR showed that the number of recovered strains was lower than that obtained in case of the MetaBAT2_ACR, showing a recovery of 88 medium-quality MAGs, since the MAGs with an ANI value of 99% or more were dereplicated. The ACR method significantly reduced the contamination of MAGs and resulted in an increase in the total number of medium-quality MAGs, and implied the possibility of recovering strain-level MAGs.
Construction of >1000 metagenome-assembled genomes (MAGs) using the Additional Clustering Refiner (ACR) method
For metagenomic analysis, co-assembly and binning were performed for each sample set to recover near-complete microbial genomes. A total of 9,467 genome bins were initially generated, and then dereplicated at ANI 99%. The dereplicated 1,040 genome bins were qualified using the quality criteria of MAGs49, and then 181 near-complete (NC) MAGs (≥ 90% completeness and <5% contamination) and 335 medium-quality MAGs (≥ 50% completeness and <10% contamination) were constructed. The initial binning results showed a broad range of contamination (0–25%) of each MAG (Fig. 1b). Thus, we applied a new method, the ACR, which split the contigs of the MAGs using k-means clustering algorithm (Fig. 1a), to obtain MAGs with better quality. This process nearly doubled the good quality MAGs to 257 near-complete and 653 medium-quality MAGs, compared to the initial binning results. ACR maintained the completeness through CheckM similar to the initial binning results but significantly reduced contamination and strain heterogeneity (Supplementary Fig. 3a). Furthermore, the chimerism was detected by GUNC50 based on the contigs lineage homogeneity constituting each MAG. Accordingly, it was observed that MAGs (medium-quality and near-complete) that passed the GUNC also significantly improved through ACR in the initial binning results (Supplementary Fig. 3b). Consequently, we utilized ACR in our study to reconstruct 1,100 MAGs, which consisted of 257 near-complete and 683 medium-quality MAGs with an average of 89.43 ± 7.18% completeness with 5.79 ± 4.07% contamination from 57 metagenomes (Fig. 2). All MAGs were classified (bacteria (n=1,095) and archaea (n=5)) using Genome Taxonomy Database (GTDB). The MAGs were affiliated to 28 bacterial phyla, including Proteobacteria (n=371), Bacteroidota (n=295), Patescibacteria (n=136), Actinobacteriota (n=132), Verrucomicrobiota (n=35), Bdellovibrionota (n=31), Planctomycetota (n=19), Omnitrophota (n=16), Cyanobacteriota (n=14), Chloroflexota (n=9), Campylobacterota (n=8), Dependentiae (n=5), Firmicutes (n=5), and others. Despite using the GTDB reference, only 5.18% and 67.73% of MAGs were classified at the species and genus levels, respectively. Therefore, most of the MAGs belonged to unclassified taxa in this study, indicating that they could belong to novel species and genera.
Prevalence Of Args In Korean Freshwater Mags
The ARGs in the MAGs were detected using AMRFinderPlus and Resfams, and the antibiotic class of ARGs was classified based on AMRFinderPlus classification51. Of the 1,100 MAGs, 1,030 possessed a total of 5,804 ARGs related to 16 antibiotic classes (Fig. 2). Except for those of the efflux class that is also related to cellular process, toxicity, pathogenicity, and transport of a wide range of substrates, the most abundant ARGs found in 738 MAGs encoded resistance against beta-lactams (1,007 genes), tetracycline (256 genes), and aminoglycosides (88 genes). A primary goal of this study was to determine the taxonomic hosts of ARGs; therefore, we focused our analysis on the genes found in near-complete and medium-quality MAGs.
Proteobacteria MAGs possessed a variety of ARGs belonging to the beta-lactam, tetracycline, macrolide, and aminoglycoside classes. Besides, ARGs for fosfomycin and sulfonamide and class C beta-lactamase (ampC) were found only in Proteobacteria MAGs (Additional File 2). Bacteroidota MAGs specifically harbored aadS (aminoglycoside resistance), blaOXA-1 (class D beta-lactamase), mefC, mphG, mphE, and msrE (macrolide resistance), catB (phenicol resistance), and tet39 and tetX (tetracycline resistance). Patescibacteria was the only phylum with MAGs possessing aadA1 (aminoglycoside resistance), blaPFM (class B beta-lactamase), and lnuC (lincosamide resistance). Actinobacteriota was the only phylum with MAGs that harbored aadA11 (aminoglycoside resistance), ermX (macrolide resistance), rox (rifamycin resistance), and tetV (tetracycline resistance). Other ARGs found in MAGs are described in Additional File 2.
There were 42 MAGs harboring ARGs to more than three antibiotic classes, which we refer to as multi-drug resistant (MDR) MAGs (Fig. 2). Clinically relevant ARGs targeting antibiotics underscored by World Health Organization (WHO) as important for human medicine were found in MDR MAGs52. Most MDR MAGs belonged to Proteobacteria, Bacteroidota, and Actinobacteriota and carried genes that confer resistance to efflux, beta-lactam, tetracycline, and aminoglycosides. In particular, the MAGs with the most diverse types of ARGs found belonged to Moraxellaceae, a well-known reservoir of ARGs which confer resistance to macrolides and phenicol. Microtrichales and Fluviicola, which had not been reported as encoding ARGs to date, were both found to contain MDR MAGs in our study.
Microbial dissemination from effluents into the downstream region of receiving rivers
The effect of wastewater effluents on rivers was assessed by the analysis of microbiomes using the MAG. Significantly different microbiomes among the upstream and downstream regions and effluents were detected in the multivariate analysis of MAG composition using unweighted UniFrac distance matrix [P < 0.001, determined by the analysis of similarities (ANOSIM)] (Fig. 3a). River water (including upstream and downstream) microbiome was clearly distinguished from the effluent microbiome. Notably, the downstream microbiome was significantly different from upstream microbiome in both the PC 1 and PC 2 axes [P < 0.05, P < 0.01, respectively, Wilcoxon rank sum test (paired), Fig. 3a]. This indicated that the upstream microbiome was altered by the influence of the effluents, resulting in a different downstream microbiome.
We also investigated the bacterial community using 16S rRNA gene amplicon sequencing. A total of 26,422 distinct operational taxonomic units (OTUs) were obtained by removing OTUs containing 10 or fewer sequences. Principal coordinate analysis (PCoA) of OTUs showed that the effluent microbiome showed a significantly different composition compared to that of the river water microbiome [P < 0.001, Wilcoxon rank sum test], based on the unweighted UniFrac distance matrix (Supplementary Fig. 4). Additionally, the richness of the effluent microbiome was significantly lower than that of the river water microbiome both in the MAG and 16S rRNA gene analyses, suggesting the lower microbial diversity in effluents (Supplementary Fig. 5).
More MAGs were assigned to Patescibacteria than any other phyla in 88.9% of effluent samples, but only in 38.5% of river water samples (FDR=0.011, Fisher’s exact test). Campylobacterota, Omnitrophota, Dependentiae, and Bdellovibrionota were also dominantly found in effluents, compared to the river water samples (P <0.05, Fisher’s exact test). In particular, the abundance of these five taxa showed a significant increase in the downstream regions of the rivers compared to the upstream regions, suggesting that the microbial population in aquatic environments was affected by the effluents (Fig. 3b). Especially, the phyla Parcubacteria and Saccharibacteria, in the superphylum Patescibacteria, were dominant in effluents and also found in the downstream regions of the rivers (Supplementary Fig. 6).
In order to assess the dissemination at the genetic level, the ratio of the abundance of ARGs to that of the core genes was calculated. Overall, the relative abundance of ARGs in MAGs was higher in the effluents than in the river water, and higher in the downstream regions than in the upstream regions of the rivers (Fig. 4a). In particular, specific aminoglycoside (aadA1, aadA5, aadA2), beta-lactam (blaOXA-10), phenicol (floR), sulfonamide (sul2), tetracycline (tetX, tetA), and macrolide (ereD, mefC) resistance genes were abundant in the effluent MAGs, and their abundance was higher in MAGs from the downstream regions than in the upstream regions of the rivers (Fig. 4b).
In addition to ARG distribution, the abundances of metal resistance genes (MRGs), mobile genetic elements (MGEs), and virulence factors (VFs) were also significantly higher in the effluent than in the river waters (Supplementary Fig. 7). In addition, the abundance of MGEs and VFs were higher in MAGs from the downstream than in the upstream regions, indicating that MGEs and VFs, as well as ARGs, were disseminated from the effluents into the downstream regions.
For each MAG, we investigated whether it originated from effluent or river water environments. The generated MAGs were divided into 2 groups: River Group and Effluent Group. The River Group consisted of MAGs found in the upstream and downstream regions regardless of the effluent sample. MAGs in the River Group might be considered native to the river microbiome. On the other hand, Effluent Group was composed of MAGs simultaneously found in effluent and downstream, while not found in the upstream samples, under the assumption that MAGs exist downstream due to the influence of effluent. An MAG that appeared only in effluent was also assigned to the Effluent Group (Fig. 5a). The presence of an MAG was defined as one with more than 40% genome breadth of coverage at each sampling site. A total of 542 and 460 MAGs were classified into the River Group and Effluent Group, respectively (Fig. 5a).
Although Proteobacteria, Bacteroidota and Actinobacteriota were the most prevalent phyla both in River Group and Effluent Group (Supplementary Fig. 8), the distribution of the MAGs differed between the two groups at levels lower than the order taxon (Additional File 2). In Proteobacteria MAGs classified as Betaproteobacteriales, Rhodobacterales, Sphingomonadales, and Steroidobacterales were more prevalent in the River Group, while MAGs classified as Micavibrionales and Pseudomonadales were more prevalent in the Effluent Group. MAGs assigned to the Chitinophagales, NS11-12g, and Flavobacteriales orders in Bacteroidota were more prevalent in the River Group, while MAGs classifieds as Bacteroidales were only detected in the Effluent Group. In Actinobacteria MAGs, Actinomycetales, and Microtrichales were more prevalent in the River Group.
Interestingly, the phyla Patescibacteria, Bdellovibrionota, Omnitrophota, Campylobacterota, and Dependentiae were mostly found in Effluent Group but not in River Group, with the exception of only two MAGs (Supplementary Fig. 8). This result suggested that MAGs belonging to these five phyla originated from effluents and existed in the downstream region after effluents merged into the rivers. These five phyla most commonly encoded beta-lactam resistance genes in addition to various other ARGs (Supplementary Fig. 9). MAGs from the genus Arcobacter (Campylobacterota phylum), considered a pathogen, harbored class B and D beta-lactamases (Additional file 2). Interestingly, blaPFM, which is responsible for carbapenem resistance, was detected in MAGs classified as Patescibacteria (Supplementary Fig. 9). Other carbapenem resistance genes (blaDIM and blaOXA-151) were detected in MAGs classified as Proteobacteria from Effluent Group, and cphA were detected in both Effluent Group and River Group. In addition, Patescibacteria, Bdellovibrionota, and Campylobacterota MAGs in the Effluent Group harbored genes that confer resistance to aminoglycosides, while Patescibacteria, Omnitrophota, and Bdellovibrionota MAGs in the Effluent Group harbored genes that confer resistance to tetracycline.
In each group, no significant difference was found with regard to the genome size and the number of ARGs, MRGs, and VFs. However, the number of MGEs was significantly higher in MAGs of the Effluent Group than in those of the River Group (Fig. 5a).
Furthermore, the detection frequency of MGEs at a given distance from ARGs in MAGs was higher in the Effluent Group than in the River Group; this difference gradually increased in proportion to the distance between the MGEs and ARGs on the genome (Fig. 5b). In particular, Patescibacteria and Campylobacterota, found mostly in Effluent Group, had a higher number of MGEs per genome size compared to the other phyla (Supplementary Fig. 10). In addition, Proteobacteria, Bacteroidota, and Actinobacteriota, which represent the common phyla in all samples, also showed a higher incidence of MGEs from ARGs in the MAGs of Effluent Group (Supplementary Fig. 11).
Co-localization Of Args And Mges In Mags
To assess the potential of horizontal transfer of ARGs from the effluent, the genetic linkage between ARGs and MGEs in the MAGs categorized at the phylum level in both groups were investigated (Fig. 6). Network analysis was performed by searching for ARGs within the 4 kbp flanking region of the MGEs. In general, this network displayed a different pattern of harboring ARGs and MGEs in hosts between Effluent Group and River Group. The number of triangles refers to the number of phyla that have a genetic link between an ARG and an MGE. The higher number of triangles occurred in Effluent Group, which means that the observed genetic link between ARG and MAG in the host is more frequent than in the River Group. In addition, in quantity and diversity of the ARGs and the MGEs harbored in phyla, Effluent Group represented a more complex network than River Group (Effluent Group: 126 nodes, 388 edges / River Group: 83 nodes, 226 edges). The ARGs were prevalent in broad ranges of taxa, with 25 and 14 phyla in Effluent Group and River Group, respectively. Proteobacteria, Bacteroidota, Verrucomicrobiota, and Actinobacteriota were the main phyla harboring ARGs with nearby MGEs in both groups.
Resistance islands persist from effluents to the downstream regions of rivers
Putative resistance islands (pRIs) containing ARGs with MGEs in 4-kbp-flanking regions of each gene were searched to ascertain the potential mobility of ARGs (Fig. 7). A total of 498 putative pRIs were found in 367 MAGs, including 3 class I integrons, 6 Clusters of attC sites lacking integron-integrases, 8 phages, 30 plasmids and 1 integrative and conjugative elements. Among them, only eight kinds of pRIs frequently appeared in the effluent (more than 10 sampling sites). The prevalence of these eight pRIs in the downstream regions was dependent on that in the effluent. In some cases, pRIs were detected at a higher frequency in the downstream regions than in the upstream region (Fig. 7b) due to contamination from the effluent. These pRIs carried resistance genes specific for plasmid and Clusters of attC sites lacking integron-integrase meditated resistance against clinically relevant antibiotics (Fig. 7).
pRI-166 containing class D beta-lactamase genes (blaOXA-2) and pRI-584 containing phenicol resistance genes (floR) were frequently found in effluents and were more abundant in the downstream regions, than in the upstream regions, suggesting a dissemination of the ARGs (Fig. 7). pRI-118 in Pseudomonas E had two aminoglycoside resistance genes (aph(3’’)-Ib and aph(6)-Ib) adjacent to Tn3 transposon. It was also remarkably widespread in more than 90% of the effluent samples and were more abundant in the downstream regions than in the upstream regions. Moreover, a short 2-kbp pRI-1238 containing genes for lincosamide resistance and IS10 family transposase was not found in the upstream regions of the rivers, but mainly found in the effluents and downstream regions of the rivers. These pRIs mainly originated from Patescibacteria (pRI-166 and pRI-1238) and Proteobacteria (pRI-584 and pRI-118). Moreover, Patescibacteria harbored a higher number of MGEs per genome size than most other phyla (Actinobacteriota, Bacteroidota, Bdellovibrionota, Chloroflexota, Omnitrophota, Planctomycetota, and Verrucomicrobiota with adjust P < 0.05) (Supplementary Fig. 10).
Since pRI-166, a Cluster of attC sites lacking integron-integrase with blaOXA-2 found in Patescibacteria, was composed of a short contig, a contig graph was used to extend the sequence continuity (Supplementary Fig. 12). As a result, aadA5, aacA8, and blaGES-5, which are responsible for the resistance to clinically relevant aminoglycosides and carbapenem, were detected in an additionally assembled contig that was not used to reconstruct MAGs. Besides, the integrase gene also found in this contig graph, pRI-166, classified as Clusters of attC sites lacking integron-integrase, might actually be a Class I integrons. This suggested that along with Proteobacteria, Patescibacteria could be one of the important carriers of ARGs, causing their dissemination into river environments.