Sequencing statistics. The 16S rRNA gene sequencing results for the soil and water samples are shown in Table S1. Across all samples, the total number of sequences was 321,131, the total number of bases was 134,242,228 bp, and the average sequence length was 418.04 bp. The quality control results for the shotgun metagenomic sequences are shown in Table S2. The average numbers of raw reads in the soil and water samples were 99,317,796.67 and 98,984,089.33, respectively, and the average numbers of clean reads were 98,645,909 and 97,975,913, respectively. After quality control, clean reads accounted on average for 99% of the raw reads, and clean bases for 98% of the raw bases, in both the soil and the water samples.
The Shannon curve (Fig. S2) was one of the indices used to estimate the microbial diversity of each sample. The results showed that our sequencing data were sufficient to reflect the vast majority of the bacterial diversity in the soil and water samples. The larger Shannon index for the soil samples indicated that bacterial diversity in the DHM was higher in soil than in water. The soil of the DHM wetland is a highly heterogeneous medium derived from marine alluvium, which is rich in organic matter. It provides a variety of suitable habitats and environmental conditions for microorganisms, supporting high soil microbial diversity and a wide range of microbial taxa [13].
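For readers unfamiliar with the index, the following is a minimal sketch of how a Shannon diversity value can be computed from per-taxon counts, assuming the common natural-logarithm form H' = -Σ p_i ln p_i. The function name and the count vectors are hypothetical illustrations and are not data or code from this study; the software used to produce Fig. S2 may apply a different logarithm base or rarefaction scheme.

```python
import math

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total          # relative abundance of this taxon
            h -= p * math.log(p)   # natural log; other bases only rescale H'
    return h

# Hypothetical count vectors (illustrative only, not data from this study)
even_community = [25, 25, 25, 25]   # four equally abundant taxa
skewed_community = [97, 1, 1, 1]    # one taxon dominates

print(shannon_index(even_community))
print(shannon_index(skewed_community))
```

An evenly distributed community yields a higher index than a community dominated by a single taxon, which is the sense in which a larger Shannon index is read as higher diversity.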
Species composition and difference analysis. As shown in the species community bar charts (Fig. 1a, b), Archaea and Bacteria remained after the removal of Eukaryota and viruses. Most of the microorganisms detected in the soil and water samples from the DHM were Bacteria (98.21% and 98.61%, respectively), and only a small proportion were Archaea (1.60% and 1.18%, respectively). The 16S rRNA gene analysis identified the dominant phyla in the DHM water samples as Proteobacteria (54.1%), Actinobacteria (13.0%), and Bacteroidetes (26.7%), whereas the dominant phyla in the DHM soil samples were Proteobacteria (47.9%), Actinobacteria (7.04%), and Chloroflexi (16.5%) (Fig. 1a). The shotgun metagenomic analysis identified the dominant phyla in the water samples as Proteobacteria (65.55%), Actinobacteria (11.99%), and Bacteroidetes (16.44%), and those in the soil samples as Proteobacteria (62.32%), Actinobacteria (9.31%), and Chloroflexi (7.65%) (Fig. 1b). Bacterial community composition and structure in soil and water are affected by different environmental factors. As a result, the abundances of bacterial communities in water differed from those in soil, and the major reason for this discrepancy is probably the difference between the two habitats.
Species heatmap clustering is based on similarities in relative abundance among species and samples, and aggregates species with high abundance and species with low abundance into separate blocks [14]. The bacterial community heatmaps of the 30 most abundant species, based on 16S rRNA gene sequences and shotgun metagenomic sequences, showed that certain species that were highly abundant in the water samples, such as Pseudarcicella, were only moderately abundant or uncommon in the soil samples. However, other species, such as the highly abundant unclassified Gemmatimonadetes, had similar levels of relative abundance in both soil and water (Fig. 1c, d).
As shown in the bar chart of species differences based on 16S rRNA gene analysis, the abundance of the phylum Bacteroidetes differed significantly between soil and water (P ≤ 0.0001) (Fig. 2a). Shotgun metagenomic analysis, however, identified additional taxa that differed significantly between soil and water (P ≤ 0.0001), as shown in the bar plot of Welch’s t-test (Fig. 2b), including Bacteroidetes, Firmicutes, Cyanobacteria, Planctomycetes, Acidobacteria, Spirochaetes, Thaumarchaeota, and unclassified bacteria. Because the 16S rRNA gene is amplified unequally across species, the 16S-based results may be biased. In contrast, shotgun metagenomic sequencing covers a broad microbial community and generates a very large number of reads of various lengths, and therefore captures more significantly different species and more genetic information.
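As an illustration of the test named above, the following minimal sketch computes the Welch's t statistic and the Welch-Satterthwaite degrees of freedom for two groups with unequal variances. The function name and the abundance values are hypothetical and are not data from this study, and the sketch stops short of the P value, which requires the Student's t distribution (available, for example, through SciPy's `scipy.stats.ttest_ind` with `equal_var=False`).

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    # Squared standard errors of the two group means (sample variance / n)
    va = variance(a) / na
    vb = variance(b) / nb
    if va + vb == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical relative abundances (%) of one phylum in replicate samples
# (illustrative only, not data from this study)
soil = [1.0, 2.0, 3.0]
water = [2.0, 4.0, 6.0]

t_stat, dof = welch_t(soil, water)
print(t_stat, dof)
```

Unlike Student's t-test, this form does not pool the two variances, which is why it suits comparisons between habitats whose replicate variability may differ.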
Network analysis. The co-occurrence network based on 16S rRNA gene analysis indicated that the phyla Chloroflexi, Acidobacteria, Planctomycetes, Proteobacteria, Gemmatimonadetes, Cyanobacteria, Ignavibacteriae, Bacteroidetes, Actinobacteria, Firmicutes, and Verrucomicrobia co-occurred in the soil and water samples (Fig. 3a). Shotgun metagenomic analysis identified additional taxa that co-occurred in the soil and water (Fig. 3b). The species correlation network of the top 35 phyla indicated that most taxa were correlated with others (Fig. 4), including Euryarchaeota, Firmicutes, Verrucomicrobia, Chlorobi, and Tenericutes. Based on the clustering coefficient, these taxa play an important role in the species correlation network. Shotgun metagenomic sequencing not only elucidates species structure and systematic community evolution, but also supports the genetic analysis of functional metabolic networks within the microbial community.
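To make the clustering coefficient concrete, the following minimal sketch computes the local clustering coefficient of a node in an undirected network, that is, the fraction of pairs of its neighbors that are themselves connected. The function names and the edge list are hypothetical and are not taken from Fig. 4; the network software used in this study may define or weight the coefficient differently.

```python
from itertools import combinations

def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from edge pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def local_clustering(adj, node):
    """Fraction of neighbor pairs of `node` that are directly linked."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbors; reported as 0 here
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

# Hypothetical correlation edges between phyla (illustrative only)
edges = [
    ("Firmicutes", "Chlorobi"),
    ("Firmicutes", "Tenericutes"),
    ("Chlorobi", "Tenericutes"),
    ("Firmicutes", "Euryarchaeota"),
]
adj = build_adjacency(edges)
for phylum in sorted(adj):
    print(phylum, local_clustering(adj, phylum))
```

A node with a high coefficient sits in a tightly interconnected neighborhood, which is one way a taxon can be read as structurally important in a correlation network.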
Functional diversity analysis. The 16S rDNA functional predictions indicated that three main categories (metabolism, genetic information processing, and environmental information processing) were relatively abundant in both the soil and the water (Fig. 5a). In the shotgun metagenomic functional analysis, we annotated six types of functional genes, which were likewise enriched in the three main categories of metabolism, environmental information processing, and genetic information processing (Fig. 5b). Of these, metabolism represented more than 50% of all functional classifications at KEGG pathway level 1. This suggests that the microbiome of the DHM displays a relatively high level of metabolic activity and genetic stability.
Fig. 5 Functional prediction at KEGG pathway level 1 orthologies. a 16S rDNA functional prediction; b shotgun metagenomic functional annotation.