Genome resolved metagenomics reveals dominance of Candidatus Mycoplasma in the gut microbiota across three salmonid species
We sampled gut content from three salmonid species from different environments. Two of the species are relevant for aquaculture: juvenile rainbow trout and adult Atlantic salmon. Juvenile rainbow trout were sampled in a land-based, freshwater recirculating aquaculture system (RAS) in northern Denmark, and Atlantic salmon were sampled from a commercial cohort in an open water net pen near Bergen, Norway. Thirdly, wild European whitefish were sampled from a freshwater lake in northern Norway to represent a phylogenetic outlier for comparison (Fig. 1a).
A total of 2.9 billion metagenomic reads were generated from 12 individuals. Of those, 297 million reads passed quality control and host filtering criteria. Estimates of saturation revealed sufficient sequencing depth for rainbow trout, whereas sequencing was not fully saturated for Atlantic salmon and European whitefish (Supplementary Fig. 1). These filtered reads represent the gut microbiome of the three hosts studied here and were used to perform three metagenomic co-assemblies (Supplementary Table 4). A combination of automatic and manual binning was applied to each co-assembly output, which resulted in one manually curated, low-redundancy and near-complete metagenome-assembled genome (MAG), related to Mycoplasma, from each host species (Table 1, Supplementary Fig. 2). Although sequencing depth was not fully saturated for Atlantic salmon and European whitefish, we do not suspect this to have a major impact on the Mycoplasma MAGs, since all of them were of high completeness (Table 1).
Metagenomic data from gut content of juvenile rainbow trout, adult Atlantic salmon, and European whitefish revealed a high relative abundance of Mycoplasma. The salmonid MAGs comprised x̅ = 69.44% (SD ± 11.45%) of the metagenomic reads in rainbow trout, x̅ = 72.98% (SD ± 13.07%) of the metagenomic reads in Atlantic salmon, and 58.03% of the metagenomic reads in European whitefish (Fig. 1b). Investigation of bacterial load in juvenile rainbow trout, using real-time PCR of the 16S rRNA gene, revealed an average cycle threshold (CT) value of 36.3 (SD ± 2.8), whereas water samples from the RAS resulted in an average CT value of 27.6 (SD ± 2.8), clearly indicating a lower bacterial load in the intestinal environment than in the surrounding water. The sequencing effort required to obtain any microbial data, together with the qPCR results of the 16S rRNA gene, indicates low bacterial biomass in juvenile rainbow trout gut content samples21. We hypothesise that this reflects the young age of the rainbow trout, a relatively sterile RAS environment, and bacteria that are still in a phase of initial colonisation of the gut (Supplementary Fig. 1 and Supplementary Table 1).
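As a rough illustration of what the CT gap implies, the difference between gut and water samples can be converted into an approximate fold difference in 16S rRNA gene template. This is a back-of-the-envelope sketch rather than the analysis performed in the study: it assumes perfect amplification efficiency (one doubling per cycle) and comparable template input for both sample types, neither of which is stated in the text. The function name `fold_difference` is hypothetical.

```python
# Rough conversion of a qPCR cycle-threshold (CT) gap into a fold difference.
# Assumptions (not from the source): 100% amplification efficiency, i.e. the
# template doubles every cycle, and comparable input for both sample types.

def fold_difference(ct_low_load: float, ct_high_load: float,
                    efficiency: float = 1.0) -> float:
    """Return how many times more template the high-load sample contains."""
    delta_ct = ct_low_load - ct_high_load
    return (1.0 + efficiency) ** delta_ct

gut_ct = 36.3    # mean CT, juvenile rainbow trout gut content (from the text)
water_ct = 27.6  # mean CT, RAS water samples (from the text)

fold = fold_difference(gut_ct, water_ct)
print(f"delta CT = {gut_ct - water_ct:.1f}, approximate fold difference = {fold:.0f}")
```

Under these assumptions, a gap of 8.7 cycles corresponds to a difference on the order of several hundred fold. With a lower real-world efficiency the estimate would shrink, so the figure is best read as an order of magnitude.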
The GC content of the salmonid MAGs was at the lower end of the range of the other Mycoplasma species included (Table 1, Fig. 3). Previous investigations of Tenericutes genomes reported that GC content varied from 20 to 70%, indicating high plasticity22. To investigate strain variance within each sample, we performed an initial analysis of average nucleotide identity (ANI) clustering and gene clusters from each sample. This analysis revealed identical MAGs within each respective host and clustered the individual salmonid MAGs according to host species (Supplementary Fig. 3a).
We used fluorescence microscopy to visualize the presence of bacteria on the distal gut epithelial surface, as this would further indicate a functional adaptation of the bacteria to the intestinal environment of juvenile rainbow trout. The identity of the bacteria could not be established using specific probes, which is likely due to the low bacterial biomass impeding the generation of clear signals. However, our investigation did reveal a clear DAPI-based signal from bacterial cells in close contact with the rainbow trout epithelial surface (Supplementary Fig. 4a-d). We hypothesise that these are likely to include Mycoplasma cells, based on the observation that >50% of all microbial reads belonged to Mycoplasma (Fig. 1b).
Phylogenomics reveal new candidate species of Mycoplasma in salmonids
We performed a phylogenomic analysis to place the salmonid MAGs within Mycoplasma. To do so, we generated a database of protein clusters across 44 genomes of Mycoplasma isolated from multiple tissue types and host species (Supplementary Table 4). Gene annotation using Hidden Markov Models (HMMs) resulted in 2,646 hits, and data filtering for phylogenomics led to a final data set of 55 single-copy core gene alignments from 50 genomes (including representatives of Ureaplasma and Bacillus).
Phylogenetic analyses consistently recovered several highly supported groupings in the genus Mycoplasma and led to a robust placement of our three salmonid MAGs. Two of the salmonid MAGs, recovered from rainbow trout and Atlantic salmon (fish subfamily Salmoninae), clustered together with an average nucleotide identity (ANI) of 96.1% and formed a monophyletic group with the grouping of Mycoplasma penetrans and Mycoplasma iowae (Fig. 2 and Supplementary Fig. 5a), the latter being commonly found in the intestine of turkey (Meleagris gallopavo)23. The two Salmoninae MAGs likely belong to the same species, as suggested by their short terminal branches and uniquely strong branch support according to multiple metrics. The fact that their ANI relative to M. penetrans and M. iowae was <80% further indicated that the two MAGs correspond to a new Mycoplasma species (Fig. 3). The close relationship of these salmonid MAGs also indicates that they have a close ecological association with Salmoninae, rather than originating from the environment surrounding the host (Fig. 2 and Supplementary Fig. 5a).
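The species-level reasoning above can be sketched as a simple single-linkage grouping on pairwise ANI. This is an illustration only, not the procedure used in the study. It assumes the widely used convention of roughly 95% ANI as a species boundary, which the text does not state, and it uses a placeholder value of 79.0 for every pair that is either reported only as "<80%" or not reported at all. Only the 96.1% value for the two Salmoninae MAGs comes from the text, where they are later abbreviated MSM and MSS.

```python
# Group genomes into putative species by single-linkage clustering on ANI.
# Assumptions (not from the source): a 95% ANI species cutoff, and a
# placeholder ANI of 79.0 for all pairs without an exact reported value.
from itertools import combinations

SPECIES_ANI_CUTOFF = 95.0  # common convention, assumed here
PLACEHOLDER_ANI = 79.0     # hypothetical stand-in for "<80%" or unreported

genomes = ["MSM", "MSS", "ML", "M. penetrans", "M. iowae", "M. mobile"]

# The only exact pairwise value given in the text.
reported_ani = {frozenset(("MSM", "MSS")): 96.1}

def ani(a, b):
    """Look up pairwise ANI, falling back to the placeholder value."""
    return reported_ani.get(frozenset((a, b)), PLACEHOLDER_ANI)

def cluster_species(names, cutoff):
    """Union-find grouping: genomes linked by ANI >= cutoff share a cluster."""
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path compression
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if ani(a, b) >= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values(), key=len, reverse=True)

clusters = cluster_species(genomes, SPECIES_ANI_CUTOFF)
for members in clusters:
    print(members)
```

With these inputs the two Salmoninae MAGs fall into one group and every other genome stays on its own, which mirrors the interpretation given in the text.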
The third salmonid MAG, characterised from the European whitefish (subfamily Coregoninae), was not identified as a close relative of the MAGs from Salmoninae (Fig. 2; Supplementary Fig. 5a). Rather, this salmonid MAG appears to be distinct from any of the reference Mycoplasma species, with the closest relative found to be Mycoplasma mobile (ANI <80%), a pathogen isolated from the gills of tench (Tinca tinca) (Fig. 3).
Overall, our analysis indicated that the salmonid MAGs we characterised represent two new Mycoplasma species. We tentatively name them according to their respective hosts: ‘Candidatus’ M. salmoninae and ‘Candidatus’ M. lavaretus. Furthermore, we divided ‘Candidatus’ M. salmoninae into two biotypes according to their salmonid host, rainbow trout and Atlantic salmon. This resulted in three designations: ‘Candidatus’ M. salmoninae mykiss (MSM), ‘Candidatus’ M. salmoninae salar (MSS), and ‘Candidatus’ M. lavaretus (ML).
Beyond the salmonid MAGs, the phylogenomic analyses also recovered several clades of the genus Mycoplasma with high confidence according to most metrics of phylogenetic branch support (Fig. 2 and Supplementary Fig. 3a-b). Two of the most divergent and well-supported clades contain species of distinct veterinary importance: species in one clade primarily infect ruminants (the mycoides grouping, Fig. 2 and Supplementary Fig. 5a), and a second clade harbours species infecting a range of hosts primarily kept as livestock and pets (the hemoplasma grouping). Another clade with strong support harbours species that infect the respiratory tracts of pigs and cattle (Clade III), and species in yet another group have been associated primarily with respiratory infections in humans and chickens (the pneumonia grouping). One metric of branch support, internode certainty for sites (sIC), was low for most branches, reflecting the fast accumulation of substitutions in the genus. A fast evolutionary rate relative to the taxonomic scale of the data is also reflected in the saturation of substitutions found in 12 genes (consequently excluded from phylogenetic analyses), and in the long estimated terminal branches across samples.
An open pangenome with diverse sets of functions is in accordance with niche adaptations of Mycoplasma
We performed a comparative analysis to place the salmonid MAGs within Mycoplasma with respect to their gene content and gene functions. Our Mycoplasma pangenome included 37,158 open reading frames (ORFs) from the 44 reference Mycoplasma genomes, two Ureaplasma genomes and the three salmonid MAGs, which grouped into a total of 18,021 gene clusters. Surprisingly, the single-copy core genes present in all Mycoplasma and Ureaplasma genomes comprised only 10 gene clusters and 499 ORFs, or 1.34% of the total ORFs of the pangenome (Fig. 3). Singleton gene clusters (i.e., gene clusters present in only a single genome) made up 62.8% of all gene clusters. We investigated the openness of the pangenome using Heaps’ law, which yielded α = 0.281 and confirmed an open pangenome24. This indicates specific adaptations through accessory genes and gene losses, which could derive from niche adaptations such as specific host environments and symbiosis (Fig. 3).
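For readers unfamiliar with the Heaps' law criterion, the sketch below shows one common way the exponent is estimated. In the usual pangenome formulation, the number of new gene clusters found when the N-th genome is added is modelled as n(N) = κ · N^(−α), and α < 1 is taken to indicate an open pangenome. The code fits α by ordinary least squares on log-transformed values. The exact fitting procedure used in the study is not given in the text, so this is an assumption. The input data are synthetic, generated from the reported α = 0.281 with a hypothetical κ, purely to show that the fit recovers the exponent.

```python
# Estimate the Heaps' law exponent alpha from gene-discovery counts.
# Model (standard pangenome form): n(N) = kappa * N ** (-alpha), where n is
# the number of new gene clusters contributed by the N-th genome.
# Assumption: log-log least squares; the study's own fitting method is not
# described in the text. All data below are synthetic.
import math

def fit_heaps_alpha(genome_counts, new_gene_counts):
    """Fit log(n) = log(kappa) - alpha * log(N); return (alpha, kappa)."""
    xs = [math.log(n) for n in genome_counts]
    ys = [math.log(g) for g in new_gene_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)

REPORTED_ALPHA = 0.281      # from the text
HYPOTHETICAL_KAPPA = 700.0  # invented scale factor, for illustration only

genome_counts = list(range(2, 50))  # 2nd to 49th genome added
new_gene_counts = [HYPOTHETICAL_KAPPA * n ** (-REPORTED_ALPHA)
                   for n in genome_counts]

alpha, kappa = fit_heaps_alpha(genome_counts, new_gene_counts)
is_open = alpha < 1.0  # alpha below 1 means new genes keep accumulating
print(f"alpha = {alpha:.3f}, open pangenome: {is_open}")
```

In practice the discovery counts would be averaged over many random orderings of the genomes, since a single ordering gives a noisy curve.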
Comparison of ANI, environmental relation, and host group of Mycoplasma revealed that Mycoplasma genomes cluster not only according to phylogeny, but also according to host origin and tissue type, further emphasising niche-specific adaptations (Fig. 3 and Supplementary Fig. 2a).
Metabolic reconstruction of salmonid MAGs of Mycoplasma suggests adaptation to host environment
Using KEGG, we were able to annotate 59.8%, 49.4% and 55.1% of the genes for MSM, MSS, and ML, respectively. Singletons in particular lacked KEGG annotation, indicating that novel functions are yet to be described for many genes in these MAGs (Supplementary Fig. 2a).
Comparison of shared KEGG annotations among MSM, MSS, ML, and their nearest relatives M. mobile 163K and M. iowae 695 revealed 247 shared KEGG functions among the genomes (Fig. 2 and Supplementary Fig. 3b). Investigation of the KEGG annotations present revealed that fermentation of sugars through glycolysis appeared to be the main method of ATP production in MSM, MSS, and ML. As in many other Mycoplasma species, the genomes are characterised by reduced metabolic functionality, as all of the genomes lack general functions such as the citric acid cycle. Together, these findings are in line with conserved adaptations to, and dependence on, the host gastrointestinal environments across the salmonid-related MAGs and their nearest relatives.
Unravelling the functions of the salmonid MAGs revealed several functions that are putatively beneficial for salmonids, including thiamine (B1) biosynthesis25, riboflavin (B2) biosynthesis26, and polyamine metabolism27. Interestingly, we found a complete pathway for isoprenoid biosynthesis via the non-mevalonate (MEP) pathway in two of the salmonid MAGs, MSM and MSS (Supplementary Fig. 6 and Supplementary Fig. 7). The MEP pathway is rarely found in Mycoplasma, except for the intestine-associated M. iowae and M. penetrans, the sister group to MSM and MSS28,29. We hypothesise that this pathway reduces the need to obtain isoprenoid precursors from the host and represents an adaptation towards intestinal environments22.
In brief, our genetic findings are in accordance with a model where Mycoplasma is functionally adapted to the environment in the gut of salmonids. In all three salmonid MAGs we found uvrABCD, the global genome nucleotide excision repair system (GG-NER). GG-NER is known to protect bacteria against bile salts in gastrointestinal environments30. We found several complete defence systems across the salmonid MAGs (Supplementary Fig. 6), including the stringent response, which is known to react to multiple stress conditions, including amino acid starvation. We found evidence of complete subsystems for lipoic acid metabolism in genomes of clade VI, including MSM, MSS, and M. iowae 695. Lipoic acid metabolism is known to be important in the oxidative stress response, in agreement with an adaptation against oxidative stress in the gut (Supplementary Fig. 6 and Supplementary Fig. 7). Furthermore, we found the prtC gene in all genomes of clade VI, including MSM and MSS. This gene encodes a putative collagenase, which is responsible for mucus degradation in Helicobacter pylori (Supplementary Table 6). The presence of prtC indicates that MSS and MSM are able to live in gastrointestinal environments by facilitating degradation of mucus in the intestine. Interestingly, we found genetic evidence for a cellobiose- and chitobiose-degrading complex, known as the cellulosome and encoded by celABC, in ML, MSM, and MSS. The closest homologues of celABC found in ML, MSM, and MSS were found in M. iowae 695, with identities ranging from 58.3% to 64.3%. The cellulosome is responsible for degrading complex polymers, like cellulose, hemicellulose, and chitin31, indicating that the intestine-related salmonid MAGs have some putative ability to degrade long-chain polymers in the gut, possibly originating from host mucus layers or host feed (Supplementary Table 6).
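Presence of a multi-gene system such as uvrABCD or celABC is typically judged by checking that every required gene appears in a genome's annotation. The sketch below illustrates that check. It is not the pipeline used in the study: the annotation sets are hypothetical and heavily truncated, chosen only to agree with the presence calls reported above, and `toy_incomplete` is an invented genome included to show the incomplete case.

```python
# Check whether multi-gene systems are fully present in genome annotations.
# The gene names come from the text; the annotation sets below are
# hypothetical, truncated stand-ins and not the real MAG annotations.

GENE_SETS = {
    "GG-NER (uvrABCD)": {"uvrA", "uvrB", "uvrC", "uvrD"},
    "celABC": {"celA", "celB", "celC"},
}

annotations = {
    "MSM": {"uvrA", "uvrB", "uvrC", "uvrD", "celA", "celB", "celC"},
    "MSS": {"uvrA", "uvrB", "uvrC", "uvrD", "celA", "celB", "celC"},
    "ML": {"uvrA", "uvrB", "uvrC", "uvrD", "celA", "celB", "celC"},
    "toy_incomplete": {"uvrA", "uvrB", "celA"},  # invented example
}

def completeness(annotated_genes, required_genes):
    """Fraction of the required genes found in the annotation."""
    return len(required_genes & annotated_genes) / len(required_genes)

def missing_genes(annotated_genes, required_genes):
    """Sorted list of required genes absent from the annotation."""
    return sorted(required_genes - annotated_genes)

report = {
    genome: {name: completeness(genes, required)
             for name, required in GENE_SETS.items()}
    for genome, genes in annotations.items()
}

for genome, scores in report.items():
    for name, score in scores.items():
        absent = missing_genes(annotations[genome], GENE_SETS[name])
        print(f"{genome:15s} {name:18s} {score:.2f} missing={absent}")
```

A real analysis would match on orthology identifiers rather than gene symbols, since symbols are not assigned consistently across annotation tools.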
All three salmonid MAGs lack oligosaccharide ABC transporters, which are found in other Mycoplasma genomes, indicating that salmonid Mycoplasmas rely on phosphotransferase systems (PTS), like celABC (Supplementary Fig. 6 and Supplementary Fig. 7). This suggests that the main sources of energy absorbed by the Mycoplasmas from the gastrointestinal tract of their teleost hosts consist of long-chain polymers, fatty acids, lipoproteins, and proteins.
Though the molecular basis of Mycoplasma pathogenicity remains largely elusive, we investigated the presence of Mycoplasma-related pathogenicity factors, including traG/traE32, glpF33, katE34, oppA, mgpA/mgpC35, virulence factor BrkB, toxins, antitoxins, large membrane proteins (LMPs), and adhesion-related proteins. We detected none of the surveyed putative virulence factors in MSM or MSS. In ML, we found three gene clusters with virulence factors, namely virulence factor BrkB, an antitoxin, and glpF, indicating that ML still possesses some level of pathogenic potential, whereas we found no evidence for pathogenicity of MSS and MSM to their hosts (Supplementary Fig. 8).
Functional enrichment analysis suggests that intestinal related Mycoplasma species are relying on amino acid synthesis, isoprenoid synthesis, and an antioxidative protective system
We performed a functional enrichment analysis by reconstructing metabolic pathways specific for Mycoplasma with RAST36. This analysis revealed 641, 667, and 676 DNA features, including protein encoding genes and RNA coding genes, in MSM, MSS, and ML, respectively. Our pathway-based comparison among Mycoplasma genomes revealed that Mycoplasma species have a broad range of different functionalities (Supplementary Table 5), which fits the high dissimilarity of the phylogenetic and pangenomic analyses and the hypothesised host adaptation (Fig. 3).
Interestingly, we found a significant enrichment of the subsystems corresponding to arginine biosynthesis in Mycoplasma species and MAGs associated with intestinal environments, including the three novel MAGs described here (87.5%), compared to those in other environments (27.3%) (Fig. 4a-b). Our enrichment analysis also confirmed a higher prevalence of genes encoding the MEP pathway in intestinal environments (Fig. 4b). Lastly, our analysis revealed that glutathione peroxidase was significantly over-represented in Mycoplasma species associated with intestinal environments, indicating that antioxidative protective systems have a putative defensive role in intestine-related Mycoplasma (Fig. 4b and Supplementary Table 7).
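One common way to test an enrichment of this kind is a one-sided Fisher's exact test on a 2×2 table of genomes with and without the function in each environment. The text does not say which test was used, and it reports only percentages, so the sketch below is an illustration under stated assumptions. The counts (7 of 8 intestinal genomes, 9 of 33 others) are hypothetical values chosen only because they reproduce the reported 87.5% and 27.3%. They are not the study's counts.

```python
# One-sided Fisher's exact test for enrichment of a function in one group.
# Assumptions (not from the source): Fisher's exact test as the method, and
# hypothetical counts that merely reproduce the reported percentages.
from math import comb

def fisher_exact_greater(a, b, c, d):
    """P(X >= a) under the hypergeometric null for the table [[a, b], [c, d]].

    a: group 1 genomes with the function, b: group 1 without,
    c: group 2 genomes with the function, d: group 2 without.
    """
    group1_size = a + b
    with_function = a + c
    total = a + b + c + d
    denominator = comb(total, group1_size)
    p_value = 0.0
    for k in range(a, min(group1_size, with_function) + 1):
        ways = comb(with_function, k) * comb(total - with_function, group1_size - k)
        p_value += ways / denominator
    return p_value

# Hypothetical counts: 7/8 = 87.5% and 9/33 = 27.3% (rounded).
intestinal_with, intestinal_without = 7, 1
other_with, other_without = 9, 24

p_value = fisher_exact_greater(intestinal_with, intestinal_without,
                               other_with, other_without)
print(f"one-sided p = {p_value:.4f}")
```

The exact test suits the small group sizes involved, where a chi-squared approximation would be unreliable. When many subsystems are tested, as in a pathway-wide screen, the resulting p-values would also need correction for multiple testing.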