Phylogeny of newly constructed genomes and distribution of Gemmatimonadota
Gemmatimonadota is monophyletic with the Fibrobacterota, Chlorobi, and Bacteroidota (FCB) superphylum22,23. In this study, 495 Gemmatimonadota MAGs with completeness > 50%, and < 10% single gene duplications (based on CheckM24) were reconstructed from coastal sediments in the Bohai Sea, China (BS, 427 MAGs), coastal sediments in San Francisco Bay (SFB, 26 MAGs), U.S., hydrothermal sediments in Guaymas Basin (GB, 31 MAGs), Gulf of California, Mexico, biomat and oxide samples of Longqi hydrothermal vents in the Indian Ocean (IO, seven MAGs), and cold-seep sediments in the South China Sea (SCS, four MAGs), China (Supplementary Table 1). These bacteria represent < 5% relative abundance in the metagenomic assembled community in deep sea environments (GB, SCS, and IO) (see Methods, Supplementary Fig. 1). However, they are more abundant in coastal environments, specifically up to ~ 11% and 16% in SFB and BS, respectively. This may correlate with their ability to catalyze denitrification, which is known to be a dominant process in SFB and BS sediments25. Interestingly, Gemmatimonadota relative abundance increased with depth at all three sampling stations in BS, while it decreased with depth in GB (Supplementary Fig. 1).
The 495 MAGs were classified as Gemmatimonadota in Genome Taxonomy Database (GTDB) (Release 202, Supplementary Table 2), and this was confirmed by a maximum likelihood tree based on a concatenated alignment of 120 bacterial marker genes defined in GTDB-Tk (Supplementary Fig. 2). Glassbacteria, a phylum curated in NCBI, was classified as a class within Gemmatimonadota in GTDB. All Gemmatimonadetes genomes in NCBI are classified as Gemmatimonadetes class within the Gemmatimonadota phylum in GTDB. Based on the phylogenetic tree, these Gemmatimonadota MAGs were split into four groups, Group1, Group2, Group3, and Group4. These correspond to the JACCXV01, Gemmatimonadales, KS3-K002, and Longimicrobiales orders in GTDB. The genome sizes of the 495 MAGs range from 1.43 to 9.92 Mbp (average 3.68 ± 1.15 Mbp) (Supplementary Table 2). The wide range of genome sizes is likely associated with their evolution and ecology, rather than genome completeness. In support of this, different genome size ranges are associated with distinct habitats. For example, Group1 has the smallest average genome size and was only recovered from deep layers (30–62 cm) in BS samples (Supplementary Fig. 1). Group3 has a wider range of genome sizes than Group1 because of their prevalent distribution in different layers in BS samples, some GB samples and Great Barrier Reef samples. Group2 and Group4 are distributed in diverse marine and terrestrial habitats and have a wider range of genome sizes than the other two groups, which were only recovered from marine environments (Supplementary Fig. 1).
We selected 245 high-quality MAGs with completeness > 80% and contamination < 5%, ranging from 2.24 to 6.67 Mbp (average 3.91 ± 0.99 Mbp), for phylogenomic analyses (Supplementary Table 2). To further confirm the taxonomy of these MAGs, we constructed two phylogenetic trees (Fig. 1 and Supplementary Fig. 3) of the 245 high-quality MAGs and 211 reference genomes. The phylogenies were constructed based on the concatenated protein alignment of 120 single-copy markers in GTDB (Fig. 1) and 37 concatenated ribosomal protein encoding genes identified using PhyloSift (Supplementary Fig. 3; see Methods). Both trees supported the classification of the four groups, which contained 10, 85, 100, and 50 MAGs for Group1, Group2, Group3, and Group4, respectively. However, the 37-marker gene tree showed that Group1 was phylogenetically closer to Glassbacteria than the other three groups (Supplementary Fig. 3).
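The two quality tiers used in this section (> 50% completeness with < 10% duplication for all MAGs, and > 80% completeness with < 5% contamination for the high-quality set) can be illustrated with a minimal Python sketch. The record layout, bin names, and values below are hypothetical and are not taken from Supplementary Table 2; the sketch only shows how strict thresholds of this kind partition a set of bins.

```python
from dataclasses import dataclass


@dataclass
class Mag:
    name: str             # hypothetical bin identifier
    completeness: float   # percent, e.g. as estimated by CheckM
    contamination: float  # percent
    size_mbp: float


def passes(mag, min_completeness, max_contamination):
    """Strict inequalities, matching the '>' and '<' used in the text."""
    return (mag.completeness > min_completeness
            and mag.contamination < max_contamination)


def medium_or_better(mags):
    return [m for m in mags if passes(m, 50.0, 10.0)]


def high_quality(mags):
    return [m for m in mags if passes(m, 80.0, 5.0)]


# Hypothetical records for illustration only.
mags = [
    Mag("bin_a", 92.3, 1.8, 3.9),
    Mag("bin_b", 61.0, 4.2, 2.1),
    Mag("bin_c", 85.5, 6.7, 5.0),
    Mag("bin_d", 80.0, 0.5, 4.4),  # exactly 80% does not pass '> 80%'
]

all_kept = medium_or_better(mags)
selected = high_quality(mags)
print([m.name for m in all_kept])
print([m.name for m in selected])
```

Because the comparisons are strict, a bin sitting exactly on a threshold is excluded from the high-quality set while still counting toward the broader set.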
Average amino acid identity (AAI) analysis revealed that Gemmatimonadota MAGs are distinct from other phylogenetically related phyla (at most 45.9% identity to Fibrobacterota and 49.8% identity to Glassbacteria) (Supplementary Table 3 and Supplementary Fig. 4). AAI supported the classification of the four groups, which share a maximum of 59.7% AAI with each other (Supplementary Table 4). The 16S rRNA gene phylogeny we constructed here is generally consistent with those previously reported10 and with the 37-marker ribosomal protein and 120-marker phylogenies from this study (Fig. 1). 16S rRNA genes in Group1 are classified as the class AKAU4049 (JACCXV01 order in GTDB) (Supplementary Fig. 5). Group2 and Group3 16S rRNA genes are classified as classes Gemmatimonadaceae and PAUC43f marine benthic group, which correspond to GTDB-Tk orders Gemmatimonadales and KS3-K002, respectively. Group4 16S rRNA genes belong to the BD2-11 terrestrial group (Supplementary Fig. 5). However, based on our 37-marker ribosomal protein and 120-marker phylogenies, we suggest that Longimicrobiaceae, S0134 terrestrial group, and BD2-11 are within the Longimicrobiales order (Group4, Fig. 1).
The recovery of Gemmatimonadota MAGs in this study further confirms their global distribution 10,26–33 (Fig. 2). As of Nov. 11th, 2021, 324 Gemmatimonadota genomes were available in NCBI. Of these, 151 have been recovered from marine environments, including the Pacific Ocean, Atlantic Ocean, and Indian Ocean; however, most Gemmatimonadota to date have been recovered from terrestrial environments (173 of 324 genomes). Group1 and Group3 are primarily composed of MAGs recovered in this study.
Protein-level comparison across the Gemmatimonadota
To resolve how these bacteria compare at the predicted protein level, we clustered all of the genomes based on their Pfam profiles (see Methods). This approach has proven to be an effective way to identify guilds of bacteria that share common ecological capabilities34. The clustering revealed that these bacteria fall into seven distinct protein groups. Group1 forms a unique Pfam cluster, while Group2 is divided into three clusters (Fig. 1). Half of the Group2 MAGs had previously been deposited in the database and were mainly recovered from terrestrial environments. However, the Group2 MAGs recovered in this study were phylogenetically distinct from those recovered from terrestrial environments (Fig. 1). Moreover, the distribution of metabolic proteins, based on presence/absence of protein families, in these newly recovered Group2 MAGs was different from that of the MAGs curated in the database (Fig. 1). The interlaced Pfam clusters in Group4 (Fig. 1), together with their diverse habitats, suggest more frequent horizontal gene transfer in Group4 than in the other three groups. The worldwide distribution of this phylum suggests that its members are overlooked and ecologically significant.
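The idea of grouping genomes by shared protein-family content can be sketched in a few lines of Python. The sketch below is not the procedure described in Methods; it assumes a Jaccard distance on presence/absence sets and a simple single-linkage grouping at a hypothetical distance cutoff, and the genome names and family identifiers are arbitrary placeholders.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of protein-family identifiers."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_by_profile(profiles, max_distance):
    """Single-linkage grouping: two genomes end up in the same cluster if
    they are connected by a chain of pairs with distance <= max_distance."""
    names = sorted(profiles)
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard_distance(profiles[a], profiles[b]) <= max_distance:
                parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(clusters.values())


# Placeholder presence/absence profiles (identifiers are illustrative only).
profiles = {
    "mag_1": {"PF00001", "PF00002", "PF00003"},
    "mag_2": {"PF00001", "PF00002", "PF00004"},
    "mag_3": {"PF00010", "PF00011"},
    "mag_4": {"PF00010", "PF00011", "PF00012"},
}

clusters = cluster_by_profile(profiles, max_distance=0.5)
print(clusters)
```

With real data the profiles would contain thousands of families per genome, and the cutoff (0.5 here) would need to be chosen empirically; a hierarchical method with a dendrogram is a common alternative to a fixed cutoff.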
Metabolic flexibility of Gemmatimonadota enables their wide distribution in marine environments
To understand the metabolic potential of the 245 high-quality MAGs (completeness > 80% and contamination < 5%, Supplementary Table 2) we compared their predicted proteins against six databases (see Methods). We determined that metabolic pathways for polysaccharide and detrital protein degradation, as well as sulfur, nitrogen, and iron utilization are common in these bacteria.
Large molecule organic matter degradation. Gemmatimonadota MAGs encode over 21,000 potential carbohydrate-active enzymes (CAZymes) classified as glycoside hydrolases (GHs), carbohydrate esterases (CEs), polysaccharide lyases (PLs), glycoside transferases (GTs), and carbohydrate-binding modules (CBMs) (Supplementary Tables 5 and 6, and Supplementary Fig. 6). Among ~ 11,100 glycoside hydrolases (GHs), carbohydrate esterases (CEs), and polysaccharide lyases (PL), 115 are predicted to be extracellular, contributing to the degradation of polysaccharides outside the cell. For example, extracellular CAZyme genes belonging to families/subfamilies CE1, GH16_3, and GH18, contributing to the degradation of xylan, chitin, and laminarin, were commonly identified in Groups 2, 3, and 4. Several MAGs in Group2 and Group4 encode multiple types of extracellular CAZymes (CE1 and GH16_3 in Group2; GH13_32 and GH16_3 in Group4) for the degradation of different substrates, e.g., pectin and laminarin (Supplementary Fig. 7). One MAG in Group3 (M8-44_Bin_110) encoded five types of extracellular CAZyme genes (GH16_3, GH30_1, GH136, PL31, and GH0), enabling them to degrade both complex and relatively simple carbohydrates, e.g., degrading laminarin by laminarinase (GH16) or releasing lacto-N-biose from oligosaccharide by lacto-N-biosidase (GH136). The released monosaccharides could benefit the community as a whole by supplying organic matter to other microorganisms.
As with CAZyme genes, Group1 encodes the lowest diversity of peptidases. For example, CAZyme genes annotated in over 50% of the MAGs of the other three groups are missing in Group1, such as CE14, GH13_9, GH57, GH74, and GH142 (Supplementary Tables 7 and 8, and Supplementary Fig. 6). Of over 41,000 identified peptidase sequences, 1,150 are predicted to be extracellular, suggesting that detrital proteins are degraded outside the cell and later taken up for consumption. Most of the MAGs recovered here have multiple extracellular peptidase genes (Supplementary Fig. 8). For example, they have genes predicted to produce extracellular peptidases belonging to family M28 (aminopeptidase and carboxypeptidase)35 and S8 (serine endopeptidase subtilisin)36. These are nonspecific peptidases that release amino acids for assimilation or dissimilation (Supplementary Fig. 8). Family M4 peptidases, which are primarily secreted36, were identified across three groups (Group 2, 3, and 4) for degrading extracellular proteins and peptides. The wide distribution of extracellular peptidase genes in marine Gemmatimonadota suggests these bacteria are important players in the degradation of detrital proteins. Additionally, these marine Gemmatimonadota also encode key genes for the transport, activation, and cleavage of fatty acids through beta oxidation37 (Fig. 3). The capability of degrading different types of large molecules, especially through extracellular degradation that releases more readily degradable substrates, suggests that Gemmatimonadota may provide simple energy sources to support the entire microbial community.
Nitrogen, sulfur, and hydrogen cycling. Metabolic inference using Metagenomic Entropy Based Scores (MEBS)38 (see Methods) indicates Gemmatimonadota have pathways for nitrogen and sulfur utilization (Fig. 1, Supplementary Table 9). 203 MAGs are capable of incomplete denitrification, and encode genes for the reduction of nitrate, nitrite, and nitrous oxide (N2O), but not nitric oxide, as well as the oxidation of hydroxylamine (NH2OH) to nitric oxide (NO). Most MAGs (201/245) encode membrane-bound nitrate reductase (NarGHI) (Supplementary Fig. 9) and/or periplasmic nitrate reductase (NapAB) (Supplementary Fig. 10) suggesting Gemmatimonadota play a key role in nitrate reduction, the first step of denitrification. MAGs that encode NarG were commonly recovered from deep BS sediments (below 30 cm) and were rare at other depths. Some MAGs (40/201) encode both NarG and NapA, and these are predominantly from the BS (28–30 cm and 42–44 cm at M3, and 42–44 cm and 56–62 cm at M8). A phylogeny of NarG in Gemmatimonadota MAGs indicates that NarG is monophyletic, and thus may have been present in the last common ancestor of Gemmatimonadota (Supplementary Fig. 9). In contrast to the widespread presence of genes for nitrate reduction in Gemmatimonadota, dissimilatory nitrite reduction via NirK/S (62/245 MAGs) for NO production and NrfAH (8/245 MAGs) for ammonia production are less common in these bacteria (Supplementary Table 9). The 64 NirK/S genes occurred in all four groups from all the sampling sites, while NrfAH was mainly distributed in Group4 recovered from BS and SFB (Supplementary Table 9). For denitrification in BS and SFB, Gemmatimonadota likely relies on metabolic handoffs to complete denitrification, due to the lack of nitric oxide reductase, reducing nitric oxide to nitrous oxide. All Gemmatimonadota groups (125/245 MAGs) encode genes for periplasmic nitrous oxide reductase (NosZ), which reduces N2O to N2 (Supplementary Table 9). 
Phylogenetic analyses show that Gemmatimonadota NosZ are atypical type NosZ sequences (Supplementary Fig. 11), which are associated with microorganisms that are not complete denitrifiers39,40. N2O is a potent greenhouse gas and degrades ozone in the atmosphere41. The wide distribution of Gemmatimonadota suggests that they may have key roles in reducing N2O fluxes in marine environments42. Moreover, Gemmatimonadota have different transporters for small molecules, including nitrate, nitrite, and ammonium (Fig. 3). Gemmatimonadota has also been reported to hydrolyse urea as an energy source in wastewater treatment sludge43,44. However, we only identified urease (UreABC) from a single MAG in Group4 (M3-22_Bin_219), suggesting this is not important in marine environments. Collectively, our findings suggest Gemmatimonadota may play an important role in nitrogen cycling in marine sediments, especially in the coastal zones.
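The notion of incomplete denitrification described above can be made concrete with a small Python sketch that maps marker genes to pathway steps and reports which steps a genome is missing. The marker sets follow the genes named in the text (NarG/NapA, NirK/NirS, NosZ); the marker for nitric oxide reduction is not named in the text, so NorB is used here as an assumption. The example gene set is hypothetical and does not correspond to any specific MAG.

```python
# Ordered steps of denitrification and marker genes taken as evidence
# for each step (any one marker is treated as sufficient).
DENITRIFICATION_STEPS = [
    ("nitrate -> nitrite", {"NarG", "NapA"}),
    ("nitrite -> nitric oxide", {"NirK", "NirS"}),
    ("nitric oxide -> nitrous oxide", {"NorB"}),  # assumption: marker not named in the text
    ("nitrous oxide -> N2", {"NosZ"}),
]


def steps_present(genes):
    """Return the steps for which at least one marker gene is annotated."""
    return [step for step, markers in DENITRIFICATION_STEPS if markers & genes]


def steps_missing(genes):
    return [step for step, markers in DENITRIFICATION_STEPS if not markers & genes]


def classify(genes):
    present = steps_present(genes)
    if len(present) == len(DENITRIFICATION_STEPS):
        return "complete"
    if present:
        return "incomplete"
    return "none"


# Hypothetical annotation for one genome: nitrate, nitrite, and N2O
# reduction are encoded, but nitric oxide reduction is not.
example_genes = {"NarG", "NirK", "NosZ", "SQR"}

print(classify(example_genes))
print(steps_missing(example_genes))
```

A genome classified as incomplete in this way would depend on other community members to carry out the missing step, which is the metabolic handoff scenario suggested for nitric oxide reduction.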
Unlike denitrification genes that are prevalent in all Gemmatimonadota groups, sulfur cycling genes are limited to specific Gemmatimonadota groups (Fig. 1). A clade within Group3 (21/100 MAGs), which has a unique protein composition (cluster1 in Fig. 1), has gene clusters for sulfate reduction (including DsrAB, SAT, AprAB, and QmoABC). Some Group2 MAGs (16/85) also appear to be capable of reducing sulfate. These DsrAB sequences were mostly associated with MAGs recovered from deep sediments (below 30 cm) in BS, where high concentrations of sulfate were detected (> 22 mmol/L in pore water)15. DsrAB genes (Supplementary Figs. 12 and 13) do not appear to have been horizontally transferred from different phyla45, suggesting sulfite reduction may be an ancient function within Gemmatimonadota.
Three deep-branching groups consisting of five MAGs from SFB, four from BS, and one from BS in Group2 (10/85 MAGs) contain genes encoding the sulfhydrogenase I complex (HydADGB)46 for coupling sulfur reduction with H2 oxidation (Supplementary Table 9). However, the majority of Gemmatimonadota are capable of oxidizing different sulfur substrates, e.g., sulfide and sulfite. Specifically, 167/235 MAGs (excluding 10 Group1 MAGs) have sulfide:quinone oxidoreductase (SQR) for sulfide oxidation, and 14 of those 167 MAGs encode both DsrAB and SQR (Supplementary Table 9). Phylogenetic analysis indicates that these SQRs belong to the membrane-bound type I, type II, and type III SQRs (Supplementary Fig. 14). Interestingly, 194/245 MAGs have homologs to eukaryotic thiosulfate/3-mercaptopyruvate sulfurtransferase (TST)47, which could convert thiosulfate and cyanide to sulfite and thiocyanate. Group1 (8/10 MAGs) has homologs to eukaryotic sulfite oxidase (SUOX)48,49, a type of molybdopterin-dependent oxidoreductase, for sulfite oxidation with oxygen as the electron acceptor (Supplementary Table 9). However, Group1 was recovered from the deep layer of BS sediments (below 30 cm), suggesting that this sulfite oxidase may also use cytochrome c as the final electron acceptor49. Interestingly, all Group1 MAGs (10/10), as well as 2/85 Group2 and 5/100 Group3 MAGs recovered from deep layers of BS sediments (below 30 cm), encode methanethiol oxidase to aerobically oxidize methanethiol. Methanethiol is a key intermediate for global organosulfur compounds, e.g., dimethylsulfoniopropionate (DMSP) and dimethyl sulfide (DMS) cycling50,51. Moreover, 34 MAGs (25/85 Group2, 8/100 Group3, and 1/50 Group4) are predicted to produce DMS via methanethiol S-methyltransferase (MddA) from methylate L-methionine or methanethiol (MeSH) under oxic conditions52.
In addition, 117 MAGs (62/85 Group2, 42/100 Group3, and 13/50 Group4) encode genes for the large subunit of thiosulfate dehydrogenase (DoxD), which may convert thiosulfate to tetrathionate. Six MAGs (1/85 Group2, 1/100 Group3, and 4/50 Group4) also have genes for the catalytic subunit of tetrathionate reductase (TtrA), which may reduce tetrathionate to thiosulfate. Thus, Gemmatimonadota likely play important roles in a variety of intermediate steps in marine sulfur cycling (Fig. 3).
Hydrogen metabolism is crucial in energy cycling in marine environments53. Gemmatimonadota, except for Group1, have different types of [NiFe] hydrogenases (Supplementary Fig. 15) and a few [FeFe] hydrogenases (mainly in Group2) (Supplementary Fig. 16), suggesting hydrogen is coupled to metabolic pathways in these bacteria54,55. Hya hydrogenase (HyaABCD, [NiFe] type) was widely distributed in Group2, Group3, and Group4 (Supplementary Table 9). Hya hydrogenase is resistant to oxidative stress (e.g., superoxide and hydrogen peroxide), which may enable Gemmatimonadota to oxidize H2 in the presence of oxygen56,57. Subgroups of Group3 and Group2 also have the F420-non-reducing hydrogenase (MvhADG), belonging to Group 3c [NiFe] hydrogenase (Supplementary Fig. 17). This F420-non-reducing hydrogenase links with heterodisulfide reductase (HdrABC) by providing reducing equivalents without reacting with F420, i.e., transporting electrons using H2 as an electron donor58. Additionally, 10/100 Group3 MAGs have the HoxFHUY operon (Supplementary Fig. 17), a bidirectional [NiFe] hydrogenase mainly described in Cyanobacteria. The Hox operon serves as a regulator for maintaining a proper redox state in the cell59, which could be important for the metabolic versatility of Gemmatimonadota.
Iron, mercury, and arsenic utilization. Microbially mediated iron cycling has been linked with many crucial marine processes, such as carbon storage, greenhouse gas emission, and primary production in the ocean60. We identified a variety of genes potentially involved in cryptic iron cycling in marine Gemmatimonadota, including iron acquisition, storage, oxidation, and reduction. Specifically, we identified two clusters of MAGs in Group3 and Group4 recovered from a wide range of depths (ranging from 0–62 cm) in BS sediments encoding genes for sulfocyanin61,62 (Supplementary Fig. 18), a putative iron oxidase. These bacteria may link iron oxidation via sulfocyanin with nitrate reduction via periplasmic nitrate reductase (NapAB) or nitrous oxide reduction via nitrous oxide reductase (NosZ) (Supplementary Fig. 18). We also identified three MtrABC operons63 (M3-44_Bin_97, M3-38_Bin_128, and M3-30_Bin_133) in Group2, suggesting they may be capable of reducing iron in anoxic sediments (Supplementary Fig. 18). Other widely annotated potential iron cycling gene homologs, such as cytochrome-c Cyc2 and DFE_461–465, in these MAGs suggest that Gemmatimonadota may actively participate in iron cycling; however, it is difficult to distinguish iron reduction and iron oxidation based on the current annotation.
Gemmatimonadota also encode mercury and arsenic detoxification systems. They are capable of transforming the extremely toxic Hg(II) to metallic Hg(0) via mercuric reductase (MerA), potentially detoxifying mercury (Fig. 3). All four groups are capable of reducing arsenate to arsenite via arsenate reductase (ArsC) through thioredoxin64 (Fig. 3). Resistance to and detoxification of heavy metals may enable Gemmatimonadota to be widely distributed from coastal sediments to deep oceans65, where Hg and As have accumulated from anthropogenic pollution66,67 or have been released via hydrothermal activity and volcanic eruptions68,69.
Extensive genetic potential for secondary metabolite biosynthesis in Gemmatimonadota
Microorganisms produce secondary metabolites to interact with other community members and their environment. The importance of biosynthetic gene clusters (BGCs) in Gemmatimonadota has been described in soil environments21. This has not been examined in marine Gemmatimonadota70 due to the limited number of representatives in public databases. We identified a diverse genetic potential for secondary metabolite biosynthesis, including nonribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs) (Fig. 1). Combined gene clusters consisting of different NRPS, PKS, and hybrid NRPS/PKS were identified in 69 MAGs across the four groups (Fig. 1). NRPS and PKS are known to synthesize a diversity of antibiotics, antifungals, and immunosuppressants with pharmaceutical potential71, although the majority of these NRPS and PKS have unknown end products72.
The most common type of BGCs identified in Gemmatimonadota are involved in the biosynthesis of terpenes, including carotenoid, isorenieratene, and N-tetradecanoyl tyrosine, and were found in 174 MAGs. Terpenes can have antibacterial properties73, participate in bacterial-fungal interactions74, and provide colorful pigments75. However, the ecological functions of different terpenes remain poorly understood. BGCs encoding lasso peptides, a class of ribosomally synthesized and post-translationally modified peptides (RiPPs)76,77, were identified in 29 MAGs, mainly from Group3 (Fig. 1). The antibacterial properties of lasso peptides produced by Gemmatimonadota suggest a potential role in affecting the abundance of other community members. Bacteriocins (TIGR03798, Nif11-related peptide) undergo intensive post-translational modifications to generate antimicrobial peptides that are toxic to strains of closely related species78. Genes encoding bacteriocin have been particularly prominent in Gemmatimonadota in soil environments79. We annotated genes for microcin, a type of bacteriocin80, in 20 MAGs exclusively within Group2 (Fig. 1). Specifically, six MAGs (recovered from below 30 cm at stations M3 and M8, BS) within a monophyletic group in Group2 have multiple copies of microcin genes that may mediate Gemmatimonadota population size81.
A broad diversity of Gemmatimonadota have the potential to produce different secondary metabolites, which may play a critical role in the survival and adaptation of the microbial community and result in their prevalence across different habitats. Perhaps most strikingly, there are clades in Group4 associated with corals that are enriched in bacteriocin, terpene, and type I polyketide synthase (T1PKS) genes (Fig. 1). These genotypes have unique protein composition comprising Pfam Cluster 7, suggesting these bacteria use secondary metabolites to interact with other organisms in reef communities. BGCs with low levels of similarity to known databases can be used to mine novel BGCs and point to new compounds82. The Gemmatimonadota phylum may thus represent a reservoir for the discovery of secondary metabolites, which could also be useful in medicine and biotechnology.
Potential Gemmatimonadota viruses
In total, 6,611 double-stranded DNA (dsDNA) viral metagenome-assembled genomes (vMAGs) of high- and medium-quality were identified from 15 BS samples (see Methods). We identified three CRISPR-Cas systems (Supplementary Fig. 19) and 639 CRISPR spacer sequences (Supplementary Table 10) in 156 of 245 high-quality Gemmatimonadota MAGs. However, only one Gemmatimonadota could be linked with vMAGs via the CRISPR spacer sequences. Using CRISPR spacers, tRNA matching, 6-mer oligonucleotide frequency, and whole genome matching, we identified 32 vMAGs ≥ 10 kilobases in length that potentially infected Gemmatimonadota (see Methods) (Fig. 4 and Supplementary Table 11). Among these are 15 viruses that could not be assigned taxonomy, while the other 17 of the 32 viruses were classified as Caudovirales belonging to Myoviridae (12), Podoviridae (2), and Siphoviridae (3) (Supplementary Table 12). However, none of these viruses were clustered with known viral genomes at the genus level based on shared-gene content (Fig. 4). To understand the viral roles in host metabolism, we assigned functions to the gene content of these 32 vMAGs, revealing a variety of putative auxiliary metabolic genes (AMGs) that may ‘hijack’ and manipulate host metabolism (Fig. 4 and Supplementary Table 13). We identified D-beta-D-heptose 7-phosphate kinase in three Myoviridae viruses from different BS samples, mainly associated with Group3 and Group2, suggesting these viruses may contribute to the assembly of the lipopolysaccharide in their hosts83. In addition, one unknown taxonomy virus and four Myoviridae viruses encode heptosyltransferase, a class of glycosyltransferases (GTs) that may modify heptose residues on lipopolysaccharides to affect viral-host interactions84,85.
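Of the host-linking signals listed above, CRISPR spacer matching is the simplest to illustrate. The Python sketch below links hosts to viral contigs by exact spacer matches on either strand. It is a simplified illustration under stated assumptions rather than the pipeline described in Methods: real analyses typically tolerate a small number of mismatches and use alignment tools, and the sequences and names here are short, made-up placeholders (real spacers are usually around 30 bp or longer).

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def link_hosts_to_viruses(spacers_by_host, viral_contigs):
    """Return sorted (host, virus) pairs where any host spacer occurs
    exactly in the viral contig on the forward or reverse strand."""
    links = set()
    for host, spacers in spacers_by_host.items():
        for spacer in spacers:
            forward = spacer.upper()
            reverse = reverse_complement(forward)
            for virus, contig in viral_contigs.items():
                sequence = contig.upper()
                if forward in sequence or reverse in sequence:
                    links.add((host, virus))
    return sorted(links)


# Made-up toy sequences for illustration only.
spacers_by_host = {
    "host_a": ["ACGTTGCAAG"],
    "host_b": ["GGATCCTTAA"],
}
viral_contigs = {
    "virus_1": "TTTACGTTGCAAGGG",   # contains host_a spacer (forward strand)
    "virus_2": "CCCTTAAGGATCCAAA",  # contains host_b spacer (reverse strand)
}

links = link_hosts_to_viruses(spacers_by_host, viral_contigs)
print(links)
```

Exact matching of this kind is conservative, which is consistent with spacer evidence alone linking few hosts to viruses and with the use of additional signals such as tRNA matching and oligonucleotide frequency.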
We also identified viral genes involved in genome replication, nucleotide metabolism, and posttranscriptional modifications, including ribonuclease H (RNaseH-like domain), ATP-dependent DNA ligase (ligD)86, and peptidases (Supplementary Table 14). Four vMAGs contain genes predicted to encode ribonucleotide reductase, which is important for nucleotide metabolism in nucleocytoplasmic large DNA viruses (NCLDVs)87. Putative Gemmatimonadota viruses also contain genes for DNA methylation and glycosylation that may be important for host interactions. We identified genes for methyltransferase and endonuclease (Supplementary Table 14), suggesting the viruses may be involved in epigenetic modification via autonomous DNA methylation88. Finally, we identified genes encoding for pyruvate:ferredoxin oxidoreductase in one vMAG (Supplementary Table 14), suggesting that this virus may contribute to host anaerobic metabolism by generating acetyl-coenzyme A, carbon dioxide, and reduced ferredoxin (Fd2−)89.
Potential phototrophic and autotrophic capabilities in Gemmatimonadota
Gemmatimonadota in order Gemmatimonadales have recently been shown to possess photosynthetic gene clusters (PGCs)16. However, none of the newly reconstructed MAGs recovered here code for PGCs, suggesting horizontal gene transfer of PGCs is not common among Gemmatimonadota16 (Supplementary Fig. 20). We did not identify any key genes for the type II photosynthetic reaction center (puf, bch, and acsF genes) in our MAGs, as found in terrestrial environments, e.g., the isolate from freshwater Swan Lake in the Gobi Desert in China, the Cock Soda Lake, and Lake Baikal in Siberia16,90,91. Therefore, marine Gemmatimonadota appear to lack phototrophic metabolism. However, as stated above, Gemmatimonadota encode bacteriocins (TIGR03798, Nif11-related peptide) and carotenoids, which are associated with photosynthetic Cyanobacteria80,92 and the latter is thought to contribute to adaptation to low light conditions93 or UV exposure94. Thus, the phototrophic metabolism may be occurring in shallow marine environments.
Carbon fixation genes via Calvin–Benson–Bassham (rbcS, rbcL, and prk) have been reported in Gemmatimonadota from soda lakes91,95. However, the soda lake Gemmatimonadota MAGs are phylogenetically distinct from our marine groups (Supplementary Fig. 20). Moreover, only the large subunit of ribulose 1,5-bisphosphate carboxylase/oxygenase-like protein (RLP, form IV RuBisCO), potentially important for sulfur metabolism rather than CO2 fixation (Supplementary Fig. 21), was annotated in 43 MAGs in this study. Additionally, we did not find any complete autotrophic pathways (Wood-Ljungdahl pathway, Calvin–Benson–Bassham, reductive tricarboxylic acid, 3-hydroxypropionate bicycle, 3-hydroxypropionate-4-hydroxybutyrate, and dicarboxylate-4-hydroxybutyrate cycles) in our marine Gemmatimonadota MAGs (Supplementary Table 9). There has been no physiological confirmation of autotrophic metabolism in Gemmatimonadota, and thus they are likely heterotrophs.
Ecology of Gemmatimonadota
Gemmatimonadota is estimated to be the eighth most abundant bacterial phylum in soils, with a relative abundance of ~ 1% of soil bacteria worldwide26. They are globally distributed with low abundance (< 2%) in marine environments96,97, and are estimated to be over 10% relative abundance in deep-sea sediments98. Marine clades are phylogenetically distinct from terrestrial clades: Group1 and Group2 members described in this study are distinct from their terrestrial sister groups, and Group1 was only recovered from deep sediments (38–62 cm) in two sampling sites. This suggests a potentially unique ecological role of marine Gemmatimonadota. The marine genotypes described here are metabolically diverse, and many are capable of partial denitrification and organic carbon degradation. A diversity of nitrous oxide reductases suggests marine Gemmatimonadota may mediate the reduction of nitrous oxide to nitrogen gas, a vital process in ocean biogeochemistry that removes a potent greenhouse gas whose concentration is increasing due to anthropogenic activities41,99,100. These organisms encode proteins for the degradation of different complex carbon compounds, including pectin, laminarin, and fatty acids. Marine pectin and laminarin are produced by photosynthetic marine microalgae101, diatoms, macrophytes102 and terrestrial plants103. Thus, Gemmatimonadota are likely players in organic matter degradation in the oceans. Also, the protein repertoire of these MAGs suggests they participate in arsenic and mercury cycling/detoxification. Interestingly, there are clades associated with coral reefs that are enriched in BGC genes; these genotypes also have unique protein profiles (Pfam Cluster 7, Fig. 1). This suggests that they produce metabolites for interactions in reef ecosystems.
The prevalence of Gemmatimonadota across various terrestrial environments has been shown in several studies9,11–14. However, the metabolic potential and ecological roles of Gemmatimonadota in the ocean are poorly understood due to a lack of genomic sampling. Gemmatimonadota have versatile metabolisms and high abundance in coastal areas, where they appear to be involved in the degradation of complex organic carbon, denitrification, sulfate reduction, and sulfide/sulfite oxidation. Interestingly, marine genotypes are distinct in their numbers of BGCs, as well as in their sulfur and iron metabolic genes. The expanded genomic biodiversity provided in this study is a framework to understand the roles of Gemmatimonadota on a global scale.