Sponge samples and metagenomic assemblies
L. herbacea sponge samples were collected over a three-year period from four different coastal locations in Guam, including both summer and winter seasons (Supplementary Table 1, Additional file 1). Based on a phylogenetic tree of internal transcribed spacer (ITS) regions (Figure 1), these samples included representatives of all four previously described L. herbacea clades [16]. Clade Ia and Ib samples, whose microbial communities have not previously been described, were obtained at multiple different times and locations. Samples from clades II and III, previously analyzed by 16S rRNA gene amplification [14], are represented by tissue samples taken from different individual sponges collected on the same dates at a single site.
Assemblies for each metagenomic sample were initially performed using all available reads for that sample (Supplementary Table 2, Additional file 1). Combined properties of read coverage, nucleotide composition, and predicted protein matches to the GenBank nr database revealed a relatively small number of discrete scaffold clusters, mapping primarily to database sequences from Cyanobacteria, Bacteroidetes, Alphaproteobacteria, Gammaproteobacteria, and Oligoflexia (Figure 2). Scaffolds derived from host sponge DNA were present at similar coverage depths to bacterial clusters, but were distinguishable by nucleotide composition and predicted matches to reference database sequences.
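As a minimal sketch of two of the scaffold properties mentioned above (read coverage and nucleotide composition), the following Python snippet pairs each scaffold's coverage with its GC fraction so that scaffolds can be plotted or clustered. The function names, the input dictionaries, and the example scaffold are hypothetical; this is an illustration only and not the binning procedure used in this study.

```python
def gc_fraction(seq):
    """Fraction of G+C among unambiguous bases (A, C, G, T) in a sequence."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0  # sequence holds only ambiguous bases such as N
    return (seq.count("G") + seq.count("C")) / counted


def scaffold_features(scaffolds, coverage):
    """Map scaffold name -> (coverage depth, GC fraction).

    scaffolds: dict of scaffold name -> nucleotide sequence (hypothetical input)
    coverage:  dict of scaffold name -> mean read depth (hypothetical input)
    """
    return {name: (coverage[name], gc_fraction(seq))
            for name, seq in scaffolds.items()}


# Hypothetical example: one short scaffold at 30x coverage.
features = scaffold_features({"scaffold_1": "GGCCAATT"}, {"scaffold_1": 30.0})
```

Scaffolds from the same population genome would be expected to fall close together in this two-dimensional space, which is why host-derived scaffolds at similar coverage can still be separated by composition.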
Metagenomic scaffold bins were used to recruit matching raw read pairs that were subsequently used in targeted sub-assemblies to produce 23 consensus population genomes (Table 1, Supplementary Table 3, Additional file 1). All of the assembled genomes except one (clade Ib: GM102ARS1) met or exceeded MIMAG standards for high or medium quality drafts [23]. CheckM analysis [24] estimated that 16 of the 23 MAGs were more than 90% complete, and 19 contained near full-length 16S rRNA gene sequences. The exceptional quality of these MAGs enabled detailed comparisons with both genomically sequenced relatives and environmental 16S rRNA gene surveys.
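As a rough illustration of how the MIMAG draft-quality tiers cited above can be applied to completeness and contamination estimates such as those from CheckM, the following Python sketch assigns a tier to a single MAG. The cutoffs follow the published MIMAG criteria [23]; the function name, argument names, and example values are hypothetical and are not taken from the tables of this study.

```python
def mimag_tier(completeness, contamination, has_rrna_genes=False, trna_count=0):
    """Return a MIMAG draft-quality tier for one MAG.

    completeness, contamination: percentages (0-100), e.g. CheckM estimates.
    has_rrna_genes: True if 5S, 16S and 23S rRNA genes are all present.
    trna_count: number of tRNAs detected in the assembly.
    """
    # High quality: >90% complete, <5% contaminated, rRNA genes, >=18 tRNAs.
    if (completeness > 90 and contamination < 5
            and has_rrna_genes and trna_count >= 18):
        return "high"
    # Medium quality: >=50% complete, <10% contaminated.
    if completeness >= 50 and contamination < 10:
        return "medium"
    # Low quality: <50% complete, <10% contaminated.
    if contamination < 10:
        return "low"
    return "below standard"


# Hypothetical example values, not measurements from this study.
print(mimag_tier(95.0, 2.0, has_rrna_genes=True, trna_count=20))
print(mimag_tier(95.0, 2.0))
```

Note that a genome estimated to be more than 90% complete is still only a medium quality draft under these criteria if its rRNA genes or tRNAs are missing from the assembly.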
Taxonomic classification of assembled genomes
Taxonomic classifications for assembled MAGs were established using concatenated multi-locus phylogenetic trees of conserved proteins and matrices of 16S rRNA gene and average amino acid identity (AAI) scores (Figure 3, Supplementary Figures 1-5, Additional file 1). AAI scores were especially valuable in providing taxonomic classification levels for genomes with missing or incomplete 16S rRNA genes and those with few sequenced relatives, according to quantitative threshold ranges previously established for large numbers of database examples [25, 26].
Cyanobacteria genomes from the genus Hormoscilla were present in all samples, as expected, but sample SP5 from Clade Ia also yielded a genome from the genus Prochloron (SP5CPC1; Supplementary Figure 1, Additional file 1). Prochloron are well-known tunicate symbionts [27], but have not previously been discussed in molecular studies of sponge microbiomes. Although Prochloron bacteria were first observed microbiologically on the surfaces of both marine sponges and sea cucumbers more than 30 years ago [28, 29], no sponge-derived matches to PCR-amplified (Figure 4) or metagenomically assembled Prochloron 16S rRNA genes from the SP5CPC1 genome were identified at 97% or greater nucleotide identity in the GenBank nr database (Supplementary Table 4, Additional file 1) or the Sponge Microbiome Project Database [30, 31] (Supplementary Table 5, Additional file 1). The Prochloron 16S rRNA gene sequences obtained in this study do not contain any mismatches to commonly used amplification primers that might explain their absence from these databases, but we are also not aware of any prior 16S rRNA gene amplification studies that included Lamellodysidea herbacea samples from Clade Ia.
Genomes SP12BCY1, SP5BCY1, and GM202BCY1 were assigned to the Bacteroidetes family Cytophagaceae (Supplementary Figure 2, Additional file 1). Based on AAI scores greater than 98%, these three genomes represent members of the same species. Their closest cultured relative was Ekhidna lutea, a free-living aerobic heterotroph isolated from seawater [32]. 16S rRNA gene identities of 95% and AAI scores of 67% suggest membership in the same genus as E. lutea.
Four Alphaproteobacteria genomes (SP12ARB1, SP5ARB1, GM7ARB1, GM7ARB2) were classified in family Rhodobacteraceae (Supplementary Figure 3, Additional file 1). Their most closely related cultured isolate was Nioella sediminis, a free-living aerobic marine bacterium from a sister genus of the Roseobacter clade [33, 34]. Alphaproteobacteria genomes SP5ARS3, GM102ARS1, and GM202ARS1 fell within family Rhodospirillaceae. Terrestrial plant-associated Azospirillum brasilense [35] was their closest sequenced relative. Other members of genus Azospirillum have been identified in environmental samples from marine habitats, but not yet cultured or sequenced. An AAI score of 87% suggests that SP5ARS3 should be classified in the genus Azospirillum, but GM102ARS1 and GM202ARS1, with AAI scores of 49-51%, represent a new genus within the family Rhodospirillaceae. Alphaproteobacterial genomes SP12AHP1 and SP5AHP2 were classified in the Hyphomonadaceae family, as a sister genus to closest isolate relative Hellea balneolensis, an aerobic, heterotrophic bacterium isolated from surface seawater [36]. SP12AHP1 and SP5AHP2 have identical 16S rRNA genes and AAI scores of 99%, qualifying them as members of the same species in a new genus of family Hyphomonadaceae (Supplementary Figure 2B, Additional file 1).
GM7GCV1 was the only genome sequenced from the Gammaproteobacteria order Cellvibrionales (Supplementary Figure 4, Additional file 1). Its closest sequenced relative, Halieaceae LZ-16-2, is an uncharacterized bacterium obtained from a mixed laboratory culture with the saxitoxin-producing dinoflagellate Alexandrium tamarense (NZ_RFLW00000000.1). AAI scores of 50-65% with other members of family Halieaceae suggest GM7GCV1 might be classified as a new genus within this family, but 16S rRNA gene identities of 88-89% imply a more distant relationship, potentially in a new family.
The remaining community genomes were too distant from reference database examples to allow precise taxonomic assignments. GM7GCR1 had no close matches among sequenced isolates, being most closely related to unclassified environmental Gammaproteobacteria, with 16S rRNA gene nucleotide sequence identities of 90% and AAI scores below 45% (Supplementary Figure 3 and Supplementary Tables 4-5, Additional file 1). These results suggest it should be classified as either a new family within Chromatiales or a new order within Gammaproteobacteria.
Two Alphaproteobacteria genomes (GM202ARS2, GM7ARS4) fell outside any previously established orders, and could only be assigned at the class level, although their 55% AAI value when compared to each other implies they might be members of a single family (Supplementary Figure 2, Additional file 1). Genome SP5OBV1 was most closely related to Bdellovibrio bacteriovorus from the order Bdellovibrionales, but low 16S rRNA gene identity (82%) and AAI (40%) scores suggest it might represent a new, previously unreported order within the recently described Proteobacteria class Oligoflexia (Supplementary Figure 5, Additional file 1) [37]. Verification of these taxonomic assignments will require the discovery of additional sequences for closely related taxa.
16S rRNA genes from Rhodobacteraceae and Rhodospirillaceae genomes were conserved in samples obtained over three years in the current study, and also matched previously reported sequences from a 2005 amplification study of Guam Lamellodysidea sponges [14] at 98-100% identity (Supplementary Table 4, Additional file 1). No 16S rRNA gene matches to the Cytophagaceae, Cellvibrionales, Hyphomonadaceae, and Bdellovibrionales genomes from this study were detected in the 2005 Lamellodysidea microbiome study, but this may be due to the absence of samples from host clades Ia and Ib and the small number of clones analyzed (< 40 per sample) in this earlier work [14]. The closest GenBank 16S rRNA gene matches to MAGs in the current study were associated with Ircinia, Tethya, Stylissa, and Axinella sponges, at 90-96% identity levels (Supplementary Table 4, Additional file 1). Additional low abundance 16S rRNA gene matches were present at 97% identity in numerous other sponge genera from the Sponge Microbiome Project Database (Supplementary Table 5, Additional file 1). No species-level matches were detected in this database for 16S rRNA gene sequences from GM7GCV1 or SP5CPC1, but more distant matches were found at 95% identity, suggesting the presence of bacteria from the same genera but not the same species [38].
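The way 16S rRNA gene identities are interpreted in this section (matches at 97% or greater treated as species-level, matches at 95% treated as genus-level) can be sketched as a simple threshold rule. The Python function below is a hypothetical illustration, assuming those two cutoffs as defaults; the function name and return labels are ours, and the cutoffs are conventions rather than fixed taxonomic rules.

```python
def rank_from_16s_identity(identity_pct, species_cutoff=97.0, genus_cutoff=95.0):
    """Interpret a pairwise 16S rRNA gene identity (percent) as a match level.

    The default cutoffs mirror the identity levels discussed in the text and
    are exposed as parameters because other studies use different values.
    """
    if identity_pct >= species_cutoff:
        return "species-level match"
    if identity_pct >= genus_cutoff:
        return "genus-level match"
    return "more distant than genus"


# Identity values of the kind reported in this section.
for pct in (98.0, 95.0, 88.0):
    print(pct, rank_from_16s_identity(pct))
```

Identity alone is a coarse guide, which is why the classifications above are cross-checked against AAI scores and multi-locus phylogenetic trees.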
Relative abundance comparisons between samples
Microbial community compositions in all samples were strongly correlated with host clade taxonomy (Figure 4). The most taxonomically diverse communities were observed in host clade Ia, which included two bacterial taxa that were absent from all other clades: alphaproteobacterial family Hyphomonadaceae and cyanobacterial genus Prochloron. Host clade II samples were notable for the consistent replacement of Alphaproteobacteria family Rhodobacteraceae with Rhodospirillaceae. Sponge metagenome 16S rRNA gene sequences from assembled MAGs accounted for 44-88% of all bacterial amplicon sequence variants (ASVs) in sponge samples, but only 0.1% in nearby seawater (Supplementary Figure 6, Additional file 1).
Relative abundances of the four MAGs with incomplete or missing 16S rRNA sequences (GM102ARS1, GM102CHS1, SP5ARS3, and GM7ARS4) were assessed by recruitment of unassembled metagenomic reads to assembled MAGs (Supplementary Figure 7, Additional file 1). Metagenomic read recruitment is known to underestimate relative abundance of genomes with smaller sizes, but 16S rRNA gene amplification over-reports taxa with multiple gene copies, precluding exact numerical agreement. Despite these differences, results from both procedures were consistent in showing the host clade-specificity of taxonomic compositions, including greater diversity and the unique presence of Prochloron and Hyphomonadaceae in clade Ia, as well as the replacement of Rhodobacteraceae with Rhodospirillaceae in clade II samples. These patterns were consistent over all available sample collection time points and locations tested.
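The two biases described above can be made concrete with a small worked example. The Python sketch below shows why raw recruited-read fractions under-represent smaller genomes and why raw amplicon fractions over-represent taxa with several 16S rRNA gene copies, by comparing raw and normalized values. All names and numbers are hypothetical, and this is an illustration of the general principle rather than the calculation performed in this study.

```python
def fractions(values):
    """Scale a dict of non-negative values so that they sum to 1."""
    total = sum(values.values())
    return {key: value / total for key, value in values.items()}


def size_normalized_abundance(read_counts, genome_sizes_bp):
    """Relative abundance from recruited reads, corrected for genome size."""
    per_base = {g: read_counts[g] / genome_sizes_bp[g] for g in read_counts}
    return fractions(per_base)


def copy_normalized_abundance(amplicon_counts, copies_per_genome):
    """Relative abundance from 16S amplicons, corrected for gene copy number."""
    per_cell = {g: amplicon_counts[g] / copies_per_genome[g]
                for g in amplicon_counts}
    return fractions(per_cell)


# Hypothetical community: two taxa present in equal cell numbers.
# Taxon A has a 4 Mbp genome with 3 rRNA operons; taxon B has 2 Mbp and 1.
reads = {"A": 4000, "B": 2000}
sizes = {"A": 4_000_000, "B": 2_000_000}
amplicons = {"A": 300, "B": 100}
copies = {"A": 3, "B": 1}

raw_read_share = fractions(reads)          # B appears at one third
raw_amplicon_share = fractions(amplicons)  # B appears at one quarter
by_reads = size_normalized_abundance(reads, sizes)
by_amplicons = copy_normalized_abundance(amplicons, copies)
```

In this constructed case both normalized estimates return equal abundances for the two taxa, whereas the raw values disagree with each other and with the true composition. In practice, genome sizes and copy numbers for incomplete MAGs are themselves uncertain, so exact agreement is not expected.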
Predicted bacterial lifestyles
Predicted protein annotations for the 23 microbial genomes in this study were used to explore both broad, community-wide patterns and detailed metabolic pathways specific to individual genomes (Figure 5). Shared features among all taxonomic groups included the presence of aerobic respiration, glycolysis, and TCA cycle enzymes. None of the assembled genomes contained genes encoding flagellar biosynthesis, although genes for gliding motility were found in all Bacteroidetes family Cytophagaceae genomes (group BCY), and twitching motility in Cyanobacteria (groups CHS and CPC), one of the Gammaproteobacteria (group GCV), and Oligoflexia (group OBV) genomes (Supplementary Table 6, Additional file 1). Several genome groups contained expanded gene families encoding adhesive molecules with the potential to resist shear forces from high seawater flow rates, for example genes encoding cadherin and ankyrin domains in all Bacteroidetes family Cytophagaceae (BCY) and Gammaproteobacteria group GCR genomes, as well as type IV pilus structures in Cyanobacteria, Alphaproteobacteria Candidatus Methylospongiales, Gammaproteobacteria, and Oligoflexia genomes (groups CHS, CPC, AMS, GCR, GCV, and OBV). The absence of nitrogenase complex genes in any of the assembled MAGs, combined with the near-universal presence of ammonia transporters, suggests ready availability of fixed nitrogen, consistent with detection of ammonia excretion in many sponge species [2].
Metabolic phenotype analysis of the assembled genomes identified pathways associated with phototrophic, methylotrophic, heterotrophic, and parasitic or pathogenic lifestyles. Phototrophic pathways included not only chlorophyll-based photosynthesis with carbon fixation via the Calvin-Benson Cycle in Hormoscilla and Prochloron cyanobacterial genomes, but also bacteriochlorophyll-mediated anoxygenic photosynthesis in alphaproteobacterial Rhodobacteraceae, coupled with carbon fixation via the reductive TCA cycle. A third mode of light-driven energy production was identified in Bacteroidetes family Cytophagaceae genomes, supporting ATP synthesis through proteorhodopsin-generated proton motive force. Rhodobacteraceae, Hyphomonadaceae, and Cytophagaceae genomes all contained carotenoid biosynthesis pathways, offering potential protection against free radicals generated during photosynthesis and/or exposure to ultraviolet radiation.
The GM7ARS4, GM202ARS2 and SP5GCR1 genomes all included a complete methylene-tetrahydromethanopterin dehydrogenase (dH4MPT)-dependent oxidation pathway, diagnostic for methylotrophic C-1 metabolism [39]. Each of these genomes also encoded a complete Type III secretion system (T3SS), often associated with eukaryotic pathogenicity. No pathways for dH4MPT-dependent oxidation or T3SS biosynthesis were present in other genomes recovered from this study. Based on these data, the taxonomic relationship between GM7ARS4 and GM202ARS2, and the absence of any other previously described bacteria from the same order, we have provisionally assigned these two genomes to a new order named Candidatus Methylospongiales.
T3SS operons in both SP5GCR1 and the two Methylospongiales genomes contained 15-18 proteins annotated as T3SS components, including base, inner rod, and needle proteins, pore-forming translocation proteins, chaperonins, ATPases and regulatory proteins. Although it was not possible to determine the nature of substrates being transported, T3SS operons also included matches to distinctive virulence-associated protein families (e.g. YscX) associated with human pathogens [40]. Placement of sponge microbiome YscR homologs in a reference tree of conserved, habitat-classified examples (Supplementary Figure 8, Additional file 1) [41] shows their closest relatives derive from extracellular bacteria associated with animal and insect hosts, but not plants, protists, or fungi. These results suggest possible roles for GM7ARS4, GM202ARS2, and SP5GCR1 in sponge host parasitism and/or pathogenicity. Genome sizes ranged from 1.6 to 1.8 Mbp, consistent with small sizes often observed in obligate symbionts and pathogenic bacteria [42-44].
Evidence identifying other community members as potential obligate symbionts was constrained by assembled genome incompleteness and limited availability of well-characterized free-living relatives. However, supporting evidence was provided in some cases by consistent presence or absence of clade-specific diagnostic features in multiple closely-related MAGs from different samples. As an example, Hormoscilla genomes GUM202 and GUM007 have previously been shown to lack complete pathways for biotin synthesis [13]. This same deficiency was confirmed in Hormoscilla genomes from samples SP5, SP12, and GUM102, along with the presence of multiple transporters for importing this essential cofactor.
Genomic streamlining has previously been proposed as a shared feature of pelagic marine Rhodobacteraceae of the Roseobacter clade [45]. The genomes of GM7ARB1 and GM7ARB2, estimated to be 94-97% complete by CheckM analysis, are approximately 2.55 Mbp in size. This is smaller than the previously reported minimum genome size of 3.3 Mbp for Roseobacters, as well as closest free-living relative Nioella nitratereducens (4.0 Mbp). The four Rhodobacteraceae in the current study have 8-10% of their genomes devoted to transporter functions (Supplementary Table 6, Additional file 1), but lack genes for flagellar biosynthesis and RuBisCO-mediated carbon fixation typically found in other family members. However, many other clade-specific characteristics have been preserved, including pathways for the synthesis of capsular polysaccharides and the degradation of phosphonates, urate and dimethylsulfoniopropionate (DMSP).
Marine Bacteroidetes from the Cytophaga-Flavobacteria group are noted for their role in degrading organic matter during phytoplankton blooms, but group members containing proteorhodopsin have consistently smaller genomes than close taxonomic relatives lacking this gene function [46]. Consistent with these observations, Cytophagaceae genomes SP12BCY1, SP5BCY1 and GM202BCY1 (~3 Mbp) are similar in size to free-living family members containing proteorhodopsin, but smaller than closest sequenced relative Ekhidna lutea (4.2 Mbp), which does not. SP12BCY1, SP5BCY1 and GM202BCY1 are enriched in signal peptide-containing peptidases (9-12 per genome) and glycosidases (3-4 per genome), suggesting the retention of conserved heterotrophic capabilities for degrading extracellular proteins and polysaccharides (Supplementary Table 6, Additional file 1).
All three of these genomes have retained Bacteroidetes-specific gliding motility and Por (Type IX) secretion system functions. The Type IX secretion system is both an essential component of gliding motility in non-pathogenic species [47, 48] and a virulence factor in human and fish pathogens [49-51]. A large number of novel cadherin domain-containing proteins (Supplementary Table 6, Additional file 1), with closest GenBank matches at 29-44% amino acid identity, may facilitate adhesion to sponge hosts and/or particulate matter [52].
Potential symbiotic adaptations in Hyphomonadaceae genomes SP12AHP1 and SP5AHP2 were suggested by the absence of both flagellar motility genes and the stalk formation pathway found in many other Hyphomonadaceae genomes [53]. However, genome sizes (2.4-2.5 Mbp) were only about 25% smaller than closest free-living relative Hellea balneolensis (3.2 Mbp). Lineage-characteristic genes for heparinases, chondroitinases and chitinases, and signal peptide-containing glycosidases were retained in SP12AHP1 and SP5AHP2, potentially facilitating the degradation of host extracellular matrix as well as organic marine particulates. The presence of typical Hyphomonadaceae pathways for the transport and degradation of aromatic compounds such as benzoic acid suggests retention of diverse heterotrophic capabilities.
Rhodospirillales genomes SP5ARS3, GM102ARS1, and GM202ARS1 have much smaller genomes than their closest sequenced relative, the terrestrial soil bacterium Azospirillum brasilense (<3 versus 7.2 Mbp), but this difference could be due to expansion in Azospirillum rather than reduction in sponge-associated genomes. SP5ARS3, GM102ARS1, and GM202ARS1 are unique among the genomes of this study in encoding pathways for carbon fixation via the Wood-Ljungdahl (Reductive Acetyl-CoA) pathway, the glyoxylate cycle, and trimethylamine degradation. The presence of nitrile hydratases and a large repertoire of transporters (9-12% of predicted coding sequences; Supplementary Table 6, Additional file 1) with predicted specificities for amino acids, oligopeptides, taurine, spermidine/putrescine, lipoproteins, ribose/xylose, and glycerol suggests metabolic versatility encompassing a wide variety of substrates, especially those containing amine groups. Genes encoding capsular biosynthesis may provide protection against host phagocytosis, viral attack, and/or hydrophobic toxins.
Marine relatives of genome GM7GCV1 include both free-living and host-associated species from Gammaproteobacteria family Cellvibrionaceae, with AAI scores of 50% and 16S rRNA gene identities of 87-88%. All cultured examples have genome sizes of 4 Mbp or larger, including the dinoflagellate-associated Halieaceae LZ-16-2, GM7GCV1’s closest relative. The much smaller genome size of GM7GCV1 (1.8 Mbp) is typical for more distantly related, marine particle-associated Cellvibrionaceae from the Tara Oceans project (e.g. TMED119 at AAI 45%, Supplementary Figure 3, Additional file 1), whose genome sizes range from 1.1 to 2.9 Mbp [54]. Factors responsible for smaller genome sizes in these uncultured, uncharacterized bacterial relatives are unknown.
The unavailability of sequenced genomes for relatives closer than order level makes it difficult to determine whether the SP5OBV1 genome lacks clade-specific features. However, it is unlikely that this species follows the same bacterivorous lifestyle as its distant cousins, not only because its genome size, estimated at 91% complete by CheckM analysis, is less than half as large as closest relative Bdellovibrio bacteriovorus (1.7 versus 3.8 Mbp), but also due to the absence of genes encoding flagellar motility and proteoglycan synthesis described as essential for this activity in other Bdellovibrionales species (reviewed in [55]). Fatty acid auxotrophy in SP5OBV1 is suggested by the absence of acetyl-CoA carboxylase and all other fatty acid biosynthesis pathway enzymes except FabG, coupled with the presence of complete pathways for lipoic acid metabolism and fatty acid degradation. Some mammalian pathogens are known to suppress endogenous biosynthesis while incorporating exogenous host fatty acids as a triclosan resistance mechanism, but previously reported genomic pathway deficiencies accompanying these adaptations have been much less extensive than those observed in SP5OBV1 [56].
Viral sequences and phage defense
Electron microscopic studies have identified phage-like particles associated with Lamellodysidea sponge samples [57], but these have not yet been characterized by sequence analysis. In this study, viral scaffold candidates detected by VirSorter [58] in assembled holobiont metagenomes were dominated by GenBank matches to double-stranded DNA tailed bacteriophage (Caudovirales) of the Siphoviridae, Myoviridae, and Podoviridae lineages (Supplementary Figure 9, Additional file 1). Matches to Microviridae, archaeal Bicaudaviridae, and eukaryotic Baculoviridae and herpesviruses were also detected at low levels. Genetic heterogeneity of integrated phage regions can make them difficult to capture in metagenomic assemblies, potentially underestimating occurrence. However, abundant CRISPR sequences, transposons, and restriction enzymes roughly proportional to genome size in assembled MAGs suggest a past history of recurring viral challenges (Supplementary Table 7, Additional file 1).
Integrated prophage genomes sometimes carry passenger genes of bacterial origin that can modify the phenotype of the host, resulting in improved fitness of infected cells, a process known as lysogenic conversion [59]. Lamellodysidea-associated phage candidate scaffolds included bacterial genes encoding DNA methyltransferases, glucanases, peptidases, and vitamin B-12 biosynthesis. One candidate Myoviridae scaffold, from sample SP5, encoded two key enzymes (PhnI and PhnJ) of the carbon-phosphorus lyase pathway [60]. This pathway has previously been demonstrated to be enriched under phosphate-limiting conditions in marine microbial metagenomes [61], and could potentially assist in liberating phosphate from recalcitrant organic particles.
Secondary metabolite pathways
Secondary metabolite gene cluster candidates identified by antiSMASH [62] were most abundant in cyanobacterial taxa, with the largest number in previously described Hormoscilla genomes GM7CHS1 and GM202CHS1 (Supplementary Table 8, Additional file 1; [13]). Hormoscilla genomes SP12CHS1, SP5CHS1, and GM102CHS1 had fewer predicted clusters, but antiSMASH cluster detection sensitivity may have been reduced by scaffold fragmentation in these MAGs, which were assembled from Illumina reads without PacBio read supplementation.
The next most abundant source of biosynthetic gene clusters was Prochloron genome SP5CPC1, predicted to include two non-ribosomal peptide synthetases (NRPS), four terpene synthases, three ribosomally synthesized post-translationally modified peptides (RiPPs), and two flavin-dependent aromatic halogenases (Supplementary Table 8, Additional file 1). Although SP5CPC1 has a smaller genome size with fewer total clusters than previously sequenced Prochloron relatives [63], SP5CPC1 is also less complete, with short scaffolds and fragmented, incomplete pathway sequences that cannot easily be linked to specific molecular products. The aromatic halogenases were unrelated to those found in PBDE-producing strains of Hormoscilla spongeliae ([16], Supplementary Table 10B, Additional file 1), and putative RiPP clusters could only be classified as distantly related to non-cyanobactin bacteriocins and lasso peptides, based on HMM (Hidden Markov Model) pattern matches [62]. None of the putative RiPP clusters contained genes from the well-characterized patellamide pathway of Prochloron didemni [64].
Predicted biosynthetic clusters in non-cyanobacterial genomes were limited to terpenes and bacteriocin-related RiPPs, except for the SP12BCY1, SP5BCY1, and GM202BCY1 genomes, which all contained an identical type III polyketide synthase (PKS). The closest database matches to this protein were from Cytophagaceae genera such as Ekhidna, Marinoscillum, and Pontibacter at 53-64% amino acid identity. Biosynthesis of plant-like flavonoids like those recently characterized in Flavobacterium cheonhonense [65] seems unlikely, because key pathway enzyme phenylalanine ammonia lyase is missing [66, 67]. However, type III PKS genes from the current study also matched CepAB genes from taxonomically distant sponge symbiont Entotheonella gemina (class Tectomicrobia) at 48% amino acid identity, suggesting phenolic lipids as a potential biosynthetic product [68].
Previously reported gene clusters encoding polybrominated compounds in assembled Hormoscilla genomes [13, 16] do not account for the full range of halogenated products described in Lamellodysidea sponges [69]. Additional diversity may be contributed by members of the broader microbial community, for example Hyphomonadaceae SP12AHP1 and SP5AHP2, which encode a highly expanded family of novel aromatic flavin-dependent halogenases (Supplementary Figure 10AB, Additional file 1). Several of these halogenases occur in sets of 2-4 tandem repeats, suggesting family expansion by gene duplication. Based on similarity to Pfam database model PF04820 [70], these proteins are annotated as flavin-dependent tryptophan halogenases. However, their closest match among experimentally characterized enzymes is the brvH gene product from Brevundimonas sp. BAL3, at 47% amino acid identity (Supplementary Figure 10B, Additional file 1). BrvH uses free indole rather than tryptophan as a substrate, preferentially incorporating bromine over chlorine into the C3 position [71].
It is possible that some of the additional diversity in halogenated compounds previously observed in Lamellodysidea sponges, including brominated phenols and catechols [72, 73], arises from degradative, rather than biosynthetic pathways. The SP12AHP1 and SP5AHP2 genomes each encode a complete 3,5-dichlorocatechol degradation pathway, characteristic of bacteria that use halogenated benzoates as sole carbon and energy sources [74]. Non-halogenated benzoate degradation pathways are common in other Hyphomonadaceae, but the SP12AHP1 and SP5AHP2 genomes are unique in encoding chlorocatechol 1,2-dioxygenase, a key enzyme that cleaves chlorocatechol rings to 2,4-dichloro-cis,cis-muconate, followed by spontaneous dehalogenation during further processing by muconate cycloisomerase, carboxymethylenebutenolidase, and maleylacetate reductase [75]. The closest chlorocatechol 1,2-dioxygenase match in the GenBank nr database (64% amino acid identity) was found in Altererythrobacter marensis, a free-living coastal marine bacterium from the Alphaproteobacteria order Sphingomonadales [76]. Dichlorocatechol degradation pathways in the SP12AHP1 and SP5AHP2 genomes are clustered together in the same conserved gene order as A. marensis (Supplementary Figure 11, Additional file 1).
Plasmids containing the 3,5-dichlorocatechol degradation pathway are frequently exchanged between environmental bacteria in chloroaromatic-contaminated environments (reviewed in [77]). Although the pathway is not located on a plasmid or within a genomic island in SP12AHP1, SP5AHP2 or A. marensis, patchy phylogenetic distribution and order-level relationships between closest protein sequence relatives suggest historical dissemination by horizontal gene transfer (Supplementary Figure 12, Additional file 1). It is not known whether amino acid sequence similarity to experimentally characterized chlorocatechol dioxygenases might also capture activity towards brominated substrates, including PBDEs, but this seems like a reasonable hypothesis for future testing.