Discovery of novel archaeal and bacterial lineages containing FAE genes
Ass, Bss and Nms each consist of three subunits and form an (αβγ)2 heterohexamer [21]. Genes encoding the alpha (catalytic) subunits of Ass/Bss/Nms (hereafter referred to as FaeA) are regarded as diagnostic genetic markers for anaerobic hydrocarbon biodegradation [28]. A local reference database was established (n=66; Table S1) that included (1) canonical FaeA protein sequences of Deltaproteobacteria and Firmicutes, and (2) archaea-type AssA protein sequences from Vallitalea guaymasensis strain L81 and A. fulgidus strain VC-16. Using this reference database, 65 metagenomic datasets (~600 gigabases) from a variety of hydrocarbon-associated marine sediments, as well as 3667 archaeal genomes (accessed from NCBI in September 2019), were searched for FaeA protein sequences (Figure S1 and Table S2).
Searching these datasets for FaeA sequences yielded 41 bacterial metagenome-assembled genomes (MAGs), and 21 archaeal genomes from among the 3667 individual genomes downloaded from NCBI. Of these 62 genomes containing FaeA sequences, 45 had an estimated completeness of 70-100%, while the other 17 were 51-70% complete. Estimated contamination ranged from 0 to 9.95% (Table S3). As expected, most identified protein sequences were affiliated with the class Deltaproteobacteria (n=25; including the orders Desulfobacterales, Syntrophobacterales and Desulfuromonadales) (Figure 1). FaeA sequences were also found in Chloroflexi (n=15, class Dehalococcoidia) and the relatively unstudied phylum Candidatus Stahlbacteria (n=1). Within the phylum Euryarchaeota, genes encoding FaeA were identified not only in Archaeoglobi (n=8), which are known to oxidize alkanes, but also in Thermoplasmata (n=2). Other archaea with these genes belong to the Heimdallarchaeota (n=2), Lokiarchaeota (n=6) and Thorarchaeota (n=3), all within the Asgard superphylum (Figure 1). Phylogenetic placements of the bacterial and archaeal genomes reported here were also cross-checked using 16S rRNA genes (retrieved from 32 of the 62 genomes), revealing taxonomic profiles consistent with the phylogenomic tree (Data S1).
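The per-lineage counts quoted above can be cross-checked with a short tally. The following is a minimal sketch that simply re-adds the numbers stated in the text; the variable names are illustrative and are not part of the original analysis.

```python
# Per-lineage counts of FaeA-containing genomes, copied from the text above.
bacterial = {
    "Deltaproteobacteria": 25,
    "Chloroflexi (Dehalococcoidia)": 15,
    "Candidatus Stahlbacteria": 1,
}
archaeal = {
    "Archaeoglobi": 8,
    "Thermoplasmata": 2,
    "Heimdallarchaeota": 2,
    "Lokiarchaeota": 6,
    "Thorarchaeota": 3,
}
# Completeness tiers, also copied from the text.
completeness_tiers = {"70-100%": 45, "51-70%": 17}

n_bacterial = sum(bacterial.values())
n_archaeal = sum(archaeal.values())
n_total = n_bacterial + n_archaeal

# Fraction of genomes from which a 16S rRNA gene was retrieved (32 of 62).
fraction_with_16s = 32 / n_total

print(f"bacterial={n_bacterial}, archaeal={n_archaeal}, total={n_total}")
print(f"genomes with 16S rRNA gene: {fraction_with_16s:.1%}")
```

The lineage counts sum to the reported totals of 41 bacterial and 21 archaeal genomes, and the two completeness tiers together account for all 62 genomes.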
Classification and verification of identified faeA genes
Previous studies reported that fae genes can be misannotated as pyruvate formate lyases, and vice versa [21, 30]. Therefore, manual curation of sequences is necessary for accurate identification of fumarate addition genes. Phylogenetic analysis of the protein sequences identified from the 62 genomes, along with established reference sequences, gave rise to five clusters (Figure 2). Groups I to III contained protein sequences of canonical AssA, BssA and NmsA, respectively, in each case exhibiting high sequence similarity (42-100%) to sequences from experimentally verified hydrocarbon-degrading bacteria [15, 30-32] (Leuthner, 1998) (Figure 2, Table S5). One Desulfuromonadales MAG (GB_003647135) encoded both canonical BssA and NmsA, suggesting an ability to degrade multiple substrates [33]. Group IV sequences clustered with neither canonical FaeA nor characterized archaea-type AssA (Figure 2). Their separation from pyruvate formate lyases was nonetheless evident, as they shared only ~20% amino acid identity with pyruvate formate lyase protein sequences, compared with 30-64% with known FaeA (Figure 2; Table S5). Group IV sequences belonged to genomes assigned to Desulfobacterales, Desulfatiglandales, Syntrophobacterales and Dehalococcoidia. Group V consisted of archaea-type AssA (Figure 2), including AssA protein sequences from the anaerobic alkane degraders V. guaymasensis strain L81 and A. fulgidus strain VC-16 [21, 34]. In agreement with this, the AssA protein sequences of Lokiarchaeota, Heimdallarchaeota and Thermoplasmata were closely associated with those of V. guaymasensis strain L81, with sequence identities of 32-62%. Meanwhile, the AssA protein sequences of Archaeoglobi and Thorarchaeota were more similar to that of A. fulgidus strain VC-16 (40-100% identity) (Table S5). One genome of the phylum Candidatus Stahlbacteria also contained a gene encoding archaea-type AssA (Figure 2).
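The identity comparisons above (for example, ~20% identity to pyruvate formate lyases versus 30-64% to known FaeA) rest on pairwise amino acid identity. The following is a minimal sketch of one common way to compute it from two already-aligned sequences. The denominator convention (columns where neither sequence has a gap) is an assumption, since the text does not state which definition was used, and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: other conventions (e.g. dividing by the full alignment length)
    exist and would give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical toy alignment, not taken from the study's data.
query = "MKV-LGAC"
reference = "MRVALG-C"
identity = percent_identity(query, reference)
print(f"{identity:.1f}% identity")
```

In this toy case six columns are gap-free and five of them match, so the identity is 5/6 of 100%.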
All recovered protein sequences from Groups I-V were confirmed to be responsible for fumarate addition by analyzing conserved protein motifs. The multiple sequence alignment revealed a conserved glycine residue close to the C-terminus and a conserved cysteine residue in the middle of each aligned protein sequence (Figure S2). This feature has been reported as being specific to FAE, as compared to pyruvate formate lyases that display two adjacent conserved cysteine residues in corresponding regions of protein sequences [30, 33]. Apart from the alpha subunit of FAE, genes encoding their accessory beta- and gamma-subunits (FaeB and FaeC) were detected in Deltaproteobacteria and Dehalococcoidia genomes (Figure 3a). Despite the roles of these two small subunits remaining unclear, they are unique to FAE and not found in pyruvate formate lyases [35]. All 21 archaeal genomes analyzed lack genes encoding beta- and gamma-subunits, including A. fulgidus VC-16. The same was observed for the short-chain alkane degrader Peptococcaceae strain SCADC [21, 31]. It is possible that these shorter sequences were more often missed in lower-coverage genomes or during assembly. Alternatively, given the novelty of Group V archaea-type Ass, associated small subunits may be more difficult to predict.
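The motif criterion described above can be expressed as a simple check on alignment columns. The following is a minimal sketch under stated assumptions: the sequences are short hypothetical stand-ins, the column indices are chosen for illustration only, and the function names are not from the original analysis.

```python
from typing import List


def column_conserved(alignment: List[str], col: int, residue: str) -> bool:
    """True if every aligned sequence carries `residue` at column `col`."""
    return all(seq[col] == residue for seq in alignment)


def has_adjacent_cysteines(seq: str, cys_col: int) -> bool:
    """True if the cysteine at `cys_col` has a cysteine directly beside it.

    Two adjacent cysteines are the pattern the text attributes to pyruvate
    formate lyases; a single cysteine is the pattern attributed to FAE.
    """
    if seq[cys_col] != "C":
        return False
    left = cys_col > 0 and seq[cys_col - 1] == "C"
    right = cys_col + 1 < len(seq) and seq[cys_col + 1] == "C"
    return left or right


# Hypothetical toy sequences (0-based columns): cysteine at 4, glycine at 9.
fae_like = ["MTSVCGALRGYA", "MSTLCGPLRGFA"]
pfl_like = "MTSCCGALRGYA"  # carries two adjacent cysteines at columns 3 and 4

CYS_COL, GLY_COL = 4, 9
print(column_conserved(fae_like, CYS_COL, "C"))   # single conserved cysteine
print(column_conserved(fae_like, GLY_COL, "G"))   # conserved glycine
print(has_adjacent_cysteines(pfl_like, CYS_COL))  # PFL-like pattern
```

On real data the column indices would come from the multiple sequence alignment itself (Figure S2) rather than being fixed by hand.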
Additional structure-based analysis was performed by protein homology modelling. As an example, the Dehalococcoidia MAG ScB_bin257, which harbored genes for the alpha, beta and gamma subunits of Ass, was selected for modelling. The modelled Ass complex of Dehalococcoidia ScB_bin257 agreed with the crystal structure of Bss from Thauera aromatica (PDB ID: 5bwe). All key residues (i.e., glycine, cysteine, arginine) [36] required for catalyzing the addition of fumarate to hydrocarbons were observed in the active site of the Dehalococcoidia Ass (Figure S3). Other Dehalococcoidia and archaeal genomes were not modelled due to the absence of beta- and gamma-subunits.
Conversion of FAE to its active, radical-containing form requires an activating enzyme [12, 33]. Most bacterial and archaeal genomes, except those of Heimdallarchaeota, contained genes encoding proteins similar to previously reported FAE-activating enzymes, with sequence identities of 38-77% (Figure 3a and Table S6). In addition to the radical SAM cluster binding site at the N-terminus, all activating enzyme sequences described here contained two additional cysteine-rich regions likely involved in iron-sulfur cluster binding (Figure S4). Such a feature is not found in activating enzymes of pyruvate formate lyases [14, 15].
Reconstruction of downstream pathways
The ability of these bacterial and archaeal lineages to catalyze anaerobic hydrocarbon oxidation was further investigated by assessing the presence of the downstream pathways required following fumarate addition.
Alkylsuccinate metabolism. Following alkane activation, alkylsuccinate is activated to the respective CoA ester followed by carbon-skeleton rearrangement and decarboxylation, yielding branched fatty acids (Figure 4) [37, 38]. These reactions are catalyzed by CoA-ligase (AssK), methylmalonyl-CoA mutase (McmLS) and acetyl-CoA decarboxylase (Acc)/methylmalonyl-CoA decarboxylase (Mcd), respectively. Genes encoding these enzymes were present in most bacterial and archaeal genomes studied here, including Dehalococcoidia (n=14), Deltaproteobacteria (n=18), Archaeoglobi (n=6), Heimdallarchaeota (n=1), Lokiarchaeota (n=1), Thorarchaeota (n=1) and Thermoplasmata (n=2) (Figure 3b). No genes were found to encode enzymes for alkylsuccinate metabolism in the genome of Candidatus Stahlbacteria (Table S7), suggesting its identified Group V FaeA sequence might be contamination arising from misassembly.
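Presence of a downstream pathway can be scored per genome as the fraction of pathway steps for which at least one candidate enzyme is annotated. The following is a minimal sketch using the three alkylsuccinate steps named above; treating McmLS as a single label and Acc/Mcd as alternatives for the decarboxylation step are simplifying assumptions, and the example genome is hypothetical.

```python
from typing import Dict, Set

# Steps of alkylsuccinate metabolism as named in the text; each step maps to
# the set of enzyme labels that can satisfy it.
ALKYLSUCCINATE_STEPS: Dict[str, Set[str]] = {
    "CoA ligation": {"AssK"},
    "carbon-skeleton rearrangement": {"McmLS"},
    "decarboxylation": {"Acc", "Mcd"},  # either enzyme counts (assumption)
}


def pathway_completeness(genes: Set[str], steps: Dict[str, Set[str]]) -> float:
    """Fraction of steps with at least one annotated candidate enzyme."""
    satisfied = sum(1 for alternatives in steps.values() if genes & alternatives)
    return satisfied / len(steps)


# Hypothetical annotation set for one genome, for illustration only.
example_genome = {"AssK", "Mcd"}
score = pathway_completeness(example_genome, ALKYLSUCCINATE_STEPS)
print(f"alkylsuccinate pathway completeness: {score:.2f}")
```

A genome with no hits for any step, as reported above for Candidatus Stahlbacteria, would score 0 under this scheme.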
Arylalkylsuccinate metabolism. Toluene was considered as a representative alkyl-substituted monocyclic aromatic hydrocarbon. Following initial fumarate addition, the resulting benzylsuccinate is converted to the central intermediate benzoyl-CoA by the multi-subunit enzyme BbsA-H catalyzing several enzymatic steps [33] (Figure 4). Benzoyl-CoA is then dearomatized by BcrA-D/BamB-I and subsequently degraded to CoA-bound fatty acids in modified β-oxidation reactions by Dch/BamR, Had/BamQ, Oah/BamA [39] (Figure 4). The Bss-containing Desulfuromonadales GB_003647135 had a complete benzoyl-CoA degradation pathway and was regarded as a toluene degrader (Figure 3c). Previous studies suggested that some fumarate addition protein sequences that clustered with NmsA might function in the activation of toluene [40]. Accordingly, NmsA-containing Desulfobacterales (CK_bin19, GB_003646785, SB_bin384) possessed a partial benzylsuccinate degradation pathway. Therefore, these NmsA-containing Desulfobacterales may also be toluene degraders. In addition, Desulfobacterales KS_bin50, which contained Group IV FaeA, also had the potential to oxidize toluene with the support of a near-complete benzylsuccinate degradation pathway (Figure 3c).
For polycyclic aromatic hydrocarbons, 2-methylnaphthalene was used as the representative compound. Following 2-methylnaphthalene activation by fumarate addition, naphthyl-2-methyl-succinate intermediates are converted to 2-naphthoyl-CoA by BnsA-G in reactions similar to those of anaerobic toluene degradation [41]. However, such genes were not detected in the recovered NmsA-containing genomes.
β-oxidation and fumarate regeneration. After conversion to CoA-bound fatty acids, degradation pathways of aliphatic and aromatic hydrocarbons converge at a conventional β-oxidation pathway (Figure 4). The resulting fatty acids would be degraded in a series of β-oxidation reactions (Acd, Crt, FadB, and AtoB), ultimately yielding acetyl-CoA [38]. The β-oxidation pathways were nearly or fully complete in all bacterial and archaeal genomes, with the exception of Candidatus Stahlbacteria (Table S7). In anaerobic alkane degradation, one propionyl-CoA is generated by β-oxidation (Figure 4) and can be used to recharge the fumarate supply. This reaction is catalyzed by a membrane-bound succinate dehydrogenase (Sdh) [38] that was identified in most bacterial and archaeal genomes (Figure 3d).
Mineralization versus fermentation. Hydrocarbons can be either completely mineralized to CO2 or fermented to organic acids and hydrogen as end products of incomplete oxidation. One Dehalococcoidia (EGoM_E44bin31), several Desulfobacterales (n=8) and several Archaeoglobi (n=5) genomes revealed the potential to completely oxidize acetyl-CoA to CO2 via the reverse Wood-Ljungdahl pathway, coupling hydrocarbon oxidation to sulfate reduction (Figures 3e, 4a, and S5-S6). Sulfate-reducing Desulfobacterales also employed a two-step process through acetylphosphate to acetate for energy conservation (Figure 3f), catalyzed by phosphate acetyltransferase (Pta) and acetate kinase (AckA). One Archaeoglobi (Ag-1) carried a gene for respiratory nitrate reductase (NarGH), indicating the potential to couple hydrocarbon oxidation with nitrate respiration (Table S4). Additionally, one genome classified as Heimdallarchaeota and eight bacterial genomes contained genes for cytochrome bd (cydAB), predicted to be involved in protection against O2, allowing strict anaerobes to survive in the presence of nanomolar O2 concentrations [42].
Other Dehalococcoidia (n=9), Syntrophobacterales (n=3), Desulfuromonadales (n=1), Heimdallarchaeota (n=1), Lokiarchaeota (n=6), Thorarchaeota (n=3) and Thermoplasmata (n=2) lacked genes encoding canonical terminal reductases (e.g., for ferric iron, sulfate or nitrate). These genomes did contain genes for acetate fermentation catalyzed by ADP-forming acetyl-CoA synthetase (Acs). This fermentation converts acetyl-CoA to acetate and simultaneously generates ATP via substrate-level phosphorylation (Figures 3f and 4b). Furthermore, genes encoding [FeFe]-hydrogenases and group 4 [NiFe]-hydrogenases for H2 production were also detected in some Dehalococcoidia, Syntrophobacterales, Desulfuromonadales, Lokiarchaeota, Thermoplasmata and Heimdallarchaeota genomes (Figures 3f and 4b). These groups might ferment hydrocarbons to reduced products such as acetate and hydrogen, similar to scenarios in methanogenic hydrocarbon-degrading enrichment cultures [43].
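The distinction drawn in this subsection amounts to a decision rule on gene presence. The following is a minimal sketch of such a rule; the feature labels ("sulfate_reduction", "wood_ljungdahl", and so on) are hypothetical shorthand rather than annotations used in the study, and the rule is deliberately coarser than the manual interpretation described above.

```python
from typing import Set

# Hypothetical labels for canonical terminal reductase systems.
TERMINAL_REDUCTASE_LABELS = {"sulfate_reduction", "nitrate_reduction", "iron_reduction"}


def classify_lifestyle(features: Set[str]) -> str:
    """Coarse classification of a genome from presence/absence of key features."""
    has_reductase = bool(features & TERMINAL_REDUCTASE_LABELS)
    if has_reductase and "wood_ljungdahl" in features:
        # Acetyl-CoA can be oxidized to CO2 and coupled to respiration.
        return "complete oxidation"
    if not has_reductase and "Acs" in features:
        # No terminal reductase, but acetate formation via ADP-forming Acs.
        return "fermentation"
    return "unresolved"


# Hypothetical feature sets, for illustration only.
respirer = {"wood_ljungdahl", "sulfate_reduction", "Pta", "AckA"}
fermenter = {"Acs", "FeFe_hydrogenase"}
print(classify_lifestyle(respirer))
print(classify_lifestyle(fermenter))
print(classify_lifestyle({"wood_ljungdahl"}))
```

Returning "unresolved" for incomplete evidence keeps partially assembled genomes from being forced into either category.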
Environmental distributions of microorganisms containing FAE
To investigate the biogeography of the newly identified hydrocarbon degraders, their relative abundances were determined in microbial community data derived from 20 geographically distinct locations (Figure S1 and Table S2). In general, FAE-containing hydrocarbon degraders were broadly distributed in subsurface sediments, natural seeps, hydrothermal fields, and oil-polluted sediments (Figure 5 and Table S8). Hydrocarbon-utilizing taxa were most clearly enriched in natural oil seep sediments, e.g., in the Eastern Gulf of Mexico, Campeche Knolls and Scotian Basin, comprising ~10% of the communities in these environments. Anaerobic hydrocarbon degraders were also present at high abundance in subsurface sediments not associated with seepage, e.g., accounting for 11% and 16% of the communities in Costa Rica and Kattegat Sea sediments, respectively. Dehalococcoidia and Deltaproteobacteria were the predominant bacterial lineages responsible for anaerobic hydrocarbon degradation in most habitats. Lokiarchaeota comprised relatively large fractions of the communities in both natural oil seeps (up to 6.5% of the community in Eastern Gulf of Mexico sediments) and the subseafloor (up to 3.8% of the community in Costa Rica). Lokiarchaeota were also found in sediments from coastal ecosystems, but at low abundances (<1% of the community). Hyperthermophilic Archaeoglobi were hardly detected in these marine sediments, even in hydrothermal vent sediments such as those of Guaymas Basin. Recovered 16S rRNA gene sequences from these anaerobic hydrocarbon-degrading archaeal and bacterial lineages were used to search NCBI databases. The results suggested that they might be more prevalent in other environments such as oilfields (Table S9).