Phylogeny and distribution of ‘Ca. Aenigmarchaeota’
Eight MAGs of ‘Ca. Aenigmarchaeota’ were successfully reconstructed from five hot spring sediments collected in Tengchong county in Yunnan, China (Fig. 1a, Additional file 1: Table S1), including four from a single sample from Diretiyanqu-6 (DRTY-6) and one for each of the other springs Diretiyanqu-7 (DRTY-7), Gumingquan (GMQ), Qiaoquan (QQ), and Jinze (JZ-2) (Table 1) [14]. ‘Ca. Aenigmarchaeota’ represents a rare group in hot spring habitats with relative abundances of all MAGs < 0.5% (Fig. 1b). Most MAGs are high-quality, with completeness > 90% and nearly no contamination, and detectable 16S rRNAs and tRNAs (>21) (Additional file 1: Fig. S1; Table 1; Additional file 1: Table S2) [15]. Compared to related MAGs from other studies, these have smaller genome sizes (0.64 vs. 0.86 Mbp; Mann-Whitney U test, P = 0.0003; Additional file 1: Fig. S2) and a lower range of GC content (average 31.74% vs. 38.59%; Mann-Whitney U test, P = 0.012) [2,3,8]. They harbor less genes (752 vs. 1070; Mann-Whitney U test, P = 0.002), shorter average gene length (771 vs. 683 bp; Mann-Whitney U test, P = 0.0005), remarkably high coding density (88-94.6%) and number of overlapping genes (~20.6%) (Table 1). This is consistent with previous studies suggesting that thermophiles harbor small genome sizes as a result of genomic streamlining due to high fitness costs of life at high temperature [16]. Both whole genome-based phylogenomic and 16S rRNA gene-based phylogenetic analyses revealed that the eight MAGs from this study are highly scattered within the phylum ‘Ca. Aenigmarchaeota’ with high bootstrap confidences (Fig. 1c, Additional file 1: Fig. S3). Eight MAGs from Crystal Geyser share a high average AAI (>99 %), and the AAIs of the other 15 genomes range from 38% to 98.24% (Additional file 1: Fig. S4, Additional file 1: Table S3). Recruited 16S rRNA sequences representing this phylum demonstrate that ‘Ca. Aenigmarchaeota’ represents an evolutionary diverse group that inhabits a broad range of habitats (Additional file 1: Fig. S5; Additional file 2), including freshwater (40.96%) and marine (27.71%) environments, hot springs (7.63%), hydrothermal vents (6.83%), and groundwater (7.63%). A minor portion of the 16S rRNA gene sequences were retrieved from hypersaline lakes and soils (<5%). PCA results based on KEGG and arCOG annotations show that genomes are clustered by phylogenetic position rather than habitat type (Additional file 1: Fig. S6).
Metabolic features of ‘Ca. Aenigmarchaeota’
Consistent with previous studies, all eight MAGs in this study have a limited metabolic capacity. Pathways including the tricarboxylic acid cycle (TCA), fatty acid metabolism, and dissimilatory/assimilatory sulfur and nitrogen metabolisms were missing [8,17,18]. Unlike most genomes in DPANN [3,8], ‘Ca. Aenigmarchaeota’ MAGs possess a near-complete glycolytic pathway (Fig. 2). Phosphofructokinase (PFK) and glucokinase (GK) were solely found in DRTY-6_2 bin_201 and GMQ_1 bin_18-1, respectively. The absence of pyruvate kinase (PK) and pyruvate kinase isozymes R/L (PKLR) prohibit the conversion of phosphoenolpyruvate (PEP) to pyruvate during the last step of glycolysis. However, a gene encoded by phosphoenolpyruvate synthase (pps), which has been shown to perform the same function in thermophiles [19,20], was detected in most MAGs in this study (Additional file 3). All MAGs lack gluconeogenesis pathways because they are missing glucose-6-phosphatase, a key enzyme that hydrolyzes glucose-6-phosphate to glucose [21]. Several MAGs are predicted to ferment by production of acetate, lactate, and ethanol based on the presence of acetate-CoA ligase, L-lactate dehydrogenase, and acetaldehyde dehydrogenase/alcohol dehydrogenase, and would therefore be predicted to promote syntrophy among community members [3]. The lack of electron transport chain (ETC), especially V/A type ATPase, suggests that ‘Ca. Aenigmarchaeota’ cannot produce ATP via a membrane proton-motive force (PMF). Thus, glycolysis and other fermentation pathways could be the main sources of ATP for ‘Ca. Aenigmarchaeota’ [22]. However, MAGs from groundwater lack a glycolytic pathway [3,8].
Two genes relevant to polysaccharide degradation including α-amylase (for starch and glycogen) and α-1,6-glucosidase (for starch and disaccharides) were identified in many of the hot spring MAGs (Additional file 1: Table S4). DRTY-6_1 bin 65 might also degrade and utilize pullulan (GH13_20) [23,24]. Aside from α-amylase, MAGs from groundwater harbored different and more glycoside hydrolases, including β-glucosidases (for disaccharides), endoglucanases (for cellulose) glucoamylases (for starch), β-1,2-mannosidases (for beta-1,2-mannotriose and beta-1,2-mannobiose) and α-1,4-galactosaminogalactan hydrolase (for galactosaminogalactan). This might reflect a greater abundance and heterogeneity of carbohydrates in groundwater. None of the known carbon fixation pathways were detected in these MAGs, though three MAGs contain archaeal ribulose-bisphosphate carboxylase (Rubisco). Phylogenetic analysis suggests that the six Rubisco genes recovered from ‘Ca. Aenigmarchaeota’ belong to the form III group, of which five are from thermal environments (Additional file 1: Fig. S7). Four of them belong to group III-b and the remaining two could be classified as a novel lineage which clustered with Rubisco genes from Candidate Phyla Radiation (CPR) genes, which were previously suggested to have been passed by HGT from ‘Ca. Aenigmarchaeota’ to CPR [25]. As previously described in other archaea, Rubisco genes may function in the CO2-incorporating adenosine 5′-monophosphate (AMP) pathway, together with genes encoding for adenosine monophosphate phosphorylase and ribose-1,5-biphosphate isomerase [25,26]. We identified different types of hydrogenases in ‘Ca. Aenigmarchaeota’. Eight MAGs from thermal habitats harbor NiFe 3b-type hydrogenases and cluster into one clade, whereas NiFe 4-type has only been found in ‘Ca. Aenigmarchaeota’ from non-thermal habitats (Additional file 1: Fig. S8).
Despite the possession of genes involved in glycolysis, fermentation, and the nucleotide salvage pathway, the absence of many pivotal pathways strongly suggests a symbiotic lifestyle for ‘Ca. Aenigmarchaeota’. Firstly, this archaeal phylum is devoid of de novo amino acid biosynthetic pathways. Consequently, they have to salvage amino acids from environmental sources by degrading extracellular proteins and peptides and transporting by-products into the cell by using a variety of extracellular peptidases, membrane peptidases, cytoplasmic peptidases, and proteases, along with amino acid transporters (Additional file 3) [3]. Secondly, de novo nucleotide biosynthetic pathways are absent in most of the genomes of this phylum. Meanwhile, genes for purine and pyrimidine salvage pathways were detected in most of genomes, especially in MAGs from hot springs (Additional file 3). Thirdly, ‘Ca. Aenigmarchaeota’ genomes are unable to synthesize cell membranes de novo due to the lack of genes for synthesis of sterol isoprenoids involved in the mevalonate pathway (MVA), although genes encoding mevalonate kinase, glycerol-1-phosphate dehydrogenase, and associated enzymes for phospholipid biosynthesis have been detected [27,28]. Nevertheless, genes encoding S-layers, archaella, and type-IV pili were identified in eight MAGs, endowing “Ca. Aenigmarchaeota” with protection, motility, and cell-to-cell attachment abilities, which could consequently facilitate host-symbiont interactions [29,30,31].
Stress responses employed by ‘Ca. Aenigmarchaeota’
Comparative genomics showed that “Ca. Aenigmarchaeota” inhabiting thermal habitats harbor more genes involved genetic information processing including transcription, translation, replication and repair, and folding, sorting and degradation (Additional file 1: Fig. S9a). Combine with the smaller genome sizes of these thermophiles, these hint the smaller number of accessory genes and genome reduction may drive the losses. In addition, genes involved in cell motility, posttranslational modification, protein turnover, and chaperones predominantly were enriched in the thermophiles (Additional file 1: Fig. S9b). This reflects the fact that cells in at high temperatures have to combat constant thermal denaturation of both macromolecules and monomers [32,33,34]. ‘Ca. Aenigmarchaeota’ MAGs from thermal habitats have evolved multiple strategies to overcome this stress. Chaperonin GroEL, associated with repair of DNA and protein damage caused by high temperature, was present all ‘Ca. Aenigmarchaeota’ genomes (Fig. 3) [8,35]. The prevalence of DNA repair protein RadA could be used for homologous recombination and an alternative strategy for DNA repair [36], indicating the prevalence of genome streamlining among these genomes. Type I (IA and IB type) topoisomerases and reverse gyrase, the latter considered a hallmark of hyperthermophily, were solely detected in MAGs from thermal habitats, which could stabilize DNA and modulate DNA topology to maintain the structure and integrity of chromosome [34,37,38]. HSP70 (DnaK), DnaJ and GrpE were consistently absent in MAGs from thermal habitats, but commonly detected in groundwater MAGs (Fig. 3), consistent with the potential role of this system in the adaptation to mesophily [39].
The presence of cell defense systems provides protection from virus infection (Additional file 1: Fig. S10). One MAG DRTY-7_1 bin_34 was detected to contain a CRISPR-Cas system (Class III-A, Additional file 1: Fig. S10a). All three types of restriction modification (RM) systems were found in ‘Ca. Aenigmarchaeota’ [40] (Additional file 1: Fig. S10b). The type II RM system was detected in five of the MAGs. Additionally, the recently discovered Hachiman system was detected in DRTY-6 bin_65 and Gabija system in DRTY-6 bin_215, providing broad protection against viruses [41] (Additional file 1: Fig. S10c). The broad distribution of defense systems in thermophilic ‘Ca. Aenigmarchaeota’ suggests viruses could be an important threat for the survival of these and other microbes in hot spring habitats [42,43,44]. Viruses with different morphologies have been detected in hot springs and have been shown to be highly active in situ [14,41]. Hence, it seems plausible that ‘Ca. Aenigmarchaeota’ may confer their hosts with the immunity to viruses by serving as ‘viral decoys’ [45]. Meanwhile, the attracted virus could be degraded, and the DNA can be recycled as a nucleotide source [45]. Therefore, the host-symbiont interaction between ‘Ca. Aenigmarchaeota’ and its potential hosts appears to be mutually beneficial.
Horizontal gene transfer (HGT) in ‘Ca. Aenigmarchaeota’
Frequent horizontal gene transfer (HGT) between symbionts has been observed, owing to the physical proximity of genetic material [16,46,47]. As a consequence, organisms in stable symbioses can pass genes between each other via HGT, facilitating the co-evolution of partners during the long-term development of symbionts. Therefore, the detection of HGTs among ‘Ca. Aenigmarchaeota’ genomes may provide a clue to infer potential hosts. Surprisingly, results uncovered a lower proportion of HGT-derived genes in ‘Ca. Aenigmarchaeota’ in comparison to thermophilic, free-living Archaea; e.g., Aigarchaeota (mean 7.5 vs. 22.9%; Mann-Whitney U test, P = 0.00029) [7]. This result might reflect extensive genomic streamlining in symbiotic ‘Ca. Aenigmarchaeota’. Additionally, it seems plausible that restricted contact with non-hosts may limit the diversity of genes available for transfer to ‘Ca. Aenigmarchaeota’. Intriguingly, most MAGs within ‘Ca. Aenigmarchaeota’ possessed comparable percentages of HGT-derived genes (mean 7.5%; Additional file 1: Table S5). A significant positive correlation (Pearson’s R2 = 0.74, P value: 6.185e-6) was observed between detected HGTs and predicted gene totals in corresponding genomes regardless of habitat and genome sizes (Fig. 4a and b). MAGs from hot springs showed relatively larger variance in HGT rate, and thermophiles had both the highest and lowest HGT frequencies.
As is typical, very few of the detected HGTs in ‘Ca. Aenigmarchaeota’ are predicted to be involved in information processing (Additional file 1: Fig. S11c and d). HGT was less prevalent in the genomes of thermophiles; however, genes involved in carbohydrate metabolism and protein folding, sorting, and degradation were more frequent among HGTs in MAGs from thermal habitats. Thus, we hypothesize that HGT-derived genes for carbohydrate metabolic genes may have expanded their niche and/or facilitated their survival under stressful conditions. By looking into the potential donors, we found that members from DPANN including, phyla ‘Ca. Micrarchaeota’ and ‘Ca. Woesearchaeota’, contributed the most to the genetic innovations of MAGs from groundwater (Fig. 4c). However, the reported small genome size and potential symbiotic lifestyle of these members suggest they are not the hosts of ‘Ca. Aenigmarchaeota’. Unlike the MAGs from groundwater, less HGTs were observed between these two phyla and ‘Ca. Aenigmarchaeota’ in hot springs (Fig. 4c). Other HGT-derived genes could be traced to Euryarchaeota and TACK superphylum archaea, indicating a potential host-symbiont relationship, consistent with previous studies reporting symbioses between DPANN and TACK archaea and Euryarchaeota [10,16,48,49,50].
To infer the potential hosts for MAGs inhabiting hot springs and groundwater, we compared the acquired genes between these groups. Results indicated very little overlap between HGT-derived genes among these two habitat groups, only 39 KOs and 54 arCOGs (14.6 and 17.4% among all HGTs which could be assigned into a KO or arCOG number) (Additional file 1: Fig. S11a and b), suggesting they have distinct hosts, despite the broad phylogenetic distribution of these two habitat groups. MAGs from hot springs tend to form symbionts with members from TACK archaea or Euryarchaeota. This conjecture can be exemplified by the case study of the PFK gene, which was derived via HGT and likely drives niche expansion due to an expanded carbohydrate metabolism in ‘Ca. Aenigmarchaeota’ (Additional file 1: Fig. S11c; Additional file 4). A BLAST search against the NCBI-nr database showed that the best hit to the only detected PFK gene in DRTY-6_2 bin_201 is from ‘Ca. Bathyarchaeota’. Intriguingly, potential hosts may confer abilities to adapt harsh conditions. The detected reverse gyrase in GMQ bin_18-1 suggested that TACK-derived genes (‘Ca. Bathyarchaeota’) may have improved the fitness of this organism to invade high-temperature niches (Additional file 4). Also, several genes encoding superoxide dismutase (SOD2) and 8-oxo-dGTP diphosphatase (mutT) were identified as HGTs (Additional file 4 and 5), which could be employed to resist oxidative damage. BLAST searches of mutT genes confirmed that TACK members including Thaumarchaeota and ‘Ca. Bathyarchaeota’ may have served as donors. In QQ bin_128, a gene encoding an alkyl hydroperoxide reductase (ahpF) is situated near the mutT gene, which might be involved in responding to oxidative stress. Strikingly, the SOD2 gene in GMQ bin_18-1 even shows 89.1% identity and 100% of coverage to that from Pyrobaculum sp. WP30, indicating a rather recent HGT event. BLAST searches against NCBI-nr database suggest that all genes except this one in GMQ bin_18-1 show identity < 80%, even for those vertical inherited genes, revealing a novel lineage of this MAG with high divergence to the extant genomes. Therefore, it is reasonable to infer that Pyrobaculum has high possibility to be the host of GMQ bin_18-1 and the genetic interaction might be ongoing. In DRTY-6 bin_65, DRTY-6 bin_202 and DRTY-7 bin_34 MAGs, BLAST searches suggest that less than ten genes of each show identity > 80% to the existing genes. Three of them including transcription initiation factor IIB (TFIIB), phage integrase and translation elongation factor EF-1 (EEF1A) are identified to belong to phyla outside of DPANN, with former two show high identity to Euryarchaeota and the third one to Planctomycetes. However, none of them are identified as HGTs. In DRTY-6 bin_65, all three genes are in the same scaffold. The taxonomic information of nearby genes of TFIIB and EEF1A were mostly affiliated to ‘Ca. Aenigmarchaeota’, consolidating the inference of vertical inheritance of these two genes. The phage integrase may be transferred horizontally from Euryarchaeota since six and three of the ten downstream genes are close relatives to Theionarchaea (51.3-77.4%) and Thermoplasmata (48.3-61.1%). Among them, three genes exhibit homologies to methyltransferase, DNA binding protein and restriction endonucleases associated with type II restriction-modification (RM) systems. This suggests that the integrase-mediated HGT may confer ‘Ca. Aenigmarchaeota’ the special niche to resist the infection of virus. Thus, we conjecture that these three MAGs tend to form symbionts with Euryarchaeota and result potential bidirectional gene flows between symbionts and hosts (Additional file 1: Fig. S12). Overall, the above observations further support the hypothesis that TACK archaea and Euryarchaeota may serve as possible hosts. Though several bacterial phyla contribute a substantial of HGTs to ‘Ca. Aenigmarchaeota’, such as Firmicutes, we rule out the possibilities of them as hosts since most of the transferred genes represent ancient events with high divergence (~ 42% of identity for all best hits belonging to Firmicutes), which cannot reflect the current symbiotic state.
Co-occurrence pattern of ‘Ca. Aenigmarchaeota’ with community members
To further explore the potential host-symbiont relationships between ‘Ca. Aenigmarchaeota’ and the TACK superphylum and Euryarchaeota, a network interface was constructed, aiming to reveal microbial co-occurrence patterns and possible ecological interactions. The network encompasses 891 nodes and 11 of them are identified as ‘Ca. Aenigmarchaeota’ OTUs. Modularity analysis showed that four modules contained OTUs from ‘Ca. Aenigmarchaeota’ (Fig. 5). After extracting ‘Ca. Aenigmarchaeota’-containing sub-networks from each module, we observed tight connections between this phylum and OTUs from ‘Ca. Bathyarchaeota’ (module 3 and 5), Crenarchaeota (module 5), Thaumarchaeota (module 3), and Euryarchaeota (module 2, 3, and 5). Thus, both HGT analysis and network analysis consistently suggest that TACK superphylum archaea and Euryarchaeota are likely to be the hosts of ‘Ca. Aenigmarchaeota’. Notably, we observed significant positive correlations between ‘Ca. Aenigmarchaeota’ and several bacterial OTUs from Chloroflexi, Firmicutes, Nitrospirae, and Proteobacteria. Previous studies reported that these lineages are present at high abundance in their respective communities [51,52,53], and they contain several lineages of chemolithoautotrophs (lineages of Nitrospira, Firmicutes and Proteobacteria) or photoautotrophs (lineages of Chloroflexi) [54,55,56,57,58,59] and subsequently the primary producer of the community. The observed correlations could be a result of the nutrient dependency of ‘Ca. Aenigmarchaeota’ hosts or the symbionts themselves to these primary producers (Additional file 1: Fig. S12).
Here, we propose a co-evolutionary scenario among archaea living in hot spring habitats. Generally, all members in the community form a complex network (Fig. 5). The heterotrophic hosts and symbionts would stay in proximity and form a mutual beneficial relationship with each other. By direct contact, they can fuel themselves by extracting these requisite nutrients directly into the cell to sustain their growth. In return, provide the host with virus defenses by using themselves as decoy and degrade viral DNA by multiple systems. Subsequently use them as a potential DNA and protein sources. Spontaneously, horizontal gene transfer event could occur with the direct contact which plays a pivotal role in expanding the niche diversity of symbionts ‘Ca. Aenigmarchaeota’ (Additional file 1: Fig. S12).