Prokaryotic Community Diversity and Composition
To characterize the microbial communities inhabiting Antarctic geothermal sites, we extracted DNA from 21 soil samples from three geothermal areas in Victoria Land, Antarctica: Mt. Rittmann, Mt. Melbourne, and two sites on Mt. Erebus, Tramway Ridge (TR) and Western Crater (WC), for a total of four sampling sites (see Fig. 1 for maps and site descriptions). Two of these four sites, Melbourne and WC, have little to no fumarolic activity (i.e., fumarolic passive) and lower maximal soil temperatures (maximum 50℃). The other two sites, Rittmann and TR, have many actively steaming fumaroles (i.e., fumarolic active) and higher maximal soil temperatures (maximum 65℃). Thus, we categorized WC and Melbourne as “passive” sites and TR and Rittmann as “active” sites. We used 16S rRNA gene amplicon sequencing to study the microbial communities present in these samples, generating 1,871 amplicon sequence variants (ASVs) from 926,649 reads (Table S1). We also conducted metagenome sequencing on four samples, one from each site, with almost all metagenomes reaching sequencing saturation (nonpareil C value of 1 for all except TR, which was 0.93; Figure S2B). Metagenome diversity, based on nonpareil diversity indices, were 17.8, 15.6, 15.3, and 15.2 for TR, WC, Melbourne, and Rittmann, respectively, all very low compared to other habitat microbiomes (17.5 for human-associated, 19 for lake water) (38). From these metagenomes, we binned 140 metagenome assembled genomes (MAGs): 42 from Melbourne, 15 from Rittmann, 23 from TR, and 60 from WC, all with completeness > 80% (except for two that dropped to 77% during cleaning) and contamination < 5% (Table S2).
Overall, we found that the microbial communities at WC and Melbourne (hereafter, passive sites) were more closely related to each other than to TR and Rittmann (hereafter, active sites) (Fig. 2). This was most apparent from the 16S rRNA samples, where passive sites had significantly higher alpha diversity than the active sites (Fig. 2A), clustered very closely together in a principal coordinates analysis (PCoA) (Fig. 2B), and were more similar to each other based on abundance patterns both at the phylum (Figure S3A) and class level (Figure S3B). MAGs from passive sites also clustered together based on similarities in abundance profiles, separate from active sites (Figure S4). However, a pairwise PERMANOVA test using the 16S rRNA gene sequence data showed no significant difference between the microbial communities at any of the sites except between TR and WC (p = 0.05), which could be due to the small number of samples analysed at Melbourne and Rittmann. TR and Rittmann also clustered separately from each other in the PCoA plot, indicating distinct microbial communities at the two active sites (Fig. 2B).
Based on the 16S rRNA gene sequence-based and metagenome-based community composition of the sites, we found that differences between site types were primarily attributable to greater abundances of members of Crenarchaeota (class Nitrososphaeria) at active sites, while passive sites had significantly greater abundances of phylum Planctomycetota (family Gemmataceae) and class Acidobacteriae within Acidobacteriota (Fig. 2C-D, Figure S3). When looking for distinguishing taxa at each of the four sites, we found that TR was the easiest to distinguish from all other sites based on greater abundances of phylum Bacteroidota and the archaeal family Nitrososphaeraceae present (Fig. 2E). Interestingly, both the Rittmann and WC metagenomes had much greater relative abundances of Nitrososphaeria compared to the corresponding 16S rRNA gene sequence-based community profile (85% vs. 20% for Rittmann, 25% vs. 5% for WC for metagenome vs. 16S rRNA, respectively), which could indicate primer selection against this group.
We used our complete set of non-dereplicated MAGs, categorized by the site the MAG was binned from, to examine genomic characteristics of microbes from each site. We found that MAGs binned from active sites were distinguishable by their significantly smaller average genome size (average: 3.1 Mbp for active, 4.8 Mbp for passive) and significantly higher predicted optimal growth temperature (OGT) (average: 50.8℃ for active, 38.0℃ for passive); MAGs from Melbourne had the largest average genome size and the lowest predicted OGT of the four sites (Fig. 3A-B). We did not see significant differences in %GC, coding density, or predicted maximal growth rate between MAGs from the two site types or the four sites (Figure S5A-B).
When we explored the physicochemical parameters that could be influencing the microbial communities at these sites, we first found that the three volcanoes clustered separately from each other in the physicochemical data (Figure S6A), with elemental composition playing a large role in this separation (Figure S6B), reflecting the unique geological origin of each volcano. When we looked for correlations between physicochemical parameters and biological communities (16S rRNA gene sequence-based), although many parameters were significantly correlated (Figure S6C), only magnesium remained significant (p = 0.033) after Benjamini and Hochberg adjustment. Values for selected parameters are presented in Figure S6D and all physicochemical parameter values are given in Table S3.
Exploration of Cyanobacterial Diversity and Functions
We used the Cyanobacteria group as a case study to explore the influence of physicochemistry on the distribution and functioning of these important primary producers and to investigate the global uniqueness of these microorganisms. We binned an additional five cyanobacterial MAGs from the metagenome of Ritt3M-1, a cyanobacterial mat sample from Rittmann; this was necessary for our analysis as there were no cyanobacterial MAGs recovered from the other Rittmann metagenome sample. In total, we retrieved 50 cyanobacterial ASVs and 9 cyanobacterial MAGs from our data set.
Overall, we found organisms from three families (Nostocaceae, Leptolyngbyaceae, and Obscuribacteraceae), but could only identify a few instances of clear correlations between physicochemical parameters and the distribution and functioning of these groups (Fig. 4A, B). Supporting earlier findings at the entire microbial community scale, Tramway Ridge had a distinct Cyanobacteria community, with species from Leptolyngbyaceae and the genus Fischerella (family Nostocaceae) dominant and an absence of Obscuribacteraceae. Leptolyngbyaceae was positively correlated with gravimetric water content (GWC) and negatively correlated with nickel (Fig. 4C). In general, Nostocaceae was the most widely distributed and had a positive relationship with nickel and sulfur and was negatively correlated with GWC (Fig. 4C). Only MAGs from Nostocaceae had the capacity to fix nitrogen (Figure S7), which matches previously measured nitrogen fixation by this group at Melbourne (26).We found that Obscuribacteraceae MAGs were non-photosynthetic, as previously observed for this group (39), with pathways for ethanol fermentation and fatty acid degradation instead (Figure S7B).
To illuminate potential evolutionary relationships between the Cyanobacteria MAGs from this study and globally distributed representatives of these families from GTDB (GTDB species cluster representatives, 217 Nostocaceae, 38 Leptolyngbyaceae, and 55 Obscuribacteraceae), we used a phylogenomic approach (Figure S8). We observed that all MAGs from our study clustered with globally distributed members of these families, from the Mojave Desert to China to the continental subsurface. The short branch lengths we observed for our MAGs also confirming the cosmopolitan distribution of these families, as has been found previously at Tramway Ridge (29). These results are dependent on the completeness of the database used and should not be treated as conclusive evidence.
Functional Comparison of Sites
We analyzed the functional potential of the microbial communities at the four sites using the community metagenome sequences, post-assembly, looking at the abundances of 52 conserved marker genes that characterize various key metabolic pathways (Fig. 3C). All sites had a high abundance of marker genes for aerobic organotrophic respiration (NuoF, SdhA, AtpA, and CoxA), as well as low abundances of hydrogenases involved in atmospheric H2 scavenging (Group 1h and 1l NiFe hydrogenases) (40).
Several key differences were notable between sites. First, TR was distinguishable by an enrichment in the number of genes involved in sulfur (SoxB, Sqr, and FCC) and nitrogen (NosZ, NxrA, NirS, NirK, NorB, and NarG) metabolisms (Fig. 3C; see Figure S5D for the same heatmap in log scale for distinguishing lower abundance gene categories). These differences are expected given the relatively large concentrations of TN and sulfur at TR (Figure S6). A notable exception to the suite of nitrogen metabolising genes detected at TR is the lack of AMO (ammonia monooxygenase) encoding genes, an exception which has recently been accounted for (29). Melbourne and WC were highly similar to each other, except for the presence of methane monooxygenase genes at Melbourne and not WC (Fig. 3C). Several MAGs in the genus Methylocella were found at Melbourne, confirming the potential for methane metabolism there, which aligns with large measured concentrations of methane at Melbourne (41) (Table S2). Genes for methane monooxygenase genes were also detected in the samples from Rittmann, but no Methylocella MAGs. Rittmann was distinguishable by an enrichment in marker genes for anaerobic metabolism (FrdA) and CO oxidation (CoxL) (Fig. 3C). The gas composition of Rittmann fumaroles has yet to be studied, but based on these results, is likely to be enriched in CO and methane.
Habitat Specialization and Community Assembly Factors
Following our taxonomic and functional analysis of these Antarctic geothermal sites, we asked whether there was a large community of shared microorganisms between these sites (habitat generalists) and, if so, whether dispersal between sites was an important factor in structuring these communities. To do this, we first classified 16S rRNA gene sequence ASVs and MAGs by the number of sites or (for ASVs) samples at which they were present, after confirming that all samples were similarly sampled using rarefaction (Figure S2). Based on this classification, we observed that most ASVs (90%) and MAGs (77%) in our study were only found at either one or two sites/samples (Figure S9A-C, F, I). However, these site/sample-specific ASVs and MAGs comprised only a small fraction of the total read count at all samples. Instead, it was the ASVs and MAGs that were found across all four sites (36 ASVs, 3% of all ASVs; 2 MAGs, 2% of all MAGs) that were generally the most abundant (> 30% of all reads for both ASVs and MAGs) (Figure S9A, B, D, G, J). Melbourne and WC shared the most ASVs and MAGs (120 ASVs, 9% of all ASVs; 36 MAGs, 32% of all MAGs) (Figure S9A, B), consistent with their similar overall microbial community composition. TR shared very few ASVs with any other site (Figure S9A, B).
Next, we asked whether these common microorganisms were adapted to thrive at all of the sites (habitat generalists), or if their presence at all sites was a product of stochastic processes. We used a robust implementation of Levin’s niche breadth (42) with our 16S rRNA gene sequence data (reduced to 832 ASVs by eliminating ASVs only present in one sample) to identify specialists and generalists. Levin’s niche breadth (BN) is calculated by:
$$Levin{s}^{{\prime }}{B}_{N}= \frac{1}{R{\sum }_{\text{i}=1}{p}_{i}^{2}}$$
where R is the number of differing environments and pi is what proportion of all instances of taxon j is found in environment i. BN values close to 1 indicate broad habitat occupancy, while values close to 1/R indicate habitat specialization. In our data set, we defined each of our four sites as different environments and used ASV read counts for the taxon proportions.
In contrast to the theoretical distribution of niche breadth scores (Figure S10A), surprisingly, we did not find any habitat generalists, with 95% of ASVs instead identified as specialists and the rest as intermediate (Fig. 5A). Because two of our sample sets (TR and WC) were collected along temperature gradients, we re-ran the analysis with only high temperature samples (> 35℃), but again found 96% of ASVs were specialists; only one ASV (ASV_5, GAL15) was returned as a generalist (Figure S10B). This absence of habitat generalists held true both with a relaxed limit of quantification (LOQ) for including ASVs in the niche breadth analysis, as well as when we randomly subsampled within each site to bring the sample number from all sites to the same amount (data not shown). A recently developed social niche breadth metric (SNB(43)) identifies specialists or generalists based on the compositional similarity of the communities that microbes inhabit. Using this metric, we found a distinct skew towards specialization (lower SNB score) in our data set, with a much longer tail on the specialist side both in raw SNB scores and in modified z-scores at all taxonomic ranks, supporting our findings of high habitat specialization (Figure S10).
A refinement to Levins’ niche breadth adds in a measure of the resource availability across environments, called Hulbert’s niche breadth (42). This allows measurement of habitat specialism or generalism in reference to environmental gradients; in our case, we chose to examine temperature and pH, as these were identified as having the strongest correlations with community composition (Figure S6C). Again, we found a strong skew towards specialization for temperature and pH (95% specialists for temperature, 73% for pH) (Fig. 5B, C). However, there were two ASVs identified as generalists for temperature alone (ASV 5, phylum GAL15) or both temperature and pH (ASV 42, phylum Acidobacteriota, genus Bryobacter). When examining the abundance of these ASVs across temperature and pH, they clearly had a broader distribution across different temperature/pH values than highly specialized ASVs (Figure S10E). However, their distribution within sites is more variable (Figure S10E).
Next, we explored what ecological factors might be driving the lack of habitat generalists in our data set. First, we used a common method, beta nearest taxon index and Bray-Curtis-based Raup-Crick, to calculate contribution of different factors to community assembly in the 16S rRNA gene sequence data (44–46). We found that, across all sites, ecological drift was the strongest factor (30.5%), with homogenous dispersal (i.e., similar community structure caused by high dispersal rates) was the second strongest factor (25.5%) (Fig. 5D). On the other hand, 20% of the community assembly could be explained by dispersal limitation (i.e., dissimilar community structure caused by low dispersal rates) (Fig. 5D). Environmental selection (either variable selection, i.e. dissimilar community structure caused by different environmental conditions, or homogenous selection, i.e. similar community structure caused by similar environmental conditions) was the weakest factor (11.4% variable selection, 12.9% homogenous selection) (Fig. 5D). Similar results were found when comparing only active sites or only passive sites, although homogeneous environmental selection was much stronger within passive sites than active (Fig. 5D). The importance of ecological drift and dispersal in structuring these communities was again apparent when we examined the distance-decay plot for the 16S rRNA gene sequence data, finding a relatively flat slope of the linear regression line (m = -0.0003), which points to strong dispersal between sites (Fig. 5E). This shallow slope held true when Active and Passive site samples were considered separately as well (m = -0.0007 and − 0.0003 for Active and Passive, respectively) (Fig. 5F, G). Slopes for similarly large-scale studies have shown much larger slopes (e.g., (47–49)).
Functional Imprint of Habitat Specialization
Finally, we explored the impact of high levels of habitat specialization on the microbial communities from these sites by exploring the genomic and functional profiles of MAGs based on site distribution. Generally, habitat generalists tend to be metabolic generalists (1, 2); given the absence of habitat generalists in our data set, we predicted that the number of sites that a microbe is present at should have little to no correlation with genome size, number of unique protein domains (used as a proxy for metabolic breadth), or predicted metabolic pathways present. Rather, we would expect that most microorganisms would likely be metabolic specialists.
Indeed, we found very few significant differences (two-sided Wilcox test, Benjamini & Hochberg-adjusted p-value < 0.05) between different MAG site occupancy categories in genome size, number of unique pfam domains, or predicted metabolic pathways/categories present (Fig. 6A-B). We did find that MAGs occupying three sites had significantly more metabolic pathways/categories than MAGs only occupying one site. Additionally, we observed significantly larger genome sizes and numbers of unique pfam domains in MAGs present at both WC and Melbourne compared with those occupying only WC (Fig. 6B). Correlations between other genome characteristics and site occupancy were non-significant (Figure S11A-B). When we explored the ASVs that were identified previously as generalists for temperature and pH, ASVs 5 and 42, we were able to match ASV 5 to the MAG Ritt_bin.6.strict_final with 100% identity. This MAG had some hallmarks of generalism, with 20.9 predicted metabolic pathways/categories but a genome size of only 3.4 Mbp. Two MAGs (Erebus-WC_bin.32.strict_final and Mel_bin.5.orig_final) were classified to the Bryobacteraceae family, as was ASV 42; neither MAG contained any 16S sequence, so we cannot be sure if they match ASV 42. Still, both MAGs have genome sizes and numbers of predicted metabolic pathways/categories on the upper end of the distribution (6.2 Mbp and 10.1 Mbp, respectively for genome size; 18.2 and 26.2, respectively, for metabolic pathways/categories).
We investigated the different kind of metabolisms present in MAGs based on site occupancy patterns and found few patterns (Fig. 6C, Table S4). We did observe a trend towards MAGs present at two or three sites being more likely to have the capacity for nitrile hydration (6%, 17%, or 11% of MAGs present at one, two, or three sites, respectively) and nitrate reduction (6%, 13%, or 11% of MAGs present at one, two, or three sites, respectively) (Fig. 6C). Nitrile hydration provides microorganisms with another avenue of acquiring carbon and nitrogen needed for growth (50). Also, MAGs at one or two sites were more likely to have the capacity for selenate reduction compared to those at three or four sites (9%, 12%, or 0% of MAGs present at one, two, or three and four sites, respectively) (Fig. 6C). Notably, the two MAGs that were present at all four sites lack many pathways/categories that were found in other MAGs, such as ethanol fermentation, nitric and nitrous oxide reduction, urea utilization, hydrogenases, and halogenated compound utilization, among others (Fig. 6C), indicating that presence at all four sites does not require metabolic generalism.
As we initially hypothesized, there was no observable trend towards metabolic specialization, rather, relatively even numbers of metabolic specialists and generalists across sites and site occupancy patterns (Fig. 6C). Rather, there were relatively even numbers of metabolic generalists (many types of pathways, aerobic/anaerobic, multiple energy sources, etc.) and metabolic specialists between the different site occupancy patterns. One example of a metabolic generalist, the MAG with the second largest number of metabolic pathways present, Ritt_bin.7.strict_final (Gammaproteobacteria, genus Paraburkholderia), appears to be capable of, among other things, urea utilization, thiosulfate and sulfur oxidation, nitrite reduction to ammonia, fatty acid degradation, ethanol fermentation, some types of complex carbon degradation and C1 metabolism, carbon fixation (via RuBisCo), and aerobic CO oxidation, yet was only present at Rittmann (Fig. 6C). On the other hand, we observed multiple metabolic generalist MAGs that also had broad habitat ranges, such as Erebus-WC_bin.32.strict_final (Acidobacteriota, order Bryobacterales) which was found at all sites except TR and is capable of both organic and inorganic metabolisms, and both aerobic and anaerobic metabolism (among other things, sulfur oxidation, nitrile hydration, nitrite reduction to ammonia, iron oxidation, fatty acid degradation, ethanol fermentation, some level of complex carbon degradation, chlorite reduction, and aerobic CO oxidation) (Fig. 6C).
One factor that seems to be largely negating the trends towards metabolic generalism in MAGs with higher site occupancy is the presence in our dataset of a large number of archaeal MAGs with broad habitat ranges (present at 3 + sites) with few predicted metabolic pathways and very small genomes (mean archaeal genome size: 2 Mbp). For example, Erebus-TR_bin.23.permissive_final (class Nitrososphaeria), present at all four sites, has a limited metabolic repertoire (genes for aerobic CO oxidation, chlorite reduction, nitrite reduction, and sulfur oxidation), with mostly aerobic pathways (Fig. 6C). Similarly, Ritt_bin.9.permissive_final (class Nitrososphaeria, order Conexivisphaerales), present at all sites except Melbourne, is capable of the same set of metabolic pathways minus nitrite reduction (Fig. 6C).
We did observe that most MAGs only present at active sites (e.g., Rittmann only, TR only, or Rittmann + TR only) were lacking in several metabolic categories that were found in MAGs present in at least one passive site (Figure S11C). These pathways include methane oxidation (unique to MAGs in Melbourne, matching the metagenome functional analysis results), nitrile hydration, manganese cycling, nitrate reduction, urea utilization, and ethanol fermentation (Figure S11C). These metabolic pathways do not match with measured, corresponding nutrient concentrations (e.g., manganese and total nitrogen levels are significantly greater at TR than WC or Melbourne (Figure S6D), despite manganese cycling being enriched in MAGs present at WC and Melbourne compared to TR), so it is not clear why these pathways are enriched in these MAGs.