Population genomes (PGs) from the Amazon River
Our original dataset contained 106 Amazon River metagenomes from 30 different stations divided into 5 sections: upstream, downstream, estuary, plume and ocean (Fig. 1a). We generated 30 co-assemblies (Table S1) and, after binning, 54 high-quality PGs were selected featuring ≤500 contigs with half of them >10 Kbp, completeness >50%, contamination <10%, and quality ≥50% (quality = completeness − 5 × contamination) (Fig. S1). PGs sharing >99% Average Amino-Acid Identity (AAI) were considered redundant and removed (Table S2). A total of 51 non-redundant PGs were kept in our dataset, including 49% (25) high-quality and 51% (26) medium-quality PGs (Table 1 and Table S3) according to previously used criteria (34). PGs ranged in size between 0.5 and 7.9 Mbp, and the maximum contamination was 6.8%, with ~53% of the PGs displaying a contamination <1% (Table S3). In total, we recovered 25 PGs from the upstream section, 9 from downstream, 6 from the estuary, 9 from the plume and 2 from the ocean (Fig. 1a-b). Proteobacteria (39% of PGs, counting the Alpha-, Beta- and Gamma- subdivisions as well as PGs classified as Proteobacteria), Bacteroidetes (15.7%) and non-classified bacteria (15.7%) predominated (Fig. 1b). In contrast, Cyanobacteria represented only 4% of the recovered PGs (Fig. 1b). Only two archaeal genomes were retrieved: one belonging to Thaumarchaeota (from the upstream section) and another to Euryarchaeota (from the plume). PG distribution and abundance along the river were heterogeneous (Fig. 2).
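The selection thresholds stated above can be written as a short filter. The following is a minimal sketch in Python, not the pipeline used in this study: the `Bin` record, its field names and the example values are hypothetical, and the contig-length criterion (half of the contigs >10 Kbp) is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    """Hypothetical summary record for one genome bin."""
    name: str
    contigs: int
    completeness: float   # percent
    contamination: float  # percent


def quality(b: Bin) -> float:
    """Quality score as defined in the text: completeness - 5 x contamination."""
    return b.completeness - 5.0 * b.contamination


def passes_selection(b: Bin) -> bool:
    """Apply the contig-count, completeness, contamination and quality thresholds."""
    return (
        b.contigs <= 500
        and b.completeness > 50.0
        and b.contamination < 10.0
        and quality(b) >= 50.0
    )


# Illustrative values only; they are not taken from Table S3.
bins = [
    Bin("bin_a", contigs=120, completeness=92.0, contamination=1.5),
    Bin("bin_b", contigs=480, completeness=70.0, contamination=6.0),
    Bin("bin_c", contigs=650, completeness=95.0, contamination=0.5),
]
selected = [b.name for b in bins if passes_selection(b)]
```

In this toy input, `bin_b` meets the completeness and contamination thresholds but fails on the quality score, which shows how the factor of 5 penalizes contamination more heavily than incompleteness.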
We identified 10 PGs that had high similarity (>97% average nucleotide identity, ANI) and non-significant (p > 0.05) differences to other known genomes (Table S4): Richelia intracellularis-A with AM_2804, Trueperella pyogenes with AM_0546, Acinetobacter junii with AM_0608, Methylopumilus sp1 with AM_0507 / AM_0219, Coccinistipes sp. with AM_2208, Sphingobium sp2 with AM_1603, the taxonomically unannotated UBA11236 with AM_1606, Xanthomonas fuscans with AM_0519 as well as a few characterized species from Rokubacteria, GWA2-73-35 sp1 with AM_2207. Other PGs from the TARA-Oceans expedition (35) or freshwater environments (24, 30, 31) did not display high similarity to the Amazon PGs. Thus, ~80% of the Amazon PGs had no close genomic relative in databases or published datasets. These PGs had the lowest taxonomic rank assigned at the level of kingdom (14% of PGs), phylum (14%), class (8%), order (16%), family (26%) or genus (4%). The remaining 18% of PGs could not be taxonomically assigned.
Cellulose and lignin oxidation
Terrestrial organic matter (TeOM) degradation is a fundamental process in the Amazon River and happens in two steps that are modulated by microbes: first, lignin oxidation mediated by laccases, and second, cellulose degradation mediated by specific glycosyl hydrolase (GH) families. More than half of our PGs (~53%) did not possess the ability to degrade TeOM, while the remaining 24 PGs were able to degrade it (Fig. 3). Laccases were present in all taxa except Bacteroidetes. All PGs displaying laccases also displayed GHs, suggesting that the systems of hemi-/cellulose degradation and lignin oxidation are coupled. Furthermore, a few cellulolytic PGs (~20%) did not display lignin oxidation potential, pointing to two assemblages: one that is both cellulolytic and lignolytic, and another that performs only cellulose degradation. Overall, the PGs with the highest potential for TeOM degradation were AM_0519 (Xanthomonas fuscans), AM_0876 / AM_0936 (both unclassified bacteria), and AM_1603 (Sphingobium sp2), according to our criterion of having a minimum of two protein families related to TeOM degradation, with at least two different genes.
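The grouping of PGs into assemblages described above amounts to a simple presence/absence rule on laccase and GH annotations. The sketch below illustrates that rule in Python under stated assumptions: the function name, the input layout and the example annotations (`pg_1` to `pg_3`, and the GH families listed) are hypothetical and do not reproduce the data in Fig. 3.

```python
from collections import Counter


def classify_teom_role(has_laccase: bool, gh_families: set) -> str:
    """Assign a PG to an assemblage from laccase and GH presence/absence."""
    if has_laccase and gh_families:
        return "lignolytic and cellulolytic"
    if gh_families:
        return "cellulolytic only"
    if has_laccase:
        # Not observed in the text (all laccase-bearing PGs also had GHs),
        # but kept so that the rule covers every possible input.
        return "laccase only"
    return "no TeOM degradation potential"


# Hypothetical annotations: (laccase present?, set of GH families detected).
annotations = {
    "pg_1": (True, {"GH8"}),
    "pg_2": (False, {"GH5", "GH9"}),
    "pg_3": (False, set()),
}
roles = {pg: classify_teom_role(lac, ghs) for pg, (lac, ghs) in annotations.items()}
role_counts = Counter(roles.values())
```

Tallying `role_counts` over a full annotation table would give the proportions of each assemblage.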
Decoupling lignin-oxidation byproducts from TeOM degradation
Lignin oxidation yields small aromatic compounds that need to be internalized into the cell via transmembrane transporters to complete lignin degradation. Among the PGs having transporters for lignin oxidation byproducts (Table S5), only two (AM_0630 and AM_0902) were also lignin oxidizers. Thus, the oxidation of lignin performed by lignolytic assemblages seems to be completed by cellulolytic microbes that degrade aromatic byproducts.
PGs were also analyzed for genes required to process aromatic compounds produced after lignin oxidation (Table S6). Only two PGs (AM_0519 and AM_1603) seemed able to both degrade lignin-derived aromatic compounds and oxidize lignin. The PGs potentially able to degrade mono-/di-aryls derived from lignin did not possess genes for cellulose degradation or lignin oxidation. Therefore, there is an apparent decoupling between functions related to the oxidation of cellulose and lignin and functions associated with processing byproducts of lignin oxidation. This points to different assemblages specialized in each step of the TeOM degradation process (that is, lignin oxidation, degradation of byproducts generated by lignin oxidation, and cellulose oxidation).
Alternative carbon sources and carbon storage
TeOM degradation involves the formation of glucose (from cellulose hydrolysis) and various aromatic compounds (from oxidation of lignin and its derivatives), all of which are viable carbon sources. Microorganisms tend to prefer specific carbon sources, like sugars, and in their absence, they metabolize other compounds, such as citrate, to obtain energy and structural carbon. Compounds that are metabolized only in the absence of preferred carbon sources, such as glucose, are called alternative carbon sources. For an effective carbon flux in aquatic environments, microbial transport systems are crucial to ensure that alternative carbon sources, such as tricarboxylates and the mono- and di-aryls generated during lignin oxidation, can be used. In the Amazon River, there are two main carbon contributors: TeOM and less complex compounds, such as humic acids and tricarboxylates. Tricarboxylates in particular are good examples of alternative carbon sources; they are molecules containing three carboxyl functional groups (-COOH), e.g. citrate. Tripartite tricarboxylate transporters (TTT) use substrate-binding proteins to sequester their ligands from the extracellular milieu and import them into the cytoplasm (Fig. 4a).
Only seven PGs appeared to use tricarboxylates via the TTT system (Fig. 4b). The PGs containing the complete TTT system included Alphaproteobacteria (AM_0275) as well as Betaproteobacteria, mainly from the order Burkholderiales. One important characteristic of the TTT system is the specificity of each substrate-binding protein for a certain substrate (Fig. 4a). This promotes a high diversity of tctC genes, which were found to range from tens to hundreds across PGs (Fig. 4b). In contrast, <10 genes appeared to be needed for the membrane-attached components (tctA and tctB) of this system (Fig. 4b). PGs containing a complete TTT system seem incapable of TeOM degradation, except for AM_0630, a Burkholderiales member containing laccase and GH8 genes. Interestingly, all PGs containing the TTT system (except AM_0630 and AM_0233) also had the biochemical machinery to process aromatic compounds derived from lignin oxidation.
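Whether a PG carries the complete TTT system can be checked from per-gene counts, as sketched below in Python. This is an illustration only: the function name, the dictionary layout and the gene counts are hypothetical and are not taken from Fig. 4b; the only element drawn from the text is the set of three components (tctA, tctB and tctC).

```python
# The three TTT components named in the text.
TTT_COMPONENTS = ("tctA", "tctB", "tctC")


def has_complete_ttt(gene_counts: dict) -> bool:
    """Return True when every TTT component has at least one annotated gene."""
    return all(gene_counts.get(gene, 0) > 0 for gene in TTT_COMPONENTS)


# Hypothetical gene counts per PG (illustrative numbers only).
pg_gene_counts = {
    "pg_x": {"tctA": 4, "tctB": 3, "tctC": 120},  # many substrate-binding proteins
    "pg_y": {"tctC": 15},                          # binding proteins but no membrane parts
    "pg_z": {},                                    # no TTT genes
}
complete = sorted(pg for pg, counts in pg_gene_counts.items() if has_complete_ttt(counts))
```

Requiring all three components, rather than tctC alone, avoids counting PGs that carry substrate-binding proteins without the membrane-attached parts of the transporter.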
Bacteria have developed impressive mechanisms to cope with adversity. Fluctuations in water level, changes in nutrient concentrations and seasonality are common disturbances in the Amazon River. The production and intracellular accumulation of nutritive polymers, later used to prevent starvation during unfavorable conditions, represent an important trait in multiple microbes. In particular, specific mechanisms such as carbon storage are also relevant to understanding the flux of carbon inside ecosystems. One of the most important carbon storage systems is polyhydroxybutyrate (PHB) metabolism, performed by a few enzymes (Fig. 4c). PHB biosynthesis enzymes were searched in PGs to evaluate their potential to store carbon via this polymer (Fig. 4d). Almost all PGs displaying the complete PHB pathway (phaA-C) (Fig. 4d) also included the TTT system, except for AM_0528 and AM_1603, which were found to be TeOM degraders and did not have the TTT system. Yet, the largest number of genes related to the PHB pathway was found in the TeOM degrader AM_1603, a Sphingobium representative. The largest gene diversity was observed in the initial steps of PHB biosynthesis (genes phaA and phaB), which are not crucial for PHB production, as they perform non-specific transformations, but ensure monomer availability. In contrast, only a few gene variants encoded the final step, performed by the product of the phaC gene (Fig. 4d), which is crucial for PHB formation. The gene phaR, which encodes a transcriptional regulator also related to the accumulation of PHB, was present in 7 out of 13 PGs presumed to produce PHB (Fig. 4d). Only AM_1111 was presumed to produce polymers other than PHB, namely polyhydroxyalkanoate/butyrate, as it contains the phaE gene, which allows this species to produce alternative monomers (Fig. 4d).