Taxonomic structure and composition of the capybara gut microbiota
To explore the community structure, membership, and metabolic exchange of the capybara gut microbiome, we collected replicate fresh samples from the cecum and rectum of three wild female animals. We combined several culture-independent omics approaches: 16S rRNA targeted sequencing to assess the community structure; whole shotgun metagenome sequencing (MG) to reveal the genetic and functional profile of the community; metatranscriptomic RNA sequencing (MT) to determine gene expression levels; and NMR-based metabolomics to elucidate the small-molecule profile of this specialized community adapted to degrade recalcitrant plant polysaccharides.
The taxonomic structure of the capybara gut microbiome is dominated by Bacteria, with only a small fraction of reads corresponding to Archaea (16S: 1.07%, MG: 0.40% and MT: 1.55%) and Fungi (MG: 0.12% and MT: 0.32%). Based on 16S taxonomy analysis, the most abundant bacteria in the capybara gut microbiota were members of the Firmicutes (mean ± sd: 35.8 ± 12.4%) and Bacteroidetes (31.5 ± 9.8%), followed by Fusobacteria (15.3 ± 5.4%) and Proteobacteria (8.4 ± 5.4%) (Fig. 1A). A similar taxonomic distribution was observed for 16S rRNA reads recovered from the metagenome (16S_MG), MG and MT datasets (correlation coefficient r = 0.96 for MG:MT, r = 0.74 for 16S_MG:MG and r = 0.71 for 16S_MG:MT, P < 0.05). In cecal samples, the phyla Bacteroidetes, Fusobacteria and Proteobacteria were significantly more abundant in MG than in MT, whereas Euryarchaeota (MT/MG ratio expressed as mean ± sd: 2.3 ± 0.6), Fibrobacteres (1.7 ± 0.9) and Spirochaetes (1.6 ± 0.4) were more represented in MT than in MG (Fig. 1B). The higher MT/MG ratios for these phyla suggest that these microorganisms were more transcriptionally active at the time of sample collection, and thus that they may be important players in this gut compartment.
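As a minimal sketch of how an MT/MG ratio of this kind can be computed, the Python snippet below divides the relative abundance of a phylum in the metatranscriptome by its relative abundance in the metagenome for each sample and summarizes the ratios as mean ± sd. The function name `mt_mg_ratio` and the input numbers are hypothetical illustrations and are not data from this study; the exact normalization used by the authors is not stated here.

```python
from statistics import mean, stdev

def mt_mg_ratio(mt_abundance, mg_abundance):
    """Return per-sample MT/MG ratios plus their mean and sample sd."""
    if len(mt_abundance) != len(mg_abundance):
        raise ValueError("MT and MG must have the same number of samples")
    ratios = [mt / mg for mt, mg in zip(mt_abundance, mg_abundance)]
    return ratios, mean(ratios), stdev(ratios)

# Hypothetical relative abundances (%) of one phylum in three cecal samples.
mt = [3.0, 2.0, 5.0]
mg = [1.5, 1.0, 2.0]
ratios, avg, sd = mt_mg_ratio(mt, mg)
print(f"MT/MG = {avg:.2f} ± {sd:.2f}")
```

A ratio above 1 means the phylum contributes proportionally more transcripts than genes, which is the sense in which Euryarchaeota, Fibrobacteres and Spirochaetes are described as more represented in MT.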
Binning of MG-assembled contigs based on tetranucleotide frequency and coverage profile yielded 79 unique Metagenome-Assembled Genomes (MAGs) with completeness > 55% and contamination < 15% (Supplementary Table 1); among those, 24 were considered high quality (completeness > 90% and contamination < 5%) and 50 medium quality (completeness > 50% and contamination < 10%), according to the parameters suggested by Bowers et al. 2017 (15). Taxonomic classification indicates that 35 of the recovered MAGs belong to the Firmicutes phylum, including six from the Erysipelotrichaceae family and eight from the Lachnospirales family. The second most abundant group was the Bacteroidetes, with 30 MAGs classified in the Bacteroidales order (Supplementary Table 1). Although only two genomes from Fusobacteria and Proteobacteria were recovered from MG, the most abundant OTUs identified by 16S analysis were classified as Fusobacteria and Proteobacteria, with relative abundances of 20% and 18%, respectively (Figure S1), pointing to a central role of those species in this environment.
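The quality tiers quoted above can be expressed as a short classification rule. The sketch below uses only the completeness and contamination thresholds stated in the text; the function name `mag_quality` and the example MAGs are hypothetical, and any additional criteria the authors may have applied are not modeled.

```python
def mag_quality(completeness, contamination):
    """Classify a MAG using the thresholds quoted in the text (percent values)."""
    if completeness > 90 and contamination < 5:
        return "high"
    if completeness > 50 and contamination < 10:
        return "medium"
    return "low"

# Hypothetical MAGs: (name, completeness %, contamination %).
mags = [("MAG_a", 95.2, 1.3), ("MAG_b", 72.0, 6.5), ("MAG_c", 56.0, 12.0)]
tiers = {name: mag_quality(comp, cont) for name, comp, cont in mags}
print(tiers)
```

Under these thresholds, a genome that passes the recovery filter (completeness > 55%, contamination < 15%) can still fall outside both tiers, as the third hypothetical MAG does. This is consistent with the 24 high-quality and 50 medium-quality MAGs summing to fewer than 79.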
A dominance of Firmicutes, Proteobacteria, Bacteroidetes and Tenericutes has been observed in the hindgut microbiomes of other herbivores such as Castor fiber, Castor canadensis, horse, rabbit and koala (16–20). Further, microbiota analysis of domesticated herbivores, including hindgut fermenters, ruminants and monogastric animals, revealed Firmicutes as the dominant phylum (53.11, 63.35 and 52.27%, respectively), followed by Bacteroidetes (31.36, 20.95 and 26.95%, respectively). Although the dominance of Bacteroidetes and Firmicutes is a general feature of mammalian gut microbiomes, the microbiota of the native Brazilian capybara differs from that of other hindgut fermenters and ruminants, mainly due to a reduced abundance of Firmicutes (35%) along with a higher abundance of Fusobacteria (15%) and Proteobacteria (8%) (21). The increased presence of Fusobacteria can be associated with the production of butyrate, a short-chain fatty acid that is often the end product of carbohydrate fermentation (22). On the other hand, and in spite of the high-polysaccharide diet, the lower abundance of Firmicutes in the capybara microbiome may point to strategies for lignocellulose utilization distinct from those typically found in other hindgut herbivores and ruminants.
Metabolic profiling indicates high performance on the conversion of dietary fibers
Recalcitrant glycans found in diet components, such as cellulose, hemicellulose and pectins, are processed via anaerobic microbial fermentation to produce a wide range of metabolites, reflecting the diversity of substrates available in the digestive tract of herbivores as well as the biochemical potential of the gut microbiota. The major fermentation products detected in the capybara gut by NMR spectroscopy-based metabolomics were short-chain fatty acids (SCFAs) such as acetate, propionate, and butyrate, among more than 40 metabolites measured (Supplementary Table 2). SCFAs were detected at high concentrations in both cecal and rectal samples. The most abundant metabolites were acetate (mean ± SD: 74.83 ± 22.17 and 30.40 ± 22.76 µM), propionate (31.0 ± 6.67 and 15.98 ± 12.8 µM) and butyrate (23.30 ± 5.63 and 8.35 ± 12.83 µM) in cecal and rectal samples, respectively (Supplementary Table 2). These SCFA ratios indicate a forage-based diet and are similar to those observed for ruminants (23, 24).
The MG and MT datasets were analyzed to describe the microorganisms and metabolic pathways associated with fermentation and SCFA production (Supplementary Fig. 2A). Genes related to pyruvate fermentation were highly abundant in both MG and MT data for cecal and rectal samples, and the microbiota related to this pathway was dominated by Firmicutes, Bacteroidetes and Fusobacteria (Supplementary Fig. 2B). Metabolic pathway reconstruction of the 79 unique genomes recovered from the capybara gut microbiome was conducted to further investigate the contribution of individual microorganisms to SCFA production (Fig. 2). This analysis indicates that acetate can theoretically be produced by any of the bacterial genomes recovered from the capybara gut microbiome, in agreement with the high abundance of this metabolite in both cecal and rectal samples (Table S2). Butyrate is known to be produced mainly by Firmicutes. Analysis of the key genes involved in the final steps of this pathway, including the butyryl-CoA:acetate CoA-transferase genes atoA/D and the butK and ptb genes encoding butyrate kinase (EC 2.7.2.7) and phosphotransbutyrylase (EC 2.3.1.19), respectively, showed that the Firmicutes Ileibacterium sp. MAG6 and Megasphaera sp. MAG33 are likely the major butyrate-producing bacteria in the capybara gut, since they present the highest expression of atoA/D genes (Figure S3 and Table S3). Other bacteria, for instance the Bacteroidetes Marinilabiliaceae MAG47 and the Fusobacteria MAG38 and MAG39, also presented co-localized atoA/atoD and ptb/butK genes, suggesting that they may also contribute to butyrate production (Figure S3 and Table S3).
To verify the distribution of the pathways for propionate production within the capybara gut microbiota, key genes from each pathway (acrylate, propanediol or succinate) were analyzed (25). Lactoyl-CoA and propane-1,2-diol, intermediates of the acrylate and propanediol pathways, respectively, were not identified in the metabolic reconstruction of any of the genomes recovered from the capybara gut (Fig. 2). On the other hand, the succinate pathway, assessed by the mmdA gene encoding methylmalonyl-CoA decarboxylase, was widespread mainly among Bacteroidetes, but was also detected in some Firmicutes and Fusobacteria genomes (Figure S3 and Table S3), indicating that the main substrates used by capybara gut microorganisms for propionate production are probably hexoses and pentoses. Furthermore, the proportion of propionate detected in the capybara gut tends to correlate (R = 0.77 and p = 0.07) with the relative abundance of Bacteroidetes, consistent with the succinate pathway of this phylum being the major source of propionate in the capybara gut.
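To illustrate the kind of correlation reported above, the sketch below computes a Pearson correlation coefficient between Bacteroidetes relative abundance and propionate proportion. The text does not state which correlation statistic was used, so Pearson is an assumption here, and the function name `pearson_r` and all input values are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical per-sample values (%), not data from this study.
bacteroidetes = [20.0, 25.0, 30.0, 35.0, 40.0]
propionate = [16.0, 14.0, 20.0, 18.0, 22.0]
r = pearson_r(bacteroidetes, propionate)
print(f"R = {r:.2f}")
```

With only a handful of samples, even a fairly high R can come with a p-value above 0.05, which is why a result such as R = 0.77 with p = 0.07 is best read as a trend rather than a firm association.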
A few gut microorganisms are known to produce both propionate and butyrate, such as Roseburia inulinivorans, Coprococcus catus and Eubacterium hallii (26, 27). Other microorganisms able to produce acetate, butyrate and propionate as metabolic end products are Megasphaera sp. NM10, BL7 and M. elsdenii (28). According to the metabolic reconstruction analysis, butyrate and propionate production were predicted to co-occur in 15 genomes (Fig. 2), and Megasphaera sp. MAG33 shares ca. 95% identity with the ruminal M. elsdenii, suggesting similar metabolic capabilities. These observations reinforce the idea that the capybara microbiome is a promising source of novel species with diversified metabolic functions and great potential for the breakdown of dietary structural carbohydrates, as high SCFA production is a common marker of efficient digestion of recalcitrant plant fibers (29).
Capybara gut microbiome strategies for the breakdown of dietary polysaccharides
The capacity of the capybara to convert lignocellulosic materials into SCFAs is determined by the genomic potential associated with the Carbohydrate-Active enZymes (CAZymes) of the gut microbiota. A total of 6,132 putative CAZyme-encoding genes from 105 Glycoside Hydrolase (GH) and 10 Polysaccharide Lyase (PL) families were identified, of which 456 genes presented a modular architecture (Figure S4 and Table S4). The most abundant CAZymes identified are plant cell wall-degrading enzymes from families GH3, GH2 and GH1 (in decreasing order of abundance), which encompass diversified activities including β-glucosidases, β-xylosidases, β-galactosidases and β-mannosidases, among others. These enzymes are often associated with the later steps in the degradation cascade of several plant polysaccharides such as cellulose, heteroxylans, mixed-linkage β-glucans and β-mannans. Moreover, these families have been reported to be highly abundant in several host-associated gut microbiomes such as human, mouse, swine, and cattle rumen (30), probably due to their broad functions.
As sugarcane is part of the diet of capybaras dwelling in the Southeast region of Brazil, the microbiota was expected to be able to use the easily metabolizable sugar sucrose. In the capybara gut CAZyme arsenal, invertases from the GH32 family were identified at a proportion of approx. 1.5%, which is similar to that reported for several gut microbiomes, from ruminants to humans (30, 31). It is worth mentioning that the sequenced genome of the capybara itself contains no gene encoding GH32 invertases, which holds for all mammals sequenced to date. Further analysis of the MG and MT datasets revealed a high abundance of GH32 enzymes in MG only (Figure S4), which led us to hypothesize that, although the gut microbiome has the genomic potential to metabolize sucrose, the capybaras were digesting more recalcitrant components of their diet at the time of sample collection.
One of the main dietary polysaccharides of capybara is cellulose, which is highly resistant to microbial degradation due to its chemical and structural organization along with numerous intermolecular interactions with a complex matrix of hemicelluloses, pectins and lignin. Neither cellulases from families GH6, GH7 and GH48, nor cellulosomes, assessed by the presence of cohesin and dockerin domains associated with cellulases, could be identified in capybara gut MG or MT datasets. This suggests that cellulose degradation in the capybara gut may be accomplished by endo-β-1,4-glucanases (EC 3.2.1.4) from families GH5 (subfamilies GH5_2, GH5_4, GH5_25 and GH5_37), GH8, GH9 and GH45, which were detected either as single domains or in multi-modular protein architectures. Interestingly, the most expressed genes putatively encoding endo-β-1,4-glucanases detected in capybara gut microbiome belong to families GH5_2, GH8, GH9 and GH45 and were recovered from Fibrobacter genomes (Figure S5 and Table S5), indicating that these bacteria may be the major contributors to cellulose degradation in the capybara gut. Fibrobacter succinogenes is known as a highly efficient cellulolytic bacterium in the cow rumen (32). It is proposed that F. succinogenes utilizes a multi-protein complex to attach to cellulose fibers and secretes cellulases by the T9SS-dependent secretion system to enable cellulose breakdown into cellodextrins, which then would be imported into the periplasm for further degradation and utilization (33). The three Fibrobacter genomes recovered from capybara gut microbiome encode cellulases with a T9SS signal sequence as well as proteins for cellulose adhesion including tetratricopeptide, fibro-slime, OmpA and pilin proteins, as reported for F. succinogenes (33). Furthermore, from the set of 347 proteins observed in the outer membrane vesicles (OMVs) from F. succinogenes (34), we have identified 262 with sequence identity ranging from 30–99%. 
These observations suggest that typical Fibrobacter mechanisms, fundamentally relying on cell surface adhesion and OMVs, are central for cellulose degradation in the capybara gut.
Hemicelluloses and pectins are also important polysaccharides in the diet of capybaras, and 30 Bacteroidetes genomes were recovered from the capybara gut microbiota. Bacteroidetes are known to possess highly diversified carbohydrate degradation capabilities, many of them encoded in polysaccharide utilization loci (PULs), which are clusters of genes encoding CAZymes, SusCD-like transporters and regulators. Around 120 predicted PULs and 150 Clusters of CAZymes (CCs) were identified in our Bacteroidetes MAGs (Extended Data Fig. 1 and Table S6) and were compared to literature-derived PULs available in the PULDB database (35). PULs probably involved in the degradation of xylans and arabinoxylans, polysaccharides highly abundant in grasses including sugarcane, were identified in the genomes of B. heparinolyticus MAG61 and Bacteroidota bacterium MAG40 (Fig. 3A), resembling PULs from B. ovatus (36). The strategies for the breakdown of mixed-linkage β-glucans are highly conserved between capybara and human microbiomes, with an identical PUL organization encompassing GH16 and GH3 enzymes (Fig. 3A) (37). PULs involved in the degradation of xyloglucan (XyG), a more recalcitrant hemicellulose, were identified in the Bacteroidaceae bacterium MAG53, featuring core hydrolases from families GH5_4, GH31 and GH9 (Fig. 3A). In B. ovatus, the XyG-PUL encodes other enzymes from the GH43, GH3 and GH2 families (38), which were also detected in MAG53, albeit in distinct genomic regions. These enzymes may function as escorts for the complete depolymerization of XyGs, similar to what was reported for the saprophyte Cellvibrio japonicus (39). PULs predicted to act on mannose-containing glycans were also identified in the capybara gut microbiome (Fig. 3A), conserving the core genes GH26 (endo-β-1,4-mannanases) and GH130 (β-1,4-mannosylglucose phosphorylases) as described for the human gut bacterium B. fragilis (40).
Furthermore, a set of different PULs putatively enabling the degradation of other polysaccharides, such as starch and pectins, was identified, mainly in Bacteroidaceae genomes (Fig. 3A and Table S6). For instance, PUL54 from Bacteroidaceae bacterium MAG51, which is putatively involved in the degradation of homogalacturonan, a key component of sugarcane cell wall pectin (41), and comprises enzymes from families GH105, GH43_10 and GH28 (Fig. 3A and Table S6), resembles the corresponding PUL from B. ovatus (36). However, a clear target substrate could not be defined for a large fraction of the PULs predicted from the capybara gut microbiome (Table S6), in part due to intrinsic limitations of genome reconstruction from metagenomes, but also reflecting the variability, heterogeneity and insufficient knowledge of the structure and composition of the glycans present in the diet of wild capybaras. Nevertheless, our analyses highlight the importance of the Bacteroidetes phylum in the capybara gut, providing a diverse arsenal of enzymatic systems for the degradation and utilization of the main components of dietary carbohydrates.
Taken together, our results demonstrate that the capybara gut microbiota preferentially exploits a combination of free enzymes (rather than cellulosomes), containing a catalytic module either isolated or appended to CBMs or other catalytic modules, to deconstruct dietary polysaccharides, with biochemical diversity provided by Bacteroidetes PULs/CCs and with members of the Fibrobacter genus as workhorses for cellulose breakdown.
A new partner for an old acquaintance in heteroxylan degradation
Among the genomes recovered from the capybara gut microbiome, Prevotella sp. MAG57 has the largest number of CAZyme-encoding genes (Fig. 3B and Table S6). Phylogenetic analysis and whole-genome comparison indicated that MAG57 is closely related to other uncultured genomes from the Prevotella genus recovered from capybara and, in the UBA project (42), from sheep, elephant and mouse gut (Fig. 6A). Regarding sequence-based genomic comparisons, MAG57 has an average nucleotide identity (ANI) of 75%, with an alignment fraction < 60%, to genomes selected across the Bacteroidetes phylum, and thereby most likely corresponds to a novel species (Figure S6B). Many different PUL and CAZyme cluster organizations were identified in MAG57, probably involved in the degradation and utilization of hemicelluloses and pectins (Table S6). In particular, a gene cluster with predicted GH10, GH43 and GH97 members drew our attention as putatively acting on arabinoxylans, an abundant hemicellulose in the secondary cell walls of sugarcane and other grasses. Notably, its GH10 member appears to contain an unknown N-terminal domain extension with a predicted mass of approx. 45 kDa (Fig. 4A). Sequence analysis showed that this unusual N-terminal domain is also present in Bacteroidetes species derived from the human, mouse, and elephant gut (Table S7). However, it displays no similarity to domains typically associated with GH10 members, such as xylan-binding CBM22 and xylanase-specific CBM9.
To evaluate the function of this unconventional GH10 member (CapGH10), the full-length protein and its domains, along with the other GH members of the CC102 cluster, were recombinantly expressed and characterized. The GH97 member (CapGH97) is a calcium-activated α-galactosidase, whereas the GH43 member is a highly active α-L-arabinofuranosidase (Figures S7–S8 and Table 1), two critical activities for removing decorations from heteroxylans. The latter belongs to subfamily GH43_12 and showed low sequence identity to other structurally characterized GH43 members [~34% with Bacteroides ovatus GH43a, PDB 5JOW (43)]. Structural elucidation by SeMet phasing (Table S8) revealed a two-domain architecture with a β-sandwich accessory domain tightly bound to the catalytic domain (Figure S8D). Unlike all other GH43_12 members structurally characterized so far, in which the β-sandwich domain is composed only of C-terminal β-strands, the GH43_12 structure elucidated here shows an N-terminal β-strand that integrates with the C-terminal β-strands to form the β-sandwich domain (43–45) (Figure S8D). This indicates a further level of structural complexity within the GH43 family that should be carefully considered when designing constructs and chimeras involving these instrumental enzymes for plant polysaccharide depolymerization. Structural comparisons with other GH43_12 arabinofuranosidases showed a highly conserved active-site pocket, including all residues comprising the −1 subsite, which is in agreement with the specificity and mode of action of CapGH43_12 (Figure S8E–F).
Table 1
Kinetic parameters of CAZymes heterologously expressed in E. coli BL21.
Protein ID | CAZy family | Substrate | pH | T (°C) | KM | kcat (s−1) | kcat/KM |
09512 | GH97 | pNP-α-D-Gal | 7.0 | 35 | 8.43 ± 0.57 (mM) | 34.1 ± 0.98 | 4.05 |
09513 (full-length) | GH10 | Rye arabinoxylan | 5.5 | 50 | 2.14 ± 0.44 (mg/mL) | 127.7 ± 16.3 | 59.67 |
09513 (GH10 domain) | GH10 | Rye arabinoxylan | 5.5 | 55 | 1.93 ± 0.08 (mg/mL) | 180.1 ± 5.3 | 93.31 |
09513 (GH10 domain) | GH10 | Xylan | 5.5 | 55 | 1.69 ± 0.08 (mg/mL) | 160.6 ± 5.22 | 95.03 |
09514 | GH43_12 | α-L-arabinofuranoside | 6.5 | 35 | 2.74 ± 0.29 (mM) | 151.19 ± 6.21 | 55.18 |
44807 | GHXXX | pNP-β-D-Gal | 7.5 | 45 | 0.57 ± 0.05 (mM) | 17.6 ± 0.39 | 30.88 |
CBK67650.1 β-Gal Domain | GHXXX | pNP-β-D-Gal | 7.5 | 45 | 1.19 ± 0.35 (mM) | 29.85 ± 1.95 | 25.08 |
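As a quick consistency check on Table 1, the catalytic efficiency column can be recomputed as kcat divided by KM. The sketch below copies the mean KM and kcat values from the table; the variable and function names are hypothetical, and the xylan row is assumed to belong to the GH10 domain construct because it follows that row and shares its assay temperature.

```python
# Values copied from Table 1: (protein ID, substrate, KM, kcat in s^-1).
# KM is in mM for the small-molecule substrates and in mg/mL for polysaccharides.
TABLE_1 = [
    ("09512", "pNP-α-D-Gal", 8.43, 34.1),
    ("09513 (full-length)", "Rye arabinoxylan", 2.14, 127.7),
    ("09513 (GH10 domain)", "Rye arabinoxylan", 1.93, 180.1),
    ("09513 (GH10 domain)", "Xylan", 1.69, 160.6),  # assumed construct, see lead-in
    ("09514", "α-L-arabinofuranoside", 2.74, 151.19),
    ("44807", "pNP-β-D-Gal", 0.57, 17.6),
    ("CBK67650.1 β-Gal Domain", "pNP-β-D-Gal", 1.19, 29.85),
]

def catalytic_efficiency(kcat, km):
    """Return kcat/KM; the unit is s^-1 per unit of KM (mM or mg/mL)."""
    if km <= 0:
        raise ValueError("KM must be positive")
    return kcat / km

efficiency = {
    (protein, substrate): catalytic_efficiency(kcat, km)
    for protein, substrate, km, kcat in TABLE_1
}

for (protein, substrate), value in efficiency.items():
    print(f"{protein:<26}{substrate:<24}{value:8.2f}")
```

Because KM is reported in mM for some substrates and in mg/mL for others, the resulting efficiencies are only directly comparable between rows that share the same KM unit.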
The GH10 domain of the CapGH10 protein was shown to be an endo-β-1,4-xylanase active on beechwood xylan and several arabinoxylans, including high-viscosity rye flour arabinoxylan (33 cSt), low-viscosity wheat flour arabinoxylan (8 cSt), acid-debranched wheat arabinoxylan (26% Ara and 22% Ara) and enzyme-debranched wheat arabinoxylan (30% Ara). Kinetic analyses indicate that the decorations present in rye arabinoxylan (arabinose/xylose ratio = 40/60) are not detrimental to the catalytic performance of the enzyme, which exhibited similar KM and kcat constants on rye arabinoxylan and xylan (Table 1 and Figure S9). The Xyn10Z enzyme from Hungateiclostridium thermocellum ATCC 27405, which shares 36% sequence identity with CapGH10, is the closest characterized member so far, with high activity on xylan (46). The N-terminal region of Xyn10Z comprises a feruloyl esterase followed by a CBM6 domain, neither of which is present in CapGH10 (47). The CapGH10 N-terminus, comprising approximately 500 residues, showed sequence similarity only with uncharacterized proteins: the closest homologs mostly carry a GH10 domain and share around 37–44% sequence identity, while further hypothetical proteins lack the GH10 module but carry a T9SS signal domain and share ca. 30% sequence identity. Homologs with a similar domain architecture, attached to the N-terminus of a GH10 module, were found in PULs from ruminal Prevotella sp. such as Prevotella sp. BP1-148, Prevotella sp. BP1-145, Prevotellaceae bacterium HUN156 and Prevotellaceae bacterium MN60. These PULs further comprise members of families GH97, GH43_29 + CBM6 and CE1 + CE6 + CBM48, and likely target xylan-related polysaccharides.
The potential enzymatic activity of the isolated N-terminal domain of CapGH10 was assessed on over 30 different substrates, including synthetic substrates, oligosaccharides, and polysaccharides (Supplementary Table 9), but no hydrolase, lyase or esterase activity was observed. Typical activities involved in heteroxylan breakdown, including endo-β-1,4-xylanase, β-xylosidase, α-L-arabinofuranosidase, α-D-galactosidase, α-D-glucuronidase, 4-O-methyl-glucuronoyl methylesterase, feruloyl esterase and acetyl xylan esterase, were assayed by distinct methods without detection of product formation or substrate consumption. In light of this, we further interrogated the capacity of this N-terminal domain to bind potential substrates of its GH10 partner, such as beechwood xylan and arabinoxylans, using affinity gel electrophoresis (AGE). This domain can indeed interact with the substrates of the GH10 domain, suggesting that it may target the CapGH10 catalytic domain to xylan polysaccharides (Fig. 4C).
To gain further insight into the potential role of this unconventional N-terminal domain, its crystallographic structure was solved by SeMet phasing at 1.8 Å resolution (Table S8). The domain exhibits a parallel right-handed β-helix fold, consisting of 14 complete helical turns with two main short helices protruding from the β-helix backbone (Fig. 4B). The 14 helical turns are twisted and curved, with a calcium ion between the 11th and 12th turns in an octahedral coordination sphere (Fig. 4B). This β-helix fold is observed in clan GH-N of the GH superfamily, in the carbohydrate esterase family CE8 and in several polysaccharide lyase (PL) families; however, structural comparisons with these CAZy families led to high rmsd values (> 3 Å), indicating poor three-dimensional conservation (Table S10). Nevertheless, structural superpositions were performed with CAZy families GH28, GH91, PL6 and CE8 in an attempt to identify similarities between the CapGH10 β-helix domain and the active sites of these enzymes. Neither the catalytically relevant residues nor the active-site topology of these families is conserved in the CapGH10 β-helix domain (Extended Data Fig. 2). Besides the lack of all key catalytic residues, a long loop (G126-K140) in the CapGH10 β-helix domain also partially occludes the region corresponding to the active site in GH28 enzymes (PDB ID 3JUR (48)) (Extended Data Fig. 2A). In comparison to family GH91 (PDB ID 2INU (49)), the two loops critical for catalytic activity, T2 and T3, are absent in the CapGH10 β-helix domain (Extended Data Fig. 2B), and in comparison to family PL6 (PDB ID 6QPS (50)), the Ca2+-binding site essential for catalytic activity is not present in the CapGH10 β-helix domain (Extended Data Fig. 2C). Although there is a cleft-like region in the CapGH10 β-helix domain near the position corresponding to the active site of the CE8 family (PDB ID 3UW0 (51), Extended Data Fig. 2D), the catalytic residues are not conserved, and most residues populating this region in the CapGH10 β-helix domain are not even conserved among homologues, weakening the possibility that this region is a catalytic center. Moreover, SAXS data (Figure S10) indicated that the CapGH10 β-helix domain is monomeric in solution, unlike members of the GH28 and GH91 families, which rely on oligomerization to be functional. These structural analyses, together with the lack of conservation of residues corresponding to the cleft-like region among CapGH10 β-helix domain homologues, support the biochemical data indicating that this domain is not catalytically active.
Considering aromatic and acidic residues as important platforms for carbohydrate interaction, mapping of the molecular surface of the CapGH10 β-helix domain revealed two potential binding regions, one between turns 1–4 (region I) and another between turns 6–10 (region II). Therefore, residues Y62 and E82 from region I and residues E132, D133, Y193, E225, E247, Y279, E282, D360 and D365 from region II were mutated to alanine (Supplementary Fig. 11). Moreover, one mutation at the calcium-binding site (D344L) was evaluated to address whether calcium ion incorporation could be essential for carbohydrate binding. Mutations E247A and E282A severely impaired protein stability and led to expression only in the insoluble fraction. Mutation D344L also affected protein stability, to a lesser extent, but the arabinoxylan/xylan binding capacity was preserved (Figure S12). This result indicates that the calcium ion has a structural rather than a functional role in carbohydrate recognition. Among the other nine mutants, only Y62A and E82A affected the migration pattern in AGE assays with beechwood xylan and rye arabinoxylan (Fig. 4C). Both residues are located in region I, indicating that this region plays a role in carbohydrate binding. It is worth mentioning that mutation of two aromatic residues located at the region corresponding to the GH28 active site, Y193 and Y279, did not alter carbohydrate binding, in agreement with the lack of functional relevance of this region in the CapGH10 β-helix domain. Combining the biochemical, structural and mutagenesis analyses, we define the CapGH10 β-helix domain as a CBM, thereby establishing a novel structural scaffold in this superfamily and founding the new family CBMXX.
Taken together, this unprecedented modular endo-β-1,4-xylanase and the synergistic activities of the other CC107 partners lead us to conclude that this cluster confers on Prevotella sp. MAG57 the ability to act on complex heteroxylans (Fig. 4D), a key function in the gut microbiome of the capybara, which has grasses as a major component of its diet.
A new GH family mined from the genomic dark matter of capybara microbiome
The combined MG and MT analysis of the capybara gut microbiome revealed several expressed genes annotated as hypothetical proteins. Some of these genes presented extremely remote similarity to CAZy members, with sequence identity ranging from 10 to 20%, suggesting a potential function in the processing of plant polysaccharides that requires confirmation by functional investigation (Table S11). To uncover the activity of these proteins, synthesized ORFs were expressed and subjected to biochemical assays employing a diverse set of synthetic substrates, polysaccharides and oligosaccharides.
One of these proteins (SEQ ID PBMDCECB_44807, named here CapGHXXX) was active on p-nitrophenyl-β-D-galactopyranoside (pNP-β-D-Gal), and its kinetic parameters were determined (Table 1 and Figure S13). CapGHXXX orthologues are present in Actinobacteria, Firmicutes, Verrucomicrobia and mainly Bacteroidetes genomes recovered from diverse sources such as rumen, feces, gut and oral microbiota (Table S12); the closest sequence is from a rumen-derived genome (UBA2817) of the uncultured RC9 group (42). Sequence analysis showed that CapGHXXX is distantly related to families GH5 and GH30 (Fig. 5A), and protein threading indicates a TIM barrel fold (Supplementary Fig. 14), suggesting that this novel GH family belongs to clan GH-A. To further explore this GH family, the enzyme CBK67650.1 (SEQ ID BXY_26070) from B. xylanisolvens, which shares 46% sequence identity with CapGHXXX, was synthesized, produced and biochemically characterized (Table 1). This second member also showed β-galactosidase activity, which strengthens, at the biochemical level, the establishment of this new GH family.
In the genome of Bacteroidota bacterium MAG42 recovered from the capybara gut, CapGHXXX is found in a putative PUL additionally comprising enzymes from families GH2 and GH78. A similar PUL organization was predicted in the genome of Bacteroidetes sp. 1_1_30 recovered from the human gut, which additionally harbors enzymes from the GH36, CE7 and PL8_2 families. It is noteworthy that GHXXX members are often found appended to a GH36 module or in PULs that also contain GH36 members, such as in B. xylanisolvens and Prevotella dentalis, recovered from stool and the oral cavity, respectively (Fig. 5B), indicating a synergistic relationship between these families. Moreover, these families are also commonly found along with GH78 α-L-rhamnosidases in the PUL context. In the genome of the Bacteroidales bacterium UBA2817, a GHXXX member is appended to a GH78 module carrying a CBM67, both targeting rhamnogalacturonans (Fig. 5B). These observations suggest that GHXXX could act on β-linked galactosyl residues in pectic polysaccharides. Further studies in the PUL context are required to shed light on the biological role of these enzymes in complex gut environments.