Modified TSB supports the growth of several bacterial strains from the oral microbiota.
While GAMc was a suitable medium for Bacteroides strains, overnight inocula of Fusobacterium, Porphyromonas, Prevotella and Treponema strains in GAMc resulted in undetectable growth (Supplementary Table S1). To improve the growth conditions, TSB medium was selected and supplemented with 5 µg/mL hemin and 2.5 µg/mL vitamin K1. This enhanced the growth of several Fusobacterium species. However, Prevotella, Porphyromonas and Treponema species still could not grow in this medium. Therefore, TSB supplemented with increased concentrations of hemin was tested, as hemin is an essential growth factor for many Bacteroidota species.[43,44] This proved successful for Prevotella and Porphyromonas species, which grew reliably in TSB supplemented with 50 µg/mL hemin and 2.5 µg/mL vitamin K1 (TSBh50), but not for Treponema species (Supplementary Table S1). In fact, Treponema was unable to grow even in TSBh50 supplemented with L-cysteine, which can function as an oxygen scavenger, removing dissolved oxygen from the medium.[45,46]
Antimicrobial interactions
Antimicrobial tests were conducted in TSBh50 (Table 3) and GAMc (Table 4) using bacterial microbiota strains as both producers and sensitive strains. The colony antimicrobial assays revealed that, under the conditions studied, Bacteroides sp. 4_1_36 (B6) showed antimicrobial activity against members of the Fusobacterium, Porphyromonas and Bacteroides genera but not against the tested Prevotella species (Tables 3 & 4). In contrast, under these conditions, Prevotella melaninogenica D18, originally isolated from an oral swab from a healthy male patient in Canada,[47] inhibited the growth of 10 other strains, including members of Prevotella, Fusobacterium, Porphyromonas and Bacteroides (Table 3). In particular, D18 showed antimicrobial activity against five other oral strains and five gut strains. Interestingly, two of those five gut strains were B. stercoris DSM 19555 (Bster1) and B. salyersiae CL02T12C01 (Bsal1). Finally, B. fragilis 3_1_12 (Bf1) showed intra-species antimicrobial activity,[48] inhibiting the growth of other B. fragilis strains while being unable to inhibit the growth of other members of the same genus, except for B. stercoris DSM 19555 (Table 4). The antimicrobial activity of B6, D18 and Bf1 against B. stercoris DSM 19555 and B. salyersiae CL02T12C01 is particularly relevant, since these species have been correlated with the progression of MAFLD.[49] Therefore, finding bacterial strains able to impair their growth might prove helpful in developing new approaches to managing MAFLD.
Following the results from the antimicrobial assays, several strains were selected to further study their activity against known pathogens. However, these antimicrobial tests showed no inhibitory effect (Supplementary Table S2).
The antimicrobial screenings (Tables 3 & 4) revealed P. melaninogenica D18 as a strain with broad-spectrum antimicrobial activity, and both B. fragilis 3_1_12 (Bf1) and Bacteroides sp. 4_1_36 (B6) as strains with a narrower antimicrobial spectrum. Moreover, all three strains showed activity against B. stercoris DSM 19555, a strain related to MAFLD progression.[49] Therefore, these strains were selected for antimicrobial cluster mining to identify the putative biosynthetic gene cluster (BGC) involved in the halo formation.
Biosynthetic gene cluster analysis reveals two novel putative antimicrobial clusters in Bacteroides sp. 4_1_36
Genome mining of B6 with BAGEL4 revealed one BGC of interest (Figure 2A), termed BGC1 hereafter. BGC1 encodes 17 ORFs, including one sequence predicted to be rSAM-modified_RiPP_057 (A1), which belongs to the TIGR04149 protein family of peptides associated with peptide-modifying radical SAM (rSAM) enzymes. Peptides in this family present a characteristic modular sequence: a leader sequence with a conserved consensus N-terminal region (MKKLKKLKL), followed by a conserved Gly-Gly cleavage motif and a Cys-rich 15-residue sequence. Interestingly, the identified peptide from BGC1 presented a similar N-terminal sequence (MKKLGKIKL) and a double-glycine motif. However, although it contained the CXCXC motif recognised by rSAM, the identified core peptide sequence, consisting of 41 amino acids, was longer than canonical core peptides from the TIGR04149 protein family. Moreover, the predicted core peptide contained a second GG motif in the C-terminal region (Figure 2B). The presence of two rSAM-related ORFs in BGC1 suggests that this peptide is modified by these rSAM proteins. In fact, many enzymes of the rSAM superfamily are recruited for PTMs of RiPPs, which usually enhance the peptide’s stability and substrate recognition and/or are critical for its activity.[23–25,29,30,50]
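The modular architecture described above (consensus N-terminal region, Gly-Gly cleavage motif, Cys-containing core) can be checked programmatically. The following is a minimal sketch, not the procedure used in this study: the function names are hypothetical, the split is assumed to occur at the first Gly-Gly pair, and the toy precursor is an invented sequence that only borrows the MKKLGKIKL N-terminus quoted in the text.

```python
import re

CONSENSUS_N_TERM = "MKKLKKLKL"   # consensus N-terminal region quoted in the text
CXCXC = re.compile(r"C.C.C")      # Cys-X-Cys-X-Cys, where X is any residue


def split_at_first_gg(precursor):
    """Split a precursor at its first Gly-Gly pair (assumed cleavage site).

    Returns (leader, core); the leader includes the GG pair.
    Returns None when no GG pair is present.
    """
    idx = precursor.find("GG")
    if idx == -1:
        return None
    return precursor[: idx + 2], precursor[idx + 2:]


def n_term_identity(precursor, consensus=CONSENSUS_N_TERM):
    """Fraction of consensus positions matched by the precursor N-terminus."""
    window = precursor[: len(consensus)]
    matches = sum(a == b for a, b in zip(window, consensus))
    return matches / len(consensus)


def summarise(precursor):
    """Collect the features discussed in the text for one precursor."""
    parts = split_at_first_gg(precursor)
    if parts is None:
        return None
    leader, core = parts
    return {
        "leader": leader,
        "core": core,
        "core_length": len(core),
        "n_term_identity": n_term_identity(precursor),
        "has_cxcxc": bool(CXCXC.search(core)),
        "extra_gg_in_core": "GG" in core,
    }


# Hypothetical toy precursor, NOT the BGC1 peptide: only the first nine
# residues are taken from the text, the remainder is invented.
TOY = "MKKLGKIKLSEEELKNIVGG" + "STCSCTCSSTGGK"

print(summarise(TOY))
```

Splitting at the first Gly-Gly pair is a simplification; a real precursor may carry glycine pairs within the leader, so predicted cleavage sites should be compared with the alignment in Figure 2B.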
BGC1 presented two ORFs possibly involved in peptide modification: a rSAM peptide maturase (HMPREF1007_03210, UniProt: E5VEM4) and a pseudo-rSAM (HMPREF1007_03209, UniProt: E5VEM3). Enzymes from the rSAM superfamily catalyse a wide variety of reactions involving free radical intermediates. To identify the function of these enzymes in BGC1, both protein sequences were retrieved and used as queries in RadicalSAM.org (https://radicalsam.org/),[40] a web-based tool developed by Gerlt and co-workers to help in the identification and interpretation of rSAM sequences. The rSAM peptide maturase protein sequence (E5VEM4) diverged from cluster-1-1 at cluster-1-1-4 with an alignment score (AS) of 45; however, cluster-1-1-4:45 contains rSAM sequences with low sequence similarity (Supplementary Figure S1). Therefore, a more stringent alignment search was applied (AS=50). This revealed a tight cluster of nodes, suggesting that the encoded proteins are interrelated and could perform the same chemistry (Supplementary Figure S2A). Besides the identified cluster, sequence alignment showed two length peaks (Supplementary Figure S2B) belonging to hits with low sequence similarity (Supplementary Figure S2C), indicating that the cluster containing E5VEM4 could be further separated.
Further analysis of the E5VEM4 protein sequence with increased alignment scores resolved cluster-1-1-556:110 (Figure 3). Genome Neighbourhood analysis revealed that rSAM sequences belonging to cluster-1-1-556:110 were distributed among 9 Parabacteroides and 24 Bacteroides species (Figure 4). Moreover, protein alignment separated sequences belonging to B. fragilis from those belonging to B. uniformis and Bacteroides sp., and from those belonging to Parabacteroides sp. and P. distasonis. All sequences presented the CX3CX2C motif (Supplementary Figure S3A) and a SPASM/twitch domain, albeit with interspecific variations (Supplementary Figure S3B).
While a high AS helps in sorting the query sequence into clusters containing isofunctional rSAMs, there were no annotated protein sequences in the identified cluster-1-1-556:110, which prevented the assignment of a known function to E5VEM4 from previously annotated sequences in the same cluster. Hence, earlier, less stringent clusters were explored until annotated protein sequences were found. Cluster-1-1-4:45 contained 8 annotated sequences (Supplementary Table S3). Thus, the annotated protein sequences were retrieved and aligned with the two rSAM protein sequences from BGC1 (E5VEM3 and E5VEM4) using MUSCLE (Figure 5).
Annotated sequences were labelled as anaerobic sulfatase-maturating enzymes. Previously, other peptide maturase proteins have been described as dual-substrate enzymes involved in the PTM of a cysteine or serine residue in the target sequence.[51,31] Similar rSAM ORFs from the Bacteroidota phylum have been proposed to fulfil a similar activity[52] despite the low sequence similarity between previously annotated rSAM proteins within cluster-1-1-4:45 and E5VEM4 (Supplementary Figure S1 and Figure 5). Enzymes containing the SPASM domain have been described to catalyse Cα-S bond formation (AlbA, ThnB, ThrC/C, SkfB), C-C bond formation (PqqE, StrB), epimerisation (PoyD), and decarboxylation and C-C bond formation (MftC). We hypothesised that E5VEM4 could perform chemistry similar to that of other anaerobic sulfatase-maturating enzymes within the radical SAM cluster, catalysing the post-translational modification of a serine or cysteine into 3-oxoalanine, also known as Cα-formylglycine (FGly). Based on the amino acid sequence of the putative peptide in BGC1, E5VEM4 could belong to the “Ser-type” sulfatase maturation enzyme family. The presence of several serine residues in the absence of cysteine residues outside the recognition motif would align with our hypothesis. However, other maturases, such as CteB[50] and Tte1186,[53] have been characterised to form γ-thioether linkages between Cys and Thr residues. The presence of several Cys and Thr residues in the precursor peptide suggests that the rSAM from BGC1 could also form such thioether linkages.[52]
Further analysis of the Genome Neighbourhood revealed that out of the 36 clusters identified, 28 clusters showed a 60 to 74 aa ORF that belonged to IPR026408 (GG_sam_targ_CFB) rSAM-modified peptides, supporting the role of rSAM peptide maturases in the modification of RiPPs. Protein alignment of these peptides showed high intraspecific similarity, where peptides from Bacteroides fragilis clustered separately from Parabacteroides species and from Bacteroides sp. and Bacteroides uniformis (Figure 6). Interestingly, while all clusters identified contained at least one rSAM ORF (Supplementary Figure S3), only peptides from the identified Bacteroides sp. and Bacteroides uniformis species showed the CXCXC motif recognised by rSAM. These Gly-rich peptides may form a new subclass of bacteriocins and may be useful in discovering novel strategies for peptide modification and/or microbiota modulation.
Leader peptide removal can be coupled with peptide export in some class I lanthipeptides and most class II lanthipeptides,[54] through peptidase-containing ABC transporters (PCATs; Figure 2). LanT enzymes are bifunctional proteins containing an N-terminal peptidase C39 domain coupled to a type-1 transmembrane domain and a C-terminal P-loop responsible for nucleoside triphosphate hydrolysis. These ABC transporters cleave the leader peptide from the core peptide at the Gly-Gly motif and transport the peptide across the membrane.[54] However, because Gram-negative bacteria have an outer membrane (OM), transport of the cargo from the periplasm to the exterior of the OM requires additional export proteins. For instance, a sophisticated export mechanism was proposed for pinensins, antifungal peptides described by Mohr and colleagues in the Gram-negative Chitinophaga pinensis.[55] The post-translationally modified pinensins are cleaved while exported to the periplasm by a LanT enzyme (PinT). Subsequently, the active peptide is exported through the OM by a TolC-dependent efflux pump.[55] Tripartite efflux pumps have been widely described in the literature,[56–58] and, in fact, a similar cluster architecture is present in BGC1 (Figure 2). In BGC1, the LanT encoded by HMPREF1007_03208, downstream of the rSAM ORFs, is followed by a HlyD-like periplasmic adaptor encoded by HMPREF1007_03207. Based on the similar cluster architecture, we propose a similar export mechanism: the leader peptide is cleaved, and the core peptide is transported across the membrane by LanT, where the HlyD-like periplasmic adaptor establishes and stabilises the interaction with the OM exporter (TolC). Although we did not identify a TolC OM exporter in BGC1, the lack of a tolC ORF in BGC1 is consistent with other observations in which only the adaptor and transmembrane pump are present in a cluster.[52,56]
TolC can be encoded separately since it interacts with various other transporters and adaptors, providing functional diversity.[57] Finally, the TonB-dependent receptor (HMPREF1007_03211) could be involved in sensing the extracellular rSAM-modified RiPPs to exert feedback control over their expression. Furthermore, upstream of BGC1, there are five ORFs involved in transport, encoding two drug efflux proteins, one outer membrane efflux protein, an AcrB/AcrD/AcrF family protein and an RND efflux membrane fusion protein (MFP), which could alternatively be involved in peptide export through the OM (Supplementary Figure S4, Figure 7). In fact, these transport-related ORFs could also be involved in the transport of two additional putative peptides located upstream.
BGC2 contains 21 ORFs, including two sequences belonging to the PF14055 protein family. Protein sequences in this family present a conserved NVEALA motif (NIEALA in the peptides from BGC2) preceding a Gly-Gly motif. These peptides also showed a conserved N-terminal region, consistent with the consensus motif from rSAM-modified peptides (TIGR04149 and IPR026408). Moreover, peptide A2 presents the signature motif KXXXW, recognised by a rSAM enzyme from family TIGR04080, which, in Streptococcus thermophilus, catalyses the cyclisation between Lys and Trp.[21,59] However, the putative peptide sequences did not show a CXCXC recognition motif in their core peptide region despite the presence of an rSAM ORF in the cluster. In fact, the rSAM (E5VER0) aligned within Megacluster-2-4-1: Elongator protein-like (Supplementary Figure S5). These protein clusters showed low sequence similarity, and while there were no annotated protein sequences, one sequence, P0ADW6, was identified as an iron-sulphur protein,[60] which can cleave S-adenosyl-L-methionine into methionine and 5'-deoxyadenosine (AdoH). While the rSAM ORF from BGC2 did not contain the consensus CX3CX2C or the SPASM/twitch motif, it cannot be ruled out that it is involved in peptide modification by other mechanisms involving alternative peptide recognition sequences.
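The shorthand motifs used throughout this section (CXCXC, CX3CX2C, KXXXW, and the NVEALA/NIEALA variants) map directly onto regular expressions, which makes it straightforward to screen candidate peptides for them. The sketch below is illustrative only and assumes that X denotes any single residue and Xn denotes n consecutive arbitrary residues; the helper names and the toy sequence are hypothetical and do not correspond to any peptide from BGC2.

```python
import re


def motif_to_regex(motif):
    """Translate shorthand such as 'CX3CX2C' or 'KXXXW' into a regex string.

    'X' is read as any single residue and 'Xn' as n arbitrary residues.
    Bracketed alternatives such as 'N[VI]EALA' pass through unchanged.
    """
    return re.sub(r"X(\d*)", lambda m: ".{%d}" % int(m.group(1) or 1), motif)


def find_motif(motif, sequence):
    """Return 0-based start positions of all matches, including overlaps."""
    # A lookahead is used so that overlapping occurrences are all reported.
    pattern = re.compile("(?=(%s))" % motif_to_regex(motif))
    return [m.start() for m in pattern.finditer(sequence)]


# Hypothetical toy sequence, invented for illustration only.
TOY = "MKNIEALAGGKAVLWSCACTC"

for motif in ("N[VI]EALA", "KXXXW", "CXCXC", "CX3CX2C"):
    print(motif, motif_to_regex(motif), find_motif(motif, TOY))
```

Because the conversion treats every X as a wildcard, it cannot represent an unknown residue coded as the literal letter X in a sequence file; such positions would need to be handled separately.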
Interestingly, analysis of the protein sequences from the NVEALA protein family (namely, A1 and A2 from BGC2) with the Conserved Domain Architecture Retrieval Tool (CDART) revealed several protein families containing the NVEALA domain followed by either a peptidase M76 domain, a LemA domain, a TolB-like domain or a thioredoxin-like domain, plus an additional protein family containing a rSAM domain followed by a NVEALA domain. This could suggest that these peptides are involved in regulatory functions when attached to the N-terminal regions of selected proteins.
Biosynthetic cluster analysis of Bacteroides fragilis species reveals a conserved cluster architecture.
Genome mining with BAGEL4 of B. fragilis 3_1_12 (Bf1) revealed 2 antimicrobial clusters, BGC3 (Figure 8A) and BGC4 (Figure 8B).
Interestingly, clusters from Bf1 contained peptides with an N-terminal region similar to those present in B6; however, several LanC-like ORFs were present in Bf1. Lanthionine synthetase (LanC) proteins are implicated in the post-translational modification of lantibiotics, and they form a vast protein family with broad substrate specificity, widely distributed across eukaryotic and prokaryotic phyla.[61] In particular, in class I lanthipeptides, after the dehydration of the Ser and Thr residues in the core region of the peptide by the lanthionine dehydratase (LanB), the lanthionine ring is closed by a Michael-type addition of Cys residues onto the dehydrated Ser and Thr residues.[28,52] Although LanC-like proteins form a vastly diverse group of proteins, they share highly conserved features. The well-characterised nisin-modifying enzyme NisC[62] presents a highly conserved triad of residues involved in coordinating the zinc ion (Cys284, Cys330 and His331), as well as two other residues that are conserved among the LanC cyclases (His212 and Arg280). However, LanC sequences from Bf1 showed no conservation in these residues (Supplementary Figure S6). This may suggest that these LanC-like ORFs lack the functionality associated with LanC enzymes in the biosynthetic process of class I lanthipeptides. However, the presence and number of LanC-like ORFs in the different B. fragilis strains in this study (Supplementary Figure S7) and across the Bacteroidota phylum (data not shown) suggest a conserved function.
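Checking whether the NisC residues listed above are conserved in a LanC-like sequence requires mapping the reference numbering onto alignment columns, because gaps shift positions. The following is a minimal sketch of that mapping under stated assumptions: the two sequences are already aligned (for example, exported as gapped strings from a multiple sequence alignment), the function names are hypothetical, and the toy alignment is invented and far shorter than NisC, so it uses small position numbers rather than the real ones.

```python
# Residues named in the text, in NisC numbering (for use with a real alignment).
NISC_SITES = {212: "H", 280: "R", 284: "C", 330: "C", 331: "H"}


def column_of(ref_aligned, position):
    """Return the 0-based alignment column of a 1-based reference residue."""
    seen = 0
    for col, ch in enumerate(ref_aligned):
        if ch != "-":
            seen += 1
            if seen == position:
                return col
    raise ValueError("position %d exceeds reference length" % position)


def check_sites(ref_aligned, query_aligned, sites):
    """Report the query residue at each reference site and whether it matches.

    `sites` maps 1-based reference positions to the expected residue.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have equal length")
    report = {}
    for pos, expected in sites.items():
        col = column_of(ref_aligned, pos)
        if ref_aligned[col] != expected:
            raise ValueError("reference has %s at %d, expected %s"
                             % (ref_aligned[col], pos, expected))
        observed = query_aligned[col]
        report[pos] = (observed, observed == expected)
    return report


# Hypothetical toy alignment, invented for illustration only.
TOY_REF = "MH-CRAC"
TOY_QUERY = "MHGC-AS"
TOY_SITES = {2: "H", 3: "C", 4: "R", 6: "C"}

print(check_sites(TOY_REF, TOY_QUERY, TOY_SITES))
```

With a real alignment, `NISC_SITES` would replace `TOY_SITES`, and the aligned NisC row would serve as the reference; the check that the reference carries the expected residue guards against numbering offsets such as a missing initiator methionine.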
Finally, BGC3 and BGC4 and the various BGCs identified in the other B. fragilis strains used in this study (Supplementary Figure S7) contain one or two rSAM-related ORFs. These rSAM ORFs belong to the same Megacluster-1-1: SPASM/twitch domain containing as the rSAM ORFs from BGC1 (Figure 5). This, together with the presence of the CXCXC rSAM recognition motif in the sequences of the putative antimicrobial peptides from both BGC3 and BGC4, suggests that these rSAM ORFs perform a similar function, catalysing the transformation of a Ser residue to 3-oxoalanine.
Biosynthetic cluster analysis fails to reveal antimicrobial clusters in P. melaninogenica D18.
Despite the broad antimicrobial capabilities of P. melaninogenica D18, antimicrobial cluster mining of this strain showed no areas of interest in either BAGEL4[37] or PRISM.[63] antiSMASH[38] analysis also identified no regions containing antimicrobial clusters. However, a manual search of regions adjacent to a saccharide cluster identified a potential antimicrobial cluster, termed BGC5 hereafter (Figure 9A).
BGC5 encodes 14 open reading frames (ORFs) comprising 1 putative peptide sequence and 3 transporter ORFs. BGC5_A1 showed little sequence similarity to peptide sequences identified in B6 in this work (Figure 9B). Hence, to investigate the presence of sequences similar to this putative antimicrobial peptide, its sequence was retrieved as an amino acid FASTA file and similar sequences were mined from genomes deposited in NCBI using BLASTP. A total of 47 amino acid sequences of similar proteins were found within members of the Prevotellaceae family and 1 belonging to Solobacterium sp. from the Bacillota phylum (Figure 10). Peptide sequences presented a similar N-terminal sequence to the consensus MKKLKKLKL sequence and a Gly-Gly motif, followed by a second double Gly motif in some sequences. Despite these similarities, none of these sequences showed the CXCXC rSAM recognition motif. These sequences present a somewhat conserved Gly-Gly motif followed by a conserved PNEKNQDDIDT domain prior to a second conserved Gly-Gly motif (Figure 10). This, together with a highly conserved leader peptide region, suggests that these peptides, present in various Prevotellaceae species, are subjected to PTMs, although their nature remains unknown.