Pan-genome construction and analysis for S. gallolyticus strains
In the present study, the available genomic datasets of 31 S. gallolyticus strains were obtained, together with their corresponding protein sequences. The genomes were mainly assembled at the scaffold level, with the number of scaffolds ranging from 1 to 260 (Table 1). Genome size ranged from 2.05 to 2.64 Mb, GC content (GC%) ranged from 37.3 to 40.6, and the number of proteins ranged from 1,743 to 2,424, with an average of 2,137 (Table 1). We also examined the isolation source of these 31 strains and found that the hosts included humans (23, 74.19%), pigs (2, 6.45%), and koalas (2, 6.45%). In a previous study, S. gallolyticus strains were mainly isolated from the rumen, crop, excretory cavity, and human colon, which suggested that the primary niche of S. gallolyticus is the human host, in particular the human intestinal tract [30]. Given that S. gallolyticus is an opportunistic pathogen, understanding its genetic features and functional traits is essential for investigating the interaction between S. gallolyticus and its host. Hence, the genomes and protein sequences of these 31 S. gallolyticus strains were selected for comparative analysis (Table 1).
To explore the differences in genomic features among S. gallolyticus strains, a total of 66,243 high-quality proteins were used to generate protein orthologues. In total, 4,606 homologous clusters were generated as the pan-genome and divided into core, accessory, and unique genes based on their occurrence in different genomes (Supplementary Table S1); 755 homologous clusters were identified as the core genome of S. gallolyticus (Fig. 1a). The gene accumulation curves of the pan-genome and core genome of these 31 strains showed that S. gallolyticus has an open pan-genome structure (Fig. 1a). In contrast to the accumulation curve of the pan-genome, the accumulation curve of the core genome approached a steady state. These results suggested that, as new S. gallolyticus strains emerge, the size of the pan-genome tends to increase gradually while the size of the core genome decreases gradually. Notably, the number of genes/proteins in the core genome of S. gallolyticus was smaller than that reported in a previous study [15]. These results suggested that, as more S. gallolyticus genomes become available, the pan-genome will continue to expand and genomic diversity will increase. Additionally, a total of 755 gene families (16.4% of all clusters, Fig. 1b) were present in all 31 S. gallolyticus strains and constituted the core genome, of which 707 were single-copy genes (94% of the core genes) and 48 were multi-copy gene families. The distribution of unique genes also differed among the 31 strains: the number of unique genes per strain ranged from 0 to 63 (Fig. 1b).
For example, S. gallolyticus NCTC13767 contained 63 unique genes and S. gallolyticus BI02 contained 41, while 22 S. gallolyticus strains harbored no unique genes. Together, these results suggested that the genetic composition of S. gallolyticus strains is diverse and that this genetic variation contributes to their different functional traits. Therefore, to obtain an in-depth understanding of the genetic background and functional traits of S. gallolyticus, more strains should be isolated and sequenced. Overall, our work is a first and important step toward exploring the functional traits and variants of S. gallolyticus, and it suggests that the variety of unique genes may be a crucial factor in the evolutionary diversification of S. gallolyticus strains.
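The partition of clusters into core, accessory, and unique genes can be illustrated with a minimal Python sketch. It assumes the usual definitions (core: present in every strain; unique: present in exactly one strain; accessory: everything in between), which the text implies but does not spell out. The cluster and strain names are hypothetical toy data, not values from Supplementary Table S1; only the final two ratios use numbers reported above.

```python
from collections import Counter

# Hypothetical toy table: cluster ID -> set of strains carrying that cluster.
# Names are illustrative only and are not taken from the study's data.
presence = {
    "cluster_A": {"s1", "s2", "s3", "s4"},
    "cluster_B": {"s1", "s2", "s3", "s4"},
    "cluster_C": {"s1", "s3"},
    "cluster_D": {"s2"},
    "cluster_E": {"s2", "s3", "s4"},
}
strains = {"s1", "s2", "s3", "s4"}


def classify(carriers, all_strains):
    """Assumed definitions: core = all strains, unique = one strain, else accessory."""
    if carriers == all_strains:
        return "core"
    if len(carriers) == 1:
        return "unique"
    return "accessory"


labels = {cid: classify(carriers, strains) for cid, carriers in presence.items()}
counts = Counter(labels.values())

# Count unique clusters per strain; strains without unique clusters keep 0.
unique_per_strain = {s: 0 for s in strains}
for cid, carriers in presence.items():
    if labels[cid] == "unique":
        (owner,) = carriers  # a unique cluster has exactly one carrier
        unique_per_strain[owner] += 1

# Arithmetic check of the proportions reported in the text.
core_share = 755 / 4606        # core clusters among all pan-genome clusters
single_copy_share = 707 / 755  # single-copy families among core families

print(dict(counts), unique_per_strain)
print(f"{core_share:.1%} of clusters are core; {single_copy_share:.1%} of core are single-copy")
```

With the reported counts, the two ratios come out at about 16.4% and 93.6%, in line with the rounded values given in the text.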
Phylogenetic trees were constructed by two strategies for revealing the evolutionary relationship of S. gallolyticus strains
Two phylogenetic trees were constructed using different strategies to explore the phylogenetic relationships among the 31 S. gallolyticus strains. First, based on the presence or absence of the 4,606 nonredundant genes in the 31 strains, the Manhattan distance was calculated and a dendrogram was built to quantify their evolutionary relationships. Our results showed that S. gallolyticus strains from different ecological niches, including different isolation sources, disease statuses, and geographical origins, were mixed within each clade (Fig. 1d), suggesting that S. gallolyticus is pervasive across environmental niches and that transmission between ecological niches is possible. Second, the core-genome phylogenetic tree of the 31 strains was constructed in R based on the Manhattan distance and the concatenated alignments of the 755 core genes shared by the 31 S. gallolyticus strains (Fig. 1c).
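As an illustration of the first strategy, the Manhattan distance between two strains on a binary presence/absence profile is simply the number of gene clusters in which they differ. The sketch below is written in Python with hypothetical strain names and profiles; it is not the R code used in the study, and it stops at the pairwise distance table that a dendrogram would be built from.

```python
from itertools import combinations

# Hypothetical presence/absence profiles (1 = cluster present, 0 = absent).
# Each position corresponds to one gene cluster; values are invented.
profiles = {
    "strain_A": [1, 1, 0, 1, 0, 1],
    "strain_B": [1, 1, 0, 1, 1, 1],
    "strain_C": [1, 0, 1, 0, 0, 1],
    "strain_D": [1, 0, 1, 0, 1, 1],
}


def manhattan(u, v):
    """Sum of absolute differences; on 0/1 data this counts differing clusters."""
    return sum(abs(a - b) for a, b in zip(u, v))


# Pairwise distances keyed by (strain, strain) in insertion order.
dist = {
    (a, b): manhattan(profiles[a], profiles[b])
    for a, b in combinations(profiles, 2)
}

# The closest pair would be the first to merge in an agglomerative dendrogram.
closest_pair = min(dist, key=dist.get)

for pair, d in dist.items():
    print(pair, d)
print("closest:", closest_pair)
```

In this toy example strain_A/strain_B and strain_C/strain_D each differ in a single cluster, so they would form the two tightest branches of the dendrogram.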
Comparing the two trees, we found both similarities and inconsistencies in the evolutionary relationships of the 31 S. gallolyticus strains. Specifically, both trees placed S. gallolyticus VTM3R24 and S. gallolyticus VTM3R42 on the same branch (Fig. 1c-1d), suggesting that these two strains are closely related, which is consistent with the distribution of unique genes in these two strains (Fig. 1b). However, inconsistencies between the trees were common (Fig. 1c-1d). For example, the tree based on the core genome showed that S. gallolyticus VTM2R47 is closely related to S. gallolyticus BI02, S. gallolyticus BSJ27, and S. gallolyticus BSJ31 (Fig. 1c), whereas the tree based on the pan-genome placed S. gallolyticus BI02, S. gallolyticus BSJ27, and S. gallolyticus BSJ31 on the same branch, distant from S. gallolyticus VTM2R47 (Fig. 1d). Similar inconsistencies were observed for S. gallolyticus NCTC13767 (Fig. 1c-1d), which suggested that unique genes are a main influence on the inferred evolutionary relationships of S. gallolyticus strains. These results showed that the evolution of the 31 S. gallolyticus strains is complex: the phylogenetic relationships among strains were substantially affected by strain-specific genes, while the gain and loss of other genes still accounted for an important proportion of the phylogenetic signal, indicating that genetic diversity is of great significance in evolution.
Identification of CAZyme from S. gallolyticus strains
Previous studies have reported that S. gallolyticus upregulates the expression of genes involved in carbohydrate metabolism [31]. However, the specific pathways related to carbohydrate metabolism in S. gallolyticus strains have not been clarified. To obtain a comprehensive understanding of carbohydrate metabolism in S. gallolyticus strains, the representative proteins of the 4,606 homologous clusters were annotated against the dbCAN database to identify the CAZyme profile. As a result, a total of 182 orthologous genes were detected as CAZymes and divided into 6 CAZyme families (Fig. 2a, Supplementary Table S2): AAs, CBMs, CEs, GHs, GTs, and SLHs. These 182 orthologous genes came from different parts of the pan-genome: 36 were derived from core genes and 146 from accessory genes. Specifically, the 36 orthologous genes belonging to the core genes were divided into 6 CAZyme families, while 33 single-copy genes were divided into 5 CAZyme families, namely CBMs, CEs, GHs, GTs, and SLHs (Supplementary Fig. 1). Moreover, 58 orthologous genes belonging to accessory genes were divided into 5 CAZyme families, namely AAs, CBMs, CEs, GHs, and GTs (Fig. 2b). These results suggested that the potential carbohydrate metabolic abilities of S. gallolyticus strains depend on the composition of their core and accessory genes.
Based on the CAZyme annotation, we found that the distribution of CAZyme families differed among S. gallolyticus strains, revealing their diverse carbohydrate metabolic abilities. The majority of CAZymes were classified as GHs and GTs, especially GH1 and GT2 (Fig. 2a-2b). Glycoside hydrolases (GHs) are widely applied enzymes that catalyze the hydrolysis of glycosidic bonds and perform single-substrate reactions on simple and inexpensive substrates [32]. In S. gallolyticus strains, four kinds of GHs, namely GH1, GH2, GH23, and GH32, were detected among the core and accessory genes (Fig. 2b-2c). GH1, a family mainly derived from Eubacteria (approximately 90%) [33], has been reported to contain the largest number of characterized β-glucosidases [34], which suggested that S. gallolyticus strains have the potential to remove nonreducing terminal glucosyl residues from saccharides and glycosides [34]. Furthermore, three kinds of GTs, namely GT1, GT2, and GT4, were detected among the core genes and single-copy genes (Fig. 2b, Supplementary Fig. 1). Glycosyltransferases (GTs) are key enzymes for the production of various carbohydrate-containing structures and important catalysts for obtaining complex oligosaccharides and glycoconjugates, as well as for sugar modifications of glycoconjugates and cell-surface sugars [35]. The presence of GT2 suggested that S. gallolyticus strains harbor the metabolic ability and linkage specificity of α-glucosyltransferase, chitin synthase, and cellulose synthase [36]. Together, these results suggested that S. gallolyticus has a particular preference for glycosides and plays an important role in the formation and modification of glycans and glycoconjugates; in particular, GHs and GTs might play an important role in the environmental adaptation of these strains.
Functional annotations for S. gallolyticus strains
To gain a deeper understanding of the functional diversity of the 31 S. gallolyticus strains, the representative proteins were annotated against the Clusters of Orthologous Groups (COGs) database with functional gene ontology categories [37]. In the present study, 4,606 proteins were annotated to 20 of the 25 functional categories. The largest shares of orthologous clusters in the pan-genome were annotated as “General function prediction only” (R, 13.73%), “Transcription” (K, 11.55%), and “Carbohydrate transport and metabolism” (G, 11.13%; Fig. 3a, Supplementary Table S3). The accessory and core genes were classified into 20 functional categories, whereas the unique genes were assigned to only 11 functional categories: “Energy production and conversion” (C, 3.44%), “Cell cycle control, cell division, chromosome partitioning” (D, 0.84%), “Amino acid transport and metabolism” (E, 8.27%), “Carbohydrate transport and metabolism” (G), “Translation, ribosomal structure and biogenesis” (J, 4.86%), “Transcription” (K), “Replication, recombination and repair” (L, 6.97%), “Posttranslational modification, protein turnover, chaperones” (O, 2.17%), “Inorganic ion transport and metabolism” (P, 4.74%), “General function prediction only” (R), and “Signal transduction mechanisms” (T, 4.25%). Besides, several COG categories were more prevalent in the accessory group and less prevalent in the core group, such as “Amino acid transport and metabolism” (E), “Carbohydrate transport and metabolism” (G), “Transcription” (K), “General function prediction only” (R), and “Function unknown” (S, 8.27%). Overall, the COG annotation of the pan-genome showed that functional categories were represented at a higher proportion in the accessory clusters than in the core clusters.
These results indicate that the main genetic functions of S. gallolyticus strains depend on their accessory genome, while other metabolic functions are determined by differences between the core and unique clusters (Fig. 3b).
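The comparison of COG categories between the accessory and core groups amounts to computing, for each partition, the share of annotated clusters that falls into each category and then comparing the shares. A minimal Python sketch follows; the annotation records are hypothetical toy data, and the rule "share in accessory greater than share in core" is an assumed reading of "more prevalent", not a criterion stated in the text.

```python
from collections import Counter

# Hypothetical records: (pan-genome partition, COG category letter) per cluster.
annotations = [
    ("core", "J"), ("core", "J"), ("core", "G"), ("core", "K"),
    ("accessory", "G"), ("accessory", "G"), ("accessory", "K"),
    ("accessory", "R"), ("accessory", "E"),
    ("unique", "R"),
]


def category_shares(records, partition):
    """Fraction of a partition's annotated clusters in each COG category."""
    cats = Counter(cat for part, cat in records if part == partition)
    total = sum(cats.values())
    return {cat: n / total for cat, n in cats.items()}


core = category_shares(annotations, "core")
accessory = category_shares(annotations, "accessory")

# Categories with a larger share in the accessory group than in the core group
# (a category absent from the core counts as a share of 0).
enriched_in_accessory = sorted(
    cat for cat in accessory if accessory[cat] > core.get(cat, 0.0)
)

print("core:", core)
print("accessory:", accessory)
print("more prevalent in accessory:", enriched_in_accessory)
```

Comparing shares rather than raw counts keeps the comparison fair when the partitions contain very different numbers of clusters.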
Identification of virulence factors for S. gallolyticus strains to explore their pathogenicity
As an opportunistic pathogen, S. gallolyticus varies in pathogenicity and infectivity depending on the presence or absence of potential virulence factors. To profile the composition of virulence factors and characterize the relationship between virulence and the evolution of S. gallolyticus, we identified virulence factors in the 31 S. gallolyticus strains by annotating the 4,606 proteins. In total, 28 clusters in the pan-genome were annotated as related to virulence factors, and the presence/absence of 27 toxin genes across the 31 strains was summarized (Fig. 4a, Supplementary Table S4). First, the virulence factors were mainly annotated from the accessory genes (87.8%), and no virulence factors were found among the unique genes (Fig. 4b). Second, based on the virulence factor profiles, S. gallolyticus TM07-4AT and S. gallolyticus VTM3R42 may be more pathogenic than other strains because they contain a wider variety of virulence-related genes (Fig. 4a). Third, the distribution of the 27 toxin-encoding genes differed among the 31 strains, which may contribute to differences in their pathogenicity. For example, bsh and groEL were detected in all 31 strains, and their number was higher than that of other toxin genes (Fig. 4c). In particular, bsh accounted for the highest proportion of the virulence-related genes in S. gallolyticus; it has been associated with bile salts and the acute toxicity of bile [38], because bsh genes are thought to protect commensal bacteria from bile salt toxicity and to play a key role in bacterial intestinal colonization and survival [39]. Notably, bsh belonged to the core genome and the 26 other genes belonged to the accessory clusters (Fig. 4c), which suggested that all S. gallolyticus strains can affect bile metabolism. Besides, groEL, as a heat shock protein, plays an essential role in protein folding, intracellular protein transport, and the response to denatured proteins. groEL has also been implicated in inflammatory responses and bacterial infections [40], which indicated that it can participate in many important activities that affect the host. Moreover, cpsA, which encodes a modular protein that affects a variety of regulatory functions, possibly including capsule synthesis and cell wall-related factors [41], was detected in 29 of the 31 strains, the exceptions being S. gallolyticus VTM3R42 and S. gallolyticus TM07-4AT. Fourth, accessory genes may include crucial toxin factors that played a crucial role in the evolution of S. gallolyticus strains. Overall, the virulence factor results indicated that the pathogenicity of S. gallolyticus is related to immunity, bile acid metabolism, and membrane synthesis. The pathogenicity of S. gallolyticus strains with multiple virulence factors may be important, and the existence of unique genes may be one of the crucial factors affecting the different pathogenicity of S. gallolyticus strains, which would enhance the evolutionary flexibility of these strains.
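The per-gene and per-strain summaries described above (which virulence genes occur in all strains, and which strains carry the widest variety of them) can be derived from a presence/absence table, as in the following Python sketch. The gene names bsh, groEL, and cpsA come from the text; the strain names and the 0/1 values are hypothetical and do not reproduce Fig. 4a.

```python
# Hypothetical presence/absence table: strain -> {virulence gene: 0 or 1}.
# Gene names are from the text; strains and values are invented for illustration.
vf = {
    "strain_1": {"bsh": 1, "groEL": 1, "cpsA": 1},
    "strain_2": {"bsh": 1, "groEL": 1, "cpsA": 0},
    "strain_3": {"bsh": 1, "groEL": 1, "cpsA": 1},
}

genes = sorted({gene for row in vf.values() for gene in row})

# Number of strains carrying each gene.
prevalence = {g: sum(row.get(g, 0) for row in vf.values()) for g in genes}

# Genes found in every strain (candidates for the core genome).
in_all_strains = [g for g in genes if prevalence[g] == len(vf)]

# Number of distinct virulence genes carried by each strain.
genes_per_strain = {strain: sum(row.values()) for strain, row in vf.items()}

print("prevalence:", prevalence)
print("present in all strains:", in_all_strains)
print("genes per strain:", genes_per_strain)
```

Applied to the real table, the same two summaries would give the per-gene prevalence across the 31 strains and the number of virulence genes carried by each strain.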
Profiles of antibiotic resistance genes for S. gallolyticus strains
In general, pathogenic infections are determined by the pathogenic potential of invading bacteria and their ability to evade host defenses, while the therapeutic challenge in treating such infections comes from antibiotic resistance [42]. To clarify antibiotic resistance in S. gallolyticus strains, the representative proteins of the 4,606 orthologous clusters were systematically annotated. As a result, 28 antibiotic resistance genes (ARGs) and 9 antibiotic categories were detected (Fig. 5a, Supplementary Table S5). The ABC_transporter, ArlR, bacA, macB, mefA, and mexT genes were present in all S. gallolyticus strains, which suggested that these strains harbor resistance to macrolide, multidrug, beta-lactam, and other antibiotics. Specifically, we identified several ARGs, including mefA, lnu(B), tet(L), tet(M), and ermB, consistent with a previous study showing that S. gallolyticus harbored many erythromycin and tetracycline resistance genes. Moreover, a large proportion of ARGs (75.4%) was derived from accessory genes (Fig. 5b), which suggested that the diversity of antibiotic resistance depends on the composition of accessory genes and that knowledge of the ARGs in each strain is essential for treating infections caused by S. gallolyticus. In addition, the 9 antibiotic categories identified in the pan-genome were sulfonamide, multidrug, tetracycline, bacitracin, aminoglycoside, vancomycin, macrolide, beta-lactam, and unclassified (Fig. 5a, Supplementary Table S6), consistent with previous research showing that S. gallolyticus is associated with a wide range of antibiotic resistance [30]. Among these, macrolide and multidrug accounted for a higher proportion than the other categories.
Interestingly, resistance genes for the macrolide and multidrug categories were found in both core clusters and accessory clusters (Fig. 5c), indicating that both core and accessory genes played a role in the evolution of antibiotic resistance in S. gallolyticus, which would enhance the evolutionary flexibility of S. gallolyticus strains. Macrolides are among the oldest and most successful clinical ribosome-targeting antibiotics; they target bacterial ribosomes to inhibit protein synthesis [43]. The macrolide resistance we detected is consistent with a study showing that clinical isolates of S. gallolyticus harbored macrolide resistance [10]. Moreover, unique genes did not play any role in the antibiotic resistance of S. gallolyticus strains (Fig. 5b). Together, these results showed that S. gallolyticus harbors resistance to antibiotics, especially in the macrolide and multidrug categories, which suggested that the use of these antibiotics should be reduced in the clinical treatment of S. gallolyticus infections.