Taxonomic status of B. catenulatum strains
To confirm that sequence similarity and taxonomic status among the strains in this study, the pairwise ANI and TNI values of all twenty genome assemblies were calculated (Fig. 1). Strains with an ANI value of over 95% are generally considered to be of the same species [14]. ANI and TNI obtained similar clustering results, that distinct subspecies branches were obtained. IMAUFB085 and IMAUFB087 were grouped with most of the B. catenulatum subsp. catenulatum strains, ANI values of them compared to B. catenulatum subsp. catenulatum JCM1194T were 98.41% and 98.42%, and TNI values were 87.45% and 84.48%, respectively. Above results confirmed that IMAUFB085 and IMAUFB087 were classified as B. catenulatum subsp. catenulatum. These two strains were previously identified as B. pseudocatenulatum by 16S rRNA (Additional file 1.), but 16S rRNA could not reliably distinguish the species of Bifidobacterium [15], so we revised their taxonomic status to B. catenulatum subsp. catenulatum here.
ANI analysis revealed that three B. catenulatum subsp. catenulatum strains, namely, JGBg468, BCJG468, and MC1, significantly differed from the other B. catenulatum subsp. catenulatum strains (Their ANI values compared to JCM1194T were 93.83%, 93.88% and 93.86%). It is assumed that the taxonomic information of these three strains was wrongly recorded, so these three strains were subsequently excluded. Cluster analysis clearly distinguished two subspecies, ANI between the two subspecies groups were greater than 95%, and within the subspecies were greater than 98%, indicating that these strains belonged to the same species.
Comparison of general genomic features between two subspecies
The general information of the strains shows that all of B. catenulatum subsp. kashiwanohense are derived from infants, while only two strains of B. catenulatum subsp. catenulatum are known to be infantile isolates (Additional file 2.). The general genomic features of 17 strains of B. catenulatum species and two novel strains, IMAUFB085 and IMAUFB087, are summarized in Table 1. The genomic characteristics within B. catenulatum species showed different degrees of difference. The genome size and GC content of B. catenulatum isolates were 2.16 ± 0.13 Mb and 56.21% ± 0.11%, respectively. Comparing the basic genomic characteristics between the two subspecies (Additional file 3.), the results showed that the genome size of B. catenulatum subsp. kashiwanohense (2.36 ± 0.05 Mb) was significantly higher than that of B. catenulatum subsp. catenulatum (2.09 ± 0.07 Mb) (p<0.0001), while there was no significant difference in GC content (p>0.05). This strong genomic differences reflected the speciation boundaries of the two subspecies, while the consistency of GC content represented the intimate relationship between them [16, 17]. In addition, B. catenulatum subsp. kashiwanohense contained more CDSs than B. catenulatum subsp. catenulatum (p<0.01), and there was no statistical difference in the number of tRNA (p>0.05).
Table 1
General genomic features of B. catenulatum genomes
Collection strain
|
Genome size
(Mb)
|
GC content
(%)
|
No of
CDSs
|
No of
tRNAs
|
IMAUFB087
|
2.01
|
56.06
|
1,834
|
56
|
IMAUFB085
|
1.98
|
55.94
|
1,781
|
54
|
B. catenulatum subsp. catenulatum JCM1194T
|
2.08
|
56.20
|
1,616
|
56
|
B. catenulatum subsp. catenulatum DSM16992
|
2.06
|
56.10
|
1,606
|
56
|
B. catenulatum subsp. catenulatum LMG11043
|
2.08
|
56.11
|
1,515
|
56
|
B. catenulatum subsp. catenulatum DSM16992(2)
|
2.11
|
56.41
|
1,616
|
56
|
B. catenulatum subsp. catenulatum 1899B
|
2.12
|
56.25
|
1,656
|
56
|
B. catenulatum subsp. catenulatum A2
|
2.02
|
56.15
|
1,584
|
54
|
B. catenulatum subsp. catenulatum A1
|
2.06
|
56.21
|
1,659
|
56
|
B. catenulatum subsp. catenulatum A3
|
2.15
|
56.36
|
1,707
|
59
|
B. catenulatum subsp. catenulatum HGUT-01490
|
2.08
|
56.20
|
1,615
|
56
|
B. catenulatum subsp. kashiwanohense PV20-2
|
2.37
|
56.12
|
1,876
|
58
|
B. catenulatum subsp. kashiwanohense JCM15439T
|
2.34
|
56.30
|
1,842
|
54
|
B. catenulatum subsp. kashiwanohense APCKJ1
|
2.45
|
56.20
|
1,968
|
54
|
B. catenulatum subsp. kashiwanohense DSM21854
|
2.31
|
56.20
|
1,758
|
53
|
B. catenulatum subsp. kashiwanohense DSM21854(2)
|
2.32
|
56.30
|
1,854
|
68
|
Phylogenetic divergence of two subspecies of B. catenulatum
The classification of species and the establishment of intraspecific relationships are frequently based on phylogenetic analysis. We reconstructed the phylogenetic structure of B. catenulatum species and specified B. pseudocatenulatum JCM1200T as the outgroup. A NJtree on the basis of 785 core genes was constructed and visualized with bootstrap support of 1000 replications (Fig. 2A). This phylogenetic tree well confirmed the phenomenon of subspecies divergence of B. catenulatum, 16 B. catenulatum strains in our analysis were clearly divided into two subgroups, B. catenulatum subsp. kashiwanohense and B. catenulatum subsp. catenulatum subgroup, and showed that there were diacritical differences between the two subspecies at gene level. By annotating the source of the isolates, interestingly, there was a significant association between the B. catenulatum strains and their isolated source. Infant isolates, including all strains of B. catenulatum subsp. kashiwanohense and two of B. catenulatum subsp. catenulatum, showed intrapecific genetic similarity, while the rest were adult isolates in another cluster that they also appeared close phylogenetic relationships. This suggests that the trend of divergence of the B. catenulatum strains may be dependent on their hosts. We believe that B. catenulatum may adapt its functions to infant and adult intestines respectively, thus gradually differentiating into subspecies of B. catenulatum from the species level.
Constructing the pan-core genome of B. catenulatum
The gene pool of a population contains all the genetic material and functions of a species. Roary was used to calculate the pan-core genome of the above-mentioned 16 B. catenulatum strains, in total 4608 pan genes were searched. The above phenomenon is that the two subspecies of B. catenulatum branched from the same ancestor made them share a certain proportion of core genes, with the number of 998 (21.66%) (Fig. 2B). The genetic distribution of B. catenulatum showed that there were individual gene set in the two subspecies that 87 core genes in B. catenulatum subsp. kashiwanohense and 63 in B. catenulatum subsp. catenulatum, these unique core gene sets can provide a internal basis for differentiation of Bifidobacterium species [18]. Additionally, there are different numbers of strain specific genes in the B. catenulatum species, their numbers ranged from 20 to 578, which reflected the potential genetic diversity among B. catenulatum species.
Subsequently, we established the pan-core genes curves for genomes of the species B. catenulatum (Additional file 4.A). With the augment of new genomes, the number of pan genes increases, which indicates the existence of an open pan-genome within the species B. catenulatum. Contrary to the pan genome, the core genes are not expected to be significantly reduced in number by the addition of further genomes since the exponential trendline essentially reached the number of 1000. At the same time, we found that B. catenulatum subsp. catenulatum has a fairly open pan-core genome (Additional file 4.B), while B. catenulatum subsp. kashiwanohense tends to be closed (Additional file 4.C). These results indicate that B. catenulatum subsp. catenulatum may have strong ability to be adaptable to the environment, while B. catenulatum subsp. kashiwanohense exists in a more specific and conservative habitat. This result is consistent with the isolated information of the B. catenulatum strains in this study (Additional file 2.), for B. catenulatum subsp. kashiwanohense only exists in the intestinal tract of infants and it is an infrequently encountered species [10], while B. catenulatum subsp. catenulatum can exist in both the intestinal tract of infants and adults and has a large number, so we speculated that more novel strains of B. catenulatum subsp. catenulatum may be discovered in the future than B. catenulatum subsp. kashiwanohense.
Comparison of the main functions between two subspecies
Above results reveal genetic differences between the two subspecies from the general genomics level, which are usually associated with functional differentiation [18]. Therefore, it is necessary to conduct further functional genomic comparisons between the two subspecies of B. catenulatum. In order to acquire the functional genomic difference of them, all strains were annotated online through the RAST website. Examination of the functional annotation of the 16 genomes, functional annotation of B. catenulatum genome sequence classified proteins into 23 functional categories (Additional file 5.). Results suggests that the amino acid derivatives (21.06%) functional is the most highly represented functional category within the B. catenulatum followed by protein metabolism (21.00%), carbohydrate metabolism (15.73%), and cofactors, vitamins, prosthetic groups, and pigments (9.76%). This indicates that a strong ability exists for substrate utilization by B. catenulatum. Comparison of the main functional differences between the two subspecies showed that B. catenulatum subsp. kashiwanohense had wholely stronger metabolic capacity than B. catenulatum subsp. catenulatum, and there were significant differences in the metabolism of amino acid (p<0.01), carbohydrates (p<0.01), and protein (p<0.05). While there was no statistical difference in the cofactors, vitamins, prosthetic groups, and pigments-related categories (p>0.05) (Fig. 3).
In view of the remarkably significant differences in the metabolic functions of amino acids and carbohydrates in the two subspecies, we first compared the detailed functions of metabolic amino acids in the two subspecies (Additional file 6.A). We found that the main difference lies in a subfunction of amino acids, lysine, urea cycle, polyamines and cysteine, which is also the most abundant amino acid metabolic category in B. catenulatum. Further analysis of the details of this function showed that methionine biosynthesis was the main factor affecting the difference of amino acid function between the two subspecies, and methionine biosynthesis of B. catenulatum subsp. kashiwanohense was significantly stronger than that of B. catenulatum subsp. catenulatum (Additional file 6.B, C). This suggests that the functional difference in methionine biosynthesis is the key to the difference in the amino acid functional genomes of the two subspecies of B. catenulatum.
Different carbohydrate utilization patterns in two subspecies
In order to compare the carbohydrate utilization ability of B. catenulatum at the genomic level, the functional genes of carbohydrate active enzymes of 16 B. catenulatum strains were analyzed. As shown in fig. 4A, 16 strains of B. catenulatum were distributed in all six carbohydrate active enzyme families, which indicated that they had rich carbohydrate function. It is worth noting that the clustering results of CAZyme are roughly consistent with phylogenetic trees that the two subspecies are clearly distinguished. This not only suggests that the two subspecies have different metabolic patterns in terms of carbohydrate utilization, but indicated that CAZy -related genes were closely associated with the divergence of subspecies of B. catenulatum.
Among the identified GH families, the most dominant were the GH3, GH13 and GH43 families in B. catenulatum species, and GT2 and GT4 were the main family of carbohydrate enzymes within B. catenulatum species. By comparing the main carbohydrate hydrolase families between subspecies (Fig. 4B), it was found that the GH3 (p<0.0001) and GT4 (p<0.01) of B. catenulatum subsp. catenulatum were significantly higher than those of B. catenulatum subsp. kashiwanohense. Which indicates that B. catenulatum subsp. catenulatum has stronger ability to utilize β-D-glucosyl, sucrose and cellulose [19]. However, there was not statistically significant difference in the function of GH13, GH43 and GT2 between the two subspecies (p>0.05) (Fig. 4C,D,E).
By analyzing the specific carbohydrase of B. catenulatum subsp. kashiwanohense, we identified five families closely related to it, including GH18, CBM5, GH95, CBM51 and CBM66 (Fig. 4G). The CBM family is primarily responsible for banding carbohydrates, GH18 family often combines with CBM5 to participate in the function of chitinase, and CBM66 mainly assists in the degradation of fructose [20]. In particular, GH95 is specifically participated in the production of α-L-fucosidase, which is the most abundant substance in HMOs and is closely related to the function of infant-specific species [21]. Additionally, CBM51 is beneficial to bind to GH95 and help it pick up fucose substrates to metabolize HMOs [22]. These two CAZy families, CBM51 and GH95 may be conducive to the colonization of B. catenulatum subsp. kashiwanohense in infant intestines, especially the utilization of HMOs, which is in contrast to the abundance of plant-derived glycan of B. catenulatum subsp. catenulatum, and further reflects the bias of the two subspecies in the utilization of carbohydrates.
HMOs-related genes in B. catenulatum genomes
To further explore the deletion of HMOs-related genes and the overall genomic differences between the two subspecies, the BLAST Ring Image Generator (BRIG) was used to graphically compare B. catenulatum strains using type strain B. catenulatum subsp. kashiwanohense JCM15439T as reference (Fig. 5). In order to compare the presence and deletion of HMOs-related genes in the two subspecies, we annotated four fuc genes (fuc1, fuc2, fuc3 and fuc4) that encode α-L-fucosidase in reference genome JCM15439T based on the close relationship between L-fucose and HMOs mentioned above. The results showed that all fuc genes were specifically found in B. catenulatum subsp. kashiwanohense but not in B. catenulatum subsp. catenulatum. fuc2, fuc3 and fuc4 are included in all B. catenulatum subsp. kashiwanohense genomes except fuc1, which is absent in PV20-2. Above phenomenon further confirms the unique functional advantage of B. catenulatum subsp. kashiwanohense subspecies in metabolic HMOs.
Overall, the figure shows that the majority of sequences present in JCM15439T are also present in all other strains, the genomes were more than 90% identical. However, it shows that two large genome gaps (GGs) exist separately in the two newly sequenced strains, IMAUFB085 and IMAUFB087, which have less than 70% of the matched-degree compared to JCM15439T. In general, sequences located in the GGs represent hypothetical CDSs, genomic islands, or prophages [23], which indicates that there are still many unknown functions of these two strains to be explored.