2.1 Genomic analysis and geographic distribution
Acinetobacter baumannii is a genetically diverse bacterial species, and a variety of typing methods exist to identify genetic differences among strains that could be associated with pathogenicity, epidemiological origin, dissemination, and evolutionary patterns [19]. Sequence typing and phylogenetic analysis make it possible to identify groups of phylogenetically related genotypes and to explore the diversity among strains [20]. Nucleotide similarity and MLST analyses, combined with geographical data, can provide better knowledge of the epidemiological context and population structure of strains around the world [21, 22]. By analyzing genomic similarity based on sequence alignment together with geographic distribution, it is possible to infer bacterial clonality, considering that strains of a bacterial species isolated from the same region tend to have the same gene repertoire. Even though events of genetic drift and vertical gene transfer cannot be ruled out, genetic characteristics are generally conserved among bacteria isolated at the same site or at nearby sites.
Numerous epidemiological studies of A. baumannii associate the presence of particular STs with local origin, as seen in the occurrence of ST 848 (CC 208, Oxford scheme) carrying a carbapenem resistance gene in India [23], and in the frequent presence of ST15, ST25, ST79, and ST1 in South America [24, 25]. A recent phylogeographical analysis showed that the Italian isolates belong to a single clonal group, ST78 (Pasteur scheme) [19].
A total of 206 complete A. baumannii genome sequences from NCBI were analyzed (see Additional file 1). Genome sizes range from 3.48 Mb to 4.43 Mb, with a genomic GC content of 39.05%. Considering that genomes of bacteria isolated near one another tend to maintain the same genetic characteristics, studying the geographical distribution of A. baumannii is an essential means of evaluating the conservation of the species in the global context.
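Genome size and GC content of the kind reported above can be computed directly from a FASTA file. The following is a minimal sketch, not the pipeline used in this study; the function names `read_fasta` and `genome_stats` and the toy sequences are illustrative assumptions.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict {record_id: sequence}."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record identifier.
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def genome_stats(records):
    """Return (total length in bp, GC percentage) over all replicons."""
    total = sum(len(seq) for seq in records.values())
    gc = sum(seq.count("G") + seq.count("C") for seq in records.values())
    return total, (100.0 * gc / total if total else 0.0)


# Toy input (hypothetical sequences, not real A. baumannii data).
example = """>chromosome toy
ATGCATGCAT
>plasmid toy
GGCCAATT
"""
size, gc_percent = genome_stats(read_fasta(example))
print(f"{size} bp, GC = {gc_percent:.2f}%")
```

For a real genome, the text passed to `read_fasta` would be the content of the downloaded FASTA file, and chromosome and plasmids would be counted together as in the sketch.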
A total of five relevant clusters with high similarity (≳ 98.5%), belonging mainly to specific STs (1, 2, 10, 79 and 437), were retrieved (see Additional file 2). This finding corroborates the conservation of genomes belonging to the same ST. Consequently, strains of the same ST were expected to have been isolated at nearby locations, which would justify the high genomic similarity. Nevertheless, the geographic distribution of the strains did not follow their STs. Considering that different STs were isolated on distinct continents, possible factors that could explain this mismatch are microbial ubiquity and globalization (Figure 1). ST 2 accounts for the largest number of deposited genomes (50% of the dataset), and most strains were isolated on the Asian continent (51.2% of the dataset). These data do not match the epidemiological information on the distribution of outbreaks caused by A. baumannii [9, 20, 23]. They suggest instead that more sequencing has been performed on the Asian and North American continents, since epidemiological outbreaks have also been reported over time in several developing countries (Argentina, Brazil, and South Africa). Furthermore, outbreaks of infections by this pathogen have been reported on the European continent; however, the number of isolates from that continent is still much lower.
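One simple way to obtain clusters from pairwise similarity values and a cutoff such as 98.5% is single-linkage grouping. The sketch below is only an illustration under that assumption: the clustering procedure actually used is not described here, and the genome labels and similarity values are hypothetical.

```python
def cluster_by_similarity(names, similarity, threshold=98.5):
    """Group genomes linked by pairwise similarity >= threshold (single linkage)."""
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), value in similarity.items():
        if value >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    # Largest clusters first; ties broken alphabetically.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c))


# Hypothetical genomes and percent similarities, for illustration only.
genomes = ["g1", "g2", "g3", "g4", "g5"]
pairwise = {
    ("g1", "g2"): 99.2,
    ("g2", "g3"): 98.7,
    ("g1", "g3"): 97.9,
    ("g3", "g4"): 90.1,
    ("g4", "g5"): 99.0,
}
print(cluster_by_similarity(genomes, pairwise))
```

Single linkage places g1 and g3 in the same cluster through g2 even though their direct similarity is below the cutoff; a stricter (complete-linkage) criterion would split them.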
2.2 Phylogeny and phylogenomics
Phylogenetically, all the A. baumannii strains were grouped in the same clade within the Acinetobacter genus, confirming the monophyly of this species (Figure 2). This result was expected: it is observed in different microbial species and is consistent with reports from the literature on phylogenetic analysis, which indicate that housekeeping genes used to infer evolutionary history are a good indicator of phylogenetic distance and epidemiology [26].
This result also shows that the A. baumannii strains, represented in blue, are highly conserved within the species (Figure 2). Moreover, strains of other species (represented mainly in red, green, and yellow) did not group with A. baumannii strains, as they have different metabolic and phenotypic characteristics. A relevant finding is the distant phylogenetic relationship of the A. radioresistens strains (represented in pink), which formed a basal monophyletic clade; this makes it a species of interest for comparative analysis.
Three strains (FDAARGOS_494, FDAARGOS_493, and FDAARGOS_560), previously identified as Acinetobacter sp., grouped together inside the A. baumannii clade, strongly suggesting that they in fact belong to this species. Taxonomic reclassification of this kind has already occurred for other bacterial species [27–29]. More phylogenomic studies, including tetranucleotide analyses, Average Nucleotide Identity (ANI), and evaluation of the presence and absence of species-specific genes, are needed to confirm this hypothesis and to support taxonomic reclassification based on genomic data and theoretical background [27, 30].
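As an illustration of the tetranucleotide analyses mentioned above, the sketch below compares two sequences by correlating their tetranucleotide frequency profiles. It is a deliberately simplified assumption: established tetranucleotide methods usually work on z-scores and consider both strands, whereas this version uses raw single-strand frequencies, and the two short sequences are invented.

```python
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_profile(sequence):
    """Return relative frequencies of the 256 tetranucleotides in a sequence."""
    sequence = sequence.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(sequence) - 3):
        word = sequence[i:i + 4]
        if word in counts:  # skips windows containing N or other symbols
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in TETRAMERS]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Invented toy sequences; real comparisons would use whole genomes.
seq_a = "ATGCGTACGTTAGCATGCAAGT" * 5
seq_b = "ATGCGTACGTTAGCTTGCAAGT" * 5
similarity = pearson(tetra_profile(seq_a), tetra_profile(seq_b))
print(f"tetranucleotide profile correlation: {similarity:.3f}")
```

A correlation close to 1 between whole-genome profiles would be compatible with membership in the same species, but it would need to be read together with ANI and gene-content evidence, as stated above.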
The A. baumannii strains were grouped according to their respective STs in the phylogenomic tree built from the core genome sequence (Figure 3). Nonetheless, in the phylogenomic analyses, the ST 2 strains (represented in green) formed paraphyletic clades and thus cannot be considered a single group. The strains represented in gray do not have a defined ST, but they all grouped in the same clade, indicating high similarity among them (see Additional file 2).
2.3 Genomic plasticity
In the analysis of genomic plasticity, large gaps can be observed among the A. baumannii genomes when they are compared visually. Even strains belonging to the same ST are not identical, although they are genomically and phylogenomically closer and share the same clade. This result suggests that the strains of this species are not very clonal and tend to have a high rate of gene permutation, since there are many gaps between genomes (Figure 4).
Comparative genomic analyses of the 206 A. baumannii genomes, using the AYE strain as a reference, showed the presence of 14 genomic islands (Figure 4). Among these, four were pathogenicity islands, two were metabolic islands, one was a symbiotic island, and seven were resistance islands. Furthermore, one full-sized resistance island (RI7 or AbaR1) was identified within the AYE strain. This genomic region is 96878 bp long and contains the largest number of resistance genes found in this species. The island carries 25 resistance genes, which encode efflux pumps and proteins with enzymatic activity.
The islands RI2 (80220 bp) and RI7 (96878 bp) are conserved within the species and occur most often in strains belonging to ST 1. Outside this cluster, however, neither island was found in its complete form. A similar result is observed for smaller islands, such as RI1 (20317 bp), RI3 (6077 bp), RI4 (12534 bp), RI5 (14763 bp) and RI6 (10374 bp), indicating that they are unstable regions within the genome.
The large number of genomic islands in A. baumannii reveals the high genomic plasticity of the species. Although we identified a small number of sequence types and phylogenetically close strains, analysis of the complete genomes shows that all strains differ in gene content. This could be due to the horizontal acquisition of mobile genetic elements or to gene duplication events.
2.4 Analysis of the pan-genome for understanding this species
There is an intensive effort to determine the total gene repertoire of the species A. baumannii. According to the power-law regression model, the pan-genome of A. baumannii remains open (γ = 0.46), meaning that each newly added genome contributes new genes to the genetic repertoire of the species. This result was obtained using the formula n = a · x^γ, where n is the estimated size of the pan-genome for a given number of genomes, x is the number of genomes used, and a and γ are fitting parameters [31]. As a rule, when 0 < γ < 1, the pan-genome is considered open. Figure 5 shows the development of the pan-genome. The fact that the total number of genes did not reach a plateau as genomes were added corroborates the assumption that the pan-genome of the species remains open. It also corroborates the high genomic plasticity already reported for this species, especially considering that this bacterium has an exceptional ability to acquire new gene content through transposable elements [19, 32].
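The power-law model above can be fitted by ordinary least squares on log-transformed values, since log n = log a + γ · log x. The sketch below assumes this fitting approach, which may differ from the procedure used by the pan-genome software in this study. The input is synthetic: it is generated with γ = 0.46 (the value reported above) and an arbitrary, hypothetical coefficient a = 2400, so it only demonstrates that the fit recovers the exponent.

```python
from math import exp, log


def fit_power_law(x_values, n_values):
    """Least-squares fit of n = a * x**gamma on log-log axes; returns (a, gamma)."""
    log_x = [log(x) for x in x_values]
    log_n = [log(n) for n in n_values]
    mean_x = sum(log_x) / len(log_x)
    mean_n = sum(log_n) / len(log_n)
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    sxy = sum((lx - mean_x) * (ln - mean_n) for lx, ln in zip(log_x, log_n))
    gamma = sxy / sxx                      # slope of the log-log line
    a = exp(mean_n - gamma * mean_x)       # intercept, back-transformed
    return a, gamma


# Synthetic curve for 1..206 genomes; a = 2400 is an arbitrary assumption.
genome_counts = list(range(1, 207))
pan_sizes = [2400.0 * x ** 0.46 for x in genome_counts]

a, gamma = fit_power_law(genome_counts, pan_sizes)
status = "open" if 0 < gamma < 1 else "closed"
print(f"a = {a:.1f}, gamma = {gamma:.2f}, pan-genome is {status}")
```

With real data, `pan_sizes` would hold the cumulative number of gene families observed as genomes are added, usually averaged over several random orderings of the genomes.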
The pan-genome analysis revealed a total of 27682 genes, of which 1373 are shared by all strains (complete genome sequences of A. baumannii) and 10683 are strain-specific. The accessory genome, excluding strain-specific genes, comprises 15626 genes. For these results, a threshold of > 95% was used for the prediction of orthologous genes.
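The three partitions reported above follow from a gene presence/absence table: a gene is core if every strain carries it, strain-specific if exactly one strain carries it, and accessory otherwise. The sketch below illustrates this rule on a hypothetical toy table (the gene names are invented and the strain assignments are made up for the example), and then checks that the counts reported in the text add up to the pan-genome total.

```python
def partition_pan_genome(presence):
    """Split genes into core, accessory and strain-specific lists.

    presence maps each gene to the set of strains that carry it; the full
    strain set is inferred as the union of all carriers (an assumption).
    """
    strains = set().union(*presence.values())
    core, accessory, unique = [], [], []
    for gene, carriers in presence.items():
        if len(carriers) == len(strains):
            core.append(gene)
        elif len(carriers) == 1:
            unique.append(gene)
        else:
            accessory.append(gene)
    return core, accessory, unique


# Hypothetical toy table: invented genes, arbitrary strain assignments.
toy = {
    "geneA": {"AYE", "SDF", "A388"},
    "geneB": {"AYE", "A388"},
    "geneC": {"SDF"},
    "geneD": {"AYE", "SDF", "A388"},
}
core, accessory, unique = partition_pan_genome(toy)
print(core, accessory, unique)

# Consistency check on the counts reported in the text.
assert 1373 + 15626 + 10683 == 27682
```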
The presence and absence of genes in relation to the phylogenomic analysis of the strains under study, the diversity of presence patterns across strains, and the accessory genome can be visualized in Figure 5. Each gene presence pattern follows the genomic proximity inferred from the core genome. The distribution of accessory and unique genes shows high variability, as well as clustering of strains with shared gene content. This pattern of genomic proximity is also observed in the genomic similarity analysis and is connected to the grouping of strains by sequence type (Additional file 2 and Figure 3). Thus, strains belonging to the same ST tend to maintain similar accessory genome profiles.
A detailed analysis reveals the distinct gene presence pattern of the SDF strain. This strain is known to be susceptible to antimicrobials and is the only representative of sequence type 17. Its pattern of accessory genes differs from all the others, and it has about 803 unique genes, in contrast to the super-resistant AYE strain, which contains about 62 unique genes. This fact, combined with the distant phylogenomic position of the strain, shows how different the susceptible strain is from the others.
A more detailed analysis of the total pan-genome indicates the number of genes related to specific bacterial metabolic pathways. This analysis is based on the KEGG database. It shows a high number of core genes related to metabolic pathways intrinsic to microbial existence, such as energy metabolism (8.68 %) and translation (5.87 %) (Figure 7). The accessory genes are related to amino acid metabolism (14.78 %), carbohydrate metabolism (15.55 %), and xenobiotics biodegradation and metabolism (5.93 %). Genes related to drug resistance make up a larger share of the accessory genome (2.45 %) than of the core genome (1.76 %). Similarly, genes related to infectious diseases are represented in the core genome (0.80 %), the accessory genome (2.07 %) and the strain-specific genes (1.44 %).
As for genes related to adaptation to the environment, the repertoire associated with this process is very small across the pan-genome, with less than 0.5 % of the total repertoire linked to this pathway in any subdivision of the pan-genome.
2.5 Pan-resistome characterization of Acinetobacter baumannii
Using a similarity criterion greater than 70 % and an E-value < 5e-6, the studied strains together present a pan-resistome of 171 genes, within which a core resistome of five genes was identified (Table 1) [11].
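The two cutoffs stated above can be applied as a simple filter over alignment hits. The sketch below assumes the hits are available in the standard 12-column BLAST tabular layout (query, subject, percent identity, ..., E-value, bit score) and uses the percent-identity column as a stand-in for the similarity criterion; both are assumptions, as are the query identifiers and the hit values, which are invented.

```python
def filter_hits(lines, min_identity=70.0, max_evalue=5e-6):
    """Keep hits with identity > min_identity and E-value < max_evalue.

    Assumes tab-separated lines in the standard 12-column BLAST tabular
    layout, where column 3 is percent identity and column 11 is the E-value.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        identity = float(fields[2])
        evalue = float(fields[10])
        if identity > min_identity and evalue < max_evalue:
            hits.append((query, subject, identity, evalue))
    return hits


# Invented hits: only the first one passes both cutoffs.
example_hits = [
    "geneA\tadeJ\t98.50\t1058\t16\t0\t1\t1058\t1\t1058\t0.0\t1950",
    "geneB\tOXA-66\t65.00\t270\t94\t1\t1\t270\t1\t270\t1e-90\t300",
    "geneC\ttet(A)\t88.10\t120\t14\t0\t1\t120\t5\t124\t1e-3\t40",
]
for hit in filter_hits(example_hits):
    print(hit)
```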
In these analyses, strains carrying ade-type efflux pumps were expected to have the complete set of genes required for the pump to be functional. Nevertheless, this pattern was observed exclusively for the adeIJK efflux pump, as all the genomes presented the genes adeI, adeJ, and adeK. The same pattern was not observed for the other genes of the same family (Figure 8 and Additional file 3). For the genes that constitute the adeFGH pump, only adeF and adeG were detected in all the strains. The gene adeH (outer membrane factor protein in the adeFGH multidrug efflux complex) was not found in three strains (XDR-BJ83, ORAB01 and DS002), which, in theory, makes the pump nonfunctional in those strains. Our study also identified an interesting protein present in all strains: the AmpC enzyme. It is responsible for resistance to beta-lactams, more specifically to cephalosporins, and is thought to act by hydrolysis of the drug [33, 34].
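The completeness check described above amounts to comparing each strain's gene set with the components required by each pump. The sketch below is a minimal illustration; the function name and data layout are assumptions, and the two example gene sets contain only the ade genes discussed in the text (AYE with all six, DS002 without adeH).

```python
# Components required for each efflux pump, as named in the text.
PUMPS = {
    "adeIJK": {"adeI", "adeJ", "adeK"},
    "adeFGH": {"adeF", "adeG", "adeH"},
}


def incomplete_pumps(strain_genes, pumps=PUMPS):
    """Return {strain: {pump: [missing genes]}} for pumps lacking components."""
    report = {}
    for strain, genes in strain_genes.items():
        for pump, components in pumps.items():
            missing = components - set(genes)
            if missing:
                report.setdefault(strain, {})[pump] = sorted(missing)
    return report


# Example gene sets restricted to the ade genes discussed above.
strain_genes = {
    "AYE": {"adeI", "adeJ", "adeK", "adeF", "adeG", "adeH"},
    "DS002": {"adeI", "adeJ", "adeK", "adeF", "adeG"},
}
print(incomplete_pumps(strain_genes))
```

A missing component in this report indicates only that the gene was not detected under the chosen similarity cutoffs; it does not by itself demonstrate loss of pump activity.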
Analysis of the accessory portion of the resistome revealed an interesting distribution profile for specific genes. The OXA-66 gene, for example, which codes for variant 99 of beta-lactamase with activity against penams and cephalosporins, is present in 99 strains, equivalent to approximately 48% of the dataset. Among these, 93 belong to ST 2, which makes this gene almost exclusive to ST 2 strains. Only six strains outside ST 2 carry the OXA-66 gene: BAL062 (ST unknown), SAA14 (ST 187), XH857 (ST 215), XH906 (ST 922), 7847 (ST unknown), and TP1 (ST 570).
A similar pattern was observed for the ADC-76 gene, which encodes a beta-lactamase that inactivates cephalosporins and was present exclusively in strains belonging to ST 23, 10, 85, 464, 575 and 639. The same is true for the OXA-68 gene, identified only in strains belonging to STs 23 and 10, although not in all of them, and for the OXA-180 gene, detected only in strains of ST 267. The gene encoding OXA-69 is almost exclusive to strains belonging to ST 1, 20, 81 and 195.
Other patterns of gene distribution can be seen in Additional file 3. Nonetheless, no significant distribution pattern related to the geographic location of the isolates is visible, except in some cases. The OXA-67 gene is exclusive to isolates from the Czech Republic (strains EC and EH), while the ADC-81 and OXA-92 genes are exclusive to the A388 strain.
Regarding the number of resistance mechanisms linked to each antibiotic class in the pan-resistome, cephalosporin has the highest number, with about 103 resistance proteins (Figure 9). In contrast, some antimicrobials (sulfonamide, sulfone, cephamycin and pleuromutilin) have few resistance mechanisms in the predicted resistome of A. baumannii.
Regarding the types of resistance mechanisms found, 131 act through enzymatic inactivation of the antibiotic (Figure 10), equivalent to 76.6% of the predicted pan-resistome. Also, almost all the core resistome-related proteins are efflux pumps (8 proteins).
Some resistance genes were identified within the resistance genomic islands, such as adeS, adeR, adeA, adeB and adeC (within resistance island 2). Moreover, on resistance island 7 (or AbaR1 island), the following antimicrobial resistance-related genes and products were detected: sul1, qacH, AAC(3)-Ia, APH(3')-Ia, catI, tet(A), dfrA10, ANT(3'')-IIa, OXA-10, cmlA5, arr-2, ANT(2'')-Ia, VEB-1, AAC(6')-Ian, tet(G), floR, dfrA1, APH(6)-Id, APH(3'')-Ib.