Unique Genome Differences
The interspecies comparison is discussed between the two sequenced genomes of B.abortus vaccine A19 and virulent strain 9-941. Comparative analysis revealed differences in strain-specific and may help explain its virulence attenuation mechanisms. Although the highly conserved genomic skeleton shared by two Brucella strains, several important gene differences have been identified in the context. and are described the details below.
Genome Structural Variation
Previously reported sequences larger than 100 bp unique between the two genomes were defined as Structural variation. The A19 were aligned to the 9-941 genomes to determine their presence or absence. Ten unique fragments have been found. Three of the special sequences were located at the intergenic region. In the remaining seven special regions located in the gene coding region, two deletions sequence may resulted in changes in their gene function with virulence.
In B.abortus A19, a deletion region has led to the loss of 63bp sequence, which interrupts the coding regions of an ABC transporter permease. It is play an important role in maintaining the osmotic pressure balance inside and outside the cell, cell differentiation, antigen presentation, bacterial immunity and the like [20]. The deletion occurred on chromosome 2 of A19 at the position of 371693 to 371756, which its corresponding gene was WP_002965788.1. Compared with 9-941, A19 has sequence variation results a 31 amino acid removed of the ORF length, which may results its gene function changed from ABC transporter protein to hypothetical protein. According to reports, ABC transporter protein was predicted to related with virulence in the Brucella genome.
Using the above gene data with MEGA7, a phylogenetic tree was constructed, showing the evolutionary relationship of this gene between the sequenced Brucella genomes. Clustering includes vaccine and virulent strain of B. abortus, B. suis and B. melitensis data that can be clustered into different groups (Figure 4). This is consistent with other analyses showing that B. abortus A19 was most closely related to B. abortus S19, and A19 was more closely associated with B. abortus 9-941, 2308, A13334 than B. suis and B. melitensis. From the results of cluster analysis, it can be concluded that the genetic relationship between the different species and the classification of Brucella species are similar. However, there are different close relationships between vaccine strains and virulent strains of the same type of Brucella.
To further investigate the relationship of this deletion sequence in different species of Brucella, we performed sequence analysis of 9 complete genomes of the deleted sequence to provide more detailed information on changes in the genome region. As shown in Figure 5, the region marked with red box represents the detailed sequence alignment of the deleted gene. As can be seen from the comparison results, the sequence was deleted in both the A19 and S19 genomes, and was present in the other B. abortus, B. suis and B. melitensis. However, these existing genes are identical in B. suis and B. melitensis, and there are some differences in B. abortus. This corresponds to the results of the close relationship we analyzed in Figure 4. We also compared this sequence with the Brucella genome-wide library on NCBI and found that there are many different species of Brucella that contain this sequence, so this deletion sequence on A19 is significant and it is related to virulence. It also provides a research hotspot for the study of subsequent attenuation mechanisms and the establishment of diagnostic methods.
In order to analysis the structural features of the above gene, a conservative motif was determined based on evolutionary relationship of the conserved motifs. Using the MEME database to search for its conserved sequences, a total of 20 conserved motifs were identified. These motifs were then further aligned in different types of Brucella. As shown in Figure 6, most of the identified B. suis, B. melitensis and B. abortus contain 20 motifs, except that these motifs were ordered differently in different types. The ordering in B. suis and B. melitensis was the same. Surprisingly, we found that motifs 17 and 18 were missing in A19 and S19. After the InterProScan search, we screened aligned sequences with an E-value of less than 0.05, and motifs 17 and 18 aligned the 23 and 14 nucleotide sequences, respectively, and labeled with Nuclear receptors with C4 zinc fingers and DNA-Binding of Transcription Factors. Taken together, these results indicate that most species have the same protein motif, further supporting their phylogenetic classification.
The zinc finger (ZnF) domain is a relatively small set of protein motifs that contain fingertips stabilized by zinc ions and function to bind DNA, RNA, proteins and lipid substrates. It is also widely involved in gene transcription and translation, mRNA transport and processing, and chromatin remodeling [21]. The Cys4 (C4) zinc finger motif (class II ZnF motif) consists of eight conserved cysteine residues bound to two zinc atoms, usually in a single form, and is involved in DNA binding, and in many Nuclear receptors are ubiquitous in transcription factors. The SZnF protein is mainly located in the cytoplasm and is also present in a small amount in the nucleus.
Proteins that read gene regulatory codes, transcription factors (TFs), have been extensively identified. Affinity for all possible DNA sequences is described based on the affinity and biochemical principles of the TF-DNA binding model. The overall analysis of the data indicates that the orientation and spacing preferences of homodimers and base stacking interactions play a large role in TF-DNA binding. In this model, transcription factors control gene expression by dynamically binding and generating partial possession of the same locus [22].
The high affinity binding protein-dependent ABC transporter composed of a high affinity periplasmic substrate binding subunit, two hydrophobic membrane subunits and two other cytoplasmic subunits. ATP hydrolysis by the related ATPase provides energy for the substrate to accumulate across the intima over a range of concentrations. The periplasmic substrate binding subunit consists of two separate but similarly folded globular domains joined by a hinge region consisting of two or three short polypeptide fragments.
Next, in order to evaluate the possible effect of the gene on protein function, we performed a structural analysis of the amino acid sequence of ABC transporter protein, and constructed a three-dimensional structural model of this gene using Blast and MOD (Fig.7). It can be seen from the figure that the red region represents the deletion fragment of A19, and the tertiary structure analysis of the protein indicates that the structure of the deletion fragment is irregular curl. The three-dimensional structure of the protein in the blue region is an alpha helix and a random coil. Our analysis indicated that the deleted region is located within the alpha helix and irregular curl of the gene. Irregular curl is a loose peptide chain structure with no regularity, often the active site of the enzyme. Moreover, these sites are often important regions for the functional and conformational functions of protein molecules. Therefore, the B. abortus A19 vaccine lacking this fragment may result in the loss of the active region of its active center, resulting in loss of function. We hypothesized that this sequence affects protein activity, although further studies are needed to determine its possible link to A19 decay.
The second variable deletion region in A19 occured of Hypothetical protein compared to 9-941. ChrI of A19 also contained 63bp changes, the position of the missing sequence starts at 435999 and ends at 436062, which corresponds to gene WP_002963568.1. This gene encodes a hypothetical protein in 9-941, resulting in the deletion of 63 nucleotide sequences in the noncoding region of A19. We continued to observe the upstream and downstream genes of this gene and found that the upstream and downstream genes of the two genomes are identical, encoding ribbon-helix-helix protein%2C CopG family and hypothetical proteins, respectively. After a structural similarity search in the protein database, a major hit was found in the hypothetical protein BR0409 of Brucella suis 1330.
By comparison with other species of Brucella, the sequence was found in B. suis (1330, VBI22), B. abortus (9-941) and B. melitensis (16M, M28), and were missing in B. abortus vaccine A19, S19 and virulent strain 2308. Drawing the phylogenetic tree shows that the close relationship between A19 and 2308 is closer than that of 9-941, so we speculate that the virulence strains 2308 and 9-941 also have differences in virulence.
Genome Indels
Indels refers to the insertion or deletion of one or more nucleotides in one genome relative to another, which can be used as a sequence feature to characterize the evolution of different organisms [23]. Indels can have a dramatic effect on the gene, resulting in frameshift, extension or truncation of the encoded protein. Pairwise analysis identified that there were 47 indels in A19 genome compared to 9-941, 8 of which are located in the intergenic region. A total of 29 indels are located in 22 different ORFs of the coding region. We assume it results in changes in gene function. Of the 22 different genes, 3 are related to virulence in 9-941 and may missing their functions in A19. The specific Indels are shown in Table 2. These data suggest that mutations in these genes can play a central role in virulence attenuation mechanisms and future vaccine research.
A notable Indel was located in the Ceg34 gene, and the function of its annotation is gamma-glutamyl-gamma-aminobutyrate hydrolase. Deletion of 16-nucleotide was found at the position 247339 of chromosome 2 in Ceg34 gene. By aligning the results, the region between the two genes WP_002965668.1 and WP_002965669.1 in 9-941 was the intergenic region, which was located from 975 to 1077. In A19, the location of the two intergenic regions was from 975 to 1061. Due to the 16 bp deletion in A19, a hypothetical protein appeared in the interdigit region of 981-1076, may resulting in a change in gene function. To further explore the reason, we analyzed the upstream and downstream of WP_002965669.1 gene. As shown in Figure 9, the blue arrow to the right represents the forward annotation result, while the yellow arrow to the left represents the reverse annotation result. It can be seen that the upstream and downstream of this gene did not change in A19 and 9-941, but only one hypothetical protein was added. Therefore, we speculate that the promoter only encodes and translates the hypothetical protein, which in turn inhibits the function of the WP_002965669.1 gene, rendering it incapable of expression.
Through the alignment of nucleotide and protein sequences, we conducted an in-depth study of the deleted sequence of A19. Surprisingly, it was found that the deleted 16-nucleotide sequence (16bp) is a tandem repeat in 9-941. And this sequence is located at the end of the gene BRUAB_RS11615. Two copies of this repeat appeared in the full length ORF of strain 9-941. A variant ORF or protein is produced by deleting a repeat of strain A19. The absence of B.abortus vaccine and virulent strains is different. We compared the S19 vaccine strain and found that S19 also contains only one copy. The possibility of this deletion may be related to the host preference of a particular species.
γ-glutamyl-γ-aminobutyrate hydrolase (PuuD) identified its active center as Cys-114 by site-directed mutagenesis. Expression of PuuD is induced by putrescine and O2 [24]. Strains lacking puuD accumulateγ-glutamyl -aminobutyric acid (g-Glu-GABA) and could not grow on the putrescine as the sole nitrogen source (Kurihara S and Oda S, 2005) [25]. This finding indicates that PuuD has important physiological significance of g-Glu-GABA hydrolase.
The protein sequence of the BRUAB_RS11615 gene was compared to the VFDB database and aligned with the ceg34 gene belonging to the Dot / Icm Type IV secretion system. Intracellular replication of Chlamydia pneumoniae requires the Dot / Icm Type IV secretion system, which consists of approximately 27 proteins that may cross the bacterial membrane and the phagocytic membrane [26]. The ability of bacteria to survive in phagocytic hosts depends on the Dot / Icm Type IV secretion system (T4SS), which transports multiple effector proteins into host cells. The perturbation of host killing capacity is largely mediated by the collective function of the protein substrate injected into the host cell by the Dot/Icm transporter [27]. This gene may be a virulence factor for Brucella A19, and we speculate that this may be the reason for the virulence of the A19 vaccine.
Another 1bp insertion was occured in A19 resulting in loss of encoded protein of the amino acid ABC transporter permease compared to 9-941. The BRUAB_RS09305 gene was located on the ChrI, and insertion of the C base at position 1901322. We further compared the upstream and downstream genes of this gene. As can be seen from Fig. 10, the red position in the figure represents the position of base insertion, and the upstream and downstream of the BRUAB_RS09305 genes are identical in 9-941 and A19. The WP_002965020.1 gene in 9-941 became a hypothetical protein in A19 due to the insertion of the C base. The membrane-associated complex of Salmonella typhimurium periplasmic histidine permease is a member of the ABC transporter or transport ATPase superfamily, a copy of two integral membrane proteins HisQ and HisM and two ATP-binding subunits HisP composition [28]. HisP is absolutely essential for ATP hydrolysis and HisQ could not. HisP is dependent on HisQ to deliver an induction signal from the soluble receptor HisJ, and HisQ regulates the ATPase activity of HisP, while HisP changes conformation after exposure to phospholipids. This ensures the normal operation of the virulence factor ABC transport system.
In chromosome 2 of A19 genome, a 1 bp difference in nucleotide sequence. Insertion of an adenosine residue at position 215674 of the BRUAB_RS00955 gene results in a frameshift. It may caused the changes in gene function of the alcohol dehydrogenase. Alcohol dehydrogenase D (AdhD) is a monomeric thermostable alcohol dehydrogenase derived from the protein aldose-keto reductase (AKR) superfamily [29]. AdhD has a fairly broad base specificity. And it is a metal ion-independent enzyme that has been isolated from thermophilic and archaea and successfully expressed in E. coli. It catalyzes the oxidation–reduction reaction of NAD / NADH linked short chain alcohols and aldehydes (or ketones). This enzyme is involved in the synthesis of ethanol. It has been reported that constitutive expression of the AADH2 gene can be used by Agrobacterium to produce 1-butanol, but overexpression of this gene prevents cell growth. The resulting Adh acts synergistically with other enzymes involved in 1-butanol metabolism, and Adh is an important and irreversible enzyme for the synthesis of 1-butanol [30]. This deficiency of ADH in the A19 strain does not prevent the growth of body cells. Therefore, we suspect that this mutation may be the cause of A19 virulence attenuation.
Previously, this study also compared the genomic differences between A19 and S19. It was found that in chromosome I of A19, a 1 bp insertion results in aldehyde dehydrogenase to become dehydrogenase. They are all a kind of dehydrogenase. But the function of this gene also exists in 9-941. Therefore, we speculate that there are dehydrogenases in other positions of the chromosomes of A19 and S19, which may be a factor that they are still virulent.
Genome SNPs
SNPs are powerful tools for describing the phylogenetic framework of species [31]. SNPs data will help develop novel high resolution molecular typing techniques for intra- and interspecies identification of pathogenic microorganisms. We compared the analysis of B. abortus vaccine A19 and virulent strain 9-941 to show that SNPs are widespread in these genes.
The distribution of SNPs between the two chromosomes was not equally, one hundred and forty seven SNPs (68%) and the remaining sixty-eight SNPs (32%) were located on chromosome I and chromosome II respectively. Among these total 215 SNPs, Fifty-six SNPs were synonymously substituted and encode the same amino acid (aa). The 117 SNPs were missense substitutions, conserved, encoding different aa with similar properties; Two SNPs cause frame changes or protein coding regions to stop prematurely.
A total of 117 SNPs were present in 112 different genes, and these genes have changed their functions due to SNP. Among the 112 well defined genes, more than half of the genes encode various enzymes involved in the basic life cycle, such as RNA methyltransferase, nitrite/sulfite reductase, 6S rRNA-dimethyltransferase RsmA, 23S rRNA-methyltransferase RlmJ, arginyltransferase and CTP synthase. In addition, there were always 9 genes whose function may be related to virulence in 9-941, but in A19 the function of this virulence gene is missing. The correlation with virulence is described in the following section (Table 2).
Gene BRUAB_RS00275 and BRUAB_RS15760 encode LysR and IclR family transcriptional regulator respectively. The two regulatory genes in 9-941 were different from the A19 genes. The LysR regulatory gene located on chromosome 1, and another different IclR gene located on chromosome 2. Mutation of the LysR gene results in the replacement of arginine by leucine, and the IclR gene is a substitution of proline for serine. Some study describes LysR family transcriptional regulator in B. abortus, which activates abcR2 but does not activate abcR1, and more importantly, is critical for the ability of bacteria to successfully infect macrophages and mice [32]. The IclR gene affects stress survival and host infection in the zoonotic pathogen Brucella abortus. In one study, LysR and GntR mutants show reduced virulence [33]. Members of the IclR, Crp, LysR and TetR family of transcriptional regulators are involved in virulence and symbiosis. Therefore, the lack of these two LysR and IclR regulatory genes may be the cause of A19 attenuation.
Gene BRUAB_RS14580 encodes MFS transporter. The mutation site of this gene is located at position 875262 of chromosome 2, and its mutation causes arginine to replace histidine. It is worth noting that the major facilitator superfamily (MFS) transporter is located in the plasma membrane of the cell and is critical for the pathogenicity of plants and many other organisms (Michael Hohl and Sille Remm, 2019) [34]. Their active efflux is resistant to a variety of secondary metabolites and antibiotics. It can be a virulence factor by protecting pathogens from virulent compounds produced by host phagocytic cells.
Gene BRUAB_RS04595 encodes outer membrane subunit BepC. Lipopolysaccharide (LPS), outer membrane proteins (OMP) and adhesins are all virulence factors of Brucella ovis and may be of greater importance in the host and in interaction with pathogens. Several OMPs associated with the virulence of Brucella include Omp10, Omp16, Omp19, SP41 and BepC, which have previously been associated with the virulence of Brucella. And BepC is a TolC homologous protein that was found to be essential for the complete virulence of S. suis 1330 in a mouse model. Heparin Bartonella (Bhe) can invade human endothelial cells (EC) by invading body-mediated internalization as a large bacterial aggregate. The process depends on the functional type IV secretion system (T4SS) and the thereby translocated Bep effector proteins. Bep effector protein is one of the virulence factors of bacteria invading the body [35].
There are other three genes (BRUAB_RS07715, BRUAB_RS14740 and BRUAB_RS14975) that encode ABC transporter protein, which their corresponding virulence genes are hitB, ddrA and oppA respectively. Bacteria can use peptides as a source of amino acids, nitrogen, carbon and energy. A variety of peptide uptake systems mediate the transport of peptides across the plasma membrane of bacteria. Recent studies have shown that OppA can preferentially select substrate peptides. E. coliOppA prefers a positively charged peptide with three or four amino acids in length. However, other studies have also shown that the substrate preferences of peptide uptake systems are more complex [36].
In addition, gene BRUAB_RS06870 and BRUAB_RS07680 were also related to virulence, the functions of the two genes are cell division protein FtsA and Gfo/Idh/MocA family oxidoreductase. In bacterial models, including Streptococcus pneumoniae, Escherichia coli and Bacillus subtilis,FtsZ and FtsA proteins assemble into loops at midcell and are specifically used for spacer peptidoglycan (PG) synthesis. Therefore, inactivation of FtsZ or FtsA results in the inability of filamentous cells to divide. Moreover, complete depletion of FtsA results in dislocation of the FtsZ loop and ultimately leads to cell expansion and lysis [37]. The Gfo / Idh / MocA protein family contains many different proteins, almost all of which are composed of NAD(P)-dependent oxidoreductases (or dehydrogenases), which have multiple substrates. The earliest mentioned Gfo / Idh / The history of the MocA family dates back to 2001, and the publication describes whole-genome sequencing of a virulent isolate of Streptococcus pneumonia [38].
In the comparison of SNPs, many virulence genes have their functions in 9-941, but are missing in A19. When comparing A19 and S19, we focused on the reason why A19 is more virulent than S19. Although there is no clear report on which of the two is more virulent, it is well known that the immune effect of S19 is better than that of A19. Some virulent genes have functions such as LysR family transcriptional regulator and ABC transporter permease. When these genes were exist in A19, the comparison with 9-941 is lost, but the comparison with S19 does exist. Therefore, it is first explained that these genes are related to virulence, and then it is also the reason why the A19 vaccine still has virulence. Finally, the number of functional genes in virulent strains and vaccines is also an important indicator for judging its virulence.
In summary, the genomes of A19 and 9-941 were very similar in sequence, organization and structure. Few fragments in the genome are unique. Even some genes are closely related to virulence. Our future research will focus on in vitro and in vivo infection studies using the above virulence genes to analyze their virulence characteristics with the aim of developing a new live attenuated vaccine for brucellosis in livestock.
PCR Verification
In this study, the WP_002965788.1 gene screened was A19-specific deletion by genomic alignment, but exists in 9-941. Further BLAST alignment of this gene sequence in NCBI found that it was also deleted in S19, but it was present in most other commonly used Brucella vaccine strains and virulent strains. This experiment designed primers for this gene to verify (Table 3). Using PCR differential diagnosis, the sequence was amplified in the genomes of A19, S19, M28, S2 and Rev.1, respectively. The results showed that the WP_002965788.1 gene was not amplified in the A19 and S19 genomes, and could be amplified in S2, M28 and Rev.1 (Figure 11). Indirectly indicates that the gene sequence can distinguish A19, S19 and most other commonly used Brucella viruent strains and vaccine strains, and the results are in line with expectations.
Vaccine immunization is the main measure for the prevention and treatment of brucellosis, which has been widely used in many herds in China. Different types of vaccines have differences in virulence, protection, immune protection period, applicable animal species, etc. The successful construction of PCR differential diagnosis method can effectively distinguish artificial immune(A19, S19) and natural infections of Brucella, so as to better purify the herd and prevent and control disease.