Genome assembly and component characteristics of Halobacillus trueperi S61
Second-generation sequencing combined with third-generation sequencing was used for deep sequencing of Halobacillus trueperi S61, and a detailed map of its circular chromosome was obtained. After quality control, 57549 PacBio reads were obtained, and the genome consists of a complete, gapless circular chromosome of 4047887 bp with 43.86% GC content (Figure 1). The GC-depth analysis showed that sequencing depth followed a Poisson distribution without obvious GC bias, although a scattered region appeared between 20% and 30% GC content, which may reflect contaminating or extrachromosomal DNA (Figure 2). For comparison, Gupta et al [24] reported that the Halobacillus trueperi SS1 genome is 4.14 Mbp with 42.15% GC content and contains 4329 sequences and 35 RNA genes.
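GC content is the percentage of G and C bases in a sequence, and a GC-depth plot pairs the GC content of fixed-size windows with their read depth. As a minimal sketch, assuming the assembly is available as a plain string, the GC side of that analysis could look like this. The helper names `gc_content` and `windowed_gc` are hypothetical, and the window size is an arbitrary illustrative choice, not the authors' setting.

```python
def gc_content(seq: str) -> float:
    """Return the GC content of a nucleotide sequence as a percentage.

    Assumption: ambiguous bases such as N are ignored, so the
    denominator counts only A, C, G and T.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / (gc + at)


def windowed_gc(seq: str, window: int) -> list[tuple[int, float]]:
    """GC content of consecutive, non-overlapping windows.

    Returns (window_start, gc_percent) pairs. Windows made up only of
    ambiguous bases are skipped; a trailing partial window is ignored.
    """
    values = []
    for start in range(0, len(seq) - window + 1, window):
        try:
            values.append((start, gc_content(seq[start:start + window])))
        except ValueError:
            continue  # window contains only ambiguous bases
    return values


if __name__ == "__main__":
    # Toy sequences for illustration, not the S61 chromosome.
    print(gc_content("ATGCGCNNAT"))
    print(windowed_gc("GGGGAAAACCGT", 4))
```

Run on the full gapless chromosome, `gc_content` would be expected to give the reported 43.86%; the depth half of a GC-depth plot comes from read alignments, which this sketch does not cover.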
A total of 3982 genes were predicted in the Halobacillus trueperi S61 genome, with a total length of 3567510 bp and 44.57% GC content. Non-coding RNAs (ncRNAs) do not encode proteins but perform various biological functions at the RNA level [25]. In this study, 139 ncRNAs were identified, including 86 tRNAs, 30 rRNAs and 23 sRNAs (Table 1). tRNAs were the most numerous, and their total length accounts for 0.16% of the genome, suggesting an important role for tRNAs in gene expression and regulation in Halobacillus trueperi S61 cells. Repeat sequences are components of gene regulatory networks and affect evolution, heredity and mutation [26]. This study predicted 58 interspersed repeats of five types, totalling 3909 bp. LINEs and SINEs were the most abundant, with 28 elements (1856 bp) and 18 elements (1149 bp), respectively, while DNA and LTR elements accounted for smaller proportions (497 bp and 218 bp). In addition, three types of transposons, including helitronORF and LINE, were predicted.
Table 1
Statistics of non-coding RNA prediction for Halobacillus trueperi S61.
Type | Number | Average length (bp) | Total length (bp) | In genome (%)
tRNA | 86 | 77 | 6648 | 0.16
16S_rRNA | 10 | 1538 | 15380 | 0.38
5S_rRNA | 10 | 115 | 1150 | 0.03
23S_rRNA | 10 | 2926 | 29260 | 0.72
sRNA | 23 | 123 | 2847 | 0.07
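The "In genome (%)" column of Table 1 appears to be each class's total length divided by the 4047887 bp chromosome length. The table does not state the denominator, so treat that as an assumption; the short sketch below (with a hypothetical helper `percent_of_genome`) reproduces the column under it.

```python
GENOME_LENGTH_BP = 4_047_887  # chromosome length reported in the text

# Total lengths (bp) taken from Table 1
ncrna_total_bp = {
    "tRNA": 6648,
    "16S_rRNA": 15380,
    "5S_rRNA": 1150,
    "23S_rRNA": 29260,
    "sRNA": 2847,
}


def percent_of_genome(total_bp: int, genome_bp: int = GENOME_LENGTH_BP) -> float:
    """Share of the genome covered by a feature class, in percent (2 d.p.)."""
    return round(100.0 * total_bp / genome_bp, 2)


if __name__ == "__main__":
    for rna_type, total in ncrna_total_bp.items():
        print(f"{rna_type}\t{percent_of_genome(total):.2f}")
```

All five values match the table, which supports reading the column as a fraction of the whole chromosome rather than of the coding sequence.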
Clustered regularly interspaced short palindromic repeats (CRISPR) serve as a natural adaptive immune system in many bacteria and archaea, conferring resistance to foreign plasmids and phage sequences [27]. Two CRISPRs were predicted in the Halobacillus trueperi S61 genome, CRISPR 1 (AGAAAACAAAACCAACAATCAGCTG) and CRISPR 2 (TGATGGGAATCGAACCCACGACAT), suggesting that strain Halobacillus trueperi S61 can acquire immunity through the CRISPR pathway. Genomic islands (GIs) are regarded as mobile genetic elements associated with various biological functions, especially horizontal gene transfer [28]. A total of 16 genomic islands, spanning 260275 bp, were predicted in the whole genome of Halobacillus trueperi S61. These GI regions may contain antibiotic resistance genes and bacteriostatic gene fragments, and may support microbial adaptation to distinct abiotic stresses and antimicrobial-resistant environments. In addition, prophages are phage genomes that integrate into the host genome after infection and act as carriers of genetic information. Previous studies found that bacteriophages can lyse certain pathogenic microbes, which may be beneficial for treating disease, although they may also lyse beneficial or other harmful microbes. Phages are also widely used as vehicles for horizontal gene transfer in beneficial microbes [29]. The present study identified two prophages with a total length of 82682 bp in Halobacillus trueperi S61, containing 44 and 61 coding sequences (CDSs) with GC contents of 44.85% and 38.43%, respectively. We therefore speculate that Halobacillus trueperi S61 may be able to lyse pathogens, which requires further investigation.
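A CRISPR array consists of near-identical direct repeats separated by unique spacers. Dedicated predictors such as CRT or MinCED find arrays de novo and tolerate mismatches; the sketch below is only a simplified illustration of how spacers could be pulled out once a repeat sequence is known, assuming an exact repeat on the forward strand. The CRISPR 1 sequence is used purely as a toy repeat in a synthetic array, and the function names are hypothetical.

```python
def find_repeat_positions(seq: str, repeat: str) -> list[int]:
    """0-based start of every exact (possibly overlapping) copy on the forward strand."""
    positions = []
    start = seq.find(repeat)
    while start != -1:
        positions.append(start)
        start = seq.find(repeat, start + 1)
    return positions


def extract_spacers(seq: str, repeat: str) -> list[str]:
    """Sequences lying between consecutive copies of the repeat."""
    positions = find_repeat_positions(seq, repeat)
    spacers = []
    for left, right in zip(positions, positions[1:]):
        spacer_start = left + len(repeat)
        if right > spacer_start:  # skip copies that touch or overlap
            spacers.append(seq[spacer_start:right])
    return spacers


if __name__ == "__main__":
    repeat = "AGAAAACAAAACCAACAATCAGCTG"  # CRISPR 1 sequence, used as a toy repeat
    toy = repeat + "ACGTACGTAC" + repeat + "TTTTGGGGCC" + repeat  # synthetic array
    print(find_repeat_positions(toy, repeat))
    print(extract_spacers(toy, repeat))
```

Spacers recovered this way are what would be compared against phage and plasmid sequences to infer which invaders an array targets; handling the reverse strand and imperfect repeats is left out for brevity.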
Basic functional annotation of the Halobacillus trueperi S61 genome
Basic annotation of the whole genome sequence of Halobacillus trueperi S61 covered 3982 protein-coding genes. To improve functional prediction, 3980, 3667, 2998 and 2303 genes were annotated with the Nr, Swiss-Prot, KOG and KEGG databases, respectively.
Specifically, 3668 genes in the Halobacillus trueperi S61 genome were annotated with the COG database (Figure 3). The main functional categories were amino acid transport and metabolism (E; 365 genes, 9.95%), transcription (K; 313 genes, 8.53%) and carbohydrate transport and metabolism (G; 294 genes, 8.02%). General function prediction only accounted for 13.71% of genes, and 9.13% had unknown functions that require further evaluation. In addition, 107 genes were involved in secondary metabolite biosynthesis, transport and catabolism, and 222 in inorganic ion transport and metabolism (P), while the other categories accounted for smaller proportions. A total of 7829 GO annotations were assigned (Figure 4). Biological process terms accounted for 52%, dominated by metabolic process (1016), cellular process (953) and single-organism process (752), followed by localization (318) and biological regulation (298). Molecular function terms accounted for 23%, mainly catalytic activity (888) and binding (657), followed by transporter activity (114) and nucleic acid binding transcription factor activity (103). Cellular component terms accounted for 25%, mostly membrane (537), membrane part (484) and cell (376). These results indicate that the gene products of strain Halobacillus trueperi S61 are mainly involved in biological processes.
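The COG percentages are only meaningful relative to the right total. As a small consistency check (a sketch; `matching_denominators` is a hypothetical helper), one can test which of the two gene totals mentioned in this section reproduces the reported percentages within rounding error.

```python
# Reported COG counts and percentages: category code -> (genes, percent)
reported = {"E": (365, 9.95), "K": (313, 8.53), "G": (294, 8.02)}

# Candidate denominators mentioned in the text
candidates = {"COG-annotated genes": 3668, "all protein-coding genes": 3982}


def matching_denominators(reported, candidates, tol=0.005):
    """Return candidate totals that reproduce every reported percentage.

    tol=0.005 is half of the last reported decimal place, i.e. the
    largest error that rounding to two decimals can introduce.
    """
    matches = []
    for name, total in candidates.items():
        if all(abs(100.0 * n / total - pct) <= tol for n, pct in reported.values()):
            matches.append(name)
    return matches


if __name__ == "__main__":
    print(matching_denominators(reported, candidates))
```

Only the 3668 COG-annotated genes reproduce all three values; dividing by all 3982 protein-coding genes would give about 9.17% for category E instead.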
KEGG pathway annotation can identify functional genes that are up- or down-regulated in the pathways of target metabolites [30–31]. In this study, 3672 genes of Halobacillus trueperi S61 were annotated with KEGG (Figure 5a) and assigned to five categories: metabolism (80.94%), environmental information processing (8.66%), genetic information processing (5.80%), cellular processes (4.36%) and organismal systems (0.25%). Within metabolism, 602 genes were assigned to metabolic pathways, 274 to biosynthesis of secondary metabolites, 215 to biosynthesis of antibiotics, 183 to microbial metabolism in diverse environments, 117 to biosynthesis of amino acids and 96 to carbon metabolism. Furthermore, 76 genes related to replication and repair mapped to six KEGG pathways, suggesting that Halobacillus trueperi S61 may be involved in homologous recombination, mismatch repair, DNA replication, base excision repair, nucleotide excision repair and non-homologous end-joining [32]. Additionally, 3980 genes were annotated against the Nr database (Figure 5b), of which 3515 (88.32%) matched Bacillus subtilis, 134 matched Bacillus sp. EGD-AK10 and 130 matched Streptococcus pneumoniae, followed by Bacillus sp. YP1 (66), Bacillus sp. CMAA 1185 (22), Bacillus sp. LM 4-2 (17), and Bacillus and Bacillus sp. JS (13). Overall, the GO, COG, KEGG and Nr annotations indicate that the protein-coding genes of Halobacillus trueperi S61 function mainly in biological processes, particularly transcription and amino acid and carbohydrate metabolism.
Advanced functional annotation of the Halobacillus trueperi S61 genome
Carbohydrate-active enzymes (CAZymes) are essential when pathogens breach the host cell wall, the primary barrier, during attack, and they contribute to the biosynthesis and degradation of carbohydrates and glycoconjugates [30]. Halobacillus trueperi S61 contains a total of 561 CAZymes (Figure 6). Glycoside hydrolases (GHs) were the most abundant, accounting for 35.29% of all CAZymes, followed by glycosyltransferases (GTs), carbohydrate-binding modules (CBMs), carbohydrate esterases (CEs), auxiliary activities (AAs) and polysaccharide lyases (PLs), which accounted for 31.37%, 19.61%, 12.66%, 0.89% and 0.18%, respectively. GTs and GHs play particularly important roles in metabolism: GTs are associated with nucleotide and amino sugar metabolism, and GHs with the degradation of glycogen, maltose and N-acetylglucosamine. These activities may favour nutrient acquisition and structural maintenance, supporting the survival of strain Halobacillus trueperi S61 in saline marine environments [32]. Woo et al [33] annotated the Halobacillus mangrovi KTB 131 genome and reported that most genes were assigned to secondary metabolite biosynthesis, transport and catabolism. Secreted proteins include enzymes, antibodies and some hormones; among the 3982 predicted proteins, 273 carry signal peptides, 138 contain transmembrane domains and 135 are predicted to be secreted. Effector proteins are central to bacterial secretion systems: pathogens secrete them into the extracellular space or into host cells through type I-VII secretion systems (TNSS), where they affect important cellular activities such as immune responses and cell death and cause pathological reactions. Four effector genes were identified, yfjA (Hal61 00834), yueC (Hal61 03072), yueB (Hal61 03073) and yukC (Hal61 03075), all of which matched Bacillus subtilis.
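The CAZyme paragraph reports only percentages. Assuming each percentage is taken over the 561 CAZyme annotations and each annotation falls into exactly one class (an assumption, since CAZyme tools can assign one protein to several modules), approximate class counts can be back-calculated as a consistency check. The resulting integers are estimates derived from the reported percentages, not values reported by the authors, and the helper names are hypothetical.

```python
TOTAL_CAZYMES = 561  # total reported in the text

# Reported class shares (%), from the paragraph above
reported_pct = {"GH": 35.29, "GT": 31.37, "CBM": 19.61,
                "CE": 12.66, "AA": 0.89, "PL": 0.18}


def implied_counts(pcts, total):
    """Back-calculate integer counts from rounded percentages (an estimate)."""
    return {cls: round(pct * total / 100) for cls, pct in pcts.items()}


def reproduces(counts, pcts, total):
    """True if the counts round back to exactly the reported percentages."""
    return all(round(100 * counts[cls] / total, 2) == pct for cls, pct in pcts.items())


if __name__ == "__main__":
    counts = implied_counts(reported_pct, TOTAL_CAZYMES)
    print(counts, sum(counts.values()), reproduces(counts, reported_pct, TOTAL_CAZYMES))
```

Under these assumptions the rounded counts sum to exactly 561 and round back to every reported percentage, so the six shares are internally consistent.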
The Pathogen-Host Interactions (PHI) database contains pathogenicity genes associated with different types of hosts and is useful for identifying target genes for drug intervention [29]. Gene annotation assigned a total of 4416 PHI-related genes to strain Halobacillus trueperi S61. The dominant pathogen species was Burkholderia glumae, which causes bacterial grain rot, with matches to DNA gyrase (bacterial topoisomerase II), followed by Flavobacterium psychrophilum (DNA gyrase), Cryptococcus neoformans (GTP biosynthesis), Bacillus anthracis (tellurite resistance) and Pectobacterium wasabiae (posttranscriptional regulator), which cause bacterial cold-water disease, meningoencephalitis, anthrax and soft rot, respectively. Of these, 742 pathogenicity-related genes matched Magnaporthe oryzae (related to Magnaporthe grisea), and 367, 262, 216 and 208 matched Fusarium graminearum (related to Gibberella zeae), Aspergillus fumigatus, Alternaria alternata and Candida albicans, respectively. Moreover, annotation against the virulence factors of pathogenic bacteria database (VFDB) identified 15 virulence factors from Listeria monocytogenes, Legionella pneumophila Philadelphia, Chlamydia trachomatis, Salmonella enterica, Escherichia coli, Bacillus anthracis and Mycobacterium tuberculosis. In addition, secondary metabolite gene cluster prediction identified ten gene clusters, with types including nrps, terpene, nrps-transatpks-otherks, t3pks, lantipeptide and sactipeptide-head_to_tail.
CARD (the Comprehensive Antibiotic Resistance Database) was used to associate antibiotics with their targets, resistance mechanisms and gene mutations [28]. Eleven efflux pump complexes or subunits conferring antibiotic resistance were predicted, including lmrB, ykkD, TaeA, sav1866, ykkC, lmrD, TriC, bmr and blt. Antibiotic inactivation enzymes (aadK, VgbC, rphB, BLA1 and mphI) and an antibiotic target protection protein (mfd) were also identified. Other resistance determinants included Enterococcus faecium cls conferring resistance to daptomycin, fabI, mecA, Bacillus subtilis mprF, Escherichia coli EF-Tu mutants conferring resistance to kirromycin, Staphylococcus aureus rpoB mutants conferring resistance to rifampicin, Mycobacterium tuberculosis intrinsic murA conferring resistance to fosfomycin, and the nucleoside antibiotic resistance determinant tmrB. Treves et al [34] analysed the draft genome of Halobacillus sp. BBL2006 and identified 4331 open reading frames, including heavy metal and antibiotic resistance genes. Although the coding genes were annotated against different databases, the results were consistent: gene functions were concentrated in biological processes and antibiotic resistance, which makes the strain a potential resource for biotechnology.