Genomic features of strain NCD-2
A total of 501,671,500 paired-end reads and 5,016,715 clean single reads (412-bp library; 75-bp paired ends) were assembled using the software package Velvet [29]. The draft genome of B. subtilis NCD-2 consists of 189 contigs (>133 bp; N90, 16,187 bp) totaling 4,644,322 bp, with an average G+C content of 43.5%. The final assembled genome comprises 4,444 genes, including 4,329 protein-coding genes (418 of which encode signal peptides), 83 tRNA genes for all 20 amino acids, 30 rRNA genes, and 2 clustered regularly interspaced short palindromic repeats (CRISPR) genes. A total of nine putative gene clusters responsible for antimicrobial metabolite biosynthesis were identified. These gene clusters included polyketide synthase (PKS) and nonribosomal peptide synthetase (NRPS) genes (Fig. 1).
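The assembly statistics quoted above (N90 and G+C content) can be recomputed from any set of contigs. The following is a minimal Python sketch, not the pipeline used in this study; the function names n_stat and gc_percent and the toy contigs are hypothetical and serve only to illustrate the definitions.

```python
def n_stat(lengths, fraction):
    """Return the contig length L such that contigs of length >= L
    together cover at least `fraction` of the total assembly length
    (fraction=0.9 gives N90, fraction=0.5 gives N50)."""
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    raise ValueError("no contigs supplied")


def gc_percent(sequences):
    """Percentage of G and C among the unambiguous (A/C/G/T) bases."""
    gc = at = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    return 100.0 * gc / (gc + at)


# Toy contigs for illustration only (NOT data from strain NCD-2).
toy_contigs = ["ATGC" * 25, "GGCC" * 10, "ATAT" * 5, "GC" * 5]
toy_lengths = [len(c) for c in toy_contigs]  # 100, 40, 20, 10

print("N90:", n_stat(toy_lengths, 0.9))
print("GC%:", round(gc_percent(toy_contigs), 1))
```

Applied to the real contig FASTA file, the same two functions should reproduce the reported N90 and G+C values, provided the same minimum contig length cutoff is used.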
The taxonomic status of strain NCD-2
At present, 272 B. subtilis genome sequences have been deposited in the GenBank database, including 113 complete and 159 incomplete genome sequences. The genome sizes of the 272 B. subtilis strains range from 2.68 Mb to 5.35 Mb, and the GC contents range from 42.9% to 46.6%. These genome sequences were downloaded from the GenBank database, and their accession numbers are listed in Additional file 1, Table S1. To analyze the evolution of different B. subtilis strains, a phylogenetic tree was constructed based on the complete genome sequences. The 272 strains of B. subtilis were divided into four subspecies, subtilis, inaquosorum, spizizenii, and stercoris, on the basis of the different bioactive secondary metabolites they produce [30]. As shown in Fig. 2, strain NCD-2 (represented by the black bar) clustered with B. subtilis strain UD1022 and was closely related to B. subtilis strains XF-1, BAB-1, HJ5, SX01705, and BSD-2.
Secondary metabolite biosynthetic gene clusters in strain NCD-2
The secondary metabolite biosynthetic gene clusters in the genome of strain NCD-2 were predicted using antiSMASH [31]. In total, nine such clusters were identified (Table 1), including three NRPSs, two terpenes, one hybrid NRPS-TransAT PKS-Other KS, one type III polyketide, one sactipeptide-head to tail gene cluster, and a gene cluster with unknown function. The structural compositions of the gene clusters are shown in Fig. 3. These clusters were composed of core biosynthetic, additional biosynthetic, transport-related, regulatory, and other genes. Among them, clusters 3, 7, 8, and 9 showed 100% amino acid sequence similarity with known gene clusters that synthesize bacillaene, bacillibactin, subtilosin, and bacilysin, respectively (Table 1). Gene cluster 1 showed 82% amino acid similarity with a surfactin synthetase gene cluster, and gene cluster 4 showed 93% amino acid similarity with a fengycin biosynthetic gene cluster in B. velezensis strain FZB42. However, gene clusters 2, 5, and 6 did not match any known gene clusters. Clusters 1 and 4 of strain NCD-2 were further compared with their counterparts in the model strain 168 and in the B. subtilis strains closely related to strain NCD-2 in the phylogenetic tree. The predicted fengycin biosynthetic gene cluster in strain NCD-2 contained three genes, fenEAB, whereas all the other strains contained five genes, fenCDEAB (Additional file 1, Fig. S1). In the 11 other strains, SrfAB of the surfactin synthetase was produced by typical transcription and translation of a single gene, srfAB. In strain NCD-2, however, the corresponding protein was encoded by two genes, gms0366 and gms0367, which were transcribed and translated separately, and their products Gms0366 and Gms0367 potentially assembled into the equivalent of SrfAB (Additional file 1, Fig. S2). Therefore, we hypothesized that the structures and functions of fengycin and surfactin from strain NCD-2 might differ from those in other B. subtilis strains.
Specificity of the surfactin and fengycin synthetase gene clusters in B. subtilis NCD-2
The surfactin biosynthetic gene cluster gms0365-0368 in strain NCD-2 was analyzed using PRISM, and the core genes were selected for PKS/NRPS analysis. Gms0365 had the same conserved domain organization, CATCATCATe, as SrfAA in strain FZB42, where C, A, T, and Te represent the condensation, adenylation, thiolation, and thioesterase domains, respectively (Fig. 4a). Compared with SrfAB in strain FZB42, Gms0366 in strain NCD-2 lacked the T and E domains, but the amino acid residues of the binding pockets in Gms0366 were exactly the same as those of SrfAB. The binding-pocket residues of the adenylation domains A6 and A2, from the enzymes Gms0365 and Gms0366, respectively, were exactly the same, and both bound the amino acid leucine. Gms0367 had only T and E domains and no specific substrate-binding domain. The superposition of the Gms0367 and Gms0366 domains formed a complete SrfAB. The T domain was reversed between Gms0367 and Gms0368. Gms0368 contained CATe domains, in which the thioesterase domain releases linear peptide chains. The domains of Gms0368 were the same as those of SrfAC, but the binding pocket (DAF-LGCV) had one missing residue compared with that of strain FZB42 (DAFXLGCV).
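The single-residue difference between the two SrfAC binding-pocket signatures quoted above can be located position by position. The following is a minimal Python sketch, assuming the two signatures are already aligned exactly as written in the text, with "-" marking a gap and "X" an unspecified residue; the function name compare_signatures is hypothetical and is not part of PRISM or any other tool used here.

```python
def compare_signatures(query, reference, gap="-"):
    """Compare two pre-aligned binding-pocket signatures.

    Returns a list of (position, query_residue, reference_residue, kind)
    tuples for every column where the two strings differ. Positions are
    1-based. `kind` is "missing" when the query has a gap at that column
    and "substitution" otherwise.
    """
    if len(query) != len(reference):
        raise ValueError("signatures must be pre-aligned to equal length")
    differences = []
    for pos, (q, r) in enumerate(zip(query, reference), start=1):
        if q == r:
            continue
        kind = "missing" if q == gap else "substitution"
        differences.append((pos, q, r, kind))
    return differences


# Signatures as quoted in the text: Gms0368 (NCD-2) vs SrfAC (FZB42).
print(compare_signatures("DAF-LGCV", "DAFXLGCV"))
# prints [(4, '-', 'X', 'missing')]
```

The sketch only reports where aligned strings differ; it does not perform the alignment itself or predict substrate specificity.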
The fengycin biosynthetic cluster in strain FZB42 contained five genes, fenCDEAB (Fig. 4b). However, the same cluster in strain NCD-2 contained only three genes: gms1961, gms1960, and gms1959 (Fig. 4b). Gms1961, which corresponded to FenE in strain FZB42, had the conserved residues of A8 and A9, which bind the amino acids Glu and Val, respectively (Fig. 4b). Gms1960 and Gms1959 had conserved amino acid sequences related to FenA and FenB in strain FZB42, respectively. Interestingly, no homologs of FenC and FenD were identified in the genome of strain NCD-2. Therefore, the amino acid sequences of FenC and FenD of strain FZB42 were compared with the strain NCD-2 proteome using BioEdit, and their most similar proteins were found to be Gms1961 and Gms1960, respectively (Additional file 1, Tables S2 and S3). This finding led to the hypothesis that Gms1961 and Gms1960 performed the functions of FenC and FenD in strain NCD-2, respectively. Thus, Gms1961 and Gms1960 might both have dual functions in the synthesis of fengycin: Gms1961 in strain NCD-2 would have the functions of FenE and FenC in strain FZB42, and Gms1960 would have the functions of FenA and FenD. However, it should be pointed out that the FenD domain in strain NCD-2 differed greatly from that of FZB42, and we cannot rule out the possibility that other enzymes in NCD-2 might have functions similar to those of FenD.
PCR amplification using the primer set targeting the fenE and dacC genes produced the expected 1,032-bp fragment in strain NCD-2 but no product in strain FZB42, in which the target was too large to amplify (16,555 bp) (Fig. 5a-b). Sequencing of the 1,032-bp fragment and alignment with the gene locus gms1959-1962 confirmed the lack of fenC and fenD homologs in this cluster (Fig. 5c-d). Compared with wild-type NCD-2, the in-frame deletion mutant of gms1961 completely lost fengycin production (Fig. 6a-c).
We further compared the fengycin synthetase gene cluster of NCD-2 with the 11 corresponding clusters from B. subtilis strains closely related to strain NCD-2 (Additional file 1, Fig. S1). All of these strains contained the fengycin biosynthetic gene cluster fenCDEAB (also known as ppsABCDE), whereas strain NCD-2 contained only fenEAB, suggesting that the fengycin biosynthetic gene cluster of strain NCD-2 is unique.
MS/MS of fengycin and surfactin in NCD-2
Fengycin was separated from the lipopeptide extract of strain NCD-2 using fast protein liquid chromatography (FPLC) (Additional file 1, Fig. S3). QTOF–MS/MS analysis revealed five fractions in the fengycin cluster (Fig. 7a–e), which had mass-to-charge ratio (m/z) values of 732.4, 746.4, 725.4, 739.4, and 767.4 (secondary MS), representing fengycin A, fengycin B, fengycin A2, fengycin B2, and fengycin C, respectively. The typical MS/MS spectra showed the distributions of key fragmentation ions (α and β), representing the linear N-terminal and the cyclic C-terminal segments, respectively, of diverse fengycin species (Fig. 7a–e and Additional file 1, Fig. S4a-b). The MS/MS spectrum of the fengycin ion at m/z 732.4 yielded two intense product ions at m/z 966.5 and 1,080.5, representing fengycin A (Fig. 7a), while the MS/MS spectrum of the fengycin ion at m/z 746.4 yielded key product ions at m/z 994.5 and 1,108.6, representing fengycin B (Fig. 7b). The MS/MS spectrum of the fengycin ion at m/z 725.4 yielded two intense product ions at m/z 952.4 and 1,066.5, representing fengycin A2 (Fig. 7c), while the MS/MS spectrum of the fengycin ion at m/z 739.4 yielded key product ions at m/z 980.5 and 1,094.5, representing fengycin B2 (Fig. 7d). The MS/MS spectrum of the fengycin ion at m/z 767.4 yielded two pairs of intense product ions at m/z 994.5/1,008.5 and 1,108.6/1,122.6, representing fengycin C (Fig. 7e). Five classes of fengycins were identified based on the key product ions of β-hydroxy fatty acid (β-OH FA) with chain lengths varying from C12 to C20 (Table 2 and Additional file 1, Figs. S5–S9). The MS/MS spectrum of the surfactin ion at m/z 1,008.7 yielded one intense product ion at m/z 685.5 (Fig. 7f and Additional file 1, Fig. S4c). Based on this key product ion, one class of compounds was identified: surfactins (m/z values of 994.6, 1,008.7, 1,022.7, and 1,036.7) with fatty acid chains varying from C11 to C15 (Additional file 1, Fig. S10).
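The m/z values reported above are spaced regularly, and the spacing can be checked with simple arithmetic. The following is a minimal Python sketch, not part of the original analysis; the helper names mass_shift and ch2_units and the 0.2 Da tolerance are hypothetical choices. For the fengycin precursor ions, a charge state of 2 is an assumption made here for illustration, since the charge state is not stated in the text.

```python
# Monoisotopic mass of one methylene (CH2) unit, in daltons.
CH2_MASS = 14.01565


def mass_shift(mz_low, mz_high, charge=1):
    """Neutral mass difference implied by two m/z values of equal charge."""
    return (mz_high - mz_low) * charge


def ch2_units(shift, tolerance=0.2):
    """Number of CH2 units matching `shift` within `tolerance` Da.

    Returns None when the shift is not close to a whole number (>= 1)
    of CH2 units.
    """
    n = round(shift / CH2_MASS)
    if n >= 1 and abs(shift - n * CH2_MASS) <= tolerance:
        return n
    return None


# Surfactin ions reported in the text (treated as singly charged).
surfactin_mz = [994.6, 1008.7, 1022.7, 1036.7]
for low, high in zip(surfactin_mz, surfactin_mz[1:]):
    print(low, "->", high, "CH2 units:", ch2_units(mass_shift(low, high)))

# Product ions of fengycin A vs fengycin B reported in the text.
print("966.5 -> 994.5:", ch2_units(mass_shift(966.5, 994.5)))
print("1080.5 -> 1108.6:", ch2_units(mass_shift(1080.5, 1108.6)))

# Precursor ions of fengycin A vs fengycin B, ASSUMING charge = 2.
print("732.4 -> 746.4 (z=2):", ch2_units(mass_shift(732.4, 746.4, charge=2)))
```

Under these assumptions, neighboring surfactin ions differ by about one CH2 unit, which is consistent with a series of homologs differing in fatty acid chain length, and the fengycin A and fengycin B ions differ by about 28 Da in both the product ions and the precursors.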
Detection of other antimicrobial active compounds in NCD-2
In addition to fengycin and surfactin, extraction of four other antimicrobial compounds (bacillaene, bacilysin, bacillibactin, and subtilosin) from the fermentation broth of strain NCD-2 was attempted, using a different extraction method for each compound. However, only bacillaene and bacillibactin were detectable in the extracts by UHPLC-QTOF-MS (Fig. 8a, 8b).