Phage isolation and morphology
Phages were isolated from the Yangtze River according to their ability to lyse several Shigella species used as host cells and to generate clear plaques with or without halo zones. Transmission electron microscopy (TEM) micrographs show that the phages isolated on S. flexneri and S. dysenteriae have an icosahedral head, a contractile tail, a collar and a base plate, the typical features of the Myoviridae family of bacteriophages (Fig. 1-A to D). Moreover, the TEM micrographs of the phage isolated on S. sonnei show that it has an icosahedral head and a non-contractile tail, a structure similar to that of the Siphoviridae phages (Fig. 1-E and F). The head diameter, tail length and tail width of the phages are summarized in Table 1. The phages were designated vB_SflM_004, vB_SdyM_006 and vB_SsoS_008 according to their host species and phage family.
Bacteriophage host ranges
The host range of the isolated phages was tested on a wide range of bacteria, including Shigella isolates as well as standard strains of Gram-negative and Gram-positive bacteria. Phage vB_SdyM_006 produced clear plaques only on S. dysenteriae isolates (3/3 isolates), while vB_SflM_004 and vB_SsoS_008 produced either clear or cloudy plaques on most of the tested S. flexneri and S. sonnei isolates (Additional file 1, Table S1). The efficiency of plating (EOP) of the phages varied relatively widely, from 0.12 ± 0.07 to 1, across the different isolates of Shigella spp. (Additional file 1, Table S2).
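As an illustration of how an EOP value is usually derived, the following minimal Python sketch divides the titer obtained on a test isolate by the titer obtained on the reference host. The function name, the isolate labels and the titers are hypothetical and are not data from this study; the exact procedure used for Table S2 may differ.

```python
def efficiency_of_plating(test_titer_pfu_ml, reference_titer_pfu_ml):
    """Ratio of the phage titer on a test isolate to the titer on the reference host."""
    if reference_titer_pfu_ml <= 0:
        raise ValueError("reference titer must be positive")
    return test_titer_pfu_ml / reference_titer_pfu_ml


# Hypothetical titers in PFU/mL (illustrative values only).
reference_titer = 2.0e9
isolate_titers = {"isolate_A": 2.0e9, "isolate_B": 2.4e8}

for name, titer in isolate_titers.items():
    eop = efficiency_of_plating(titer, reference_titer)
    print(f"{name}: EOP = {eop:.2f}")
```

An EOP of 1 means the phage plates as efficiently on the test isolate as on its reference host; lower values indicate reduced productive infection.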
DNA fingerprinting
DNA fingerprints of the isolated phages were obtained using the restriction endonucleases EcoRI, EcoRV and HindIII. The restriction patterns revealed that the genomes of phages vB_SflM_004 and vB_SsoS_008 were digested only by EcoRV, whereas the genome of phage vB_SdyM_006 was digested by EcoRV and HindIII. The observed differences in the size and pattern of the DNA fingerprints (Fig. 1) imply that the isolated phages differ from each other in genome size and sequence.
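Once a genome sequence is available, a restriction pattern like the one described above can be predicted in silico. The sketch below is a simplified illustration, assuming a linear genome and complete digestion; the recognition sites and cut offsets are the standard ones for EcoRI (G^AATTC), EcoRV (GAT^ATC) and HindIII (A^AGCTT), while the function name and the toy sequence are hypothetical. Because all three sites are palindromic, scanning one strand is sufficient.

```python
# Recognition site and cut offset (from the start of the site) for each enzyme.
SITES = {
    "EcoRI": ("GAATTC", 1),
    "EcoRV": ("GATATC", 3),
    "HindIII": ("AAGCTT", 1),
}


def fragment_lengths(genome, enzyme):
    """Return fragment lengths after complete digestion of a linear genome."""
    site, offset = SITES[enzyme]
    genome = genome.upper()
    cuts = []
    start = genome.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = genome.find(site, start + 1)
    bounds = [0] + cuts + [len(genome)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Toy 30-nt sequence (hypothetical, not a phage genome).
toy = "ACGTGATATCTTTTAAGCTTGGGGATATCA"
for enzyme in SITES:
    print(enzyme, fragment_lengths(toy, enzyme))
```

A genome without any site for a given enzyme yields a single fragment equal to the genome length, which corresponds to an undigested band on the gel.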
Basic biological characteristics
The thermal and pH stability of the phages was tested over a wide range of temperatures and pH values (Fig. 2). The titers of all three phages were stable (> 90%) from −20 to 40 °C but started to decrease after incubation at 50 °C for 1 h. When the temperature was increased further to 80 °C, vB_SdyM_006 and vB_SsoS_008 could not be recovered, whereas vB_SflM_004 was still recovered at this temperature and lost its activity only after incubation at 90 °C (Fig. 2-A). With regard to pH stability, the highest activity was observed at pH values from 6 to 8. Incubation at a basic pH of 12 (for all phages) and at acidic pH values of 4 (for vB_SsoS_008) and 3 (for vB_SdyM_006 and vB_SflM_004) inactivated the phages (Fig. 2-B). The one-step growth curves demonstrated that phages vB_SflM_004, vB_SdyM_006 and vB_SsoS_008 began to be released from their host cells after 30, 50 and 15 min, respectively. Moreover, the burst sizes were estimated to be about 139 ± 29, 93 ± 15 and 94 ± 9 virions per bacterium for vB_SflM_004, vB_SdyM_006 and vB_SsoS_008, respectively (Fig. 2-C). Additionally, as shown in Fig. 2-D, the phage particles began to adsorb immediately after incubation started, and vB_SflM_004, vB_SdyM_006 and vB_SsoS_008 were fully adsorbed to their host cells after 12, 6 and 10 min, respectively.
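The latent period and burst size reported above are read from a one-step growth curve. The following sketch shows one common convention, assumed here because the exact calculation is not stated in this section: the burst size is the plateau titer divided by the number of initially infected cells, and the latent period is taken as the first sampling time at which the titer clearly rises above the starting level. The function names, the rise threshold and all numbers are hypothetical.

```python
def burst_size(plateau_titer_pfu_ml, infected_cells_per_ml):
    """Average number of virions released per infected cell."""
    if infected_cells_per_ml <= 0:
        raise ValueError("infected cell count must be positive")
    return plateau_titer_pfu_ml / infected_cells_per_ml


def latent_period(times_min, titers_pfu_ml, rise_factor=2.0):
    """First sampling time at which the titer exceeds rise_factor x the initial titer."""
    baseline = titers_pfu_ml[0]
    for time, titer in zip(times_min, titers_pfu_ml):
        if titer > rise_factor * baseline:
            return time
    return None  # no rise observed within the sampled window


# Hypothetical one-step growth data (illustrative values only).
times = [0, 10, 20, 30, 40, 50, 60]
titers = [1.0e5, 1.0e5, 1.1e5, 9.0e5, 6.0e6, 1.0e7, 1.0e7]

print("latent period (min):", latent_period(times, titers))
print("burst size (virions per cell):", burst_size(titers[-1], 1.0e5))
```

The resolution of the latent period estimate is limited by the sampling interval, so denser sampling around the expected rise gives a more precise value.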
Genome Analysis
The fold coverage, genome size, G + C content and other general genome features of phages vB_SflM_004, vB_SdyM_006 and vB_SsoS_008 are presented in Table 1.
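For reference, the G + C content listed among the general genome features is the percentage of G and C among the unambiguous bases of a sequence. A minimal sketch of the calculation follows; the function name and the example sequence are hypothetical.

```python
def gc_content(sequence):
    """Percentage of G and C among the unambiguous bases (A, C, G, T)."""
    sequence = sequence.upper()
    counted = sum(sequence.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (sequence.count("G") + sequence.count("C")) / counted


# Toy sequence (hypothetical); ambiguous bases such as N are ignored.
print(f"{gc_content('ATGCGCNNTA'):.1f}% G + C")
```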
Genome Analysis of vB_SflM_004
ATG was detected as the start codon in all of the ORFs. The opal (TGA), ochre (TAA) and amber (TAG) stop codons were present in 59, 49 and 20 ORFs, respectively. The BPROM search detected 16 promoters (Additional file 2) with consensus sequences at −10 (tttTAtaaT) and −35 (TTcAca) (capital letters indicate conserved nucleotides). Nine Rho-factor-independent termination sites were also detected in the vB_SflM_004 genome with the FindTerm online software (Fig. 3).
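A stop codon tally of this kind can be reproduced from the predicted ORF sequences. The sketch below assumes each ORF is given 5' to 3' on its coding strand and includes its stop codon; the function name and the toy ORFs are hypothetical.

```python
from collections import Counter

STOP_NAMES = {"TGA": "opal", "TAA": "ochre", "TAG": "amber"}


def tally_stop_codons(orf_sequences):
    """Count ORFs by the stop codon found in their last three nucleotides."""
    counts = Counter()
    for orf in orf_sequences:
        last_codon = orf.upper()[-3:]
        counts[STOP_NAMES.get(last_codon, "other")] += 1
    return counts


# Toy ORFs (hypothetical, far shorter than real genes).
toy_orfs = ["ATGAAATGA", "ATGCCCTAA", "ATGGGGTAG", "ATGTTTTGA"]
print(dict(tally_stop_codons(toy_orfs)))
```

An "other" count above zero would flag ORFs that are truncated or supplied without their stop codon.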
In general, the genomic organization and genetic analysis of phage vB_SflM_004 demonstrated that the genome contained 135 possible open reading frames (ORFs): 20 ORFs encoding structural proteins, 25 ORFs encoding metabolism-related proteins, 4 ORFs associated with bacterial lysis, 83 ORFs encoding hypothetical proteins that showed relatively high similarity to previously described phage hypothetical proteins of as yet unknown function, and 3 remaining ORFs that showed no similarity to any known protein in the databases (Fig. 3). The list of the 135 ORFs, with their details and annotation, is provided in Additional file 3. The detected genes involved in bacterial cell lysis encoded lysozyme (gp111), holin (gp79) and two spanins (o-spanin, gp28, and i-spanin, gp29), which are similar to previously reported genes in the Ounavirinae subfamily, including those of Felixounavirus (gp28 and gp29), Mooglevirus (gp111) and Suspvirus (gp79). In addition, two pairs of rIIA/rIIB proteins were detected near the beginning (ORF32 and ORF33) and at the end (ORF134 and ORF135) of the vB_SflM_004 genome; these could play a role in the regulation of bacterial lysis (Additional file 3). The gene products involved in the metabolic and regulatory pathways of vB_SflM_004 were identified as different types of DNA polymerases, kinases, reductases, protease, nucleases, hydrolases and regulatory proteins with relatively high similarity to those of the Ounavirinae subfamily (see Additional file 3 for more detail). The structural and assembly genes encoded the tail fiber proteins, tail sheath, tail protein, tail tube protein, minor tail protein, tail assembly protein, major capsid protein, pro-head assembly scaffold protein and a head maturation protease. Some of these proteins were similar to those available in the GenBank database.
For instance, the tail tube, major capsid and tail protein were almost identical (≥ 97%) to the respective predicted gene products of phages vB_EcoM_Alf5, SF19, Meda and SF13. On the other hand, the tail proteins and the major capsid protein showed low identity (≤ 55%) to previously reported phage proteins. The gene distribution pattern (Fig. 3 and Additional file 3) shows that about half of the gene products of ORF81 to ORF110 were identified as structural proteins. Like other viruses, bacteriophages tend to keep genes of similar function close to each other in a compact arrangement [40]. Thus, the remaining ORFs in this region of the genome, currently annotated as hypothetical proteins, may also have structural functions.
The hypothetical proteins showed the highest similarity to those of phage SF13 (13 out of 83 ORFs), with a clear concordance between the gene product functions and their respective identified conserved domains. However, no clear relation was found for the conserved domains DUF3277 and DUF3383 (Additional file 3).
BLASTN analysis revealed that the genome of phage vB_SflM_004 was highly similar (~ 94% similarity with > 75% query coverage) to those of Escherichia coli phage 11, phage 12, Enterobacteria phage WV8 and Salmonella phage BPS15Q2. As shown in Fig. 4, the dot plot analysis of these bacteriophages using Gepard demonstrated considerable sequence similarity between vB_SflM_004 and the related phages, with a few notable differences such as the deletion of an approximately 10 kb region around position 18 000. Comparison of the genome with those of closely related phages using CoreGenes showed that 68% of the proteins were shared with the Ounavirinae subfamily; the conserved proteins included the entire lysis group, some proteins with structural or regulatory functions, and some of the hypothetical proteins (Fig. 4 and Additional file 4). Fig. S2 depicts the relatedness of vB_SflM_004 and other highly homologous phages, visualized with the Easyfig software.
Genome Analysis of vB_SdyM_006
The genome of vB_SdyM_006 contains 252 ORFs (Additional file 3) and 9 tRNA coding regions (Table S3). The only identified start codon was ATG. Ochre, amber and opal stop codons were identified in 91, 50 and 111 ORFs, respectively. A BPROM search identified 31 promoters, with the consensus sequences ATGTATAAT and TTTAAT at the −10 and −35 positions, respectively (conserved bases are presented in bold) (Additional file 2). In addition, the only identified potential Rho-factor-independent termination site was located after the gene encoding the inhibitor of the prohead protease (gp187) (Fig. 5).
Based on the comprehensive genetic analysis of vB_SdyM_006 and the homology-based search of its 252 ORFs, the predicted ORFs could be clustered into five groups. Forty-five ORFs were predicted to encode structural proteins, located approximately from ORF133 to ORF182 (Additional file 3). The tail completion and sheath stabilizer protein, head completion protein, baseplate wedge subunit, baseplate wedge tail fiber connector, baseplate wedge subunit and tail pin, short tail fibers, fibritin, neck protein, tail sheath stabilization protein, tail sheath protein, tail tube protein, portal vertex of head, prohead core protein, prohead core scaffold protein, major capsid protein, capsid vertex protein, membrane protein, baseplate tail tube initiator, baseplate tail tube cap, baseplate hub subunit and baseplate hub distal subunit were all detected in this region and showed high similarity to the respective predicted gene products of phages vB_PmiM_Pm5461, PM2, phiP4-3 and vB_MmoM_MP1 [41–44]. Moreover, a small group of five genes was detected close to the end of the genome (ORF233 to ORF237); these encode different parts of the tail structure, namely the long tail fiber proximal subunit, long tail fiber proximal connector, long tail fiber distal connector, long tail fiber distal subunit and distal long tail fiber assembly catalyst, which had 100% similarity to those of vB_PmiM_Pm5461 and phiP4-3 (Additional file 3) [42, 44].
Within the lysis functions, ORF98 encodes an endolysin with peptidase activity (conserved domain pfam13539). The product gp238 was identified as a holin lysis mediator owing to its high similarity to the respective predicted gene product of phage vB_PmiM_Pm5461. The products gp85 (lysis inhibition regulator) and gp200 (rIII lysis inhibition accessory protein) are predicted to have regulatory roles in the lysis pathway. It is worth mentioning that ORF137 (baseplate hub + tail lysozyme) and ORF157 (head core scaffold protein + protease) encode bifunctional proteins that contain either C-terminal or N-terminal sequences showing relatively high similarity to those of cell wall lysozymes.
Terminases, the proteins responsible for packaging the phage genome, were detected almost in the middle of the genome (ORF150 and ORF151) and showed ≥ 94% similarity to the small and large subunits of the phage vB_MmoM_MP1 terminase (Fig. 5 and Additional file 3). Furthermore, the conserved domains DNA_Packaging (pfam11053) and Terminase_6 (pfam03237) were identified in the small and large subunits of the terminase, respectively.
The predicted genes involved in metabolic and regulatory functions included several DNA-associated genes (DNA polymerase, helicase, primase, ligase, topoisomerase and endonuclease proteins), RNA-associated genes (RNA polymerase, tRNA synthetase, ligase, endonuclease and RNaseH proteins), different types of exonuclease, recombinase, anti-sigma factors, sigma factors, anaerobic NTP reductase, thioredoxin, kinase, host translation inhibitors and several other genes. Most of the predicted proteins showed high identity (≥ 90%) with their counterparts in the Tevenvirinae subfamily of phages, while some others had no similarity (gp71, gp74, gp100, gp129, gp191 and gp212) (Additional file 3).
Based on the BLASTN analysis, the genome sequence of vB_SdyM_006 had 98% (97% query coverage) and 97% (73% query coverage) similarity to the genome sequences of Proteus phages phiP4-3 and vB_PmiM_Pm5461, respectively. Moreover, the sequence alignment of these three phages using the Gepard software showed higher similarity between vB_SdyM_006 and phiP4-3 than between vB_SdyM_006 and vB_PmiM_Pm5461 (Fig. 4). Furthermore, the relatedness of vB_SdyM_006 and other highly homologous phages was determined using the Easyfig software (Fig. S3).
The CoreGenes analysis showed that vB_SdyM_006 shared ~ 84% of its encoded proteins with the phages mentioned above (score > 70), including 111 hypothetical proteins and 101 known proteins with different functions. The corresponding protein-coding genes were spread along the whole genome and were not restricted to any particular region (Fig. 4 and Additional file 4).
Genome Analysis of vB_SsoS_008
The genome of vB_SsoS_008 contained 83 putative ORFs. A function was predicted for 33 of them (Additional file 3), and the other 50 ORFs were assigned as hypothetical proteins: 47 had similarities with hypothetical proteins of bacteriophages vB_EcoS_SH2, Sfin-1, T1, SH6 and phi2457T, while the other 3 were evidently unique to vB_SsoS_008 and showed no similarity with already deposited sequences. Twelve sequences with the conserved consensus sequences gTtTAatAT (−10) and TTgCaA (−35) were identified as promoters and were distributed throughout the phage genome (conserved bases are presented in capital letters) (Additional file 2). All of the ORFs started with an ATG codon and ended with opal (36 ORFs), ochre (30 ORFs) or amber (17 ORFs) stop codons. Only one Rho-independent terminator was identified by FindTerm (Fig. 6). The genome of vB_SsoS_008 contained no tRNA or pseudo-tRNA genes.
The vB_SsoS_008 ORFs encoding known proteins can be classified into 5 functional groups. The structural group contained 21 proteins, including the portal protein (gp26), capsid proteins (gp28–31) and tail proteins (gp41, 43–55, 61 and 62). All of the structural proteins showed relatively high to high similarity (85–100%) to the respective predicted gene products of phages vB_EcoS_SH2, Sfin-1, T1, SH6 and phi2457T, except gp46, which showed only 58% similarity to the tail fiber protein of Shigella phage Sfin-1. Detection of the pfam05939 conserved domain (Phage_min_tail) in this gene product supported the assignment of gp46 as a tail fiber protein. The second group includes 8 proteins predicted to be associated with nucleotide metabolism and its regulation. The products of these genes facilitate genome replication, transcription and translation. These proteins are DNA methylase (gp3), kinase (gp17), nuclease (gp58), recombination protein (gp59), DNA primase (gp63) and primase (gp64), helicase (gp66) and methyltransferase (gp68), which showed ≥ 80% similarity to their counterparts in phages Sfin-1, T1 and phi2457T (Additional file 3). The third group includes the proteins involved in the bacterial cell lysis process. Genes 76 and 77 are predicted to encode endolysin and spanin, respectively, and had 90% (query coverage of 65%) and 84% (query coverage of 79%) identity with their counterpart proteins in Shigella phage Sfin-1 and Shigella phage SH6, respectively. Interestingly, no holin gene was found, either close to or far from the endolysin gene. The DNA packaging complex, consisting of the large (gp25) and small (gp24) subunits of terminase, was categorized as the fourth group. The large subunit of this complex had high identity (96%, query coverage of 100%), while the small subunit had only 73% similarity (query coverage of 90%), to the counterpart proteins of the related phages (Additional file 3).
BLASTN analysis of the phage vB_SsoS_008 genome showed approximately 91.2% (query coverage 90%), 91.7% (query coverage 84%) and 90.5% (query coverage 96%) similarity with Shigella phage SH6, Enterobacteria phage T1 and Shigella phage Sfin-1, respectively. The CoreGenes analysis (score > 70) revealed that vB_SsoS_008, Shigella phage SH6, Enterobacteria phage T1 and Shigella phage phi2457T had ~ 60% of their proteins in common, including structural, DNA packaging, metabolic, endolysin and hypothetical proteins (Fig. 4 and Additional file 4). In addition, the alignment of nucleotide sequences using the Gepard software showed high similarity between vB_SsoS_008 and Shigella phage SH6, Enterobacteria phage T1 and Shigella phage phi2457T (Fig. 4). Furthermore, the relatedness of vB_SsoS_008 and other highly homologous phages was determined using Easyfig (Fig. S4).
Phylogenetic analysis
The phylogenetic relationship between the isolated phages and similar phages available in online databases was studied by constructing a phylogenetic tree based on the major capsid sequences, which were identified in all of these phages (Fig. 7). Both vB_SflM_004 and vB_SdyM_006 clustered as members of the Myoviridae family. However, their major capsid sequences were different enough to assign them to different lower taxa: vB_SflM_004 clustered in the Felixounavirus genus of the Ounavirinae, whereas vB_SdyM_006 could be classified only to the subfamily level, as a member of the Tevenvirinae. The phylogenetic tree based on the major capsid sequences suggests that vB_SdyM_006, along with phages phiP4-3, PM2 and vB_PmiM_Pm5461, could be considered a new genus in the Tevenvirinae subfamily because of its considerable phylogenetic distance from other related members such as those of the Tequatrovirus genus. Moreover, the phylogenetic analysis indicated that phage vB_SsoS_008 should be added to the Tunavirus genus, Tunavirinae subfamily, of the Siphoviridae family.