Whole genome sequencing (WGS) is a crucial tool for detailed genome characterization, including toxin genes, virulence factors, enzymes and protein structures. In the present study, WGS yielded a total read length of 307,126,690 bp, and more than 750 hypothetical proteins were identified. To investigate genetic variability among phages from the collection, the closely related genome sequences of Escherichia phage PBECO4, Escherichia phage vB_Eco_slurp01 and Escherichia phage 121Q were compared with our Escherichia phage CMSTMSU. The genome map and full annotation details have provided a better understanding of our phage genome. Jurczak-Kurek et al. [12] reported that all the bacteriophage genomes they analysed were dominated by uncharacterized coding sequences. In the present study, too, uncharacterized coding sequences outnumbered the known protein sequences. Genomic comparison with well-studied, highly similar and previously sequenced phages from the NCBI database showed that Escherichia phage CMSTMSU (GenBank No. MH494197) shares 83% nucleotide sequence similarity with Escherichia phage PBECO4 (GenBank No. NC_027364); the two other phages analysed, Escherichia phage vB_Eco_slurp01 and Escherichia phage 121Q, likewise showed 83% sequence similarity with Escherichia phage CMSTMSU. Bhensdadia et al. [36] characterized the genome as well as the protein structure of Escherichia phage ADB-2. Bacteriophage ØCJ19, which is active against enterotoxigenic E. coli, was characterized at the genomic level and found to comprise 49,567 bases with 79 open reading frames (ORFs) [37]. The genome of Escherichia phage fBC-Eco01, isolated from wastewater in Tunisia, contains 78 predicted genes, with 38 encoding proteins [38].
Blast2GO mapping is highly useful for studying the gene ontology of phages, including biological process, molecular function and cellular component. Gene Ontology is a powerful tool for interpreting the biological implications of selected groups of genes. In the present study, the functionally annotated sequences comprised nucleic acid phosphodiester bond hydrolysis (32 sequences), proteolysis (12 sequences), and DNA biosynthetic process and phosphorylation (12 sequences). The molecular function annotation revealed ATP binding (45 sequences), DNA binding (30 sequences), hydrolase and peptidase activities (10 sequences each) and lysozyme activity (8 sequences). Kim et al. [37] annotated the genome of the enterotoxigenic Escherichia coli phage ØCJ19 by BLAST analysis and characterized the genes related to lysogeny, toxins and virulence factors. Cooper et al. [39] reported that the Escherichia phage Eco_BIFF genome contains functional genes related to phage architecture and packaging machinery (head protein, portal protein, terminase protein), tail structure for host interaction (tail proteins, tail assembly protein, tail fiber proteins, tail tape measure protein), phage DNA synthesis (ATP-dependent helicase, DNA primase, methylases), and host lysis and degradation (endolysin). The cellular component annotation comprised integral component of membrane (33 sequences), membrane (6 sequences), cytoplasm (4 sequences) and viral capsid (3 sequences). Escherichia phage PGN829.1, a lytic phage isolated from the sewage of a tertiary care referral hospital in North India, has a genome size of 74.4 kb and a GC content of 42.9%; the GO terms molecular function (MF), cellular component (CC) and biological process (BP) were mapped for its genes using the Blast2GO tool [40].
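As a minimal sketch, the molecular function and cellular component counts reported above can be tabulated and ranked in Python. The dictionary layout and the helper name `rank_terms` are illustrative assumptions and are not part of the Blast2GO workflow used in this study. Because one sequence may carry several GO terms, the computed shares describe the listed annotations rather than the proportion of sequences.

```python
# GO term counts as reported in the text for Escherichia phage CMSTMSU.
# The data layout and the function name are illustrative assumptions.
go_counts = {
    "molecular_function": {
        "ATP binding": 45,
        "DNA binding": 30,
        "hydrolase activity": 10,
        "peptidase activity": 10,
        "lysozyme activity": 8,
    },
    "cellular_component": {
        "integral component of membrane": 33,
        "membrane": 6,
        "cytoplasm": 4,
        "viral capsid": 3,
    },
}


def rank_terms(counts):
    """Return (term, count, share) tuples sorted by descending count.

    The share is relative to the sum of the listed annotations only.
    """
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(term, n, n / total) for term, n in ranked]


if __name__ == "__main__":
    for category, counts in go_counts.items():
        print(category)
        for term, n, share in rank_terms(counts):
            print(f"  {term:<32} {n:>3}  {share:.1%}")
```

Ranking the terms this way makes the dominance of ATP binding and of integral membrane components easy to see at a glance.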
The top-hit species distribution of the annotated Escherichia phage CMSTMSU showed the highest similarity to Escherichia phage vB_Eco_slurp01, followed by Escherichia phage PBECO4 and Escherichia phage 121Q, reflecting the high similarity of their genomes. Moderate similarity was found between Escherichia phage CMSTMSU and Enterobacteria phage KleM-RaK2, Cronobacter phage vB_CsaM_gap32, Serratia phage BF, Yersinia phage fHe-Yen9-04 and Pectobacterium phage CBB, among others. Kim et al. [41] isolated a novel bacteriophage infecting E. coli O157:H7 (ATCC 700927) from a sewage treatment facility in Gwachon, Korea, and designated it PBECO4; this phage is highly similar to Escherichia phage CMSTMSU. Escherichia phage 121Q, isolated from faecal samples collected from nests of juvenile Threskiornis spinicollis (straw-necked ibis) in Australian wetlands, also closely resembles Escherichia phage CMSTMSU [42]. In the enzyme code distribution annotation, hydrolases accounted for 37 sequences, transferases for 22 sequences and oxidoreductases for 12 sequences. Bacteriophages encode enzymes that are responsible for hydrolysis of the cell wall peptidoglycan core [43], and these enzymes are necessary for nucleic acid penetration. For example, the endolysin ORF28 of phage φ11 has N-acetylmuramidase, endo-β-N-acetylglucosaminidase and endopeptidase activities [44]. Bacteriophages are equipped with various virion-associated carbohydrate-active enzymes, termed polysaccharide depolymerases and lysins, that recognize, bind and degrade polysaccharide compounds [45].
KEGG pathway mapping identified several metabolic pathways, including purine, pyrimidine and thiamine metabolism, as well as enzymes involved in drug metabolism. Among these, several sequences were involved in purine, pyrimidine and thiamine metabolism. Mapping of viral-enriched genes to KEGG metabolic pathways has shown that purine and pyrimidine metabolism are among the most enriched pathways [46]. Auxiliary metabolic genes (AMGs) associated with sulfur metabolism, especially sulfite reductase subunits A and C (dsrA and dsrC), thiouridine synthase subunit E (tusE, a homolog of dsrC), sulfane dehydrogenase subunits C and D (soxC, soxD), and fused sulfur carrier proteins Y and Z for thiosulfate oxidation (soxYZ), were characterized using KEGG metabolic pathways in phage-infected sulfur-metabolizing microbes [47]. KEGG pathways indicated the participation of six viruses in the biosynthesis of seven enzymes involved in pathways including glyoxylate and dicarboxylate metabolism, terpenoid backbone biosynthesis, protein export, purine metabolism, pyrimidine metabolism and mismatch repair in different bacteria infected with Caudoviricetes phages [48]. De Smet et al. [49] studied phage-encoded AMGs in Pseudomonas aeruginosa infected with lytic bacteriophages from six distinct phage genera and concluded that pyrimidine metabolism was linked to phages encoding AMGs capable of host genome degradation.
Mauve software is useful for ordering the contigs of sequenced bacteriophages. Mauve progressive alignments also help to determine the conserved sequence segments among genomes. It has proved useful for displaying segments of similarity between more distantly related genomes, as well as for revealing potentially newly acquired genes among more closely related genomes [50]. Based on the Mauve alignment, the genome of our Escherichia phage CMSTMSU closely resembles the genomes of Escherichia phage PBECO4, Escherichia phage 121Q and Escherichia phage vB_Eco_slurp01. Yazdi et al. [13] computed multiple genome alignments using Mauve for phage VB_EcoS-Golestan, which infects E. coli. The genome of the Phi-191 phage, which infects enteroaggregative haemorrhagic E. coli (EAHEC), was compared with those of other vtx2-phages; its whole genome sequence was identical to that of the vtx2-phage P13374 present in the EAHEC O104:H4 strain [51]. Whichard et al. [52] studied the complete genome of bacteriophage Felix O1 using Mauve alignment and confirmed that its sequence is highly similar to that of phage φEa21-4. Mauve alignment of the three phages phiLLS, vB_EcoS_FFH1 and vB_EcoS_AKFV33 showed that some regions are highly homologous, with no significant rearrangements observed, suggesting a high level of nucleotide identity [53]. Mauve alignment has also revealed multiple syntenic blocks, especially large collinear blocks, and internal genome rearrangements [54]. For the N4-like lytic bacteriophage vB_Ppp_A38 (ϕA38), which infects Pectobacterium parmentieri, homologous blocks (locally collinear blocks, LCBs) shared among the analysed bacteriophage genomes indicated the corresponding positions and were used to visualise the gene arrangement. In the present work, 28 LCBs were observed in Escherichia phage CMSTMSU by Mauve alignment.
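To illustrate how an LCB count can be read from an alignment, the following Python sketch counts blocks in XMFA-formatted text, the alignment format that Mauve writes. It assumes the usual XMFA layout: comment lines start with `#`, each aligned segment has a header such as `> 1:1-8 + a.fasta`, and every block ends with a line starting with `=`. The function name `count_lcbs`, the `min_genomes` threshold and the toy alignment are hypothetical; this is not the procedure used to obtain the 28 LCBs reported here.

```python
def count_lcbs(xmfa_text, min_genomes=2):
    """Count XMFA blocks that contain segments from at least `min_genomes` genomes.

    Blocks with a single genome are treated as genome-specific regions
    rather than shared collinear blocks (an assumption of this sketch).
    """
    lcbs = 0
    genomes_in_block = set()
    for raw in xmfa_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        if line.startswith(">"):
            # Header: "> <genome index>:<start>-<end> <strand> <file>"
            genome_index = line[1:].split(":", 1)[0].strip()
            genomes_in_block.add(genome_index)
        elif line.startswith("="):
            # End of the current block
            if len(genomes_in_block) >= min_genomes:
                lcbs += 1
            genomes_in_block = set()
        # any other line is aligned sequence and is ignored here
    return lcbs


# Hypothetical toy alignment: two shared blocks and one genome-specific block.
toy_xmfa = """#FormatVersion Mauve1
> 1:1-8 + a.fasta
ACGTACGT
> 2:5-12 + b.fasta
ACGTACGA
=
> 1:9-12 + a.fasta
TTTT
=
> 1:13-20 - a.fasta
GGGGCCCC
> 2:13-20 + b.fasta
GGGGCCCC
=
"""

if __name__ == "__main__":
    print(count_lcbs(toy_xmfa))  # 2
```

Setting `min_genomes=1` would also count the genome-specific block, so the threshold should match how LCBs are defined in a given analysis.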
I-TASSER analysis of the secondary structure of the head completion, prohead assembly, tail sheath, major capsid and lysozyme proteins showed that all five proteins expressed in our isolated bacteriophage closely resembled their top PDB hits in I-TASSER. Escherichia coli phage ɸAPCEc03 has an isometric head, a flexible non-contractile tail, a characteristic distal tail spike, and three flexible bent fibres with distal globular structures [55]. Xu et al. [56] characterized the major head protein, head morphogenesis protein and phage structural protein of vB_EcoS-B2, which infects multidrug-resistant Escherichia coli. According to the Protein Data Bank (PDB), lysozyme proteins are structurally closely related to a few other proteins, such as membrane proteins [57], structural proteins and signalling proteins [58], and are mostly expressed in bacteriophages. The major capsid protein is classified as a viral protein in the PDB and is expressed in Escherichia phage T5 [59]. The major capsid protein is also closely relevant to the assembly of icosahedral viruses [57]. The PDB top hits identified the prohead assembly protein as an important protein involved in the structure of the bacteriophage [60]. Similarly, the PDB top hits for the remaining proteins revealed the importance of their functions in the phage genome. Bacteriophage T4, which belongs to the family Myoviridae, has the most complex tail structures, and crystal structures of the gp18 protease-resistant fragment have been determined [61]. The gene gp17 was identified as encoding a tail fiber protein (Gp17) derived from the 285p T7-like polyvalent bacteriophage belonging to the PYO97_8 phage cocktail [62].
Phylogenetic analysis of the above proteins showed that the head completion protein was closely related to those of other Escherichia phages, the lysozyme protein to those of Cronobacter and Klebsiella phages, the major capsid protein to those of Myoviridae, Pseudomonas and Vibrio phages, the prohead assembly protein to those of Escherichia and Salmonella phages, and the tail sheath protein to those of other Escherichia phages. The phage vB_EcoM_APEC was able to infect E. coli APEC O78 tail fiber proteins of phage ØCJ20 and phage ØCJ19 [37].