Multidrug resistance in pathogenic E. coli has emerged recently as a result of antibiotic overuse, with detrimental effects on food safety, human health, and the environment (Zhong et al., 2023). Researchers are therefore seeking novel antibacterial treatments to address the growing problem of bacterial resistance. Phage therapy is one such option that is gaining popularity because of its bactericidal activity and bacterial host specificity. Isolation and genome-wide characterization of lytic phages are crucial for the development of phage therapy against antimicrobial-resistant (AMR) bacterial infections. The genomes of lytic phages contain many genes that encode putative proteins of unknown function. Genomic analysis can therefore aid both the taxonomic classification of phages and the identification of critical genes and suitable candidates for phage therapy (Imam et al., 2019).
In this study, seven diarrheagenic E. coli phages were isolated from rivers, dairy farms, and hospital liquid wastes; they infected six multidrug-resistant E. coli strains. The complete genomes of the phages were analyzed and characterized. Genomic analysis showed that all phages had linear dsDNA genomes ranging in length from 24,264 to 143,710 bp, with a GC content of 44 to 54%. The phages that infect diarrheagenic E. coli were categorized as small because their genomes were smaller than 200 kbp. These results are in line with the earlier work of Montso et al. (2023), which demonstrated that pathogenic E. coli O177 was also infected by small phages.
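The two genome summary statistics mentioned above, GC content and the size-based category, can be illustrated with a minimal sketch. The function names and the handling of ambiguous bases are assumptions made for illustration; this is not the pipeline used in the study. Only the 200 kbp threshold is taken from the text.

```python
SMALL_GENOME_MAX_BP = 200_000  # 200 kbp threshold stated in the text


def gc_content(sequence: str) -> float:
    """Return the GC percentage of a DNA sequence.

    Ambiguous bases (e.g. N) are ignored; this is an illustrative choice.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


def is_small_genome(length_bp: int) -> bool:
    """Return True if the genome is below the 200 kbp threshold."""
    return length_bp < SMALL_GENOME_MAX_BP


# Usage with the genome length range reported in the text
print(is_small_genome(24_264), is_small_genome(143_710))
```

In practice the sequence would be read from a FASTA file of the assembled genome; file handling is omitted here to keep the sketch self-contained.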
BLASTn, Genome Detective, and PHASTER analyses assigned these phages to the families Myoviridae and Siphoviridae of the order Caudovirales, and to the subfamilies Guernseyvirinae, Tevenvirinae, and Vequintavirinae. The diarrheagenic E. coli phages were further assigned to five genera: Tequatrovirus T4 (EH-B-A (A1) and EH-SD-TH), Kagunavirus (EP-M-A and EP-B-K, E2), Vequintavirus (EI-SP-GF), Jilinvirus (ET-SD-TH), and Dhillonvirus (ST-T-K). Notably, the whole-genome analysis indicated that all of the phages are virulent, which makes them candidates for phage therapy.
In phage isolates EH-B-A, A1, EH-SD-TH, and ET-SD-TH, only 20% of coding sequences (CDSs) encoded putative or hypothetical proteins, whereas in EP-M-A, EP-B-K, E2, EI-SP-GF, and ST-T-K more than 90% of CDSs encoded proteins. The lower percentage (20%) in the EH-B-A, A1, EH-SD-TH, and ET-SD-TH isolates might be attributed to low sequencing read numbers: if the sequencing depth or coverage for these isolates was insufficient, a smaller fraction of CDSs would be identified. The higher percentage (more than 90%) in the EP-M-A, EP-B-K, E2, EI-SP-GF, and ST-T-K isolates indicates greater genome completeness. These isolates likely had higher sequencing quality and depth, which enables the identification of a larger proportion of functional protein-coding genes.
E. coli phages contain a large number of distinct genes that encode putative and functional proteins. Many of the CDSs identified in the phage genomes, particularly in ET-SD-TH, were predicted to encode hypothetical proteins of unknown function. Comparable findings have been documented for other phages that infect pathogenic bacterial species (Kim et al., 2020; Korn et al., 2021). This indicates that phage genomes carry many genes whose functions remain unknown. Research efforts should therefore focus on clarifying the actual roles of these putative proteins.
Among the notable findings was the presence of genes coding for phage DNA replication and modification (DNA polymerase I, DNA helicase, putative DNA cytosine methyltransferase C5, putative HNH endonuclease, DNA methyltransferase, DNA recombination nuclease inhibitor gamma), DNA synthesis and packaging (terminase large subunit, putative terminase small subunit), structural proteins (capsid and tail proteins), and host lysis (phage lysin, u-spanin, putative holin-like class I protein, and putative holin-like class II protein). Two phage genomes (EP-B-K, E2, and EI-SP-GF) differed from the others in that they carry genes encoding tail fiber and baseplate tail spike proteins. Because these proteins are essential for phage receptor recognition, their presence in a phage genome can improve infection capability and broaden host range (Nobrega et al., 2018).
The tRNAscan-SE v. 2.0 analysis indicated that phage isolate EI-SP-GF had tRNAs in its genome, whereas isolates EP-M-A, EP-B-K, E2, ET-SD-TH, and ST-T-K had none. The absence of tRNA genes implies that a phage relies more heavily on the host cell's resources for translation and may have evolved to exploit the host's existing translational machinery. The presence of tRNAs, in contrast, suggests that a phage has adapted to replicate efficiently within the host cell by supplementing the host's translation machinery with its own tRNAs. In this study, the web-based Genome Detective tool and GeneMarkS-2 analysis showed that none of the phage genomes contain genes encoding integrase, recombinase, repressors, or excisionase, which are the main markers of lysogenic viruses (Necel et al., 2020). These results indicate that the phages should be considered strictly lytic (virulent).
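The lysogeny-marker check described above can be illustrated as a simple keyword screen over annotated gene product names. This is only a sketch under stated assumptions: the function name and the plain substring matching are hypothetical, and they do not reproduce how Genome Detective or GeneMarkS-2 operate. The marker list comes from the text.

```python
from typing import Dict, Iterable, List

# Lysogeny markers named in the text
LYSOGENY_MARKERS = ("integrase", "recombinase", "repressor", "excisionase")


def find_lysogeny_markers(products: Iterable[str]) -> Dict[str, List[str]]:
    """Map each marker keyword to the annotated products that mention it.

    An empty result is consistent with (but does not prove) a strictly
    lytic lifestyle, since unannotated genes cannot be matched by name.
    """
    hits: Dict[str, List[str]] = {}
    for product in products:
        name = product.lower()
        for marker in LYSOGENY_MARKERS:
            if marker in name:
                hits.setdefault(marker, []).append(product)
    return hits


# Usage with product names of the kind reported in this section
annotated = [
    "DNA polymerase I",
    "terminase large subunit",
    "putative holin-like class I protein",
]
print(find_lysogeny_markers(annotated))
```

A name-based screen of this kind depends entirely on annotation quality, so hypothetical proteins of unknown function would still need homology-based checks before a phage is considered for therapy.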
To obtain a more global phylogenetic overview of the relationships among the E. coli phage isolates, a tree was constructed from a whole-genome alignment of the isolates against each other. To determine the evolutionary relationship of each phage isolate to available database sequences, major capsid protein gene sequences from the same genus (or, for isolate EI-SP-GF, the same subfamily) were retrieved from the database. Bootstrapping, a statistical resampling technique, was used to assess the robustness of the inferred phylogenetic relationships. The replicate trees were compared to calculate the frequency with which a particular branch appears. This frequency is expressed as a bootstrap value, which represents the statistical support for that branch. Higher bootstrap values (typically 70 to 100) indicate greater support for the branch (Wiens et al., 2008).
Phylogenetic trees are constructed from similarities and differences in genetic sequences, typically using multiple sequence alignment and evolutionary models. The phages EP-M-A and EP-B-K, E2 clustered together with 100% bootstrap support. Bootstrap support is usually reported as a percentage and indicates how often a particular grouping appears in replicate analyses of the data. A value of 100% means that EP-M-A and EP-B-K, E2 clustered together as a distinct group in every replicate, which indicates strong statistical confidence in the relationship between these two sequences. It can therefore be concluded that EP-M-A and EP-B-K, E2 are closely related and form a cluster in the phage phylogenetic tree. This is comparable to the study by Al-Shayeb et al. (2020), who constructed a phylogenetic tree from major capsid protein sequences and found that many of their phage genomes clustered together with high bootstrap support, defining clades.
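The bootstrap value discussed above is the share of replicate trees in which a given grouping appears. The following minimal sketch computes it for toy data. It assumes each replicate tree is represented as a set of clades, with each clade a frozenset of taxon labels. The taxon labels A to D and the four replicates are hypothetical and are not results from this study.

```python
from typing import FrozenSet, Iterable, List, Set

Clade = FrozenSet[str]


def bootstrap_support(clade: Iterable[str],
                      replicate_trees: List[Set[Clade]]) -> float:
    """Return the percentage of replicate trees that contain the clade."""
    if not replicate_trees:
        raise ValueError("at least one replicate tree is required")
    target = frozenset(clade)
    hits = sum(1 for tree in replicate_trees if target in tree)
    return 100.0 * hits / len(replicate_trees)


# Toy replicates over hypothetical taxa A-D (illustrative only)
replicates = [
    {frozenset({"A", "B"}), frozenset({"C", "D"})},
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "B"}), frozenset({"C", "D"})},
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
]

print(bootstrap_support({"A", "B"}, replicates))  # 100.0: present in all 4
print(bootstrap_support({"C", "D"}, replicates))  # 50.0: present in 2 of 4
```

Real analyses typically use hundreds or thousands of replicates generated by resampling alignment columns with replacement; the tree-building step is omitted here.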
The major capsid protein (MCP) gene was used to construct a phylogenetic tree for each phage isolate in order to examine evolutionary relationships with existing database sequences. The MCP gene is regarded as the gold standard for family-level classification of lytic phages. The MCP is essential for maintaining the structure and function of bacteriophages, which are viruses that specifically infect and replicate within bacteria. The gene encodes the major structural protein that forms the outer capsid of the phage, enclosing the viral genome and protecting it during infection. The MCP gene is highly conserved within a phage family, meaning that it shows relatively little sequence variation among isolates of the same family. This conservation makes the MCP gene well suited to phylogenetic tree construction, which helps to elucidate the evolutionary relationships between phage isolates (Lee et al., 2022; Dion et al., 2020).
To retrieve sequences from the database, the MCP gene search was restricted to the genus Kagunavirus for isolates EP-M-A and EP-B-K, E2, to the subfamily Vequintavirinae for EI-SP-GF, to the genus Jilinvirus for ET-SD-TH, and to the genus Dhillonvirus for ST-T-K. Isolates EP-M-A and EP-B-K, E2 were classified within the genus Kagunavirus. This suggests that these isolates share significant similarities in their MCP gene sequences, indicating a close evolutionary relationship similar to that reported by Grose and Casjens (2014). The isolate EI-SP-GF was classified within the subfamily Vequintavirinae, a taxon that encompasses viruses with similar MCP gene sequences. The isolate ET-SD-TH was classified within the genus Jilinvirus, and the isolate ST-T-K within the genus Dhillonvirus. These genera represent distinct groups of viruses with shared MCP gene characteristics. All five phage isolates were closely related to other phages, as demonstrated by both the phylogenetic analysis based on the conserved MCP gene and the whole-genome sequence alignment. This implies that these phages have an intricate evolutionary relationship.
The NCBI Genome database contains only a small number of complete phage genomes, even though bacteriophages are widely distributed and highly diverse in nature. In particular, there are very few reports on lytic phages of diarrheagenic E. coli in the E. coli phage database. Phage virology databases also need further updates and enhancements, and knowledge of phage host genomes is scarce. The genomes and functional annotation of bacteriophages remain largely unexplored. Developing bacteriophages into antimicrobial agents requires a thorough understanding of the phages. Whole-genome sequencing is the most effective technique for determining the genetic background of phages, elucidating the mechanisms that underlie phage-host interaction at the gene level, and providing a theoretical framework for phage genetic modification.