Whole genome sequence analysis and characterization of lytic bacteriophages against antimicrobial-resistant diarrheagenic Escherichia coli strains isolated from various sources in Addis Ababa, Ethiopia

doi:10.21203/rs.3.rs-3972238/v1

Research Article

https://doi.org/10.21203/rs.3.rs-3972238/v1

This work is licensed under a CC BY 4.0 License

Version 1

The emergence of antibiotic resistance in E. coli strains has sparked intense investigation of alternative therapies such as the use of lytic bacteriophages. Phage whole genome sequencing is a novel method for learning more about proteins and other biomolecules encoded by phages, particularly phage lytic enzymes that are crucial to the lysis of bacterial cells. Seven potential lytic E. coli phages, EH-B-A (A1), EH-SD-TH, EP-M-A, EP-B-K (E2), EI-SP-GF, ET-SD-TH, and ST-T-K, isolated from activated dairy farm sludges, rivers, and hospital liquid waste, were described. For sequencing, an Illumina NextSeq 550 sequencer was used. The virus nucleotide collection (nr/nt) (taxid:10239) was used to evaluate the whole genome sequences. Phylogenetic analysis was done using MEGA11 software. Genome sequencing revealed that each bacteriophage contains a linear double-stranded DNA genome. Phage isolates were taxonomically identified as 4 (57%) Myoviridae and 3 (43%) Siphoviridae phages. Phage genome length varied from 24,264 to 143,710 bp, and their GC contents ranged from 43 to 54%. In total, 33–218 CDSs (coding sequences) were predicted, with 19–77% of CDSs encoding functional proteins. All phages lacked tRNA genes in their genomes, except for EI-SP-GF, which possessed five tRNAs. Based on phylogenetic tree analysis, the phage isolates were related to Enterobacteria and E. coli phage sequences in the database. Screening did not show any genes encoding a CRISPR-like system, virulence, antibiotic resistance, or lysogeny. Because of their strictly lytic nature, these phage isolates may be applied in the future to treat E. coli infections. This study may provide some primary data for the development of phage control techniques and advance our understanding of the genetic composition of E. coli phages.

Virology

Bacteriophage

Escherichia coli

Phage genome assembly

Myoviridae

Phage therapy

Siphoviridae

Whole genome sequencing

Escherichia coli (E. coli) includes both commensal and pathogenic strains, mostly inhabiting the intestines of humans and animals, and it is possibly the most widely studied bacterial species. It can also be present in food, vegetation, sewage, and the environment. While the majority of E. coli strains are harmless and contribute to the normal gut flora, some strains are harmful and can cause serious illness or food poisoning. E. coli is the most prevalent causal agent of urinary tract infections (UTIs) worldwide, accounting for 75 to 95% of UTIs (Gupta, 2002). Antibiotic resistance in E. coli isolates is also becoming more common (Ramirez-Castillo et al., 2018), and antibiotic resistance has even been linked to the formation of new strains of resistant bacteria (Ramos et al., 2020). Because of the rising prevalence of multidrug-resistant E. coli, it is imperative to consider alternative treatments (Tuem et al., 2018). Bacteriophages are regarded as a viable substitute for therapeutic applications (Biswas et al., 2002).

Frederick Twort and Felix d'Herelle separately identified bacteriophages in cell cultures of Staphylococcus aureus and Shigella, respectively, in 1915 and 1917 (Taylor et al., 2014). D'Herelle employed phages for the first time in medicine in 1919. Despite encouraging outcomes, the discovery of antibiotics in 1940 led to a decline in researchers' interest in using bacteriophages to treat human diseases because of conflicting public views (Negash and Ejo, 2016). Research on the use of bacteriophages has increased in the modern era. Biochemical agents, such as antibiotics and sanitizers, were the primary means of controlling bacterial human pathogenesis and preventing bacterial reproduction in many industries. However, bacterial multi-drug resistance was the outcome of the imprudent application of these chemical agents. In this fashion, bacteriophage technology and phage therapy arose (Muñoz and Koskella, 2014).

Only particular prokaryote species, or even strains within the same species, are susceptible to infection by a given phage. Bacteriophages attach to the surface of microorganisms by means of proteins and other supporting structures. They can then inject their genetic material into the host and exhibit various infection cycles, such as the lytic, lysogenic, pseudo-lysogenic, or chronic cycles. Phages exhibiting a lytic cycle of infection are the preferred biocontrol agents. In this cycle, bacteriophages inject their genetic material into the cell to create new virus particles and then cause cellular lysis (Muñoz and Koskella, 2014).

The International Committee on Taxonomy of Viruses (ICTV) identifies the existence of ten families of bacteriophages based on conserved genomic synteny, homology in amino acid sequences of phage genetic material encoded proteins, and capsid morphology (Tolstoy et al., 2018). The bacteriophage genome exists as one of the four possible forms of nucleic acids (ssDNA, dsDNA, ssRNA, and dsRNA) (Nguyen et al., 2023; Hungaro et al., 2014). Phages differ greatly in terms of their genetic makeup: their genome lengths range from 3405 bp to 497513 bp, their gene densities from 0.29 to 1.36, and the number of encoded proteins they contain from 1 to 675.

About 31.6% of the genome is made up of viruses unique to the bacterium and archaea domains, known as bacteriophage genomes. Bacteriophage DNA sequencing remains a barrier to phage functional genomic research, as it is still challenging despite the introduction of new sequencing tools (Lopez et al., 2016). The primary challenges are obtaining pure phage genomic material, PCR amplification, and the complexity of the genetic material owing to intrinsic features such as methylated bases and repeat regions, which are inherently difficult to sequence and assemble (Klumpp et al., 2013).

From a technological perspective, bacteriophage sequencing is crucial for any functional genomics investigation, as well as for regulatory bodies like the Food and Drug Administration (FDA) to approve and release products derived from bacteriophages. Because certain phages can increase the pathogenicity of bacteria, genetic characterization is required before phages are used as biocontrol agents in the food sector and in medicine (Cristobal-Cueto et al., 2021). Thus, only a small number of companies currently sell bacteriophage-based products, such as those that control foodborne and other infections. Certain products, including ListShield™ and SalmoFresh™, which are used to control Listeria monocytogenes and Salmonella on food processing surfaces, respectively, are deemed safe for consumers and have FDA approval (Eller et al., 2014).

One of the most comprehensive methods for studying phage-encoded proteins is genome sequencing; nevertheless, genetic material only displays potentially expected proteins; it does not demonstrate how each of these proteins is expressed during host infection (Lal et al., 2016; Klumpp et al., 2012). To fully comprehend phage-bacteria interactions, additional omics techniques such as transcriptomics, proteomics, and metabolomics could be employed in conjunction with phage genome sequencing (Pujato et al., 2017; Gontijo et al., 2017; Islam et al., 2023).

Because of the growing interest in phage therapy and the typically narrow host range of phages, ongoing phage isolation and characterization are required to achieve sufficient coverage of clinically relevant pathogens and to meet future demand for therapeutic phages. Thus, the study of phages and their genomes is intrinsically valuable for furthering our understanding of ecology, evolution, molecular biology, bacterial pathogenicity, biotechnology, and health. Developing an understanding of phage genomes will open up possibilities for converting phages and unique phage proteins into effective biotechnological and medicinal tools (Zampara et al., 2021; Dedrick et al., 2019). In this work, the genomes of phages infecting diarrheagenic E. coli strains, isolated from rivers, dairy farm sewage, and hospital fluid wastes, were sequenced. The six diarrheagenic E. coli strains used for isolation were enteropathogenic E. coli (EPEC), enterohaemorrhagic E. coli (EHEC/O157), enteroinvasive E. coli (EIEC), enteroaggregative E. coli (EAEC), enterotoxigenic E. coli (ETEC), and Shiga toxin-producing E. coli (Non-O157 STEC). Some phage-critical genes were identified from the sequence data through comparison with other sequenced genomes. This study provides researchers with genomic data on lytic E. coli phages, which they can use for sequence comparison and evolutionary relationship analysis. The scientific community can also use phage genome sequencing data to screen for and identify new phage-based antibacterial treatments.

2.1. Phage isolates

The lytic E. coli bacteriophages were isolated from various sources, namely rivers, dairy farm sewage, and hospital fluid wastes, in Addis Ababa, Ethiopia. These seven phages were lytic for five diarrheagenic E. coli strains (Table 1). Bacteriophage isolates were characterized morphologically, biologically, and molecularly using PCR and agarose gel electrophoresis (https://doi.org/10.21203/rs.3.rs-3653371/v2).

Table 1

Bacteriophage isolates used for whole genome sequence and analysis
No	Phage isolates	Isolation source	E. coli host strain
1	EH-B-A (A1)	River	Enterohaemorrhagic E. coli (EHEC/O157)
2	EH-SD-TH	Sediment of hospital waste	Enterohaemorrhagic E. coli (EHEC/O157)
3	EP-M-A	River	Enteropathogenic E. coli (EPEC)
4	EP-B-K (E2)	River	Enteropathogenic E. coli (EPEC)
5	EI-SP-GF	Superficial of dairy farm	Enteroinvasive E. coli (EIEC)
6	ET-SD-TH	Sediment of hospital waste	Enterotoxigenic E. coli (ETEC)
7	ST-T-K	River	Shiga toxin-producing E. coli (Non-O157 STEC)

2.2. DNA extraction and quality analysis

To extract DNA from bacteriophages, the phenol-chloroform organic DNA extraction method was used, followed by an ethanol precipitation step. The quantity and purity of the extracted DNA were determined using a Nanodrop spectrophotometer. Host DNA contamination was evaluated by comparative DNA extraction using DNase, RNase, SDS, and Proteinase K treatment at various levels. The comparative quality and quantity analysis was validated by agarose gel electrophoresis. The extracted, quality-checked, and quantified DNA was used for sequencing.

2.3. Sequencing and assembly

2.3.1. Phage genome sequencing

Phage lysate DNA samples were sent to the Armauer Hansen Research Institute (AHRI), Addis Ababa, Ethiopia, for whole phage genome sequencing using an Illumina NextSeq 500/550 sequencer with 2x150 bp read length. Ten microliters (10 µl) of DNA from each sample was used for whole genome library preparation with a modified Illumina COVIDseq RUO kit. In short, the DNA was enzymatically fragmented and tagmented simultaneously. The tagmented DNA was purified and amplified with a limited number of PCR cycles to add the indexes. The concentration of the libraries was measured using a Qubit HS assay kit. The concentrations are listed in Table 2. The libraries were finally loaded on the sequencer, targeting 80x depth of coverage. The resulting paired-end reads were obtained in FASTQ format.
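The 80x target can be related to read counts through the usual approximation that mean depth equals total sequenced bases divided by genome length. The following Python sketch is illustrative only: the function names and the 40,000 bp example genome are hypothetical, and the estimate ignores losses from trimming, host DNA contamination, and uneven coverage.

```python
import math


def expected_depth(n_reads: int, read_len: int, genome_size: int) -> float:
    """Mean depth of coverage = total sequenced bases / genome length."""
    return n_reads * read_len / genome_size


def read_pairs_for_depth(target_depth: float, read_len: int, genome_size: int) -> int:
    """Smallest number of read pairs (two reads per pair) that reaches target_depth."""
    return math.ceil(target_depth * genome_size / (2 * read_len))


# Hypothetical example: a 40,000 bp phage genome sequenced with 2x150 bp reads.
pairs_needed = read_pairs_for_depth(80, 150, 40_000)
depth_reached = expected_depth(2 * pairs_needed, 150, 40_000)
print(pairs_needed, round(depth_reached, 2))
```

Because phage genomes are small, the number of read pairs required for 80x is modest, so in practice the limiting factor tends to be how many reads survive quality filtering rather than sequencer output.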

Table 2

The concentration of phage DNA library prepared for whole genome sequencing
No	Sample id	Concentration (ng/µl)
1	S41-EH-B-A (A1)	0.106
2	S42-EH-SD-TH	0.152
3	S43-EP-M-A	1.95
4	S44-EP-B-K (E2)	3.20
5	S45-EI-SP-GF	2.9
6	S46-ET-SD-TH	0.152
7	S47-ST-T-K	0.206

2.3.2. Sequence assembly and consensus generation

In this study, a comprehensive analysis workflow was used to generate a high-quality consensus phage genome from Illumina sequencing data. The workflow involved several steps designed to ensure accurate assembly and error detection and correction. Widely adopted bioinformatics tools, namely FastQC, the BBMap package, SPAdes, BWA, SAMtools, and Pilon, were used for data processing and analysis. The quality of raw reads was first checked using FastQC, and the bbduk.sh script from the BBMap package was used for adapter trimming and quality filtering. This was followed by genome assembly using SPAdes. From the resulting assembly, the longest contig was selected as the representative genome sequence. To prepare the longest contig for read alignment, it was indexed using the Burrows-Wheeler Aligner (BWA). Paired-end reads were then aligned to the longest contig using the BWA-MEM algorithm, and the resulting alignment was stored in a Sequence Alignment/Map (SAM) file. To facilitate downstream analysis, the SAM file was sorted by genomic coordinates using SAMtools, generating a sorted Binary Alignment/Map (BAM) file, and an index file (.bai) was then created for the sorted BAM file. The sorted and indexed BAM file was subjected to error detection and correction using Pilon, which uses the alignment information to identify and correct errors, including SNPs, indels, and gaps, resulting in an improved consensus phage genome. The corrected genome assembly was stored as a FASTA file and used for downstream analysis (Shen and Millard, 2021; Petrillo et al., 2022). Variant calling and consensus sequence generation were performed using iVar.
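The step of selecting the longest contig from the assembly can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: the function names and the toy contig names are hypothetical, and the parser assumes a plain multi-FASTA file such as the contigs file an assembler writes.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) tuples from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def longest_contig(handle):
    """Return the (name, sequence) record with the longest sequence."""
    return max(read_fasta(handle), key=lambda record: len(record[1]))


# Toy assembly with three hypothetical contigs.
toy_assembly = io.StringIO(">contig_1\nACGT\n>contig_2\nACGTAC\nGTAC\n>contig_3\nAC\n")
name, sequence = longest_contig(toy_assembly)
print(name, len(sequence))
```

Taking only the longest contig is a simple heuristic; it assumes the phage genome assembled into a single contig and that no longer contig of host origin is present.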

2.4. Phage genome sequence analysis

The entire genome sequences were queried against the viruses (taxid:10239) nucleotide collection (nr/nt) using NCBI blastn with default settings (http://www.ncbi.nlm.nih.gov/genome). In addition, the Genome Detective and PHASTER web-based tools were used to determine genome size and to classify the phage genomes, using sequence-derived taxonomic features, into the lytic phage families Myoviridae, Podoviridae, or Siphoviridae (Arndt et al., 2016; Vilsker et al., 2019). The RAST (http://rast.nmpdr.org/) online annotation server was used to annotate the whole phage genomes, and Genome Detective was used for the identification of CDSs and initial annotation of the phage genomes, including identification of the phage terminase large subunit, major capsid proteins, and phage lytic enzymes. GeneMarkS and a GC content calculator were used to determine the G + C content of the phages. The number of tRNAs was predicted using the web-based tool tRNAscan-SE 2.0 (http://trna.ucsc.edu/tRNAscan-SE/). The Virulence Factor Database (VFDB), CRISPR Finder, and ResFinder were used to screen for virulence determinants, CRISPR-like systems, lysogeny, and genes implicated in antibiotic resistance (Liu et al., 2019; Bortolaia et al., 2020; Couvin et al., 2018). Genome maps of the five phage isolates were created using the Proksee genome analysis software (Grant et al., 2023). To ascertain the diversity of phage genomes and the evolutionary connections between phages, multiple sequence alignment was carried out using ClustalW, and a phylogenetic tree was built using the neighbor-joining method in MEGA11. The major capsid protein and the conserved gene were used as phylogenetic markers for the diversity and evolutionary relationship of each phage isolate. Reference sequences used in the analysis were obtained from the GenBank database. Phylogenetic trees were supported statistically by bootstrapping with 1000 replicates. Homologs were identified in the NCBI GENE database using the nucleotide sequences as queries. The accession numbers of the viruses used in the alignments and phylogenetic analyses are listed on the trees.
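For readers unfamiliar with the G + C calculation, it reduces to counting G and C bases as a percentage of all unambiguous bases. The Python sketch below is an assumption about how such a calculator typically works, not a description of the specific tool used here; in particular, excluding ambiguous bases such as N from the denominator is a choice that differs between tools.

```python
def gc_content(sequence: str) -> float:
    """Return G + C content as a percentage of unambiguous bases (A, C, G, T)."""
    sequence = sequence.upper()
    counts = {base: sequence.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (counts["G"] + counts["C"]) / total


# Toy sequences, not taken from the phage genomes in this study.
print(gc_content("ATGC"))
print(round(gc_content("GGCCNNAT"), 2))
```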

The raw sequence data were submitted to the NCBI Sequence Read Archive (SRA) under BioProject identifier PRJNA1006193, with BioSample identifiers SAMN37015700 to SAMN37015706 and run accessions SRR25691062 to SRR25691068. The five quality-controlled and assembled phage genome sequences were submitted to GenBank with accession numbers OR992643 to OR992647.

3.1. Whole Genome Sequence Analysis

To examine the genomic structures and the potential diversity in the genomes, the genome sequences of the 7 potent, newly isolated bacteriophages were determined and a genomic comparison was conducted. The quality of the raw sequence data for all phages was assessed with FastQC. Sequences of phage isolates EH-B-A (A1) and EH-SD-TH were not analyzed further because their low-quality FastQC results precluded consensus sequence generation. As a result, five sequences were further analyzed and submitted to GenBank with accession numbers OR992643 to OR992647. Genome sequencing of the bacteriophages showed that all the bacteriophages have a linear double-stranded DNA genome. Phage genomes ranged in size from 24,264 to 143,710 bp, with a GC content between 44 and 54% (Tables 3 and 4). According to PHASTER analysis of genome completeness, all phage isolates were intact except for EP-M-A, which was scored as questionable. BLASTn analysis against the NCBI database showed that the phage isolate EP-M-A had the highest genome similarity with Escherichia phage ZCEC5 (GenBank: NC_073321.1), with a nucleotide similarity of 91.81% and a genome coverage of 76%. EP-B-K (E2), EI-SP-GF, ET-SD-TH, and ST-T-K had the highest genome similarity with Escherichia phage K1G (GenBank: NC_027993.1), Escherichia phage slur16 (GenBank: NC_028248.1), Escherichia phage vB_EcoM_ECO1230-10 (GenBank: NC_027995.1), and Escherichia phage vB_EcoS-101114BS4 (GenBank: NC_073061.1), respectively.

Taxonomic classification of the 7 isolated potent coliphages was performed using multiple WGS genome comparisons (http://www.ncbi.nlm.nih.gov/genome/viruses/) and phage DB. These coliphages included 4 (57%) Myoviridae E. coli phages and 3 (43%) Siphoviridae E. coli phages. According to ICTV guidelines, phage family, subfamily, and genus were predicted based on genome similarity. The results are shown in Table 3. EH-B-A (A1), EH-SD-TH, EI-SP-GF, and ET-SD-TH belong to the Myoviridae family. Of these, EH-B-A (A1) and EH-SD-TH were assigned to the Tevenvirinae subfamily and the genus Tequatrovirus, which includes the T4 virus; EI-SP-GF was assigned to the Vequintavirinae subfamily and the genus Vequintavirus; and ET-SD-TH probably belongs to the genus Jilinvirus. EP-M-A, EP-B-K (E2), and ST-T-K belong to the Siphoviridae family. EP-M-A and EP-B-K (E2) were placed in the subfamily Guernseyvirinae, genus Kagunavirus, whereas ST-T-K was placed in the genus Dhillonvirus.

Table 3

The genome characteristics of phage DNA sequences
Sample	Specimen	No raw reads	No trimmed reads	Genome size (bp)	Family	Subfamily	Genus
S41	EH-B-A (A1)	1095	398	131660	Myoviridae	Tevenvirinae	Tequatrovirus T4
S42	EH-SD-TH	4329	1880	143710	Myoviridae	Tevenvirinae	Tequatrovirus T4
S43	EP-M-A	534845	276179	24264	Siphoviridae	Guernseyvirinae	Kagunavirus
S44	EP-B-K (E2)	803828	377929	43553	Siphoviridae	Guernseyvirinae	Kagunavirus
S45	EI-SP-GF	624628	325380	136204	Myoviridae	Vequintavirinae	Vequintavirus
S46	ET-SD-TH	10079	5024	33119	Myoviridae	-	Jilinvirus
S47	ST-T-K	53297	22236	28657	Siphoviridae	-	Dhillonvirus

As indicated in Table 4, 33–218 putative CDSs were identified for the E. coli phages using both automatic and manual annotation. All the coliphage genome sequences had CDSs that encoded the phage terminase small subunit, DNA polymerase, the phage terminase large subunit, the phage lysis enzyme, and the phage capsid and tail proteins. In addition, small terminase subunits, tail fiber, baseplate spike proteins, and phage DNA polymerase were found in the majority of the phage genomes. One to three CDSs for the tail and capsid proteins of each phage were identified. None of the phage genomes contained any known acquired resistance or virulence genes. The tRNAscan-SE-based predictions indicated that EI-SP-GF had 5 tRNAs. The remaining phage isolates had no tRNA genes.

Table 4

Genome Detective and RAST web-based annotations of phage genomes
Phages	No CDS	Functional proteins	No tRNA	GC content (%)	NT (%) Identity	AA (%) Identity	Alignment score	Reference phage
EP-M-A	33	23	No	52	91.8	96.1	119175	Escherichia phage ZCEC5 (taxon:2530021)
EP-B-K (E2)	77	50	No	51	91.35	96.23	108109	Escherichia phage K1G (taxon:698486)
EI-SP-GF	218	169	5	44	94.3	97.2	194829	Escherichia phage slur16 (taxon:1720495)
ET-SD-TH	41	8	No	53	88.1	88.4	76790	Escherichia phage vB_EcoM-ep3 (taxon:1541883)
ST-T-K	37	27	No	54	94.4	96.9	153955	Escherichia phage vB_EcoS-101114BS4 (taxon:2865793)

For the EP-M-A phage isolate, 17 (51.5%) of the 33 predicted CDSs were located on the direct strand, whereas the remaining CDSs were located on the complementary strand (Fig. 1). Ten CDSs (30.3%) were predicted to encode putative proteins, while twenty-three CDSs (69.7%) were predicted to encode functional proteins. The CDSs were attributed to the fundamental phage-related functions of DNA replication/modification, packaging, structural proteins, metabolism/regulation, and host lysis. This phage's genome was devoid of genes encoding a CRISPR/CRISPR-like system, virulence factors, toxins, antibiotic resistance, or hallmarks of temperate phages.

There are 77 putative coding sequences (CDSs) encoded in the whole genome of the EP-B-K (E2) phage isolate. Of these, 27 (35.1%) CDSs had unknown (hypothetical) functions and 50 (64.9%) CDSs had identified functional proteins. There were 22 (28.6%) CDSs on the lagging strand and 55 (71.4%) on the leading strand (Fig. 2). The identified functional proteins include those necessary for phage replication, packaging, metabolism, and host lysis.

The entire circular genomic map of the phage EI-SP-GF is shown in Fig. 3. In the entire genome, 218 CDSs were predicted by genome annotation, of which 169 (77.5%) encode functional proteins and 49 (22.5%) hypothetical proteins. Of the CDSs, 126 (57.8%) were located on the positive strand and 92 (42.2%) on the negative strand. The genome of this phage was predicted to have five tRNAs: tRNA-Arg-TCT, tRNA-Tyr-GTA, tRNA-Thr-TG, tRNA-Met-CAT, and tRNA-Pro-TGG.

According to the whole genome map of the ET-SD-TH phage isolate, a total of 41 CDSs were predicted using the RAST and PHASTER software programs. Of these, 33 (80.5%) CDSs encoded putative proteins, and only 8 (19.5%) CDSs encoded functional proteins. Among these 41 putative genes, 10 were located on the forward strand and 31 on the complementary strand (Fig. 4). No genes encoding putative toxins or antibiotic resistance were found in the phage genome.

Whole genome analysis of the ST-T-K phage isolate yielded 37 CDSs, of which 27 (73%) encoded functional proteins and 10 (27%) putative proteins. Of these, the forward strand had 21 CDSs, while the reverse strand contained the remaining CDSs (Fig. 5). No tRNA gene was detected using the tRNAscan-SE software. The general genome properties of coliphage isolate ST-T-K are summarized in Tables 3 and 4.
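The functional-protein percentages reported for each isolate follow directly from the CDS counts in Table 4. As a quick arithmetic check, they can be recomputed with a short Python sketch; the dictionary and function names are illustrative, and the counts are copied from Table 4.

```python
# (number of CDSs, number of CDSs encoding functional proteins), from Table 4.
TABLE4_COUNTS = {
    "EP-M-A": (33, 23),
    "EP-B-K (E2)": (77, 50),
    "EI-SP-GF": (218, 169),
    "ET-SD-TH": (41, 8),
    "ST-T-K": (37, 27),
}


def functional_percent(n_cds: int, n_functional: int) -> float:
    """Percentage of CDSs annotated as functional proteins, to one decimal place."""
    return round(100.0 * n_functional / n_cds, 1)


percentages = {
    phage: functional_percent(n_cds, n_functional)
    for phage, (n_cds, n_functional) in TABLE4_COUNTS.items()
}
for phage, value in percentages.items():
    print(f"{phage}: {value}%")
```

The smallest and largest values obtained this way correspond to the 19–77% range of functional CDSs quoted in the abstract.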

3.2. Phylogenetic Analysis

In the phylogenetic tree relating the phage isolates to one another, EP-M-A clustered in the same clade as EP-B-K (E2) with a bootstrap value of 100% (Fig. 6); both phages belong to the family Siphoviridae and the genus Kagunavirus. These two isolates also grouped with isolate ET-SD-TH with 100% bootstrap support. Phages EI-SP-GF and ST-T-K fell outside this group but within the same cluster, indicating a more distant evolutionary relationship.

To analyze the evolutionary relationship between the phage isolates and other Caudovirales phages, a phylogenetic tree was constructed based on the nucleotide sequences of the relatively conserved major capsid protein (MCP) phylomarker gene using the neighbor-joining (NJ) method. Constructing an MCP-based phylogenetic tree involves analyzing the MCP gene sequences of different strains retrieved from the database.

For phage isolates EP-M-A and EP-B-K (E2), the major capsid protein (MCP) gene search was limited to the genus Kagunavirus. The MCP gene sequence search for phage isolates EI-SP-GF, ET-SD-TH, and ST-T-K was limited to Vequintavirinae, Jilinvirus, and Dhillonvirus, respectively. NCBI MCP gene sequences were retrieved from the database, the MCP gene was extracted from each phage isolate by mapping against the reference sequences, and multiple sequence alignment was performed in MEGA11 for tree construction. Ten NCBI search hit sequences related to Kagunavirus were obtained; the accession numbers are indicated in the tree. The phylogenetic tree of phage isolate EP-M-A showed that EP-M-A and Escherichia phage vB EcoSfFiEco02 clustered onto a single branch with a bootstrap value of 73%, which supports the inferred relationship (Fig. 7). This branch was related to four other phages with a bootstrap value of 39, meaning that the branch in question was supported in approximately 39% of the resampled trees.

The phylogenetic tree of phage EP-B-K (E2) showed that it was related to Escherichia phage vB EcoSfFiEco02 from the database with 81% bootstrap support for the grouping, and to two other phages with 61% bootstrap support (Fig. 8).

The phage isolate EI-SP-GF grouped with four phages, including Salmonella and Klebsiella phages, with 100% bootstrap support (Fig. 9).

ET-SD-TH grouped with Enterobacter phage Arya with 100% bootstrap support, and ST-T-K grouped with two E. coli phages with 94% bootstrap support (Fig. 10 and Fig. 11).

Multi-drug resistance of pathogenic E. coli has emerged as a result of the overuse of antibiotics, with detrimental effects on food safety, human health, and the environment (Zhong et al., 2023). Researchers are currently looking for novel antibacterial treatments to address the growing problem of bacterial resistance. Phage treatment is one such remedy that is gaining popularity because of its bactericidal properties and bacterial host specificity. Isolation and genome-wide characterization of lytic phages are crucial for the development of phage treatment against AMR bacterial infections. The genomes of lytic phages contain genes that encode multiple putative proteins with unclear functions. Therefore, genomic study can aid in both the taxonomic classification of phages and the identification of critical genes and suitable candidates for phage treatment (Imam et al., 2019).

In this study, seven diarrheagenic E. coli phages were isolated from rivers, dairy farms, and hospital fluid wastes; they infect six strains of E. coli that are resistant to many drugs. The phages' entire genomes were analyzed and characterized. Based on genomic studies, linear dsDNA genomes ranging in length from 24,264 to 143,710 bp and with a GC content varying from 44 to 54% were found in all phages. The phages that infect diarrheagenic E. coli were categorized as small because their genomes were less than 200 kbp in size. These results were in line with the earlier research conducted by Montso et al. (2023), which demonstrated that pathogenic E. coli O177 was also infected by small phages.

These phages were categorized by BLASTn, Genome Detective, and PHASTER analysis into the families Myoviridae and Siphoviridae of the order Caudovirales, within the subfamilies Guernseyvirinae, Tevenvirinae, and Vequintavirinae. Furthermore, the diarrheagenic E. coli phages were assigned to five genera: Tequatrovirus T4 (EH-B-A (A1) and EH-SD-TH), Kagunavirus (EP-M-A and EP-B-K (E2)), Vequintavirus (EI-SP-GF), Jilinvirus (ET-SD-TH), and Dhillonvirus (ST-T-K). The whole genome analysis indicated that all of the phages are virulent, which qualifies them as candidates for phage therapy.

In phage isolates EH-B-A (A1), EH-SD-TH, and ET-SD-TH, only about 20% of coding sequences (CDSs) code for functional proteins, whereas in EP-M-A, EP-B-K (E2), EI-SP-GF, and ST-T-K roughly 65 to 78% of CDSs do so (Table 4). The lower percentage of CDSs (about 20%) coding for functional proteins in the EH-B-A (A1), EH-SD-TH, and ET-SD-TH isolates might be attributed to low read numbers. This suggests that the sequencing depth or coverage for these isolates might be insufficient, resulting in a smaller fraction of identified CDSs. The higher percentage of CDSs coding for functional proteins in the EP-M-A, EP-B-K (E2), EI-SP-GF, and ST-T-K isolates indicates a greater level of genome completeness. These isolates likely have higher quality and depth of sequencing, which enables the identification of a larger proportion of functional protein-coding genes.

The phages of E. coli contain a large number of distinct genes that encode putative and functional proteins. Many of the CDSs identified in the phage genomes, most notably in ET-SD-TH, were predicted to encode hypothetical proteins of unknown function. Comparable findings have been documented for other phages that infect pathogenic bacterial species (Kim et al., 2020; Korn et al., 2021). This suggests that multiple genes with unknown functions are present in phage genomes. Therefore, research efforts must be focused on clarifying the actual roles of these putative proteins.

Among the notable findings was the presence of genes coding for phage DNA replication/modification (DNA polymerase I, DNA helicase, putative DNA cytosine methyltransferase C5, putative HNH endonuclease, DNA methyltransferase, DNA recombination nuclease inhibitor gamma), DNA synthesis and packaging (terminase large subunit, putative terminase small subunit), structural proteins (capsid and tail proteins), and host lysis (phage lysin, u-spanin, putative holin-like class I protein, and putative holin-like class II protein). Two phage genomes (EP-B-K (E2) and EI-SP-GF) differed from the other phages in that they have genes that encode tail fiber and baseplate tail spike proteins. The inclusion of tail fiber and tail spike proteins in a phage genome can improve its infection capabilities and host range because these proteins are essential for phage receptor recognition (Nobrega et al., 2018).

The tRNAscan-SE v. 2.0 analysis indicated that phage isolate EI-SP-GF had tRNAs in its genome, but the isolates EP-M-A, EP-B-K (E2), ET-SD-TH, and ST-T-K had no tRNAs. The absence of tRNA sequences in a phage genome implies that the phage is more reliant on the host cell's resources for translation and may have evolved to exploit the host's existing translational machinery. The presence of tRNAs in the phage genome suggests that the phage has adapted to replicate efficiently within the host cell by utilizing its translation machinery. In this study, the Genome Detective and GeneMarkS-2 analyses showed that the genomes of all phage isolates do not contain genes encoding integrase, recombinase, repressors, or excisionase, which are the main markers of lysogenic viruses (Necel et al., 2020). Therefore, the results indicated that these phages should be considered strictly lytic (virulent) phages.

To obtain a more global phylogenetic overview of the relationships between the different E. coli phage isolates, a whole genome-based alignment of the isolates against each other was used for tree construction. To determine the evolutionary relationship of each phage isolate with available database sequences, major capsid protein gene sequences of the same genus (or of the same subfamily, in the case of isolate EI-SP-GF) were obtained from the database. Bootstrapping, a statistical resampling technique, was used to assess the robustness of the inferred phylogenetic relationships. The resulting trees were compared to calculate the frequency at which a particular branch appears in the replicate trees. This frequency was expressed as a bootstrap value, which represents the statistical support for that branch. Higher bootstrap values (typically ranging from 70 to 100) indicate greater support for the branch (Wiens et al., 2008).

Phylogenetic trees are constructed from similarities and differences in genetic sequences, typically using multiple sequence alignment and evolutionary models. Phages EP-M-A and EP-B-K (E2) clustered together with 100% bootstrap support. Bootstrap support measures the statistical confidence in a particular branch or grouping and is usually given as the percentage of replicate analyses in which that grouping appears. A value of 100% therefore means that EP-M-A and EP-B-K (E2) grouped together in every replicate, which indicates strong statistical confidence that the two phages are closely related and form a distinct cluster in the phage phylogenetic tree. This result is comparable with the study by Al-Shayeb et al. (2020), who constructed a phylogenetic tree using the major capsid protein and found that many of their phage genome sequences clustered together with high bootstrap support, defining clades.
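The idea behind a bootstrap value can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the MEGA11 procedure: instead of inferring a full tree for each replicate, it only asks whether two sequences are each other's nearest neighbour by p-distance after the alignment columns are resampled with replacement. The sequence names and the toy alignment are hypothetical.

```python
import random


def p_distance(s1, s2):
    """Proportion of aligned positions at which two sequences differ."""
    return sum(c1 != c2 for c1, c2 in zip(s1, s2)) / len(s1)


def resample_columns(alignment, rng):
    """Build one bootstrap replicate by drawing columns with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def nearest(name, alignment):
    """Name of the sequence with the smallest p-distance to `name`."""
    others = [n for n in alignment if n != name]
    return min(others, key=lambda n: p_distance(alignment[name], alignment[n]))


def pair_support(alignment, a, b, replicates=1000, seed=0):
    """Percentage of replicates in which a and b are mutual nearest neighbours.

    Mutual nearest neighbours stand in for "a and b form a clade"; a real
    analysis would infer a tree for every replicate instead.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        rep = resample_columns(alignment, rng)
        if nearest(a, rep) == b and nearest(b, rep) == a:
            hits += 1
    return 100.0 * hits / replicates


# Hypothetical toy alignment (equal-length sequences), not study data.
toy_alignment = {
    "phage_1": "ACGTACGTAC",
    "phage_2": "ACGTACGTAC",
    "phage_3": "TGCATGCATG",
    "phage_4": "TGCATGCAGT",
}

print(pair_support(toy_alignment, "phage_1", "phage_2", replicates=200))
```

In this toy case phage_1 and phage_2 are identical and differ from the other two sequences at every position, so they pair in every replicate. With real data, the support value drops as the signal that unites a group becomes weaker or is concentrated in fewer alignment columns.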

The major capsid protein (MCP) gene was used to construct a phylogenetic tree for each phage isolate and to examine its evolutionary relationship with existing database sequences. The MCP gene is regarded as a gold standard marker for classifying lytic phages at the family level. It encodes the major structural protein of the capsid, the shell that encloses the viral genome and protects it during infection, and is therefore essential for maintaining phage structure and function. The MCP gene is highly conserved within a phage family, meaning that it shows a relatively low rate of mutation across isolates of the same family. This conservation makes the gene well suited to phylogenetic tree construction, which helps to elucidate the evolutionary relationships between phage isolates (Lee et al., 2022; Dion et al., 2020).

The MCP gene database search was restricted to the genus Kagunavirus for isolates EP-M-A and EP-B-K (E2), the subfamily Vequintavirinae for EI-SP-GF, the genus Jilinvirus for ET-SD-TH, and the genus Dhillonvirus for ST-TK. Isolates EP-M-A and EP-B-K (E2) were classified within the genus Kagunavirus, which suggests that they share significant similarity in their MCP gene sequences and are closely related evolutionarily, consistent with the report by Grose and Casjens (2014). Isolate EI-SP-GF was classified within the subfamily Vequintavirinae, a taxonomic group of viruses with similar MCP gene sequences. Isolate ET-SD-TH was classified within the genus Jilinvirus and isolate ST-TK within the genus Dhillonvirus. These genera represent distinct groups of viruses with shared MCP gene characteristics. Both the phylogenetic analysis based on the conserved MCP gene and the whole-genome sequence alignment showed that all five phage isolates were closely related to other phages in the database, which implies an intricate evolutionary relationship among these phages.

Although bacteriophages are abundant and highly diverse in nature, the NCBI Genome database contains only a small number of complete phage genomes. In particular, there are very few reports on lytic phages of diarrheagenic E. coli in the E. coli phage database. Phage databases also need further updating and improvement, and knowledge of the genomes of phage hosts remains limited. The genomes and functional annotation of bacteriophages are still largely unexplored. Developing bacteriophages into antimicrobial agents requires a thorough understanding of the phages. Whole genome sequencing is the most effective technique for determining the genetic background of phages and the mechanisms underlying phage-host interaction at the gene level, and it provides a theoretical basis for the genetic modification of phages.

The purpose of this work was to characterize and analyze the whole genome sequences of seven potential lytic phages isolated from hospital liquid waste, dairy farms, and rivers. These phages target multidrug-resistant diarrheagenic E. coli strains that are frequently associated with infections in humans and animals. Genome sequencing showed that all of the bacteriophages have a linear double-stranded DNA genome. The coliphages were assigned to the families Myoviridae and Siphoviridae within the order Caudovirales. According to whole genome sequence alignment and phylogenetic analysis, the phage isolates were closely related to E. coli and Enterobacteria phages in the database. Gene content and putative functions were also determined from the genomes, which revealed genes for phage proteins and bacterial lysis enzymes. Genes responsible for lysogeny, virulence, toxins, and antibiotic resistance were absent from all phage isolates. These data suggest that the phages may be safe to use therapeutically against multidrug-resistant strains of E. coli. After further genome characterization, this knowledge can inform decisions on whether to apply phage-based interventions. The study thus expands our understanding of the genomic diversity of phages and supports research into their potential use in medicine and diagnostics.

List of abbreviations

E. coli Escherichia coli

EAEC Enteroaggregative E. coli

EHEC Enterohemorrhagic E. coli

EPEC Enteropathogenic E. coli

ETEC Enterotoxigenic E. coli

STEC Shiga toxin-producing E. coli

UTI Urinary tract infection

AMR Antimicrobial resistance

CDS Coding sequence

MCP Major capsid protein

BWA Burrows-Wheeler Aligner

BAM Binary alignment map

BBMap Big blue mutation analysis page

Acknowledgements

We would like to thank the Institute of Biotechnology, Addis Ababa University, for hosting this research, and the Department of Biotechnology, Woldia University, for supporting and assessing its progress. Our appreciation also goes to the Armauer Hansen Research Institute for sequencing the phage genomes.

Authors' contributions

TSS collected samples, performed the experiments, analyzed the data, developed the theoretical framework, and wrote the manuscript with input from TST. TST was involved in problem identification, study design, supervision, securing funding, and manuscript write-up. DHA and KMT contributed equally to the wet-lab work: sample quality and quantity checks, NGS library preparation, library quality checks, and sequencing on the Illumina platform.

Funding

There has been no significant financial support for this work that could have influenced its outcome.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request until their release in NCBI.

Ethics approval and consent to participate

There are no human or animal participants in this study.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

References

Al-Shayeb, B., Sachdeva, R., Chen, L. X., Ward, F., Munk, P., Devoto, A., & Banfield, J. F. (2020). Clades of huge phages from across Earth’s ecosystems. Nature, 578(7795), 425-431.
Arndt D, Grant JR, Marcu A, Sajed T, Pon A, Liang Y, Wishart DS. (2016). PHASTER: a better, faster version of the PHAST phage search tool. Nucleic Acids Res;44(W1): W16-21.
Biswas B, Adhya S, Washart P, Paul B, Trostel AN, Powell B, Carlton R, Merril CR. Bacteriophage therapy rescues mice bacteremic from a clinical isolate of vancomycin-resistant Enterococcus faecium. Infect Immun. 2002;70:204–210.
Bortolaia, V., Kaas, R. S., Ruppe, E., Roberts, M. C., Schwarz, S., Cattoir, V., ... & Aarestrup, F. M. (2020). ResFinder 4.0 for predictions of phenotypes from genotypes. Journal of Antimicrobial Chemotherapy, 75(12), 3491-3500.
Couvin, D., Bernheim, A., Toffano-Nioche, C., Touchon, M., Michalik, J., Néron, B., ... & Pourcel, C. (2018). CRISPRCasFinder, an update of CRISRFinder, includes a portable version, enhanced performance and integrates search for Cas proteins. Nucleic acids research, 46(W1), W246-W251.
Cristobal-Cueto, P., García-Quintanilla, A., Esteban, J., & García-Quintanilla, M. (2021). Phages in food industry biocontrol and bioremediation. Antibiotics, 10(7), 786.
Dedrick, R. M. et al. Engineered bacteriophages for treatment of a patient with a disseminated drug- resistant Mycobacterium abscessus. Nat. Med. 25, 730–733 (2019).
Dion, M. B., Oechslin, F., & Moineau, S. (2020). Phage diversity, genomics and phylogeny. Nature Reviews Microbiology, 18(3), 125-138.
Eller MR, Vidigal PM, Salgado RL, Alves MP, Dias RS, et al. (2014) UFV-P2 as a member of the Luz24likevirus genus: A new overview on comparative functional genome analyses of the LUZ24-like phages. BMC Genomics 15: 7.
Golkar Z, Bagasra O, Pace DG (2014) Bacteriophage therapy: a potential solution for the antibiotic resistance crisis. J Infect Dev Ctries 8: 129-136.
Gontijo, M. T., Batalha, L. S., Lopez, M. E., & Albino, L. A. (2017). Bacteriophage genome sequencing: a new alternative to understand biochemical interactions between prokaryotic cells and phages. J. Microb. Biochem. Technol, 9, 169-173.
Grant JR, Enns E, Marinier E, Mandal A, Herman EK, Chen CY, Graham M, Van Domselaar G, Stothard P. (2023). Proksee: in-depth characterization and visualization of bacterial genomes. Nucleic Acids Res. Jul 5;51(W1):W484-W492.
Grose, J. H., & Casjens, S. R. (2014). Understanding the enormous diversity of bacteriophages: the tailed phages that infect the bacterial family Enterobacteriaceae. Virology, 468, 421-443.
Gupta K. Addressing antibiotic resistance. Am J Med. 2002;113(Suppl 1A):29S–34S.
Hungaro HM, Lopez MES, Albino LAA, Mendonça RCS (2014) Bacteriophage: The viruses infecting bacteria and their multiple applications. In: Jull A T. Reference Module in Earth Systems and Environmental Sciences. (1stedn), Elsevier Inc.
Imam, M., Alrashid, B., Patel, F., Dowah, A. S., Brown, N., Millard, A., ... & Galyov, E. E. (2019). vB_PaeM_MIJ3, a novel jumbo phage infecting Pseudomonas aeruginosa, possesses unusual genomic features. Frontiers in Microbiology, 10, 2772.
Islam, M. R., Martinez-Soto, C. E., Lin, J. T., Khursigara, C. M., Barbut, S., & Anany, H. (2023). A systematic review from basics to omics on bacteriophage applications in poultry production and processing. Critical Reviews in Food Science and Nutrition, 63(18), 3097-3129.
Kim, S. G., Lee, S. B., Giri, S. S., Kim, H. J., Kim, S. W., Kwon, J., & Park, S. C. (2020). Characterization of novel Erwinia amylovora jumbo bacteriophages from Eneladusvirus genus. Viruses, 12(12), 1373.
Klumpp J, Fouts DE, Sozhamannan S (2013) Bacteriophage functional genomics and its role in bacterial pathogen detection. Brief Funct Genomics elt009.
Klumpp, J., Fouts, D. E., & Sozhamannan, S. (2012). Next generation sequencing technologies and the changing landscape of phage genomics. Bacteriophage, 2(3), 190-199.
Korn, A. M., Hillhouse, A. E., Sun, L., & Gill, J. J. (2021). Comparative genomics of three novel jumbo bacteriophages infecting Staphylococcus aureus. Journal of virology, 95(19), 10-1128.
Lal T M, Sano M, Hatai K, Ransangan J (2016) Complete genome sequence of a giant Vibrio phage ValKK3 infecting Vibrio alginolyticus. Genom Data 8: 37-38.
Lee, D. Y., Bartels, C., McNair, K., Edwards, R. A., Swairjo, M. A., & Luque, A. (2022). Predicting the capsid architecture of phages from metagenomic data. Computational and Structural Biotechnology Journal, 20, 721-732.
Liu, B., Zheng, D., Jin, Q., Chen, L., & Yang, J. (2019). VFDB 2019: a comparative pathogenomic platform with an interactive web interface. Nucleic acids research, 47(D1), D687-D692.
Lopez MES, Batalha LS, Vidigal PMP, Albino LAA, Boggione DMG et al. (2016) Genome sequence of the enterohemorrhagic Escherichia coli bacteriophage UFV-AREG1. Genome Announc 4: e00412-16.
Montso, P. K., Kropinski, A. M., Mokoena, F., Pierneef, R. E., Mlambo, V., & Ateba, C. N. (2023). Comparative genomics and proteomics analysis of phages infecting multi-drug resistant Escherichia coli O177 isolated from cattle faeces. Scientific Reports, 13(1), 21426.
Muñoz SLD, Koskella B (2014) Bacteria-phage interactions in natural environments. Adv Appl Microbiol 89: 135-183.
Necel, A., Bloch, S., Nejman-Faleńczyk, B., Grabski, M., Topka, G., Dydecka, A., & Węgrzyn, A. (2020). Characterization of a bacteriophage, vB_Eco4M-7, that effectively infects many E. coli O157 strains. Scientific reports, 10(1), 3743.
Negash A, Ejo M (2016) Review on bacteriophages and its antimicrobial uses. American-Eurasian Journal of Scientific Research 11: 199-208.
Nguyen, H. M., Watanabe, S., Sharmin, S., Kawaguchi, T., Tan, X. E., Wannigama, D. L., & Cui, L. (2023). RNA and Single-Stranded DNA Phages: Unveiling the Promise from the Underexplored World of Viruses. International Journal of Molecular Sciences, 24(23), 17029.
Nobrega, F. L., Vlot, M., de Jonge, P. A., Dreesens, L. L., Beaumont, H. J., Lavigne, R., & Brouns, S. J. (2018). Targeting mechanisms of tailed bacteriophages. Nature Reviews Microbiology, 16(12), 760-773.
Petrillo, M., Querci, M., Brogna, C., Ponti, J., Cristoni, S., Markov, P. V. & Van den Eede, G. (2022). Evidence of SARS-CoV-2 bacteriophage potential in human gut microbiota. F1000Research, 11, 292.
Pujato SA, Guglielmotti DM, Martínez-García M, Quiberoni A, Mojica FJ (2017) Leuconostoc mesenteroides and Leuconostoc pseudomesenteroides bacteriophages: Genomics and cross-species host ranges. Int J Food Microbiol 257: 128-137.
Ramirez-Castillo FY, Moreno-Flores AC, Avelar-Gonzalez FJ, Marquez-Diaz F, Harel J, Guerrero-Barrera AL. An evaluation of multidrug-resistant Escherichia coli isolates in urinary tract infections from Aguascalientes, Mexico: cross-sectional study. Ann Clin Microbiol Antimicrob. 2018;17:34.
Ramos S, Silva V, Dapkevicius MLE, Canica M, Tejedor-Junco MT, Igrejas G, Poeta P. Escherichia coli as commensal and pathogenic bacteria among food-producing animals: Health implications of extended spectrum beta-lactamase (esbl) production. Animals (Basel) 2020;10:2239.
Shen, A., & Millard, A. (2021). Phage genome annotation: where to begin and end. Phage, 2(4), 183-193.
Taylor MW (2014) Viruses and man: A history of interactions. (1stedn), Springer International Publishing Switzerland.
Tolstoy, I., Kropinski, A. M., & Brister, J. R. (2018). Bacteriophage taxonomy: an evolving discipline. Bacteriophage Therapy: From Lab to Clinical Practice, 57-71.
Tuem KB, Gebre AK, Atey TM, Bitew H, Yimer EM, Berhe DF. Drug resistance patterns of Escherichia coli in Ethiopia: a meta-analysis. Biomed Res Int. 2018;2018:4536905.
Vilsker, M., Moosa, Y., Nooij, S., Fonseca, V., Ghysens, Y., Dumon, K., & de Oliveira, T. (2019). Genome Detective: an automated system for virus identification from high-throughput sequencing data. Bioinformatics, 35(5), 871-873.
Wiens, J. J., Kuczynski, C. A., Smith, S. A., Mulcahy, D. G., Sites Jr, J. W., Townsend, T. M., & Reeder, T. W. (2008). Branch lengths, support, and congruence: testing the phylogenomic approach with 20 nuclear loci in snakes. Systematic Biology, 57(3), 420-431.
Ye, Y., Tong, G., Chen, G., Huang, L., Huang, L., Jiang, X., & Lin, M. (2023). The characterization and genome analysis of a novel phage phiA034 targeting multiple species of Aeromonas. Virus Research, 336, 199193.
Zampara, A. et al. Developing Innolysins Against Campylobacter jejuni Using a Novel Prophage Receptor-Binding Protein. Front. Microbiol. 12, (2021).
Zhong, Z., Wang, Y., Li, H., Zhang, H., Zhou, Y., Wang, R., & Bao, H. (2023). Characterization and genomic analysis of a novel E. coli lytic phage with extended lytic activity against S. Enteridis and S. Typhimurium. Food Production, Processing and Nutrition, 6(1), 14.
