Screening of promising molecules against potential drug targets in Yersinia pestis by integrative pan and subtractive genomics, docking and simulation approach

doi:10.21203/rs.3.rs-4767929/v1

Research Article

This work is licensed under a CC BY 4.0 License

This study aims to identify novel drug targets in Yersinia pestis, the bacterium responsible for plague, using an integrative approach combining pan-genomic and subtractive genomics methods. The primary objective was to locate targets that do not share homology with human proteins, gut microbiota, or known anti-targets but are crucial for the pathogen's survival. These targets should also exhibit high levels of protein interaction, antibiotic resistance, and conservation across various pathogens. We identified two promising targets: the aminotransferase class I/class II domain-containing protein and 3-oxoacyl-[acyl-carrier-protein] synthase 2. These proteins were modeled using AlphaFold2, validated through several structural analyses, and subjected to molecular docking and ADMET analysis. Molecular dynamics simulations confirmed the stability of the drug-target complexes, indicating their potential as targets for new therapies against Y. pestis.

Yersinia pestis

in silico analysis

drug target identification

virtual screening

novel antibiotics

Yersinia pestis, the bacterium that causes plague, is a potential biological weapon. Humans can contract the disease through bites from infected fleas, direct contact with contaminated material, or inhalation. Plague can lead to severe illness, particularly in its septicemic and pneumonic forms, with fatality rates of 30–100% without treatment [1]. While most infections are treatable with antibiotics, the emergence of antibiotic resistance in Y. pestis has complicated control efforts, especially in the absence of effective vaccines [2]. The 17/95 strain, isolated from a human in 1995, exhibited multiple drug resistance, limiting treatment options [2]. Plague is currently most common in Madagascar, the Democratic Republic of the Congo, and Peru, although potential natural foci exist worldwide [1]. Developing new antibiotics against Y. pestis is therefore essential.

Although Y. pestis is one of the most lethal bacteria and multiple antibiotic-resistant strains have emerged, few studies have focused on identifying new drug targets in this pathogen. Past research has generally targeted virulence factors or essential enzymes common to all microorganisms [3–5], primarily identified through laboratory research and comparative genome analysis years ago [6]. For instance, a 2012 study identified MurE ligase as a potential drug target in Y. pestis CO92 using metabolic pathway analysis [7]. Another study found four novel targets using pan-genomic analysis across nine strains of Y. pestis [8].

This study aims to discover previously unidentified protein targets in Y. pestis using a comprehensive pan-genome analysis and subtractive genomics approach against 28 strains. Furthermore, we aim to screen potential drug candidates using molecular docking and simulation techniques.

In this study, we utilized a subtractive genomics approach to prioritize drug targets against Y. pestis. We selected 28 strains of Y. pestis to identify potential drug targets. Various databases and computational tools, as depicted in Fig. 1, were employed to determine therapeutic targets.

The subtractive genomics approach is a robust method for identifying proteins that are unique to a pathogen and essential for its survival, yet absent in the host. This approach involves a comparative analysis of the proteomes and metabolic pathways of both the host and the pathogen. By subtracting proteins common to both, subtractive genomics reveals pathogen-specific proteins crucial for its survival, thereby ensuring these targets do not interfere with the host's metabolic processes. This methodological framework is instrumental in pinpointing potent drug targets for pathogens.
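
The subtraction described above can be pictured as plain set operations. The following is a minimal sketch, not the pipeline used in this study; the function name and the protein identifiers are hypothetical and serve only to illustrate the idea.

```python
def subtract_host(pathogen_proteins, host_homologs, essential):
    """Return pathogen proteins that are essential and lack a host homolog."""
    pathogen = set(pathogen_proteins)
    # Remove anything with a host homolog, then keep only essential proteins.
    return sorted((pathogen - set(host_homologs)) & set(essential))


# Hypothetical identifiers, for illustration only.
core = ["P1", "P2", "P3", "P4"]
host_like = ["P2"]           # proteins with a significant host hit
essential = ["P1", "P2", "P4"]

candidates = subtract_host(core, host_like, essential)
print(candidates)  # ['P1', 'P4']
```

In practice each set would come from a separate similarity search, so the order of the subtraction steps affects only the intermediate counts, not the final candidate list.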

Retrieval of core proteomes of Y. pestis

The core proteomes of 28 strains of Y. pestis (Fig. 2) were retrieved from the PANX database [9]. PANX is a comprehensive platform for pan-genome analysis and exploration, allowing the extraction of core proteomes specific to pathogens. This streamlined process enables more efficient subsequent data analysis by eliminating irrelevant accessory protein sequences.

Removal of paralogous protein sequences

To ensure a non-redundant dataset, paralogous protein sequences were removed using the Cluster Database at High Identity with Tolerance (CD-HIT) at a 75% identity threshold. Proteins with sequence identities greater than 75% were identified as paralogous, and their complete sequences were excluded from the dataset [10]. Additionally, proteins with fewer than 100 amino acids were removed. This process yielded a refined list of non-paralogous protein sequences for further analysis.
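
The length filter applied after clustering can be sketched as follows. This is an illustrative example only: it assumes the clustered sequences are available as FASTA text, and the helper names `read_fasta` and `filter_by_length` are hypothetical rather than part of CD-HIT.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def filter_by_length(records, min_length=100):
    """Keep sequences with at least `min_length` residues."""
    return {h: s for h, s in records.items() if len(s) >= min_length}


# Made-up sequences: one of 3 residues, one of 120 residues.
example = ">short_protein\nMKT\n>long_protein\n" + "A" * 60 + "\n" + "G" * 60 + "\n"
kept = filter_by_length(read_fasta(example))
print(sorted(kept))  # ['long_protein']
```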

Identification of non-homologous protein sequences

To avoid cross-reactivity with the human proteome and prevent potential therapeutic molecules from binding to homologous host proteins, a BLASTp analysis was conducted. The non-paralogous proteins of Y. pestis were compared against the human (Homo sapiens) RefSeq proteome in the Ensembl genome database [11]. Hits were evaluated at an E-value threshold of 0.005, and proteins without significant hits, or whose hits had less than 50% sequence identity, were treated as non-homologous to human proteins.

Selection of essential non-homologous proteins

The non-homologous protein sequences identified previously were further analyzed using the Database of Essential Genes (DEG, Version 11) [12]. The DEG database provides crucial information on genes and proteins essential for the survival and growth of pathogens, which is instrumental for identifying potential drug targets. Using a threshold E-value of 10^−4 and an alignment length cutoff of 1%, we screened for proteins in Y. pestis that are both essential and non-homologous. These essential proteins play significant roles in synthetic biology and offer potential as high-value therapeutic targets.

Druggability of essential proteins

To assess the druggability of the essential non-homologous proteins, we conducted a BLASTp analysis against proteins that are therapeutic targets and approved by the Food and Drug Administration (FDA). These target proteins were obtained from the DrugBank Database (Version 5) [13] and Therapeutic Target Database (TTD) [14]. This analysis aimed to identify essential proteins with drug-target-like characteristics, prioritizing those that are novel and unique therapeutic targets. An E-value cut-off of 10^−4 was used for this analysis.

Removal of proteins homologous to human ‘anti-targets’

To mitigate the risk of adverse side effects caused by drugs interacting with essential human proteins, we identified and removed proteins homologous to human "anti-targets." Anti-targets are human proteins that, if inhibited by pathogen-targeting drugs, could cause hazardous side effects. We used data compiled from the literature [15], which lists 451 high-confidence anti-targets along with their accession numbers. The corresponding protein sequences were obtained from the NCBI Protein database.

To ensure that our identified drug targets do not share significant similarity with these human anti-targets, a BLASTp analysis was performed using the NCBI BLAST program. The parameters for this analysis were an E-value threshold of < 0.005, a query length > 30%, and an identity > 50%.
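
A filter of this kind can be sketched against BLAST tabular output. The example below is a hedged illustration, not the script used in this study. It assumes the default 12-column tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), interprets "query length > 30%" as query coverage, and uses hypothetical protein names and hit values.

```python
def significant_hits(blast_lines, query_lengths,
                     max_evalue=0.005, min_identity=50.0, min_coverage=30.0):
    """Return the set of query IDs with at least one hit passing all cutoffs."""
    flagged = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        qseqid = fields[0]
        pident = float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        evalue = float(fields[10])
        # Query coverage: aligned span of the query as a percentage of its length.
        coverage = 100.0 * (abs(qend - qstart) + 1) / query_lengths[qseqid]
        if evalue < max_evalue and pident > min_identity and coverage > min_coverage:
            flagged.add(qseqid)
    return flagged


# Hypothetical hits, for illustration only.
rows = [
    "\t".join(["protA", "human1", "62.5", "120", "45", "0",
               "1", "120", "10", "129", "1e-30", "210"]),
    "\t".join(["protB", "human2", "35.0", "80", "52", "0",
               "5", "84", "1", "80", "2e-04", "45"]),
]
query_lengths = {"protA": 300, "protB": 200}

flagged = significant_hits(rows, query_lengths)
retained = sorted(set(query_lengths) - flagged)
print(retained)  # ['protB']
```

Here protA is flagged because its hit passes all three cutoffs, while protB is retained because its identity (35%) falls below the 50% cutoff.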

Conservancy analysis of druggable essential proteins

To determine the potential broad-spectrum applicability of the predicted drug targets, conservancy analysis was performed. The predicted drug target sequences were analyzed for homology with other common pathogenic strains using the BLASTp suite on the NCBI server. In this analysis, protein-protein BLAST was conducted against 222 pathogenic bacteria [16, 17]. Parameters for this analysis included an E-value of < 10^−4 and sequence identity > 20%. Proteins identified in more than 50 distinct pathogenic strains were classified as broad-spectrum targets.
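
The broad-spectrum rule amounts to counting distinct pathogens per target. The sketch below is illustrative only: the function name is hypothetical, the input is assumed to be (target, pathogen) pairs that already passed the E-value and identity cutoffs, and the example uses a lowered threshold so that the toy data are meaningful.

```python
def classify_conservation(hits, min_pathogens=50):
    """Map each target to (distinct pathogens hit, broad-spectrum flag)."""
    by_target = {}
    for target, pathogen in hits:
        # A set ignores repeated hits in the same pathogen.
        by_target.setdefault(target, set()).add(pathogen)
    return {t: (len(p), len(p) > min_pathogens) for t, p in by_target.items()}


# Hypothetical hits; threshold lowered to 2 for this toy example.
hits = [("T1", "a"), ("T1", "b"), ("T1", "b"), ("T1", "c"), ("T2", "a")]
summary = classify_conservation(hits, min_pathogens=2)
print(summary)  # {'T1': (3, True), 'T2': (1, False)}
```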

Host pathogen interaction analysis

To refine our selection of drug targets, we analyzed the sequence similarity of these targets with microbial proteins known to interact with the human host. Sequence information for host-pathogen interacting proteins was obtained from online databases such as HPIDB (version 2.0) [18], PHIbase (version 4.2) [19], and PHISTO (version 2) [20]. Using the BLAST algorithm, we calculated sequence similarity with an E-value threshold of 10^−4 and an alignment length cutoff of 1%. This step ensured that the identified targets have potential interactions relevant to human pathogenic conditions.

Identification of gut microflora non-homologous proteins

The next selection criterion involved screening out proteins with high sequence similarity to human gut microbiota proteins, to avoid potential side effects on gastrointestinal function. The gut microbiota plays a pivotal role in maintaining homeostasis and overall health. To compile a relevant database, sequence information was sourced from published literature [21, 22]. The BLAST algorithm in NCBI was used to compute protein sequence similarity. Parameters for this BLAST analysis included an E-value threshold of < 0.005, query length > 30%, and identity > 50%. This step ensured that potential drug targets would not adversely affect beneficial gut flora.

Protein resistance analysis

To identify potential drug targets that can effectively counteract antibiotic resistance, we included only proteins with significant antibiotic resistance for subsequent structure-based investigations. Targeting these specific proteins could potentially inhibit the mechanisms through which Y. pestis evades existing treatments. For this analysis, the proteins that had passed all previous screening steps were subjected to a BLASTp analysis against the Comprehensive Antibiotic Resistance Database (CARD) [23, 24]. The resulting sequences with high antibiotic resistance were selected for further structure-based studies.

Prediction of subcellular localizations

Knowing the subcellular localization of a protein is crucial for understanding its function and potential as a drug target. Y. pestis, being a Gram-negative bacterium, has five possible subcellular locations: cytoplasm, inner membrane, periplasm, outer membrane, and extracellular space. PSORTb version 3.0.2 was employed to predict the subcellular localizations of proteins that passed the antibiotic resistance screening [25]. PSORTb works by conducting a BLAST search of non-homologous proteins against proteins with known subcellular localizations.

Drug target prioritization

To further refine the selection of ideal drug target candidates for Y. pestis, several crucial protein properties were considered: molecular weight, presence of transmembrane helices, stability, and involvement in biological processes. The ProtParam tool on the ExPASy server was used to predict the molecular weight and stability of the proteins [26]. Ideal drug targets were those with stable physicochemical properties and a molecular weight of less than 100 kDa, making them easier to handle experimentally. Transmembrane helix analysis was then conducted using TMHMM-2.0 [27]. Proteins without transmembrane helices were preferred, as these proteins are generally easier to express and clone, facilitating further laboratory studies and drug development processes. The InterPro database (version 90.0) was used to identify the potential function of the protein targets, as well as the biological processes they are involved in [28]. This step ensured that the selected proteins play critical roles in the pathogen's lifecycle, thereby validating their relevance as drug targets.
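
The prioritization criteria can be expressed as a simple filter. This is a minimal sketch under stated assumptions: the `Candidate` record and its field names are hypothetical, the candidate values are made up, and the preference for proteins without transmembrane helices is treated here as a strict requirement for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    mol_weight_da: float   # molecular weight in daltons
    stable: bool           # stability call from a physicochemical tool
    tm_helices: int        # number of predicted transmembrane helices


def prioritize(candidates, max_weight_da=100_000.0):
    """Keep stable candidates under the weight limit with no TM helices."""
    return [
        c for c in candidates
        if c.stable and c.mol_weight_da < max_weight_da and c.tm_helices == 0
    ]


# Made-up candidates, for illustration only.
pool = [
    Candidate("cand_A", 45_000.0, True, 0),
    Candidate("cand_B", 120_000.0, True, 0),   # too heavy
    Candidate("cand_C", 60_000.0, True, 3),    # membrane protein
    Candidate("cand_D", 50_000.0, False, 0),   # predicted unstable
]
selected = [c.name for c in prioritize(pool)]
print(selected)  # ['cand_A']
```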

Structure prediction and homology modelling

To advance our understanding of the shortlisted protein targets, we examined their structures in the Protein Data Bank (PDB). The BLASTp method was employed to identify a suitable template for protein structure modeling. When a three-dimensional structure was unavailable, we employed AlphaFold2 to model the protein structure [29, 30].

Validation of protein structure

The quality of the modeled structures was rigorously validated using several computational tools, including PROCHECK [31], PROSA [32], and ERRAT [33], to ensure their suitability for docking experiments. PROCHECK was used to evaluate the stereochemical quality of each protein structure by analyzing its residue-by-residue geometry as well as its overall structural geometry. PROSA was employed to evaluate the quality of the three-dimensional structural model against the available structures in the PDB, based on the Z-score. ERRAT analyzed the statistics of non-bonded atomic interactions in the protein model, identifying potential errors in the protein structure.

Active site prediction

Upon confirmation of the protein structures, it was essential to identify the active sites where ligands could bind to exert their effects. For this purpose, the online tool POCASA was utilized. POCASA is an automated program that uses the Roll algorithm to predict binding sites by detecting pockets and cavities within proteins of known 3D structure [34].

Ligand extraction

Ligands were extracted from the Traditional Chinese Medicine (TCM) Bank database and the DrugBank database. TCMBank (https://tcmbank.cn) is the largest comprehensive traditional Chinese medicine database, providing standardized information on traditional Chinese herbs, ingredients, diseases, and their corresponding gene targets. It currently contains data on 9,191 herbs, 61,965 ingredients, and 32,529 diseases (released 2024-6-10) [35]. We extracted all 61,965 ingredients from TCMBank to ensure a broad spectrum of natural compounds for potential therapeutic applications. DrugBank is an open archive of drugs, currently consisting of 3,820 approved molecules, 11,212 experimental drugs, and 5,371 investigational drug molecules (version 5.1.10, released 2024-6-20) [36]. From DrugBank, we collected molecules that are approved, under experimental evaluation, and in clinical trials, expanding our pool of potential drug candidates. The combination of traditional and modern pharmacological compounds allowed us to explore a diverse range of bioactive molecules, enhancing the likelihood of identifying effective inhibitors against the selected drug targets in Y. pestis.

Molecular docking and ADMET analysis

Molecular docking simulations were performed using the GLIDE module of Schrödinger [37]. The identified ligands from previous steps were docked with the active sites of the curated protein targets. The protein structures were preprocessed using the Protein Preparation Wizard, which involved adding missing hydrogen atoms, assigning bond orders, and optimizing hydroxyl groups and hydrogen bonds. Energy minimization was performed using the OPLS4 force field, with non-hydrogen atoms minimized to an RMSD value of 0.3 Å. The receptor grid was generated based on the centroid of residues identified by POCASA. Ligand structures were prepared using LigPrep, optimizing their geometries with the OPLS4 force field, and generating structures within the default pH range using Epik.

Docking was executed using the Ligand Docking module, employing the standard precision (SP) method. The top 100 molecules were selected according to binding affinity and docked three times to ensure consistency. From these, 10 molecules were shortlisted based on their average binding affinities. These molecules were then screened using Lipinski's rule of five with the ChemAxon tool [38].
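
The shortlisting logic (averaging repeated docking runs, ranking, then applying Lipinski's rule of five) can be sketched as follows. This is an illustration and not the Schrödinger or ChemAxon workflow: the function names, ligand names, and scores are hypothetical, more negative scores are assumed to indicate stronger predicted binding, and the rule of five is applied strictly with no violations allowed.

```python
from statistics import mean


def rank_by_mean_score(scores, top_n=10):
    """Rank ligands by mean docking score across repeated runs."""
    averaged = {ligand: mean(runs) for ligand, runs in scores.items()}
    # Ascending sort puts the most negative (best) scores first.
    return sorted(averaged.items(), key=lambda item: item[1])[:top_n]


def passes_rule_of_five(mol_weight, logp, h_donors, h_acceptors):
    """Lipinski's rule of five, applied strictly."""
    return (mol_weight <= 500 and logp <= 5
            and h_donors <= 5 and h_acceptors <= 10)


# Made-up scores from three docking runs per ligand.
scores = {
    "lig_A": [-9.0, -9.2, -8.8],
    "lig_B": [-7.5, -7.4, -7.6],
    "lig_C": [-8.1, -8.3, -8.2],
}
top_two = [name for name, _ in rank_by_mean_score(scores, top_n=2)]
print(top_two)  # ['lig_A', 'lig_C']
print(passes_rule_of_five(350.0, 2.1, 3, 6))  # True
```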

To evaluate the pharmacokinetic properties of the selected compounds, ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) analyses were conducted using the HelixADMET and ADMETlab (version 2.0) web servers [39, 40]. The absorption of the molecules was predicted using human intestinal absorption (HIA), Caco-2 permeability, and human oral bioavailability (HOB). Distribution of drug molecules was assessed by considering blood-brain barrier (BBB) penetration, plasma protein binding (PPB), and volume of distribution. Metabolism was evaluated using models for inhibition of the detoxifying cytochrome P450 enzymes CYP2C9, CYP2D6, CYP3A4, and CYP2C19. Excretion of drugs was predicted using half-life, while toxicity was assessed based on AMES toxicity, hepatotoxicity, and carcinogenicity.

Based on the outcomes of the screening process described above, the most promising candidates were subsequently subjected to molecular dynamics (MD) simulations to assess the structural stability, integrity, flexibility, and compactness of the protein-ligand complexes.

Molecular dynamics simulation

To assess the dynamic behavior of the docked complexes following ADMET analysis, they were subjected to Molecular Dynamics (MD) simulations using the Schrödinger Biosuite's Desmond software. The aim was to scrutinize the stability of these complexes, specifically the interactions between the target proteins and the most promising ligands identified through virtual screening, over an extended 100-nanosecond (ns) simulation period.

The initial stages of MD simulation involved configuring the solvent model, force field, and boundary conditions through the system builder. The OPLS4 force field was employed for these simulations, with the complexes centered in an orthorhombic box and a 10 Å distance imposed as boundary conditions. The solvent molecules were modeled using the simple point charge (SPC) model, and counter ions were added to neutralize the system's charge. A 0.15 mol/L NaCl solution was included to simulate the concentration of physiological saline in the human body. Subsequently, the energy of the complex structures was minimized using the steepest descent algorithm, forming part of the equilibration protocol within the Desmond framework.

The MD simulations were conducted for a 100 ns trajectory within the NPT ensemble class. Temperature control was maintained at a constant 310 K using the Nose-Hoover chain method, and pressure was stabilized at 1 bar using the Martyna-Tobias-Klein method. Comprehensive analyses of the simulation events included assessing the Root Mean Square Deviation (RMSD) of protein and ligand bonds, Root Mean Square Fluctuation (RMSF) of the complex, Radius of Gyration (RoG), Solvent Accessible Surface Area (SASA), and intermolecular hydrogen bonds. These analyses provided valuable insights into the stability of the ligands within the binding sites of the protein complexes. Additionally, the solvent-accessible properties of the protein-ligand complexes were examined to understand their interaction with the surrounding environment.
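
Two of the metrics named above, RMSD and radius of gyration, have simple definitions that can be sketched directly. The example below is illustrative and is not Desmond's implementation: the function names are hypothetical, coordinates are plain (x, y, z) tuples, and the RMSD function assumes the frame has already been superimposed on the reference.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two pre-aligned coordinate sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equal in size")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration; equal masses if none are given."""
    if masses is None:
        masses = [1.0] * len(coords)
    total_mass = sum(masses)
    center = [
        sum(m * c[i] for m, c in zip(masses, coords)) / total_mass
        for i in range(3)
    ]
    weighted_sq = sum(
        m * sum((c[i] - center[i]) ** 2 for i in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


# Toy coordinates: every atom shifted by 1 unit along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(reference, frame))                                  # 1.0
print(radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))  # 1.0
```

A trajectory analysis would apply these functions frame by frame, which yields the time series that are then averaged and plotted.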

This study had two primary objectives: identifying novel therapeutic targets against Y. pestis through pan-genome analysis and subtractive genomics, and identifying potential drug candidates from the TCMBank and DrugBank databases. A total of 2,794 core genes and the associated 2,971 proteins of 28 Y. pestis strains were obtained from the PanX database. The accession numbers, strain names, and initial core protein sequences are provided in Supplementary Table S1 and Supplementary File 1. The step-wise filtering of the proteins is summarized in Table 1. Using the CD-HIT tool, we screened out 2,864 paralogous protein sequences from the initial 2,971 proteins, leaving 107. After removing proteins shorter than 100 amino acids, 99 proteins remained for analysis (Supplementary File 2). A BLASTp search revealed that 6 of these proteins shared similarity with human proteins; these were excluded to avoid potential cross-reactivity and toxicity, leaving 93 non-homologous proteins for further analysis (Supplementary File 3). We identified 68 essential proteins crucial for the survival of Y. pestis using the DEG database (Supplementary File 4). Subsequently, the 68 essential proteins were subjected to BLASTp analysis against the DrugBank database to identify those with substantial sequence similarity to FDA-approved therapeutic targets. This identified 35 druggable proteins (Supplementary File 5). The remaining 33 proteins were flagged for further research as potential novel targets. We then conducted BLASTp alignments of the 35 druggable proteins against the set of 451 anti-targets. Thirty-three proteins exhibited significant similarity to anti-targets and were excluded, leaving two proteins for further analysis (Supplementary File 6).

The two proteins were conserved in 54 and 42 pathogenic bacteria, respectively, suggesting their potential as broad-spectrum drug targets. Detailed alignment information and lists of conserved pathogens are provided in Supplementary Files 7 and 8. BLASTp analysis against the HPIDB, PHIbase, and PHISTO databases showed that both target proteins had unique host-pathogen interactions. Both were also identified as antibiotic resistant in the search against CARD. The BLAST alignment against proteins of the human gut microbiota showed that neither of the two proteins shared homology with human gut microbiota proteins, reducing the risk of drug-induced gastrointestinal discomfort.

Both proteins were predicted to be cytoplasmic, according to PSORTb, making them suitable targets for small-molecule drugs. They were also found to be stable with molecular weights below 100 kDa, as determined by ProtParam.

Functional analyses indicated that the two targets were an aminotransferase class I/class II domain-containing protein (Uniprot: A0A3N4AYN0) and 3-oxoacyl-[acyl-carrier-protein] synthase 2 (Uniprot: A0A0H2YHQ2). The TMHMM server showed that neither protein had transmembrane helices. The functions, putative names, associated biological processes, and Uniprot accession numbers were confirmed through InterProScan and BLASTp searches in the NCBI database, as detailed in Table 2.

Table 1

Subtractive genomics analysis based on core sequences from 28 Y. pestis strains
No.	Steps involved in the study	Identified proteins
1	Core proteome of Y. pestis	2971
2	Proteins left after removal of paralogous sequences	107
3	Proteins with more than 100 amino acids	99
4	Non-homologous proteins	93
5	Essential non-homologous proteins	68
6	Druggable essential proteins	35
7	Target proteins non-homologous to ‘anti-targets’	2
8	Highly conserved proteins among pathogens	2
9	Target proteins showing host interaction	2
10	Proteins left after removal of sequences homologous to human gut microbiota proteomes	2
11	Proteins with antibiotic resistance	2
12	Subcellular localization	2
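
The counts in Table 1 can be tallied to show how many proteins each filtering step removed. The snippet below is a small bookkeeping sketch; the step labels are shortened paraphrases of the table rows, and the function name is hypothetical.

```python
# Remaining protein counts after each step, taken from Table 1.
funnel = [
    ("Core proteome", 2971),
    ("After paralog removal", 107),
    ("At least 100 amino acids", 99),
    ("Non-homologous to human", 93),
    ("Essential", 68),
    ("Druggable", 35),
    ("Non-homologous to anti-targets", 2),
]


def removed_per_step(steps):
    """Return (step label, number of proteins removed at that step)."""
    return [
        (steps[i][0], steps[i - 1][1] - steps[i][1])
        for i in range(1, len(steps))
    ]


removed = removed_per_step(funnel)
for label, count in removed:
    print(f"{label}: {count} removed")
```

With these counts, paralog removal accounts for 2,864 of the excluded proteins, the human homology filter for 6, and the anti-target filter for 33.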

Table 2

Details of selected drug targets
Protein’s name in UniProt (putative)	Accession No. in UniProt	Accession No. in NCBI	Subcellular localization	Transmembrane Helices	Molecular Weight (Da)	Stability	Molecular Function	Biological Process
Aminotransferase class I/class II domain-containing protein	A0A3N4AYN0	AAM85046.1	Cytoplasmic	0	44594.08	stable	pyridoxal phosphate binding	biosynthetic process
3-oxoacyl-[acyl-carrier-protein] synthase 2	A0A0H2YHQ2	WP_002213032.1	Cytoplasmic	0	46369.74	stable	acyltransferase activity, transferring groups other than amino-acyl groups	fatty acid biosynthetic process

Three-dimensional structure prediction

The 3D structures of the two therapeutic targets, A0A3N4AYN0 and A0A0H2YHQ2, were both unavailable in the PDB. Therefore, their sequences were used for modelling with AlphaFold2. The modeled structures are depicted in Fig. 3.

Validation of modelled structure

To ensure the accuracy and quality of the predicted 3D structures of the identified proteins, we employed several online validation tools: ProSA-web, PROCHECK, and ERRAT. The quality scores for these structures are presented in Table 3.

ProSA-web was used to assess the overall quality of the 3D structures by calculating the Z-score. The Z-score indicates structural reliability by comparing the model to a database of structures determined by NMR spectroscopy and X-ray crystallography. High-quality models typically have Z-scores within the range of native protein structures. For our targets, A0A3N4AYN0 and A0A0H2YHQ2, the Z-scores were approximately −10, indicating that their structures fall within the acceptable range of protein models (Supplementary File 9). The PROCHECK tool was employed to assess the 3D structures through Ramachandran plot analysis, which shows the distribution of backbone dihedral angles in the protein. For both target proteins, over 90% of the residues were located in the most favored regions of the plot, less than 9% in the additionally allowed regions, no more than 0.5% in the generously allowed regions, and 0% in the disallowed regions. These results supported the high quality and accurate folding of the modeled structures. The detailed Ramachandran plots for A0A3N4AYN0 and A0A0H2YHQ2 are provided in Supplementary File 10. The ERRAT tool was used to evaluate the consistency of the protein structures with their expected atomic interactions. Higher ERRAT scores reflect better agreement with standard protein models. The ERRAT scores for A0A3N4AYN0 and A0A0H2YHQ2 were 96.3542 and 97.6019, respectively, indicating that both models are of high quality. Detailed quality metrics for both protein targets are provided in Supplementary File 11.

Table 3

Validation scores of 3D structures of drug targets
Scores	A0A3N4AYN0	A0A0H2YHQ2
ProSA-web
Z-score	-10.02	-10.22
PROCHECK-Ramachandran plot (%)
Core	91.1%	92.8%
Allowed	8.6%	6.6%
General	0.3%	0.5%
Disallowed	0	0
ERRAT
Quality factor	96.3542	97.6019

Binding site prediction

The POCASA tool was utilized to identify the active sites within the two target proteins. For each protein, five cavities were detected. The largest cavity in protein A0A3N4AYN0 had a volume of 257 Å³, a volume depth (VD) of 791, and an average VD of 3.07912 Å (Fig. 4A). Similarly, for protein A0A0H2YHQ2, the largest cavity had a volume of 361 Å³, a VD of 1140, and an average VD of 3.15974 Å (Fig. 4B). Both proteins were then subjected to molecular docking and molecular dynamics simulation.

Molecular docking and ADMET analysis

In this study, an extensive molecular docking analysis was conducted involving 20,403 compounds from DrugBank and 61,965 compounds from TCM Bank. These compounds underwent preprocessing using Schrödinger's LigPrep, applying molecular weight filters to select molecules with weights between 150 and 500 Da. The prepared ligands were then docked against the selected target proteins, A0A3N4AYN0 and A0A0H2YHQ2, to identify optimal binding orientations and evaluate binding affinities.

The top 10 ligands for each protein were shortlisted based on binding affinity. For A0A3N4AYN0, the binding affinities ranged from −9.014 to −8.347, while for A0A0H2YHQ2, the affinities ranged from −8.788 to −8.184. Of these, six ligands targeting A0A3N4AYN0 and five ligands targeting A0A0H2YHQ2 adhered to Lipinski's Rule of Five (Ro5), highlighting their favorable physicochemical properties. Table 4 lists these docked compounds, their 2D structures, and the binding residues for each target protein.

Table 5 presents the ADMET assessment results, based on the ADMETlab and HelixADMET tools, for the docked molecules that adhered to Ro5. Among the six ligands of A0A3N4AYN0, chelidonate and 4,4'-dihydroxytruxillic acid demonstrated high Caco-2 cell permeability, HIA, and HOB, while (S)-mandelic acid O-beta-D-glucopyranoside exhibited lower oral absorption. Regarding distribution, all compounds except phloretin and 4,4'-dihydroxytruxillic acid recorded PPB values < 90%, indicating low plasma protein binding, which leaves a higher fraction of unbound drug available for diffusion. Most of the ligands displayed weak absorption into the central nervous system (CNS), except for phloretin and (S)-mandelic acid O-beta-D-glucopyranoside. The volume of distribution analysis indicated that all six ligands were mainly confined to blood, bound to plasma proteins, or highly hydrophilic.

In terms of inhibitory activity, phloretin was the only molecule showing metabolism inhibition, which makes it less attractive for further study. The half-life values indicate the likelihood that a compound has a half-life exceeding 3 hours. Miraxanthin-I, (S)-mandelic acid O-beta-D-glucopyranoside, and apigenin 7-glucuronide demonstrated long half-lives.

Four ligands showing good docking scores and non-toxic profiles were selected for molecular dynamics simulations: miraxanthin-I, chelidonate, (S)-mandelic acid O-beta-D-glucopyranoside, and apigenin 7-glucuronide.

For compounds with lower oral absorption, such as miraxanthin-I, (S)-mandelic acid O-beta-D-glucopyranoside, and apigenin 7-glucuronide, potential improvements can be achieved through formulation optimization, which may involve techniques like solid dispersions, nanoparticles, or liposomes. Sustained-release formulations could improve compounds with short half-lives.

Regarding the A0A0H2YHQ2 protein, most of the ligands demonstrated elevated HIA and oral bioavailability, though Caco-2 permeability was below 0.3. Furthermore, enhanced absorption into the CNS was observed for most ligands. DPA and salvianolic acid A, with high PPB, suggested higher bloodstream distribution. Analysis of volume distribution indicated that all ligands primarily resided within the blood, bound to plasma proteins or displaying high hydrophilicity.

In terms of metabolism, salvianolic acid A and dofetilide displayed inhibition of CYP2C9 and CYP2D6, respectively. Salvianolic acid A had a shorter half-life. Notably, all ligands were non-toxic concerning AMES toxicity and carcinogenicity, while varying degrees of hepatotoxicity were observed. Consequently, valomaciclovir and miraxanthin-III were selected for molecular dynamics simulations.

The docked complex structures of the ligands that passed the ADMET assessments for protein A0A3N4AYN0 and protein A0A0H2YHQ2 are shown in Fig. 5 and Fig. 6, respectively.

Molecular dynamics simulation of protein-ligand complexes

RMSD analysis

The stability of each complex was evaluated using Root Mean Square Deviation (RMSD) at regular intervals. For A0A3N4AYN0, the average RMSD values of the protein backbone in complex with miraxanthin-I, chelidonate, (S)-mandelic acid O-beta-D-glucopyranoside, and apigenin 7-glucuronide were 0.28, 0.26, 0.26, and 0.27 nm, respectively. The highest RMSD values observed were 0.33, 0.33, 0.34, and 0.34 nm, respectively, indicating the overall stability of these complexes. Minor fluctuations were noted during the initial 10 nanoseconds (ns), after which the RMSD values stabilized for the remaining simulation period (Fig. 7A). The average RMSD values for the A0A0H2YHQ2-valomaciclovir and A0A0H2YHQ2-miraxanthin-III complexes were 0.23 and 0.27 nm, with peak values of 0.30 and 0.42 nm, respectively. For the A0A0H2YHQ2-valomaciclovir complex, minor structural deviations were observed at 25 ns, after which it remained stable. Conversely, frequent protein structure deviations were observed in the A0A0H2YHQ2-miraxanthin-III complex at approximately 25 ns, 40 ns, 76 ns, and after 80 ns, suggesting lower stability than A0A0H2YHQ2-valomaciclovir (Fig. 7A).

The stability of the ligand in the binding pocket was investigated by measuring ligand RMSD. Average ligand RMSD values of 1.39, 1.32, 0.57, and 0.62 were obtained from A0A3N4AYN0-miraxanthin-I, A0A3N4AYN0-chelidonate, A0A3N4AYN0-(S)-mandelic acid O-beta-D-glucopyranoside, and A0A3N4AYN0-apigenin 7-glucuronide complexes, respectively. The low RMSD values of (S)-mandelic acid O-beta-D-glucopyranoside and apigenin 7-glucuronide indicated strong and stable binding within the pockets. The ligand of A0A0H2YHQ2-valomaciclovir showed fluctuations at ~32 ns but remained stable until the end of the 100 ns simulation. Conversely, significant deviations for miraxanthin-III were observed from 20 to 40 ns, after which stability was achieved (Fig. 7B).

RMSF analysis

The root mean square fluctuation (RMSF) analysis was conducted to examine the fluctuations of the protein chain during the simulation, providing insights into the dynamic characteristics of the protein and its interaction with ligands. The RMSF plots revealed fluctuations at several positions for all complexes, indicating potential sites of ligand interaction and catalysis. For the A0A3N4AYN0 complexes, fluctuations of up to 0.75 Å were observed at residues 22, 60 to 62, 146, 240 to 275, and 339, suggesting that these residues may play critical roles in ligand binding and catalytic activity (Fig. 7C). In the A0A0H2YHQ2 complexes, higher fluctuations of up to 0.85 Å were noted at residues 133 to 147, with slight fluctuations of up to 0.35 Å at residues 282 to 289. This pattern implies that the protein dynamics are similar with both ligands bound (Fig. 7D).
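
The per-residue quantity discussed above can be sketched as follows. This is a minimal Python illustration, assuming one representative atom per residue (for example, the C-alpha atom) and frames that have already been fitted to a common reference; the function name and coordinates are hypothetical. Because RMSD is reported in nm and RMSF in Å in this section, the sketch also shows the unit conversion (1 nm = 10 Å).

```python
import math

NM_TO_ANGSTROM = 10.0  # 1 nm = 10 Å

def rmsf(trajectory):
    """Per-atom RMSF from a list of frames; each frame is a list of (x, y, z)."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean_pos = [
            sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)
        ]
        # Mean squared displacement of atom i from its average position.
        msd = sum(
            sum((frame[i][k] - mean_pos[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msd))
    return result

# Hypothetical two-residue trajectory, coordinates in nm.
trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)],
]
fluct_nm = rmsf(trajectory)
fluct_angstrom = [value * NM_TO_ANGSTROM for value in fluct_nm]
```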

Radius of Gyration

The radius of gyration (Rg) serves as a key metric for assessing the compactness and folding characteristics of the proteins, as well as for evaluating the impact of ligand binding on the protein’s conformational stability. Throughout the 100 ns simulation period, all the complexes displayed remarkably stable and compact Rg values. This stability reflected the structural integrity and compact nature of the protein-ligand complexes, showing no significant structural deviations throughout the simulation (Fig. 7E).
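
For reference, Rg is the mass-weighted root mean square distance of the atoms from their centre of mass. The following minimal Python sketch computes it for a single frame; the function name, coordinates, and masses are hypothetical and serve only to illustrate the definition.

```python
import math

def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of one frame.

    coords: list of (x, y, z) tuples; masses: list of atomic masses.
    """
    total_mass = sum(masses)
    # Centre of mass of the structure.
    com = [
        sum(m * r[k] for m, r in zip(masses, coords)) / total_mass
        for k in range(3)
    ]
    # Mass-weighted sum of squared distances from the centre of mass.
    weighted = sum(
        m * sum((r[k] - com[k]) ** 2 for k in range(3))
        for m, r in zip(masses, coords)
    )
    return math.sqrt(weighted / total_mass)

# Hypothetical two-atom example: equal masses placed 2 nm apart.
coords = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
masses = [12.0, 12.0]
rg = radius_of_gyration(coords, masses)
```

Evaluating this function frame by frame gives the Rg time series; a flat series corresponds to the compact, stable behaviour described above.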

Solvent-accessible surface area analysis

The solvent-accessible surface area (SASA) measures the extent to which a molecule's surface is accessible to solvent molecules, providing insights into the exposure of the protein-ligand complex to the surrounding environment. The SASA values for all complexes were monitored over the simulation period and averaged (Fig. 7F). For the A0A3N4AYN0 complexes, the (S)-mandelic acid O-beta-D-glucopyranoside complex had the highest average SASA, 180.59 ± 2.60 nm². The complexes with chelidonate, miraxanthin-I, and apigenin 7-glucuronide displayed lower SASA values of 175.05 ± 2.93 nm², 179.66 ± 3.32 nm², and 179.77 ± 2.58 nm², respectively. For the A0A0H2YHQ2 complexes, the valomaciclovir-bound form had an average SASA of 177.20 ± 2.95 nm², while the miraxanthin-III complex exhibited a higher SASA of 186.29 ± 3.01 nm². These results suggest that the A0A3N4AYN0-chelidonate complex had the least solvent exposure, hinting at its comparatively greater stability. Similarly, the lower SASA value for the A0A0H2YHQ2-valomaciclovir complex suggests enhanced stability compared to the A0A0H2YHQ2-miraxanthin-III complex.
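
The "mean ± deviation" summaries above can be reproduced from a SASA time series along the following lines. This is a minimal Python sketch: the time series is hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used. The dictionary of averages, by contrast, contains the A0A3N4AYN0 values reported in this section.

```python
import statistics

def summarize(series):
    """Return (mean, sample standard deviation) of a time series."""
    return statistics.mean(series), statistics.stdev(series)

# Hypothetical SASA time series in nm^2 (not data from this study).
sasa_series = [176.0, 178.0, 180.0]
mean_sasa, sd_sasa = summarize(sasa_series)

# Average SASA values (nm^2) reported above for the A0A3N4AYN0 complexes.
reported_mean_sasa = {
    "(S)-mandelic acid O-beta-D-glucopyranoside": 180.59,
    "chelidonate": 175.05,
    "miraxanthin-I": 179.66,
    "apigenin 7-glucuronide": 179.77,
}
# Ligand whose complex has the smallest average solvent exposure.
least_exposed = min(reported_mean_sasa, key=reported_mean_sasa.get)
```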

Hydrogen bond analysis

Hydrogen bonds play a critical role in the stability and specificity of protein-ligand interactions. Throughout the molecular dynamics simulations, the formation and disruption of these bonds were tracked to better understand the binding dynamics and stability of the complexes. The average number of hydrogen bonds formed in the A0A3N4AYN0 complexes ranged from 1 to 2 for most ligands. However, the complex with chelidonate displayed a notably different pattern, suggesting distinct interaction dynamics. Overall, the presence of consistent hydrogen bonding indicated stable interactions for most of the A0A3N4AYN0 complexes (Fig. 8). For the A0A0H2YHQ2 complexes, the valomaciclovir-bound form maintained an average of around 2 hydrogen bonds throughout the simulation. In contrast, the miraxanthin-III complex formed approximately 1 hydrogen bond on average, suggesting that the A0A0H2YHQ2-valomaciclovir interaction was more stable and that valomaciclovir may be the more effective inhibitor (Fig. 9).
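
Hydrogen bonds in a trajectory are usually counted with a geometric criterion. The minimal Python sketch below illustrates one such criterion and the per-frame averaging. The cutoffs used here (donor-acceptor distance of at most 0.35 nm and hydrogen-donor-acceptor angle of at most 30°) are commonly used values and are assumptions, not parameters taken from this study; all names and coordinates are hypothetical.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _norm(v):
    return math.sqrt(sum(x * x for x in v))

def is_hydrogen_bond(donor, hydrogen, acceptor, max_dist=0.35, max_angle=30.0):
    """Geometric test: donor-acceptor distance (nm) and H-donor-acceptor angle (deg).

    The default cutoffs are assumed, commonly used values.
    """
    da = _sub(acceptor, donor)
    dh = _sub(hydrogen, donor)
    dist, dh_len = _norm(da), _norm(dh)
    if dist == 0.0 or dh_len == 0.0 or dist > max_dist:
        return False
    cos_angle = sum(x * y for x, y in zip(da, dh)) / (dist * dh_len)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return math.degrees(math.acos(cos_angle)) <= max_angle

def mean_hbond_count(frames):
    """Average number of hydrogen bonds per frame.

    Each frame is a list of (donor, hydrogen, acceptor) coordinate triplets.
    """
    counts = [sum(is_hydrogen_bond(*triplet) for triplet in frame) for frame in frames]
    return sum(counts) / len(counts)

# Hypothetical candidate triplets, coordinates in nm.
linear = ((0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.3, 0.0, 0.0))     # 0.30 nm, 0 deg
bent = ((0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.3, 0.0))       # 0.30 nm, 90 deg
too_far = ((0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.5, 0.0, 0.0))    # 0.50 nm, 0 deg
frames = [[linear, bent], [linear, too_far], [bent]]
average = mean_hbond_count(frames)
```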

The molecular dynamics simulations confirmed the stability and compact nature of all complexes, revealing consistent and stable interactions between protein and ligand, as evidenced by the backbone RMSD, ligand RMSD, RMSF, Rg, SASA, and hydrogen bond analyses. The results suggest that chelidonate for A0A3N4AYN0 and valomaciclovir for A0A0H2YHQ2 are the most stable and promising candidates among the studied complexes. Notably, despite the low number of hydrogen bonds in the A0A3N4AYN0-chelidonate complex, the simulations indicated its considerable stability. This could be attributed to pi-interactions between chelidonate and residues TYR123 (pi-pi stacking) and LYS237 (pi-cation), as well as the salt bridge formed with residue ARG368.

Conclusion

In our comprehensive study of the Yersinia pestis proteome, we identified and characterized two distinct drug target proteins, A0A3N4AYN0 and A0A0H2YHQ2. These targets are crucial due to their exclusive presence in Yersinia pestis and the broad-spectrum potential of A0A0H2YHQ2, which is found in diverse pathogens.

Through high-throughput virtual screening, docking analyses, and molecular dynamics simulations, we identified several promising candidate compounds. Notably, chelidonate emerged as a candidate inhibitor of the A0A3N4AYN0 target, showing strong and stable binding in silico. Similarly, valomaciclovir showed significant promise as an inhibitor of A0A0H2YHQ2, with consistent interaction dynamics and stability throughout the simulation. Future work should include experimental validation to confirm these findings, further optimization of drug formulations to enhance pharmacokinetic properties, and extended molecular dynamics simulations to assess the long-term stability and efficacy of these complexes in the context of therapeutic development against Y. pestis.

Author contribution statement

S.K. conceived the research and reviewed the original draft; L.C., L.Z., L.Q. conducted the data extraction and processing; L.C. analyzed the results and wrote the original draft; Y.L. reviewed and edited the draft. All authors have read and agreed to the published version of the manuscript.

Funding statement

This research was sponsored by the Funding for school-level research projects of Yancheng Institute of Technology (Grant No. xjr2020020).

Conflict of interest disclosure

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of the data; in the writing of the manuscript; or in the decision to publish the results.

Data availability statement

All data generated or analyzed during the study, including the supplementary information files, can be accessed via the link provided to the Figshare repository (https://figshare.com/s/570e794af924c9c04772).

Ethical approval statement

Not applicable.

Tables 4 to 5 are available in the Supplementary Files section

No competing interests reported.

Tables.docx

Editorial decision: Revision requested (12 Aug, 2024)
Reviews received at journal (03 Aug, 2024)
Reviews received at journal (30 Jul, 2024)
Reviews received at journal (30 Jul, 2024)
Reviews received at journal (28 Jul, 2024)
Reviews received at journal (28 Jul, 2024)
Reviewers agreed at journal (26 Jul, 2024)
Reviewers agreed at journal (23 Jul, 2024)
Reviewers agreed at journal (22 Jul, 2024)
Reviewers agreed at journal (22 Jul, 2024)
Reviewers agreed at journal (21 Jul, 2024)
Reviewers agreed at journal (20 Jul, 2024)
Reviewers agreed at journal (20 Jul, 2024)
Reviewers agreed at journal (20 Jul, 2024)
Reviewers invited by journal (20 Jul, 2024)
Editor assigned by journal (20 Jul, 2024)
Submission checks completed at journal (20 Jul, 2024)
First submitted to journal (19 Jul, 2024)
