Assessment of LDH sequence similarity among apicomplexan species
Upon analysis of its nucleotide sequence, it was determined that the BbigLDH gene contains an open reading frame spanning 969 nucleotides, which encodes for a sequence of 332 amino acids. The alignment of sequences was conducted using ClustalW and visualized through ESPript 3 (Supplement file, Fig. 1). The highest sequence identity of 95.34% and 85.98% was observed between BbigLDH and Babesia ovata and Babesia bovis, respectively. Other species closely related to BbigLDH, such as Theileria orientalis exhibited a 70.66% match, while Plasmodium falciparum showed a 50.31% of identity. Notably, there is a 30% identity with Bos taurus LDH A, LDH B, which is recognized as the host for Babesia bigemina. The phylogenetic tree, constructed using the Neighbor Joining method, is provided in Supplement file, Fig. 3 [66]. The phylogenetic tree suggests that BbigLDH is positioned within the Babesia clade, with a closer evolutionary relationship to B. ovata. In contrast, LDHs originating from mammalian hosts form a distinct clade, indicating a clear separation in their evolutionary lineage from those of the apicomplexan parasites. The InterPro database analysis of BbigLDH sequence (XP_012766822.1) revealed the presence of LDH1N-NAD binding (PF0056 / 9-152) and LDH1C-alpha/beta C terminal (PF02866 / 157–315) domains. The 3D model of BbigLDH is expected to include a binding site for NADH or NAD + cofactors as well as a binding site for pyruvate or lactate substrates. The secondary structure of BbigLDH consists of sixteen alpha-helices, fourteen beta-strands, and various loops (Supplement file, Fig. 2).
BbigLDH displays a common five-amino acid insertion within its substrate specificity loop. Crystallographic analysis PfLDH have shown that this insertion contributes to a distinctive architecture in the active site, which could be a target for drug development [18]. While the sequences in this region vary across species, for BbigLDH, the sequence is DDEWT, and it is predicted to form a coil structure (Fig. 2A). BbigLDH shares identical catalytic residues with those conserved in all LDHs, which include R109, D168, H195, and R171 (residues R98, D158, H185, R161 in BbigLDH) (Supplement file, Fig. 1) [5]. Residues 93–97 (DDEWT) typically form a mobile loop in most apicomplexan LDHs (Fig. 2A). In the ternary complex, the loop moves downward to form the closed catalytic site [19]. The loop opposite to the substrate specificity loop (G236, T237, G238 in BbigLDH) is called as “opposing loop” and depicted in Fig. 2B and Supplement file, Fig. 5 [19]. The distinctive two-residue deletion within the opposing loop of modern Plasmodium LDHs, along with the shared loop structure with ancient LDH (AncLDH) and Toxoplasma LDHs, marks a significant evolutionary deviation (Supplement file, Fig. 5) [19]. Additionally, while ancestral LDH displays a slight oxaloacetate activity in common with modern Toxoplasma LDHs, this capability is absent in Plasmodium LDHs. This intriguing correlation strongly suggests that the opposing loop deletion, and potentially the presence of Ala236 and Pro246, play a pivotal role in shaping the remarkably stringent substrate specificity observed in modern Plasmodium LDHs [67]. Unlike in modern Plasmodium LDHs, a deletion in the opposing loop is not observed in BbigLDH, like Toxoplasma gondii LDH. This sets it apart from the modern Plasmodium LDHs in this aspect. In Babesia and Theileria (excluding T. equi) genus, a G[QSTN]G pattern is present (Fig. 2). Unlike in Plasmodium and B. taurus, such a pattern is not found, and there is a distinction in the opposing loop from both ancLDH and Toxoplasma LDH. While residues preceding the opposing loop in B. taurus and Plasmodium LDHs are positively charged (H and K residues), in Babesia and Theileria, these residues are observed to be aromatic (Y).
Homology modelling and model validation
The crystal structures of monomers in both closed and open states of BbigLDH were previously unknown. To address this, homology models were generated using the MODELLER 9.23 program. These models were developed for both open and closed states, incorporating pyruvate and NADH, respectively. The suitable template proteins were selected from PDB-structured proteins using BLASTp, based on the highest identity percentage. Additionally, during the selection process, emphasis was placed on ensuring that the template LDH structures belonged to apicomplexans and contained NADH and pyruvate in closed state. The template LDH proteins used for modelling included 1PZE (1.95 Å) [68] - the T. gondii LDH1 apoenzyme structure- for the open state, and 4PLC (1.50 Å) - an ancestral LDH from apicomplexans [67]- for the closed state, along with pyruvate and NADH. The generated 100 models were selected based on the lowest DOPE value and subsequently validated using various methods. One of the validation programs employed was ERRAT, which examines the consistency of atomic interactions within the modelled structure. The overall quality factors obtained from the ERRAT server for the open and closed models were 90.1274 and 93.949, respectively [69]. These values indicate a relatively high quality of the modelled structures. Another validation program utilized was ProSA [70], which evaluates the quality of generated models by constructing an energy profile of structural residue areas. The Z-scores obtained from ProSA were compared with those of experimentally validated structures to assess consistency. According to ProSA results, the Z-scores for mutant and wild-type models were − 3.3 and − 3.19, respectively. Models falling within the acceptable range of residue and energy efficiency were considered reliable. Quantitative Model Energy Analysis (QMEAN) is a valuable tool for analyzing crucial geometric aspects of protein structures, such as determining natural curvatures and calculating torsion angle potentials of amino acids [71]. The QMEAN value is originally in a range [0,1] with one being good expected from high resolution X-ray structures. In the results, QMEAN4 values were − 2.47 and − 2.19 for open state and closed state, respectively. Verify3D is a validation program that evaluates the absolute secondary and tertiary folding behaviour of a model protein by analyzing the 3D-1D scores in the context of 3D structure-sequence relationships. A higher value for amino acids is expected compared to the threshold value, and the overall simplified score determines the structure's validity [72]. Homology models passed by 88.82% and 93.79% for open state and close state, respectively according to the assessment based on favourable 3D-1D scores. A Ramachandran plot graph was generated for the model proteins using ProCHECK [73]. The percentages of residues in the favoured region were 88.1% and 86.6% respectively for the open and closed states. Based on the assessment of validation values, all these validation results confirmed that the enzymes were modelled with high accuracy and reliability (Fig. 3A). Additionally, the structural difference introduced by the 5-AA insertion in the open form of BbigLDH compared to host LDH A and B is shown in Fig. 3B. Detailed results obtained from validation were provided in Supplement file, Table 1.
Table 1
Twenty compounds, exhibiting lower affinity with BtLDH A, B and A-like B and their average values (kcal/mol)
Compound no
|
Compound ID
|
BtLDHA
|
BtLDHB
|
BtLDHA-likeB
|
Average
|
C1
|
Z1478445725
|
-4.311
|
-5.156
|
-4.875
|
-4.78067
|
C2
|
Z1246322905
|
-4.495
|
-5.448
|
-3.34
|
-4.42767
|
C3
|
Z826473056
|
-4.253
|
-4.534
|
-4.414
|
-4.40033
|
C4
|
Z509196536
|
-4.233
|
-5.03
|
-4.275
|
-4.51267
|
C5
|
Z1424270493
|
-4.559
|
-4.411
|
-5.247
|
-4.739
|
C6
|
Z278525320
|
-4.464
|
-4.047
|
-4.283
|
-4.26467
|
C7
|
Z1640818270
|
-4.863
|
-5.026
|
-4.731
|
-4.87333
|
C8
|
Z604033838
|
-3.745
|
-3.219
|
-4.344
|
-3.76933
|
C9
|
Z2373310039
|
-4.842
|
-5.14
|
-4.485
|
-4.82233
|
C10
|
Z973658262
|
-4.959
|
-3.858
|
-3.883
|
-4.23333
|
C11
|
Z1616793138
|
-4.507
|
-4.505
|
-4.672
|
-4.56133
|
C12
|
Z2953881032
|
-4.596
|
-3.577
|
-4.391
|
-4.188
|
C13
|
Z1253206711
|
-4.843
|
-4.798
|
-4.772
|
-4.80433
|
C14
|
Z19650139
|
-4.249
|
-4.068
|
-4.262
|
-4.193
|
C15
|
Z2400537901
|
-5.409
|
-4.512
|
-4.829
|
-4.91667
|
C16
|
Z2201943136
|
-4.8
|
-4.406
|
-5.435
|
-4.88033
|
C17
|
Z2060912694
|
-4.114
|
-4.636
|
-4.857
|
-4.53567
|
C18
|
Z2063989764
|
-3.427
|
-5.208
|
-5.287
|
-4.64067
|
C19
|
Z1531526443
|
-4.526
|
-5.325
|
-4.901
|
-4.91733
|
C20
|
Z2055833084
|
-4.401
|
-4.787
|
-4.808
|
-4.66533
|
Validation of Molecular Docking Procedure
The validation of the molecular docking procedure to be used in computational biology methods is an essential preliminary step to ensure the validity and reliability of the method. In the validation process, inhibitors present in the crystal structures of PfLDH with known structures were redocked into the original protein structure using the cognate docking procedure in methods section. The superposition of the resulting protein structure with the original protein structure is examined, along with the binding mode of the ligand and the RMSD values. Low RMSD values (The superimposed values for 1U4O, 1T25 and 1XIV are 0.14 Å, 0.21 Å, and 0.32 Å, respectively.) indicate a high degree of similarity between the two structures. Other RMSD values for 1U4S and 1T26 are 1.26 Å and 1.06 Å, respectively which have more higher superimposed values then other PDB structures. Additionally, when evaluating the cognate docking results, a threshold value of -7.00 kcal/mol for the docking score has been determined as the preferred threshold for assessing the HTVS docking results.
The enrichment calculation results provide valuable information about the performance of a computational model or algorithm in virtual screening or drug discovery experiments. The model was tested on a dataset consisting of 1000 ligands, including 34 known actives. The goal was to determine the model's ability to accurately rank the active compounds and enrich them at the top of the ranked list. The obtained BEDROC values ranged from 0.229 to 0.332, suggesting a moderate enrichment capability. The receiver operating characteristic (ROC) curve and its associated area under the curve (AUC) were also analyzed to evaluate the overall classification accuracy of the model. The ROC value of 0.69 suggests a reasonable discriminative ability, indicating that the model can distinguish between active and inactive compounds to a certain extent. The area under the accumulation curve (AUC) is 0.68, indicating that a significant portion of the active compounds is discovered early in the screening process. The obtained RIE value of 3.43 indicates that the model is performing better than random selection but still leaves room for improvement. It highlights the model's ability to detect structurally diverse active compounds (Supplement file, Fig. 4 and Table 2).
Structure-Based Virtual Screening, QM-Polarized Ligand Docking and Induced Fit Docking (IFD)
Four structures of BbigLDH were utilized, comprising two open and two closed states. A pair of open and closed states were generated after a 100 ns MD simulation using NAMD. For the prediction of binding sites, DoGSiteScorer method was employed on all models. [74]. In both the open and closed states of BbigLDH, druggable pockets were predicted, with a druggability score of 0.81 and simple score of 0.6. The pocket includes the pyruvate and NADH binding regions. The virtual library was initially docked to the active site of BbigLDH using GLIDE HTVS, SP and XP docking methods, respectively. The top 100 poses were then ranked based on XP GScores (Table 2). After comparison of compounds interacting with common residues, 256 ligands were selected. Subsequently, the remaining ligands underwent ADME analysis using the QikProp program. Compounds within the specified reference value range from the manual were identified (133 compounds). After ADME analysis, the Quantum Mechanics Polarized Ligand Docking method was applied, where the QM charges were calculated by using the QSite (Jaguar) and each ligand was redocked from its corresponding pose set. The poses with the lowest Glide Emodel values were chosen. Results with XP GScore values equal to or higher than − 7.00 kcal/mol were ranked to obtain the best outcomes for 70 models. Following this, induced fit docking was employed This involved relaxing the ligand into a conformationally relaxed receptor state, followed by redocking using Glide. The resulting 70 poses were ranked based on their IFD score, with lower scores being prioritized. Remaining compounds were also docked to the host LDH (Bos taurus LDH A, LDH B and LDH AL6B) to identify selective inhibitor candidates. The structural difference introduced by the 5-AA insertion in the open form of BbigLDH compared to host LDH A and B is shown in Fig. 3B. Twenty compounds, exhibiting lower affinity with host LDHs and available for purchase were selected for molecular dynamics simulations and end-point binding free energy calculations (Table 1).
The evaluation of potential drug candidates involves a thorough assessment of various criteria, with a close examination of their physicochemical properties intricately linked to their structural attributes. Minor modifications in their structures have resulted in enhanced properties. In our study, the hit compounds underwent analysis to determine their compliance with Lipinski's rule of five. [55]. According to Lipinski's rule of five, an established set of guidelines for drug likeness, the molecular weight of potential drug candidates should be less than 500 Da. Additionally, the hydrogen bond donor count should be fewer than 5, the hydrogen bond acceptor count should be less than 10, and the predicted octanol/water partition coefficient should ideally fall within the range of -2.0 to 4. These criteria help assess the likelihood of a compound's success as a drug candidate based on its physicochemical properties [75]. All values for the compounds fall within the specified range for Lipinski rules and PlogP0/w. The Plog values for both hERG channel blocking and serum albumin binding fell within the reference ranges [76]. The predicted IC50 value for blocking HERG K + channels should fall within the accepted range, with values greater than − 5.0. In this study, the compound with the largest value is C18 (5.073), while the one with the smallest value is compound C9 (-2.504). On the other hand, the predicted apparent Caco-2 cell permeability should be assessed, with values indicating low permeability if less than 25 and high permeability if greater than 500 [77]. According to this information, it is observed that the compound with the highest permeability is C20 (2104.635), while the one with the least permeable is C19 (74.968). The predicted blood-brain barrier permeability coefficient should be within the range of -3.0 to 1.2 [78]. The predicted binding capacity to human serum albumin is considered appropriate if it falls between − 1.5 and 1.5 [79]. All values for the compounds fall within the specified range for PlogBB, PlogKhsa. The predicted human oral absorption, rated on a scale of 0 to 100, is considered favorable if it exceeds 70%. The compound with the highest human oral absorption rates is C18 and C20 (both 100), whereas C19 has a rate of 72.307 (Supplement file, Table 3).
Table 2
List of compounds which have favourable XP GScores. Free binding energy (∆G) values were listed for the last 5 ns of 20 ns MD simulation.
Compound no
|
Compound ID
|
Structure of BbigLDH
|
HTVS; XP gscore
|
QPLD; XP gscore
|
IFD; XP gscore
|
Prime energy
|
IFD score
|
dG Average
|
C1
|
Z1478445725
|
Closed state after MDS
|
-8.343
|
-8.515
|
-9.569
|
-13590.36
|
-689.09
|
-67.3425
|
C2
|
Z1246322905
|
Closed state after MDS
|
-8.608
|
-8.140
|
-9.048
|
-13502.23
|
-684.16
|
-58.4571
|
C3
|
Z826473056
|
Closed state after MDS
|
-8.236
|
-8.530
|
-11.017
|
-13460.85
|
-684.06
|
-64.0088
|
C4
|
Z509196536
|
Closed state
|
-8.997
|
-8.389
|
-10.565
|
-13452.59
|
-683.19
|
-57.2994
|
C5
|
Z1424270493
|
Closed state
|
-8.869
|
-7.598
|
-9.404
|
-13471.78
|
-682.99
|
-60.7591
|
C6
|
Z278525320
|
Closed state after MDS
|
-9.497
|
-7.480
|
-10.557
|
-13439.77
|
-682.55
|
-54.8145
|
C7
|
Z1640818270
|
Closed state after MDS
|
-8.479
|
-8.473
|
-7.652
|
-13462.95
|
-680.8
|
-64.4233
|
C8
|
Z604033838
|
Closed state
|
-8.382
|
-8.250
|
-10.285
|
-13409.02
|
-680.74
|
-66.175
|
C9
|
Z2373310039
|
Closed state after MDS
|
-8.173
|
-7.958
|
-7.998
|
-13442.56
|
-680.13
|
-75.825
|
C10
|
Z973658262
|
Closed state
|
-9.176
|
-9.182
|
-9.803
|
-13401.97
|
-679.9
|
-46.3542
|
C11
|
Z1616793138
|
Closed state
|
-8.648
|
-8.555
|
-9.624
|
-13403.46
|
-679.8
|
-46.0935
|
C12
|
Z2953881032
|
Closed state after MDS
|
-8.202
|
-7.972
|
-8.058
|
-13428.81
|
-679.5
|
-71.5432
|
C13
|
Z1253206711
|
Closed state
|
-8.925
|
-7.731
|
-10.617
|
-13374.26
|
-679.33
|
-53.7573
|
C14
|
Z19650139
|
Closed state
|
-8.701
|
-9.111
|
-7.606
|
-13429.61
|
-679.09
|
-60.4939
|
C15
|
Z2400537901
|
Closed state
|
-10.464
|
-8.919
|
-9.831
|
-13379.26
|
-678.79
|
-67.0062
|
C16
|
Z2201943136
|
Open state
|
-8.992
|
-8.443
|
-8.72
|
-13399.65
|
-678.70
|
-60.9869
|
C17
|
Z2060912694
|
Open state
|
-8.774
|
-8.274
|
-10.44
|
-13364.38
|
-678.66
|
-60.6208
|
C18
|
Z2063989764
|
Open state
|
-9.105
|
-7.750
|
-9.56
|
-13377.59
|
-678.43
|
-72.3022
|
C19
|
Z1531526443
|
Open state
|
-8.641
|
-8.706
|
-9.55
|
-13338.36
|
-676.47
|
-72.2696
|
C20
|
Z2055833084
|
Open state
|
-8.482
|
-7.328
|
-9.19
|
-13311.35
|
-674.76
|
-44.1338
|
Compound prioritization based on molecular dynamics simulations and binding energy
In this study, we conducted MD simulations and binding free energy calculations of 20 protein-ligand complexes obtained from the docking campaigns using the Desmond program for 20 ns. Furthermore, to determine ligand stability over a long period of time, MDS with GROMACS was performed for 100 ns, too.
Based on the RMSD values from the GROMACS simulations and the ∆G average values from the 20 ns MD simulations using Desmond (Table 2), we identified C7, C9, C12, C14, C15, C16, C18 and C20 compounds as candidates for an extended MD simulation for 300 ns. Subsequently, we assessed the RMSD values obtained from a 300 ns MD simulation (Fig. 4). According to the Lig-fit-prot RMSD values in Fig. 4B, all compounds except C20 were found in the binding cavity at the end of the simulation. Moreover, C16 demonstrated remarkable positional stability, maintaining a 3 Å RMSD value throughout the simulation. According to the MM/GBSA ∆G Distribution graph in Fig. 6A, both C9 and subsequently C16 displayed favourable binding energies at the last 30 ns of the 300 ns simulation.
In the subsequent phase, we selected C9, C15, C16, and C18 for an extensive 1000 ns MDS analysis according to the ∆G and RMSD values, and interaction with the active site residues. The RMSD analysis graph in Fig. 5B reveals that C16 maintained stability within the range of 2–4 Å. In the case of C9, stability was observed after 400 ns, with a stability range of 4–6 Å. C18 exhibited stability after 150 ns, within a range of 6–8 Å. C15 moving away from the binding pocket, therefore RMSD analysis was not performed. In Fig. 5C, the RMSF (Root Mean Square Fluctuation) graph indicates that residues between 80 and 100, which include the substrate-specific loop, exhibit dramatic changes for C18 compared to other compounds. Additionally, the residues between 235 and 240, which encompass the opposing loop, show changes for C9 and C16, distinct from those observed for C18.
According to the MM/GBSA ∆G Distribution graph in Fig. 6B, both C9, C16 and C18 displayed favourable binding energies in the last 100 ns of the 1000 ns simulation. The ∆G averages are − 52.5 kcal/mol, -62.1 kcal/mol and − 62.5 kcal/mol for C9, C16 and C18, respectively.
Figure 7 illustrates the average energies contributing to the overall MM-GBSA binding energy. Calculations were based on snapshots taken from the last 100 ns of a 1 µs MD simulation. The MM-GBSA ΔG binding energies are quite similar for C16, and C18 (− 62.11 kcal/mol, and − 62.52 kcal/mol, respectively), but the main driving forces behind these binding energies different. For C9-protein binding, lipophilic energy and van der Waals (vdW) energy are the predominant types of energy. In contrast, the contributions of coulomb and hydrogen bond (H-bond) energies in the binding of C16 and C18 are markedly higher from those in C9-protein binding.
In Fig. 8, all three compounds are found in the expected regions after the 1000 ns simulation time. All compounds are positioned within the enzyme's catalytic pocket, which is of crucial importance, and interactions with catalytic residues are evident. C9 retained its significant H-bond interactions after IFD, but lost them after MD. In Supplementary Fig. 6A and 7A, according to BbigLDH-C9 interactions, there are water bridges which interaction mediated between a water molecule and residues including T82, L85, D93, I128, and T129. On the other hand, upon examination of Supplement file, Fig. 6B and 7B, it is evident that in the BbigLDH-C16 complex, there are hydrogen bond interactions with residues N17 (77%), L85 (97%), R87 (94%), and N130 (70%), along with water Bridges involving L231 and pi-pi stacking with Y241 (32%). In Supplement file, Fig. 6C and 7C, hydrogen bond interactions are observed with residues R87, I128, N130 (46%) and R161 (39%) in the BbigLDH-C18 complex, along with pi-cation interactions involving R161 (33%). In Fig. 9A, after the induced fit docking stage, interactions with R87, I128, and N130 residues were generally observed with all compounds. On the other hand in Fig. 9B, after the 1000 ns simulation times, interactions with R87, N130 conserved. It can be observed that the interaction with these residues is maintained throughout the MD simulation for both C16 and C18 (Supplement file, Fig. 6). It is believed that these residues play a significant role in inhibition against BbigLDH. In the cognate docking stage, inhibitors found in the open and closed PfLDH structures, whose PDB structures are known, were docked to the open and closed states of BbigLDH structures. The interaction of the original PDB structures with the inhibitor and the interaction of the BbigLDH structures with the same inhibitor are shown in Supplement file, Fig. 8. It is observed that the lack of a stable region in the substrate-specific loop region of the open state of PfLDH structures results in differences with the open state of BbigLDH structure. Additionally, it is seen that the residues interacting with the ligands in the open state of BbigLDH structure and also known to have catalytic properties in some cases are R87, I128, N130, R161, and S239. On the other hand, for the closed state of BbigLDH, the most important residues interacting with the ligands are R98, N130, R161, H185, S239, and H240, as shown in Supplement file, Fig. 9. In conclusion, it is considered that compounds showing similarity in the critical residues obtained from the redocking of PfLDH inhibitors, whose PDB structure is known, to BbigLDH, and interacting with residues C09, C16, and C18, may have similar inhibitory properties on BbigLDH.
These findings strongly indicate that compounds located in the pocket where the cofactor and substrate bind may inhibit enzyme activity through these specific residue interactions. Particularly, C16 may be a promising candidate for in vitro inhibitor studies, given its stable interaction with BbigLDH. Compounds Z2373310039 (C09), Z2201943136 (C16) and Z2063989764 (C18) may warrant further investigation to comprehend their potential therapeutic effects on babesiosis.