2.1. Hybrid generation
The hybrid generation method creates small molecule-peptide hybrids using the RDKit library, which handles molecular manipulation and the hybridization of small molecules with peptides. The core component of this method is the LigandBuilder class, which forms the peptide bonds and removes the hydrogen atoms that must be eliminated for peptide bond formation. A view of the graphical online tool is shown in Scheme 2.
[Scheme 2 near here]
- The input_amino_acids function allows users to input sequences of L- and D-amino acids, as well as other molecules, in various combinations. These sequences are then converted into RDKit molecule objects. The “other molecules” option handles non-standard amino acids as well as bulky and protective groups that can form at least one peptide bond at the selected terminus.
- The create_peptide_bond method creates peptide bonds between pairs of molecules, ensuring that the combined molecule retains a valid structure. Four possible reactions are attempted to maximize the success rate of bond formation in molecules containing carboxylic acid or carbonate substructures.
- The remove_extra_hydrogens method maintains the structural integrity of the hybrid molecules by selectively removing extra hydrogen atoms from nitrogen atoms. Special care is taken with aromatic heterocycles, ensuring that the nitrogen atoms within these structures are properly hydrogenated, since such structures can otherwise lead to kekulization errors.
- The generate_ligands method generates all possible ligand arrangements by combining the input amino acids with a specified molecule. The connection site and terminus (C or N) are specified to guide the attachment process, and unique ligands are produced by attaching the peptide sequences to the specified connection site of the input small molecule. Each generated ligand is then checked for chemical validity, hydrogenated, and converted to its canonical SMILES representation for further use in docking simulations.
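The enumeration step can be illustrated with a minimal, standard-library-only sketch. It assumes that "arrangements" means the distinct orderings of the input residues, and it joins residues at the SMILES string level rather than through the RDKit reactions used by LigandBuilder. The residue table and the function names join_residues and enumerate_peptides are hypothetical, and the sketch omits attachment to the small molecule, validity checks, and canonicalization.

```python
from itertools import permutations

# Hypothetical residue table. Each free amino acid is written N-terminus
# first and ends with the hydroxyl oxygen of its carboxylic acid group.
AMINO_ACIDS = {
    "G": "NCC(=O)O",          # glycine
    "A": "N[C@@H](C)C(=O)O",  # L-alanine
    "a": "N[C@H](C)C(=O)O",   # D-alanine
}


def join_residues(codes):
    """Join residues from N- to C-terminus at the string level.

    Every residue except the last loses its trailing hydroxyl "O", which
    mimics the loss of water when a peptide bond is formed.
    """
    smiles = [AMINO_ACIDS[code] for code in codes]
    return "".join(s[:-1] for s in smiles[:-1]) + smiles[-1]


def enumerate_peptides(codes):
    """Map each distinct ordering of the residues to a peptide SMILES."""
    orderings = sorted(set(permutations(codes)))
    return {"".join(order): join_residues(order) for order in orderings}
```

For example, enumerate_peptides("GA") yields two entries, "GA" mapped to NCC(=O)N[C@@H](C)C(=O)O and "AG" mapped to N[C@@H](C)C(=O)NCC(=O)O, while repeated residues such as "GG" collapse to a single entry.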
2.2. Ligand preparation
Ligand preparation is implemented via the EasyDock module, to which ligands can be provided as SMILES strings. If non-3D structures are provided, RDKit’s EmbedMolecule and UFFOptimizeMolecule functions are used to create optimized 3D structures. Protonation is performed at pH 7.4, or at any other desired pH, with the pkasolver package. Although automated protonation applications are not optimal (ten Brink and Exner 2009), their use in high-throughput workflows is widely accepted (Bender, Gahbauer et al. 2021). The molecule is then converted to PDBQT format via the Meeko package and can be used for docking simulation.
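The effect of the chosen pH on a single titratable site can be illustrated with the Henderson-Hasselbalch relation. This is only a sketch of the underlying relation, not of pkasolver's interface or of how it enumerates protonation states; the function name protonated_fraction and the example pKa values are assumptions made for illustration.

```python
def protonated_fraction(pka: float, ph: float = 7.4) -> float:
    """Fraction of a titratable site that carries its proton at a given pH.

    Follows the Henderson-Hasselbalch relation:
    [protonated] / [deprotonated] = 10 ** (pKa - pH).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Illustrative pKa values (assumed, not predicted by pkasolver).
amine_like = protonated_fraction(10.5)  # mostly protonated at pH 7.4
acid_like = protonated_fraction(4.0)    # mostly deprotonated at pH 7.4
```

Under this relation, a site whose pKa lies well above the working pH is treated as protonated and one whose pKa lies well below it as deprotonated; a site with pKa equal to the pH is half protonated.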
2.3. Target preparation
Either UniProt (2023) or PDB IDs (Berman, Westbrook et al. 2000) can be used as input to prepare a receptor for docking calculations. When a UniProt ID is received, the program connects to the application programming interface (API) provided by UniProt to query and access its data programmatically. The data retrieved from the API are then parsed to find the PDB IDs and chains related to the desired protein. In cases of multiple PDB entries per protein, the chain with the largest amino acid count and the highest resolution is selected. After the PDB ID and chain are chosen, the PDB file is downloaded with the Biopython module (Cock, Antao et al. 2009), and extra chains are removed from the file. In this step, the centroid of the chain is also calculated and added to the PDB file for use in the subsequent docking calculations. The PDBFixer module is integrated into the target preparation module to fix common problems in PDB files, such as missing atoms, missing residues, and non-standard residues. Additionally, all heteroatoms, including water molecules, are removed in this step. PDB files with multiple chains are accessible through PDB IDs. In the M01 tool, MGLTools was used to check hydrogens, add Gasteiger charges (Tiwari, Mahasenan et al. 2009) to the structure, and remove non-polar hydrogens by merging them into the adjacent carbon atom.
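The chain selection and centroid steps can be sketched as follows. This is a minimal illustration under stated assumptions: the ChainCandidate record and both function names are hypothetical, chain length is assumed to take priority over resolution (the text does not specify the tie-breaking order), "highest resolution" is taken as the smallest value in Å, and the centroid is assumed to be the unweighted mean of all ATOM records of the chain.

```python
from dataclasses import dataclass


@dataclass
class ChainCandidate:
    """One PDB chain mapped to the protein (hypothetical record)."""
    pdb_id: str
    chain: str
    n_residues: int
    resolution: float  # in angstroms; a smaller value is a higher resolution


def select_chain(candidates):
    """Pick the longest chain, breaking ties by the best resolution."""
    return min(candidates, key=lambda c: (-c.n_residues, c.resolution))


def chain_centroid(pdb_text, chain_id):
    """Mean x, y, z of all ATOM records belonging to one chain.

    Uses the fixed-column PDB layout: chain ID in column 22 and the
    coordinates in columns 31-38, 39-46, and 47-54.
    """
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and len(line) >= 54 and line[21] == chain_id:
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    if not coords:
        raise ValueError(f"no ATOM records found for chain {chain_id!r}")
    n = len(coords)
    return tuple(sum(axis) / n for axis in zip(*coords))
```

Restricting the centroid to ATOM records matches the removal of heteroatoms described above, since HETATM records are ignored.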
2.4. Docking configuration
This tool automatically generates default configuration files for docking, thus minimizing the number of required input files. The latest version of AutoDock Vina (1.2.5) was used for docking simulations. The input PDBQT files and the chain centroid are prepared as explained in the previous sections, whereas the protein setup file and the configuration file (one of each) remain to be generated at this stage. The chain centroid is used as the center of the grid box, and a box of 126 × 126 × 126 Å is selected to cover the whole chain so that a blind docking process can be run. A high exhaustiveness value of 32 is chosen to compensate for the fairly large grid box (Trott and Olson 2010, Eberhardt, Santos-Martins et al. 2021). A maximum of 10 binding modes per ligand is obtained, and only the highest binding affinity is considered in the subsequent analysis.
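A configuration file with these defaults, and the selection of the best score from a Vina output file, can be sketched as follows. The function names vina_config and best_affinity and the file names in the usage note are hypothetical; the keys are standard AutoDock Vina configuration options, and the parser assumes the "REMARK VINA RESULT:" lines that Vina writes to its output PDBQT files. The highest binding affinity is taken here to be the most negative score.

```python
def vina_config(receptor, ligand, center, out,
                box_size=126.0, exhaustiveness=32, num_modes=10):
    """Build the text of a Vina configuration file for blind docking.

    `center` is the (x, y, z) chain centroid; the same edge length is
    used for all three box dimensions.
    """
    cx, cy, cz = center
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"out = {out}",
        f"center_x = {cx:.3f}",
        f"center_y = {cy:.3f}",
        f"center_z = {cz:.3f}",
        f"size_x = {box_size:.1f}",
        f"size_y = {box_size:.1f}",
        f"size_z = {box_size:.1f}",
        f"exhaustiveness = {exhaustiveness}",
        f"num_modes = {num_modes}",
    ]
    return "\n".join(lines) + "\n"


def best_affinity(pdbqt_text):
    """Return the most negative score among all binding modes."""
    scores = [
        float(line.split()[3])
        for line in pdbqt_text.splitlines()
        if line.startswith("REMARK VINA RESULT:")
    ]
    if not scores:
        raise ValueError("no Vina result lines found")
    return min(scores)
```

The returned text can be written to a file such as config.txt and passed to Vina through its --config option, so that only the receptor, ligand, and centroid need to be supplied for each run.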
2.5. Molecular descriptor calculation
For each ligand, the M01 tool calculates the Wildman-Crippen partition coefficient (logP) (Wildman and Crippen 1999), hydrogen bond donor count, hydrogen bond acceptor count, molecular weight (Lipinski 2000), topological polar surface area (TPSA), and QED score (Bickerton, Paolini et al. 2012). This calculation is performed through RDKit’s rdMolDescriptors module.