The AIDD Workflow
The AIDD workflow is shown schematically in Fig. 1. The order in which the workflow steps are discussed below follows the path defined by the heavy green arrows.
The Dataset
The illustrative dataset used here consists of a series of TzP antimalarials that inhibit PfDHODH[12]. Relevant data was compiled from the public literature[12]–[19]. Figure 2 shows five of the early leads from that program, as well as three later analogs that proved more potent, selective, and metabolically stable.
Activity was reported as IC50 values for a solubilized form of PfDHODH lacking the N-terminal tail that anchors it to the mitochondrial membrane in vivo[12]. These were converted to inhibition constants using the Cheng-Prusoff relationship for competitive inhibitors[20] to be consistent with earlier design work targeting PfDHODH[21].
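For reference, the standard Cheng-Prusoff relationship for a competitive inhibitor converts an IC50 measured at substrate concentration [S] into an inhibition constant, given the substrate's Michaelis constant Km:

$$K_i = \frac{IC_{50}}{1 + [S]/K_m}$$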
Seed molecules
AIDD requires a set of seed molecules. These can be as arbitrary as an amino acid or benzene, though having some functionality in place – e.g., nicotinamide – will speed up the evolutionary process substantially. A more usual approach is to seed the process with a minimally substituted scaffold. That was done in the AIDD runs described here: the seed molecule was DSM12, in which the 2, 3’, 4’, and 5’ positions all bear hydrogens.
Objective Functions and Pareto Ranking
It is possible to use AIDD to generate molecules that optimize a single algebraic combination of predicted property values. We have found it more productive, however, to use a more general technique – Pareto ranking – applied across several objective functions that can themselves be multi-property. Mathematically speaking, a set S is Pareto optimal when no member of the set is completely “dominated” by any other member. In other words: no member of the set is inferior to any other single member with respect to all criteria[22]. Pareto ranks are assigned to members of a set S by successive extraction of Pareto optimal subsets. Members of the Pareto optimal subset of S are assigned a Pareto rank of 1 and are said to belong to the first “Pareto layer” (L1) or to be on the “Pareto frontier.” Members of the Pareto optimal subset remaining after the first Pareto layer has been removed (S1 = S – L1) have a Pareto rank of 2. The process continues until all members of S have been assigned a rank. This is distinct from Pareto ranking defined as the number of members of a set by which a member is dominated[23]; the latter variation has been used, for example, in partial-match pharmacophore generation[24].
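As an illustration of the ranking scheme just described, the following minimal Python sketch (not AIDD's implementation) assigns Pareto ranks by successive extraction of non-dominated subsets, assuming every objective has been oriented so that lower values are better:

```python
import numpy as np

def pareto_ranks(scores: np.ndarray) -> np.ndarray:
    """Assign Pareto ranks by successively extracting non-dominated subsets.

    scores: (n_molecules, n_objectives) array in which lower is better for
    every objective. Rank 1 is the Pareto frontier of the whole set; rank 2
    is the frontier of what remains after layer 1 is removed, and so on.
    """
    n = scores.shape[0]
    ranks = np.zeros(n, dtype=int)
    remaining = np.arange(n)
    rank = 1
    while remaining.size:
        sub = scores[remaining]
        # A point is dominated if some other point is at least as good in
        # every objective and strictly better in at least one.
        dominated = np.array([
            any((sub[j] <= sub[i]).all() and (sub[j] < sub[i]).any()
                for j in range(len(sub)) if j != i)
            for i in range(len(sub))
        ], dtype=bool)
        ranks[remaining[~dominated]] = rank   # current Pareto layer
        remaining = remaining[dominated]      # rank the rest on the next pass
        rank += 1
    return ranks
```

Maximized objectives (e.g., %Fb) would simply be negated before ranking, and any user-specified caps (see below) applied by clamping the values first.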
The use of Pareto ranking in AIDD has several advantages. For one, it makes it possible to explore multiple SARs simultaneously. For another, it expands the number of evolutionary paths available to get around “holes” in the chemistry space where no intermediate structures exist that are “good” in terms of individual criteria. It circumvents the need to rationally weight attributes that cannot be put on a single meaningful scale – i.e., that are fundamentally or practically incommensurate. Finally, it can help address the optimization challenges posed by the fundamentally chaotic nature of many desirable chemical spaces.
Several different kinds of property predictions can serve as objective functions in AIDD: predictions based on QSARs or QSPRs, synthetic accessibility estimates, Risk scores, and estimates based on HT-PK simulations. “Individual” objectives may themselves be functions of other models. Simple examples are the min, max, or average of groups of complementary or related models, but more complex algebraic functions are also supported. The default expectation is that regression models will have been appropriately transformed (e.g., as logKi or pKi rather than as Ki, where Ki is an equilibrium constant for binding, inhibition, etc.) and that Risk models are in an additive space, so as to make optimal use of additive out-of-scope (OoS) penalty factors (see below). Calls to external functions, including docking/affinity scores and 3D ligand shape matching, are also supported but are not explored here.
One essential feature of AIDD’s implementation of Pareto optimization is that users can specify a cap for each property beyond which further improvement (increase or decrease) confers no practical advantage. Such caps are important because the program can readily create molecules that are slightly superior to others in just one property but inferior in all (or most) other properties. The objective functions used in the experiments described here illustrate the major types available in AIDD:
Activity models
The approach used to construct artificial neural network ensemble (ANNE) models in ADMET Predictor has been described in detail elsewhere[25], [26]. The primary model used to predict PfDHODH inhibition by TzPs, using log-transformed Ki values, was built using 152 active TzPs. Each of the 33 ANNs in the ensemble contained four neurons in a single hidden layer (Fig. 3). Each took the same 10 descriptors as inputs, those descriptors having been selected by a genetic algorithm from among 122 well-represented, reasonably high-variance and relatively uncorrelated ones[26].
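For readers unfamiliar with ANN ensembles, the sketch below illustrates the general idea using scikit-learn rather than ADMET Predictor's own modeling engine: many small networks with a single four-neuron hidden layer are trained on the same ten descriptors and their outputs are aggregated (simple averaging is assumed here; the actual descriptor selection and training protocol differ).

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_anne(X, y, n_members=33, n_hidden=4, seed=0):
    """Train an illustrative stand-in for an ANN ensemble (ANNE): n_members
    small networks that differ only in their random initialization."""
    rng = np.random.RandomState(seed)
    return [
        MLPRegressor(hidden_layer_sizes=(n_hidden,), max_iter=5000,
                     random_state=rng.randint(2**31 - 1)).fit(X, y)
        for _ in range(n_members)
    ]

def predict_anne(members, X):
    """Ensemble prediction taken here as the mean of the member outputs."""
    return np.mean([m.predict(X) for m in members], axis=0)
```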
The 23 analogs (15%) assigned to the test set were selected by stratified random sampling across the activity range. The observed RMSEs are low enough to make logKi a reasonably reliable surrogate for the experimental activity of follow-up candidates, provided those predictions are in-scope. Structures for the analogs used to build the primary logKi model are provided in the Supplementary Materials, along with Ki values and the values of the descriptors used in the final ANNE models. The partitions between test set and training pools are also provided.
Only active analogs were used to build the logKi regression model, so new analogs generated might lie outside its applicability domain for reasons that cannot be accounted for in a quantitative model[27], [28]. The companion ActivityClass classification model was built on a subset of 150 analogs that fit a more generalized version of the scaffold shown in Fig. 2, one that allowed substitution at the 2’ and/or 6’ positions of the phenylamino ring and that included compounds for which only qualified activity values (e.g., “Ki > 1 µM”) were available. Of those 150, the 98 with measured Ki values below 1 µM were assigned to the “active” class, whereas those with Ki values above that cutoff were assigned to the “inactive” class. Structures for the analogs used to build the ActivityClass model are provided in the Supplementary Materials. A training pool of 120 compounds was selected by random sampling stratified across activity classes, and a genetic algorithm was used to optimize the ANN architecture. The relatively simple model produced had two hidden neurons and seven input descriptors and exhibited good predictive sensitivity (1.00) and specificity (0.90) for the 30 analogs in the held-out test set. The corresponding statistics for performance on the training pool were 0.96 and 0.98.
Risk models
The list of attributes that characterize a good drug candidate is long, including but not limited to: on-target activity, selectivity against related targets, solubility, permeability, bioavailability, synthesizability, and metabolic clearance. Unfortunately, the size of a Pareto frontier increases rapidly as the number of criteria considered increases. This “curse of dimensionality” makes Pareto optimization against more than five separate attributes impractical. Fortunately, most of those characteristics are not independent functions of molecular structure. This is true for many ADMET properties, which makes it useful to group them into lists of rules of thumb like “Lipinski’s Rule of 5”[29]. The ADMET Risk model provided in ADMET Predictor is an extension of Lipinski’s idea from four attributes linked to oral bioavailability (molecular weight, logP, number of NH and OH groups, and number of N and O atoms) to 22 rules that address different aspects of ADMET[25], with each rule violation adding up to one point to the ADMET Risk score. The rules are Boolean combinations of relations between various molecular attributes and model outputs, with risk thresholds calibrated against 2260 compounds selected from the World Drug Index (WDI) in a manner similar to that described by Lipinski et al. The ADMET Risk score obtained exceeds 7 (out of a maximum of 22) for 10% of the compounds in that reference subset.
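The additive scoring idea can be illustrated with the original four Lipinski rules; the sketch below is not the ADMET Risk implementation, which uses 22 calibrated (and optionally fuzzy) rules built from model outputs as well as simple descriptors:

```python
def rule_of_5_violations(mw: float, logp: float, n_oh_nh: int, n_n_o: int) -> int:
    """Count Rule-of-5 violations; each rule that fires adds one point,
    in the same spirit as an additive Risk score."""
    rules = [
        mw > 500.0,      # molecular weight
        logp > 5.0,      # lipophilicity
        n_oh_nh > 5,     # hydrogen-bond donors (OH and NH groups)
        n_n_o > 10,      # hydrogen-bond acceptors (N and O atoms)
    ]
    return sum(rules)

print(rule_of_5_violations(mw=560.0, logp=5.7, n_oh_nh=2, n_n_o=8))  # 2 violations
```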
“Fuzzy” risk thresholds[30] are supported in ADMET Predictor, in that thresholds can be specified as intervals as well as “crisp” point values. The penalty weight wi for each rule is specified as 1 by default and is added to the score for values on the “FALSE” side of the threshold (e.g., on the left for “<” relations), whereas there is no penalty increment for values on the “TRUE” side. Penalties scale linearly across the threshold interval. Rules used in the present work are included in the Supplementary Materials.
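A minimal sketch of how such a fuzzy threshold might work follows; the interval, weight, and orientation here are illustrative choices, not values taken from the ADMET Risk rule set:

```python
def fuzzy_rule_penalty(value: float, lo: float, hi: float,
                       weight: float = 1.0, violation_high: bool = True) -> float:
    """Illustrative fuzzy risk-rule penalty: no penalty on the passing side of
    the interval [lo, hi], the full weight on the violating side, and a linear
    ramp across the interval itself. violation_high=True means large values
    violate the rule; the orientation is chosen per rule."""
    if hi <= lo:
        raise ValueError("interval must satisfy lo < hi")
    frac = (value - lo) / (hi - lo)        # 0 at lo, 1 at hi
    frac = min(max(frac, 0.0), 1.0)        # clamp outside the interval
    return weight * (frac if violation_high else 1.0 - frac)

# e.g., a hypothetical rule penalizing molecular weight above ~500, ramping
# between 450 and 550: no penalty at 450 or below, half at 500, full at 550+.
print(fuzzy_rule_penalty(500.0, lo=450.0, hi=550.0))   # 0.5
```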
Estimated pharmacokinetic properties
Lipinski’s rules of thumb provide convenient broad-brush guidance to medicinal chemists as to factors that limit oral bioavailability, but modern software like GastroPlus® can run physiologically-based pharmacokinetic (PBPK) simulations that accurately estimate fraction bioavailable (%Fb) and fraction absorbed (%Fa)[31]. ADMET Predictor includes an HT-PK module that incorporates the full Advanced Compartmental Absorption and Transit (ACAT) model from GastroPlus linked to a liver compartment, renal clearance, and a central compartment representing the remaining organs. The HT-PK implementation uses ADMET Predictor property estimates and requires no user interaction. It is very fast and accurately reflects the PK predictions performed in GastroPlus[31], which makes its outputs suitable for use in AIDD’s panel of objective functions. It nicely complements Risk models in that it accounts for how ADMET property values interact rather than treating them independently.
Synthetic difficulty
SynthDiff is a generalized implementation of the SA score described by Ertl and Schuffenhauer[32] that provides an estimate of synthetic difficulty – “difficulty” rather than “accessibility” because a higher score indicates that the molecule is harder to make. Briefly, the raw score is a sum of a fragment frequency term and a complexity term. The first is calculated from how frequently a fragment is found in a one million molecule subset of PubChem, where each “fragment” is defined by a circular fingerprint with a topological radius of 3. Complexity calculation is based on the number of rings, stereochemistry, the number of macrocycles, the number of atoms and a molecular symmetry score. That raw score S is rescaled to a range from 0 to 10 by:
$$\mathrm{SynthDiff} = \frac{-20\,(S - 2.5)}{13}$$
An augmented version – SynthDiff+ – is available for AIDD that includes additional penalties for structural features that are particularly undesirable in a drug. To this end, structural alerts derived from the literature[33], [34] were collected in a user-editable file of SMARTS[35] queries whose contents are read into memory at start-up. When SynthDiff+ is included in AIDD’s panel of objective functions, the program counts the number of occurrences of such alerts in any candidate molecule and increases its SynthDiff according to:
$$\mathrm{SynthDiff+} = \mathrm{SynthDiff} + 4\left[1 - e^{-x^{2}/3.4}\right]$$
where x is the number of structural alerts found, with each occurrence contributing to the count separately. The functional form used is motivated by Fig. 1h of Bickerton et al.[36] and approximately mimics the curve shown there. The increase in the synthetic difficulty estimate approaches 4 as the alert count grows and the exponential term goes to zero.
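In code, the alert penalty amounts to the following; the function simply re-expresses the equation above (SynthDiff itself is computed by ADMET Predictor):

```python
import math

def synthdiff_plus(synthdiff: float, n_alerts: int) -> float:
    """Augment a SynthDiff estimate with a structural-alert penalty that
    saturates at +4 as the alert count grows (per the equation above)."""
    return synthdiff + 4.0 * (1.0 - math.exp(-(n_alerts ** 2) / 3.4))

# Penalty contributed by the first few alerts (SynthDiff taken as 0 here):
for x in range(4):
    print(x, round(synthdiff_plus(0.0, x), 2))   # 0.0, 1.02, 2.77, 3.72
```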
Parameter Settings
User-supplied parameters are:
- Out-of-scope (OoS) penalties for models (1 for risk models, 10 for standard models by default)
- pH for pH-dependent properties (pH = 7.4 by default)
- Species (human, mouse or rat) and dose (in mg) for PK properties
- Location of output files (log files, results, etc.)
- Frequency with which to write out generational “snapshots” (10% by default)
- Number of generations to run (ngen)
- Number of new candidates to consider for each generation (ncand)
- Size of the initial population (n1)
- Minimum size for each generation (nmin)
- Multithreading (run using multiple processors to increase speed)
ADMET Predictor flags OoS predictions to indicate that they may not be reliable. When an OoS prediction is used in a Risk rule, that rule is evaluated pessimistically, i.e., as though the prediction was a “bad” one. Any Risk penalty that would otherwise be added to the total Risk score is then attenuated by a multiplicative factor. This OoS Risk multiplier is set to 0.5 by default outside of AIDD, but that global default can be reset by the user to reflect how conservatively they want to treat such predictions. The corresponding penalty is specified separately for each AIDD run, however, because the evolutionary process might otherwise cause even small uncertainties to compromise results. The default AIDD risk penalty of 1 was used here, which means that any potential rule violation was fully penalized.
AIDD applies a separate additive penalty to OoS predictions for regression and classification models used in the Pareto optimization. The default value of 10 was used here; given that most objective functions are on a log scale or have relatively small maximum values, this value makes it easy to filter out compounds with OoS predictions in post-processing. Penalizing compounds in this way is preferable to simply discarding them; so long as the analog is not bested in every other good property by another compound, it will remain in the population and have a chance to produce “descendants” with similarly desirable properties but for which the corresponding activity prediction is in-scope.
The minimum generation size is only enforced after 100 or 0.5*ngen generations, whichever comes first. The reason for not imposing this constraint earlier is to allow the initial population to grow gradually with only the most Pareto optimal molecules; this generally leads to faster convergence and a better overall solution. A random number seed is required to initialize the roulette wheel selections. Note that the seed completely specifies the trajectory of a run only if multithreading is turned off; multithreading is on by default and was left on here. The option to disable it is provided for cases where strict reproducibility is required (e.g., in tutorials).
Primary structural filter and transform files
AIDD relies on input files to control how new candidate structures get generated, to make sure molecules that lack key structural features or possess fatally flawed ones do not get into the next generation, and to specify a required scaffold.
There is an option to include a scaffold query that must be satisfied by any new candidate. This query can be uploaded as a file, created in MedChem Designer, or entered as a SMARTS string. Note that SMARTS queries for even relatively simple scaffolds can be quite complicated. The one corresponding to the scaffold shown in Fig. 2 – which is included in the query file used for the TzP experiments described here (see Supplemental Information) – is a case in point. Besides supplying a required core structure, it lets the user specify constraints on where and how that scaffold can be substituted, and even allows variation within rings and at other variable positions.
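The mechanism is essentially a required substructure match. The RDKit-based sketch below (RDKit is used purely for illustration and is not the engine behind AIDD) applies a deliberately simplified, hypothetical phenylamino-pyrimidine query rather than the full TzP scaffold query from the Supplemental Information:

```python
from rdkit import Chem

# Simplified, hypothetical scaffold query (phenylamino-pyrimidine) standing in
# for the much more elaborate TzP scaffold SMARTS used in the actual runs.
scaffold_query = Chem.MolFromSmarts("c1ccc(cc1)Nc1ncccn1")

def passes_scaffold(smiles: str) -> bool:
    """Return True if the candidate contains the required scaffold."""
    mol = Chem.MolFromSmiles(smiles)
    return mol is not None and mol.HasSubstructMatch(scaffold_query)

print(passes_scaffold("Cc1ccc(Nc2ncccn2)cc1"))  # True: scaffold present
print(passes_scaffold("c1ccccc1"))              # False: no scaffold
```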
A file of SMARTS structural filter queries specifies substructures that must not appear in any acceptable product. The default set of queries provided is based on exclusion rules described by Brenk et al.[33] Some non-drug-like molecules will get through into the population over the course of the evolutionary process despite these precautions, but that is actually beneficial to a degree. Such “bad” molecules can be – and often are – subsequently transformed into drug-like analogs with good physicochemical and biochemical properties. Unacceptable analogs that do make it through to the final generation can be removed by applying more stringent post-processing filters. The default primary structural filter file provided with ADMET Predictor is well-suited to most AIDD applications, but it is impossible to anticipate every situation and business rule, so the entries are fully editable.
A transform file is required to define the set T of SMIRKS[37] expressions from which molecular transformations are drawn and applied to molecules from the evolving population to generate new candidate structures. The default file provided with ADMET Predictor has over 150 transforms, including “add” transforms that replace hydrogen with a functional group (a chlorine atom, an amide or a thiazole ring, for example) as well as others that delete those groups or swap one for another. There are also transforms that create or break rings and that change bond saturation. Each transform is accompanied by an embedded description of what it does and does not do, so the user can make an informed decision about whether to disable it from the interface or comment it out in the transform file – some transforms may run counter to organizational policies, for example. Likewise, new transforms can be added by editing the file, and transforms can be turned on or off through the AIDD interface within ADMET Predictor. At the beginning of each run, AIDD writes out a file (RxnIDs.txt) to the destination folder that contains an indexed list of names for the transforms used in that run; a separate list of transforms disabled during set-up is also generated. The same set of transforms was used for all of the experiments described here; a copy of the RxnIDs.txt file generated is included in the Supplementary Material.
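To make the mechanics concrete, here is a hedged RDKit sketch of applying a single “add chlorine” transform written as reaction SMARTS; the SMIRKS shown is a simplified stand-in, not an entry copied from the default transform file, and RDKit is not the engine AIDD uses:

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Simplified "add chlorine" transform: replace one H on an sp3 carbon with Cl.
add_cl = AllChem.ReactionFromSmarts("[C;!H0:1]>>[C:1]Cl")

parent = Chem.MolFromSmiles("CCC")             # propane, as in the example below
products = set()
for (prod,) in add_cl.RunReactants((parent,)):
    Chem.SanitizeMol(prod)                     # recompute valences and H counts
    products.add(Chem.MolToSmiles(prod))       # canonical SMILES removes duplicates

print(sorted(products))   # two distinct products: 1- and 2-chloropropane
```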
Initialization
The molecular evolution engine at the core of AIDD is primed by populating the zeroth generation (G0) and the candidate pool P with seed molecules and setting the generation index i to 0. The number of new candidate molecules generated for each generation – kmax – is set initially to n1; it gets reset to ncand after the first generation has been populated with n1 molecules. Roulette wheel sampling is used to bias molecule and transform selection within AIDD; all weights are initially set to 1. Uniform random sampling is used for selecting among any alternative transformation products that may be produced (see below).
The molecular evolution cycle
A parent molecule is randomly chosen from Gi, and a transform is randomly chosen from T. Both selections are by “roulette wheel,” i.e., using weights as described below. Using the selected transform and parent molecule, the program generates all possible products from that combination of parent and transform. Both 1- and 2-chloropropane will be generated as products, for example, when the “add chlorine” transform is applied to propane. Each product molecule generated is checked against the scaffold query Q0 and subjected to the other filter criteria. It is also checked to make sure that it is not in the current population and that it is not a molecule whose properties have already been evaluated.
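“Roulette wheel” selection simply means weighted random sampling, as in the short sketch below; the weights shown are placeholders, and AIDD's actual weight-update rules are described below:

```python
import random

def roulette_select(items, weights, rng=random):
    """Pick one item with probability proportional to its weight."""
    spin = rng.uniform(0.0, sum(weights))
    running = 0.0
    for item, weight in zip(items, weights):
        running += weight
        if running >= spin:
            return item
    return items[-1]   # guard against floating-point round-off

# e.g., a parent whose weight reflects its Pareto rank and prior "children"
parent = roulette_select(["mol_A", "mol_B", "mol_C"], [1.0, 0.5, 2.0])
```

(Python’s random.choices offers the same behavior directly; the explicit loop is shown only to make the mechanism clear.)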
If multiple alternative products survive the primary filtering step, one is selected at random and added to the current population. The selection weight of its parent is then decremented, since one of its children has been added to P. If nothing survives the filtering step, that combination of parent and transform(s) is never tried again. In the unlikely event that all parent/transform combinations fail to generate the required number of candidate molecules, the program will resort to trying pairs of reaction transforms.
Products that make it through the filters are immediately added back to G0 to provide more analogs to work from. Once kmax new analogs have been generated, they are submitted for property evaluation. As part of that calculation, any target property whose corresponding ADMET model is out-of-scope for a given analog is penalized by adding that property’s OoS penalty to the calculated value (for minimized properties) or subtracting it (for maximized properties). The multiplicative OoS penalties for Risk rules are applied within those models.
Once properties for the kmax new analogs in the population have been calculated, the generational index i gets incremented and the molecules in the first Pareto rank are extracted from P to become the next generation (Gi). If i is less than or equal to the minimum of 100 and one-half of ngen (i.e., nx in Fig. 1), the next round of candidate generation begins immediately. Once i exceeds nx, however, additional ranks are extracted from P and successively added to Gi until its size exceeds nmin.
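The generation-assembly step can be sketched as follows, reusing Pareto ranks as produced by the earlier pareto_ranks() sketch; all names here are illustrative, and the bookkeeping in AIDD itself is more involved:

```python
def build_generation(pool_ids, ranks, i, nx, nmin):
    """Take the first Pareto layer; once the generation index i exceeds nx,
    keep adding successive layers until the generation holds more than nmin
    molecules."""
    generation = [m for m, r in zip(pool_ids, ranks) if r == 1]
    next_rank = 2
    while i > nx and len(generation) <= nmin and next_rank <= max(ranks):
        generation += [m for m, r in zip(pool_ids, ranks) if r == next_rank]
        next_rank += 1
    return generation
```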
Note that new analogs get added to the candidate pool immediately after they are generated and pass the primary filters, but they do not pass into the next generation until and unless they survive the Pareto selection pruning process. Hence, they may get selected for transformation while candidates are still being generated. If that happens, their “children” may have been improved in ways that let them survive into the next generation even if their progenitor does not. Indeed, most of the molecules in the first generation will be secondary or tertiary products when a relatively small number (n0) of seed structures are provided to the program. As a result, the evolutionary process is neither fully generational nor fully steady-state in nature but lies somewhere in between. Effects from small differences in the order in which parallel processes complete are the reason that multithreaded runs with identical seed structures and number seeds may (and generally do) diverge somewhat.
If i is less than ngen, the roulette wheel selection weights for molecules in Gi are updated. Weights for a molecule are based on the number of its children, its Pareto rank, and the sum of ranks across all objectives (its Borda Rank[38]). The transform selection weights are adjusted downward based on the number of times the molecules they produced failed to pass the scaffold and substructure filters. The cycle then repeats. When i reaches ngen, the program exits the evolution cycle, and the final generation is written out as an ADMET Predictor file.
Post-processing
The process is not quite complete when the evolution cycle ends; care must still be taken to evaluate the final list. The post-processing steps include, but are not limited to: 1) tautomer standardization and property recalculation; 2) application of more stringent secondary filters; 3) generation of classes; and 4) prioritization for testing.
Tautomer standardization is not carried out as part of the AIDD run, in part because it would slow analog generation down, but also because a minor tautomeric form may evolve into more stable analogs. If a particular minor tautomer appears several times while the thermodynamically dominant form does not, that should encourage a chemist to look for an alternative, more stable bioisosteric replacement that the program did not happen to find. After running tautomer standardization (one click in ADMET Predictor), properties need to be recalculated for any structures that were altered, as well as for any structures whose predicted values were capped to avoid extremes in Pareto sampling. After property recalculation, it is important to remove analogs from the final population for which predictions are out-of-scope (automatically highlighted in red).
A simple method for conducting post-processing is to use the recalculated AIDD objective function values or – if no structures were changed during tautomer standardization – those written out along with the structures. This is accomplished within the ADMET Predictor spreadsheet using slider bars. In this work, filters requiring a minimum %Fb of 70, a maximum logKi of -7.5 (i.e., a minimum pKi of 7.5, or about 30 nM), a maximum ADMET Risk of 6 and a maximum SynthDiff of 5 worked well for the TzPs. Applying them yielded a fully filtered population of 200–300 analogs, which is a convenient number for manual inspection.
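The same thresholds can of course be applied programmatically to an exported property table; the pandas sketch below uses hypothetical file and column names rather than the actual ADMET Predictor spreadsheet headers:

```python
import pandas as pd

# Hypothetical export of the final generation; column names are assumptions.
df = pd.read_csv("aidd_final_generation.csv")

filtered = df[
    (df["PctFb"] >= 70.0)       # minimum predicted %Fb
    & (df["logKi"] <= -7.5)     # i.e., pKi >= 7.5, roughly 30 nM
    & (df["ADMET_Risk"] <= 6.0)
    & (df["SynthDiff"] <= 5.0)
]
print(f"{len(filtered)} analogs survive the post-processing thresholds")
```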
It is also advisable to apply a more stringent substructural filter to remove molecules with undesirable substructures that passed the more lenient primary filters applied during the evolutionary process. Again, the primary filters need to be relatively lenient to keep flawed but otherwise promising candidates in the evolving population. The secondary filtering benefits from incorporating complementary “hard” and “soft” filters. Molecules flagged by the “hard” filter are summarily discarded, whereas “soft” violations are inspected manually. Molecules containing halogens directly bonded to heteroatoms are an example of a “hard” violation, whereas acetals and aminals might be “soft” violations; piperonyl groups and some stable cyclic versions could be acceptable forms of the latter. For simplicity’s sake, “soft” violations were all discarded for the work described here. The structural filter file is supplied in the Supplementary Information.
Part of the challenge of multi-objective optimization of drug candidates is that small structural changes often improve a molecule in one respect while making it less desirable in some other way. This can lead to the generation of clumps of quite similar molecules in the AIDD output – not the “methyl, ethyl, butyl, futile” of blind analog synthesis programs, but still potentially a nuisance. The class generation tool in ADMET Predictor groups products by scaffold, which aids this part of post-processing. Several class generation methods are available, including the method used in this work, which is a “Framework” approach with scaffolds based on ring systems and connecting chains, similar to the “Murcko assemblies” described by Bemis and Murcko (Bemis 1996). Other available methods include classification by Ring-anchored systems, Chain-anchored systems, ECFP fingerprint, and custom scaffolds.
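The grouping can be approximated outside the program with Bemis-Murcko frameworks; the RDKit sketch below is analogous in spirit to the “Framework” classing used here but is not the ADMET Predictor implementation:

```python
from collections import defaultdict
from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold

def group_by_framework(smiles_list):
    """Group structures by their Bemis-Murcko framework (ring systems plus
    connecting chains)."""
    classes = defaultdict(list)
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue
        core = MurckoScaffold.GetScaffoldForMol(mol)
        classes[Chem.MolToSmiles(core)].append(smi)
    return classes

groups = group_by_framework(["Cc1ccc(Nc2ncccn2)cc1", "CCc1ccc(Nc2ncccn2)cc1", "CCCCO"])
# the two anilino-pyrimidines share one framework; the ring-free butanol does not
```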
ADMET Predictor provides a multitude of tools to help the drug discovery team select candidates from the final list for synthesis and testing. These include interactive 2D and 3D property plots that are color-coded by a third or fourth property, pop-up structure windows, and associated selection tools. Such tools are an important part of the workflow because they afford medicinal chemists useful context when picking out examples for synthesis. In addition, a linked sketching app that returns on-the-fly property predictions helps chemists identify structural “tweaks” that make molecules more attractive synthesis targets without unduly compromising the combination of predicted properties for which they were selected by AIDD. This step is critical in “the real world,” in part because of how it engages medicinal chemists from the project team.