Construction of the MD-simulation system.
Preparation of spike-RBD-hACE2 structures.
A crystal structure of the wild-type SARS-CoV-2 spike RBD bound to human ACE2 (hACE2) was downloaded from the PDB (ID: 6M0J). The structure was energy-minimized by a short steepest-descent minimization followed by simulated annealing (timestep 2 fs, atom velocities scaled by 0.9 every 10th step), using the AMBER14 force field35 with an 8-Å force cutoff. The minimized structure served as the template for homology modeling. hACE2 mutants with experimentally determined binding affinities (Table S2) were used to validate the empirical scoring function (ESF). Their structures were built by introducing the mutations into the protein sequence and then building homology models with Yasara36. The same homology-modeling procedure was applied to the wild-type RBD:hACE2 amino-acid sequence (PDB ID: 6M0J) to ensure that all input structures were handled identically. The final input files contained residues 333–526 of the respective SARS-CoV-2 RBD and residues 19–615 of hACE2, which coordinates one zinc ion.
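The velocity-scaling schedule can be made concrete with a little arithmetic. Because the instantaneous kinetic temperature is proportional to the squared velocities, scaling all velocities by 0.9 every 10th step multiplies the temperature by 0.81 per scaling event (one event every 20 fs at a 2-fs timestep). The minimal sketch below assumes an idealized run in which no potential energy is converted back into kinetic energy, so it shows the fastest possible cooling; the 298 K starting temperature and the 1 K target are illustrative assumptions, not values from the protocol.

```python
import math


def kinetic_temperature(t0_kelvin: float, step: int, factor: float = 0.9, every: int = 10) -> float:
    """Idealized temperature after `step` MD steps with periodic velocity scaling.

    Temperature is proportional to v**2, so each scaling event multiplies it by
    factor**2 (0.81 for factor 0.9). Energy flowing from the potential into the
    kinetic term is ignored, so this is the fastest possible cooling.
    """
    n_events = step // every
    return t0_kelvin * (factor ** 2) ** n_events


def steps_to_reach(t0_kelvin: float, target_kelvin: float, factor: float = 0.9, every: int = 10) -> int:
    """Fewest MD steps after which the idealized temperature is <= target."""
    n_events = math.ceil(math.log(target_kelvin / t0_kelvin) / math.log(factor ** 2))
    return n_events * every


TIMESTEP_FS = 2.0  # timestep stated in the protocol
steps = steps_to_reach(298.0, 1.0)  # 298 K start and 1 K target are assumptions
print(steps, steps * TIMESTEP_FS, kinetic_temperature(298.0, steps))
```

Under these assumptions, cooling from 298 K to below 1 K takes 28 scaling events, i.e. 280 steps or 560 fs; a real annealing run cools more slowly because the structure keeps releasing potential energy as it relaxes.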
System setup and training.
The system was set up as described previously27. Initial structures were solvated in a cuboid box with periodic boundaries. The cell was filled with water at a density of 0.997 g/cm³, ionizable groups were protonated according to pH 7.4, and NaCl counter ions were added to a concentration of 0.9%. Energy minimization was performed before and after each simulation phase to remove steric bumps and adjust the covalent geometry, using the same procedure as described above. MD simulations were carried out with the AMBER14 force field, with automatic parameter assignment by AutoSMILES37, at 310 K and 1 bar. Both the RBD:hACE2 complex and unbound hACE2 were simulated for 200 ps, and energy snapshots were extracted every 200 fs.
hACE2 screening.
hACE2 design.
hACE2 mutations were initially selected by visual inspection of an RBD:hACE2 crystal structure (PDB ID: 6M0J), with a focus on affinity optimization. The collection was then extended by the ten highest-affinity hACE2 variants identified by Chan et al. in flow-cytometry binding-affinity experiments17 and by combinations of mutations from both approaches (Fig. S6).
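Combining mutations from the two sources can be sketched as an enumeration of multi-mutants that mix at least one structure-based and one literature-derived exchange and never mutate the same residue twice. In this sketch, the mutation notation (e.g. K31W) follows the text, but the function names, the pairs-only default and the example mutation lists (other than K31W) are hypothetical placeholders, not the authors' actual design set.

```python
from __future__ import annotations

import re
from itertools import combinations

POINT_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def residue_number(mutation: str) -> int:
    """Residue number of a point mutation written like 'K31W'."""
    match = POINT_MUTATION.match(mutation)
    if match is None:
        raise ValueError(f"not a point mutation: {mutation!r}")
    return int(match.group(2))


def combine_sources(design: list[str], literature: list[str], max_size: int = 2) -> list[tuple[str, ...]]:
    """Multi-mutants that mix both sources and touch each residue at most once."""
    # Sort by residue number, then name, so the output order is deterministic.
    pool = sorted(set(design) | set(literature), key=lambda m: (residue_number(m), m))
    variants = []
    for size in range(2, max_size + 1):
        for combo in combinations(pool, size):
            distinct_sites = len({residue_number(m) for m in combo}) == size
            mixes_sources = any(m in design for m in combo) and any(m in literature for m in combo)
            if distinct_sites and mixes_sources:
                variants.append(combo)
    return variants


# Hypothetical inputs: only K31W appears in the text; the others are placeholders.
structure_based = ["K31W", "K31F"]
literature_based = ["T27Y", "N330Y"]
print(combine_sources(structure_based, literature_based))
```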
Structure preparation.
To examine binding affinities between the proposed hACE2 variants and the RBDs of SARS-CoV-2 variants of concern (VOCs), the hACE2 mutations and the RBD residue changes reported by the WHO38 for Beta, Delta, Omicron BA.1, Omicron BA.2, Omicron BA.2.75, Omicron BA.3 and Omicron BA.4/BA.5 were manually incorporated into the wild-type sequence (PDB ID: 6M0J), and homology models were built with Yasara. The final input files contained residues 333–526 of the respective SARS-CoV-2 RBD and residues 19–615 of hACE2, which coordinates one zinc ion.
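Incorporating point mutations into a construct whose numbering does not start at 1 (hACE2 residues 19–615, RBD residues 333–526) is an easy place for off-by-one errors. A minimal sketch, assuming mutations written as wild-type letter, residue number and new letter; the function name and the toy sequence are hypothetical, and the wild-type check is a suggested safeguard rather than a documented step of the authors' workflow.

```python
from __future__ import annotations


def apply_mutations(sequence: str, mutations: list[str], first_residue: int) -> str:
    """Return `sequence` with point mutations such as 'K31W' applied.

    `first_residue` is the residue number of sequence[0] (e.g. 19 for an hACE2
    construct starting at residue 19). The wild-type letter is checked so that
    numbering mistakes fail loudly instead of mutating the wrong site.
    """
    residues = list(sequence)
    for mutation in mutations:
        wt, number, new = mutation[0], int(mutation[1:-1]), mutation[-1]
        index = number - first_residue
        if not 0 <= index < len(residues):
            raise IndexError(f"{mutation}: residue {number} lies outside the construct")
        if residues[index] != wt:
            raise ValueError(f"{mutation}: expected {wt}, found {residues[index]}")
        residues[index] = new
    return "".join(residues)


# Toy sequence numbered from residue 29, so 'K' is residue 31 (hypothetical).
print(apply_mutations("ABKCD", ["K31W"], first_residue=29))
```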
MD-simulation.
The predicted ΔG is computed as a linear combination of two energy contributions: a van der Waals term (vdw) and an electrostatic term (elec). Each contribution is the difference between ensemble averages over bound and unbound configurations. In the bound state, the environment of hACE2 comprises water molecules and RBD atoms, whereas in the unbound state hACE2 is surrounded exclusively by water molecules within the simulation box. While the energy differences come from the MD ensembles, the weights of the two contributions were determined from a set of 43 structures with experimental KD values, with simulation parameters optimized as described previously27. For our model calculations, we used 200-ps simulations with 50 replicates per variant, which yielded Eq. (1).
$$\Delta G = 0.765\,\Delta E^{\mathrm{vdw}} + 0.024\,\Delta E^{\mathrm{elec}} \tag{1}$$
Note that the weight of the electrostatic term is small. Because the energy differences in both terms are of the same order of magnitude, the electrostatic contribution to ΔG is correspondingly small. Our previous study showed that electrostatic interaction energies are too broadly distributed within each simulation run to yield an adequate signal-to-noise ratio for the corresponding bound-unbound difference. The model fit therefore reduced the weight of this term substantially so as not to degrade the correlation between predicted ΔG and the gauging data.
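Eq. (1) and the way its weights are obtained can be sketched in a few lines with NumPy. The sketch assumes that per-snapshot interaction energies of hACE2 with its environment are pooled over all replicates (200 ps sampled every 200 fs gives 1,000 snapshots per replicate), that experimental free energies follow from ΔG = RT ln KD with a 1 M standard state at an assumed 298.15 K, and that the weights are fitted by ordinary least squares without an intercept, matching the form of Eq. (1). All function names are hypothetical; the actual fitting procedure is described in ref. 27.

```python
import numpy as np

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)


def delta_energy(bound: np.ndarray, unbound: np.ndarray) -> float:
    """<E>_bound - <E>_unbound for one energy term (vdw or elec).

    Each array holds per-snapshot interaction energies of hACE2 with its
    environment, pooled over replicates (assumed layout).
    """
    return float(np.mean(bound) - np.mean(unbound))


def predict_dg(d_vdw: float, d_elec: float, w_vdw: float = 0.765, w_elec: float = 0.024) -> float:
    """Eq. (1): weighted sum of the vdw and electrostatic energy differences."""
    return w_vdw * d_vdw + w_elec * d_elec


def dg_from_kd(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Experimental binding free energy, dG = RT ln(KD), 1 M standard state."""
    return float(R_KJ * temperature_k * np.log(kd_molar))


def fit_weights(d_vdw: np.ndarray, d_elec: np.ndarray, dg_exp: np.ndarray) -> np.ndarray:
    """Ordinary least-squares weights (w_vdw, w_elec) without intercept, as in Eq. (1)."""
    design = np.column_stack([d_vdw, d_elec])
    weights, *_ = np.linalg.lstsq(design, dg_exp, rcond=None)
    return weights
```

For example, a dissociation constant of 1 nM corresponds to about −51.4 kJ/mol at 298.15 K. A no-intercept fit is used here only because Eq. (1) has no constant term.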
Model validation.
Comparison to experimental EC50 values.
A subset of the hACE2 variants was tested in in vitro binding-affinity experiments (see below). The agreement between the Gibbs free energies predicted from the simulations and the logarithmic EC50 values was quantified by the coefficient of determination (R²).
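For a simple linear regression, R² equals the squared Pearson correlation, so it can be computed directly from the paired values. A useful property is that R² does not depend on the base of the logarithm applied to EC50, because changing the base only rescales the log values linearly. A minimal sketch; the function name is hypothetical, and the values reported in the paper come from the authors' own analysis software.

```python
from __future__ import annotations

import math


def r_squared(x: list[float], y: list[float]) -> float:
    """Coefficient of determination of the least-squares line y ~ a + b*x.

    For simple linear regression this equals the squared Pearson correlation.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Usage with hypothetical data: predicted dG (kJ/mol) vs. EC50 (ug/mL).
predicted_dg = [-60.1, -55.2, -52.3, -58.0]
ec50 = [0.5, 3.1, 8.0, 1.2]
print(r_squared(predicted_dg, [math.log10(v) for v in ec50]))
```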
hACE2 production.
Construction of expression plasmid.
The full-length cDNA of human ACE2 (GenBank Accession No. AF291820) was purchased from Sino Biological Inc. (Beijing, China). For expression in Chinese hamster ovary (CHO) cells, the cDNA encoding hACE2 (residues 19–720) was amplified by polymerase chain reaction (PCR) and cloned into the expression vector pYD11SP. The insert was fused to the human IgG1 signal peptide at the N-terminus and to the Fc region of human IgG1 at the C-terminus. The designed mutations, intended to increase potency, were introduced by PCR to generate the hACE2 mutants. To eliminate hACE2 peptidase activity, the H374N and H378N mutations were introduced by overlap-extension PCR.
For expression in N. benthamiana, coding sequences of hACE2 (K31W)-Fc were amplified using the following primers:
Sequences were inserted into the pSCMP plant-expression vector via the BamHI and SacI restriction sites using In-Fusion® HD Cloning Plus PCR Cloning Kits (Takara Bio USA, Inc., San Jose, CA, USA), resulting in pSCMP-ACE2(K31W)-Fc, in which the insert is fused to the barley alpha-amylase 2 signal peptide.
Expression of soluble hACE2-Fc.
CHO cells were maintained in FreeStyle™ F17 medium (Invitrogen, Waltham, MA, USA) supplemented with 4 mM glutamine and 0.1% Kolliphor P-188 (Sigma, St. Louis, MO, USA). Cells were grown at 37°C in shake flasks on an orbital shaker set to 120 rpm in a humidified incubator with 5% CO2. For transfection, cells were seeded at 1.0 × 10⁶ cells/mL on day 0 and transfected with linear polyethylenimine MAX (LPEI MAX, MW 25, Polysciences Inc.) on day 1. Briefly, 80 µg plasmid DNA and 133 µL LPEI MAX stock solution (3 mg/mL) were each diluted in 5 mL FreeStyle™ F17 medium. The diluted DNA and LPEI MAX were combined and incubated at room temperature for 3 min, and the resulting 10 mL mixture was added to 90 mL of CHO cell culture seeded the day before. Between 4 and 24 h post-transfection, 3 mL of tryptone N1 (Organotechnie, La Courneuve, France) was added. The conditioned media were collected 72 h after transfection for purification of soluble hACE2-Fc proteins.
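The stated amounts imply a few quantities that are useful when scaling the transfection up or down: 133 µL of a 3 mg/mL LPEI MAX stock contains 399 µg PEI, i.e. a PEI:DNA mass ratio of about 5:1 for 80 µg DNA, and the 10 mL mix added to 90 mL of culture gives 0.8 µg DNA per mL. A minimal sketch of this arithmetic; the function name and dictionary keys are hypothetical, and the 3 mL tryptone feed is ignored.

```python
from __future__ import annotations


def transfection_mix(dna_ug: float, pei_stock_ul: float, pei_stock_mg_per_ml: float,
                     mix_ml: float, culture_ml: float) -> dict[str, float]:
    """Derived quantities of a PEI transfection (later feed additions ignored)."""
    pei_ug = pei_stock_ul * pei_stock_mg_per_ml  # uL x mg/mL = ug
    final_ml = mix_ml + culture_ml
    return {
        "pei_ug": pei_ug,
        "pei_to_dna_mass_ratio": pei_ug / dna_ug,
        "dna_ug_per_ml": dna_ug / final_ml,
    }


# Values from the protocol above.
mix = transfection_mix(dna_ug=80, pei_stock_ul=133, pei_stock_mg_per_ml=3, mix_ml=10, culture_ml=90)
print(mix)
```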
N. benthamiana plants were grown in a growth room under a 16 h light/8 h dark photoperiod at 22°C and 50% relative humidity. The binary construct pSCMP-ACE2(K31W)-Fc was introduced into Agrobacterium tumefaciens strain AGL1 by the freeze–thaw method. AGL1 cultures were grown in selective liquid YEP medium containing 50 µg/mL rifampicin, 50 µg/mL carbenicillin and 100 µg/mL kanamycin at 28°C in a shaking incubator at 200 rpm for two days. For transient expression, the AGL1 pellet harboring pSCMP-ACE2-Fc was resuspended and diluted to an OD600 of 1.0 in 1× infiltration buffer (10 mM 2-(N-morpholino)ethanesulfonic acid (MES), 10 mM MgSO4, 5% glucose, 200 µM acetosyringone, pH 5.3). The AGL1 suspensions were vacuum-infiltrated into the leaves of 6-week-old N. benthamiana plants, which were then kept in the growth room at 22°C. Leaf tissue was harvested 4 days post-infiltration (dpi) for protein extraction and purification.
Purification of hACE2-Fc.
Cleared conditioned media from CHO cells transfected with soluble hACE2-Fc constructs were supplemented with 0.5 mL equilibrated MabSelect™ PrismA resin (GE Healthcare, Chicago, IL, USA) and incubated overnight in a refrigerator with shaking. The resin was collected on a chromatography column and washed with 50 mL buffer A (20 mM sodium phosphate, 150 mM NaCl, pH 7.2). Proteins were eluted with buffer B (0.1 M glycine, pH 3.5), and the eluate was immediately neutralized with 1 M Tris, pH 10.6. Fractions containing hACE2-Fc were pooled, and the buffer was exchanged to 1× PBS for storage. The concentrations of purified hACE2-Fc proteins were determined by Bradford assay.
For purification of recombinant soluble hACE2-Fc expressed in N. benthamiana, 50 g of frozen leaves were extracted in ice-cold lysis buffer (1× PBS containing 10% glycerol, 1% PVP, 0.5% Triton X-100, 1 mM PMSF, 0.01% β-mercaptoethanol, 0.5% protease inhibitor cocktail) at a ratio of 1:2 (w/v) and homogenized in a Magic Bullet blender (Homeland Housewares, Los Angeles, CA, USA) using three 30-s cycles separated by 30-s intervals. The crude extract was incubated in a refrigerator with shaking for 30 min and then centrifuged at 13,000 × g for 25 min at 4°C. The pH of the extract was adjusted to 7.2, followed by a second centrifugation at 13,000 × g for 25 min at 4°C. The cleared extract was used to purify soluble hACE2-Fc as described above.
Cell culture and virus stocks.
African green monkey kidney epithelial cells (VeroE6; Biomedica, Vienna, Austria) were cultivated in Gibco Minimum Essential Medium (MEM) with Earle’s salts and L-glutamine (Thermo Fisher Scientific, Waltham, MA, USA), supplemented with 5% fetal bovine serum (FBS; Thermo Fisher Scientific) and 1% penicillin/streptomycin (Thermo Fisher Scientific); this medium is hereafter referred to as MEM (5% FBS). Unless stated otherwise, cells were incubated at 37°C and 5% CO2.
A human 2019-nCoV isolate (Ref-SKU: 026V-03883, Charité, Berlin, Germany) and a human SARS-CoV-2 Beta variant isolate (Ref-SKU: 014V-04058, EVAg, Marseille, France) were propagated in VeroE6 cells. TCID50 titres were determined by the Reed-Muench method39, and plaque-forming units (PFU) were calculated using a conversion factor of 0.7 based on the ATCC-LGC standards (www.atcc.org/support/technical-support/faqs/converting-tcid-50-to-plaque-forming-units-pfu). For all infection experiments, the working stocks were diluted in MEM (2% FBS) to a calculated multiplicity of infection (MOI) of 0.0003. All experimental steps involving active SARS-CoV-2 isolates were performed under BSL-3 conditions.
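The titre and MOI calculations can be sketched as follows. The Reed-Muench function implements the standard cumulative-count interpolation of the 50% endpoint; the conversion PFU ≈ 0.7 × TCID50 is the factor stated above (it approximates ln 2, the mean number of infectious units per well at which half of the wells become infected under Poisson statistics); and the inoculum follows from MOI × cell number. The function names, dilution series, well counts and inoculum volume are hypothetical; the 30,000 cells per well match the neutralization assay described below.

```python
from __future__ import annotations


def reed_muench_endpoint(log10_dilutions: list[float], infected: list[int], total: list[int]) -> float:
    """log10 of the 50 % endpoint dilution (Reed-Muench).

    Dilutions must be ordered from least to most dilute, e.g. [-1, -2, ...].
    """
    uninfected = [t - i for i, t in zip(infected, total)]
    n = len(infected)
    # A well infected at a high dilution would also be infected at lower ones,
    # so each cumulative infected count includes all more dilute steps.
    cum_inf = [sum(infected[k:]) for k in range(n)]
    # Conversely, each cumulative uninfected count includes all less dilute steps.
    cum_uninf = [sum(uninfected[: k + 1]) for k in range(n)]
    pct = [100.0 * ci / (ci + cu) for ci, cu in zip(cum_inf, cum_uninf)]
    for k in range(n - 1):
        if pct[k] >= 50.0 > pct[k + 1]:
            proportionate_distance = (pct[k] - 50.0) / (pct[k] - pct[k + 1])
            step = log10_dilutions[k + 1] - log10_dilutions[k]  # -1 for ten-fold steps
            return log10_dilutions[k] + proportionate_distance * step
    raise ValueError("50 % endpoint is not bracketed by the dilution series")


def tcid50_per_ml(endpoint_log10: float, inoculum_ml: float) -> float:
    """Titre: reciprocal of the endpoint dilution per inoculated volume."""
    return 10.0 ** (-endpoint_log10) / inoculum_ml


def pfu_per_ml(tcid50_ml: float, factor: float = 0.7) -> float:
    """PFU estimate with the conversion factor used in the text."""
    return factor * tcid50_ml


def virus_volume_ml(cells: int, moi: float, pfu_ml: float) -> float:
    """Stock volume that delivers MOI x cells plaque-forming units."""
    return moi * cells / pfu_ml


# Hypothetical ten-fold series, 8 wells per dilution, 0.1 mL inoculum per well.
endpoint = reed_muench_endpoint([-1, -2, -3, -4, -5, -6], [8, 8, 8, 5, 1, 0], [8] * 6)
stock_pfu = pfu_per_ml(tcid50_per_ml(endpoint, 0.1))
print(endpoint, stock_pfu, virus_volume_ml(30_000, 0.0003, stock_pfu))
```

With 30,000 cells per well, an MOI of 0.0003 corresponds to about 9 PFU per well.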
SARS-CoV-2 neutralization assay.
Before every assay, purified hACE2-Fc was freshly diluted in MEM (2% FBS) to concentrations of 0.78, 1.56, 3.13, 6.25, 12.5 or 25 µg/mL.
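The listed concentrations form a six-step two-fold dilution series from 25 µg/mL; 0.78, 1.56 and 3.13 µg/mL are rounded from 0.78125, 1.5625 and 3.125 µg/mL. A minimal sketch that generates the series (function name hypothetical):

```python
from __future__ import annotations


def twofold_series(top: float, steps: int) -> list[float]:
    """`steps` concentrations ending at `top`, each twice the previous one (ascending)."""
    return [top / 2 ** k for k in range(steps)][::-1]


print(twofold_series(25.0, 6))  # ug/mL
```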
SARS-CoV-2 neutralization assays were performed essentially as described previously40. VeroE6 cells were seeded in 48-well plates (30,000 cells/well) in MEM (10% FBS) 24 h before the assay. SARS-CoV-2 (wild type or Beta variant) was preincubated for 30 min with hACE2-Fc protein (wild type or variant) at concentrations between 0.78 and 25 µg/mL, and cells were then infected with this preincubation mix at an MOI of 0.0003 in a final volume of 200 µL per well in MEM (2% FBS). At this point, the dose control (DC) was sampled. After 1 h of incubation at 37°C and 5% CO2, the mixture was removed and the cells were washed twice with fresh medium to remove unadsorbed virus particles. The respective hACE2-Fc solutions were then added to the cells again, and the cells were incubated for 48 h at 37°C and 5% CO2. Untreated infected cells served as positive controls and non-infected cells as negative controls. Remdesivir (THP Medical Products, Vienna, Austria) was included as an additional control: in the respective wells, cells were preincubated with 10 µM remdesivir for 30 min before infection, and remdesivir was added again after the washing steps. Supernatant (140 µL) was harvested and inactivated for RNA extraction and quantification of viral copy numbers by quantitative reverse transcription PCR (qRT-PCR). After removal of the remaining supernatant, the cells in the 48-well plate were fixed in 4% formalin for SARS-specific immunohistochemical (IHC) staining.
RNA isolation, quantitative RT-PCR, and calculation of viral-copy numbers.
Supernatant samples were inactivated by adding AVL buffer (Qiagen, Hilden, Germany), viral RNA was isolated with the QIAamp Viral RNA Mini Kit (Qiagen) according to the manufacturer’s protocol, and RNA was eluted in 40 µL ultra-pure H2O. qRT-PCR of viral RNA was performed with the QuantiTect Probe RT-PCR Kit (Qiagen) on a Rotor-Gene Q cycler (Qiagen). Reactions were run in a total volume of 25 µL at 50°C for 30 min, followed by 95°C for 15 min and 45 cycles of 95°C for 3 s and 55°C for 30 s. The N1 primer set and probe, which detect the N gene of SARS-CoV-2, were those recommended in the CDC 2019-Novel Coronavirus (2019-nCoV) Real-Time RT-PCR Panel (www.cdc.gov/coronavirus/2019-ncov/lab/rt-pcr-panel-primer-probes.html):
- 2019-nCoV_N1-F (2019-nCoV_N1 forward primer): 5’-GAC CCC AAA ATC AGC GAA AT-3’
- 2019-nCoV_N1-R (2019-nCoV_N1 reverse primer): 5’-TCT GGT TAC TGC CAG TTG AAT CTG-3’
- 2019-nCoV_N1-P (2019-nCoV_N1 probe): 5’-FAM-ACC CCG CAT TAC GTT TGG TGG ACC-BHQ1-3’ (reporter: FAM; quencher: BHQ-1)
To calculate viral copy numbers, a commercially available standard (ATCC VR-1986D genomic RNA from 2019 Novel Coronavirus, Lot: 70035624, ATCC, Manassas, VA, USA) was serially diluted and analyzed by qRT-PCR. The resulting Ct values were plotted against ln(copy number), and the equation obtained by linear regression (y = -1.442x + 35.079) was used to calculate viral copy numbers from the Ct values measured with the N1 primers and probe. The calculated viral copy numbers refer to the 140 µL of supernatant harvested after the neutralization assay.
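The standard curve can be inverted to convert Ct values into copy numbers. Because the regression uses the natural logarithm, the slope of −1.442 cycles per ln unit corresponds to about −3.32 cycles per log10 unit, i.e. an amplification efficiency close to 100% (1/1.442 ≈ ln 2, a doubling per cycle). A minimal sketch; the function names and the percent-reduction helper, which normalizes treated wells to the untreated infected control, are illustrative assumptions rather than the authors' documented analysis.

```python
import math

SLOPE = -1.442      # Ct per ln(copies), from the standard curve in the text
INTERCEPT = 35.079  # Ct at one copy, since ln(1) = 0


def copies_from_ct(ct: float) -> float:
    """Invert Ct = SLOPE * ln(copies) + INTERCEPT (copies per 140 uL supernatant)."""
    return math.exp((ct - INTERCEPT) / SLOPE)


def amplification_efficiency(slope_per_ln: float = SLOPE) -> float:
    """Efficiency E = 10**(-1/s) - 1, with s the slope per log10 unit."""
    slope_per_log10 = slope_per_ln * math.log(10)
    return 10 ** (-1 / slope_per_log10) - 1


def percent_reduction(treated_copies: float, untreated_copies: float) -> float:
    """Reduction of viral copies relative to the untreated infected (positive) control."""
    return 100.0 * (1.0 - treated_copies / untreated_copies)


print(copies_from_ct(25.0), amplification_efficiency())
```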
Immunohistochemical analysis.
After removal of the supernatant, cells were fixed with 4% formalin for 1 h at room temperature, washed twice with PBS and incubated in PBS at room temperature for at least 10 min. Cells were treated with Triton X-100 (0.1% in PBS; Merck Millipore, Darmstadt, Germany) for 10 min and washed three times with PBS for 3 min. Endogenous peroxidases were blocked with H2O2 (3% in MeOH; Merck) for 30 min, followed by three 3-min washes with PBS. Samples were incubated for 1 h at room temperature with the primary antibody (SARS-CoV-2 (2019-nCoV) Nucleocapsid Antibody, rabbit mAb, Cat: 40143-R019, Sino Biological Inc.) diluted 1:1000 in antibody diluent (REAL Antibody Diluent, Dako Cat: S202230_2, Agilent Technologies, Santa Clara, CA, USA), and then washed three times with PBS for 3 min. Cells were incubated for 30 min with the secondary peroxidase-conjugated anti-rabbit antibody of the REAL EnVDetectSys Perox/DAB+, Rb/M detection system (Agilent Technologies) and washed three times with PBS for 3 min. The cells were then incubated with 100 µL substrate-chromogen (EC substrate-chromogen, Dako, Cat: K346430–2, Agilent Technologies) until optimal staining of virus-infected cells was reached, but for no longer than 3 min. The reaction was stopped by washing with PBS. Images were acquired with a light microscope (40× magnification) equipped with a Jenoptik Gryphax Avior microscope camera and the Jenoptik Gryphax software (both Jenoptik, Jena, Germany).
Binding-affinity assay.
Microtiter plate wells were coated overnight at 4°C with 100 µL of 2 µg/mL recombinant His-tagged SARS-CoV-2 RBD protein in carbonate buffer (pH 9.6). The next day, the coating solution was removed and the plate was washed three times with washing solution (PBST: PBS + 0.05% v/v Tween 20). The plate was blocked with 300 µL of 5% skim milk in PBST for 1 h at 37°C. The blocking solution was discarded completely, and the plate was washed three times with washing solution. Soluble hACE2-Fc proteins were serially diluted in PBST containing 0.1% BSA, and 100 µL of each dilution was added to the wells and incubated at 37°C for 1 h. The plate was washed three times with washing solution. Then, 100 µL of HRP-conjugated anti-Fc antibody (1:10,000) was added to each well and incubated for 40 min at 37°C. The antibody solution was removed and the plate was washed three times with washing solution. Next, 100 µL of TMB substrate (Biopanda Diagnostics, Belfast, United Kingdom) was added to each well and incubated at room temperature for 5–10 min. After sufficient color development, 50 µL of stop solution (2 N H2SO4) was added to each well, and the absorbance (optical density, OD) was read at 450 nm. The data were plotted, and binding affinities were analyzed with GraphPad Prism.
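The text does not state which model was fitted in GraphPad Prism; a common choice for ELISA binding curves is a four-parameter logistic (4PL) on log10 concentration, from which an EC50 is read off. A minimal sketch with NumPy and SciPy, assuming that model; the function names, starting values and synthetic data are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def four_pl(log_conc, bottom, top, log_ec50, hill):
    """Four-parameter logistic on log10 concentration (sigmoidal dose-response)."""
    return bottom + (top - bottom) / (1.0 + 10.0 ** ((log_ec50 - log_conc) * hill))


def fit_ec50(conc, od):
    """Fit the 4PL to OD450 readings and return (EC50, fitted parameters)."""
    log_conc = np.log10(np.asarray(conc, dtype=float))
    od = np.asarray(od, dtype=float)
    # Rough starting values: plateaus from the data, midpoint at the median dose.
    p0 = [od.min(), od.max(), np.median(log_conc), 1.0]
    params, _ = curve_fit(four_pl, log_conc, od, p0=p0, maxfev=10000)
    return 10.0 ** params[2], params


# Synthetic, noise-free example (hypothetical values, EC50 = 1 ug/mL).
conc = np.logspace(-2, 2, 12)
od = four_pl(np.log10(conc), 0.05, 2.0, 0.0, 1.0)
print(fit_ec50(conc, od)[0])
```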
ELISA-based neutralization assay.
Microtiter plate wells were coated overnight at 4°C with 100 µL of 2 µg/mL recombinant wild-type hACE2-Fc protein in PBS. The next day, the coating solution was removed and the plate was washed three times with washing solution (PBST). The plate was blocked with 300 µL of 5% skim milk in PBST for 1 h at 37°C. HRP-conjugated recombinant SARS-CoV-2 RBD protein was prepared at 200 ng/mL in PBST with 0.1% BSA. The blocking solution was discarded completely, and the plate was washed three times with washing solution. Soluble hACE2-Fc proteins were serially diluted in PBST containing 0.1% BSA and mixed 1:1 with HRP-RBD, and 100 µL of each mixture was added to the wells and incubated at 37°C for 30 min. The plate was washed four times with washing solution. Then, 100 µL of TMB substrate was added to each well and incubated at room temperature for 5–10 min. After sufficient color development, 50 µL of stop solution (2 N H2SO4) was added to each well, and the absorbance (optical density, OD) was read at 450 nm. The data were plotted, and neutralizing activities were analyzed with GraphPad Prism.
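In this competition format, hACE2-Fc in solution competes with the coated wild-type hACE2-Fc for HRP-RBD, so a lower OD indicates stronger neutralization. One simple way to summarize a dilution series is the percent inhibition relative to wells with HRP-RBD alone and an IC50 interpolated on the log-concentration scale. The sketch below is such a simplified alternative to a full curve fit; the function names, the blank correction and the interpolation scheme are assumptions.

```python
from __future__ import annotations

import math


def percent_inhibition(od_sample: float, od_max: float, od_blank: float = 0.0) -> float:
    """Signal loss relative to HRP-RBD without competitor (od_max), blank-corrected."""
    return 100.0 * (1.0 - (od_sample - od_blank) / (od_max - od_blank))


def ic50_interpolated(conc: list[float], inhibition: list[float]) -> float:
    """Concentration at 50 % inhibition, interpolated linearly on log10(concentration).

    `conc` must be ascending; the first neighbouring pair that brackets 50 % is used.
    """
    points = list(zip(conc, inhibition))
    for (c1, i1), (c2, i2) in zip(points, points[1:]):
        if i1 < 50.0 <= i2:
            x1, x2 = math.log10(c1), math.log10(c2)
            return 10.0 ** (x1 + (50.0 - i1) * (x2 - x1) / (i2 - i1))
    raise ValueError("50 % inhibition is not bracketed by the data")


# Hypothetical readings: OD450 without competitor = 2.0, blank = 0.05.
ods = [1.8, 1.3, 0.7, 0.3]
conc = [0.1, 1.0, 10.0, 100.0]
print(ic50_interpolated(conc, [percent_inhibition(od, 2.0, 0.05) for od in ods]))
```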
Generation of ANN training data.
Sequence data preparation.
The sequences used to prepare the ANN training data were RBD and hACE2 sequences obtained by visual inspection, from the literature17,41, or from spike protein sequences available in GISAID42 as of January 4, 2022. In addition to 39 specific RBD sequences taken from a list of 145 representative spike protein sequences available in GISAID, RBD sequences (residues 319–541) were extracted from 1.3 million non-redundant spike protein sequences (out of 6.7 million entries) by pairwise alignment to the wild-type RBD and analyzed with in-house Python tools. For the ANN training data set, we considered 1,077 specific RBD sequences that contained no insertions or deletions and carried at least two mutations relative to the wild-type RBD. Because of the large number of specific RBD sequences bearing exactly two mutations, these were further restricted to sequences occurring at least twice among the 6.7 million entries. Including RBD mutations derived from the literature41 and current VOCs, a total of 1,165 RBD sequences were used for training the ANN. hACE2 mutations were obtained by visual inspection, from the literature17, or from a combination of both, resulting in 95 distinct sequences. A list of the RBD-hACE2 pairs used for training the ANN is provided in Table S3.
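The RBD selection rules can be expressed as a short filter. This sketch assumes that RBD sequences have already been extracted by pairwise alignment to the wild-type RBD, so that any length difference or gap character signals an insertion or deletion; the function names and the toy sequences are hypothetical and stand in for the authors' in-house tools.

```python
from __future__ import annotations

from collections import Counter


def count_mutations(seq: str, wild_type: str) -> int | None:
    """Number of substitutions vs. wild type, or None if the sequence has an indel."""
    if len(seq) != len(wild_type) or "-" in seq:
        return None
    return sum(a != b for a, b in zip(seq, wild_type))


def select_training_rbds(all_entries: list[str], wild_type: str) -> list[str]:
    """Unique RBDs with >= 2 mutations; double mutants must occur at least twice."""
    occurrences = Counter(all_entries)  # counted over all (redundant) entries
    selected = []
    for seq, count in occurrences.items():
        n_mut = count_mutations(seq, wild_type)
        if n_mut is None or n_mut < 2:
            continue
        if n_mut == 2 and count < 2:
            continue
        selected.append(seq)
    return sorted(selected)


# Toy example with a six-residue "wild type" (hypothetical).
entries = ["ABCDEF", "XBCDEF", "XYCDEF", "XYCDEF", "XBZDEF", "XYZDEF", "ABCD", "XY-DEF"]
print(select_training_rbds(entries, "ABCDEF"))
```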
Homology models of the respective RBD-hACE2 complexes were created, and binding affinities for RBD (residues 333–526) and hACE2 (residues 19–615) were predicted by MD simulations as described above (“hACE2 screening”). The homology models were also used to calculate RBD Halos and hACE2 Halos.
Generation of Catalophore hACE2 Halos and spike-RBD Halos.
As described previously27,43, a Catalophore Halo is a multivariate property field: a collection of points in Cartesian space, discretized onto an equidistant grid and annotated with (currently) 19 physicochemical and statistical properties (e.g., electrostatics, hydrophobicity, flexibility, potential energies, hydrogen-bonding potential, or dissolvability) that a biomolecule projects into its surroundings.
Spike-RBD-hACE2 homology models were deposited in the CATALObase platform43 and used as input for calculating hACE2 and RBD Halos. This involved a Yasara44 structure-preparation step followed by a Halo-creation and annotation step. The latter was performed with a modified version of the AutoGrid tool from the AutoDock45 suite, version 4.2.3. The 3D point clouds, generated with a grid spacing of 0.75 Å, cover the entire outer molecular surface of either the RBD (for RBD Halos) or hACE2 (for hACE2 Halos) in a layer 5 Å thick. Molecular surfaces were defined using a probe radius of 1.4 Å around the atoms’ vdw radii. To focus on the binding interface, Halos were further restricted to points within 5 Å of the atoms (including their vdw radii) of the respective binding partner, and points inside the volume occupied by these atoms (including their vdw radii) were removed. Consequently, the binding partner influences only the shape of the Halo; none of its 19 physicochemical and statistical properties, which are determined by the underlying biomolecule. The ligand atom types and properties used to annotate the point clouds were: carbon, H-bond donor hydrogen, non-H-bonding nitrogen, H-bond acceptor oxygen, H-bond acceptor sulfur, desolvation potential, electrostatic potential, aromatic carbon, phosphorus, accessibility, hydrophobicity, flexibility, positioning of chains, sulfur, bromine, chlorine, fluorine, iodine.
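The geometric part of Halo construction can be illustrated with a brute-force NumPy sketch. It assumes one reading of the description: grid points (0.75 Å spacing) are kept if they lie outside the molecule's solvent-accessible surface (vdw radius + 1.4 Å) but within 5 Å of it, lie within 5 Å of the partner's vdw spheres, and lie outside those spheres. The property annotation performed with the modified AutoGrid is not reproduced, and the function names and toy atoms are hypothetical. For whole proteins, a spatial index such as a k-d tree would replace the all-pairs distance matrix.

```python
import numpy as np

PROBE = 1.4        # A, probe radius (from the text)
SHELL = 5.0        # A, Halo thickness (from the text)
SPACING = 0.75     # A, grid spacing (from the text)
MAX_PARTNER = 5.0  # A, interface cutoff to the partner's vdw spheres (from the text)


def surface_distance(points: np.ndarray, coords: np.ndarray, radii: np.ndarray) -> np.ndarray:
    """Distance from each point to the nearest vdw sphere surface (negative inside)."""
    d = np.linalg.norm(points[:, None, :] - coords[None, :, :], axis=-1)
    return (d - radii[None, :]).min(axis=1)


def interface_halo(coords, radii, partner_coords, partner_radii) -> np.ndarray:
    """Grid points of the molecule's outer shell that face the binding partner."""
    margin = radii.max() + PROBE + SHELL
    lo, hi = coords.min(axis=0) - margin, coords.max(axis=0) + margin
    axes = [np.arange(a, b + SPACING, SPACING) for a, b in zip(lo, hi)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    d_own = surface_distance(grid, coords, radii)
    d_partner = surface_distance(grid, partner_coords, partner_radii)
    in_shell = (d_own >= PROBE) & (d_own <= PROBE + SHELL)
    at_interface = (d_partner > 0.0) & (d_partner <= MAX_PARTNER)  # cropped by partner volume
    return grid[in_shell & at_interface]


# Toy example: one atom per molecule, 7 A apart (hypothetical).
halo = interface_halo(np.array([[0.0, 0.0, 0.0]]), np.array([1.7]),
                      np.array([[7.0, 0.0, 0.0]]), np.array([1.7]))
print(halo.shape)
```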
Artificial neural network.
Idea and Intention.
We trained an artificial neural network (ANN) on our MD-simulation data, augmented with experimental spike-RBD-hACE2 binding affinities where appropriate. The ANN predicts the binding affinity from the Catalophore Halos of the spike RBD and hACE2 alone (i.e., without direct reference to sequence or structure). Because inference is many orders of magnitude faster than an MD simulation of the spike-RBD-hACE2 interaction, the ANN provides an extremely efficient estimate for many variants of both the spike RBD and hACE2. The cost of inference for a single hACE2 variant is essentially the computation of its Halo, a small addition to preparing the mutated hACE2 structure that also serves as input for the binding-affinity MD simulation.
With ANN predictions, we can screen many more hACE2 variants, rank the results, and feed the most promising candidates into the MD pipeline for validation. This serves two purposes: first, it further validates the ANN model; second, it yields predictions whose reliability is backed by the MD model, which is gauged against experimental data via the ESF. Overall, this approach allows a more informative, and potentially more representative, sampling of hACE2 designs than an unguided set of MD-simulation runs would provide.
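The screening funnel amounts to ranking ANN predictions and passing the top candidates to MD. A minimal sketch, assuming predictions are stored as a mapping from variant name to predicted ΔG in kJ/mol, where more negative values indicate stronger binding; the function name, the tie-breaking rule and the example names are hypothetical.

```python
from __future__ import annotations


def select_for_md(predictions: dict[str, float], k: int) -> list[str]:
    """The k variants with the most negative predicted dG (strongest binding).

    Ties are broken by name so the selection is reproducible.
    """
    ranked = sorted(predictions.items(), key=lambda item: (item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical ANN output in kJ/mol.
ann_predictions = {"variant_a": -55.0, "variant_b": -62.5, "variant_c": -58.1}
print(select_for_md(ann_predictions, k=2))
```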
Initial Pre-Omicron RBD Experiment.
The model was trained on a dataset of 1,049 RBD-hACE2 pairs whose RBDs carried between one and 20 amino-acid exchanges relative to the wild-type RBD (for a more detailed breakdown, see Table S1 in the supporting information), including a single Omicron BA.1 example; a random subset of 120 samples was used for validation. Using the parameters from the epoch with the lowest validation loss, the model was then tested on a previously unseen set of 300 RBD-hACE2 combinations. On this set, it reached a mean error of 1.05 kJ/mol, a maximum error of 4 kJ/mol and a Pearson correlation of 0.69.
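Checkpoint selection and the reported test metrics can be sketched in plain Python. The sketch assumes that the reported mean error is a mean absolute error, which the text does not state explicitly; the function names are hypothetical.

```python
from __future__ import annotations

import math


def best_epoch(val_losses: list[float]) -> int:
    """Index of the epoch with the lowest validation loss (first one on ties)."""
    return min(range(len(val_losses)), key=val_losses.__getitem__)


def error_metrics(pred: list[float], true: list[float]) -> dict[str, float]:
    """Mean and maximum absolute error and Pearson r between predictions and targets."""
    n = len(pred)
    abs_err = [abs(p - t) for p, t in zip(pred, true)]
    mp, mt = sum(pred) / n, sum(true) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in true))
    return {"mae": sum(abs_err) / n, "max_error": max(abs_err), "pearson_r": cov / (sp * st)}
```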
Initial Pre-Omicron hACE2 Experiment.
A subset of 50 unique hACE2 variants was isolated and removed from the data set, and the remaining data were again split into training and validation sets. Using the parameters that gave the lowest validation loss during training, the binding affinities involving the 50 held-out hACE2 variants were predicted. The mean prediction error was 1.2 kJ/mol, and the maximum error was 4.8 kJ/mol.
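Holding out whole hACE2 variants, rather than random RBD-hACE2 pairs, ensures that the test variants are never seen during training. A minimal sketch of such a grouped split, assuming records of the form (RBD id, hACE2 id, ΔG); the function name, record layout and seed are hypothetical. The returned training records would then be split again into training and validation sets.

```python
from __future__ import annotations

import random


def holdout_by_hace2(records: list[tuple[str, str, float]], n_variants: int, seed: int = 0):
    """Split records so that `n_variants` hACE2 variants occur only in the test set."""
    variants = sorted({hace2 for _, hace2, _ in records})  # sorted for reproducibility
    held_out = set(random.Random(seed).sample(variants, n_variants))
    train = [r for r in records if r[1] not in held_out]
    test = [r for r in records if r[1] in held_out]
    return train, test, held_out
```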
The amino-acid exchanges in the held-out hACE2 subset induce a mean variation of 3.72 kJ/mol in the calculated RBD-hACE2 binding affinities compared with RBD binding to wild-type hACE2. Because the mean prediction error (1.2 kJ/mol) is well below this variation, we conclude that the model explains a large share of the variation induced by the hACE2 amino-acid exchanges. Fig. S2 shows the prediction error as a function of the ground-truth energies of the predicted samples.
Figure S3 shows the distribution of binding-affinity values across the entire training set. The vast majority of samples lie between −60 and −50 kJ/mol, and samples with binding affinities below −65 kJ/mol form a strongly underrepresented minority. Since high accuracy in the low-energy (high-affinity) regime is the most important goal, the machine-learning-specific challenge lies in avoiding a network bias towards the mean of the distribution and instead transferring information learned from the bulk of the distribution to the low-energy outliers. The Omicron BA.2 experiment described above indeed shows better performance for high-energy values than copying values from, or regressing to the mean of, the closest samples seen during training, indicating that the model succeeds in predicting the strongly underrepresented labels.
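The baseline mentioned above, regressing to the mean of the closest training samples, can be sketched as a k-nearest-neighbour mean. Its key limitation is that its prediction can never leave the range of its neighbours' labels, so it cannot reach sparsely populated low-energy values unless such samples happen to be nearby. A minimal sketch, assuming samples are represented by fixed-length feature vectors (for example flattened Halos); the function name, k and the Euclidean metric are assumptions.

```python
from __future__ import annotations

import math


def knn_mean_baseline(train_x: list[tuple[float, ...]], train_y: list[float],
                      query: tuple[float, ...], k: int = 3) -> float:
    """Predict dG as the mean label of the k nearest training samples (Euclidean)."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)
```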
Network Architecture.
The ANN used for predicting binding affinities is a 3D convolutional neural network (CNN), which we named “Tandem ZipperNet”. The network takes two voxelized Halos as inputs and outputs predicted Gibbs free-energy values (ΔG). The architecture consists of three blocks. The first block applies separate convolutions with shared weights to the input Halos of the RBD and hACE2 binding sites individually. This block is intended both to increase the spatial independence of the learned features from the data and to reduce the imbalance arising from the different numbers of unique RBD and hACE2 variants in the training data.
These separate convolutions are followed by a single convolution acting on their outputs stacked along the channel dimension; it joins the Halos spatially while keeping the channel size and thus acts as the “Zipper”. A third convolutional block acts on the joined data, first increasing the number of channels and then introducing a channel bottleneck. The output is flattened and fed into a multilayer perceptron (MLP). For more information, see “Network Input Data” and “Network Regularization and Augmentation” in the supporting information.
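The block structure can be illustrated with a toy forward pass in NumPy. This is not the Tandem ZipperNet itself: the number of input property channels (4 instead of the full Halo annotation), the channel widths, kernel sizes, ReLU activations, the 12³ voxel grid, the assumption that both Halos share one grid size, and the MLP width are all hypothetical, and regularization and augmentation are omitted. It only shows how shared-weight per-Halo convolutions, channel-wise stacking with a zipper convolution back to the per-Halo channel count, a channel-expanding block with a 1×1×1 bottleneck, and an MLP fit together.

```python
from __future__ import annotations

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

rng = np.random.default_rng(0)


def conv3d(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """'Valid' 3D convolution (cross-correlation). x: (C_in, D, H, W); w: (C_out, C_in, k, k, k)."""
    k = w.shape[-1]
    windows = sliding_window_view(x, (k, k, k), axis=(1, 2, 3))  # (C_in, D', H', W', k, k, k)
    return np.einsum("cdhwxyz,ocxyz->odhw", windows, w) + b[:, None, None, None]


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def init_conv(c_out: int, c_in: int, k: int) -> tuple[np.ndarray, np.ndarray]:
    scale = 1.0 / np.sqrt(c_in * k ** 3)
    return rng.normal(0.0, scale, (c_out, c_in, k, k, k)), np.zeros(c_out)


class TandemZipperSketch:
    """Hypothetical toy network mirroring the three-block layout described above."""

    def __init__(self, channels: int = 4, c: int = 8, grid: int = 12):
        self.shared = init_conv(c, channels, 3)        # block 1: shared by RBD and hACE2
        self.zipper = init_conv(c, 2 * c, 3)           # block 2: 2c stacked -> c channels
        self.expand = init_conv(2 * c, c, 3)           # block 3a: increase channels
        self.bottleneck = init_conv(c // 2, 2 * c, 1)  # block 3b: 1x1x1 channel bottleneck
        side = grid - 2 * 3  # three valid 3x3x3 convolutions shrink each axis by 6
        n_flat = (c // 2) * side ** 3
        self.w1 = rng.normal(0.0, 1.0 / np.sqrt(n_flat), (n_flat, 32))
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(32), (32, 1))

    def __call__(self, halo_rbd: np.ndarray, halo_ace2: np.ndarray) -> float:
        a = relu(conv3d(halo_rbd, *self.shared))
        b = relu(conv3d(halo_ace2, *self.shared))  # same weights as for the RBD Halo
        z = relu(conv3d(np.concatenate([a, b], axis=0), *self.zipper))
        z = relu(conv3d(z, *self.expand))
        z = relu(conv3d(z, *self.bottleneck))
        hidden = relu(z.reshape(-1) @ self.w1)     # flatten, then a two-layer MLP
        return float((hidden @ self.w2)[0])        # scalar dG prediction (untrained)
```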
Statistics and display.
Statistical analyses were performed and plots were generated using GraphPad Prism 9 as well as Seaborn and Matplotlib. Coordinate files for Fig. 2 were generated in PyMOL. Images were rendered using Blender and Open3D.
Data availability.
Publicly available datasets were analyzed in this study; these data can be found at https://www.gisaid.org/. Input and final structure files, as well as Pandas DataFrames of interaction energies exported as Python pickle files, generated in this work are available for download at https://doi.org/10.6084/m9.figshare.19904953.