General methods. The following working concentrations of antibiotics were used: carbenicillin (Solarbio, 50 μg/ml), kanamycin (Solarbio, 50 μg/ml), spectinomycin (Macklin, 50 μg/ml) and chloramphenicol (Macklin, 25 μg/ml). PHANTA 2x mix (Vazyme) was used for cloning PCR, and Flash 2x mix (Vazyme) was used for verification PCR and Sanger sequencing (Tsingke Bioscience). All cloning fragments were assembled by Golden Gate assembly (New England Biolabs) or ClonExpress assembly (Vazyme). Plasmids were cloned in DH5α competent cells (HT Health). Synthetic genes were ordered from Tsingke Bioscience. Cloned plasmids were extracted with a Tiangen DNA extraction kit. E. coli strain S2060⁵⁵ was used in all aspects of the EvoScan process, including system construction, evolution and plaque assays. The DH5α strain was used for flow cytometry experiments. Detailed information on the plasmids and selection phage (SP) used in this work is given in Supplementary Table 6.
Phage propagation assay. Competent S2060 cells were transformed with the corresponding accessory plasmid (AP) in each experiment. Overnight cultures of single colonies inoculated in LB medium with appropriate antibiotics were diluted 50- or 100-fold and grown at 37 ℃ in a 220 rpm shaker (ZQZY-B8, shake tubes, 5 ml culture volume) or a 1000 rpm shaker (HUXI HW-400TG, 96-deep well plate, 500 μl culture volume) to log phase (OD600 ~0.4–0.6). These cells were then infected with SP at an initial titer of 5 × 10⁶ plaque-forming units (p.f.u.) per ml. The mixture was cultured overnight (16–20 h) at 37 ℃ in the shakers described above and centrifuged at 4000 rpm for 10 min. Phages in the supernatant were filtered through a 0.22 μm bacterial filter and stored at 4 ℃ for further use.
Plaque assay. A single colony of chemically competent S2208 cells⁵⁵ (S2060 cells transformed with plasmid pJC175e) was cultured overnight in LB medium supplemented with appropriate antibiotics. The saturated culture was diluted 50- or 100-fold into LB medium with appropriate antibiotics and grown at 37 ℃ in a 220 rpm shaker to log phase (OD ~0.4–0.8) before use. Phages were serially diluted in LB medium in six to eight 10-fold steps. Then, 10 μl of each phage dilution was mixed with 45 μl of S2208 cells, and 180 μl of liquid (50–65 ℃) soft agar (LB medium and 0.5% agar) supplemented with 2% Bluo-gal (Inalco S.p.A.) was added and mixed by pipetting. The whole mixture was immediately added onto 500 μl of bottom agar (LB medium and 1.5% agar) previously prepared in a 24-well plate. The plates were then incubated at 37 ℃ overnight (14–18 h).
Calculation of fold propagation. For fold propagation measurement of the selection phage, initial phage titer and final phage titer were measured by plaque assays. We defined the ratio of final phage titer versus initial phage titer as the fold propagation of the phage.
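The calculation can be sketched as follows. This is a minimal illustration, not the authors' analysis script: it assumes titers are estimated from the plaque count of a single countable well in which 10 μl of a given 10-fold dilution was plated (as in the plaque assay), and the function names and example counts are hypothetical.

```python
def titer_pfu_per_ml(plaque_count, dilution_steps,
                     plated_volume_ul=10.0, dilution_factor=10.0):
    """Estimate phage titer (p.f.u./ml) from one countable dilution.

    dilution_steps: number of serial dilution steps before plating
    (0 means the undiluted sample was plated).
    """
    if plated_volume_ul <= 0:
        raise ValueError("plated volume must be positive")
    total_dilution = dilution_factor ** dilution_steps
    plated_volume_ml = plated_volume_ul / 1000.0
    return plaque_count * total_dilution / plated_volume_ml


def fold_propagation(initial_titer, final_titer):
    """Ratio of final to initial phage titer."""
    if initial_titer <= 0:
        raise ValueError("initial titer must be positive")
    return final_titer / initial_titer


# Hypothetical counts: 5 plaques at the 4th dilution before propagation,
# 20 plaques at the 7th dilution after propagation.
initial = titer_pfu_per_ml(5, 4)
final = titer_pfu_per_ml(20, 7)
print(fold_propagation(initial, final))
```

Using the same plated volume and dilution scheme for both titers keeps the ratio independent of those constants.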
Basic process of evolutionary scanning (EvoScan). The target mutagenesis plasmid (TP) was first transformed into chemically competent S2060 cells, and the resulting S2060-TP cells were used to prepare super chemically competent cells by the Inoue method⁵⁶. Chemically competent S2060-TP cells were transformed with the corresponding APs. The resulting S2060-TP-AP bacteria were cultured overnight, diluted 50–100-fold into 500 μl LB medium with antibiotics and inducers, and grown at 37 ℃ in a 1000 rpm shaker to OD ~0.5. The phage titer for the first infection was around 5 × 10⁶–5 × 10⁸ p.f.u./ml, and for the following passages the phages were subjected to a 1:50 or 1:100 dilution. Vanillic acid (Sigma-Aldrich, dissolved in ethanol) at a final concentration of 50 μM was added to induce the expression of the nCas9-PolIM5 complex. The mixture was then cultured at 37 ℃ in a 1000 rpm shaker overnight. The next day, the mixture was centrifuged at 4000 rpm for 10 min and the phage content of the collected supernatant was verified by PCR (Flash 2x mix) and Sanger sequencing. The supernatant was then used for plaque assays as described above. Single plaques were picked and further verified by PCR (Flash 2x mix), and the PCR product was sent for Sanger sequencing.
Searching steps in each route. For each step of EvoScan in a route, 10 μl of supernatant containing evolved phages was added to 1 ml of log-phase S2208 culture (OD ~0.4–0.8) and propagated overnight in a 96-deep well plate. The mixture was centrifuged at 4000 rpm for 10 min and filtered through a 0.22 μm bacterial filter. The obtained phages were then diluted and used to infect another host strain containing a different AP at an infection titer of 5 × 10⁶ p.f.u./ml.
Basic process of phage-assisted non-continuous evolution (PANCE). The accessory plasmid carrying the designed genetic circuit and the mutagenesis plasmid MP6 were co-transformed into super chemically competent S2060 cells. The S2060-MP6-AP bacteria were cultured overnight, diluted 50–100-fold into 500 μl LB medium with antibiotics and inducers in a 96-deep well plate, and grown at 37 ℃ in a 1000 rpm shaker to OD ~0.5. The initial phage titer was around 5 × 10⁶–5 × 10⁸ p.f.u./ml, and the phages were subjected to a 1:10–1:100 dilution in the following passages in a 500 μl culture volume. Arabinose (1% m/v, dissolved in ddH2O) was added as the inducer of MP6. Phages were then collected and analyzed for mutations following the same procedures as in EvoScan.
Induced expression assay. Single colonies of the strains to be tested were cultured in LB medium overnight. The saturated culture was diluted 100-fold in LB medium with appropriate antibiotics and inducers and cultured at 37 ℃ in a 1000 rpm shaker for 2 h (OD ~0.4). Then, 2 μl of the log-phase culture was added to fresh LB medium with appropriate antibiotics and inducers to a total volume of 500 μl. The mixture was cultured for 5 h in a 96-deep well plate.
Flow cytometry assay. A 10 μl aliquot of the culture was added to 190 μl PBS with 2 g/L kanamycin in a 96-well U-bottom plate to stop cell growth. The plate was stored at 4 ℃ until use. A flow cytometer (Beckman Coulter Cytoflex S) was used to quantify the expression levels of the fluorescent protein. FlowJo v10 was used to gate the events (at least 10,000 events) and calculate the median of each sample.
Mpro drug resistance index. In the RTHS protease activity assay, the fluorescence of the experimental group carrying eYFP was measured with or without addition of Mpro inhibitor GC376 or PF-07321332. The ratio of fluorescence FITC-A median with inhibitor versus fluorescence FITC-A median without inhibitor was defined as the resistance index (RI) to evaluate the drug resistance abilities of different Mpro variants.
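A minimal sketch of the RI calculation, assuming the gated per-event FITC-A values for each condition are available as plain lists (the function name and example values are hypothetical):

```python
from statistics import median


def resistance_index(events_with_inhibitor, events_without_inhibitor):
    """RI = FITC-A median with inhibitor / FITC-A median without inhibitor."""
    median_with = median(events_with_inhibitor)
    median_without = median(events_without_inhibitor)
    if median_without <= 0:
        raise ValueError("median without inhibitor must be positive")
    return median_with / median_without


# Hypothetical gated events for one Mpro variant
ri = resistance_index([400, 500, 600], [900, 1000, 1100])
print(ri)
```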
Structure display and interaction prediction. Schrödinger 2017 was used for structural display. ZDOCK⁵⁷ was used to predict the interaction structure between EGFP and its nanobody. Interactions between Mpro and the inhibitors within 3 Å are shown in the figure.
Fold repression calculation. The background fluorescence, defined as the median fluorescence of bacteria carrying an empty plasmid with only the backbone, was measured and subtracted from all the experimental groups. The background-subtracted fluorescence of the uninduced group (no repressor expression) was divided by that of the induced group (repressor expression) to obtain the fold repression.
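In code, this calculation amounts to the following sketch. The function name and example medians are hypothetical, and the guard against a non-positive denominator is an added assumption rather than part of the described procedure.

```python
def fold_repression(uninduced_median, induced_median, background_median):
    """Background-subtracted uninduced signal divided by induced signal."""
    unrepressed = uninduced_median - background_median
    repressed = induced_median - background_median
    if repressed <= 0:
        # The induced signal is indistinguishable from background,
        # so the ratio is undefined.
        raise ValueError("induced signal is at or below background")
    return unrepressed / repressed


# Hypothetical medians: uninduced 1100, induced 110, empty plasmid 100
print(fold_repression(1100, 110, 100))
```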
Relative expression level calculation. Using flow cytometry assay, we measured the FITC-A median of the strain carrying the empty plasmid and set this value as the background value. The FITC-A median of the strain carrying the standard plasmid expressing eYFP through the open reading frame J23101-B0064-YFP was measured the same way and set as the standard value. The FITC-A median of the strain containing a specific variant was measured the same way, and the relative expression level was defined as: (variant value – background value)/standard value.
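The definition above translates directly into the following sketch. The function name and example values are hypothetical; as in the text, the standard value is used without background subtraction.

```python
def relative_expression(variant_median, background_median, standard_median):
    """(variant value - background value) / standard value."""
    if standard_median <= 0:
        raise ValueError("standard value must be positive")
    return (variant_median - background_median) / standard_median


# Hypothetical FITC-A medians
print(relative_expression(600, 100, 1000))
```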
Circuit score calculation. The strain carrying the plasmid with a specific genetic circuit was prepared for the flow cytometry assay. IPTG (1 mM) and vanillic acid (100 μM) were used as the input signals. YFP was used as the output reporter of the circuit, and the FITC-A median of each state was measured. The lowest ON signal (lowest FITC-A median among the "ON" states of the circuit) was divided by the highest OFF signal (highest FITC-A median among the "OFF" states of the circuit) to obtain the circuit score.
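A minimal sketch of the circuit score, assuming each input state is keyed by an (IPTG, vanillic acid) pair of booleans and the expected ON/OFF truth table of the circuit is known; the names and the example AND-gate values are hypothetical.

```python
def circuit_score(medians, expected_on):
    """Lowest ON-state median divided by highest OFF-state median.

    medians: dict mapping input state -> FITC-A median
    expected_on: dict mapping the same states -> True (ON) or False (OFF)
    """
    on_signals = [value for state, value in medians.items() if expected_on[state]]
    off_signals = [value for state, value in medians.items() if not expected_on[state]]
    if not on_signals or not off_signals:
        raise ValueError("need at least one ON state and one OFF state")
    highest_off = max(off_signals)
    if highest_off <= 0:
        raise ValueError("highest OFF signal must be positive")
    return min(on_signals) / highest_off


# Hypothetical AND gate: output is ON only when both inducers are present.
medians = {(False, False): 100, (True, False): 150,
           (False, True): 200, (True, True): 4000}
truth_table = {(False, False): False, (True, False): False,
               (False, True): False, (True, True): True}
print(circuit_score(medians, truth_table))
```

A score above 1 means every ON state is brighter than every OFF state.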
AmeR phylogenetic tree construction. Protein sequences of the 82 variants and the WT were collected in a FASTA file, which was input into MEGA11 for multiple sequence alignment (MSA)⁵⁸. After MSA and phylogenetic analysis, the neighbor-joining method was selected for tree construction. The output tree was annotated with iTOL⁵⁹, and all parameters were left at their default values.
Epistasis calculation. Epistasis between two mutations, A and B, was calculated as ε = f_ab + f_AB − f_Ab − f_aB, where f_ab is the fitness of the wild-type genotype, f_AB is the fitness of the double mutant, and f_Ab and f_aB are the fitness values of the two single mutants. ε > 0 indicates positive epistasis, whereas ε < 0 indicates negative epistasis.
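As a worked sketch of this formula (argument names and the example fitness values are hypothetical, and the fitness scale is whatever the measurements use):

```python
def epistasis(f_wild_type, f_single_a, f_single_b, f_double):
    """epsilon = f_wt + f_double - f_single_a - f_single_b."""
    return f_wild_type + f_double - f_single_a - f_single_b


def classify_epistasis(epsilon):
    """Label the sign of epsilon."""
    if epsilon > 0:
        return "positive"
    if epsilon < 0:
        return "negative"
    return "none"


# Hypothetical fitness values: wild type 1, single mutants 2 and 3, double mutant 6
eps = epistasis(1.0, 2.0, 3.0, 6.0)
print(eps, classify_epistasis(eps))
```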
Mammalian cell culture and transfections. HEK293T cells (CRL-3216, ATCC) were cultured in Dulbecco's modified Eagle's medium (DMEM, Gibco) supplemented with 10% (v/v) fetal bovine serum (FBS, Biological Industries) and 1% (v/v) penicillin/streptomycin solution (Beyotime) at 37 °C, 100% humidity and 5% CO2. In transfection experiments, 60,000–80,000 HEK293T cells in 0.2 ml of DMEM complete medium were seeded into each well of 48-well plastic plates (NEST) and grown for ~24 h. M5 HiPer Lipo2000 Transfection Reagent (Lipo2000, Mei5bio) was used in all transfection experiments following the manufacturer's protocol. Briefly, a sample mixture was prepared by mixing 150 ng repressor plasmid or 150 ng control plasmid (repressor-deficient) with 150 ng reporter plasmid and 0.7 μl Lipo2000. The mixture was incubated at room temperature for 20 min before being added to the cells. Transfected cells were supplemented with 0.2 ml DMEM complete medium 24 h post-transfection and cultured for 2 days post-transfection before flow cytometry analysis.
Mammalian cell flow cytometry assay. Cells were trypsinized 48 h after transfection and centrifuged at 250 × g for 10 min at room temperature. The supernatant was removed, and the cells were resuspended in 1 × PBS. Fluorescence values were measured with a Cytoflex flow cytometer (Beckman Coulter, Inc.). The PB450-A and ECD-A channels were chosen for BFP and mCherry measurement, respectively. Data were processed using FlowJo (TreeStar): events were gated by the area of the forward scatter and the side scatter (FSC-A/SSC-A), and transfected cell populations were then selected by gating out the background BFP signal of untransfected cells. The median fluorescence was calculated for >20,000 transfected cells for each sample. To reduce expression noise between samples, the mCherry : BFP fluorescence ratio was used to report the repressor activity⁶⁰. The mCherry : BFP fluorescence ratio was calculated as (mCherry − mCherry0)/(BFP − BFP0), where mCherry0 and BFP0 are the fluorescence values of untransfected HEK293T cells. The fold repression was calculated as (mCherry : BFP)_unrepressed/(mCherry : BFP)_repressed, where (mCherry : BFP)_unrepressed and (mCherry : BFP)_repressed are the ratios for cells co-transfected with the control plasmid or the repressor plasmid, respectively.
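The two formulas can be sketched as follows, assuming the median mCherry and BFP values of each sample are already exported from the gating step; function names and example medians are hypothetical.

```python
def mcherry_bfp_ratio(mcherry, bfp, mcherry0, bfp0):
    """(mCherry - mCherry0) / (BFP - BFP0), with the untransfected
    cell values mCherry0 and BFP0 as the baselines."""
    bfp_signal = bfp - bfp0
    if bfp_signal <= 0:
        raise ValueError("BFP signal is at or below the untransfected baseline")
    return (mcherry - mcherry0) / bfp_signal


def mammalian_fold_repression(ratio_unrepressed, ratio_repressed):
    """Ratio with control plasmid divided by ratio with repressor plasmid."""
    if ratio_repressed <= 0:
        raise ValueError("repressed ratio must be positive")
    return ratio_unrepressed / ratio_repressed


# Hypothetical medians; untransfected cells give mCherry0 = 100, BFP0 = 100
unrepressed = mcherry_bfp_ratio(1100, 600, 100, 100)  # control plasmid
repressed = mcherry_bfp_ratio(200, 600, 100, 100)     # repressor plasmid
print(mammalian_fold_repression(unrepressed, repressed))
```

Normalizing by BFP corrects for differences in transfection efficiency between samples before the two states are compared.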
Feature generation. We first queried the UniRef30_2021_03 and BFD multiple sequence alignment (MSA) databases. We then used AlphaFold2 to predict the structure of the wild-type protein. For this work, we used the GeoFitness-Seq variant of the pre-training model. For mutated proteins, structures were generated using FoldX 5. Sequence features were extracted from the large-scale protein language model ESM-2 to capture global context information. Each node in the Geometric Encoder was therefore initialized with the ESM-2 embedding of the corresponding residue. Unlike conventional methods that rely on inter-residue distances and contacts to define edges, each edge in the Geometric Encoder was initialized with the relative geometric relationship between a pair of residues derived from the protein 3D structure⁶¹.
Cross-validation. We employed a 10-fold cross-validation approach to find the hyperparameters of the model. The dataset, comprising 82 mutational data points, was divided into three parts: a training set (59 samples), a validation set (7 samples), and a test set (16 samples). Model evaluation was performed using the Spearman correlation coefficient (ρ) as the primary assessment metric.
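For reference, the Spearman correlation coefficient is the Pearson correlation of the ranks of the two variables. The sketch below is a plain-Python illustration with average ranks for ties, not the evaluation code used in this work; function names and the example data are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(predicted, measured):
    """Pearson correlation computed on the ranks of the two inputs."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rank_x, rank_y = average_ranks(predicted), average_ranks(measured)
    n = len(rank_x)
    mean_x, mean_y = sum(rank_x) / n, sum(rank_y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    var_x = sum((a - mean_x) ** 2 for a in rank_x)
    var_y = sum((b - mean_y) ** 2 for b in rank_y)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical predictions versus measurements for four variants
print(spearman_rho([1, 2, 3, 4], [1, 3, 2, 4]))
```

Because only ranks enter the calculation, ρ rewards a model for ordering variants correctly even when the predicted values are on a different scale from the measurements.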
Model training details. The model employs the Soft Rank Loss as its loss function, with a learning rate of 10⁻³, the Adam optimizer, and a decay rate for the learning rate. Training spans 50 epochs. Subsequently, the learning rate of the upstream GeoFitness model is set to 10⁻⁴, while the learning rate of the downstream model is adjusted to 5 × 10⁻⁴ for further fine-tuning.
55 Carlson, J. C., Badran, A. H., Guggiana-Nilo, D. A. & Liu, D. R. Negative selection and stringency modulation in phage-assisted continuous evolution. Nat. Chem. Biol. 10, 216-222 (2014).
56 Green, M. R. & Sambrook, J. The Inoue Method for Preparation and Transformation of Competent Escherichia coli: "Ultracompetent" Cells. Cold Spring Harb Protoc. 2020, 101196 (2020).
57 Chen, R., Li, L. & Weng, Z. ZDOCK: an initial‐stage protein‐docking algorithm. Proteins: Struct., Funct., Bioinf. 52, 80-87 (2003).
58 Tamura, K., Stecher, G. & Kumar, S. MEGA11: molecular evolutionary genetics analysis version 11. Mol. Biol. Evol. 38, 3022-3027 (2021).
59 Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 49, W293-W296 (2021).
60 Liang, J. C., Chang, A. L., Kennedy, A. B. & Smolke, C. D. A high-throughput, quantitative cell-based screen for efficient tailoring of RNA device activity. Nucleic Acids Res. 40, e154 (2012).
61 Xu, Y., Liu, D. & Gong, H. Improving the prediction of protein stability changes upon mutations by geometric learning and a pre-training strategy. bioRxiv, 2023.05.28.542668 (2023).