We generated homology models to quickly assess the potential impact of each of the 28 mutations identified in the core (“founding”) Omicron haplotype and to identify mutations that might justify more detailed experimental characterization. Several of these positions also carry mutations in the earlier Alpha and Delta variants. Some, such as D614G, are observed in all three variant families, and the impact of this mutation has been well documented 4–8. Other positions, such as P681, are also well documented as crucial for enhanced viral transmission. The Omicron variants identified thus far carry the P681H mutation observed in earlier Alpha variants, rather than the P681R mutation observed in Delta; P681R has been shown experimentally to contribute to increased infectivity 9,10.
The Omicron spike protein mutations can be classified generally into four sub-categories. Unsurprisingly, most mutations appear in the receptor-binding domain (RBD). Most of these RBD mutations (G339D, N440K, G446S, S477N, T478K, E484A, Q493R, G496S, Q498R, N501Y, Y505H) are exposed on the RBD surface (Fig. 5), so it is reasonable to expect that they may affect angiotensin-converting enzyme 2 (ACE2) receptor binding and/or recognition and binding by neutralizing antibodies (nAbs). If the Omicron variant originated in an immunocompromised patient with a prolonged infection, as has been speculated 2, the latter scenario, altered nAb recognition, seems quite plausible. A smaller set of RBD mutations (S371L, S373P, S375F, K417N) display little or no surface exposure (Fig. 6), but their location and non-conservative nature suggest that they may affect local conformation, in particular monomer-monomer contacts in the spike trimer, and thus affect ACE2 and/or nAb binding indirectly through mutation-induced conformational changes.
A third set of mutations (T547K, N764K, N856K, Q954H, N969K, L981F) occurs in the interior of the spike trimer assembly, at positions where they are likely to alter monomer-monomer packing interactions (Fig. 6); these mutations may well affect overall spike structure, dynamics, and/or flexibility. Similarly, two mutations (A67V, T95I) lie in the interior of the N-terminal domain (Fig. 6). These mutations likely alter side-chain packing interactions and may affect the domain conformation. This may be significant, because a number of neutralizing antibodies appear to bind sites in this domain. Two additional mutations (N655Y, D796Y) are located on the surface of the spike trimer stalk region (Fig. 6), where they could serve as contact residues for nAbs. Finally, two mutations (N679K, P681H) lie in the S1/S2 cleavage segment. As noted above, mutations in this region have previously been shown to enhance spike protein cleavage, leading to increased infectivity.
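To make the grouping described in the two preceding paragraphs easier to cross-check, the illustrative Python sketch below records each spike substitution under a shorthand group label and parses the standard wild-type/position/mutant notation (e.g., N501Y). It is not part of the analysis reported here; the group labels and the names `SPIKE_GROUPS`, `parse_mutation`, and `mutations_by_position` are hypothetical and introduced only for this example.

```python
import re
from collections import Counter

# Spike substitutions grouped as described in the text. The group labels are
# hypothetical shorthand introduced for this sketch.
SPIKE_GROUPS = {
    "RBD surface": ["G339D", "N440K", "G446S", "S477N", "T478K", "E484A",
                    "Q493R", "G496S", "Q498R", "N501Y", "Y505H"],
    "RBD buried": ["S371L", "S373P", "S375F", "K417N"],
    "trimer interior": ["T547K", "N764K", "N856K", "Q954H", "N969K", "L981F"],
    "NTD interior": ["A67V", "T95I"],
    "stalk surface": ["N655Y", "D796Y"],
    "S1/S2 cleavage segment": ["N679K", "P681H"],
}

# One-letter wild-type residue, sequence position, one-letter mutant residue.
MUTATION_RE = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def parse_mutation(label):
    """Split a substitution label such as 'N501Y' into ('N', 501, 'Y')."""
    match = MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {label!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def mutations_by_position(groups):
    """Return (position, label, group) tuples sorted along the spike sequence."""
    rows = []
    for group, labels in groups.items():
        for label in labels:
            _, position, _ = parse_mutation(label)
            rows.append((position, label, group))
    return sorted(rows)


rows = mutations_by_position(SPIKE_GROUPS)
counts = Counter(group for _, _, group in rows)
print(counts)
```

Sorting by position places the N-terminal domain substitutions (A67V, T95I) first and L981F last; the six groups together list 27 substitutions.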
In addition to these mutations in the “founding” Omicron variant, our unsupervised learning strategy revealed nine emerging mutations in the viral genome that may be becoming established. In earlier studies, we showed that our method can identify significant mutations long before they are characterized clinically; for example, we observed the emergence of the P681R mutation in the U.S. several months before the Delta variant became dominant 11. We therefore used homology modeling to assess the potential impact of these newly emerging mutations.
Four of these mutations occur in the spike protein. The R346K mutation lies in the RBD (Fig. 5C). Although an arginine-to-lysine substitution is often classified as conservative, with little structural or functional consequence, the two residues are not always interchangeable: the arginine guanidinium group differs from the primary amine of the lysine side chain in its hydrogen-bonding patterns, π-system interactions, and related chemistry. The R346K mutation might therefore affect receptor and/or nAb recognition and binding. The A701V mutation is located on the exterior of the spike trimer stalk region (Fig. 6), where it could affect nAb binding, much like N655Y and D796Y discussed above. The I1081V mutation lies in the interior of the spike trimer assembly and might influence conformation through altered side-chain packing interactions. However, A701V and I1081V are both conservative substitutions and may have little impact.
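The notion of a conservative substitution used above can be made concrete with a coarse side-chain classification. The following Python sketch rests on a simplifying assumption: the class boundaries in the hypothetical `SIDE_CHAIN_CLASS` table and the hypothetical helper `is_conservative` are illustrative only, and a more careful analysis would typically use a substitution matrix such as BLOSUM62. The sketch also exposes the limitation raised for R346K: a class-based check treats arginine and lysine as interchangeable and ignores the guanidinium versus primary-amine chemistry.

```python
# Coarse side-chain classes. This grouping is an illustrative assumption,
# not a standard; e.g., histidine is placed with the basic residues here.
SIDE_CHAIN_CLASS = {
    **dict.fromkeys("AVLIM", "aliphatic"),
    **dict.fromkeys("FWY", "aromatic"),
    **dict.fromkeys("KRH", "basic"),
    **dict.fromkeys("DE", "acidic"),
    **dict.fromkeys("STNQ", "polar"),
    "G": "glycine",
    "P": "proline",
    "C": "cysteine",
}


def is_conservative(label):
    """Return True if the wild-type and mutant residues share a class.

    `label` is a single-residue substitution such as 'R346K'; only the first
    and last characters (the one-letter residue codes) are inspected.
    """
    wild_type, mutant = label[0], label[-1]
    return SIDE_CHAIN_CLASS[wild_type] == SIDE_CHAIN_CLASS[mutant]


emerging_spike = ["R346K", "A701V", "I1081V", "N1192S"]
print({label: is_conservative(label) for label in emerging_spike})
```

Under this grouping all four emerging spike substitutions, including N1192S, come out as conservative, which is why such a filter can serve only as a first pass and cannot by itself rule out a functional effect.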
The fourth emerging spike protein mutation, N1192S, is less prevalent thus far but is of particular interest and potential concern. This mutation is located in heptad repeat 2 (HR2). This region is highly conserved across human coronaviruses and, together with HR1 segments, forms a six-helix bundle that facilitates fusion with target host cells 12. As a result, this region of the spike protein has been the focus of ongoing efforts to develop “universal” fusion inhibitors for therapeutic use against coronavirus infections. A crystal structure of the post-fusion six-helix bundle formed by the HR1 and HR2 segments of the spike protein trimer is available 13, enabling visualization of the N1192S mutation site (Fig. 7). These simple homology modeling exercises cannot establish the full impact of this mutation on packing interactions in the six-helix bundle. However, the detailed views of the mutation site in Figs. 7B and 7C, which display specific helix-helix contacts (K933 of one monomer with position 1192 of the neighboring monomer), suggest that packing interactions between monomer units in the bundle will be altered. Further experimental study may therefore be warranted to determine how this mutation affects the efficiency of viral fusion with host cells.
Three emerging mutations were identified in the ORF1ab gene. One, V1069I, maps to a region of non-structural protein 3 (nsp3) immediately preceding the nsp3 nucleic acid-binding domain, but no relevant structural information is available for this region of nsp3 to assess possible impacts. A second, V94A, maps to a region between putative transmembrane helices 1 and 2 of nsp4, for which there is likewise no relevant experimental structural data.
The third ORF1ab mutation, F694Y, occurs in the RNA-dependent RNA polymerase and is particularly intriguing. Figure 4B illustrates the dramatic increase in the frequency of this mutation since 8 December 2021, suggesting that F694Y may confer a notable fitness advantage on the virus. The location of this mutation in the polymerase is shown in Fig. 8A, and Fig. 8B highlights its close proximity to key residues involved in RNA binding 14. Of greater potential significance, the mutation site interacts directly with several residues involved in remdesivir binding 14, as shown in Fig. 8C. Although the phenylalanine-to-tyrosine substitution is relatively conservative, tyrosine is larger and introduces a hydroxyl group that can act as a hydrogen bond donor or acceptor, so the mutation is likely to induce at least subtle conformational changes in this region of the substrate and inhibitor binding site, changes that may well alter remdesivir binding and affect its therapeutic effectiveness. This general pattern, in which a residue near the inhibitor binding site or active site is mutated rather than a key active-site residue itself, is a commonly observed mechanism of drug resistance in target enzymes of many pathogens (e.g., inhibitor resistance in HIV protease often follows this pattern). Given the possible implications of this mutation, prompt experimental investigation seems justified.
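The size and hydrogen-bonding difference between phenylalanine and tyrosine noted above can be quantified from standard residue properties. In this small Python sketch the monoisotopic residue masses are standard values, while the side-chain hydrogen-bond donor/acceptor counts are a simplification; `substitution_delta` is a hypothetical helper and not part of the analysis reported here.

```python
# Monoisotopic residue masses (free amino acid minus H2O), in daltons.
RESIDUE_MASS = {"F": 147.06841, "Y": 163.06333}

# Simplified side-chain hydrogen-bond capacity as (donors, acceptors):
# phenylalanine has none; the tyrosine phenolic OH can donate and accept one.
SIDECHAIN_HBOND = {"F": (0, 0), "Y": (1, 1)}


def substitution_delta(wild_type, mutant):
    """Return (mass change in Da, change in donors, change in acceptors)."""
    mass_change = RESIDUE_MASS[mutant] - RESIDUE_MASS[wild_type]
    wt_donors, wt_acceptors = SIDECHAIN_HBOND[wild_type]
    mut_donors, mut_acceptors = SIDECHAIN_HBOND[mutant]
    return (round(mass_change, 4),
            mut_donors - wt_donors,
            mut_acceptors - wt_acceptors)


print(substitution_delta("F", "Y"))
```

The mass difference corresponds to a single added oxygen atom; the aromatic ring is otherwise unchanged, which is why the substitution is usually regarded as conservative even though it adds a polar group at this position.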
The L106F mutation occurs in ORF3a, in the region encoding the putative 3a ion channel 15. It lies near the amino terminus of helix 3, on the exterior face. Because this residue is presumably exposed to the lipid environment, it is unclear how, or whether, the mutation affects function. However, mutations in the 3a ion channel are well established to affect ion conductance and viral viability 15.
The final emerging mutation, D343G, lies in the C-terminal dimerization domain of the nucleocapsid phosphoprotein, in a short loop connecting β-strand β2 to helix α6. This residue is not directly involved in any dimer contacts, and its relatively exposed position in a flexible loop makes its structural consequences difficult to anticipate. The glycine substitution would be expected to increase loop flexibility, which might facilitate the domain conformational rearrangements necessary for dimerization. This could confer a fitness advantage on the virus, as experimental studies suggest that the nucleocapsid homodimer is the stable form in solution and that the homodimer can bind short ssRNA molecules without any ancillary proteins 16.