SARS-CoV-2 sequences hold evidence of a past, strong selective event in the receptor binding domain of Spike
To investigate which regions in SARS-CoV-2 have contributed to its efficient infection and circulation in humans, we analyzed sequences for signs of strong positive selection events known as selective sweeps. We acquired 1,912,191 human-isolated SARS-CoV-2 sequences from GISAID, covering the period up to August 2021. We then used a computational pipeline combining OmegaPlus15 and RAiSD16 to identify regions with a high probability of selective sweep events. To discern the selective sweep regions instrumental in driving mutations during the early stages of human infection, we categorized the sequences by their sample collection dates, arranging them into monthly cohorts (Fig. 1A). Our analysis identified an average of 11 sweep regions per month, with the numbers ranging from 4 to 16 (Supplementary file 1). While many sweep regions were identified, our attention was primarily drawn to the sweep regions in the receptor-binding domain (RBD) of the Spike protein (nucleotide position 22,878 to 23,332; amino acid positions 439–590), as mutations in the RBD are known modulators of host tropism and receptor binding17. The selective sweep regions included the A372 residue in the RBD, a critical factor we previously identified for the emergence of SARS-CoV-2 and its sustained transmission among humans11. These regions were evident in the early months of the COVID-19 outbreak (Fig. 1A).
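The monthly cohort step described above can be illustrated with a minimal sketch. It assumes each sequence is available as an identifier paired with a collection date; the record layout, the function name `monthly_cohorts`, and the example records are hypothetical and are not taken from the pipeline used in this study.

```python
from collections import defaultdict
from datetime import date


def monthly_cohorts(records):
    """Group (sequence_id, collection_date) pairs by (year, month)."""
    cohorts = defaultdict(list)
    for seq_id, collected in records:
        cohorts[(collected.year, collected.month)].append(seq_id)
    # Return the cohorts in chronological order.
    return dict(sorted(cohorts.items()))


# Hypothetical records, for illustration only.
example_records = [
    ("seq_a", date(2020, 1, 5)),
    ("seq_b", date(2020, 1, 28)),
    ("seq_c", date(2020, 2, 3)),
]
cohorts = monthly_cohorts(example_records)
```

Each cohort could then be passed separately to the sweep-detection tools, so that sweep regions are reported per month.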
Since selective sweep regions indicate a past selective event driven by various adaptive mutations and their associated hitchhiker mutations, we sought to determine the site within the Spike selective sweep region where pivotal mutations occurred upon early infection of humans18. While the progenitor virus to SARS-CoV-2 is unknown, we inferred the progenitor Spike sequence could assume several amino acid identities found at the corresponding site in Spike from closely related sarbecoviruses infecting bats and pangolins. To identify the closest coronaviruses related to SARS-CoV-2, we constructed a phylogenetic tree based on the amino acid sequence of Spike with PhyML (Fig. 1B)19,20. Using these phylogenetic relationships, we aligned the amino acid sequence of the selective sweep region in Spike from the Wuhan-Hu-1, lineage B strain of SARS-CoV-2 with the homologous region in closely related sarbecoviruses to reveal sites with differential amino acid identities (Fig. 1C; see also Extended Data Fig. 1 for the full amino acid alignment of the sweep region in Spike). Similar to amino acid position 372 in Spike, position 519 holds a non-synonymous mutation (nucleotide substitution C23117A) that differentiates SARS-CoV-2 from the aligned bat- and pangolin-derived Sarbecovirus sequences11. This position was identified in the selective sweep regions during one of the early months of the outbreak (January 2021; Fig. 1A). SARS-CoV-2 has a histidine at 519, while the bat and pangolin-derived sequences bear an asparagine or lysine. We hypothesized the progenitor to SARS-CoV-2, potentially a virus infecting bats or pangolins, acquired a histidine at some point in the evolutionary timeline, thereby gaining an adaptive advantage in humans.
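The correspondence between genome coordinates and Spike residues quoted above can be checked with a short sketch. It assumes the Spike open reading frame begins at nucleotide 21,563 of the Wuhan-Hu-1 reference genome (NC_045512.2, 1-based coordinates); the helper names are hypothetical, and the codon table lists only the standard histidine and asparagine codons.

```python
SPIKE_ORF_START = 21563  # assumed first nucleotide of Spike in Wuhan-Hu-1 (1-based)


def nt_to_spike_codon(nt_pos):
    """Return (amino acid position, position within codon) for a genome coordinate."""
    offset = nt_pos - SPIKE_ORF_START
    if offset < 0:
        raise ValueError("coordinate lies upstream of the Spike ORF")
    return offset // 3 + 1, offset % 3 + 1


def spike_codon_to_nt_range(aa_pos):
    """Return the (first, last) genome coordinates of a Spike codon."""
    first = SPIKE_ORF_START + 3 * (aa_pos - 1)
    return first, first + 2


# Standard genetic code entries for histidine (H) and asparagine (N) only.
CODONS = {"CAT": "H", "CAC": "H", "AAT": "N", "AAC": "N"}

site = nt_to_spike_codon(23117)         # codon and codon position hit by C23117A
sweep_start = nt_to_spike_codon(22878)  # first nucleotide of the reported sweep region
sweep_end = nt_to_spike_codon(23332)    # last nucleotide of the reported sweep region
```

Under this assumption, nucleotide 23,117 is the first base of codon 519, and a C-to-A change at the first position of either histidine codon yields an asparagine codon, in line with the H519N notation.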
Since sites with low amino acid diversity may implicate a crucial role of the predominant amino acid in viral fitness, we sought to investigate the amino acid diversity at position 519 in Spike in SARS-CoV-2. Using data from 3,848 genomes sampled between December 2019 and October 2023 from the nCoV GISAID dataset displayed on NextStrain, we determined Spike position 519 in SARS-CoV-2 has a normalized Shannon entropy of 0, suggesting little to no flexibility is allowable at this position in humans (Extended Data Fig. 2)21,22. With this result, we hypothesized the predominant histidine at Spike 519 in SARS-CoV-2 may hold an important function for the virus in humans, while the ancestral residue, asparagine, would result in deleterious effects on fitness in human cells.
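For reference, a normalized Shannon entropy of 0 corresponds to an alignment column in which every sampled genome carries the same residue. The sketch below shows one way to compute such a value; it assumes normalization by the logarithm of the alphabet size (20 amino acids), which may differ from the convention used by NextStrain, and the function name is hypothetical.

```python
import math
from collections import Counter


def normalized_shannon_entropy(residues, alphabet_size=20):
    """Shannon entropy of one alignment column, scaled to the range [0, 1].

    Assumption: the entropy is divided by log(alphabet_size); other tools
    may normalize differently or not at all.
    """
    counts = Counter(residues)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("column contains no residues")
    entropy = sum((n / total) * math.log(total / n) for n in counts.values())
    return entropy / math.log(alphabet_size)


# A fully conserved column (every genome carries histidine) has zero entropy.
conserved = normalized_shannon_entropy("H" * 3848)
# A hypothetical mixed column, for comparison only.
mixed = normalized_shannon_entropy("HHHN")
```

Any column with more than one observed residue gives a value above 0, so a value of exactly 0 implies that no alternative residue was observed at that position in the sampled genomes.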
SARS-CoV-2 Spike mutant bearing a residue of bat and pangolin Sarbecovirus origin has reduced replicative fitness and infectivity in human cells
To identify the driving mutations of the putative selective event, we reasoned that evolution from asparagine to histidine at position 519 may have driven early adaptation of the progenitor virus to humans. Since position 519 is in the RBD of Spike, we first used a previously described pseudovirus system23 to determine whether Spike H519N had any functional significance during infection of cells expressing human ACE2 (hACE2), the receptor for SARS-CoV-224. First, we verified by western blot the levels of hACE2 protein expressed in human embryonic kidney cells (Extended Data Fig. 3). We generated lentiviruses pseudotyped with full wild-type (WT) Spike, Spike H519N, Spike D614G, and Spike A372T, an RBD mutant bearing an ancestral threonine, which we previously showed attenuates the virus in human lung epithelial cells11. Spike D614G was used as a known human-adaptive variant that emerged early in the pandemic and is now fixed in all SARS-CoV-2 variants25. The pseudoviruses express both luciferase and a green fluorescent protein (GFP), ZsGreen23, allowing for sensitive detection of infection efficiency. We detected Spike protein levels from prepared virus stocks and observed a greater incorporation of G614 into the pseudoviruses, as previously reported (Fig. 2A)26. Ectopic expression of Spike was similar between WT Spike and Spike H519N. Next, we used pseudovirus particles to infect human embryonic kidney cells expressing hACE2 to determine whether infectivity through hACE2 is altered by Spike H519N. Spike D614G enhanced the infectivity of the pseudotyped viruses compared to WT Spike, in agreement with data extensively reported in the literature (Fig. 2B)26. Spike A372T significantly reduced the infectivity of SARS-CoV-2, consistent with our previous study (Fig. 2B)11. Importantly, Spike H519N significantly reduced the infectivity of SARS-CoV-2 compared to Spike D614G and WT Spike (Fig. 2B). When infected cells were quantified based on luciferase expression, we also observed significant decreases in infectivity for Spike H519N compared to WT Spike (Fig. 2C; p < 0.0001).
To determine whether Spike N519 reduces the replicative fitness of SARS-CoV-2 in human cells, we generated a replication-competent SARS-CoV-2 mutant bearing an asparagine at position 519 in Spike, the amino acid present in closely related sarbecoviruses. In human lung epithelial cells, the putatively ancestral SARS-CoV-2 Spike H519N mutant replicated to significantly lower titers than the WT virus and Spike D614G, with a nearly 2-log difference in viral titers (Fig. 2D; p < 0.01 at all timepoints). These data suggest that reverting the histidine at Spike 519 to the ancestral asparagine significantly reduces infection and replication in human lung epithelial cells.
Spike H519N potentially impacts ability to sample up/down conformations and reduces binding affinity to human ACE2
To elucidate the mechanism of attenuation of SARS-CoV-2 bearing the putatively ancestral asparagine at Spike position 519, we analyzed the interactions between Spike chains to identify how H519N impacts infection (Fig. 3A). Residue 519 is in the RBD but not within the receptor binding motif (RBM), so it does not interact with ACE2 directly. Rather, residue 519 is positioned on a cleft at the interface between the Spike chains that constitute the full trimer complex. Indeed, residue 519 is within 4–5 Å of residues of adjacent chains of the Spike trimer and participates in polar interchain interactions, so conformational up/down movement can be impacted at this interface. Here, we used this interaction interface to compute predicted binding free energies (MM/GBSA), assessing the favorability of the up and down conformations by probing the interchain binding free energy in Spike. The results show that, when unprotonated, the H519 up conformation has an interchain binding free energy similar to that of the N519 up conformation (−284.1 kcal/mol and −264.1 kcal/mol, respectively). The down conformation exhibits larger differences between the H519 and N519 structures, with predicted interchain binding free energies of −259.2 kcal/mol for H519 and −365.9 kcal/mol for N519. Interestingly, for protonated H519, the predicted binding free energy is −359.4 kcal/mol for the up conformation and −280.9 kcal/mol for the down conformation. This suggests that at a lower physiological pH, H519 Spike samples the up conformation more favorably than N519 Spike, which instead energetically favors the down conformation based on interchain interactions. Structural inspection of the neighboring residues reveals that residue 519 is surrounded by a pocket of polar and charged residues from a neighboring Spike chain (Fig. 3B-C). Surface mapping of residue properties highlights that H519 exhibits a neutral surface area (Fig. 3D) while N519 results in a more polar surface area (Fig. 3E). These results suggest that H519 may lower the energy barrier to moving between the down and up positions to bind ACE2, potentially allowing for increased infection efficiency.
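The comparison of up and down conformations can be made explicit by subtracting the two predicted interchain binding free energies reported above for each structure. The sketch below is only an arithmetic restatement of those values; the dictionary layout and the function name `up_minus_down` are hypothetical, and a negative difference is read, as an assumption consistent with the interpretation above, as the up conformation having the more favorable interchain energy.

```python
# Predicted interchain binding free energies (kcal/mol) as reported in the text.
INTERCHAIN_DG = {
    ("H519", "up"): -284.1,
    ("H519", "down"): -259.2,
    ("N519", "up"): -264.1,
    ("N519", "down"): -365.9,
    ("H519 protonated", "up"): -359.4,
    ("H519 protonated", "down"): -280.9,
}


def up_minus_down(state):
    """Difference between up and down interchain energies for one structure."""
    diff = INTERCHAIN_DG[(state, "up")] - INTERCHAIN_DG[(state, "down")]
    return round(diff, 1)


differences = {
    state: up_minus_down(state)
    for state in ("H519", "N519", "H519 protonated")
}
```

With the reported values, the difference is about −24.9 kcal/mol for unprotonated H519, about −78.5 kcal/mol for protonated H519, and about +101.8 kcal/mol for N519.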
With the insight that H519N appears to impact up/down conformational changes in protonated SARS-CoV-2 Spike, and potentially the ability to bind ACE2 by sampling the up position more favorably, we further sought to determine whether Spike H519 would experimentally bind with higher affinity to hACE2. An enzyme-linked immunosorbent assay (ELISA) was performed with hACE2 and several concentrations of RBD from either WT Spike, Spike N501Y, Spike A372T, or Spike H519N. Spike N501Y was used as a positive control with known increased binding to hACE227. Consistent with reduced binding efficiency, the absorbance curves for Spike H519N and Spike A372T RBD are shifted to the right of the WT RBD curve (Fig. 3F). We observed significantly lower EC50 values for Spike N501Y RBD and significantly higher EC50 values for Spike H519N and Spike A372T compared to WT RBD (Fig. 3G). This result suggests that more molecules of Spike H519N RBD would be required to saturate hACE2 binding sites; therefore, this mutation may reduce the affinity for hACE2, leading to reductions in replication.
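The link between a right-shifted absorbance curve and a higher EC50 can be illustrated with a four-parameter logistic model, a common choice for ELISA dose-response data. The fitting model used for Fig. 3G is not stated here, so the model itself is an assumption, and the function name and all parameter values below are hypothetical.

```python
def four_pl(conc, bottom, top, ec50, hill):
    """Four-parameter logistic response at a given (positive) concentration."""
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill)


# Hypothetical curves that differ only in EC50 (arbitrary concentration units).
reference_signal = four_pl(conc=10.0, bottom=0.0, top=2.0, ec50=10.0, hill=1.0)
shifted_signal = four_pl(conc=10.0, bottom=0.0, top=2.0, ec50=40.0, hill=1.0)
```

At a concentration equal to its EC50, a curve reaches half of its maximal signal. A curve with a higher EC50 gives a lower signal at that same concentration, which appears as a shift to the right on a concentration axis.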