Sequence and Structural Alignment
Figure 1A shows part of the multiple sequence alignment (MSA) of the spike proteins of the seven human coronaviruses (229E, NL63, OC43, HKU1, SARS, MERS, and COVID-19), performed with the Clustal Omega web server and visualized with ESPript. The secondary structure of the COVID-19 spike model is displayed at the top of the MSA, and residue surface accessibility is shown at the bottom. At the top of the MSA, alpha helices are drawn as helices and beta-sheets as arrows. At the bottom, surface-accessible residues are in blue, while buried residues are in white. Identical residues are highlighted in red, while similar residues are highlighted in yellow. The positions of the disulfide bonds are marked by the green numbers below the accessibility rows. Thirteen disulfide bonds are found in the spike protein, and from these we predict four regions as candidate binding sites for cell-surface GRP78. These four regions, identified by disulfide numbers 3, 4, 5, and 6, are marked in the MSA with green, blue, magenta, and red dashed lines, respectively. The complete MSA for the spike protein (1,273 residues) is given in supplementary figure S1.
The SARS spike protein sequence is the closest to the COVID-19 spike, with 77.38% identity. In contrast, OC43, MERS, HKU1, 229E, and NL63 share only 32.81%, 32.79%, 31.86%, 30.35%, and 28.28% identity, respectively, with the COVID-19 spike. Figure 1B shows the superposition of the homo-trimeric COVID-19 spike model (cyan cartoon) and the SARS spike structure (PDB ID: 6ACD) (green cartoon). Two views are shown, related by a 180° rotation about the vertical axis. The root mean square deviation (RMSD) between the two structures is only 0.284 Å, at a sequence identity of 77.38%.
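As a reminder of what the RMSD value measures, the following is a minimal sketch that computes RMSD over paired atoms of two structures that have already been superposed. The function name `rmsd` and the toy coordinates are illustrative assumptions; they are not the spike model or 6ACD coordinates, and the superposition step itself (done by the structural alignment tool) is not reproduced here.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples.

    Assumes the two coordinate sets are already superposed and that
    atom i in coords_a is paired with atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


# Toy coordinates in angstroms (hypothetical, for illustration only).
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.3, 0.4)]
toy_rmsd = rmsd(model, reference)
```

A small RMSD such as the reported 0.284 Å means that paired atoms sit, on average, a fraction of an angstrom apart after superposition.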
Pep42 versus spike regions
Pep42 is reported to specifically target cell-surface GRP78 in cancer cells 14. Its selectivity for GRP78 has been reported for its cyclic form but not for the extended form. This may be due to the rigidity of the cyclic structure of the peptide, which stabilizes the hydrophobic patch formed by C1, V3, A4, L5, V10, V12, and C13. The disulfide bond brings these residues closer to each other, making the cyclic peptide a well-suited docking platform for the GRP78 SBDβ.
We found 13 disulfide bonds in the COVID-19 spike protein model that form 13 different cyclic regions that may resemble the cyclic Pep42. Four of these disulfides are found on the outer surface of the spike receptor-binding domain, which faces the outside of the virion, a region that has been targeted with neutralizing antibodies against the SARS and MERS spikes. These four regions, namely region I C336:C361 (26 residues), region II C379:C432 (54 residues), region III C391:C525 (135 residues), and region IV C480:C488 (9 residues), are marked in Figure 1A.
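The region lengths and their relationships follow directly from the cysteine positions listed above. The short sketch below recomputes them; the dictionary and helper names are illustrative and not part of the original analysis.

```python
# Disulfide-bounded regions of the spike, as (first Cys, last Cys) positions.
regions = {
    "I": (336, 361),
    "II": (379, 432),
    "III": (391, 525),
    "IV": (480, 488),
}


def span_length(region):
    """Number of residues in an inclusive residue range."""
    start, end = region
    return end - start + 1


def contains(outer, inner):
    """True if the inner range lies entirely within the outer range."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def overlaps(a, b):
    """True if two inclusive ranges share at least one residue."""
    return a[0] <= b[1] and b[0] <= a[1]


lengths = {name: span_length(r) for name, r in regions.items()}
iv_inside_iii = contains(regions["III"], regions["IV"])
ii_overlaps_iii = overlaps(regions["II"], regions["III"])
```

With these positions, the lengths come out as 26, 54, 135, and 9 residues, region IV falls entirely inside region III, and regions II and III share residues 391 to 432.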
Figure 2A shows the pairwise sequence alignments between each spike region and Pep42. The percentage of pairwise sequence identity is listed on the right side of each alignment. The percent identity for region III is the highest (46.15%) compared to the other regions (15.38%, 23.08%, and 33.33% for regions I, II, and IV, respectively). As shown in Figure 2A, region IV is part of region III. Moreover, regions II and III share some residues. As in Figure 1A, identical residues are highlighted in red, while conserved residues are highlighted in yellow. The secondary structure is shown at the top of the alignment and the surface accessibility at the bottom. Region IV has all of its residues exposed at the surface (either blue, meaning surface accessible, or cyan, meaning partially accessible). For the other regions, some residues are surface exposed (blue or cyan), while others are buried (white).
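For readers who want to reproduce a percent identity from an alignment, here is a minimal sketch. It assumes identity is counted over columns in which neither sequence has a gap; alignment tools differ in the denominator they use, so this may not match the tool used for Figure 2A. The function name and the toy aligned strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap.

    Both inputs must be rows of the same alignment (equal length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy alignment (hypothetical sequences, not taken from Figure 2A).
toy_identity = percent_identity("ACDE-G", "ACQEFG")

# Arithmetic check: 6 identical positions out of 13 gives 46.15%.
six_of_thirteen = round(100.0 * 6 / 13, 2)
```

As a consistency check only, the reported 46.15% for region III equals 6 identical positions out of 13, which would fit a denominator equal to the 13-residue length of Pep42; this is an inference from the numbers, not something the alignment output states.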
Figure 2B shows the hydrophobicity index (Kyte & Doolittle) for each suggested region and for Pep42. The grand average of hydropathy (GRAVY) for each region is listed next to each peptide. Regions I, II, and III have negative GRAVY values (-0.24, -0.30, and -0.28, respectively). In contrast, region IV has a positive value (0.08), which means that it has a slightly more hydrophobic character than the other regions. Pep42 has a highly hydrophobic character (GRAVY value of 1.1) that enables it to be recognized by cell-surface GRP78 14,38,39.
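GRAVY is the mean of the Kyte & Doolittle hydropathy values over all residues of a peptide. The sketch below implements that definition. The two sequences used are assumptions, since they are not written out in this section: CTVALPGGYVRVC for Pep42 (consistent with the residues C1, V3, A4, L5, V10, V12, and C13 named above) and CNGVEGFNC for spike residues 480 to 488.

```python
# Kyte & Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean Kyte & Doolittle value per residue."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[res] for res in sequence.upper()) / len(sequence)


# Assumed sequences (not given in the text above).
PEP42 = "CTVALPGGYVRVC"   # assumed Pep42 sequence
REGION_IV = "CNGVEGFNC"   # assumed spike residues C480 to C488

gravy_pep42 = gravy(PEP42)
gravy_region_iv = gravy(REGION_IV)
```

Under these assumed sequences the function returns about 1.1 for Pep42 and about 0.08 for region IV, in line with the values quoted above. The longer regions I to III are not reproduced here because their sequences are not listed in this section.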
Figure 3A shows the homo-trimeric COVID-19 spike protein model in a colored cartoon representation. Region IV of the spike (C480:C488) is not only cyclic but also surface-accessible, and it protrudes to the outer side of the spike, i.e., facing the target cell. It has a slightly hydrophobic character, hence resembling the cyclic Pep42 peptide, and it seems suitable as the binding site for cell-surface GRP78. Figure 3B shows region III of the spike (black cartoon). As shown in the enlarged panel, region IV is part of region III and is the most surface-exposed part of the spike receptor-binding domain. Region IV (C480:C488) contributes strongly to the binding of region III to GRP78 (-9.8 out of -14.0 kcal/mol).
Binding mode of spike-GRP78
GRP78-COVID-19 spike protein docking was performed using the HADDOCK software in four different ways. In each run, one of the four spike regions predicted to bind GRP78 was used as the binding site, with its active residues selected as those that have hydrophobic character. The active residues for region I are: C336, F338, V341, F342, A344, R347, F348, S350, A352, I358, and C361. For region II, the active residues are: C379, V382, L387, I390, C391, F392, V395, A397, F400, V401, I402, V407, I410, A411, I418, A419, L425, F429, and C432. For region III, the active residues are: C391, F392, V395, A397, F400, V401, I402, V407, I410, A411, I418, A419, L425, F429, C432, V433, I434, A435, L441, V445, L452, L455, F456, L461, F464, I468, I472, A475, C480, V483, F486, C488, F490, L492, F497, V503, V510, V511, V512, L513, F515, L517, L518, A520, A522, V524, and C525. Finally, for region IV, the active residues are C480, V483, F486, and C488. For GRP78, the active site residues are taken from previous work: I426, T428, V429, V432, T434, F451, S452, V457, and I459.
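The selection of hydrophobic residues as active residues can be sketched as a simple filter over a sequence segment. The residue set {A, C, F, I, L, V} used below is an assumption inferred from the lists above, not a stated rule: it reproduces the region IV list, but the region I list also contains R347 and S350, so the authors' actual criterion is evidently not identical to this filter. The region IV sequence CNGVEGFNC is likewise assumed, as it is not written out in this section.

```python
# Assumed hydrophobic residue types (inferred from the active-residue lists).
HYDROPHOBIC = frozenset("ACFILV")


def select_active_residues(segment, first_position, allowed=HYDROPHOBIC):
    """Return labels such as 'C480' for residues whose type is in `allowed`.

    `segment` is the one-letter sequence of the region and `first_position`
    is the residue number of its first residue.
    """
    return [
        f"{res}{first_position + offset}"
        for offset, res in enumerate(segment.upper())
        if res in allowed
    ]


# Assumed sequence of spike residues 480 to 488 (region IV).
region_iv_active = select_active_residues("CNGVEGFNC", 480)
```

For the assumed region IV segment this yields C480, V483, F486, and C488, matching the list given above. A list in this form could then be supplied to the docking setup as the active residues for the corresponding run.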
Figures 4A and 4B show the binding mode for each of the docking trials (the best-formed complex from each docking experiment), with the green cartoon representing GRP78 and the yellow cartoon representing the homo-trimeric COVID-19 spike. All the docking trials support the possibility of fitting the GRP78 SBDβ to the spike, with binding affinities (predicted by PRODIGY) ranging from -9.8 to -14 kcal/mol. In terms of the orientation of the two interacting proteins, regions III and IV are acceptable as the docking platform. In these two trials, GRP78 and the spike can interact in a head-to-head fashion.
Table 1 summarizes the docking trials of the four regions of the COVID-19 spike protein against GRP78. The table lists the docking scores together with the interaction patterns analyzed by the PLIP software. As the docking scores show, region IV of the spike is the best docking platform for GRP78, with a score of -143.5 ± 4.4. This score is lower (better) than those of region I, region II, and region III by 18.3%, 15.2%, and 65.5%, respectively. The PRODIGY binding affinities are also listed in Table 1. The PLIP analysis partially explains the binding affinity. Region IV of the spike interacts with the substrate-binding domain β of GRP78 through five H-bonds (involving P479, N481, E484, and N487) and four hydrophobic interactions (involving T478, E484, and F486). The average H-bond length for the region IV docking trial is 2.26 ± 0.54 Å, while the average hydrophobic contact length is 3.66 ± 0.18 Å. These values are lower than those of the docking trials using the other regions.
Figure 5A shows the predicted binding mode of GRP78 (cyan surface) to the spike of the newly emerged coronavirus (green surface) using region IV of the spike (red surface) as the docking platform. The interacting residues of GRP78 and the spike protein are shown in yellow and red, respectively. The enlarged panel shows in more detail the interacting residues of GRP78 (yellow sticks) and region IV of the spike (red cartoon), labeled with their one-letter codes. This binding mode is acceptable because the two proteins interact as they would when the virus approaches a target cell (a respiratory system cell) expressing its cell-surface receptor, GRP78. Figure 5B shows a hypothetical binding model in which the homotrimeric spike protein of COVID-19 (red surface) is bound to a respiratory system cell exposing the GRP78 protein (green surface). This scenario could occur in stressed cells, when GRP78 is overexpressed and translocated from the endoplasmic reticulum (ER) to the cell membrane (M).
The predicted binding site of the spike protein to GRP78 found in this study is in good agreement with studies that identified the spike receptor-binding domain using antibodies 23,41. Knowledge of this binding site could open the door for further experimental and simulation studies on the mode of envelope protein recognition by the highly dynamic GRP78 substrate-binding domain.