Prediction of continuous B-cell epitopes of S, E, M and N proteins of SARS-CoV-2 and SARS-CoV
A recently published paper revealed that the S proteins of SARS-CoV-2 and SARS-CoV are closely related, with an amino-acid sequence identity of around 77%1, and that SARS-CoV-2 uses ACE2 as the cellular receptor for entry into cells3. In this work, we identified and predicted potential B- and T-cell epitopes in the structural proteins (S, E, M and N) of SARS-CoV and SARS-CoV-2 using validated prediction tools (Fig. 1). BepiPred8 predicted around 22 B-cell epitopes in the SARS-CoV-2 S protein, 14 of which are likely to be antigenic as predicted by VaxiJen (Table 1a). Similarly, Bcepred9 predicted around 27 B-cell epitopes in the S protein, of which 21 are likely to be antigenic. BepiPred predicted around 17 B-cell epitopes in the SARS-CoV-2 N protein that are likely to be antigenic, whereas Bcepred predicted around 14 B-cell epitopes in the N protein, of which 10 are likely to be antigenic (Table 1a). Both BepiPred and Bcepred predicted a similar number of epitopes for the S and N proteins of SARS-CoV (Table 1a). As expected, fewer B-cell epitopes were predicted for the E and M proteins of SARS-CoV-2 and SARS-CoV. The prediction results suggest a good degree of correlation between the size of a protein and the total number of predicted epitopes. A heat map of the predicted B- and T-cell epitopes was generated using R. The linear B-cell epitopes predicted by both BepiPred and Bcepred showed the distribution of antigenic epitopes in the S1 domain of the S protein of SARS-CoV-2 (Fig. 2a). As we know of no bioinformatic prediction server that can predict common epitopes, we manually analyzed and sorted out the B- and T-cell epitopes that are common to SARS-CoV-2 and SARS-CoV.
We found conserved epitopes of the S (407-VRQIAPGQTG-416, 421-YNYKLPD-427, 1028-KMSECVLGQSKRVDFCGKGYHL-1049, 1254-CKFDEDDSEPVLKGVKLHYT-1273), N (173-AEGSRGGSQASSRSSSR-189 and 235-SGKGQQQQGQTVT-247), M (163-DLPKEITVATSRTLSYYKLG-182) and E (58-VYSRVKNLNSS-68) proteins that are common to both SARS-CoV and SARS-CoV-2 (Table 1b).
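The comparison of epitopes between the two viruses can be illustrated with a short script. The sketch below is not the procedure used in this study, where the sorting was done manually; it assumes a simple ungapped sliding-window identity as the conservancy measure. The function names, the 0.8 cut-off and the toy sequence are hypothetical and serve only as an illustration.

```python
def window_identity(epitope, window):
    """Fraction of positions at which epitope and window carry the same residue."""
    matches = sum(a == b for a, b in zip(epitope, window))
    return matches / len(epitope)


def best_conservancy(epitope, protein):
    """Return (best identity fraction, 1-based start) over all ungapped windows.

    Returns (0.0, 0) when the epitope is empty or longer than the protein.
    """
    k = len(epitope)
    if k == 0 or len(protein) < k:
        return 0.0, 0
    best_score, best_start = 0.0, 0
    for start in range(len(protein) - k + 1):
        score = window_identity(epitope, protein[start:start + k])
        if score > best_score:
            best_score, best_start = score, start + 1
    return best_score, best_start


def common_epitopes(epitopes, protein, min_identity=0.8):
    """Keep epitopes whose best window in `protein` reaches min_identity.

    min_identity is an assumed cut-off, not a value taken from the study.
    """
    return [e for e in epitopes
            if best_conservancy(e, protein)[0] >= min_identity]


if __name__ == "__main__":
    # Toy target sequence (hypothetical, not a real viral protein).
    toy_protein = "MKTAYIAKQRVRQIAPGQTGLL"
    for pep in ["VRQIAPGQTG", "VRQIAPGQSG", "WWWWWWWWWW"]:
        print(pep, best_conservancy(pep, toy_protein))
```

An ungapped comparison is the simplest choice; epitopes that differ by insertions or deletions between the two viruses would need an alignment-based measure instead.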
The accessibility of the linear B-cell epitopes of the S, E, M and N proteins that were predicted to be antigenic, non-allergenic and nearly conserved was visualized using BIOVIA Discovery Studio 2017 R2; the results revealed that the epitopes are likely to be localized on the surface of the 3D structures of the proteins (Table 1c and Fig. 3a-e). The predicted linear B-cell epitopes were enriched in the S and N proteins and less enriched in the M and E proteins of SARS-CoV and SARS-CoV-2 (Table 1a & 1c). Most of the predicted epitope peptides showed good helical content based on the Agadir score14,15. In addition to BepiPred and Bcepred, we predicted continuous (linear) B-cell epitopes of the S, E, M and N proteins of SARS-CoV-2 using the ABCpred server. ABCpred predicts B-cell epitopes in an antigen sequence with 65.93% accuracy using recurrent neural networks27. In this study, we applied a window length of 18 amino acids (aa) with a threshold setting of 0.7, and the predicted epitopes were further verified with VaxiJen v2.0 to identify antigenic B-cell epitopes. A total of 52 antigenic linear B-cell epitopes were predicted for the S protein, 18 for N, 7 for M and 3 for E of SARS-CoV-2 (Table 1d); many of these epitopes were also predicted by BepiPred and Bcepred (Table 1c).
We carried out a sequence alignment of the RBDs of the S proteins of SARS-CoV and SARS-CoV-2 using Clustal Omega (Fig. 4). The inset table of Fig. 4 shows the linear and conformational B-cell epitopes predicted and identified in the RBD of the S protein of SARS-CoV-2. Notably, we found a linear B-cell epitope of the S protein (370-NSASFSTFKCYGVSPTKLNDLCFTNV-395, inset table of Fig. 4) that was recently found to interact with CR3022. CR3022 is a previously described neutralizing antibody, isolated from a convalescent SARS patient, that has now been found to neutralize SARS-CoV-228,29. Co-crystallization of CR3022 with the RBD of the S protein of SARS-CoV-2 identified conserved epitope residues enabling cross-reactive neutralization between SARS-CoV-2 and SARS-CoV without hampering ACE2 binding to the RBD of the S protein28. Many of the linear and conformational B-cell epitopes predicted in this work (inset table of Fig. 4) are located in the RBD of the S protein of SARS-CoV-2 (S1B, residues 338-506) and in a receptor-binding subdomain (residues 438-498). 47D11 is the first reported human monoclonal antibody that neutralizes SARS-CoV-2, most likely by binding to the conserved core structure of S1B (residues 338-506) using a mechanism independent of receptor-binding inhibition30. Similarly, several virus-specific memory B cells recognizing the RBD of the S protein of SARS-CoV-2 have been identified, and only two of the total 206 SARS-CoV-2 RBD-specific monoclonal antibodies were found to block viral entry, which correlated with a high competing capacity against the ACE2 receptor31. The findings suggest a virus species-specific antibody response to the RBDs of SARS and cross-recognition of target regions outside the RBDs. Amanat et al. (2020) developed a serological assay using the S protein expressed in insect (iSpike/iRBD) and mammalian (mSpike/mRBD) expression systems.
They observed that all COVID-19 plasma/serum samples reacted strongly to both the RBD and the full-length spike protein. Reactivity of COVID-19 sera was, in general, stronger against the full-length S protein than against the RBD. They tested an additional 12 serum samples from patients with acute COVID-19, as well as from convalescent participants, for reactivity to mRBD and mSpike. All 12 samples reacted with both the RBD and the spike protein32.
Prediction of conformational B-cell epitopes of S, E, M and N proteins of SARS-CoV-2 and SARS-CoV
The cryo-EM structure of the S protein of SARS-CoV-2 (PDB:6VSB3) covers residues 1 to 1208 and lacks the 65 C-terminal residues of the protein. Similarly, the cryo-EM structure of the S protein of SARS-CoV (residues 1 to 1195) lacks 60 amino acids towards the C-terminal end (PDB:5WRG33) due to limited electron density. We overcame this problem by using I-TASSER-modelled structures of the S proteins of SARS-CoV and SARS-CoV-234. We compared the modelled structures with the cryo-EM structures3,34 and observed high similarity, with a root mean square deviation (RMSD) of around 1.3 Å (SARS-CoV) and 2.0 Å (SARS-CoV-2) (data not shown). Similarly, the structures of the E, M and N proteins were modelled and used as input structures to identify potent conformational B-cell epitopes using CBTOPE, DiscoTope, ElliPro and EPSVR. We identified a cluster of conformational B-cell epitopes predominantly enriched in the S and N proteins of SARS-CoV and SARS-CoV-2 (Table 2a & 2c). We generated a heat map showing the distribution of antigenic epitopes in the S1 domain of the S protein of SARS-CoV-2 (Fig. 2b). Notably, we found conformational B-cell epitopes of the S, E, M and N proteins that are common to both SARS-CoV-2 and SARS-CoV (Table 2b). The findings revealed that the predicted conformational epitopes are likely to be localized in accessible regions of the 3D structures of the proteins, as visualized with BIOVIA Discovery Studio 2017 R2 (Fig. 5a-e).
Prediction of MHC I binding epitopes of S, E, M and N proteins of SARS-CoV-2 and SARS-CoV
SARS-CoV is a human beta-CoV closely related to SARS-CoV-2, and earlier data from SARS-CoV patients in 2003-2004 identified CD8+ and CD4+ T-cell responses using the whole proteome35. A likely possibility is that substantial CD4+ T-cell, CD8+ T-cell and neutralizing antibody responses develop to SARS-CoV-2 and that all contribute to clearance of the acute infection and to protective immunity against SARS-CoV-2 infection36,37. Information on the SARS-CoV-2 proteins and epitopes recognized by human T-cell responses can greatly assist researchers in selecting potential epitopes or target proteins for the design of a candidate vaccine. All four structural proteins (S, E, M and N) of SARS-CoV-2 and SARS-CoV were screened, and epitopes likely to be presented on 27 HLA class I molecules (HLA-A and HLA-B) were predicted using three different servers: the proteasomal cleavage/TAP transport/MHC class I combined predictor17, MHC-NP18 and TepiTool16.
We selected the high affinity-ranked peptides and found nearly 683 MHC I epitopes for the S protein of SARS-CoV-2 using the proteasomal cleavage/TAP transport/MHC class I combined predictor, 683 using MHC-NP and 341 using TepiTool (Table 3a). Similarly, we observed a comparable number of MHC I epitopes for the S protein of SARS-CoV using the proteasomal cleavage/TAP transport/MHC class I combined predictor (674), MHC-NP (673) and TepiTool (331). These observations give us some confidence in the ability of three independent servers with different operating algorithms to predict a consistent number of unique MHC I epitopes across all four proteins (S, E, M & N), and they corroborate the overall high sequence similarity of SARS-CoV and SARS-CoV-2 (Fig. 6a). The predicted epitopes were further screened for their ability to elicit an IFN-γ response using the IFNepitope tool22, and the epitopes predicted to be antigenic, non-allergenic and likely conserved across CoV species were shortlisted. It is also worth mentioning that the same epitope was often predicted by more than two of the servers employed in this study. Notably, we identified 6 CD8+ T-cell epitopes in S, 6 epitopes in N and 5 epitopes each in the M and E proteins that are common to both SARS-CoV-2 and SARS-CoV (Table 3b). Table 3c lists the top MHC I binding epitopes that are antigenic and non-allergenic and have a positive score for IFN-γ; the majority of the listed epitopes were predicted by more than two of the servers used in this study. A total of 41 MHC I epitopes were found for the S protein, 19 each for N & M and 10 for the E protein of SARS-CoV-2 (Table 3c).
We found that one epitope of the N protein (316-GMSRIGMEV-324) common to both SARS-CoV-2 and SARS-CoV (N317-325, Table 3b), predicted as a high-affinity epitope (HLA-A*02:03, HLA-A*02:01, HLA-A*02:06), was previously known to induce interferon-gamma (IFN-γ) in PBMCs of SARS-recovered patients38. Interestingly, two epitopes derived from the SARS-CoV S protein, 1041-GVVFLHVTY-1049 (HLA-A*11:01, HLA-A*68:01, HLA-A*03:01) and 1202-FIAGLIAIV-1210 (HLA-A*02:06, HLA-A*68:02, Table 3c), which have an epitope conservancy of 85 to 100% with the S protein of SARS-CoV-2, were previously reported as immunogenic in PBMCs derived from SARS-CoV patients38. CD4+ and CD8+ T cells can recognize antigen only when it is presented by a self-MHC molecule, and MHC molecules are extremely polymorphic. Selecting multiple peptides with different HLA binding specificities will therefore afford increased coverage of the patient population targeted by peptide-based vaccines and diagnostic methods. Using the IEDB tool, we found that the HLA alleles gave a population coverage of around 99% throughout the world, except for Central America, where the population coverage was less than 10% (Fig. 7a). We found that HLA-A*02:01, HLA-A*02:03, HLA-A*02:06, HLA-A*11:01, HLA-A*30:01, HLA-A*68:01, HLA-A*68:02, HLA-B*15:01 and HLA-B*35:01 are predicted to bind the maximum number of epitopes (based on the criterion of an allele predicted to bind more than 30 epitopes) of the S protein of SARS-CoV-2 (Table S1a). Similarly, we observed that HLA-A*02:06, HLA-A*30:01, HLA-A*30:02, HLA-A*31:01, HLA-A*32:01, HLA-A*68:01, HLA-A*68:02, HLA-B*15:01 and HLA-B*35:01 are predicted to bind the maximum number of epitopes (based on the criterion of an allele predicted to bind more than 10 epitopes) of the N protein of SARS-CoV-2 (Table S1a).
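The allele selection criterion described above (an allele predicted to bind more than 30 epitopes of S, or more than 10 epitopes of N) amounts to counting distinct epitopes per allele and applying a cut-off. The following is a minimal sketch of that counting step only; the input format (a list of epitope/allele pairs), the function name and the example records are assumptions and do not reproduce the output format of any prediction server.

```python
from collections import Counter


def alleles_above_threshold(predictions, min_epitopes):
    """Count distinct epitopes per HLA allele and keep the frequent binders.

    predictions  : iterable of (epitope, allele) pairs (assumed input format)
    min_epitopes : an allele is kept only if it binds MORE than this many
                   distinct epitopes, mirroring the ">30" / ">10" criteria
    """
    seen = set()
    counts = Counter()
    for epitope, allele in predictions:
        # Ignore duplicate rows, e.g. the same pair reported by two servers.
        if (epitope, allele) not in seen:
            seen.add((epitope, allele))
            counts[allele] += 1
    return {allele: n for allele, n in counts.items() if n > min_epitopes}


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    records = [
        ("GMSRIGMEV", "HLA-A*02:01"),
        ("FIAGLIAIV", "HLA-A*02:01"),
        ("GMSRIGMEV", "HLA-A*02:01"),   # duplicate row, counted once
        ("GVVFLHVTY", "HLA-A*11:01"),
    ]
    print(alleles_above_threshold(records, min_epitopes=1))
```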
Prediction of MHC II binding epitopes of S, E, M and N proteins of SARS-CoV-2 and SARS-CoV
Helper T cells are required for adaptive immune responses: they help activate B cells to secrete antibodies and help activate cytotoxic T cells to kill infected target cells. NetMHCIIpan 4.019 and TepiTool16 were used to predict and identify high-affinity MHC II binding epitopes based on 23 HLA class II molecules (HLA-DP, HLA-DQ and HLA-DR). We found nearly 884 and 519 MHC II epitopes for the S protein of SARS-CoV-2 using NetMHCIIpan 4.0 and TepiTool, respectively (Table 4a). A similar number of epitopes were predicted for the S protein of SARS-CoV, which suggests that the computational predictions are reliable and well correlated. Interestingly, we found a total of eight (8) CD4+ T-cell epitopes for S, six (6) for N, two (2) for M and four (4) for E proteins that are common to both SARS-CoV-2 and SARS-CoV, with an epitope conservancy of approximately 80% to 100% (Table 4b). We selected the top MHC II epitopes based on antigenicity, non-allergenicity and conservancy, and all of these epitopes were predicted to induce IFN-γ, as shown in Table 4c. The allele-wise distribution of predicted epitopes for the S, E, M and N proteins of SARS-CoV-2 is presented in a heat map (Fig. 6b). Using the IEDB tool, we found that the HLA alleles gave a population coverage of around 99% throughout the world, except for South Africa, where the population coverage of the HLA class II alleles was around 30% (Fig. 7b).
We found that HLA-DRB1*04:01, HLA-DRB1*04:05, HLA-DRB1*13:02, HLA-DRB1*15:01, HLA-DRB3*01:01, HLA-DRB3*02:02, HLA-DRB4*01:01, HLA-DRB5*01:01, HLA-DQA1*04:01/DQB1*04:02, HLA-DPA1*02:01/DPB1*01:01, HLA-DPA1*01:03/DPB1*02:01, HLA-DPA1*01:03/DPB1*04:01, HLA-DPA1*03:01/DPB1*04:02, HLA-DPA1*02:01/DPB1*05:01 and HLA-DPA1*02:01/DPB1*14:01 are predicted to bind the maximum number of epitopes (based on the criterion of an allele predicted to bind more than 30 epitopes) of the S protein of SARS-CoV-2 (Table S1b). Similarly, we observed that HLA-DRB1*04:01, HLA-DRB1*07:01, HLA-DRB1*08:02, HLA-DRB1*09:01, HLA-DRB1*11:01, HLA-DRB1*13:02, HLA-DRB3*02:02, HLA-DRB5*01:01, HLA-DQA1*01:02/DQB1*06:02 and HLA-DPA1*02:01/DPB1*05:01 are predicted to interact with the maximum number of epitopes (based on the criterion of an allele predicted to bind more than 10 epitopes) of the N protein of SARS-CoV-2 (Table S1b). The computational predictions suggest that T-cell based immunity might be generated largely against the S and N proteins of SARS-CoV-2, which could potentially be selected as target candidates for a recombinant protein-based vaccine. On careful inspection of the computationally predicted epitopes (Table 5), we found an epitope in S1B of the spike protein (370-NSASFSTFKCYGVSPTKLNDLCFTNV-395) that is likely to function as both a linear B-cell and an MHC class II epitope. The peptides 403-RGDEVRQIAPGQTGKIADYNYKLPD-427 and 437-NSNNLDSKVGGNYNYLYRLFRKSNL-461 of the S protein were predicted as linear B-cell and MHC class I epitopes. The peptides 177-MDLEGKQGNFKNLREFVFKN-196 and 1253-CCKFDEDDSEPVLKGVKLHYT-1273 of the S protein were predicted as linear and conformational B-cell epitopes. Similar overlapping B- and T-cell epitopes were predicted for the E, M and N proteins of SARS-CoV-2 (Table 5).
Using bioinformatic approaches, a few researchers have identified specific peptides in SARS-CoV-2 with an increased probability of being T-cell targets37,39, and circulating SARS-CoV-2-specific CD8+ and CD4+ T cells have been identified in approximately 70% and 100% of COVID-19 convalescent patients, respectively40. CD4+ T-cell responses were observed predominantly against the S protein, and the robustness of the T-cell responses correlated with the magnitude of the IgG and IgA titers of SARS-CoV-2. Among the structural proteins, the S and M proteins were mainly recognized as possible targets for CD8+ T cells of SARS-CoV-240.
Identification of potential B and T cell epitope of structural proteins for serological assay and multi-epitope-based vaccine
The spike proteins of SARS-CoV-2 and SARS-CoV share an amino-acid sequence identity of around 77% and are phylogenetically closely related1. Previous characterization of the immune response of patients with SARS found the maximum neutralizing antibody response against the S and N proteins4,5,6, and recent findings showed the S protein to be an immunodominant and highly specific target of antibodies in SARS-CoV-2 patients32,41,42. The S and N proteins are therefore the major targets for the development of vaccines and serological assays, the latter being needed because SARS-CoV-2 neutralization assays are time-consuming and require BSL-3 containment facilities. In this study, we designed multi-epitope chimeric constructs from the computationally predicted putative and potent B- and T-cell epitopes of the S, E, M and N proteins of SARS-CoV-2 (Table 6). In the multi-epitope constructs we included MHC class I (CTL) and class II (HTL) binding epitopes with a positive score for IFN-γ induction. CTL and HTL epitopes were linked together by the cleavable linkers AAY and KK, whereas the B-cell epitopes (linear and conformational) were linked together with the flexible linker GGGGS (Fig. 8a-f). Poly-Gly-rich linkers can be considered independent units and do not affect the function of the individual proteins. A total of six (6) multi-epitope chimeric constructs containing N-CTL, HTL and B-cell epitopes were designed (Fig. 8a-f), namely (i) RBD of S protein (B- and T-cell epitopes), (ii) full-length S protein (B- and T-cell epitopes), (iii) structural protein construct (B- and T-cell epitopes of S, E, M and N proteins), (iv) chimeric construct of S and N proteins (B- and T-cell epitopes of S and N), (v) S protein (B-cell epitopes of S protein) and (vi) nucleocapsid protein (B-cell epitopes of N protein). The structures of the chimeric multi-epitope constructs were modelled using I-TASSER (Fig. 9a-f) and validated with the RAMPAGE server, which generates a Ramachandran plot43.
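The linker scheme described above can be expressed as a simple string-assembly step. The sketch below is illustrative only: it joins CTL epitopes with AAY, HTL epitopes with KK and B-cell epitopes with GGGGS, in the order CTL, HTL, B-cell. The linker placed at the junction between two blocks is not specified in the text, so using the linker of the following block is an assumption, as are the function name and the example inputs (peptides quoted elsewhere in this section, not the contents of Table 6).

```python
CTL_LINKER = "AAY"      # cleavable linker between CTL (MHC I) epitopes
HTL_LINKER = "KK"       # cleavable linker between HTL (MHC II) epitopes
BCELL_LINKER = "GGGGS"  # flexible linker between B-cell epitopes


def assemble_construct(ctl, htl, bcell):
    """Concatenate epitope blocks in the order CTL, HTL, B-cell.

    Within a block, epitopes are joined by that block's linker. Between
    blocks, the linker of the FOLLOWING block is used; this junction rule
    is an assumption made for the sketch, not taken from the study.
    """
    blocks = [
        (ctl, CTL_LINKER),
        (htl, HTL_LINKER),
        (bcell, BCELL_LINKER),
    ]
    construct = ""
    for epitopes, linker in blocks:
        if not epitopes:
            continue  # skip empty blocks, e.g. B-cell-only constructs
        block = linker.join(epitopes)
        construct = block if not construct else construct + linker + block
    return construct


if __name__ == "__main__":
    # Illustrative inputs only; the real constructs are listed in Table 6.
    example = assemble_construct(
        ctl=["GMSRIGMEV", "FIAGLIAIV"],
        htl=["NSASFSTFKCYGVSPTKLNDLCFTNV"],
        bcell=["VRQIAPGQTG", "YNYKLPD"],
    )
    print(example, len(example))
```

Keeping the linkers as named constants makes it easy to try a different junction rule without touching the assembly logic.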
For all constructs except two (RBD of S and nucleocapsid protein), most of the amino acid residues were found in the favourable region (Fig. S1, inset table). Various physicochemical parameters, including the number of residues, theoretical pI, molecular weight, aliphatic index and grand average of hydropathicity (GRAVY), were analysed with ProtParam44. Based on the aliphatic index scores, the multi-epitope constructs can be considered moderately thermostable (Table S2). The GRAVY scores were negative for all constructs, indicating that the chimeric multi-epitope antigens are likely to be globular and hydrophilic in nature. The secondary structure of the multi-epitope constructs was predicted with the online server PSIPRED45, and the helical and strand content of the constructs is provided in Table S2 and Fig. S2. Cathepsins and carboxypeptidases involved in the proteolytic processing of endocytosed proteins in the MHC class II pathway display preferential cleavage at dibasic (RR, KK, KR or RK) sites46. Proteases that provide peptide ligands for the MHC class I antigen presentation pathway display preferential cleavage at hydrophobic motifs (AAY). The cleavable linkers need to be accessible to the proteases associated with the MHC I and II antigen processing pathways. Computational prediction showed that the cleavable linker residues (AAY and KK) used in the multi-epitope subunit vaccines were surface accessible, as visualized with Discovery Studio (Fig. 10a-f), indicating a high probability of T-cell epitope presentation by MHC molecules. The C-ImmSim server (http://150.146.2.1/C-IMMSIM/index.php) predicted IFN-γ production following immunization with the multi-epitope constructs (Fig. 11a-f).
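For readers who wish to check the aliphatic index and GRAVY values independently of ProtParam, both can be computed directly from a sequence. The sketch below follows the published definitions: GRAVY is the mean Kyte-Doolittle hydropathy per residue, and the aliphatic index is X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), with X in mole percent. The function names are hypothetical, and the code is an illustration rather than the ProtParam implementation.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    try:
        return sum(KYTE_DOOLITTLE[res] for res in seq) / len(seq)
    except KeyError as exc:
        raise ValueError("non-standard residue: %s" % exc) from exc


def aliphatic_index(sequence):
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")

    def mole_percent(res):
        return 100.0 * seq.count(res) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    # The flexible linker used in this study, as a tiny worked input.
    linker = "GGGGS"
    print("GRAVY:", round(gravy(linker), 3))
    print("Aliphatic index:", round(aliphatic_index(linker), 2))
```

A negative GRAVY value indicates a net hydrophilic sequence, which is the interpretation applied to the constructs above.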
In silico immune simulation of the multi-epitope constructs ((A) RBD of spike protein, (B) spike protein, (C) structural protein construct and (D) chimeric construct of S and N proteins) gave results consistent with actual immune responses, as shown by a primary response with high levels of IgM. This was followed by a marked increase in B-cell populations and in the levels of IgG1 + IgG2, IgM and IgG + IgM antibodies as part of the secondary and tertiary responses (Fig. 12A-D, i-iv). A similarly high response was observed in the TH (helper) and TC (cytotoxic) cell populations, with corresponding memory development, especially for the construct made of structural proteins (S, E, M and N) and the chimeric construct of S and N proteins (Fig. 12C & D).