Prediction and Evolution of B Cell Epitopes of Surface Protein in SARS-CoV-2

doi:10.21203/rs.3.rs-26822/v1

Download PDF

Research

Prediction and Evolution of B Cell Epitopes of Surface Protein in SARS-CoV-2

https://doi.org/10.21203/rs.3.rs-26822/v1

This work is licensed under a CC BY 4.0 License

Journal Publication

published 29 Oct, 2020

Read the published version in Virology Journal →

You are reading this older preprint version

Read the latest preprint version →

Background The discovery of epitopes is helpful to the development of SARS-CoV-2 vaccine.

Methods The sequences of the surface protein of SARS-CoV-2 and its proximal sequences were obtained by BLAST, the sequences of the whole genome of SARS-CoV-2 were obtained from the GenBank. Based on the NCBI Reference Sequence: NC_045512.2, the conformational and linear B cell epitopes of the surface protein were predicted separately by various prediction methods. Furthermore, the conservation of the epitopes, the adaptability and other evolutionary characteristics were also analyzed.

Results 7 epitopes were predicted, including 5 linear epitopes and 2 conformational epitopes, one of the linear and one of the conformational were coincide. The epitope D mutated easily, but the other epitopes were very conservative and the epitope C was the most conservative. It is worth mentioning that all of the 6 dominated epitopes were absolutely conservative in nearly 1000 SARS-CoV-2 genomes, and they deserved further study.

Conclusion The findings would facilitate the vaccine development, had the potential to be directly applied on the treatment in this disease, but also have the potential to prevent the possible threats caused by other types of coronavirus.

Infectious Diseases

SARS-CoV-2

Epitopes

Bioinformatics

Evolution

In late December 2019, a novel coronavirus was officially named as SARS-CoV-2 by World Health Organization(WHO) and identified as the pathogen causing outbreaks of SARS-like and MERS-like illness in Chinese city of Wuhan, which was a zoonotic disease. As of March 13, 2020, the outbreak of SARS-CoV-2 has been reported in many areas of the world, with more than 130,000 people infected[1]. With an alarmingly human-to-human transmissibility, the reproductive number of SARS-CoV-2 has been computed to around 3.28[2]. According to the data in NGDC(National Genomics Data Center), at 15:00(GMT + 8) on March 13, 2020, 482 genomic variations of SARS-CoV-2 has been reported, which has aroused widespread concern.

The B cell epitope of viral surface protein can specifically bind to the host’s B cell antigen receptor and induce the body to produce protective antibody and humoral immune response. The discovery of epitopes is helpful to the development of SARS-CoV-2 vaccine and the understanding of SARS-CoV-2’s pathogenesis[3]. 3 proteins embedded in the virus envelope of SARS-CoV-2 have been identified, Spike(S) protein, Envelope(E) protein, Membrane(M) protein. At present, due to the lack of study of the crystal structure of surface protein of SARS-CoV-2, the study of epitopes is time-consuming, power-consuming, cost and difficult [4].

In this work, we analyzed the surface protein of SARS-CoV-2, predicted the structures with bioinformatics methods. On the basis, we predicted the linear and conformational B cell epitopes, analyzed the conservation of the epitopes, the adaptability and other evolutionary characteristics of the surface protein, which provided a theoretical basis for the vaccine development and prevention of SARS-CoV-2.

Materials

All of the analysis was based on the NCBI Reference Sequence: NC_045512.2. We obtained the sequence of S, E and M protein and its proximal sequences by BLAST, which got 420, 334 and 329 sequences in total from NCBI database respectively. We obtained the whole genome sequence of SARS-CoV-2 from Genbank and GISAID(959 in total), which were used to be a dataset after genome annotation. The genome sequences, which performed mistakes of translation, were deleted.

Basic analysis of surface protein of SARS-CoV-2

The physical and chemical properties of target protein were analyzed by the Port-Param tool in ExPASy [5], including the primary structure of the target protein, molecular formula, theoretical isoelectric point, the protein instability index(the index < 40 means the protein was stable), etc. Online software, ProtScale, was used to deeply analyze the hydrophilicity and hydrophobicity of target protein and the distribution of hydrophilicity and hydrophobicity of polypeptide chains [5]. SARS-CoV-2 carried the S/E/M protein through the virus envelope, the transmembrane region of the protein was predicted online by TMHMM 2.0 [6].

Prediction of the 3D structure of target protein

With the amino acid sequences of the surface protein of SARS-CoV-2 of NC_045512.2 as templates, based on homology modeling method, we predicted the 3D structure through the online server SWISS-MODEL[7], selected and optimized the optimal structure based on the template identity and GMQE value[7], the rationality of the structure was evaluated by Ramachandran plot [8] with PDBsum server. The structures were displayed and analyzed by SWISS-pdb Viewer v4.10 [9].

Prediction of conformational B cell epitopes of target protein of SARS-CoV-2

Based on the structures, the conformational B cell epitopes were predicted by SEPPA 3.0[10] and Ellipro [21] respectively, and the common predicted conformational B cell epitopes from two methods were selected for the further analysis.

Prediction of linear B cell epitopes of target protein of SARS-CoV-2

The Protean module of DNAStar was used to predict the flexibility[11], surface probablity [12] and antigenic index [13] of the target protein of SARS-CoV-2. The linear B cell epitope was predicted by ABCpred [14] and BepiPred 2.0 [15] respectively and the common predicted linear B cell epitopes from two methods were selected for the further analysis. Coupled with the secondary structure, the tertiary structure and the glycosylation sites [16] etc, the linear B cell epitopes were finally determined.

Analysis of epitope conservation

Based on the PDB model and the multiple alignment result, we used the Consurf Server to analyze the conservation of amino acid sites of the epitopes online[17]. The conservation of epitopes on the surface protein of SARS-CoV-2 was analyzed by multiple alignment with MAFFT and Logo was drawn with Weblogo [18, 19].

Basic analysis of surface protein of SARS-CoV-2

The primary structure and physicochemical properties of the S/E/M protein were analyzed. The results revealed that the S protein has an average hydrophilic index of -0.079(Figure S1A). On the basis of hydrophilicity, it also showed amphoteric properties. There was an outside-in transmembrane helix in 23 residues from position 1214th to position 1236th at the N-terminal(Figure S2A). The protein instability index was 33.01, which revealed the S protein was stable. The E protein has an average hydrophilic index of 1.128(Figure S1B). It was hydrophobic. An inside-out transmembrane helix in 23 residues from position 12th to position 34th at the N-terminal was predicted(Figure S2B). The protein instability index was 38.68, which revealed the E protein was stable. The M protein has an average hydrophilic index of 0.446(Figure S1C). On the basis of hydrophobicity, it also showed amphoteric properties. There were two outside-in transmembrane helices, one was in 20 residues from position 20th to position 39th, the another one was in 23 residues from position 78th to position 100th, and an inside-out transmembrane helix in 20 residues from position 51st to position 73rd, at the N-terminal(Figure S2C). The protein instability index was 39.14, which revealed the M protein was stable.

Prediction of the 3D structure of surface protein of SARS-CoV-2

The optimal template for homology modeling of the S protein of SARS-CoV-2 was the S protein of SARS(PDB ID: 6acc.1), with the sequence identity of 76.47% and the GMQE score of 0.73. According to the evaluation of the structure by Ramachandran plot(Fig. 1A), 99.3% of the residues were located in the most favoured regions and the allowed regions, 0.7% of the residues were located in the disallowed regions, the high-energy regions(Table 1), which was possibly due to some energy was spent in the protein processing to make these residues enter the high-energy regions [20]. The result generally showed that the structure was reliable. The structure of S protein(Fig. 1B) of SARS-CoV-2 is a trimer, which can be divided into a tightly curled tail and a distributed head. The head is mainly composed of β-sheet, irregular curl and turn, which is exposed to the envelope of the virus, contributes to the formation of epitopes. The tail is mainly composed of several α-helices, part of which is embedded in the envelope, hinders the formation of epitopes.

The optimal template for homology modeling of the E protein of SARS-CoV-2 was the E protein of SARS(PDB ID: 5 × 29.1), with the sequence identity of 91.38% and the GMQE score of 0.73. According to the evaluation of the structure by Ramachandran plot(Fig. 1C), 100% of the residues were located in the most favoured regions(Table 1), indicating that the structure was reliable. The E protein of SARS-CoV-2 is a pentamer(Fig. 1D), which can be divided into the concentrated transmembrane part and the head located outside the envelope. The head is mainly composed of α-helix, irregular curl and turn, which is exposed to the envelope, contributes to the formation of epitopes. The tail is mainly composed of long α-helix, most of which are embedded in the envelope, hinders the formation of epitopes.

The optimal template for homology modeling of the M protein of SARS-CoV-2 was the effector protein Zt-KP6-1(PDB ID: 6qpk. 1. A), with the sequence identity of 20.00% and the GMQE score of 0.06. The sequence identity between the optimal template and the M protein of SARS-CoV-2 and the GMQE score are too low, so that the template is not suitable for homology modeling.

Prediction of linear B cell epitopes

All linear B cell epitopes of the surface protein were filtered according to the following criteria: (1) region with high surface probability(≥ 0.75), strong antigenicity(≥ 0) and high flexibility; (2) excluding the region with α-helix, β-sheet and glycosylation site(Fig. 2); (3) in line with the prediction by BepiPred 2.0(cut off to 0.35)(Table S1) and ABCpred(cut off to 0.51)(Table S2). Based on the results obtained with these methods and artificial optimization, 4 potential linear B cell epitopes of the S protein were predicted(Table 2, Fig. 3A), including 601–605 aa, 656–660 aa, 676–682 aa, 808–813 aa, and they were named as the epitope A, B, C, D, respectively; 1 epitope of the E protein was selected(60–65 aa) and named as the epitope F(Table 2, Fig. 3C); 1 epitope of the M protein was selected (211–215 aa) and named as the epitope H(Table 2).

Prediction of conformational B cell epitopes

The conformational B cell epitopes of surface protein were predicted with Ellipro(Table S3) and SEPPA 3.0(Table S4) with the threshold of 0.063 and 0.5, respectively. After the artificial optimization, one conformational B-cell epitope (403–405,416,445,446,455,500 aa) of S protein was predicted(Table 2). It is obvious that the region located on the head of the S protein(Fig. 3B), which is the outside of SARS-CoV-2, making it easy to form an epitope. We selected it as a dominant conformational epitope and named it as the epitope E. Additionally, one conformational B-cell epitope(60–65 aa) of E protein was predicted(Table 2), which is consistent with the epitope F of the E protein. Similarly, this region located on the outside(Fig. 3C), we selected it as a dominant conformational epitope and named G. However, the conformational epitope of the M protein could not be predicted due to the failure of credible homology modeling.

Analysis of epitope conservation

The Consurf Server was used to predict epitope conservative sites with the structure of surface proteins and the alignment results in different datasets (Table S5). All the epitopes of the S, E, M protein were absolute conservative among all SARS-CoV-2 sequences(Table 3A, Figure S3A-G). To further calculate the conservation of the epitopes in different coronavirus datasets, the representative sequences from SARS-CoV-2 were selected to participate in the human coronavirus dataset and the coronavirus dataset, due to amino acid sequences of some S or E or M protein were absolute conservative in SARS-CoV-2. The conservation was a little lower in human coronavirus than those of in SARS-CoV-2(Table 3B, Figure S4A-G), and the epitope D was easy to mutate. The other epitopes were conservative and the epitope F/G was the most conservative one. As for the coronavirus(Table 3C, Figure S5A-G), in the 5 epitopes of S protein, 4 of which obtained the conservative score less than 1, ranging from − 0.854 to 0.256. It showed that the epitope C with the minimum score is the most stable and not easy to mutate. Besides, the score of the epitope D was 1.247, showing the relatively high possibility to mutate. The epitopes of the E protein were stable with the conservative score less than 1. The site conservation of the M protein could not be predicted due to the failure of credible homology modeling.

SARS-CoV-2 caused huge impact to human production, living and even life, has become a major challenge confronting the whole world. Development of vaccine is one of the effective means of prevention and treatment of the virus long-term. Epitope vaccine is the trend of development of vaccine due to the advantages of strong pertinence, less toxic and side effects and easy to transportation and storage [21]. The determination of epitopes is the basis of the development and application of vaccine, and the clinical diagnosis and treatment. Currently, the methods which were mainly used are X- crystal diffraction method, immune experiment method and bioinformatics method. The first two are time-consuming and laborious, the bioinformatics method is gaining more and more credibility among researchers [3, 21, 22]. There are many factors to be considered in the prediction of epitopes by bioinformatics method, such as the surface probablity and flexibility of the epitopes. At the same time, it is necessary to exclude the structurally stable and non-deformable α-helix, β-sheet, glycosylation sites which may obscure the epitopes or alter the antigenicity, etc [23]. Even so, the predicted epitopes are still inaccurate [4]. Compared with the current study on SARS-CoV-2, this work adopted various prediction methods and 3D structure databases developed in recent years, which were based on artificial neural network, Hidden Markov Model(HMM), Support Vector Machine(SVM), etc, such as ABCpred, BepiPred2.0, SEPPA 3.0, IEDB, etc. Compared with prediction by a single method [24], on the basis of a single protein [25] on the basis of epitopes of SARS [26], these methods and databases greatly improved the accuracy of prediction and had more bioinformatic meaning. We comprehensively analyzed the prediction results from the tools which were widely used, set up screening criteria on the basis of primary structure, secondary structure and tertiary structure, so that the prediction results would more accurate and reliable.

The S protein, the E protein and the M protein are surface proteins of SARS-CoV-2, which have the potential as antigenic molecules. However, the current study on the epitopes prediction of SARS-CoV-2 [27], due to the S protein has been reported to be the directly binding molecule of SARS-CoV-2 to ACE2[28], the prediction of epitopes is mainly focusing on the S protein, with few studies on the E protein and the M protein. In this work, we analyzed the S protein, the E protein and the M protein, predicted their epitopes. On the basis, 7 B cell epitopes were predicted, including 2 conformational and 6 linear B cell epitopes, one of the conformational and one of the linear are coincide. All of the epitope A, B, C, D located on the surface of the tail of the S protein, which is relatively easy to bind. The epitope E is located on the head of the S protein, which is the key area where the S protein recognizes and binds to ACE2 [28, 29], has the potential to block the infection process. The epitope F and the epitope G located on the end of the head of the E protein, the two epitopes coincide, this may due to they are all consecutive and the secondary structure avoided the α-helix and the β-sheet. The epitope H is derived from the M protein, the structure and conservation could not be determined due to the inability to predict reliable structure. However, it could be known from the surface probablity scores that the epitope H is more likely to be located on the surface of the M protein.

The higher the conservation score calculated by the Consurf Server is, the more likely the site is to be mutated in the evolutionary process. When the score < 1, the site is likely to be a conservative site; when the score is between 1 and 2, the site is a site which is likely to be a relatively easy mutation; when the score > 2, the site is likely to be an easy mutation site [30]. In the 7 epitopes obtained, all the epitopes of the S, E, M protein were absolute conservative among all SARS-CoV-2 sequences. For the human coronavirus dataset and the coronavirus dataset, only the average conservative score of the epitope D is higher than 1, which is prone to mutation. The epitope D should not be used as an epitope of the S protein. The conservation of the epitope H could not be calculated by the PDB file, the application value of the epitope H needed further experimental verification. Although the epitopes could be integrally considered to be conservative, the independent residues of these epitopes could still easy to mutate. Except the epitope E, all of 6 dominate epitopes contain 1–2 residues which has a conservative score higher than 1(Table 3C), indicating that these residues were likely to be easy mutation sites. These residues mostly located at the head or the tail of the epitopes, therefore, the mutation of these residues should be paid attention to, and the length of the epitopes should be adjusted according to the actual effect in application. The scores of epitopes in different datasets were different, which could due to the quantity of sequences in the datasets and the structures were analyzed in different situations.

The epitope detection in glycoproteins is significant to the study of the immunoreaction of SARS-CoV-2, but its challenge is less reliable than the epitope detection due to the presence of glycan[25]. In addition, SARS-CoV-2 would mutate frequently, the epitopes predicted might mutate too, so conservative epitopes analyzed in the present study might be more reliable. However, this work is limited. Without the molecular dynamics analysis, the binding between epitopes and antibodies was not simulated to further determine the availability of epitopes, but researches from different perspectives can provide more epitopes choices for subsequent studies.

In this work, we predicted 6 reliable epitopes: A, B, C, E, F/G and H. The reliability of the epitopes of the S protein was relatively better than that of the epitopes of the E protein and the M protein, indicating that the S protein is still the optimal choice for the prediction of epitopes and the development of vaccine. All of the 6 epitopes were able to achieve absolute conservation in SARS-CoV-2, and to achieve relative conservation in the data set, including SARS, etc. Therefore, the epitopes not only have the potential to be directly applied on the treatment in this disease, but also have the potential to prevent the possible threats caused by other types of coronavirus. In addition, although various factors of prediction were integrated in this work, more experimental data are needed to further verify whether all the 6 epitopes can induce the body to produce corresponding antibodies and generate specific humoral immunity, due to the limited data set and other factors.

SARS-CoV-2: Severe acute respiratory syndrome coronavirus 2

BLAST: Basic Local Alignment Search Tool

NCBI: National Center for Biotechnology Information

SARS: Severe acute respiratory syndrome

MERS: Middle east respiratory syndrome

NGDC: National Genomics Data Center

PortParam: Protein Parameters

GMQE: Global model quality estimate

PDB: Program database file

Ethic approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Availability of data and materials

The viral genomes described in detail here were deposited in NCBI, Genbank and GISAID.

Competing interests

The authors declare that they have no competing interests.

Funding

This work was supported by the National Key R&D Program of China (2018YFC0910201), the Key R&D Program of Guangdong Province (2019B020226001), and the Science and the Technology Planning Project of Guangzhou (201704020176).

Authors’ contribution

JL conceived the study and participated in its design and coordination. HD participated in the design of the study and helped draft the manuscript. YB participated in analysis of conservation, sequence alignment and manuscript drafting. BZ participated in antigenic prediction. FC participated in drafting the manuscript. All authors read and approved the final manuscript.

Acknowledgements

The Student Entrepreneurship and Innovation Center of the school of biology and biological engineering, South China University of Technology, also provided a lot of help during the preparation of the project.

Han Q, Lin Q, Jin S, You L. Recent insights into 2019-nCoV: a brief but comprehensive review. J Infect 2020. doi:10.1016/j.jinf.2020.02.010.
Liu Y, Gayle AA, Wilder-Smith A, Rocklöv J. The reproductive number of COVID-19 is higher compared to SARS coronavirus. J Travel Med 2020. doi:10.1093/jtm/taaa021.
El-Manzalawy Y, Honavar V. Recent advances in B-cell epitope prediction methods. Immunome Res. 2010;6 Suppl 2:S2. doi:10.1186/1745-7580-6-S2-S2.
Sun P, Ju H, Liu Z, Ning Q, Zhang J, Zhao X, et al. Bioinformatics resources and tools for conformational B-cell epitope prediction. Comput Math Methods Med. 2013;2013:943636. doi:10.1155/2013/943636.
Walker JM. The Proteomics Protocols Handbook. Dordrecht: Springer; 2005.
Blom N, Gammeltoft S, Brunak S. Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. J Mol Biol. 1999;294:1351–62. doi:10.1006/jmbi.1999.3310.
Waterhouse A, Bertoni M, Bienert S, Studer G, Tauriello G, Gumienny R, et al. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 2018;46:W296-W303. doi:10.1093/nar/gky427.
Laskowski RA, MacArthur MW, Moss DS, Thornton JM. PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Crystallogr. 1993;26:283–91. doi:10.1107/S0021889892009944.
Guex N, Peitsch MC. SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling. Electrophoresis. 1997;18:2714–23. doi:10.1002/elps.1150181505.
Zhou C, Chen Z, Zhang L, Yan D, Mao T, Tang K, et al. SEPPA 3.0-enhanced spatial epitope prediction enabling glycoprotein antigens. Nucleic Acids Res. 2019;47:W388-W394. doi:10.1093/nar/gkz413.
Karplus PA, Schulz GE. Prediction of chain flexibility in proteins. Naturwissenschaften. 1985;72:212–3. doi:10.1007/BF01195768.
Emini EA, Hughes JV, Perlow DS, Boger J. Induction of hepatitis A virus-neutralizing antibody by a virus-specific synthetic peptide. J Virol. 1985;55:836–9.
Jameson BA, Wolf H. The antigenic index: a novel algorithm for predicting antigenic determinants. Comput Appl Biosci. 1988;4:181–6. doi:10.1093/bioinformatics/4.1.181.
Saha S, Raghava GPS. Prediction of continuous B-cell epitopes in an antigen using recurrent neural network. Proteins. 2006;65:40–8. doi:10.1002/prot.21078.
Larsen JEP, Lund O, Nielsen M. Improved method for predicting linear B-cell epitopes. Immunome Res. 2006;2:2. doi:10.1186/1745-7580-2-2.
Gupta R, Jung E, Brunak S. Prediction of N-glycosylation sites in human proteins. 2004;46:203–6.
Ashkenazy H, Abadi S, Martz E, Chay O, Mayrose I, Pupko T, Ben-Tal N. ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic Acids Res. 2016;44:W344-50. doi:10.1093/nar/gkw408.
Crooks GE, Hon G, Chandonia J-M, Brenner SE. WebLogo: a sequence logo generator. Genome Res. 2004;14:1188–90. doi:10.1101/gr.849004.
Schneider TD, Stephens RM. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 1990;18:6097–100. doi:10.1093/nar/18.20.6097.
Morris AL, MacArthur MW, Hutchinson EG, Thornton JM. Stereochemical quality of protein structure coordinates. Proteins. 1992;12:345–64. doi:10.1002/prot.340120407.
Groot AS de, Sbai H, Aubin CS, McMurry J, Martin W. Immuno-informatics: Mining genomes for vaccine components. Immunol Cell Biol. 2002;80:255–69. doi:10.1046/j.1440-1711.2002.01092.x.
Yang X, Yu X. An introduction to epitope prediction methods and software. Rev Med Virol. 2009;19:77–96. doi:10.1002/rmv.602.
Chen W, Zhong Y, Qin Y, Sun S, Li Z. The evolutionary pattern of glycosylation sites in influenza virus (H5N1) hemagglutinin and neuraminidase. PLoS ONE. 2012;7:e49224. doi:10.1371/journal.pone.0049224.
Grifoni A, Sidney J, Zhang Y, Scheuermann RH, Peters B, Sette A. A Sequence Homology and Bioinformatic Approach Can Predict Candidate Targets for Immune Responses to SARS-CoV-2. Cell Host Microbe 2020. doi:10.1016/j.chom.2020.03.002.
Baruah V, Bose S. Immunoinformatics‐aided identification of T cell and B cell epitopes in the surface glycoprotein of 2019‐nCoV 2020. doi:10.1002/jmv.25698.
Yuan M, Wu NC, Zhu X, Lee C-CD, So RTY, Lv H, et al. A highly conserved cryptic epitope in the receptor-binding domains of SARS-CoV-2 and SARS-CoV; 2020.
Lv H, Wu NC, Tsang OT-Y, Yuan M, Perera RAPM, Leung WS, et al. Cross-reactive antibody response between SARS-CoV-2 and SARS-CoV infections; 2020.
Tian X, Li C, Huang A, Xia S, Lu S, Shi Z, et al. Potent binding of 2019 novel coronavirus spike protein by a SARS coronavirus-specific human monoclonal antibody; 2020.
Yan R, Zhang Y, Guo Y, Xia L, Zhou Q. Structural basis for the recognition of the 2019-nCoV by human ACE2; 2020.
WANG S, Li G, Di WU, Cao Z. Mutation Feature Analysis on Epitope and Receptor Binding Sites of Influenza A H1N1 Hemagglutinin. ACTA BIOPHYSICA SINICA. 2012;28:486. doi:10.3724/SP.J.1260.2012.20015.
Wrapp D, Wang N, Corbett KS, Goldsmith JA, Hsieh C-L, Abiona O, et al. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. Science. 2020;367:1260–3. doi:10.1126/science.abb2507.
Verdiá-Báguena C, Nieto-Torres JL, Alcaraz A, Dediego ML, Enjuanes L, Aguilella VM. Analysis of SARS-CoV E protein ion channel activity by tuning the protein and lipid charge. Biochim Biophys Acta. 2013;1828:2026–31. doi:10.1016/j.bbamem.2013.05.008.

Due to technical limitations the tables are available as downloads in the Supplementary Files.

Table 1 The plot statistics of the Ramachandran plot

Table 2 The composition and the antigenic index of the epitopes of SARS-CoV-2

Table 3A The conservation of the epitopes in SARS-CoV-2

Table 3B The conservation of the epitopes in human coronavirus

Table 3C The conservation of the epitopes in coronavirus

Figure S1 Deep analysis of hydrophilicity and hydrophobicity of surface protein of SARS-CoV-2

The online software, ProtScale, was used to predict the hydrophilicity and hydrophobicity of the surface protein deeply. A. The S protein has a maximum score of hydrophobicity, 3.222 at the 7^th site, which revealed a strong hydrophobicity; a minimum score of hydrophobicity, -2.589 at the 679^th site, which revealed a strong hydrophilicity. The score of hydrophilicity and hydrophobicity on the polypeptide chain of S protein constantly fluctuates, with most of the scores being negative, which revealed the possibility that the protein had bisexual properties on the basis of hydrophilicity. B. The E protein has a maximum score of hydrophobicity, 3.489 at the 21^st and the 25^th site, which revealed a strong hydrophobicity; a minimum score of hydrophobicity, -1.550 at the 65^th site, which revealed a strong hydrophilicity. Most of the scores of the residues being positive, which revealed the possibility that the protein has obvious hydrophobicity. C. The M protein has a maximum score of hydrophobicity, 2.978 at the 84^th site, which revealed a strong hydrophobicity; a minimum score of hydrophobicity, -1.956 at the 211^th and the 212^th site, which revealed a strong hydrophilicity. The scores of hydrophilicity and hydrophobicity on the polypeptide chain of M protein showed large fluctuations, and the number of positive scores and negative scores were similar, the positive scores accounted for the majority, which revealed the possibility that the protein had bisexual properties on the basis of hydrophobicity.

Figure S2 The transmembrane region of the surface protein of SARS-CoV-2

The S, E and M protein are embedded in the envelope of SARS-CoV-2, the transmembrane helix was predicted by TMHMM 2.0 server. All of three amino acid indexes were higher than 18, indicating the reliability of the prediction. A. For the S protein, an outside-in transmembrane helix was predicted in the 23 residues of amino acids from position 1214^th to position 1236^th at the N-terminal. The amino acid index was 23.97303. B. For the E protein, an inside-out transmembrane helix was predicted in the 23 residues of amino acids from position 12^th to position 34^th at the N-terminal. The amino acid index was 25.72521. C. For the M protein, 2 outside-in transmembrane helices were predicted, which were a helix in the 20 residues of amino acids from position 20^th to position 39^th and a helix in the 23 residues of amino acids from position 78^th to position 100^th at the N-terminal. An inside-out helix was predicted in the 23 residues of amino acids from position 51^st to position 73^rd at the N-terminal. The amino acid index was 64.90522.

Figure S3 The antigenic conservation of the surface protein in SARS-CoV-2

A. The epitope A was absolutely conservative in 756 SARS-CoV-2 genomes. B. The epitope B was absolutely conservative in 756 SARS-CoV-2 genomes. C. The epitope C was absolutely conservative in 756 SARS-CoV-2 genomes. D. The epitope D was absolutely conservative in 756 SARS-CoV-2 genomes. E. The epitope E was absolutely conservative in 756 SARS-CoV-2 genomes. F. The epitope F/G was absolutely conservative in 939 SARS-CoV-2 genomes. G. The epitope H was absolutely conservative in 913 SARS-CoV-2 genomes.

Figure S4 The antigenic conservation of the surface protein in human coronavirus

A. The conservation of the epitope A in 331 human coronavirus genomes. B. The conservation of the epitope B in 331 human coronavirus genomes. C. The conservation of the epitope C in 331 human coronavirus genomes. D. The conservation of the epitope D in 331 human coronavirus genomes. E. The conservation of the epitope E in 331 human coronavirus genomes. F. The conservation of the epitope F/G in 268 human coronavirus genomes. G. The conservation of the epitope H in 268 human coronavirus genomes.

Figure S5 The antigenic conservation of the surface protein in coronavirus

A. The conservation of the epitope A in 403 human coronavirus genomes. B. The conservation of the epitope B in 403 human coronavirus genomes. C. The conservation of the epitope C in 403 human coronavirus genomes. D. The conservation of the epitope D in 403 human coronavirus genomes. E. The conservation of the epitope E in 403 human coronavirus genomes. F. The conservation of the epitope F/G in 334 human coronavirus genomes. G. The conservation of the epitope in 327 human coronavirus genomes.

Table S1 Bepipred2.0 linear epitope prediction results

Table S2 ABCpred linear epitope prediction results

Table S3 Prediction results of conformational B cell epitopes of surface protein by Ellipo

Table S4 Prediction results of conformational B cell epitopes of surface protein by SEPPA3.0

Table S5 Conservation analysis of epitopes in different datasets by Consurf

Download PDF

Journal Publication

published 29 Oct, 2020

Read the published version in Virology Journal →

Review #3 received at journal
16 Jul, 2020
Editorial decision: Major revision
16 Jul, 2020
Reviewer #5 agreed at journal
03 Jul, 2020
Review #2 received at journal
17 Jun, 2020
Review #1 received at journal
17 Jun, 2020
Reviewer #4 agreed at journal
12 Jun, 2020
Reviewer #3 agreed at journal
11 Jun, 2020
Reviewers invited by journal
22 May, 2020
Reviewer #1 agreed at journal
22 May, 2020
Reviewer #2 agreed at journal
22 May, 2020
Editor invited by journal
11 May, 2020
Editor assigned by journal
11 May, 2020
Submission checks completed at journal
05 May, 2020
First submitted to journal
01 May, 2020

You are reading this older preprint version

Read the latest preprint version →

Prediction and Evolution of B Cell Epitopes of Surface Protein in SARS-CoV-2

Status:

Journal Publication

Version 1

Abstract

Figures

Introduction

Materials And Methods

Materials

Basic analysis of surface protein of SARS-CoV-2

Prediction of the 3D structure of target protein

Prediction of conformational B cell epitopes of target protein of SARS-CoV-2

Prediction of linear B cell epitopes of target protein of SARS-CoV-2

Analysis of epitope conservation

Results

Basic analysis of surface protein of SARS-CoV-2

Prediction of the 3D structure of surface protein of SARS-CoV-2

Prediction of linear B cell epitopes

Prediction of conformational B cell epitopes

Analysis of epitope conservation

Discussion

Conclusion

List Of Abbreviations

Declarations

References

Tables

Supplementary Information

Supplementary Files

Status:

Journal Publication

Version 1