Amino acid sequence alignment and hot spot analysis
The global amino acid identity between the main antigenic proteins of the investigated vaccines and SARS-CoV-2 proteins did not exceed 63%. For structural proteins, identity ranged from 21% to 55% (the identity of the S protein with the Rubella virus E1/E2 polyprotein and of the M protein with the HAV VP1 protein, respectively). For non-structural proteins, identity ranged from 21% to 63% (the identity of ORF1a with the Hepatitis B virus HBsAg-adr protein and of ORF3a with the tetanus toxin, respectively) (Supplementary material 1).
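The identity values above are percentages computed over pairwise alignments, and the section does not state which denominator was used. Alignment tools differ on this point, for example dividing by the number of gap-free columns or by the full alignment length. The minimal sketch below assumes the alignment is already available as two gapped strings of equal length and shows how the two conventions are computed; `percent_identity` and its `denominator` parameter are illustrative names, not part of any specific tool.

```python
def percent_identity(aln_a: str, aln_b: str, denominator: str = "aligned") -> float:
    """Percent identity between two gapped sequences taken from an existing alignment.

    denominator="aligned": divide by the columns where neither sequence has a gap.
    denominator="length":  divide by the full alignment length (gap columns included).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(aln_a, aln_b))
    # A column counts as identical only if both residues match and it is not a gap
    identical = sum(1 for x, y in pairs if x == y and x != "-")
    if denominator == "aligned":
        columns = sum(1 for x, y in pairs if x != "-" and y != "-")
    elif denominator == "length":
        columns = len(pairs)
    else:
        raise ValueError("denominator must be 'aligned' or 'length'")
    return 100.0 * identical / columns if columns else 0.0


# Toy alignment (hypothetical sequences, not taken from the study)
print(percent_identity("MKT-LLV", "MKTALIV"))            # gap-free columns only
print(percent_identity("MKT-LLV", "MKTALIV", "length"))  # full alignment length
```

On the toy alignment, the same five identical residues give about 83.3% with the first convention and about 71.4% with the second, so the convention matters when comparing identity values across studies.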
Segments similar to the main vaccine antigenic proteins were identified along both the structural and the non-structural proteins of SARS-CoV-2. For all SARS-CoV-2 proteins, most of these segments were shorter than five consecutive amino acids (Supplementary material 2-13). Nevertheless, a total of twelve patterns of six to eight consecutive similar amino acids were identified in the main antigenic proteins of the Poliovirus, Measles, Streptococcus pneumoniae, Tetanus, Mumps, Hepatitis B, Hib and BCG vaccines (Table 1). Two similar segments each were identified for the Poliovirus, Measles, PCV10 and Hib proteins, matching SARS-CoV-2 structural proteins (S and N) as well as non-structural proteins (ORF1a, ORF6 and ORF8). In contrast, the Tetanus, Mumps, Hepatitis B and BCG antigenic proteins each shared no more than one similar segment with SARS-CoV-2 proteins (Table 1). Among these peptides, seven matched segments of the SARS-CoV-2 S protein; they were found in the antigenic proteins of the Poliovirus Sabin 3, S. pneumoniae, Tetanus, Mumps, Hepatitis B and Hib vaccines, and their length varied between six and eight amino acids. In addition, one eight-residue pattern of the Poliovirus Sabin 1 VP1 sequence (GTAPARIS) matched the segment GTSPARMA of the SARS-CoV-2 N protein.
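Table 1 lists runs of at least six consecutive similar residues, but the section does not specify the similarity criterion. Assuming that a position counts as similar when the residues are identical or their BLOSUM62 substitution score is positive (an assumption consistent with every pair in Table 1, not a procedure confirmed by the text), such runs can be extracted from an ungapped aligned pair as sketched below. Only the off-diagonal BLOSUM62 entries needed for the examples are included, and `similar_runs` is a hypothetical helper.

```python
# Off-diagonal BLOSUM62 scores needed for the examples below (a small subset of
# the full matrix). All diagonal entries of BLOSUM62 are positive, so identical
# residues always count as similar.
BLOSUM62_SUBSET = {
    frozenset("IL"): 2,
    frozenset("LV"): 1,
    frozenset("IV"): 3,
    frozenset("FI"): 0,
    frozenset("AS"): 1,
}


def is_similar(x: str, y: str) -> bool:
    """Assumed criterion: identical residues, or a positive BLOSUM62 substitution score."""
    return x == y or BLOSUM62_SUBSET[frozenset((x, y))] > 0


def similar_runs(a: str, b: str, min_len: int = 6) -> list[tuple[int, int]]:
    """Half-open (start, end) runs of consecutive similar positions with length >= min_len."""
    if len(a) != len(b):
        raise ValueError("expected an ungapped alignment of equal-length segments")
    runs, start = [], None
    for i, (x, y) in enumerate(zip(a, b)):
        if is_similar(x, y):
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the segments
    if start is not None and len(a) - start >= min_len:
        runs.append((start, len(a)))
    return runs


# S. pneumoniae pattern vs SARS-CoV-2 S protein (Table 1): one run over all 8 residues
print(similar_runs("IGFLAGVI", "LGFIAGLI"))                 # [(0, 8)]
# Tetanus toxin vs S protein: the F/I position (score 0) splits the pattern in two
print(similar_runs("DISGFNSSVI", "DISGINASVV", min_len=4))  # [(0, 4), (5, 10)]
```

With the default `min_len=6`, the tetanus toxin pair yields no run at all, which illustrates why that match is described as a discontinuous pattern rather than a continuous one.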
We also identified two discontinuous 10-residue patterns, DISGFNSSVI in the tetanus toxin and MSLSLLDLYL in the Measles virus hemagglutinin, which shared 90% and 80% similarity with the segments DISGINASVV (residues 1168-1177) of the SARS-CoV-2 S protein and IELSLIDFYL (residues 2-11) of ORF7b, respectively.
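The 90% and 80% values can be reproduced if, as assumed in this sketch, similarity is the fraction of aligned positions whose residues are identical or have a positive BLOSUM62 score; under the same assumption, the corresponding identities are 70% and 60%. The text does not name the scoring matrix, so this illustrates one plausible definition rather than the authors' procedure. Only the BLOSUM62 entries needed for the two pairs are included, and `identity_and_similarity` is a hypothetical helper.

```python
# Off-diagonal BLOSUM62 scores needed for the two pairs (subset of the full matrix).
BLOSUM62_SUBSET = {
    frozenset("FI"): 0, frozenset("AS"): 1, frozenset("IV"): 3,
    frozenset("IM"): 1, frozenset("ES"): 0, frozenset("IL"): 2, frozenset("FL"): 0,
}


def identity_and_similarity(a: str, b: str) -> tuple[float, float]:
    """Percent identity and percent similarity of two ungapped, equal-length segments.

    Similarity (assumed definition): identical residues plus substitutions with a
    positive BLOSUM62 score.
    """
    if len(a) != len(b):
        raise ValueError("segments must have the same length")
    identical = sum(x == y for x, y in zip(a, b))
    similar = sum(x == y or BLOSUM62_SUBSET[frozenset((x, y))] > 0 for x, y in zip(a, b))
    return 100.0 * identical / len(a), 100.0 * similar / len(a)


print(identity_and_similarity("DISGFNSSVI", "DISGINASVV"))  # (70.0, 90.0), tetanus toxin vs S
print(identity_and_similarity("MSLSLLDLYL", "IELSLIDFYL"))  # (60.0, 80.0), measles H vs ORF7b
```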
Table 1: Similar patterns of more than five amino acids shared by vaccine antigenic proteins and SARS-CoV-2 proteins
| Vaccine | N° of similar segments | Vaccine protein | Vaccine segment | Position (aa) | SARS-CoV-2 protein | SARS-CoV-2 segment | Position (aa) |
|---|---|---|---|---|---|---|---|
| Poliovirus | 2 | VP1 protein (Sabin 1) | GTAPARIS | 188-195 | N | GTSPARMA | 203-211 |
| | | VP1 protein (Sabin 3) | LDPLSE | 289-295 | S | LDPLSE | 293-299 |
| Measles | 2 | Fusion protein | QECLRG | 359-364 | ORF6 | QECVRG | 21-27 |
| | | | IQVGSRR | 433-440 | ORF8 | IRVGARK | 47-54 |
| Streptococcus pneumoniae | 2 | Capsular polysaccharide biosynthesis protein (serotypes 19F, 18C, 14, 7, 4, 1) | IGFLAGVI | 182-190 | S | LGFIAGLI | 1218-1226 |
| | | Capsular polysaccharide biosynthesis protein (serotypes 19F, 18C, 14, 5) | SSVAFA | 33-39 | S | NSVAYS | 703-708 |
| Hib | 2 | Capsular polysaccharide biosynthesis protein | KNINDS | 210-215 | S | KNLNES | 1191-1196 |
| | | | FILNKKI | 73-79 | ORF1a | FLLNKEM | 3183-3189 |
| BCG | 1 | Immunogenic protein MPB64 | IFMLVT | 5-11 | E | VFLLVT | 25-31 |
| Tetanus | 1 | Toxin protein | NILMQY | 84-90 | S | NLLLQY | 751-756 |
| Mumps virus | 1 | Fusion protein | DISTEL | 448-454 | S | DISTEI | 467-473 |
| Hepatitis B | 1 | HBsAg-adr | PGTSTTS | 111-117 | S | PGTNTSN | 600-606 |
Immunogenicity prediction
First, we characterized the immunogenicity of the segments matching the S and N proteins, given the involvement of these proteins in modulating the host immune response [19, 20].
The N protein segment GTSPARMA, which matches the Poliovirus pattern GTAPARIS, could not be mapped onto the structure of the SARS-CoV-2 N protein. Moreover, it showed no significant match with any predicted MHC-I epitope. B-cell epitope prediction on the N protein sequence revealed a potential antigenic peptide of 51 amino acids (165-216) that harbors the GTSPARMA pattern identified in our similarity search.
Among the seven patterns identified in the SARS-CoV-2 S protein, four segments (LDPLSE, NSVAYS, NLLLQY and PGTNTSN, matching the Polio, PCV10, Tetanus and HBV vaccines, respectively) were mapped onto the structure of the spike protein S1 subunit (Figure 1A). A fifth pattern, KNLNES, was mapped onto the structure of the six-helix bundle fusion core (S2 subunit), which was solved independently of the rest of the ectodomain. The two remaining patterns (LGFIAGLI and DISTEI) were not resolved in the electron density maps of the cryo-EM structures. Among the seven S-protein patterns, the segments PGTNTSN and LGFIAGLI showed a putative interaction with one of the MHC-I molecules predicted by NetMHCpan (IEDB Analysis Resource). However, both peptides obtained weak peptide scores, 0.07 and 0.02, respectively (on a scale where 0 indicates no predicted MHC-I binding and 1 a high probability of binding). The S-protein segment PGTNTSN, which matches the pattern PGTSTTS of the Hepatitis B virus HBsAg (adr subtype), is located in a turn region.
B-cell epitope prediction with BepiPred 2.0 (IEDB Analysis Resource) placed four of the seven segments, namely LDPLSE, NSVAYS, DISTEI and PGTNTSN, within putative epitopes. These segments overlap the predicted epitopes LDPL, YTMSLGAENSVAYSNN, NLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCYFPLQSYGFQPTN and TNTSN, respectively (Figure 1B). The segment KNLNES does not fall within a putative B-cell epitope region.
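The correspondence between each S-protein segment and its BepiPred epitope is a shared stretch of sequence rather than strict containment (LDPL, for instance, covers only part of LDPLSE). The minimal sketch below uses the standard dynamic-programming longest-common-substring recurrence to quantify how much of each segment falls inside the reported epitope; the helper names are hypothetical, and the epitope strings are copied from the text above.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Longest contiguous substring shared by a and b (O(len(a) * len(b)) dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix ending at a[i-1] and b[j-1]
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def epitope_coverage(segment: str, epitope: str) -> float:
    """Fraction of the segment covered by its longest stretch shared with the epitope."""
    return len(longest_common_substring(segment, epitope)) / len(segment)


# S-protein segments and the BepiPred epitopes reported for them
PAIRS = {
    "LDPLSE": "LDPL",
    "NSVAYS": "YTMSLGAENSVAYSNN",
    "DISTEI": "NLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCYFPLQSYGFQPTN",
    "PGTNTSN": "TNTSN",
}
for segment, epitope in PAIRS.items():
    shared = longest_common_substring(segment, epitope)
    print(segment, shared, round(epitope_coverage(segment, epitope), 2))
```

The shared stretches are LDPL (about 0.67 of LDPLSE), the whole of NSVAYS and DISTEI, and TNTSN (about 0.71 of PGTNTSN).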
We also calculated the solvent-accessible surface area (SASA) using different probe radii to gain insight into the possible interaction of antibody complementarity-determining regions (CDRs) with the predicted epitopes (Figure 1C). Our results show that exposure to both water molecules and the antibody paratope is preserved only for the segment PGTNTSN, whose SASA values at probe radii of 1.4 Å, 5 Å and 10 Å are 528.69 Å², 497.6 Å² and 305.38 Å², respectively.
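SASA at a given probe radius is commonly computed with the Shrake-Rupley algorithm: each atom's sphere is inflated by the probe radius, sampled with test points, and only the points not buried inside a neighbouring inflated sphere are counted as exposed. The section does not say which program produced the values above, so the sketch below is a generic, standard-library-only illustration with hypothetical function names and no built-in radius table (atomic radii must be supplied). Each inflated sphere grows with the probe radius, but so does the overlap between neighbours, which is how a segment in a groove can lose accessible area when the probe approaches the size of an antibody paratope.

```python
import math


def sphere_points(n: int) -> list[tuple[float, float, float]]:
    """Approximately uniform points on the unit sphere (golden-section spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden_angle * k), r * math.sin(golden_angle * k), z))
    return points


def shrake_rupley_sasa(coords, radii, probe: float = 1.4, n_points: int = 960) -> list[float]:
    """Per-atom solvent-accessible surface area (Å^2) for atoms at coords (Å) with the given radii."""
    unit = sphere_points(n_points)
    areas = []
    for i, (centre, radius) in enumerate(zip(coords, radii)):
        big_r = radius + probe  # radius of the probe-inflated sphere
        # Only neighbours whose inflated spheres intersect this one can bury its test points
        neighbours = [
            (coords[j], radii[j] + probe)
            for j in range(len(coords))
            if j != i and math.dist(centre, coords[j]) < big_r + radii[j] + probe
        ]
        exposed = 0
        for ux, uy, uz in unit:
            point = (centre[0] + big_r * ux, centre[1] + big_r * uy, centre[2] + big_r * uz)
            if all(math.dist(point, c) >= r for c, r in neighbours):
                exposed += 1
        # Exposed fraction of the inflated sphere's area
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas


# Hypothetical two-atom example, evaluated at the probe radii used in the text
atoms = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
radii = [1.8, 1.8]
for probe in (1.4, 5.0, 10.0):
    print(probe, round(sum(shrake_rupley_sasa(atoms, radii, probe)), 1))
```

Production tools add per-element radius tables and faster neighbour searches, but the exposed-point counting shown here is the core of the method.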
Second, we examined the hits between the investigated vaccine sequences and the other SARS-CoV-2 proteins. All patterns were assessed for antigenic potential with the IEDB BepiPred and NetMHCpan methods. None of them showed a significant putative B-cell (antibody) binding property. Discontinuous patterns longer than ten residues were discarded from the analysis because of their low similarity. We therefore retained two segments, from the tetanus toxin (DISGFNSSVI) and from chain A of the Measles virus hemagglutinin (MSLSLLDLYL), which significantly matched the SARS-CoV-2 S and ORF7b proteins, respectively. The S-protein segment DISGINASVV (Figure 2A) showed a putative interaction with the MHC-I molecule encoded by one of the corresponding HLA alleles. DISGINASVV and its tetanus toxin counterpart DISGFNSSVI obtained high peptide scores of 0.88 and 0.76, respectively. The segment DISGINASVV is part of the six-helix bundle fusion core of the spike protein, where it adopts a random-coil structure within the HR2 domain [21]. In its native environment, the peptide has an extended conformation stabilized by residues lining a small groove formed between two parallel HR1 helices from different monomers. The SASA of the DISGINASVV peptide is 504.88 Å². In contrast, the matching tetanus toxin sequence DISGFNSSVI has a SASA of 243.3 Å² (Figure 2B), and BepiPred 2.0 implicated only the substring "DISGI" in a putative B-cell epitope.
For the ORF7b and Measles hemagglutinin proteins, the identified similar segments overlap substantially with regions of putative T-cell antigenicity. In the crystal structure of the hemagglutinin [22], the matching Measles segment (Figure 2C) consists of a random-coil stretch (MSLS) followed by a six-residue alpha helix (LLDLYL). The segment also interacts with a large pocket formed mainly by four strands of a beta-sheet containing many aromatic amino acids; this pocket resembles the peptide-binding groove of the MHC-I molecule (Figure 2C and supplementary material 14). Moreover, MSLSLLDLYL has a SASA of 439.19 Å² (Figure 2B). NetMHCpan predicted an antigenicity score of 0.18 for the MSLSLLDLYL segment using the ORF7b sequence. We also noticed that the matching ORF7b segment, IELSLIDFYL, contains the substring IELSLIDFY, which has the highest antigenicity score (0.59) among all predicted epitopes.
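Because MHC-I predictors such as NetMHCpan score short peptides, a 9-residue substring (IELSLIDFY) of the 10-residue segment IELSLIDFYL can obtain its own, higher score. The sketch below enumerates the overlapping candidate peptides of a segment that such a predictor would evaluate, assuming the usual MHC class I ligand lengths of 8 to 11 residues; `candidate_peptides` is a hypothetical helper, and no binding scores are computed.

```python
def candidate_peptides(segment: str, lengths=range(8, 12)) -> dict[int, list[str]]:
    """Overlapping sub-peptides of a segment, grouped by length (assumed 8-11 residues)."""
    return {
        k: [segment[i:i + k] for i in range(len(segment) - k + 1)]
        for k in lengths
        if k <= len(segment)  # lengths longer than the segment yield no peptides
    }


peptides = candidate_peptides("IELSLIDFYL")
print(peptides[9])  # ['IELSLIDFY', 'ELSLIDFYL']
```

A 10-residue segment thus yields three 8-mers, two 9-mers and the full 10-mer, each of which would receive a separate predicted score.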