Amino acid sequence alignment and hot spot analysis
The global amino acid identity between the main antigenic proteins of the investigated vaccines and SARS-CoV-2 proteins did not exceed 63%. For structural proteins, it ranged from 21% to 55% (identity of the Spike and Matrix proteins with the Polyprotein E1/E2 of Rubella virus and the HAV VP1 protein, respectively). For non-structural proteins, it ranged from 21% to 63% (identity of the ORF1a and ORF3a proteins with the HBsAg-adr protein of Hepatitis B virus and the Tetanus Toxin protein, respectively) (Additional material 1).
Segments similar to the main vaccine antigenic proteins were identified all along the structural and non-structural proteins of SARS-CoV-2. For all SARS-CoV-2 proteins, the majority were shorter than five consecutive amino acids (Additional material 2-13). Nevertheless, a total of 12 patterns of six to eight similar consecutive amino acids were identified in comparisons against the main antigenic proteins of the Poliovirus, Measles, Streptococcus pneumoniae, Tetanus, Mumps, Hepatitis B, Hib and BCG vaccines (Table 1). For each of the Poliovirus, Measles, PCV10 and Hib proteins, two similar segments were identified, involving both SARS-CoV-2 structural proteins (S and N) and non-structural proteins (ORF1a, ORF6 and ORF8). In contrast, the Tetanus, Mumps, Hepatitis B and BCG antigenic proteins each showed no more than one similar segment with SARS-CoV-2 proteins (Table 1). Among the described peptides, seven were similar to segments of the S protein; they were obtained with antigenic proteins of the Poliovirus Sabin 3, S. pneumoniae, Tetanus, Mumps, Hepatitis B and Hib vaccines. The length of these patterns varied between six and seven amino acids. In addition, one peptide of eight amino acids (GTAPARIS) detected in the Poliovirus VP1 sequence matched a segment (GTSPARMA) of the SARS-CoV-2 N protein.
We also identified two discontinuous patterns of 10 amino acids (DISGFNSSVI and MSLSLLDLYL) in the tetanus toxin and the measles hemagglutinin proteins, which show 90% and 80% similarity with segments (DISGINASVV and IELSLIDFYL) of the SARS-CoV-2 S and ORF7b proteins, respectively.
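As an illustration of how such ungapped segment pairs can be compared position by position, the following minimal Python sketch computes strict percent identity and lists the mismatching positions. The function names are hypothetical. Note that identity is stricter than the similarity reported above, which also credits conservative substitutions under a scoring scheme not specified in this section, so the identity values are expected to be lower than the reported similarity values.

```python
def percent_identity(a: str, b: str) -> float:
    """Strict percent identity between two equal-length, ungapped segments."""
    if len(a) != len(b):
        raise ValueError("segments must have equal length for an ungapped comparison")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def mismatches(a: str, b: str) -> list:
    """Return (1-based position, residue in a, residue in b) for each difference."""
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Segment pairs quoted in the text (vaccine protein segment, SARS-CoV-2 segment).
pairs = {
    "Tetanus toxin vs S": ("DISGFNSSVI", "DISGINASVV"),
    "Measles hemagglutinin vs ORF7b": ("MSLSLLDLYL", "IELSLIDFYL"),
}

for name, (vaccine_seg, sars_seg) in pairs.items():
    print(name, percent_identity(vaccine_seg, sars_seg), mismatches(vaccine_seg, sars_seg))
```

Listing the mismatching residues makes it easy to inspect by eye which substitutions a similarity matrix would likely treat as conservative.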
Table 1: Similar patterns of more than five amino acids identified in vaccine antigenic proteins and SARS-CoV-2 proteins
| Vaccine | No. of similar segments | Vaccine protein: designation | Vaccine protein: segment | Vaccine protein: position | SARS-CoV-2 protein: designation | SARS-CoV-2 protein: segment | SARS-CoV-2 protein: position |
|---|---|---|---|---|---|---|---|
| Poliovirus | 2 | VP1 protein (Sabin 1) | GTAPARIS | 188-195 | N | GTSPARMA | 203-211 |
| | | VP1 protein (Sabin 3) | LDPLSE | 289-295 | S | LDPLSE | 293-299 |
| Measles | 2 | Fusion protein | QECLRG | 359-364 | ORF6 | QECVRG | 21-27 |
| | | Fusion protein | IQVGSRR | 433-440 | ORF8 | IRVGARK | 47-54 |
| Streptococcus pneumoniae | 2 | Capsular polysaccharide biosynthesis protein (serotypes 19F, 18C, 14, 7, 4, 1) | IGFLAGVI | 182-190 | S | LGFIAGLI | 1218-1226 |
| | | Capsular polysaccharide biosynthesis protein (serotypes 19F, 18C, 14, 5) | SSVAFA | 33-39 | S | NSVAYS | 703-708 |
| Hib | 2 | Capsular polysaccharide biosynthesis protein | KNINDS | 210-215 | S | KNLNES | 1191-1196 |
| | | Capsular polysaccharide biosynthesis protein | FILNKKI | 73-79 | ORF1a | FLLNKEM | 3183-3189 |
| BCG | 1 | Immunogenic protein MPB64 | IFMLVT | 5-11 | E | VFLLVT | 25-31 |
| Tetanus | 1 | Toxin protein | NILMQY | 84-90 | S | NLLLQY | 751-756 |
| Mumps | 1 | Fusion protein | DISTEL | 448-454 | S | DISTEI | 467-473 |
| Hepatitis B | 1 | HBsAg-adr | PGTSTTS | 111-117 | S | PGTNTSN | 600-606 |
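Short ungapped matches of the kind listed in Table 1 can be located with a simple sliding-window scan. The sketch below is only an illustration under stated assumptions: the function name, the window length `k` and the `max_mismatches` tolerance are hypothetical choices, not the authors' alignment procedure, and the toy sequences embed one segment pair from Table 1 between arbitrary flanking residues.

```python
def similar_windows(seq_a: str, seq_b: str, k: int = 6, max_mismatches: int = 2) -> list:
    """Find ungapped windows of length k differing at no more than max_mismatches positions.

    Returns tuples of (1-based start in seq_a, window in seq_a,
    1-based start in seq_b, window in seq_b, number of mismatches).
    """
    hits = []
    for i in range(len(seq_a) - k + 1):
        window_a = seq_a[i:i + k]
        for j in range(len(seq_b) - k + 1):
            window_b = seq_b[j:j + k]
            n_mismatch = sum(x != y for x, y in zip(window_a, window_b))
            if n_mismatch <= max_mismatches:
                hits.append((i + 1, window_a, j + 1, window_b, n_mismatch))
    return hits


# Toy sequences (hypothetical flanks): the Tetanus toxin segment NILMQY and the
# S protein segment NLLLQY from Table 1, padded with arbitrary residues.
vaccine_toy = "AAAANILMQYAAAA"
sars_toy = "GGGGNLLLQYGGGG"

for hit in similar_windows(vaccine_toy, sars_toy, k=6, max_mismatches=2):
    print(hit)
```

This brute-force scan is quadratic in sequence length, which is acceptable for single protein pairs; a real analysis would normally rely on an established alignment tool and a substitution matrix rather than a raw mismatch count.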
|
Immunogenicity prediction
Our analysis was conducted in two steps. First, we characterized the immunogenicity of the sequences matching the S and N proteins, because these proteins are involved in modulating the host immune response [11, 12].
For the pattern identified in the N protein, no significant similarity was obtained with the crystal structure of the corresponding protein. Among the seven patterns identified in the S protein, four segments (LDPLSE, NSVAYS, NLLLQY and PGTNTSN), from the Polio, PCV10, Tetanus and HBV vaccines respectively, were mapped to the structure of the spike protein (Figure 1A). The three other patterns (LGFIAGLI, KNLNES and DISTEI) are not resolved in the electron density map of the cryo-EM structure. Except for the segments PGTNTSN and LGFIAGLI, none of the peptides shows a putative interaction with one of the MHC receptors predicted by the IEDB analysis resource NetMHCpan. Even for these two peptides, the prediction gives weak peptide scores of 0.07 and 0.02, respectively (0 indicates no MHC capacity, and 1 indicates a high probability). The segment PGTNTSN, which matches PGTSTTS in the HBsAg of the Hepatitis B virus adr strain, is situated in a turn region and shows the highest level of solvent exposure (Figure 1B), with an average SASA value of 407 Å². Thus, among the seven candidate segments, only PGTNTSN shows putative humoral immunogenicity as predicted by EpiJen, ranking as the fourth best match with a predicted IC50 value of 4.33 (Figure 1C).
In the second step, we examined the hits from any of the investigated vaccine sequences that match any of the other SARS-CoV-2 proteins. All patterns were explored for their antigenic potential using the EpiJen and IEDB NetMHCpan methods. None showed putative antigenicity according to either server.
Discontinuous patterns of more than 10 residues were discarded from the analysis because they show low levels of similarity. Nevertheless, we retained two segments, in the Tetanus toxin protein (DISGFNSSVI) and in chain A of the measles virus hemagglutinin protein (MSLSLLDLYL), that significantly matched the SARS-CoV-2 Spike and ORF7b proteins. The segment DISGINASVV of the S protein, which matches the sequence DISGFNSSVI of the tetanus toxin, shows a putative interaction with one of the MHC receptors predicted by the IEDB analysis resource NetMHCpan. These two matching sequences showed high peptide scores of 0.88 and 0.76 (0 indicates no MHC capacity, and 1 indicates a high probability) for the SARS-CoV-2 S protein and the tetanus toxin protein, respectively. The matching segments of ORF7b and the measles hemagglutinin protein both overlap significantly with regions of putative T-cell antigenicity. In the crystal structure of the hemagglutinin, the identified segment corresponds to a random coil (MSLS) followed by an alpha helix of six residues (LLDLYL) [13]. The segment also interacts with a large pocket formed mainly by four strands of a beta-sheet containing many aromatic amino acids. The pocket resembles the grooves of the MHC I and MHC II molecules (Figure 2).