To identify possible cross-reactive regions between SARS-CoV2 and other HCoVs, we carried out a conservation analysis of SARS-CoV2 against OC43, HKU1 and NL63, which have been detected in Sri Lanka. We analyzed the homology and conservation of the four structural proteins and the sixteen NSPs of SARS-CoV2 with those of the three other HCoVs (Supplementary table 1). NSP12, NSP13 and NSP16 of SARS-CoV2 showed ≥ 65% homology with OC43 and HKU1, which are betacoronaviruses, and > 55% homology with NL63, which is an alphacoronavirus.
In contrast, the NSP1 and NSP2 proteins of SARS-CoV2 showed < 20% homology with the other three viruses, suggesting that these proteins are more specific to SARS-CoV2 than the other proteins. All four structural proteins (S, E, M and N), NSP3 and NSP6 showed < 35% homology with OC43, HKU1 and NL63. Therefore, immune responses directed at these proteins may also be specific to SARS-CoV2, unlike immune responses generated against NSP12, NSP13 and NSP16. The structural proteins (S, E, M and N) and the non-structural proteins of SARS-CoV2 showed lower homology with NL63 than with OC43 and HKU1, suggesting that SARS-CoV2 is genetically closer to OC43 and HKU1 than to NL63.
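The protein-level homology values above depend on how percent identity is defined for a pairwise alignment, which is not restated in this section. As a minimal sketch, assuming two sequences that have already been aligned (gaps written as "-"), identity can be computed as the number of identical residues divided by the number of alignment columns in which at least one sequence has a residue. This denominator is an assumption; other conventions, such as dividing by the length of the shorter sequence, give different values. The function name and the example fragments are hypothetical.

```python
def alignment_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two pre-aligned sequences, with '-' marking gaps.

    Denominator (assumed convention): alignment columns in which at least
    one of the two sequences has a residue.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns where both sequences are gapped.
    columns = [(a, b) for a, b in zip(aln_a, aln_b) if not (a == "-" and b == "-")]
    # A both-gap column has been removed, so a == b here always means identical residues.
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns) if columns else 0.0


# Hypothetical aligned fragments, not real coronavirus sequences.
print(alignment_identity("MKV-LLAG", "MKVQLIAG"))  # 6 identical / 8 columns -> 75.0
```

Counting the gapped column against identity, as here, penalises insertions and deletions; a tool-specific definition should be substituted when reproducing Supplementary table 1.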
Identification of possible CD8+ epitopes of SARS-CoV2 restricted through HLA-A alleles
The NetMHCpan EL4 epitope prediction tool gives peptide binding scores ranging from 0 to 1.0, and we considered a predicted score ≥ 0.90 as indicative of a strong binder. None of the 8mer peptides gave a predicted score of ≥ 0.90. However, 39 9mer peptides, restricted through different HLA-A alleles, gave a high binding score of ≥ 0.90 (Table 1). Five epitopes each were identified from the spike and NSP3 proteins, and four each from NSP4, NSP6, NSP12, NSP14 and NSP15. No peptides with high binding scores were identified from the envelope, NSP1, NSP9, NSP10, NSP11 or NSP16 proteins. The epitopes 726YYTSNPTTF734 from NSP3 (predicted to be restricted through HLA-A*24:02) and 70FLLPSLATV78 from NSP6 (predicted to be restricted through HLA-A*02:01) gave a score of 0.99, while 1349NYMPYFFTL1357 from NSP3, 420FLLNKEMYL428 from NSP4 and 152ALWEIQQVV160 from NSP8 gave a score of 0.98. However, these five peptides had < 45% homology with the other three viruses. Of the 39 9mer epitopes predicted in this study, 23 were predicted to be restricted through HLA-A*02 and 16 through HLA-A*24. Although HLA-A*33 is seen in over 10% of the Sri Lankan population, no 9mers with a score of ≥ 0.90 were predicted for this allele.
Table 1
Predicted CD8+ epitopes of the SARS-CoV2 virus restricted through HLA-A alleles
9mer peptides with a peptide binding score of ≥ 0.90 |
Protein | HLA-A allele | Sequence | Score | OC43 % identity with SARS-CoV2 | HKU1 % identity with SARS-CoV2 | NL63 % identity with SARS-CoV2 |
Spike | A*02:01 | 269YLQPRTFLL277 | 0.97 | 44 | 44 | 22 |
Spike | A*24:02 | 1208QYIKWPWYI1216 | 0.95 | 77 | 77 | 77 |
Spike | A*02:01 | 976VLNDILSRL984 | 0.94 | 67 | 67 | 33 |
Spike | A*24:02 | 635VYSTGSNVF643 | 0.93 | 22 | 22 | 22 |
Spike | A*02:01 | 109TLDSKTQSL117 | 0.91 | 22 | 22 | 33 |
Membrane | A*24:02 | 95YFIASFRLF103 | 0.91 | 77 | 77 | 77 |
Nucleocapsid | A*02:01 | 222LLLDRLNQL230 | 0.96 | 33 | 33 | 33 |
NSP2 | A*02:01 | 265GLNDNLLEI273 | 0.92 | 44 | 33 | 44 |
NSP2 | A*24:02 | 497TFFKLVNKF505 | 0.90 | 33 | 22 | 22 |
NSP3 | A*24:02 | 726YYTSNPTTF734 | 0.99 | 22 | 22 | 22 |
NSP3 | A*24:02 | 1349NYMPYFFTL1357 | 0.98 | 33 | 22 | 11 |
NSP3 | A*24:02 | 816YYHTTDPSF824 | 0.96 | 11 | 11 | 0 |
NSP3 | A*24:02 | 364LYDKLVSSF372 | 0.95 | 33 | 33 | 11 |
NSP3 | A*24:02 | 1081YYKKDNSYF1089 | 0.94 | 33 | 33 | 0 |
NSP4 | A*02:01 | 420FLLNKEMYL428 | 0.98 | 33 | 33 | 33 |
NSP4 | A*02:01 | 359FLAHIQWMV367 | 0.94 | 44 | 44 | 33 |
NSP4 | A*24:02 | 351FYLTNDVSF359 | 0.92 | 22 | 22 | 22 |
NSP4 | A*24:02 | 486LYQPPQTSI494 | 0.90 | 66 | 66 | 66 |
NSP5 | A*02:06 | 194LIQDYIQSV202 | 0.95 | 44 | 33 | 11 |
NSP6 | A*02:01 | 70FLLPSLATV78 | 0.99 | 33 | 33 | 44 |
NSP6 | A*02:01 | 141TLMNVLTLV149 | 0.92 | 0 | 0 | 0 |
NSP6 | A*24:02 | 84VYMPASWVM92 | 0.91 | 0 | 0 | 11 |
NSP6 | A*24:02 | 115MYASAVVLL123 | 0.90 | 22 | 22 | 22 |
NSP7 | A*02:01 | 12VLLSVLQQL20 | 0.95 | 66 | 66 | 44 |
NSP8 | A*02:01 | 152ALWEIQQVV160 | 0.98 | 44 | 33 | 22 |
NSP12 | A*24:02 | 37IYNDKVAGF45 | 0.95 | 66 | 44 | 44 |
NSP12 | A*02:01 | 123TMADLVYAL131 | 0.93 | 77 | 77 | 77 |
NSP12 | A*02:06 | 334FVDGVPFVV342 | 0.93 | 100 | 100 | 88 |
NSP12 | A*02:01 | 854LMIERFVSL862 | 0.91 | 88 | 88 | 55 |
NSP13 | A*02:01 | 239TLVPQEHYV247 | 0.95 | 77 | 77 | 44 |
NSP13 | A*24:02 | 397VYIGDPAQL405 | 0.91 | 100 | 100 | 77 |
NSP14 | A*02:01 | 321LLADKFPVL329 | 0.94 | 11 | 11 | 33 |
NSP14 | A*02:01 | 176NLSDRVVFV184 | 0.93 | 66 | 66 | 77 |
NSP14 | A*02:01 | 184VLWAHGFEL192 | 0.92 | 66 | 44 | 66 |
NSP14 | A*02:01 | 494YLDAYNMMI502 | 0.91 | 44 | 44 | 44 |
NSP15 | A*02:06 | 243SQLGGLHLL251 | 0.96 | 66 | 66 | 66 |
NSP15 | A*02:01 | 297LLLDDFVEI305 | 0.95 | 66 | 88 | 88 |
NSP15 | A*02:06 | 312SVVSKVVKV320 | 0.94 | 66 | 66 | 55 |
NSP15 | A*02:06 | 181KVDGVVQQL189 | 0.92 | 22 | 22 | 11 |
10mer peptides with a peptide binding score of ≥ 0.90 |
Protein | HLA-A allele | Sequence | Score | OC43 % identity with SARS-CoV2 | HKU1 % identity with SARS-CoV2 | NL63 % identity with SARS-CoV2 |
Spike | A*24:02 | 1066TYVPAQEKNF1075 | 0.94 | 30 | 30 | 10 |
Spike | A*24:02 | 159VYSSANNCTF168 | 0.90 | 10 | 30 | 20 |
NSP3 | A*24:02 | 717VYYTSNPTTF726 | 0.98 | 30 | 40 | 20 |
NSP6 | A*24:02 | 242YDYLVSTQEF251 | 0.91 | 50 | 50 | 50 |
Only four 10mer peptides were predicted to have a score of ≥ 0.90: two from spike and one each from NSP3 and NSP6 (Table 1).
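The peptide labels used in this section, such as 726YYTSNPTTF734, give the 1-based start and end positions of each peptide within its protein. A minimal sketch of how overlapping 8mer, 9mer and 10mer candidates can be enumerated with that labelling before scoring; the toy sequence and the function names are hypothetical, and the exact way peptides were submitted to NetMHCpan is not described here.

```python
from typing import Iterator


def kmer_windows(protein: str, k: int) -> Iterator[tuple[int, str, int]]:
    """Yield (start, peptide, end) for every overlapping k-mer, 1-based and inclusive."""
    for i in range(len(protein) - k + 1):
        yield i + 1, protein[i:i + k], i + k


def label(start: int, peptide: str, end: int) -> str:
    """Format a peptide in the start-SEQUENCE-end style used in Tables 1 to 4."""
    return f"{start}{peptide}{end}"


toy_protein = "ACDEFGHIKL"  # illustrative sequence, not a SARS-CoV2 protein
for k in (8, 9, 10):
    print(k, [label(*w) for w in kmer_windows(toy_protein, k)])
```

A protein of length L yields L − k + 1 candidates for each length k, so every residue is covered by up to k overlapping peptides.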
Identification of possible CD8+ epitopes of SARS-CoV2 restricted through HLA-B alleles
As for HLA-A alleles, we considered a predicted score ≥ 0.90 as indicative of a strong binder. None of the 8mer peptides gave a score of ≥ 0.90, so no 8mers were predicted to be restricted through HLA-B alleles. However, 38 9mer peptides had high binding scores and were predicted to be restricted through HLA-B alleles (Table 2). The largest numbers of epitopes were predicted from the spike protein (5/38) and NSP13 (4/38), and three epitopes each were predicted from the nucleocapsid, NSP2, NSP3, NSP4 and NSP12. No epitopes were identified from the envelope, membrane or NSP11 proteins. Nine epitopes gave a score of 0.99: 895IPFAMQMAY903 from spike, 325TPSGTWLTY333 from nucleocapsid, 195SEVGPEHSL203 and 562GETLPTEVL570 from NSP2, 120EEFEPSTQY128 and 546QEILGTVSW554 from NSP3, 72LPSLATVAY80 from NSP6, 4SEFSSLPSY12 from NSP8 and 608VENPHLMGW616 from NSP12.
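The selection and tallying steps described above can be reproduced from a table of predictions. The sketch below applies the ≥ 0.90 cut-off and counts the retained peptides per protein and per HLA allele group, collapsing two-field alleles such as B*35:01 to B*35. The six high-scoring rows are copied from Table 2 and the low-scoring row from Table 3; the record layout and function names are assumptions, not NetMHCpan's own output format.

```python
from collections import Counter
from typing import NamedTuple


class Prediction(NamedTuple):
    protein: str
    allele: str   # two-field allele, e.g. "B*35:01"
    peptide: str
    score: float  # predicted binding score between 0 and 1


def allele_group(allele: str) -> str:
    """Collapse a two-field allele (B*35:01) to its allele group (B*35)."""
    return allele.split(":")[0]


def strong_binders(preds: list[Prediction], threshold: float = 0.90) -> list[Prediction]:
    """Keep predictions at or above the binding-score cut-off."""
    return [p for p in preds if p.score >= threshold]


rows = [
    Prediction("Spike", "B*35:01", "IPFAMQMAY", 0.99),
    Prediction("Spike", "B*35:01", "VASQSIIAY", 0.98),
    Prediction("Nucleocapsid", "B*35:01", "TPSGTWLTY", 0.99),
    Prediction("NSP2", "B*40:01", "SEVGPEHSL", 0.99),
    Prediction("NSP3", "B*44:03", "EEFEPSTQY", 0.99),
    Prediction("NSP12", "B*44:02", "VENPHLMGW", 0.99),
    Prediction("NSP5", "B*35:01", "NPKTPKYKF", 0.47),  # below the cut-off
]

kept = strong_binders(rows)
print(Counter(p.protein for p in kept))
print(Counter(allele_group(p.allele) for p in kept))
```

With these seven rows, six pass the cut-off: three restricted through B*35, two through B*44 and one through B*40.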
Table 2
Predicted CD8+ epitopes of the SARS-CoV2 virus restricted through HLA-B alleles
9mer peptides with a peptide binding score of ≥ 0.90 |
Protein | HLA-B allele | Sequence | Score | OC43 % identity with SARS-CoV2 | HKU1 % identity with SARS-CoV2 | NL63 % identity with SARS-CoV2 |
Spike | B*35:01 | 895IPFAMQMAY903 | 0.99 | 33 | 44 | 33 |
Spike | B*35:01 | 83LPFNDGVYF91 | 0.98 | 22 | 33 | 11 |
Spike | B*44:03 | 1200QELGKYEQY1208 | 0.98 | 44 | 44 | 33 |
Spike | B*35:01 | 686VASQSIIAY694 | 0.98 | 0 | 0 | 0 |
Spike | B*40:01 | 1015AEIRASANL1023 | 0.98 | 22 | 22 | 44 |
Nucleocapsid | B*35:01 | 325TPSGTWLTY333 | 0.99 | 22 | 22 | 22 |
Nucleocapsid | B*44:03 | 322MEVTPSGTW330 | 0.96 | 11 | 11 | 0 |
Nucleocapsid | B*35:01 | 79SPDDQIGYY87 | 0.92 | 22 | 22 | 55 |
NSP1 | B*40:02 | 56VEKGVLPQL64 | 0.98 | 22 | 22 | 11 |
NSP1 | B*35:01 | 110HVGEIPVAY118 | 0.95 | 22 | 11 | 11 |
NSP2 | B*40:01 | 195SEVGPEHSL203 | 0.99 | 11 | 11 | 0 |
NSP2 | B*40:01 | 562GETLPTEVL570 | 0.99 | 0 | 0 | 0 |
NSP2 | B*44:02 | 52REHEHEIAW60 | 0.98 | 22 | 22 | 22 |
NSP3 | B*44:03 | 120EEFEPSTQY128 | 0.99 | 11 | 11 | 0 |
NSP3 | B*44:03 | 546QEILGTVSW554 | 0.99 | 22 | 22 | 0 |
NSP3 | B*40:01 | 1799AELAKNVSL1807 | 0.98 | 0 | 0 | 0 |
NSP4 | B*40:02 | 309GEYSHVVAF317 | 0.98 | 44 | 44 | 33 |
NSP4 | B*35:01 | 373VPFWITIAY381 | 0.98 | 33 | 44 | 0 |
NSP4 | B*35:01 | 174NVLEGSVAY182 | 0.97 | 11 | 11 | 44 |
NSP5 | B*35:01 | 93TANPKTPKY101 | 0.95 | 66 | 66 | 44 |
NSP6 | B*35:01 | 72LPSLATVAY80 | 0.99 | 44 | 44 | 33 |
NSP7 | B*35:01 | 41LAKDTTEAF49 | 0.91 | 33 | 33 | 22 |
NSP8 | B*44:03 | 4SEFSSLPSY12 | 0.99 | 44 | 44 | 66 |
NSP8 | B*40:01 | 47SEFDRDAAM55 | 0.93 | 44 | 44 | 44 |
NSP9 | B*44:03 | 67TELEPPCRF75 | 0.95 | 66 | 66 | 66 |
NSP10 | B*35:01 | 19FAVDAAKAY27 | 0.98 | 55 | 55 | 55 |
NSP12 | B*44:02 | 608VENPHLMGW616 | 0.99 | 77 | 77 | 88 |
NSP12 | B*40:01 | 874QEYADVFHL882 | 0.98 | 44 | 44 | 44 |
NSP12 | B*35:01 | 337VPFVVSTGY345 | 0.97 | 88 | 88 | 55 |
NSP13 | B*35:01 | 291FAIGLALYY299 | 0.96 | 66 | 77 | 66 |
NSP13 | B*44:03 | 141TEETFKLSY149 | 0.96 | 77 | 77 | 55 |
NSP13 | B*40:01 | 155REVLSDREL163 | 0.95 | 55 | 55 | 44 |
NSP13 | B*40:02 | 161RELHLSWEV169 | 0.91 | 77 | 77 | 66 |
NSP14 | B*35:01 | 428TPAFDKSAF436 | 0.96 | 44 | 44 | 77 |
NSP14 | B*35:03 | 19HPTQAPTHL27 | 0.94 | 55 | 55 | 55 |
NSP15 | B*35:03 | 49LPVNVAFEL57 | 0.96 | 66 | 66 | 88 |
NSP15 | B*44:03 | 200QEFKPRSQM208 | 0.92 | 33 | 33 | 55 |
NSP16 | B*44:02 | 141KENDSKEGF149 | 0.94 | 55 | 55 | 66 |
10mer peptides with a peptide binding score of ≥ 0.90 |
Protein | HLA-B allele | Sequence | Score | OC43 % identity with SARS-CoV2 | HKU1 % identity with SARS-CoV2 | NL63 % identity with SARS-CoV2 |
Spike | B*44:02 | 95TEKSNIIRGW104 | 0.95 | 10 | 0 | 10 |
Membrane | B*44:03 | 11EELKKLLEQW20 | 0.94 | 40 | 20 | 30 |
NSP2 | B*44:03 | 489KEIKESVQTF498 | 0.95 | 0 | 0 | 10 |
NSP3 | B*44:03 | 120EEEFEPSTQY129 | 0.98 | 20 | 10 | 0 |
NSP3 | B*44:03 | 1072TEIDPKLDNY1081 | 0.96 | 40 | 30 | 10 |
NSP3 | B*35:01 | 502VPTDNYITTY511 | 0.94 | 0 | 0 | 0 |
NSP3 | B*44:03 | 94GEFKLASHMY103 | 0.93 | 50 | 50 | 0 |
NSP7 | B*40:01 | 46TEAFEKMVSL55 | 0.91 | 50 | 40 | 20 |
NSP8 | B*44:03 | 3ASEFSSLPSY12 | 0.94 | 40 | 40 | 60 |
NSP10 | B*40:01 | 5TEVPANSTVL14 | 0.92 | 50 | 60 | 50 |
NSP12 | B*44:03 | 875QEYADVFHLY884 | 0.99 | 50 | 50 | 40 |
NSP12 | B*44:03 | 166VENPDILRVY175 | 0.95 | 80 | 70 | 80 |
NSP12 | B*44:03 | 608DVENPHLMGW617 | 0.90 | 80 | 70 | 80 |
NSP13 | B*40:01 | 446AEIVDTVSAL455 | 0.96 | 90 | 80 | 80 |
NSP14 | B*44:02 | 77EEAIRHVRAW86 | 0.96 | 70 | 60 | 60 |
NSP14 | B*35:01 | 42IPGIPKDMTY51 | 0.92 | 20 | 20 | 30 |
NSP15 | B*35:01 | 269IPMDSTVKNY278 | 0.96 | 30 | 30 | 10 |
NSP15 | B*40:01 | 40VELFENKTTL49 | 0.95 | 50 | 30 | 60 |
Of these 38 epitopes, 17 were predicted to be restricted through HLA-B*35, 11 through HLA-B*40 and 10 through HLA-B*44. While most of these peptides showed < 45% homology with OC43, HKU1 and NL63, 141TEETFKLSY149 and 608VENPHLMGW616 (restricted through HLA-B*44), 337VPFVVSTGY345 (restricted through HLA-B*35) and 161RELHLSWEV169 (restricted through HLA-B*40) showed > 75% homology with OC43 and HKU1, the two other betacoronaviruses.
Only 18 10mer peptides restricted through HLA-B were predicted to have a score of ≥ 0.90 (Table 2), and 4 of these 18 were identified from NSP3. No 10mer peptides with high binding scores were predicted from the envelope, nucleocapsid, NSP1, NSP4, NSP5, NSP6, NSP9, NSP11 or NSP16 proteins. Of the 18 predicted 10mers, 3 were restricted through HLA-B*35, 4 through HLA-B*40 and 11 through HLA-B*44, suggesting that 10mers may be more likely to bind HLA-B*44 than the other HLA-B alleles analysed.
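The peptide-level identities discussed above take values such as 11, 22, 33 and up to 88 for 9mers and multiples of 10 for 10mers. These values are consistent with counting identical residues between the SARS-CoV2 peptide and the aligned HCoV segment and truncating the percentage (7/9 gives 77, 8/9 gives 88). A minimal sketch under that assumption, for ungapped aligned segments of equal length; the HCoV segments below are hypothetical, not real OC43 or HKU1 sequences.

```python
def peptide_identity(sars2: str, hcov: str) -> int:
    """Integer percent identity of two equal-length, ungapped aligned peptides.

    Truncates rather than rounds, an assumption inferred from the values in the tables.
    """
    if len(sars2) != len(hcov):
        raise ValueError("peptides must be aligned and of equal length")
    matches = sum(a == b for a, b in zip(sars2, hcov))
    return (100 * matches) // len(sars2)


# Hypothetical homologous segments differing at one or two positions.
print(peptide_identity("VPFVVSTGY", "VPFVVSTGF"))  # 8/9 identical
print(peptide_identity("TEETFKLSY", "TEETYKLAY"))  # 7/9 identical
```

The two calls print 88 and 77. Rounding instead of truncating would give 89 and 78, which do not appear in the tables.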
Identification of CD8+ T cell epitopes of SARS-CoV2 that show ≥ 75% homology with OC43, HKU1 and NL63
After identifying peptides with high predicted scores for restriction through the common HLA-A and HLA-B alleles in the Sri Lankan population, we identified the regions of SARS-CoV2 that had ≥ 75% homology with the other HCoVs. We then identified CD8+ T cell epitopes within these regions that were candidates for restriction through these HLA alleles, to determine whether SARS-CoV2 contains CD8+ T cell epitopes likely to cross-react with the other HCoVs. None of the predicted 8mer CD8+ epitopes within SARS-CoV2 gave a high binding score; therefore, only the predicted 9mer and 10mer CD8+ T cell epitopes were analyzed for their degree of homology with OC43, HKU1 and NL63.
Epitopes with ≥ 75% homology with two or more HCoVs are shown in Table 3. Twenty-four 9mer epitopes and 17 10mer peptides within SARS-CoV2 had ≥ 75% homology with ≥ 2 HCoVs. Of the 24 9mer peptides, 17 gave a peptide binding score of ≥ 0.90; of these 17 CD8+ T cell epitopes within cross-reactive regions, 11 were predicted to be restricted through HLA-A and 6 through HLA-B alleles. Six highly cross-reactive 9mer CD8+ T cell epitopes with high binding scores for HLA-A*02:01 or HLA-A*02:06 were identified from NSP12 and NSP13. Two of these entries, both corresponding to 334FVDGVPFVV342 of NSP12 (predicted to bind HLA-A*02:01 and HLA-A*02:06), showed 100% homology with OC43 and HKU1. The envelope and nucleocapsid proteins did not have regions with ≥ 75% homology with the other HCoVs, and most of the cross-reactive epitopes were located in NSP12 and NSP13. Alignments of SARS-CoV2 NSP12 and NSP13 showing the positions of these epitopes are given in supplementary Figs. 1 and 2.
Table 3
Predicted CD8+ epitopes of the SARS-CoV2 virus, which show ≥ 75% homology with OC43, HKU1, and NL63
9mer peptides with ≥ 75% homology with OC43, HKU1, and NL63 |
Protein | HLA Allele | Sequence | Score | OC43 % identity with SARS-CoV2 | HKU1 % identity with SARS-CoV2 | NL63 % identity with SARS-CoV2 |
Spike | A*24:02 | 1208QYIKWPWYI1216 | 0.95 | 77 | 77 | 77 |
Membrane | A*24:02 | 95YFIASFRLF103 | 0.91 | 77 | 77 | 77 |
NSP5 | A*02:06 | 159FVYMHQLEL167 | 0.76 | 100 | 88 | 66 |
NSP5 | B*35:01 | 95NPKTPKYKF103 | 0.47 | 77 | 77 | 55 |
NSP10 | B*35:03 | 36QPITNCVKM44 | 0.74 | 88 | 88 | 66 |
NSP12 | A*02:01 | 123TMADLVYAL131 | 0.93 | 77 | 77 | 77 |
NSP12 | A*02:06 | 334FVDGVPFVV342 | 0.93 | 100 | 100 | 88 |
NSP12 | A*02:01 | 854LMIERFVSL862 | 0.91 | 88 | 88 | 55 |
NSP12 | A*02:01 | 334FVDGVPFVV342 | 0.90 | 100 | 100 | 77 |
NSP12 | B*44:03 | 608VENPHLMGW616 | 0.99 | 77 | 77 | 88 |
NSP12 | B*35:01 | 337VPFVVSTGY345 | 0.97 | 88 | 88 | 55 |
NSP13 | A*02:01 | 239TLVPQEHYV247 | 0.95 | 77 | 77 | 44 |
NSP13 | A*24:02 | 397VYIGDPAQL405 | 0.91 | 100 | 100 | 77 |
NSP13 | A*02:06 | 239TLVPQEHYV247 | 0.91 | 77 | 77 | 44 |
NSP13 | B*35:01 | 291FAIGLALYY299 | 0.96 | 77 | 77 | 66 |
NSP13 | B*44:03 | 141TEETFKLSY149 | 0.96 | 77 | 77 | 55 |
NSP13 | B*40:02 | 161RELHLSWEV169 | 0.91 | 77 | 77 | 66 |
NSP14 | A*02:01 | 176NLSDRVVFV184 | 0.93 | 77 | 66 | 77 |
NSP14 | B*35:01 | 509WVYKQFDTY517 | 0.65 | 77 | 77 | 55 |
NSP15 | A*02:01 | 297LLLDDFVEI305 | 0.95 | 66 | 88 | 88 |
NSP15 | B*35:03 | 49LPVNVAFEL57 | 0.96 | 77 | 66 | 88 |
NSP16 | A*02:01 | 53YLNTLTLAV61 | 0.89 | 88 | 88 | 55 |
NSP16 | A*24:02 | 46KYTQLCQYL54 | 0.82 | 100 | 100 | 100 |
NSP16 | A*33:03 | 247MSKFPLKLR255 | 0.78 | 77 | 77 | 44 |
10mer peptides with ≥ 75% homology with OC43, HKU1, and NL63 |
Protein | HLA Allele | Sequence | Score | OC43 % identity with SARS-CoV2 | HKU1 % identity with SARS-CoV2 | NL63 % identity with SARS-CoV2 |
NSP12 | A*02:06 | 332KIFVDGVPFV341 | 0.88 | 90 | 90 | 70 |
NSP12 | A*02:01 | 332KIFVDGVPFV341 | 0.88 | 90 | 90 | 70 |
NSP13 | A*24:02 | 216TYKLNVGDYF225 | 0.77 | 80 | 70 | 50 |
NSP13 | A*02:01 | 40KLVLSVNPYV49 | 0.58 | 80 | 80 | 50 |
NSP13 | A*33:03 | 381NYDLSVVNAR390 | 0.54 | 80 | 80 | 80 |
NSP13 | A*33:03 | 551ETAHSCNVNR560 | 0.52 | 90 | 90 | 80 |
NSP13 | B*40:01 | 446AEIVDTVSAL455 | 0.96 | 90 | 80 | 80 |
NSP13 | B*40:02 | 446AEIVDTVSAL455 | 0.87 | 90 | 80 | 80 |
NSP14 | A*33:03 | 516TYNLWNTFTR525 | 0.58 | 80 | 80 | 50 |
NSP14 | A*24:02 | 510VYKQFDTYNL519 | 0.55 | 80 | 80 | 60 |
NSP15 | A*33:03 | 52NVAFELWAKR61 | 0.66 | 80 | 60 | 90 |
NSP15 | A*02:06 | 243SQLGGLHLLI252 | 0.52 | 70 | 70 | 80 |
NSP16 | A*24:02 | 241SYSLFDMSKF250 | 0.75 | 80 | 80 | 50 |
NSP16 | A*33:03 | 246DMSKFPLKLR255 | 0.59 | 80 | 70 | 50 |
NSP16 | A*24:02 | 221GYVMHANYIF230 | 0.55 | 80 | 80 | 70 |
NSP16 | A*02:01 | 243SLFDMSKFPL252 | 0.47 | 90 | 80 | 50 |
Of the 10mer peptides analyzed, 17 within SARS-CoV2 were identified as having ≥ 75% homology with ≥ 2 HCoVs (Table 3). Only one of these, 446AEIVDTVSAL455 from NSP13, gave a score of ≥ 0.90. Of these 17 10mer CD8+ T cell epitopes, 14 were predicted to be restricted through HLA-A and 3 through HLA-B alleles. As with the 9mer peptides, 6 of the 10mer peptides that were highly homologous with OC43, HKU1 and NL63 were found within NSP12 and NSP13.
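The selection rule behind Table 3 (≥ 75% identity with at least two of OC43, HKU1 and NL63) and its counterpart for Table 4 (≤ 25% identity with at least two) can be written as a small function. A minimal sketch: the identity triples are copied from Tables 2 to 4, the label strings are hypothetical, and treating both cut-offs as inclusive follows the section headings.

```python
HCOVS = ("OC43", "HKU1", "NL63")


def classify(identities: dict[str, int], high: int = 75, low: int = 25,
             min_viruses: int = 2) -> str:
    """Label a SARS-CoV2 peptide from its percent identity with each HCoV."""
    n_high = sum(identities[v] >= high for v in HCOVS)
    n_low = sum(identities[v] <= low for v in HCOVS)
    if n_high >= min_viruses:
        return "cross-reactive"      # candidate for Table 3
    if n_low >= min_viruses:
        return "SARS-CoV2-specific"  # candidate for Table 4
    return "intermediate"            # meets neither rule


examples = {
    "FVYMHQLEL": {"OC43": 100, "HKU1": 88, "NL63": 66},  # Table 3
    "VASQSIIAY": {"OC43": 0, "HKU1": 0, "NL63": 0},      # Table 4
    "NVLEGSVAY": {"OC43": 11, "HKU1": 11, "NL63": 44},   # Table 4
    "GEYSHVVAF": {"OC43": 44, "HKU1": 44, "NL63": 33},   # Table 2 only
}
for peptide, ids in examples.items():
    print(peptide, classify(ids))
```

With three viruses and a "two or more" rule, a peptide cannot satisfy both thresholds at once, so the order of the two checks does not affect the result.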
Identification of CD8+ T cell epitopes of SARS-CoV2 that show ≤ 25% homology with OC43, HKU1 and NL63
After identifying highly cross-reactive CD8+ T cell epitopes within SARS-CoV2, we identified regions that were specific to the virus and did not cross-react with the other HCoVs, and which are therefore likely to contain SARS-CoV2-specific CD8+ T cell epitopes. 9mer peptides representing different regions of SARS-CoV2 with ≤ 25% homology with two or more HCoVs were analysed, and 39 such potential CD8+ T cell epitopes were identified (Table 4). Of these 39 9mer peptides, 25 gave a binding score of ≥ 0.90. Overall, 19 of the 39 were predicted to be restricted through HLA-A alleles and 20 through HLA-B alleles. A region within the spike protein (686VASQSIIAY694) had no homology with the other HCoVs but had a high binding score (> 0.95) for HLA-B*35:01, and two 9mer peptides within the nucleocapsid (325TPSGTWLTY333 and 322MEVTPSGTW330) had ≤ 22% homology and were predicted to be restricted through HLA-B*35:01 and HLA-B*44:03, respectively. Three further CD8+ T cell epitopes within NSP2, NSP3 and NSP6 had high binding scores and 0% homology with all three HCoVs.
Table 4
Predicted CD8+ epitopes of the SARS-CoV2 virus, which show ≤ 25% homology with OC43, HKU1, and NL63
9mer peptides with ≤ 25% homology with OC43, HKU1, and NL63 |
Protein | HLA allele | Sequence | Score | OC43 % identity with SARS-CoV2 | HKU1 % identity with SARS-CoV2 | NL63 % identity with SARS-CoV2 |
Spike | A*24:02 | 635VYSTGSNVF643 | 0.93 | 22 | 22 | 22 |
Spike | A*02:01 | 109TLDSKTQSL117 | 0.91 | 22 | 22 | 33 |
Spike | B*35:01 | 83LPFNDGVYF91 | 0.98 | 22 | 33 | 11 |
Spike | B*35:01 | 686VASQSIIAY694 | 0.98 | 0 | 0 | 0 |
Spike | B*40:01 | 1015AEIRASANL1023 | 0.98 | 22 | 22 | 44 |
Membrane | A*33:03 | 138LVIGAVILR146 | 0.72 | 22 | 22 | 22 |
Membrane | B*40:01 | 136SELVIGAVI144 | 0.73 | 11 | 11 | 11 |
Envelope | A*33:03 | 61RVKNLNSSR69 | 0.61 | 11 | 11 | 0 |
Envelope | A*33:03 | 30TLAILTALR38 | 0.60 | 22 | 22 | 11 |
Envelope | B*40:01 | 6SEETGTLIV14 | 0.55 | 11 | 22 | 11 |
Envelope | B*35:03 | 4FVSEETGTL12 | 0.43 | 11 | 11 | 11 |
Nucleocapsid | B*35:01 | 325TPSGTWLTY333 | 0.99 | 22 | 22 | 22 |
Nucleocapsid | B*44:03 | 322MEVTPSGTW330 | 0.96 | 11 | 11 | 0 |
NSP1 | A*24:02 | 135SYGADLKSF143 | 0.89 | 0 | 0 | 0 |
NSP1 | A*02:01 | 84VMVELVAEL92 | 0.85 | 22 | 22 | 0 |
NSP1 | B*40:02 | 56VEKGVLPQL64 | 0.98 | 22 | 22 | 11 |
NSP1 | B*35:01 | 110HVGEIPVAY118 | 0.95 | 22 | 11 | 11 |
NSP1 | B*35:01 | 89VAELEGIQY97 | 0.63 | 11 | 0 | 11 |
NSP2 | A*24:02 | 497TFFKLVNKF505 | 0.90 | 33 | 22 | 22 |
NSP2 | A*02:06 | 420YITGGVVQL428 | 0.86 | 22 | 22 | 22 |
NSP2 | A*02:06 | 439TVYEKLKPV447 | 0.83 | 11 | 11 | 11 |
NSP2 | B*40:01 | 195SEVGPEHSL203 | 0.99 | 11 | 11 | 0 |
NSP2 | B*40:01 | 562GETLPTEVL570 | 0.99 | 0 | 0 | 0 |
NSP2 | B*44:03 | 52REHEHEIAW60 | 0.98 | 22 | 22 | 22 |
NSP3 | A*24:02 | 726YYTSNPTTF734 | 0.99 | 22 | 22 | 22 |
NSP3 | A*24:02 | 1349NYMPYFFTL1357 | 0.98 | 33 | 22 | 11 |
NSP3 | A*24:02 | 816YYHTTDPSF824 | 0.96 | 11 | 11 | 0 |
NSP3 | B*44:03 | 120EEFEPSTQY128 | 0.99 | 11 | 11 | 0 |
NSP3 | B*44:03 | 546QEILGTVSW554 | 0.99 | 22 | 22 | 0 |
NSP3 | B*40:01 | 1799AELAKNVSL1807 | 0.98 | 0 | 0 | 0 |
NSP4 | A*24:02 | 351FYLTNDVSF359 | 0.92 | 22 | 22 | 22 |
NSP4 | B*35:01 | 174NVLEGSVAY182 | 0.97 | 11 | 11 | 44 |
NSP6 | A*02:01 | 141TLMNVLTLV149 | 0.92 | 0 | 0 | 0 |
NSP6 | A*24:02 | 84VYMPASWVM92 | 0.91 | 0 | 0 | 11 |
NSP6 | A*24:02 | 115MYASAVVLL123 | 0.90 | 22 | 22 | 22 |
NSP6 | B*35:01 | 156NALDQAISM164 | 0.88 | 22 | 22 | 11 |
NSP6 | B*35:01 | 167LIISVTSNY175 | 0.56 | 11 | 11 | 22 |
NSP14 | A*02:01 | 321LLADKFPVL329 | 0.94 | 11 | 11 | 33 |
NSP15 | A*02:06 | 181KVDGVVQQL189 | 0.92 | 22 | 22 | 11 |
10mer peptides with ≤ 25% homology with OC43, HKU1, and NL63 |
Protein | HLA Allele | Sequence | Score | OC43 % identity with SARS-CoV2 | HKU1 % identity with SARS-CoV2 | NL63 % identity with SARS-CoV2 |
Spike | A*24:02 | 368LYNSASFSTF377 | 0.89 | 20 | 20 | 30 |
Spike | A*24:02 | 788IYKTPPIKDF797 | 0.88 | 10 | 20 | 10 |
Spike | B*44:02 | 95TEKSNIIRGW104 | 0.95 | 10 | 0 | 10 |
Spike | B*35:01 | 229LPIGINITRF238 | 0.82 | 10 | 20 | 20 |
Membrane | A*33:03 | 137ELVIGAVILR146 | 0.65 | 20 | 20 | 20 |
Membrane | A*33:03 | 177SYYKLGASQR186 | 0.55 | 40 | 20 | 20 |
Membrane | B*40:01 | 136SELVIGAVIL145 | 0.76 | 10 | 10 | 20 |
Envelope | A*33:03 | 60SRVKNLNSSR69 | 0.48 | 10 | 30 | 0 |
Envelope | A*33:03 | 29VTLAILTALR38 | 0.41 | 30 | 20 | 0 |
Nucleocapsid | A*33:03 | 140NTPKDHIGTR149 | 0.60 | 20 | 10 | 20 |
Nucleocapsid | A*02:01 | 398ADLDDFSKQL407 | 0.44 | 10 | 30 | 0 |
Nucleocapsid | B*44:03 | 321GMEVTPSGTW330 | 0.87 | 0 | 0 | 0 |
Nucleocapsid | B*35:01 | 324VTPSGTWLTY333 | 0.68 | 20 | 20 | 20 |
Nucleocapsid | B*40:01 | 322MEVTPSGTWL331 | 0.67 | 10 | 10 | 0 |
NSP1 | A*33:03 | 162NTKHSSGVTR171 | 0.74 | 0 | 0 | 0 |
NSP1 | A*33:03 | 68YVFIKRSDAR77 | 0.49 | 30 | 20 | 10 |
NSP1 | A*02:06 | 14VQLSLPVLQV23 | 0.45 | 20 | 10 | 10 |
NSP1 | A*33:03 | 15QLSLPVLQVR24 | 0.40 | 10 | 10 | 10 |
NSP1 | B*35:01 | 61LPQLEQPYVF70 | 0.81 | 20 | 20 | 30 |
NSP1 | B*40:02 | 112GEIPVAYRKV121 | 0.54 | 30 | 20 | 20 |
NSP1 | B*40:02 | 9NEKTHVQLSL18 | 0.49 | 20 | 20 | 20 |
NSP1 | B*35:03 | 61LPQLEQPYVF70 | 0.43 | 20 | 20 | 30 |
NSP1 | B*35:03 | 18LPVLQVRDVL27 | 0.41 | 20 | 20 | 10 |
NSP2 | A*02:01 | 389ILDGISQYSL398 | 0.65 | 0 | 0 | 10 |
NSP2 | A*33:03 | 529FVTHSKGLYR538 | 0.61 | 0 | 0 | 30 |
NSP2 | A*02:01 | 288KLNEEIAIIL297 | 0.59 | 10 | 10 | 30 |
NSP2 | A*24:02 | 1AYTRYVDNNF10 | 0.59 | 30 | 20 | 20 |
NSP2 | B*44:03 | 489KEIKESVQTF498 | 0.95 | 0 | 0 | 10 |
NSP2 | B*44:03 | 452EEKFKEGVEF461 | 0.89 | 10 | 10 | 30 |
NSP2 | B*40:01 | 344GEQKSILSPL353 | 0.85 | 10 | 10 | 20 |
NSP3 | A*24:02 | 16QGYKSVNITF25 | 0.79 | 10 | 20 | 20 |
NSP3 | A*24:02 | 1040EYKGPITDVF1049 | 0.75 | 20 | 20 | 0 |
NSP3 | B*44:03 | 120EEEFEPSTQY129 | 0.98 | 20 | 10 | 0 |
NSP3 | B*35:01 | 502VPTDNYITTY511 | 0.94 | 0 | 0 | 0 |
NSP4 | A*02:06 | 101FVVPGLPGTI110 | 0.65 | 20 | 10 | 50 |
NSP4 | B*40:01 | 97REVGFVVPGL106 | 0.73 | 20 | 10 | 50 |
NSP5 | A*24:02 | 125VYQCAMRPNF134 | 0.50 | 20 | 20 | 40 |
NSP6 | A*33:03 | 84VYMPASWVMR93 | 0.53 | 0 | 0 | 10 |
NSP6 | B*35:01 | 43LPFAMGIIAM52 | 0.71 | 0 | 0 | 10 |
NSP7 | B*44:03 | 73EEMLDNRATL82 | 0.58 | 10 | 10 | 40 |
NSP8 | A*02:01 | 151SALWEIQQVV160 | 0.68 | 20 | 30 | 20 |
NSP14 | B*35:01 | 42IPGIPKDMTY51 | 0.92 | 20 | 20 | 30 |
NSP15 | A*33:03 | 215ELAMDEFIER224 | 0.57 | 40 | 20 | 20 |
NSP15 | B*44:03 | 169GEAVKTQFNY178 | 0.74 | 10 | 10 | 10 |
We identified 44 10mer CD8+ T cell epitopes with ≤ 25% homology with two or more HCoVs (Table 4), 5 of which gave a score of ≥ 0.90. Of the 44, 22 were predicted to be restricted through HLA-A and 22 through HLA-B alleles. Again, the peptides with the highest binding scores and the lowest percentage identity were predicted to be restricted through HLA-B*35:01 and HLA-B*44:03. The strongest binders specific to SARS-CoV2 were identified within the spike protein (95TEKSNIIRGW104), NSP2 (489KEIKESVQTF498) and NSP3 (120EEEFEPSTQY129 and 502VPTDNYITTY511).
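Because the peptides are short, the percentage cut-offs used for Tables 3 and 4 translate into small residue counts. Assuming identities are computed over ungapped 9- or 10-residue segments, the sketch below finds the minimum number of identical residues needed to reach ≥ 75% and the maximum allowed for ≤ 25%. The function name is hypothetical.

```python
from fractions import Fraction


def residue_thresholds(length: int, high: int = 75, low: int = 25) -> tuple[int, int]:
    """Return (min identical residues for >= high %, max identical residues for <= low %).

    Uses exact fractions; truncating to an integer percentage gives the same
    counts for 8-, 9- and 10-residue peptides.
    """
    pct = [Fraction(100 * m, length) for m in range(length + 1)]
    min_high = next(m for m, p in enumerate(pct) if p >= high)
    max_low = max(m for m, p in enumerate(pct) if p <= low)
    return min_high, max_low


for k in (9, 10):
    print(k, residue_thresholds(k))
```

Under this assumption, a 9mer counts as cross-reactive with a given HCoV when at least 7 of its 9 residues are identical and as SARS-CoV2-specific when at most 2 are; for 10mers the corresponding counts are 8 and 2.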
Conservation analysis of the candidate CD8+ T cell epitopes with binding scores of ≥ 0.90
We then investigated whether the 18 candidate CD8+ T cell epitopes with > 75% identity with other HCoVs and the 31 SARS-CoV2-specific candidates (< 25% identity) were conserved among SARS-CoV2 sequences. These candidate epitopes were highly conserved (supplementary tables 2 and 3), and these regions were also highly conserved in the new SARS-CoV2 variants (supplementary Figs. 3 and 4).
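Checking whether an epitope is conserved in a variant amounts to comparing the epitope with the residues at the same positions of the variant protein. A minimal sketch, assuming the variant protein is already aligned without indels to the reference numbering used in the tables; the variant sequence and the function name are hypothetical, and a real analysis would also need to handle insertions and deletions.

```python
def epitope_mismatches(epitope: str, start: int,
                       variant_protein: str) -> list[tuple[int, str, str]]:
    """Return (position, reference residue, variant residue) for each difference.

    `start` is the 1-based position of the epitope's first residue.
    Assumes the variant protein shares the reference numbering (no indels).
    """
    segment = variant_protein[start - 1:start - 1 + len(epitope)]
    if len(segment) != len(epitope):
        raise ValueError("epitope extends past the end of the variant protein")
    return [(start + i, ref, alt)
            for i, (ref, alt) in enumerate(zip(epitope, segment)) if ref != alt]


toy_variant = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical protein, for illustration only
print(epitope_mismatches("AYIAKQRQI", 4, toy_variant))  # [] means fully conserved
print(epitope_mismatches("AYIAKQRQL", 4, toy_variant))  # [(12, 'L', 'I')]
```

An empty list marks the epitope as fully conserved in that variant; otherwise the returned positions can be checked against known variant-defining substitutions.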
Similarity of candidate peptides with published SARS-CoV2 CD8+ T cell epitopes
Several CD8+ T cell epitopes restricted through different HLA-A and HLA-B alleles have been published [10, 14–17]. We therefore examined whether any of the candidate CD8+ T cell epitopes had already been identified in patients naturally infected with SARS-CoV2. Of the 31 highly conserved candidate epitopes specific to SARS-CoV2 (< 25% homology with other HCoVs), 20 had been identified in infected individuals (supplementary table 4). Although our prediction, which used the dominant HLA alleles in Sri Lanka, assigned some of these epitopes to HLA-B*35:01 and HLA-B*44:03, some of them were reported to be restricted through HLA-A*02:01, HLA-A*11:01 and HLA-A*03:01.
Of the 18 candidate cross-reactive T cell epitopes (> 75% homology with the other HCoVs), 7 had also been identified in naturally infected individuals. For 4 of these 7 epitopes, the HLA allele we predicted was similar to the HLA restriction identified following natural infection (supplementary table 5).
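Comparing the candidate list with published epitopes, and the predicted HLA restriction with the reported one, is essentially a set comparison. A minimal sketch with hypothetical records: the peptide IDs and alleles below stand in for the cited studies and the supplementary tables, which are not reproduced here. Agreement is checked at the allele-group level (for example, A*02:06 and A*02:01 both count as A*02), which is only one possible reading of "similar"; exact two-field matching would be the stricter alternative.

```python
def allele_group(allele: str) -> str:
    """Collapse a two-field allele (A*02:01) to its allele group (A*02)."""
    return allele.split(":")[0]


def compare_with_published(candidates: dict[str, str],
                           published: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return peptides present in both lists, and those whose predicted and
    reported restrictions agree at the allele-group level (assumed criterion)."""
    shared = sorted(candidates.keys() & published.keys())
    agreeing = [p for p in shared
                if allele_group(candidates[p]) == allele_group(published[p])]
    return shared, agreeing


# Hypothetical peptide IDs and alleles, not taken from the cited studies.
predicted = {"pep1": "B*35:01", "pep2": "A*02:06", "pep3": "B*44:03", "pep4": "A*24:02"}
reported = {"pep1": "B*35:01", "pep2": "A*02:01", "pep3": "A*11:01", "pep9": "A*03:01"}

shared, agreeing = compare_with_published(predicted, reported)
print(f"{len(shared)}/{len(predicted)} candidates previously reported: {shared}")
print(f"restriction agrees for {len(agreeing)}/{len(shared)}: {agreeing}")
```

Here three of the four candidates appear in the reported list, and two of those three agree at the allele-group level; pep3 illustrates the kind of disagreement described above, where a predicted HLA-B restriction is reported as HLA-A.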