To identify the best Spike protein epitope-HLA matches, the sequence was screened for predicted epitopes across the most frequent HLA-A and HLA-B alleles. The ten most immunogenic peptides with high binding affinity for their restricting HLA alleles are shown in table 3.
Although the most immunogenic peptide in this list is GTHWFVTQR, the highest-affinity match was between the peptide FIAGLIAIV and HLA-A*02:03. Of note, because we analyzed the most frequent class I A and B alleles, this analysis reveals not only epitopes that can be used for vaccine development but also the HLA alleles that best present epitopes of this particular protein.
The best epitopes and HLA class II alleles were also predicted, as shown in table 4. The prediction tool for HLA class II uses a core of nine amino acids to predict how efficiently a peptide binds the pocket of the molecule, so the same core can lie in the middle of several different 15-aa peptides. Of note, across this whole set of peptides only seven HLA molecules showed a high binding affinity: HLA-DPA1*01:03/DPB1*02:01, HLA-DPA1*02:01/DPB1*01:01, HLA-DPA1*03:01/DPB1*04:02, HLA-DQA1*05:01/DQB1*03:01, HLA-DRB1*01:01, HLA-DRB1*07:01, and HLA-DRB1*09:01.
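The relationship between a 15-aa peptide and its candidate nine-residue cores can be sketched as follows. This is only an illustration of the sliding-window idea, not the prediction tool itself: the helper `nine_mer_cores` is a hypothetical name, and the 15-mer is an assumed window assembled from the overlapping cores named in this section rather than an entry taken from table 4.

```python
def nine_mer_cores(peptide, core_len=9):
    """Return (offset, core) pairs for every window of core_len residues."""
    return [
        (start, peptide[start:start + core_len])
        for start in range(len(peptide) - core_len + 1)
    ]


# Assumed 15-mer built from overlapping cores named in the text (illustrative only).
window = "VVVLSFELLHAPATV"
cores = nine_mer_cores(window)

# Class II cores reported in this section that fall inside the window.
reported = ["VVVLSFELL", "VLSFELLHA", "FELLHAPAT"]
offsets = {core: start for start, core in cores if core in reported}

print(len(cores), "candidate cores in a 15-mer")
for core, start in offsets.items():
    print(f"{core} starts at offset {start}")
```

A 15-mer contains seven overlapping 9-mer windows, which is why several distinct cores can be reported from what is essentially the same stretch of sequence.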
To locate the peptides within the SARS-CoV-2 spike glycoprotein, the corresponding 3D model was obtained. The predicted epitopes (tables 2 and 3) were then mapped onto the protein structure with respect to its subunits and domains (figure 1). Notably, the HLA class I peptides WTAGAAAYY, SANNCTFEY, and YLQPRTFLL (7, 8, and 9) are located in the A domain, which is highly conserved among other coronavirus species,24 suggesting that these could also be epitopes for other viruses. In contrast, the class II epitopes FELLHAPAT, VVVLSFELL, FLVLLPLVS, VLSFELLHA, and FTISVTTEI (a, b, c, d, and h) and the HLA class I epitope EVFNATRFA (4) are preferentially found in the B domain.
The most probable binding modes between HLA-A*02:01 (table 3) and two of the most immunogenic SARS-CoV-2 epitopes (peptides FIAGLIAIV and YLQPRTFLL) were predicted. As a control, they were compared with a well-known weak-binding epitope for HLA-A*02:01, A2-ALW (ALWGPDDPAA).25
Before the docking analysis, these epitopes were re-analyzed with the NetMHC 4·0 server to predict the properties of each peptide-MHC class I complex.34 The YLQPRTFLL epitope showed the strongest affinity and the best binding level (% rank) for HLA-A*02:01, with values of 5·36 nM and 0·04%, respectively. The FIAGLIAIV epitope had a lower affinity and binding level (10·29 nM and 0·12%, respectively), and the control peptide A2-ALW was weaker still (117·15 nM and 1·20%).
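To make the comparison concrete, the three predictions reported above can be ranked and labelled with a short sketch. The affinity and percentage values are those given in the text; the 0·5% and 2% cut-offs for strong and weak binders are the thresholds commonly used with NetMHC output and are stated here as an assumption, since the text does not say which thresholds were applied. The function name `classify` is hypothetical.

```python
# Assumed cut-offs on the percentage binding level (not stated in the text).
STRONG_RANK = 0.5
WEAK_RANK = 2.0

# peptide -> (predicted affinity in nM, percentage binding level), from the text
predictions = {
    "YLQPRTFLL": (5.36, 0.04),
    "FIAGLIAIV": (10.29, 0.12),
    "ALWGPDDPAA": (117.15, 1.20),  # A2-ALW control
}


def classify(rank_percent):
    """Label a peptide from its percentage binding level (lower is stronger)."""
    if rank_percent <= STRONG_RANK:
        return "strong"
    if rank_percent <= WEAK_RANK:
        return "weak"
    return "non-binder"


# Sort by affinity: a lower nM value means tighter predicted binding.
ranked = sorted(predictions.items(), key=lambda item: item[1][0])
for peptide, (affinity_nm, rank_percent) in ranked:
    print(f"{peptide}: {affinity_nm} nM, {rank_percent}% -> {classify(rank_percent)}")
```

Under these assumed cut-offs both SARS-CoV-2 epitopes fall in the strong-binder class and the control in the weak-binder class, in line with its description as a weak binder.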
Despite the difference in affinity between the two SARS-CoV-2 epitopes, both fit the groove of HLA-A*02:01 while maintaining their unfolded conformations. A contact map of the interface between the amino acids of HLA-A*02:01 and the epitopes showed that each amino acid of the FIAGLIAIV peptide interacted with at least one amino acid of the HLA-A*02:01 groove (figure 2). These data complement the previous analysis of the epitopes and indicate that all of them have a high chance of being successfully presented by their respective restricting HLA allele in a pMHC-TCR complex and of triggering an immunological response against SARS-CoV-2. Because no structures free of a pre-loaded peptide are available for the other HLA class I and II alleles, those alleles were not submitted to docking analysis.
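The idea behind such a contact map can be sketched as a distance check between peptide and groove residues. This is a minimal illustration under stated assumptions, not the procedure used for figure 2: the 4 Å cut-off, the function `contact_map`, the residue labels, and the coordinates are all hypothetical placeholders.

```python
from math import dist

CUTOFF_ANGSTROM = 4.0  # assumed contact distance; the text does not state one


def contact_map(peptide_atoms, groove_atoms, cutoff=CUTOFF_ANGSTROM):
    """Map each peptide residue to the groove residues within the cut-off.

    Both arguments are dicts of residue label -> list of (x, y, z) atom
    coordinates. Two residues are in contact if any atom pair is close enough.
    """
    contacts = {}
    for pep_res, pep_coords in peptide_atoms.items():
        contacts[pep_res] = [
            groove_res
            for groove_res, groove_coords in groove_atoms.items()
            if any(dist(a, b) <= cutoff for a in pep_coords for b in groove_coords)
        ]
    return contacts


# Toy coordinates (hypothetical, one atom per residue for brevity).
peptide = {"pep_1": [(0.0, 0.0, 0.0)], "pep_2": [(3.8, 0.0, 0.0)]}
groove = {
    "groove_A": [(0.0, 3.0, 0.0)],
    "groove_B": [(10.0, 10.0, 10.0)],
    "groove_C": [(3.8, 2.5, 0.0)],
}

contacts = contact_map(peptide, groove)
# True when every peptide residue touches at least one groove residue.
every_residue_contacted = all(contacts.values())
print(contacts, every_residue_contacted)
```

With real data, the coordinate dictionaries would be filled from the docked structure file, and the final check corresponds to the statement that every peptide residue contacts at least one groove residue.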
HLA allele analysis and correlation with the country cumulative incidence
The population frequencies of the HLA alleles identified in our prediction analysis were examined in relation to the cumulative incidence of COVID-19 in the countries most affected by March 31st, 2020. Table 5 shows the cumulative incidence per one million inhabitants on days 36, 57, 60, and 65 after the report of the first case in each country. Austria had the highest cumulative incidence on days 34 and 36 (976 and 1130 per one million inhabitants, respectively).
There was no association between any HLA allele frequency and the cumulative incidence on days 34 and 36 (data not shown). On day 57, HLA-A*02:03 frequency was associated with a lower cumulative incidence per one million inhabitants (R = -0·75, p value=0·08), although this did not reach the conventional 0·05 significance threshold, and the association remained the same on day 60.
In addition, the frequency of HLA-A*31:01 was significantly associated with a lower cumulative incidence on day 57 (R = -0·69, p value=0·05), and on day 60 the association was stronger (R = -0·85, p value=0·01). Strikingly, HLA-A*30:02 frequency was associated with a higher cumulative incidence on day 57 (R = 0·75, p value=0·05) and showed the same trend on day 60 (R = 0·71, p value=0·11), where it did not reach significance (figure 3).
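The kind of correlation reported here can be sketched as follows. The text does not state which correlation coefficient or significance test was used, so this example assumes a Pearson coefficient and substitutes an exact permutation test for the p value. The allele frequencies and incidences are made-up placeholders, not values from table 5, and the function names are hypothetical.

```python
from itertools import permutations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(x, y):
    """Two-sided exact permutation p value (feasible only for small n)."""
    observed = abs(pearson_r(x, y))
    hits = total = 0
    for shuffled in permutations(y):
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Placeholder data: allele frequency per country vs incidence per million.
allele_frequency = [0.01, 0.02, 0.03, 0.05, 0.08, 0.10]
cumulative_incidence = [900, 700, 750, 400, 300, 100]

r = pearson_r(allele_frequency, cumulative_incidence)
p = permutation_p_value(allele_frequency, cumulative_incidence)
print(f"R = {r:.2f}, permutation p = {p:.3f}")
```

With only a handful of countries, a large |R| can still carry a p value above 0·05, which is why the coefficient and the p value are best read together.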