Collection of ASFV protein sequences
The most lethal ASFV type, Georgia 2007/1 (GenBank: FR682468), was used as the target virus for screening. The protein sequences of Georgia 2007/1 were obtained from the NCBI, including CD2v (EP402R), p30 (CP204L), p54 (E183L), p72 (B646L), and pp220 (CP2475L). All sequences were cut into fragments using a sliding window. Finally, two datasets of ASFV proteins consisting of 9mer (Figure S1A) and 15mer (Figure S1B) fragments served as candidates for CTL and B cell epitopes, respectively.
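The fragmentation step can be illustrated with a minimal sketch. The step size of one residue and the toy sequence are assumptions made for illustration; the text states only that a sliding window was used.

```python
def sliding_fragments(sequence: str, k: int, step: int = 1) -> list[str]:
    """Return every k-mer of `sequence` taken with a sliding window.

    `step` defaults to one residue, which is an assumption; the source
    does not state the step size.
    """
    if k <= 0 or step <= 0:
        raise ValueError("k and step must be positive")
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, step)]


# Hypothetical toy sequence, not a real ASFV protein.
toy = "MKTAYIAKQRQISFVKSHFSRQ"
ctl_candidates = sliding_fragments(toy, 9)    # 9mer candidates for CTL epitopes
bce_candidates = sliding_fragments(toy, 15)   # 15mer candidates for B cell epitopes
```

A protein of length n yields n - k + 1 fragments of length k under this assumption, and a protein shorter than k yields none.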
Collection of validation datasets
To obtain the best parameter settings for PFAS, this work established two datasets from the IEDB, consisting of experimentally validated swine CTL and B cell epitopes. The CTL epitopes (n = 243) were annotated as Sus scrofa, infectious diseases, and Swine Leukocyte Antigen (SLA) class I. After removing duplicates and ambiguous sequences that appeared in both the positive and negative groups, the dataset contained 125 swine 9mer CTL fragments, including 37 epitopes and 88 non-epitopes.
Similarly, 1,700 validated swine B cell epitopes (BCEs) annotated as Sus scrofa and infectious disease were retrieved. Because IgG production is part of the secondary humoral immune response to an antigen, we extracted 1,389 IgG epitopes. Among them, 15mers were the most common length, followed by 12mers. Therefore, we established two datasets: 1) 650 swine B cell 15mer fragments, including 116 positive and 534 negative fragments, and 2) 293 swine B cell 12mer fragments, including 35 positive and 258 negative fragments.
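The cleaning step for the CTL dataset can be sketched as follows. The record layout (peptide, label) and the function name are hypothetical; the sketch only shows the idea of collapsing duplicates and discarding peptides that carry both a positive and a negative annotation.

```python
def clean_dataset(records):
    """Split (peptide, is_epitope) records into positives and negatives.

    Duplicates are collapsed, and peptides annotated as both epitope and
    non-epitope are dropped as uncertain. The record layout is an assumption.
    """
    labels = {}
    for peptide, is_epitope in records:
        labels.setdefault(peptide, set()).add(bool(is_epitope))
    positives = sorted(p for p, seen in labels.items() if seen == {True})
    negatives = sorted(p for p, seen in labels.items() if seen == {False})
    return positives, negatives


# Hypothetical toy records, not IEDB data.
records = [
    ("AAAAAAAAA", True),
    ("AAAAAAAAA", True),    # duplicate, kept once
    ("CCCCCCCCC", False),
    ("DDDDDDDDD", True),
    ("DDDDDDDDD", False),   # conflicting labels, removed
]
positives, negatives = clean_dataset(records)
```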
Proposed method PFAS
Figure 2 shows a flowchart of the proposed method, PFAS. Five protein sequences from Georgia 2007/1 were obtained and cut into 9mer and 15mer fragments. The CTL epitope predictor estimates T cell activation of the 9mer fragments in four stages and averages the four stage scores to obtain a T cell immunogenicity score. Similarly, the BCE predictor estimates B cell activation of the 15mer fragments to obtain a B cell immunogenicity score. After both scores were normalized into the range [0, 1], each 9mer fragment was superimposed on the 15mer fragment that shares its central amino acid. The fragments were then extended, and conserved sequences were obtained. The Pareto front method was used to rank the conserved fragments, and the top-ranked fragments were considered promising epitopes.
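The superposition by central amino acid can be sketched as an index alignment. This is an illustrative reading of the step, assuming fragments are listed in order of their start position with a step of one residue; the function names are hypothetical.

```python
def scores_by_centre(scores, k):
    """Map each fragment score to the protein index of its central residue.

    Assumes scores[i] belongs to the k-mer starting at position i (0-based).
    """
    half = k // 2
    return {start + half: score for start, score in enumerate(scores)}


def superimpose(t_scores_9mer, b_scores_15mer):
    """Pair T and B cell scores for every position that centres both a 9mer and a 15mer."""
    t = scores_by_centre(t_scores_9mer, 9)
    b = scores_by_centre(b_scores_15mer, 15)
    return {pos: (t[pos], b[pos]) for pos in sorted(t.keys() & b.keys())}


# Hypothetical scores for a 20-residue protein: 12 9mers and 6 15mers.
paired = superimpose(list(range(12)), list(range(6)))
```

Under these assumptions, only positions 7 to n - 8 of a protein of length n receive both scores, because the 15mer centres cover a narrower range than the 9mer centres.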
Calculation of T and B cell scores
Effective CTL epitopes must pass through antigen processing and presentation, in which major histocompatibility complex (MHC) I molecules play a major role. First, pathogen proteins are degraded by the proteasome in the cytosol of productively infected cells. NetCTL builds on the NetChop method to predict the probability of proteasomal cleavage (Larsen et al. 2007). Second, peptides are transported to the endoplasmic reticulum (ER) by the transporter associated with antigen processing (TAP). To predict TAP transport efficiency, NetCTL and MHC I Processing in the IEDB use the stabilized matrix method, and TAPPred is based on a support vector machine (SVM) with 33 physical features of amino acids (Bhasin and Raghava 2004). Third, the peptide is loaded onto MHC I and carried to the cell surface in vesicles. NetMHCpan (Reynisson et al. 2020), MHC I Processing, and NetCTL are widely used ANN-based methods that predict MHC I binding affinity using the BLOSUM50 matrix. Finally, the epitope stimulates CTL activation and differentiation. MHC I Immunogenicity in the IEDB (Calis et al. 2013) predicts this step with an immunogenicity score model. In general, these four stages are considered equally important.
To identify promising T cell epitopes (TCEs), five web predictors were used: NetCTL (https://services.healthtech.dtu.dk/service.php?NetCTL-1.2), IEDB MHC I Processing (http://tools.iedb.org/processing/), TAPPred (https://webs.iiitd.edu.in/raghava/tappred/index.html), NetMHCpan (https://services.healthtech.dtu.dk/service.php?NetMHCpan-4.0), and IEDB MHC I Immunogenicity. NetCTL was used to predict proteasome processing, TAP transport efficiency, and MHC I binding affinity. To examine conserved epitope candidates that cover multiple MHC loci, we submitted the sequences to all 12 supertype models (A1, A2, A3, A24, A26, B7, B8, B27, B39, B44, B58, and B62) and combined them as an ensemble. Averaging the predictive values over the 12 models gave three estimated values: binding affinity, proteasome cleavage, and the TAP score. The IEDB MHC I Processing tool was used to estimate TAP transport efficiency and MHC I binding affinity. We used all 45 SLA I alleles (12 SLA1, 16 SLA2, 12 SLA3, and 5 SLA6) and set the peptide length to nine for each allele, which produced a file with the average TAP and MHC scores for all sequence fragments. PFAS used TAPPred, an SVM trained on validated sequences, to predict the peptide-TAP transporter binding affinity and obtain a prediction score. NetMHCpan was used to predict peptide-MHC I binding affinity. To obtain effective epitopes, we considered all 75 SLA alleles (23 SLA1, 26 SLA2, 21 SLA3, and 5 SLA6) and set the peptide length to nine for each allele. The binding affinity score of each fragment was the mean over all alleles. IEDB MHC I Immunogenicity was used to predict CTL immunogenicity, considering all CTL activation factors, and provided scores for all sequence fragments.
BCEs can induce the differentiation of naïve and memory B cells into plasma cells, a process that involves antigen processing, peptide-MHC II presentation, and cytokine promotion. For BCE prediction, LBtope (Singh et al. 2013), iBCE-EL (Manavalan et al. 2018), IgPred (Gupta et al. 2013), and ABCpred (Saha and Raghava 2006) are sequence-based predictors. LBtope is an SVM-based Weka classifier that uses sparse matrix and amino acid property profile features and was trained on 38,197 experimentally validated IEDB epitopes. iBCE-EL is based on ensemble learning using amino acid composition characteristics and proportions of 5,550 experimentally validated BCEs. IgPred is based on Weka classifiers with physicochemical property (PCP) features and uses 14,725 BCEs covering different types of specific epitopes. ABCpred is based on PCP features and a neural network trained on a balanced BCE database. Among the aforementioned predictors, LBtope uses the largest dataset with ensemble learning.
To estimate the B cell immunogenicity score of 15mer and 12mer fragments, five online predictors (LBtope_Variable, LBtope_Confirm, iBCE-EL, IgPred, and ABCpred) were utilized and validated. Epitope probabilities and IgG scores were determined using the iBCE-EL and IgPred prediction tools, respectively. LBtope accepts multiple peptides per submission, and two of its variable-length epitope models were used. The LBtope_Variable model was trained using 38,197 peptides. The LBtope_Confirm model was trained using 2,837 peptides, each of which was reported in at least two studies. By submitting multiple fragments, the probability of epitopes was obtained along with the physical property score. Because ABCpred accepts only even epitope lengths and continuous amino acid sequences as submissions, PFAS used only the 12mer dataset with ABCpred, with the threshold set to zero and the overlapping filter enabled, to obtain the predicted scores.
Immunogenicity prediction of T and B cell fragments
The CTL activation prediction has four important stages: proteasomal cleavage probability, TAP transport efficiency, MHC I binding affinity, and CTL immunogenicity. These predictions help identify potential TCE candidates. PFAS combined all the prediction values obtained from the online prediction tools in the four stages. The probability of proteasomal cleavage was estimated using NetCTL 1.2. The TAP transport efficiency score is the mean of the NetCTL 1.2, IEDB MHC I Processing, and TAPPred values. The peptide-MHC I binding affinity score is the mean of the NetCTL 1.2, IEDB MHC I Processing, and NetMHCpan predictive values. The CTL immunogenicity score is obtained using IEDB MHC I Immunogenicity. After combining and normalizing the scores of each stage using a combination of weights, PFAS compiled a four-stage score for the TCE prediction.
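The way the stage scores could be combined is sketched below. Several details are assumptions rather than statements from the text: every tool output is taken to be oriented so that a higher value is better, each tool is min-max rescaled before averaging within a stage, and the four stages receive equal weights by default. All function names and numbers are hypothetical.

```python
from statistics import mean


def min_max(values):
    """Rescale a list of numbers into [0, 1]; a constant list maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def stage_score(per_tool_scores):
    """Average several tools' per-fragment scores after rescaling each tool."""
    scaled = [min_max(scores) for scores in per_tool_scores]
    return [mean(column) for column in zip(*scaled)]


def t_cell_score(cleavage, tap_tools, mhc_tools, immunogenicity,
                 weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted combination of the four stage scores, rescaled into [0, 1].

    Equal weights are an assumption; pass other weights to explore alternatives.
    """
    stages = [
        stage_score([cleavage]),        # proteasomal cleavage (one tool)
        stage_score(tap_tools),         # TAP transport (three tools)
        stage_score(mhc_tools),         # MHC I binding (three tools)
        stage_score([immunogenicity]),  # CTL immunogenicity (one tool)
    ]
    combined = [sum(w * s for w, s in zip(weights, row)) for row in zip(*stages)]
    return min_max(combined)


# Hypothetical scores for three fragments.
scores = t_cell_score(
    cleavage=[0.1, 0.5, 0.9],
    tap_tools=[[0.2, 0.4, 0.8], [1.0, 2.0, 3.0], [-1.0, 0.0, 1.0]],
    mhc_tools=[[0.1, 0.3, 0.7], [10.0, 20.0, 30.0], [0.0, 0.5, 1.0]],
    immunogenicity=[-0.2, 0.0, 0.3],
)
```

Tools that report values where lower is better, such as IC50-style outputs, would need to be inverted before this step.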
For the BCE prediction, the best predictor was evaluated and used to obtain B cell immunogenicity scores. After compiling the results of the prediction values from the web tools, the output values were normalized into the range of [0, 1] and B cell immunogenicity scores were compiled.
Pareto rank of fragments
The Pareto front is the set of all efficient solutions to a bi-objective problem. In this study, a fragment Frag belongs to the Pareto front if no other fragment has both a larger T cell score and a larger B cell score than Frag. The T and B cell scores of all fragments, each represented by its central amino acid, were used as inputs to the Pareto front method to determine the Pareto rank of the fragments. The method iteratively removes the current Pareto front, and the Pareto rank of a fragment is the serial number of the front with which it was removed. For instance, the fragments belonging to the initial Pareto front have rank one. After this front is removed, the fragments belonging to the new Pareto front have rank two, and so on.
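The iterative front removal can be sketched as follows. The dominance test follows the definition given above (another fragment must be larger in both scores); the function name and the toy points are hypothetical, and this quadratic-time version is meant for illustration only.

```python
def pareto_ranks(points):
    """Return the Pareto rank of each (t_score, b_score) point; rank 1 is the first front.

    A point is on the current front if no remaining point has both a larger
    T cell score and a larger B cell score.
    """
    ranks = [0] * len(points)
    remaining = set(range(len(points)))
    rank = 0
    while remaining:
        rank += 1
        front = {
            i for i in remaining
            if not any(
                points[j][0] > points[i][0] and points[j][1] > points[i][1]
                for j in remaining if j != i
            )
        }
        for i in front:
            ranks[i] = rank
        remaining -= front   # peel off the current front and repeat
    return ranks


# Hypothetical normalized (T, B) scores for five fragments.
points = [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9), (0.4, 0.4), (0.1, 0.1)]
ranks = pareto_ranks(points)
```

With these toy points, the first three fragments form the initial front, the fourth is removed in the second round, and the fifth in the third.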
Promising epitopes of the multi-epitope vaccine
This work extended the fragments to a length of 16–20 amino acids and obtained epitope profiles with the average Pareto rank of the extended fragments. The average rank was defined as the sum of the Pareto ranks divided by the total number of fragments included in the extended fragment. Moreover, to select conserved epitopes, PFAS estimated protein variability using the Protein Variability Server (PVS) (Garcia-Boronat et al. 2008). PVS offers three variability measures: the Shannon entropy, the Simpson diversity index, and the Wu–Kabat variability coefficient. In this study, a position with a Shannon entropy greater than two was considered variable. Accordingly, PFAS removed the extended fragments that contained highly variable positions. Finally, in the 16mer to 20mer epitope profiles, PFAS ranked the conserved fragments according to their average Pareto rank, and the top-ranked promising epitopes were provided to the biological decision makers for in vitro validation.
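The final filtering and ranking can be sketched as below. This is an illustration under stated assumptions, not the PVS implementation: the Shannon entropy is computed per alignment column with a base-2 logarithm, Pareto ranks are indexed by the position of the central amino acid, and positions without a rank are marked with None. Function names and toy values are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(column):
    """Shannon entropy (base 2) of the residues in one alignment column."""
    total = len(column)
    counts = Counter(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rank_windows(rank_by_position, entropy_by_position, length, max_entropy=2.0):
    """Return (start, average Pareto rank) for conserved windows, best first.

    Windows containing any position with entropy above `max_entropy` are
    discarded; the average uses only positions that carry a Pareto rank.
    """
    results = []
    for start in range(len(rank_by_position) - length + 1):
        window = range(start, start + length)
        if any(entropy_by_position[p] > max_entropy for p in window):
            continue
        ranks = [rank_by_position[p] for p in window
                 if rank_by_position[p] is not None]
        if ranks:
            results.append((start, sum(ranks) / len(ranks)))
    return sorted(results, key=lambda item: item[1])


# Hypothetical toy inputs with a window length of 3 instead of 16 to 20.
toy_ranks = [1, 1, 2, 3, 3]
toy_entropy = [0.0, 0.0, 0.0, 0.0, 3.0]
best = rank_windows(toy_ranks, toy_entropy, length=3)
```

In this toy case the window starting at position 2 is dropped because it includes the variable last position, and the remaining two windows are ordered by their average rank.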