Protein antigen selection
First, the protein sequences were aligned and conserved regions were selected. The selected proteins were then compared in terms of antigenicity, allergenicity, and toxicity. Finally, five proteins that showed high antigenicity and were non-allergenic were chosen (Table 1).
Linear B-cell and T-cell epitope prediction
The IEDB and BepiPred 2.0 servers were used to predict linear B-cell and T-cell epitopes, and similar epitopes were selected (Table 2). Briefly, MHC I- and MHC II-restricted epitopes were predicted and ranked according to their IEDB scores (Table 3). Some of the B-cell epitopes were derived from T-cell epitopes to reduce the final length of the sequence. Finally, the epitopes with the highest antigenicity scores that lacked allergenicity and toxicity were selected to construct the peptide vaccine.
Peptide design and adjuvant addition
For both B-cell and T-cell responses, thirteen epitopes were selected and linked together using lysine-lysine (KK) linkers. To increase the immunogenicity of the designed vaccine, an adjuvant sequence was added to the peptide sequence. Various peptide adjuvants have been used in previous studies [38]. The amino acid sequence of the cholera toxin B subunit (CTXB), the non-toxic portion of cholera toxin, was added at the beginning (N-terminal end) of the peptide and connected to the epitopes by a rigid PAPAP linker (Fig. 2).
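The assembly order described above (adjuvant, PAPAP linker, then epitopes joined by KK linkers) can be illustrated with a minimal sketch. The function name and all sequences below are hypothetical placeholders; they are not the CTXB sequence or the epitopes selected in this study.

```python
def assemble_construct(adjuvant, epitopes, adjuvant_linker="PAPAP", epitope_linker="KK"):
    """Return adjuvant + adjuvant_linker + epitopes joined by epitope_linker."""
    if not epitopes:
        raise ValueError("at least one epitope is required")
    return adjuvant + adjuvant_linker + epitope_linker.join(epitopes)


# Placeholder inputs (hypothetical; NOT the study's adjuvant or epitopes).
demo_adjuvant = "MGSSG"
demo_epitopes = ["AAAYYY", "GGGSSS", "LLLVVV"]

construct = assemble_construct(demo_adjuvant, demo_epitopes)
print(construct)
```

With the real adjuvant and the thirteen selected epitopes as inputs, the same call would yield a sequence with the layout shown in Fig. 2, assuming the epitope order is supplied as in that figure.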
Physicochemical properties of constructed peptide
The physicochemical properties and amino acid composition of the constructed peptide were evaluated using the PepCalc and ProtParam servers. The results showed that the proposed vaccine was stable and water-soluble, with a molecular weight of 39.4 kDa. The calculated pI was 9.6, the net charge was 17.3, and the estimated half-lives of the protein in mammalian, yeast, and E. coli cells were 30, 20, and 10 hours, respectively (Fig. 3A). In addition, the stability of the protein sequence was analyzed by predicting its disordered regions with the IUPred2A server, and the results confirmed the stability of the designed peptide (Figs 3B and 3C). The IUPred2A server offers three types of analysis: IUPred2 long disorder, IUPred2 short disorder, and IUPred2 structured domains. In this study, the IUPred2 long disorder mode was selected. Disordered regions of the proposed vaccine were predicted from the graph generated by IUPred2A. According to the graph shown in Fig. 3C, the sequence of the proposed vaccine is unlikely to act as an anchor for binding to the structures registered in the IUPred2A server. In this graph, residues with scores above 0.5 are considered disordered, and residues with scores below 0.5 are considered ordered.
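The 0.5 threshold used to read the disorder graph can be expressed as a short sketch. The per-residue scores below are invented for illustration and are not IUPred2A output for the proposed vaccine; treating a score of exactly 0.5 as ordered is an assumption made here, since the text only defines values above and below the threshold.

```python
def classify_disorder(scores, threshold=0.5):
    """Label each residue score as 'disordered' (above threshold) or 'ordered'."""
    return ["disordered" if s > threshold else "ordered" for s in scores]


def disordered_fraction(scores, threshold=0.5):
    """Fraction of residues whose score lies above the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(s > threshold for s in scores) / len(scores)


# Hypothetical per-residue scores (illustration only).
demo_scores = [0.12, 0.48, 0.55, 0.91, 0.30]

labels = classify_disorder(demo_scores)
fraction = disordered_fraction(demo_scores)
print(labels, fraction)
```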
Secondary and tertiary structure of the constructed peptide
The secondary structure of the proposed vaccine was predicted using the GOR4 method on the PRABI server (available at https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_gor4.html). As Fig. 4 shows, the secondary structure of the proposed vaccine consists of alpha-helix (28.57%), extended strand (19.76%), and random coil (51.67%). The tertiary structure of the peptide was predicted using the I-TASSER server. This server suggests several models for the input sequence, and the quality of each predicted model is reflected in its C-score (range -5 to 2). Higher C-score values indicate higher confidence in the predicted model (Fig. 5).
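The percentages reported above are the share of residues assigned to each state. A minimal sketch of that calculation follows, assuming the prediction is available as a string with one state letter per residue (h for helix, e for extended strand, c for coil); that encoding and the demo string are assumptions for illustration, not the server output for this construct.

```python
from collections import Counter

# Assumed one-letter state codes (an assumption about the prediction format).
STATE_NAMES = {"h": "alpha helix", "e": "extended strand", "c": "random coil"}


def composition(states):
    """Return the percentage of residues in each secondary-structure state."""
    if not states:
        raise ValueError("state string must not be empty")
    counts = Counter(states.lower())
    unknown = set(counts) - set(STATE_NAMES)
    if unknown:
        raise ValueError(f"unexpected state codes: {sorted(unknown)}")
    total = len(states)
    return {name: 100.0 * counts[code] / total for code, name in STATE_NAMES.items()}


# Hypothetical 20-residue prediction (illustration only).
demo_states = "cc" + "h" * 6 + "cc" + "e" * 4 + "c" * 6

percentages = composition(demo_states)
print(percentages)
```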
Model refinement and molecular docking of the vaccine candidate peptide
The predicted model was refined using the GalaxyWeb server. In this process, the server refined structural elements such as loop regions and side chains, and the results were assessed with several measures, including the similarity score (GDT-HA), clash score, RMSD, and MolProbity score. The global distance test (GDT_TS) measures the similarity between two protein structures that have known amino acid correspondences but different tertiary structures. GDT-HA is a high-accuracy version of GDT_TS that uses cut-off distances half the size of those used for GDT_TS and is therefore more rigorous. The root-mean-square deviation (RMSD) is the root-mean-square distance between corresponding backbone atoms of two superimposed structures. The MolProbity score is reported on the same scale as crystallographic resolution: a structure with a MolProbity score numerically lower than its actual crystallographic resolution is, quality-wise, better than the average structure at that resolution. Table 4 presents the output of the GalaxyWeb server for the refinement of the predicted 3D model of the proposed vaccine and lists the five suggested refined models. The first model, with a GDT-HA of 0.8951 and an RMSD of 0.542, was selected for further analysis. However, the clash score of the chosen model was 33.6, whereas the score reported for the initial model was about 24.1. The Rama favored score indicates the percentage of residues in the most favored regions of the Ramachandran plot. This score increased from 64.2 for the initial model to 84.4 for the refined model.
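The relationship between RMSD, GDT_TS, and GDT-HA can be illustrated with a simplified sketch. It assumes that per-residue distances (in Å) between corresponding atoms are already available from a single fixed superposition; full GDT implementations search for the best superposition at each cut-off, so this is an approximation and not the GalaxyWeb calculation. The cut-off sets are the commonly used ones (1, 2, 4, 8 Å for GDT_TS and half of those for GDT-HA), and the distances are invented.

```python
import math

GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # Angstrom
GDT_HA_CUTOFFS = (0.5, 1.0, 2.0, 4.0)  # half of the GDT_TS cut-offs


def rmsd(distances):
    """Root-mean-square of per-residue distances between two superimposed models."""
    if not distances:
        raise ValueError("distances must not be empty")
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def gdt(distances, cutoffs):
    """Mean fraction of residues within each cut-off (0 to 1 scale)."""
    if not distances:
        raise ValueError("distances must not be empty")
    n = len(distances)
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return sum(fractions) / len(fractions)


# Hypothetical per-residue distances in Angstrom (illustration only).
demo_distances = [0.3, 0.7, 1.5, 3.0, 6.0]

gdt_ts = gdt(demo_distances, GDT_TS_CUTOFFS)
gdt_ha = gdt(demo_distances, GDT_HA_CUTOFFS)
demo_rmsd = rmsd(demo_distances)
print(gdt_ts, gdt_ha, demo_rmsd)
```

Because every GDT-HA cut-off is smaller than its GDT_TS counterpart, GDT-HA can never exceed GDT_TS for the same pair of structures, which is why it is the stricter measure.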
Moreover, the geometric quality of the model was evaluated using Ramachandran plots generated by the PROCHECK server, both before and after refinement. Fig. 6A shows the Ramachandran plot of the 3D model before refinement, and Fig. 6B shows the plot of the refined model. In the initial structure of the designed vaccine, 59.1% of residues were in the most favored regions, whereas this proportion was 78.7% in the refined model, which confirmed the improvement achieved by refinement.
Furthermore, to assess the binding affinity and interaction of the peptide vaccine with TLR3, TLR4, MHC I, and MHC II molecules, molecular docking was performed in the antibody mode of the ClusPro 2.0 server. The lowest docking energies obtained in antibody mode were -434.3, -515.1, -602.1, and -567.3 kcal/mol for MHC I, MHC II, TLR3, and TLR4, respectively (Table 5). The server also predicted 3D structures of the docked complexes (Fig. 7).
Back translation, codon optimization, and in silico cloning of the candidate protein
After the bioinformatic and biochemical analyses of the constructed peptide vaccine, the amino acid sequence must be back-translated into a nucleotide sequence and inserted into an expression vector for expression in a bacterial or other expression system. To this end, the final amino acid sequence of the peptide vaccine was first converted to a nucleotide sequence using the offline SnapGene 3.2.1 software. The nucleotide sequence was then optimized for codon usage in E. coli using the JCat server. Next, the restriction enzyme recognition sites and a polyhistidine tag were added to the optimized nucleotide sequence. The open reading frame (ORF) was also checked with SnapGene 3.2.1 to ensure correct protein expression. Finally, cloning of the nucleotide sequence into the pET-21 expression vector was simulated using SnapGene 3.2.1 (Fig. 8).
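The workflow above (back translation, addition of a polyhistidine tag and restriction sites, and an ORF check) can be sketched as follows. This is an illustration only and does not reproduce SnapGene or JCat: the one-codon-per-residue table is a simplified assumption rather than JCat's optimization, the NdeI (CATATG) and XhoI (CTCGAG) sites are example choices because the enzymes are not named in the text, the junctions are not designed for any particular vector reading frame, and the demo peptide is a placeholder.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# One codon per amino acid (illustrative choice, NOT the JCat result).
PREFERRED = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def back_translate(protein):
    """Convert an amino acid sequence to DNA using the PREFERRED codons."""
    return "".join(PREFERRED[aa] for aa in protein)


def translate(dna):
    """Translate a DNA sequence whose length is a multiple of three."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def build_insert(protein, site5="CATATG", site3="CTCGAG",
                 his_tag="CATCAC" * 3, stop="TAA"):
    """Return (insert, cds): cds = gene + His6 + stop, flanked by example sites."""
    cds = back_translate(protein) + his_tag + stop
    return site5 + cds + site3, cds


def internal_sites(cds, sites=("CATATG", "CTCGAG")):
    """List restriction sites that also occur inside the coding sequence."""
    return [s for s in sites if s in cds]


# Placeholder peptide (hypothetical; NOT the vaccine sequence).
demo_protein = "MKKAPAPGS"
insert, cds = build_insert(demo_protein)

# ORF check: the coding sequence must translate back to protein + His6 + stop.
orf_ok = translate(cds) == demo_protein + "HHHHHH" + "*"
print(insert, orf_ok, internal_sites(cds))
```

Checking for internal occurrences of the chosen restriction sites matters because a site inside the coding sequence would be cut during cloning; in that case a synonymous codon would normally be substituted.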