Many studies have examined the role of structural and accessory proteins in the pathogenesis of severe acute respiratory syndrome coronavirus (SARS-CoV) infections, yet an effective vaccine is still not available. The accessory proteins encoded by coronaviruses help the virus infect the host and enhance its virulence [12]. Viruses mutate continuously, and the mutations found in the COVID-19 virus vary across different parts of the world. Genetic tracking and network analysis can provide a better understanding of antigenic drift and improve the detection and control of novel emerging strains [13].
ORF1a and ORF1b (ORF1ab), known as the replicase/transcriptase genes of SARS-CoV, are translated into proteins that are responsible for viral RNA replication and transcription and are important during viral pathogenesis [14, 15]. We have reported many mutations along the largest SARS-CoV exon (21555 bp). Evidence of alterations in the ORF1ab coding sequence during the coronavirus epidemic indicates that the ORF1ab proteins play roles in virus pathogenesis in addition to viral replication [14]. Additionally, Ketteler described a frameshift stimulation element and a conserved RNA sequence that forms a stem-loop and allows ribosomal frameshifting, the mechanism by which open reading frame 1b (ORF1b) is expressed [16].
Several mutations were recorded in the S protein between amino acid positions 4 and 613. Similarly, Kim et al. [17] recorded four non-synonymous mutations in the MERS-CoV S gene from strains isolated in South Korea, distributed between positions 137 and 629; the mutations were located at sites that do not interfere with the host receptor. Kleine-Weber et al. [18] reported that the D510G and I529T mutations in the receptor-binding domain (RBD) of the S protein decreased the binding affinity to DPP4 and reduced viral entry into target cells. In addition, these mutations increased resistance to antibody-mediated neutralization; however, neither mutation was found in any of the sequences included in this study.
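The distinction between synonymous and non-synonymous mutations used above can be illustrated with a minimal Python sketch. It assumes two coding sequences that are already aligned, in frame, of equal length, and free of gaps; the function name `classify_substitutions` and the toy sequences are hypothetical and are not taken from the data analyzed in this study.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitutions(ref_cds, alt_cds):
    """Compare two in-frame, gap-free coding sequences codon by codon.

    Returns a list of (codon_position, ref_aa, alt_aa, kind) tuples, where
    codon_position is 1-based and kind is "synonymous" or "non-synonymous".
    """
    ref = ref_cds.upper().replace("U", "T")
    alt = alt_cds.upper().replace("U", "T")
    if len(ref) != len(alt):
        raise ValueError("sequences must have the same length")

    changes = []
    # Ignore any trailing bases that do not form a complete codon.
    for i in range(0, len(ref) - len(ref) % 3, 3):
        ref_codon, alt_codon = ref[i:i + 3], alt[i:i + 3]
        if ref_codon == alt_codon:
            continue
        ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
        kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
        changes.append((i // 3 + 1, ref_aa, alt_aa, kind))
    return changes


# Toy example (hypothetical sequences): M-D-I-G versus M-G-T-G.
reference = "ATGGATATTGGA"
variant = "ATGGGTACTGGC"
for position, ref_aa, alt_aa, kind in classify_substitutions(reference, variant):
    print(f"codon {position}: {ref_aa} -> {alt_aa} ({kind})")
```

In this toy case the changes at codons 2 and 3 alter the encoded amino acid, whereas the change at codon 4 does not. Real sequence data would additionally require handling of gaps and ambiguous bases, which this sketch leaves out.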
Orf3a is one of the accessory proteins of SARS-CoV; it is the largest unique open reading frame of the virus genome, and it comprises three transmembrane domains [19]. The Orf3a gene encodes protein 3a, which is expressed on the surface of patient cells, can be easily detected in SARS patients, and stimulates humoral and cellular immune responses [20]. Yount et al. [21] demonstrated the importance of this gene by showing a significant reduction in virus titers following infection with a recombinant virus lacking Orf3a. Our data revealed 10 non-synonymous mutations along the Orf3a gene, together with four conserved regions. Interestingly, Tan et al. [22] and Wang et al. [23] found that frameshift mutations in the protein 3a gene are favored, as these mutations give rise to 3a variants. Additionally, Lu et al. [24] introduced point mutations at Cys133, a residue that is important for protein oligomerization and virus pathogenesis in host cells.
The conserved structure of the E gene, which encodes the coronavirus envelope protein, may be explained by the vital roles of this protein; it is involved in many important aspects of the virus life cycle, including pathogenesis, envelope formation, budding, and viral assembly, as well as in structural motifs and virus topology [25, 26]. All E proteins have conserved cysteine residues. Lopez et al. [27] proposed that the conserved cysteines of the coronavirus envelope (E) protein are important for virus production, as a virus with multiple mutations at the three cysteine residues at positions 40, 44, and 47 exhibited an increased rate of degradation. Additionally, DeDiego et al. [28] proposed that a lack of the E gene caused in vivo and in vitro attenuation of SARS-CoV; this could be used for the development of a live attenuated SARS-CoV vaccine.
The coronavirus M protein plays a major role in virus assembly, when virus and host factors come together to make new virus particles; this protein is also involved in virus spike density, and its interactions with genomic RNA and with the S and N proteins regulate virions [29]. Only two mutations were detected in the M protein in the phylogenetic analysis of 197 sequences; this is consistent with the observation of den Boon et al. [30], who found that the M protein is moderately well conserved within each coronavirus group. However, Hu et al. [31] found that the SARS-CoV M protein had the highest substitution rate of all proteins compared among 12 coronaviruses; they related these variations to selection for host range or for the ability to escape host immunosurveillance.
The M protein is one of the proteins attached to the envelope membrane surface of SARS-CoV particles. It has dominant cellular immunogenicity and elicits a strong humoral response in infected patients; together with its highly conserved structure, this makes it a possible target for vaccine design for SARS-CoV [26, 32, 33]. The nucleocapsid (N) protein of coronavirus is a structural protein; it plays an important role during assembly of the virion and also during virus transcription [34]. In this study, the phylogenetic analysis of the N protein showed four conserved sites in the gene; interestingly, McBride et al. [34] proposed that CoV N proteins have three distinct and highly conserved domains: an N-terminal domain, a C-terminal domain (CTD/domain 3), and a central region (RNA-binding domain). The locations of these domains match the conserved regions detected in this study. Huang et al. [35] found that the N-terminal RNA-binding domain (NTD) of the SARS-CoV N protein spans amino acids 45–181. Additionally, they demonstrated that the Arg-94 and Tyr-122 residues in the IBV N protein are well conserved across the whole CoV family and are critical for SARS N-RNA binding.
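As an illustration of what a conserved region means in an alignment, the following minimal Python sketch reports runs of columns in which every sequence carries the same residue. It is not the phylogenetic pipeline used in this study; the function name `conserved_regions`, the `min_length` threshold, and the toy peptide sequences are assumptions made for the example.

```python
def conserved_regions(aligned_seqs, min_length=3):
    """Find runs of fully conserved columns in a multiple sequence alignment.

    aligned_seqs: list of equal-length strings (gaps written as "-").
    min_length:   shortest run of conserved columns to report (assumed value).
    Returns a list of 1-based, inclusive (start, end) column ranges.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    regions = []
    run_start = None
    # Iterate one step past the last column so that a trailing run is closed.
    for col in range(length + 1):
        conserved = (
            col < length
            and aligned_seqs[0][col] != "-"
            and len({seq[col] for seq in aligned_seqs}) == 1
        )
        if conserved:
            if run_start is None:
                run_start = col
        else:
            if run_start is not None and col - run_start >= min_length:
                regions.append((run_start + 1, col))
            run_start = None
    return regions


# Toy alignment (hypothetical sequences, not data from this study).
alignment = [
    "MSDNGPQNQR",
    "MSDNGAQNQR",
    "MSDNGPQNHR",
]
print(conserved_regions(alignment))
print(conserved_regions(alignment, min_length=2))
```

Requiring identity in every sequence is a deliberately strict criterion; with hundreds of sequences, a tolerance for rare variants or a per-column conservation score may be more appropriate.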
Mutation rates vary across the different regions of the COVID-19 virus genome; some regions have a high mutation rate, and other regions tend to be conserved. Koyama et al. [36] demonstrated that ORF1ab contains more variant amino acids in the NSP3 domain than in other domains.
The protective efficacy of vaccine-induced immunity to viral infection depends mainly on adaptive immune responses. The success of vaccination depends on the properties of the recognized antigen; on its ability to activate and expand lymphocytes with a multitude of specialist functions and to establish memory of them; and on its ability to control the spread of the viral pathogen within a population [37]. We suggest that recombinant vaccines that use the conserved regions of the COVID-19 virus and target a wide range of strategies may make intervention against this virus possible.
Based on the sequence data and previous publications, we conclude that the favored occurrence of mutations in the ORF1ab and Orf3a genes during the SARS-CoV epidemic is an important mechanism for virus pathogenesis in host cells. The E and M proteins have largely conserved structures, and the S and N genes have many conserved regions; these could serve as possible targets for vaccine design for SARS-CoV.