In this study, we detected a total of 389 mutations, which were categorized as 227 missense mutations, 24 noncoding-region mutations, 113 silent mutations, 7 in-frame variants, 8 frameshift deletions, 4 frameshift insertions, and 6 stop-gained variants (Table 3). Notably, the prevalence of some of these beta-variant mutations in Pakistan differs markedly from their prevalence in the global population. Consistent with our results, other researchers have also observed continued divergence of the SARS-CoV-2 genome from the Wuhan-Hu-1 reference genome owing to its ongoing and rapid evolution (Basheer et al. 2023; Fiaz et al. 2022; Fibriani et al. 2021; Harvey et al. 2021; Hossain et al. 2021; Koyama et al. 2020; van Dorp et al. 2020). Of the 227 missense variants, 54 were found in the structural proteins of the beta-variant. Among the structural proteins, the s-protein, which is a main target of immune evasion and of the antibody response (Mengist et al. 2021), had the highest number of missense mutations (35); Fiaz et al. (2022) likewise found the highest frequency of missense mutations in the s-protein of the alpha-variant. The nucleocapsid had the second-largest number of missense mutations (19). Of the 22 characteristic mutations, nine were detected in the spike protein: eight substitutions (A701V, D614G, D80A, K417N, D215G, N501Y, E484K, and L18F) and one deletion (L242), with prevalences of 97.14%, 95.25%, 95.25%, 92.38%, 80%, 52.38%, 50.48%, 5.71%, and 49.52%, respectively. Among these, E484K, N501Y, L18F, and the L242 deletion had a higher prevalence globally (86%, 85%, 35%, and 84%, respectively) than in Pakistan. Tegally et al. (2020) also found three mutations in the RBD (K417N, N501Y, and E484K), three in the NTD (D215G, D80A, and L18F), and one in loop 2 (A701V).
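The category counts reported above can be cross-checked against the stated total with a short tally. The following is a minimal sketch, not part of the study's pipeline; the dictionary name and category labels are illustrative, and the counts are copied from the text (Table 3).

```python
# Mutation counts per category, copied from the text (Table 3).
# The variable names are illustrative and not taken from the study.
MUTATION_COUNTS = {
    "missense": 227,
    "noncoding region": 24,
    "silent": 113,
    "in-frame": 7,
    "frameshift deletion": 8,
    "frameshift insertion": 4,
    "stop-gained": 6,
}

# The categories should add up to the reported total of 389 mutations.
total = sum(MUTATION_COUNTS.values())

# Share of each category as a percentage of all detected mutations.
shares = {
    category: round(100 * count / total, 1)
    for category, count in MUTATION_COUNTS.items()
}

print(f"total mutations: {total}")
for category, share in shares.items():
    print(f"{category}: {MUTATION_COUNTS[category]} ({share}%)")
```

The tally confirms that the seven categories sum to 389 and shows that missense variants make up the majority of the detected mutations.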
Surprisingly, the two missense mutations in the RBD, N501Y and E484K, have a global prevalence of 86.2% and 85.6%, respectively, whereas their prevalence in Pakistan was only 52% and 50%. These two mutations reportedly make the virus more transmissible and pathogenic (CDC 2021). Hence, their lower prevalence suggests that the beta-variant population in Pakistan was comparatively less virulent than the worldwide population. The low frequency of common mutations across regions also suggests that this variant has evolved independently. Nelson et al. (2021) reported that the three mutations K417N, E484K, and N501Y cause the largest changes in the RBD when it is bound to human ACE2. In particular, two of these mutations (K417N and E484K) lie in RBD regions that are specifically bound by neutralizing antibodies (Tada et al. 2021). Wang et al. (2021) reported that the E484K mutation causes a loss of neutralization activity against the beta-variant. K417, in contrast, is an RBD residue that enhances the affinity of the virus for its receptor by interacting with ACE2, and a mutational analysis showed that substituting N for K at this position has a minimal effect on this binding (Starr et al. 2020). England (2020) reported that the N501Y mutation promotes a stronger binding-affinity network with host ACE2 receptors than the N501 wild type. Barnes et al. (2020) reached the same conclusion as England (2020), namely that the N501Y mutation is associated with enhanced affinity for ACE2 receptors.
Sanches et al. (2021) detected one deletion and twelve missense mutations in the beta-variant: in the s-protein (LAL242-244 del, A701V, D614G, N501Y, E484K, K417N, R246I, D215G, D80A, and L18F), in the envelope (P71L), in ORF1a (K1655N), and in the nucleocapsid (T205I). In the s-protein, the missense mutation L18F had a global prevalence of 35%, but in Pakistan it was just 6% (Table 4). McCallum et al. (2021) also found the L18F mutation in the s-protein and described it as a frequently sequenced variant that escapes S2L28-mediated neutralization. The L242 deletion in the s-protein had a high prevalence (84.4%) in the global population, but only a 50% prevalence in Pakistan.
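The gap between local and global prevalence for these spike mutations can be made explicit with a small calculation. The sketch below is illustrative only: the table and function names are assumptions, and the percentages are the ones quoted in the text for the Pakistani samples and the global population.

```python
# Prevalence (%) as (Pakistan, global), copied from the figures in the text.
# Names such as PREVALENCE and prevalence_gaps are illustrative.
PREVALENCE = {
    "N501Y": (52.38, 86.2),
    "E484K": (50.48, 85.6),
    "L18F": (5.71, 35.0),
    "L242del": (49.52, 84.4),
}


def prevalence_gaps(table):
    """Return (mutation, local minus global) pairs, largest deficit first."""
    gaps = [
        (name, round(local - global_, 2))
        for name, (local, global_) in table.items()
    ]
    # Sorting ascending puts the most negative difference first.
    return sorted(gaps, key=lambda item: item[1])


gaps = prevalence_gaps(PREVALENCE)
for name, gap in gaps:
    print(f"{name}: {gap:+.2f} percentage points relative to global")
```

All four differences are negative, that is, each of these mutations was less prevalent in the Pakistani samples than globally, with gaps of roughly 29 to 35 percentage points.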
In the nucleocapsid, four missense mutations (T205I, G30R, P13S, and T362I) were detected with prevalences of 88%, 67%, 10.48%, and 7% in our samples, compared with 95.4%, 0.4%, 5.7%, and 9.3% globally, respectively. The T205I mutation is highly phosphorylated, and the prevalence of this mutant was about 43%. This mutation interferes with the virus life cycle by interrupting the activation of the nucleocapsid protein (Mohammad et al. 2021). Another study reported that the nucleocapsid is involved in virus particle release, RNA packaging, and the core-forming process of the ribonucleoprotein (Zeng et al. 2020). Surprisingly, one characteristic substitution, G30R in the nucleocapsid, had a prevalence of 67% locally but just 0.4% globally. The P71L mutation in the envelope (E) protein had a prevalence of 88.57% in the Pakistani population; this mutation has been found in datasets from deceased patients and from countries with a high case-fatality ratio. Therefore, this mutation has been linked with disease severity and death rate (Rizwan et al. 2021).
ORF1ab is a polyprotein of SARS-CoV-2 that is processed into 16 non-structural proteins (NSPs), which play a significant role in the synthesis of viral RNA and carry a broad spectrum of mutations (Banerjee et al. 2021). Of the 227 missense variants, 148 were found in the NSPs. Among the NSPs, NSP3, which is essential for viral replication (Harcourt et al. 2004), had the highest number of mutations (70), followed by NSP2 (28). In agreement with our findings, Koyama et al. (2020), in an analysis of 10,022 SARS-CoV-2 samples, also found the most mutations in NSP3, followed by NSP2. In NSP2, one missense mutation (T85I) had a prevalence of 97.14% in Pakistan. Ramesh et al. (2021) found that T85I, an infectivity-strengthening mutation, may benefit most from co-occurring with other infectivity-strengthening mutations such as Q57H and D614G. In NSP3, seven missense mutations (K837N, S794L, S93F, T217I, V613I, D178Y, and V1768G) were detected with prevalences of 82%, 67%, 9%, 8%, 5%, 3%, and 3%, respectively. The silent mutation F206F affects virus fitness and is part of the G clade that originated in China (Tomaszewski et al. 2020); in Pakistan, F206F had a prevalence of 99.05%. We also detected a highly prevalent (97.14%) missense mutation in NSP5 (K90R) and one deletion in NSP6 (S106) with a prevalence of 73% in Pakistan; surprisingly, the S106 deletion had a prevalence of 91% in the global population.
NSP12/RdRp is essential for replication and transcription (Romano et al. 2020); mutations in RdRp can therefore increase the viral mutation rate. In this study, a P314L mutation was detected in RdRp with a prevalence of 96.19% locally and 88.9% globally. NSP13/helicase is a superfamily 1 helicase that unwinds double-stranded RNA or DNA into two single strands (Yuen et al. 2020). Three missense mutations (T588I, K584N, and Q194P) were detected in the helicase, with prevalences of 9%, 7%, and 3% in the Pakistani population and 13%, 0.72%, and 0% globally, respectively. The substitution T588I was also detected with a prevalence of 21.9% by Alkhatib et al. (2021). Basheer et al. (2023) also found 11 characteristic mutations, including 9 mutations present in the s-protein, 1 in the envelope, and 2 in the NSPs of BA.2.75.
Among the accessory proteins, ORF3a had the highest number of mutations (20); the others carried fewer: ORF6 (3), ORF7a (3), ORF7b (4), ORF8 (7), and ORF10 (2). ORF3a is the largest accessory protein and is conserved. It is mostly expressed in the plasma membrane of the cell, where it induces inflammatory responses and apoptosis in infected cells (Redondo et al. 2021). It also activates immune signaling receptors such as NLRP3 (inflammasome), which promotes cytokine production and causes tissue inflammation (Redondo et al. 2021; Shah 2020). In ORF3a, two highly prevalent missense mutations (Q57H and S171L) were detected, with prevalences of 94.29% and 75.24% in Pakistan and 95.5% and 90.5% globally, respectively. Wang et al. (2021) also detected these mutations and found that Q57H increases the viral load in the host cell and destabilizes ORF3a, which makes it very difficult for ORF3a to take part in the inflammatory response and apoptosis. Our results agree with those of Rehman et al. (2020), who also identified the Q57H mutation in ORF3a. In ORF7b, one missense mutation (W29L) was found with a prevalence of 48.57% in our samples but just 0.3% in the global population. One stop-gained mutation, E39*, was found with a prevalence of 9.2% globally and just 5.71% in our sample population. In ORF8, four mutations were detected: a silent mutation (F120F), a missense mutation (I121L), a frameshift deletion (F120), and a frameshift insertion (D119). The I121L mutation was 23% prevalent in the global population but just 3.81% in Pakistan. ORF8 is an important protein for immune evasion: it interacts with major histocompatibility complex class I (MHC-I) molecules and suppresses the surface expression of MHC-I in different cells (Zhang et al. 2021). Wang et al. (2021) detected highly prevalent mutations, including L84S and S24L, in the ORF8 of the beta-variant; these mutations may increase the ability of SARS-CoV-2 to spread by down-regulating the expression of MHC-I. However, these mutations were not found in this study. Therefore, the beta-variant appears to have been less dangerous in Pakistan.
In the 5’UTR and 3’UTR, we detected 7 and 17 mutations, respectively, in the beta-variant. Two of the 5’UTR mutations, 174G > T and 241C > T, were the most predominant, with prevalences of 75.24% and 74.29%, respectively, in Pakistan (Table 4). In agreement with our results, 241C > T has been reported as the most prevalent 5’UTR mutation in the global population (Urhan et al. 2021), and this mutation plays a significant part in the control of gene translation and expression (Li et al. 2005). It does not change the amino acid sequence of any protein, but it affects the secondary structure of the RNA, which in turn alters the infection cycle and replication rate of the virus (Kim et al. 2020).
The phylogenetic analysis (Fig. 8) showed that the main cluster of Pakistani samples (42) was closely related to sequences from South Africa. Eight and five further samples grouped with those reported from England and Saudi Arabia, respectively, in two separate clusters. Several of our other samples grouped with sequences from Italy in a small number of clusters. Interestingly, our samples showed no relationship with sequences from France or the UAE, even though a large Pakistani diaspora lives in both countries. The phylogenetic analysis suggests that the beta-VOC was transmitted to Pakistan mainly from South Africa and England.