Numbers and incidence of mutations in NSP-12 AAS based on geographical areas
First, the incidence of mutations in the NSP-12 protein was investigated statistically to identify potentially essential mutations. A total of 1,759,792 sequences were examined for the number of AAS mutations. The global distribution suggests that the substantial overall mutation rate of the NSP-12 region may give rise to new clades, which in turn may have a notable impact on mortality rate, drug resistance or vaccine escape, and disease severity across geographical regions (Fig. 1).
In particular, no mutation was detected in 1.36% of AASs, whereas 74.84% of AASs contained one mutation, 20.2% contained two mutations, 3.38% carried three mutations, and 0.23% carried more than four mutations. Next, mutations were sorted into six geographical regions: North America, South America, Europe, Asia, Oceania, and Africa (Fig. 2A).
In North America (553,149 AASs), no mutation was detected in 1.15% of sequences, one mutation in 71.94%, two mutations in 24.03%, three mutations in 1.87%, and more than four mutations in 0.42%. In South America (25,130 AASs), the corresponding values were 0.88%, 84.86%, 12.85%, 0.99%, and 0.13%. In Europe (1,037,450 AASs), they were 1.12%, 76.26%, 17.82%, 4.34%, and 0.27%. In Asia (110,085 AASs), they were 2.1%, 71.68%, 22.64%, 3.34%, and 0.24%. In Oceania (17,595 AASs), they were 0.76%, 78.12%, 13.65%, 0.05%, and 7.41%. Finally, in Africa (15,612 AASs), they were 2.75%, 79.24%, 12.76%, 4.80%, and 0.35% (Fig. 2A).
Given that at least one mutation has occurred in most NSP-12 sequences in all geographical areas, our data indicate that NSP-12 is a mutation hotspot, in line with previous studies [26,27].
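The per-sequence tally reported above can be illustrated with a minimal sketch. It assumes a hypothetical input that maps each sequence identifier to its list of NSP-12 substitutions. The sequence identifiers, the function name, and the choice to pool every sequence with four or more substitutions into the last bin are assumptions made for illustration, not the pipeline used in this study.

```python
from collections import Counter

# Bin labels: sequences with 0, 1, 2, 3, or at least 4 substitutions.
# Pooling "four or more" is an assumption about how the last bin is defined.
BIN_LABELS = ["0", "1", "2", "3", "4+"]


def mutation_count_distribution(mutations_by_sequence):
    """Return the percentage of sequences falling in each mutation-count bin."""
    if not mutations_by_sequence:
        raise ValueError("no sequences supplied")
    # Cap each count at 4 so that all heavily mutated sequences share one bin.
    counts = Counter(min(len(muts), 4) for muts in mutations_by_sequence.values())
    total = len(mutations_by_sequence)
    return {
        label: 100.0 * counts.get(index, 0) / total
        for index, label in enumerate(BIN_LABELS)
    }


# Toy input with hypothetical identifiers (not GISAID records).
toy_sequences = {
    "seq1": [],
    "seq2": ["P323L"],
    "seq3": ["P323L"],
    "seq4": ["P323L", "P227L"],
    "seq5": ["P323L", "G671S", "V776L", "A185S"],
}
distribution = mutation_count_distribution(toy_sequences)
print(distribution)
```

Running the same function separately on the sequences of each continent would give a regional breakdown of the kind summarized in Fig. 2A.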
Counting the number of mutations per sequence is not sufficient on its own to assess their impact, because some substitutions recur many times whereas others occur in only a few samples. Therefore, a heat map was drawn to examine the frequency of mutations along the NSP-12 protein. The highest mutation frequency occurred in the region spanning residues 301 to 400 (0.7847), followed by residues 201 to 300 (0.0634) and 601 to 700 (0.0370) (Fig. 2B). The mutation frequency of each region was calculated as the number of mutations in that region relative to the total number of AASs.
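The regional frequency defined above (mutations falling in a residue window divided by the total number of AASs) can be sketched as follows. The 100-residue window matches the ranges quoted in the text. The input layout, the function name, and the regular expression for labels such as `P323L` are assumptions for illustration.

```python
import re
from collections import Counter

# Matches simple substitution labels such as "P323L" (assumed label format).
MUTATION_PATTERN = re.compile(r"^[A-Z](\d+)[A-Z]$")


def window_frequencies(mutations_by_sequence, window=100):
    """Number of mutations per residue window divided by the number of sequences."""
    total = len(mutations_by_sequence)
    if total == 0:
        raise ValueError("no sequences supplied")
    hits = Counter()
    for mutations in mutations_by_sequence.values():
        for label in mutations:
            match = MUTATION_PATTERN.match(label)
            if match is None:
                continue  # skip labels that are not simple substitutions
            position = int(match.group(1))
            # Residues 1-100 fall in the first window, 101-200 in the second, etc.
            start = ((position - 1) // window) * window + 1
            hits[(start, start + window - 1)] += 1
    return {bounds: count / total for bounds, count in sorted(hits.items())}


# Toy input with hypothetical identifiers (not GISAID records).
toy_sequences = {
    "seq1": ["P323L"],
    "seq2": ["P323L", "P227L"],
    "seq3": ["P323L", "G671S"],
    "seq4": [],
}
frequencies = window_frequencies(toy_sequences)
print(frequencies)
```

Because the denominator is the number of sequences rather than the number of mutations, the window frequencies need not sum to 1.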
Mutation specificities according to geographical areas
To study the NSP-12 AAS mutations in more detail, the location of mutations in the protein structure and their frequency were investigated between January 2020 and June 2021. Table 1 lists the five most frequent mutations regardless of geographical distribution. The complete list of mutations and their frequencies is provided in Additional file 1 (A to G).
Table 1
The five most frequent NSP-12 mutations globally from January 2020 to June 2021.
Rank | Residue | Frequency | Number of occurrences |
Top 1 | P(323)L | 0.98366 | 1,731,040 |
Top 2 | P(227)L | 0.061411 | 108,071 |
Top 3 | G(671)S | 0.028901 | 50,860 |
Top 4 | V(776)L | 0.018056 | 31,775 |
Top 5 | A(185)S | 0.017245 | 30,348 |
According to our findings, the most frequent mutation up to June 2021 was P323L, the dominant mutation with a frequency of approximately 0.98366 (1,731,040 occurrences in 1,759,792 AASs). P323 resides in the interface domain of the NSP-12 protein, which was previously shown to be associated with the stabilization of the protein structure. A recent in silico study based on virtual molecular docking nominated potential drugs, including Simeprevir and Filibuvir, as SARS-CoV-2 RdRp inhibitors [28]. The docking site of these drugs is located within a hydrophobic cleft that includes the phenylalanine at position 326, close to the P323 mutation site. Hence, this mutation may interfere with the affinity of RdRp for these antiviral drugs [29].
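As a quick arithmetic check, the frequencies in Table 1 can be recomputed as the number of occurrences divided by the total of 1,759,792 AASs. The sketch below uses only the values printed in Table 1. The variable names and the rounding tolerance are chosen here for illustration.

```python
TOTAL_AAS = 1_759_792  # total number of sequences examined

# Number of occurrences and reported frequency, copied from Table 1.
table_1 = {
    "P(323)L": (1_731_040, 0.98366),
    "P(227)L": (108_071, 0.061411),
    "G(671)S": (50_860, 0.028901),
    "V(776)L": (31_775, 0.018056),
    "A(185)S": (30_348, 0.017245),
}

recomputed = {}
for mutation, (occurrences, reported) in table_1.items():
    recomputed[mutation] = occurrences / TOTAL_AAS
    # Differences are expected only at the level of rounding in the table.
    print(f"{mutation}: recomputed {recomputed[mutation]:.6f}, reported {reported}")
```

The recomputed values agree with the reported frequencies to within rounding.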
The P227 mutation appeared recently and may be increasing in prevalence during the latest global peak. It is located in the N-terminal extension domain, which adopts a nidovirus RdRp-associated nucleotidyltransferase (NiRAN) structure (residues D60-R249) [30].
G671S is now considered a fixed mutation of the SARS-CoV-2 Delta variant, which emerged in India in December 2020 (https://viralzone.expasy.org/9556). This variant shows a significant increase in transmissibility and disease severity, as well as the potential to escape neutralization by antibodies [31].
It is noteworthy that some studies have demonstrated an increase in the prevalence of the A185S and V776L substitutions, which suggests that these mutations co-occur [32,33]. Moreover, the A185S mutation may have a notable impact on the NSP-12 protein by preserving its secondary structure [29].
The occurrence of these top five mutations by continent is listed in Table 2. Interestingly, not all of these mutations rank among the top five in every continent. The P323 mutation had the highest incidence in all continents: North America (0.9862), South America (0.9896), Europe (0.9877), Asia (0.9491), Oceania (0.9098), and Africa (0.9394). The P227 mutation was among the top mutations in North America (0.1292), Europe (0.0327), South America (0.0193), and Africa (0.0331). The G671 mutation was among the top five in North America (0.008), Asia (0.0454), and Europe (0.0396). Finally, the V776 mutation was among the top five in North America (0.0085), Europe (0.0254), and Africa (0.0228), and the A185 mutation in Europe (0.0262) and Africa (0.0258).
Table 2
Incidence (variant frequency) of the global top-five NSP-12 mutations by continent.
Residue | North America | South America | Europe | Asia | Oceania | Africa |
P(323)L | 0.9862 | 0.9896 | 0.9877 | 0.9491 | 0.9098 | 0.9394 |
P(227)L | 0.1292 | 0.0193 | 0.0327 | 0.128 | 0.0055 | 0.0331 |
G(671)S | 0.0080 | 0.0009 | 0.0396 | 0.0454 | 0.0091 | 0.0048 |
V(776)L | 0.0085 | 0.0038 | 0.0254 | 0.0013 | 0.0026 | 0.0228 |
A(185)S | 0.0042 | 0.0046 | 0.0262 | 0.0018 | 0.0022 | 0.0258 |
In North America, the A97 mutation (0.0044 frequency) ranks fifth among the most frequent mutations. At the 97th AA position, alanine is predominantly substituted by valine, which has a larger side chain. A previous study demonstrated that this mutation has a negative impact on the packing of NSP-12 [29]. This mutation was mainly detected in mild and asymptomatic samples [34].
The pattern observed in South America shows that the K91 (0.0063 frequency) and I548 (0.0053 frequency) mutations rank fourth and fifth, respectively. Neither has had a significant prevalence in other geographical areas. In Europe, the pattern of the top five mutations is consistent with the global pattern, which may be because Europe contributes the largest number of database sequences (1,037,450 AASs). In Asia, the A423 (0.0826 frequency), A97 (0.0148 frequency), and M666 (0.0139 frequency) mutations rank second, fourth, and fifth, respectively [35]. For M666I, the substitution of methionine with isoleucine increases the flexibility of the NSP-12 structure (http://biosig.unimelb.edu.au/covid3d/mutation/QHD43415_11/I/M666I/A).
Oceania shows a largely different pattern of top-five variants. The K718 mutation (0.0478 frequency) ranks second, the N215 mutation (0.0109 frequency) ranks third, the A97 mutation (0.0102 frequency) ranks fourth, and the K267 mutation (0.0101 frequency) ranks fifth. Finally, Africa differs from the global pattern in the T85 mutation (0.008 frequency), which ranks fifth among its most frequent mutations. The top five mutations of each continent and the amino acids substituted are shown in Fig. 3. The complete list of mutations and their frequencies is provided in Additional file 2 (A to G).
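The comparison between a continent's top five and the global top five can be sketched as a simple ranking. The example uses only the North American frequencies quoted in the text and Table 2, so it is restricted to six mutations rather than the full list in the additional files. The function and variable names are assumptions for illustration.

```python
# Global top five mutations (Table 1), written by residue as in the text.
GLOBAL_TOP_FIVE = ["P323", "P227", "G671", "V776", "A185"]

# North American frequencies quoted in the text and Table 2 (six mutations only).
north_america = {
    "P323": 0.9862,
    "P227": 0.1292,
    "G671": 0.0080,
    "V776": 0.0085,
    "A185": 0.0042,
    "A97": 0.0044,
}


def top_mutations(frequencies, n=5):
    """Return the n mutations with the highest frequency, highest first."""
    return sorted(frequencies, key=frequencies.get, reverse=True)[:n]


top_five = top_mutations(north_america)
# Mutations in the regional top five that are absent from the global top five.
region_specific = [m for m in top_five if m not in GLOBAL_TOP_FIVE]
print(top_five)
print(region_specific)
```

With these six values, A97 displaces A185 from the North American top five, consistent with the ranking described above.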
The emergence of RdRp variants of SARS-CoV-2 differs between continents, which can pose a tremendous challenge to the effectiveness of antiviral therapies. Thus, investigating the evolutionary patterns and spread dynamics of SARS-CoV-2 NSP-12 variants is of enormous importance.
Evolutionary assessment of the incidence of the top five mutations according to time and geographical region
To determine when each mutation appeared, we analyzed AASs from each geographic area over time by classifying them according to the month of sample collection, from December 30, 2019 to June 30, 2021, as indicated in the GISAID database. This process helps identify mutations that contributed to the global peak and have increased in frequency over time through stabilizing and beneficial qualities.
The detailed distribution of the five most frequent NSP-12 variants, worldwide and in each continent, is shown by month of sample collection in Fig. 4.
Next, we investigated the persistence of mutations with a prevalence higher than 0.1 among the AASs collected each month. The P323L mutation was first observed at the beginning of the pandemic, and its prevalence initially fluctuated in all geographical regions. After that, its incidence increased exponentially. In the last month of the study, June 2021, it was observed at a frequency of 0.99.
The P227L and G671S mutations gained in prevalence after their appearance in February 2021 and April 2021, respectively, and were present at frequencies of 0.01 and 0.77 among the AASs collected worldwide in June 2021.
In particular, the P227L mutation in North America has been on the rise since December 2020. In April 2021, it reached a frequency of 0.25 among the samples collected that month, but in the last month of the study its prevalence decreased to 0.23. The exponential growth of the G671S mutation in the last months of the study is particularly striking: in June 2021 it was observed at frequencies of 0.19, 0.81, and 0.57 in North America, Europe, and Asia, respectively.
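The monthly prevalence used in this section (sequences carrying a mutation divided by all sequences collected in the same month, followed by the 0.1 threshold) can be sketched as follows. The record layout, with an ISO collection date and a list of substitutions per sequence, is an assumption about how the metadata might be organized, and the toy records are not GISAID data.

```python
from collections import defaultdict

PREVALENCE_THRESHOLD = 0.1  # threshold stated in the text


def monthly_prevalence(records, mutation):
    """Fraction of sequences per collection month that carry the given mutation."""
    totals = defaultdict(int)
    carriers = defaultdict(int)
    for collection_date, mutations in records:
        month = collection_date[:7]  # "YYYY-MM" prefix of an ISO date string
        totals[month] += 1
        if mutation in mutations:
            carriers[month] += 1
    return {month: carriers[month] / totals[month] for month in sorted(totals)}


# Toy records: (collection date, substitutions), hypothetical values only.
toy_records = [
    ("2021-04-03", ["P323L"]),
    ("2021-04-17", ["P323L"]),
    ("2021-05-02", ["P323L", "G671S"]),
    ("2021-05-20", ["P323L"]),
    ("2021-06-01", ["P323L", "G671S"]),
    ("2021-06-11", ["P323L", "G671S"]),
]
prevalence = monthly_prevalence(toy_records, "G671S")
months_above_threshold = [
    month for month, value in prevalence.items() if value > PREVALENCE_THRESHOLD
]
print(prevalence)
print(months_above_threshold)
```

Applying the function to the records of a single continent gives a regional time course of the kind plotted in Fig. 4.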
Assessment of the effects of P227L and G671S mutations on the dynamics and flexibility of NSP-12
To reveal the effect of the P227L and G671S mutations on the tertiary structure of NSP-12, we performed protein modeling using the DynaMut web server. We calculated the change in vibrational entropy energy (ΔΔSVib ENCoM) between the wild-type and mutant proteins. Our data demonstrated that the P227L mutation decreases the molecular flexibility of the protein structure (ΔΔSVib ENCoM = −0.174 kcal·mol⁻¹·K⁻¹), whereas the substitution of serine at residue 671 (G671S mutation) increases the molecular flexibility of the NSP-12 protein structure (ΔΔSVib ENCoM = 0.080 kcal·mol⁻¹·K⁻¹).
Furthermore, investigation of the changes in intramolecular interactions caused by the P227L mutation revealed that this mutation might affect interactions with the residues close to the wild-type proline. The substitution with leucine alters the side chain, resulting in altered intramolecular bonds in the pocket. These amino acid residues are shown in Fig. 5A. Moreover, substituting glycine (a nonpolar amino acid) with serine (a polar amino acid) changed the intramolecular interactions among amino acids adjacent to residue 671 through ionic interactions and water-mediated weak hydrogen bonds (Fig. 5B).
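The sign convention applied to the vibrational entropy values reported above (a negative change read as reduced flexibility, a positive change as increased flexibility) can be written as a small helper. This is only a restatement of the interpretation given in the text. The function name is hypothetical, and the helper does not call DynaMut or reproduce its calculation.

```python
def flexibility_effect(delta_delta_s_vib):
    """Interpret the sign of a vibrational entropy change (kcal/mol/K)."""
    if delta_delta_s_vib > 0:
        return "increased flexibility"
    if delta_delta_s_vib < 0:
        return "decreased flexibility"
    return "no change"


# Values reported in the text for the two mutations.
reported_values = {"P227L": -0.174, "G671S": 0.080}
effects = {
    mutation: flexibility_effect(value)
    for mutation, value in reported_values.items()
}
print(effects)
```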