Structure Prediction and Validation:
The Robetta comparative modeling approach was used to model the 3D structure of the RNA-dependent RNA polymerase (RdRp) of the recent 2019-nCoV. The amino acid sequence of RdRp was extracted from the orf1ab polyprotein deposited in NCBI under accession number QHD43415.1. Five different models were generated from this sequence. The model generated by the Robetta ab initio modeling server is shown in Figure 1. All models were subjected to structural validation using Ramachandran plot analysis.
The analysis from all the servers revealed that model 1 is the best model. The initial analysis suggested that the 2019-nCoV RdRp sequence shares 95.77% sequence identity with SARS-CoV (PDB ID: 6NUR) and 86.27% sequence identity with 6NUS [27]. Model 1 was therefore selected for further analyses. All validation scores are given in Table 1. The superimposition of model 1 with 6NUR and 6NUS is shown in Figure 2.
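Sequence identity values such as 95.77% depend on how they are computed, and the text does not say which convention was used. The sketch below assumes one common definition: identical residues divided by the gap-free aligned columns of a precomputed pairwise alignment. `percent_identity` is a hypothetical helper, and the fragments are toy sequences, not the real RdRp sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two equal-length, gapped aligned sequences.

    Convention assumed here: identical positions divided by the number of
    aligned positions where neither sequence has a gap, times 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


# Toy aligned fragments (not the real RdRp sequences):
print(percent_identity("MKT-AYIAK", "MKTQAYLAK"))  # 87.5
```

Other conventions divide by the full alignment length or by the length of the shorter sequence. These give lower values whenever gaps are present, so identities from different servers are not always directly comparable.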
Table 1: Ramachandran plot validation scores of the five models (RAMPAGE).
S. No. | Model | Favored region | Allowed region | Outlier region
1. | Model 1 | 98.9% | 1.1% | 0.0%
2. | Model 2 | 98.4% | 1.6% | 0.0%
3. | Model 3 | 98.3% | 1.6% | 0.1%
4. | Model 4 | 93.7% | 4.9% | 1.4%
5. | Model 5 | 97.2% | 2.6% | 0.2%
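The text reports that model 1 is the best model but does not state the selection rule. The sketch below assumes a simple rule: fewest Ramachandran outliers first, then the highest share of favored residues. It ranks the Table 1 scores under that rule. `RamaScore` and `rank_models` are hypothetical helpers.

```python
from typing import NamedTuple


class RamaScore(NamedTuple):
    model: str
    favored: float  # % of residues in favored regions
    allowed: float  # % of residues in allowed regions
    outlier: float  # % of residues in outlier regions


# Scores transcribed from Table 1.
TABLE_1 = [
    RamaScore("Model 1", 98.9, 1.1, 0.0),
    RamaScore("Model 2", 98.4, 1.6, 0.0),
    RamaScore("Model 3", 98.3, 1.6, 0.1),
    RamaScore("Model 4", 93.7, 4.9, 1.4),
    RamaScore("Model 5", 97.2, 2.6, 0.2),
]


def rank_models(scores):
    """Sort by fewest outliers first, breaking ties by the most favored residues."""
    return sorted(scores, key=lambda s: (s.outlier, -s.favored))


for s in rank_models(TABLE_1):
    print(s.model, s.favored, s.outlier)  # Model 1 is printed first
```

Under this rule, models 1 and 2 tie on outliers (0.0%), and model 1 wins on favored residues (98.9% vs 98.4%). This is consistent with the choice in the text.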
|
Domain Architecture of RdRp:
nsp12 comprises a polymerase domain (amino acids 398–931) that, like other polymerases, resembles a cupped "right hand." The polymerase domain is further divided into three subdomains: fingers, palm and thumb. The fingers subdomain consists of amino acids 398–581 and 628–687, the palm subdomain consists of amino acids 582–627 and 688–815, and the thumb subdomain consists of amino acids 816–931. The N-terminal extension, which is reported to be present in all CoVs, is also found in 2019-nCoV nsp12 and spans amino acids 1–397. All these domains are coloured differently in Figure 3.
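The residue ranges above can be encoded as a lookup table. The table can then be checked to confirm that the segments tile nsp12 without gaps or overlaps. The sketch below makes one assumption: the first palm segment is taken as 582–627, the only reading that does not overlap the fingers segment 628–687. `domain_of` and `check_tiling` are hypothetical helpers.

```python
# Domain segments (1-based, inclusive residue numbers).
# Assumption: first palm segment = 582-627, so it does not overlap fingers 628-687.
DOMAINS = [
    ("N-terminal extension", 1, 397),
    ("fingers", 398, 581),
    ("palm", 582, 627),
    ("fingers", 628, 687),
    ("palm", 688, 815),
    ("thumb", 816, 931),
]


def domain_of(residue: int, domains=DOMAINS) -> str:
    """Return the domain name for a 1-based residue number."""
    for name, start, end in domains:
        if start <= residue <= end:
            return name
    raise ValueError(f"residue {residue} is not covered by any segment")


def check_tiling(domains, length: int = 931) -> bool:
    """True if the segments cover 1..length contiguously with no overlaps."""
    expected_start = 1
    for _, start, end in sorted(domains, key=lambda d: d[1]):
        if start != expected_start:
            return False
        expected_start = end + 1
    return expected_start == length + 1


print(check_tiling(DOMAINS))  # True
print(domain_of(557))         # fingers
```

If 582–687 is used for the first palm segment instead, `check_tiling` returns False, because that segment overlaps the fingers segment 628–687.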
RdRp (2019-nCoV) possesses two highly conserved metal-binding sites
Previous studies reported that RdRp enzymes have two metal-binding sites, each coordinated by four histidine and cysteine residues. In the previously reported cryo-EM structure, the first site is coordinated by His295, Cys301, Cys306 and Cys310. The second site lies in the fingers domain and is coordinated by Cys487, His642, Cys645 and Cys646. In our 2019-nCoV model, the same residues coordinate the two metal-binding sites. All eight metal-coordinating amino acids are reported to be highly conserved in all the RdRps examined. Both metal-binding sites are distal to the known active site and to the protein–protein and protein–RNA interaction surfaces. Thus, rather than being directly involved in enzymatic activity, these ions are expected to be structural components of the folded protein. The presence of intrinsic zinc ions in nsp12 mirrors the bound zinc ions in coronavirus nsp3, nsp10, nsp13 and nsp14, and points to a common use of zinc ions in folding viral replication proteins. The two metal-binding sites are shown in Figure 4.
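The composition and placement of the two sites can be cross-checked directly from the residue lists given in the text. The sketch below does bookkeeping on residue labels only. It uses no atomic coordinates, so it says nothing about coordination geometry. The helper names are hypothetical, and the fingers segments are taken from the domain architecture section.

```python
import re

# Coordinating residues listed in the text, grouped by metal-binding site.
SITES = {
    "site 1": ["His295", "Cys301", "Cys306", "Cys310"],
    "site 2": ["Cys487", "His642", "Cys645", "Cys646"],
}

# Fingers segments (1-based, inclusive) from the domain architecture section.
FINGERS = [(398, 581), (628, 687)]

LABEL = re.compile(r"^(His|Cys)(\d+)$")


def parse_ligand(label: str):
    """Split a label such as 'Cys301' into ('Cys', 301); reject non-His/Cys labels."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"not a His/Cys residue label: {label}")
    return match.group(1), int(match.group(2))


def summarize_site(labels):
    """Count His/Cys ligands and report the residue span of one site."""
    parsed = [parse_ligand(label) for label in labels]
    counts = {"His": 0, "Cys": 0}
    for name, _ in parsed:
        counts[name] += 1
    positions = [pos for _, pos in parsed]
    return {"n_ligands": len(parsed), "counts": counts, "span": (min(positions), max(positions))}


def in_fingers(position: int) -> bool:
    """True if a residue number falls in either fingers segment."""
    return any(start <= position <= end for start, end in FINGERS)


for site, labels in SITES.items():
    print(site, summarize_site(labels))
print(all(in_fingers(parse_ligand(label)[1]) for label in SITES["site 2"]))  # True
```

Each site has three cysteines and one histidine. All four site-2 ligands fall inside the fingers segments, consistent with the description above.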
Electrostatic Potential and conservation analysis of RdRp:
As shown in Figure 5(A), the outer surface of the predicted model carries a mostly negative electrostatic potential. By contrast, a strong positive electrostatic potential is observed at the nucleoside triphosphate (NTP) binding site and at the RNA template site of the polymerase. The RNA exit tunnel is comparatively neutral. A relatively neutral electrostatic potential can also be observed at the nsp7 and nsp8 binding sites.
Furthermore, sequence conservation analysis using sequences from the coronavirus family reveals that the NTP entry tunnel, the template entry tunnel and the primer exit tunnel are the most highly conserved surfaces on nsp12. The polymerase active site is also highly conserved (Figure 5(B)). A previous study reported that the nidovirus-unique N-terminal extension of nsp12 also has a conserved surface, which may reflect an interface site for the disordered N-terminal domain of nsp12 (residues 1–116).
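The text does not name the tool used for the conservation analysis. One common per-position score is the Shannon entropy of the residues in each alignment column: 0 bits means fully conserved, and higher values mean more variable. The sketch below applies that score to a toy alignment. It is a swapped-in technique, not necessarily the method behind Figure 5(B), and the helper names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str, gap: str = "-"):
    """Shannon entropy (bits) of one alignment column, ignoring gaps.

    Returns None for an all-gap column, where conservation is undefined.
    """
    residues = [c for c in column if c != gap]
    if not residues:
        return None
    n = len(residues)
    return sum(-(k / n) * math.log2(k / n) for k in Counter(residues).values())


def conservation_profile(alignment):
    """Per-column entropies for a list of equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy("".join(seq[i] for seq in alignment)) for i in range(length)]


# Toy alignment (not real coronavirus sequences).
toy = ["MKVL", "MKIL", "MRVL", "MKV-"]
print(conservation_profile(toy))
```

Mapping such per-residue scores onto the model surface is what produces a conservation colouring like the one in Figure 5(B).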
Conserved motifs of the N-terminal extension
The 2019-nCoV nsp12 is ~931 amino acids long. This contrasts with the polymerases of the related picornaviruses, which are typically closer to 500 amino acids. In addition to the long N-terminal extension, nsp12 contains the polymerase domain in its C-terminal region. Sequence analysis of RNA polymerases from the order Nidovirales revealed three conserved sequence motifs within the N-terminal extension. Following the previous study, we name them AN, BN and CN. In 2019-nCoV, these motifs lie at positions that differ from those in other CoVs: AN, BN and CN are located at residues 69–89, 102–130 and 202–222, respectively. Previous studies proposed Lys73 in AN as the active-site residue. The conserved region of the polymerase harboring this nucleotidyltransferase activity was termed NiRAN (nidovirus RdRp-associated nucleotidyltransferase) [9]. However, the role and mechanism of this nucleotidyltransferase activity remain to be elucidated. The identified NiRAN motifs and their locations are shown in Figure 6, coloured orange (AN), green (BN) and blue (CN).
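The motif positions above are 1-based and inclusive, whereas Python strings are 0-based and half-open. This difference is a common source of off-by-one errors when motifs are extracted programmatically. The sketch below assumes the nsp12 sequence is available as a plain string. Because it is not reproduced here, the example uses a synthetic placeholder, and `extract_region` and `residue_at` are hypothetical helpers.

```python
# Motif ranges from the text (1-based, inclusive residue numbers).
MOTIFS = {"AN": (69, 89), "BN": (102, 130), "CN": (202, 222)}


def extract_region(sequence: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of a protein sequence."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError(f"range {start}-{end} is outside 1-{len(sequence)}")
    return sequence[start - 1:end]


def residue_at(sequence: str, position: int) -> str:
    """Return the one-letter residue at a 1-based position."""
    return extract_region(sequence, position, position)


# Synthetic 931-residue placeholder, NOT the real nsp12 sequence.
placeholder = "".join("ACDEFGHIKLMNPQRSTVWY"[i % 20] for i in range(931))
for name, (start, end) in MOTIFS.items():
    print(name, len(extract_region(placeholder, start, end)))
```

With the real nsp12 sequence, checking `residue_at(nsp12, 73) == "K"` would confirm that the proposed active-site Lys73 falls at the expected position in the model's numbering.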
It has also been reported that the unique N-terminal extension can be divided into two separate regions. In 2019-nCoV nsp12, these regions correspond to residues 117–249, known as NiRAN, and residues 250–398, termed the interface region (Figure 2). A study reported that the NiRAN region, specifically its BN and CN motifs, interacts with the fingers and palm domains of the polymerase. We therefore speculate that the BN motif plays a functional role in linking the N-terminal extension and the polymerase domain. A previous study reported that seven amino acids of the NiRAN region are strictly conserved across the order Nidovirales. The cryo-EM structure 6NUR shares 95.78% sequence identity with 2019-nCoV. In that structure, four of these seven conserved residues (Asp126, Gly214, Asp218 and Phe219) were reported to play an important role in this interaction.
nsp12 RNA-dependent RNA polymerase (RdRp) domain and catalytic mechanism
The RdRp region has been reported to resemble a right hand, with fingers, palm and thumb subdomains. Here, we defined the residue ranges of these subdomains. The seven conserved motifs (A–G; Figure 7) in the RdRp domain of these viruses are reported to be involved in NTP binding, template binding and catalysis. Superimposing the SARS-CoV structures (6NUR and 6NUS; Figure 2) onto the 2019-nCoV RdRp model revealed important information about the active site and catalysis. The single-stranded RNA template threads past motif G into the active site, which is formed by motifs A and C and supported by motifs B and D. Incoming NTPs pass through a tunnel at the back of the polymerase into the active site, where they interact with motif F. Motif E, at the base of the thumb, interacts with the 3′ nucleotide of the primer strand. The primer–template product of RNA synthesis leaves the active site through the RNA exit tunnel. This double-stranded RNA product would interact with the N-terminal region of motif G in the fingers domain via the major groove. Meanwhile, a helix of the thumb domain (2019-nCoV amino acids 798–846) interacts with the minor groove.
CoVs have some of the largest known RNA virus genomes and require an RNA synthesis complex with enough fidelity to replicate them. In the polymerase active site, each incoming NTP base-pairs with the template RNA while its 2′ and 3′ hydroxyls form hydrogen bonds with the polymerase. In 2019-nCoV nsp12, the 2′ hydroxyl of the incoming NTP is likely to form hydrogen bonds with the CoV-conserved residues Thr662 and Asn673 in motif B. Furthermore, Asp605 in motif A is positioned to interrogate the 3′ hydroxyl through hydrogen bonding.
A hydrophobic side chain in motif F facilitates base pairing. A previous study on SARS-CoV reported Val557 to be of significant importance, and 2019-nCoV nsp12 also carries a valine at this position (Val557). GS-5734 (remdesivir), a nucleoside analog, has been reported to impair RNA synthesis by targeting the RNA synthesis machinery. Partial resistance to GS-5734, conferred by two mutations (Phe480Leu and Val557Leu), has been reported. The previous study proposed that the bulkier leucine side chain at position 557 creates higher stringency for base pairing with the templating nucleotide, enabling nsp12 to exclude this analog from its active site. This residue has also previously been implicated in modulating polymerase fidelity. Residue 480, in motif B (fingers domain), may influence catalysis-related dynamics. This structure is a homology model built on previously determined structures. These observations therefore rest on comparative analysis and should be verified by experimental methods such as X-ray crystallography or cryo-EM. Furthermore, given the conservation of the RdRp domains, we reconstructed ancestral relationships and used them to infer the dispersal pattern of the recent outbreak.
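Substitutions such as Phe480Leu and Val557Leu follow the wild-type, position, mutant convention. When such mutations are transferred between SARS-CoV and 2019-nCoV numbering, it helps to check the wild-type residue before substituting. The sketch below does this on a toy sequence, and `apply_substitution` is a hypothetical helper. On the real nsp12 sequence, Phe at 480 and Val at 557 would be expected if the numbering matches the text.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

MUTATION = re.compile(r"^([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def apply_substitution(sequence: str, mutation: str) -> str:
    """Apply a point substitution written like 'Val557Leu' (1-based position).

    The wild-type residue is checked first, so a numbering mismatch between
    reference sequences raises an error instead of silently mutating the
    wrong residue. Unknown three-letter codes raise KeyError.
    """
    match = MUTATION.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation}")
    wild_type = THREE_TO_ONE[match.group(1)]
    position = int(match.group(2))
    mutant = THREE_TO_ONE[match.group(3)]
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {match.group(1)} at {position}, found {sequence[position - 1]}"
        )
    return sequence[: position - 1] + mutant + sequence[position:]


print(apply_substitution("MAFVK", "Phe3Leu"))  # MALVK
```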
Ancestral state reconstruction:
The ancestral reconstructions at node I (node ID 219) suggest that the coronaviruses evolved in area E (Wuhan, China) or area B (Thailand). The BBM results give marginal probabilities (MP) of 46% and 39%, respectively, with a posterior probability of 1.00. By contrast, the S-DIVA results place the origin of this group in area BE (Thailand + Wuhan, China), with an MP of 81% (Table 2). Similarly, BBM inference indicates that the ancestors of the species at nodes II, III and IV evolved in area E, with MP values of 82%, 71% and 63%, respectively. The S-DIVA approach gave similar results with slightly different MP values for nodes II, III and IV (74%, 74% and 61%, respectively; Table 2). Both BBM and S-DIVA analyses suggest H as the ancestral state at node V, with MP values of 91% and 99%, respectively.
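Conceptually, a Bayesian marginal probability for a node's ancestral range can be estimated as the fraction of posterior (MCMC) samples in which that range occurs at that node. The sketch below illustrates only that idea on toy samples, and `marginal_probabilities` is a hypothetical helper. It is not the implementation of the BBM software used in the study, which also integrates over trees and uses its own model settings.

```python
from collections import Counter


def marginal_probabilities(sampled_ranges):
    """Estimate the marginal probability of each ancestral range at one node
    as its frequency among the sampled MCMC states for that node."""
    if not sampled_ranges:
        raise ValueError("no samples")
    counts = Counter(sampled_ranges)
    total = len(sampled_ranges)
    # Most frequent range first.
    return {area: n / total for area, n in counts.most_common()}


# Toy samples for a single node (illustrative only, not the study's output).
samples = ["E"] * 6 + ["B"] * 3 + ["BE"]
print(marginal_probabilities(samples))  # {'E': 0.6, 'B': 0.3, 'BE': 0.1}
```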
Table 2: Ancestral state reconstructions by the S-DIVA and BBM approaches. AR = ancestral state, MP = marginal probability (%), PP = posterior probability.
Node | S-DIVA AR | S-DIVA MP (%) | BBM AR | BBM MP (%) | PP
I | BE/BH | 81/09 | E/B | 46/39 | 1.00
II | E/EH | 74/09 | E/EF | 82/4 | 1.00
III | E/EF | 74/19 | E/EF | 71/13 | 0.55
IV | EH/E | 61/23 | E/EH | 63/21 | 0.85
V | H | 99 | H/EH | 90/08 | 0.95
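Table 2 can also be read as a per-node comparison of the two methods. The sketch below treats each range as a set of area letters, which is an assumption about how combined ranges such as BE are encoded. The top S-DIVA and BBM ranges match exactly at nodes II, III and V, and at nodes I and IV they still share area E. `compare` is a hypothetical helper.

```python
# Top range and its MP (%) per node, transcribed from Table 2:
# node -> ((S-DIVA range, MP), (BBM range, MP))
TOP_RANGES = {
    "I": (("BE", 81), ("E", 46)),
    "II": (("E", 74), ("E", 82)),
    "III": (("E", 74), ("E", 71)),
    "IV": (("EH", 61), ("E", 63)),
    "V": (("H", 99), ("H", 90)),
}


def compare(top_ranges):
    """For each node, report whether the top ranges match exactly and whether
    they share at least one area (ranges are strings of area letters)."""
    result = {}
    for node, ((sdiva, _), (bbm, _)) in top_ranges.items():
        result[node] = {
            "exact": sdiva == bbm,
            "shared_area": bool(set(sdiva) & set(bbm)),
        }
    return result


for node, verdict in compare(TOP_RANGES).items():
    print(node, verdict)
```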
Table 3: Dispersal events among the areas, inferred by the S-DIVA and BBM methods.
Distribution range | S-DIVA dispersal from | S-DIVA dispersal to | S-DIVA within | BBM dispersal from | BBM dispersal to | BBM within
A | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0
B | 0.0 | 1.0 | 1.0 | 0.0 | 2.0 | 1.0
C | 10.0 | 11.0 | 8.0 | 9.0 | 10.0 | 9.0
D | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0
E | 13.0 | 4.0 | 12.0 | 13.0 | 6.0 | 11.0
F | 2.0 | 6.0 | 4.0 | 3.0 | 5.0 | 5.0
G | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0
H | 9.0 | 6.0 | 49.0 | 9.0 | 5.0 | 50.0
I | 0.0 | 3.0 | 0.0 | 0.0 | 3.0 | 0.0
These ancestral reconstructions suggest three centers of diversity and expansion in coronaviruses. Wuhan (E) is the primary center, whereas Shenzhen (C) and the USA (H) are the other two (Figure 8 and Table 3). Dispersal events occurred from Wuhan to Thailand, the USA, Shenzhen, Shanghai and Beijing. From the USA, dispersal took place to areas B, C, D, E and I (Europe). Dispersal from Shenzhen (C) is notable because the most remote dispersals, to Australia (A), Japan (G), the USA (H) and Europe (I), occurred via this area (Figure 9B).
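The three centers named above correspond to the areas with the most "dispersal from" events in Table 3. The sketch below ranks areas by that column for each method and assumes this count is the criterion. Area letters follow the text: E = Wuhan, C = Shenzhen, H = USA. `top_sources` is a hypothetical helper.

```python
# (dispersal from, dispersal to, within) per area, transcribed from Table 3.
S_DIVA = {
    "A": (0.0, 1.0, 0.0), "B": (0.0, 1.0, 1.0), "C": (10.0, 11.0, 8.0),
    "D": (0.0, 1.0, 0.0), "E": (13.0, 4.0, 12.0), "F": (2.0, 6.0, 4.0),
    "G": (0.0, 1.0, 0.0), "H": (9.0, 6.0, 49.0), "I": (0.0, 3.0, 0.0),
}
BBM = {
    "A": (0.0, 1.0, 0.0), "B": (0.0, 2.0, 1.0), "C": (9.0, 10.0, 9.0),
    "D": (0.0, 1.0, 0.0), "E": (13.0, 6.0, 11.0), "F": (3.0, 5.0, 5.0),
    "G": (0.0, 1.0, 0.0), "H": (9.0, 5.0, 50.0), "I": (0.0, 3.0, 0.0),
}


def top_sources(table, k=3):
    """Areas with the most outgoing dispersal events, highest first.

    sorted() is stable, so tied areas keep their table order.
    """
    return sorted(table, key=lambda area: table[area][0], reverse=True)[:k]


print(top_sources(S_DIVA))  # ['E', 'C', 'H']
print(top_sources(BBM))     # ['E', 'C', 'H']
```

Both methods agree on the same three source areas. Under BBM, Shenzhen and the USA tie at 9.0 outgoing events, so their relative order there is not informative.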
The recent new coronaviruses are nested within bat SARS-like coronaviruses and probably evolved in the Wuhan region (E). An early expansion occurred from Wuhan to the USA and Shenzhen. From the USA, recent dispersals are identified toward Wuhan and Shenzhen. Most of the long-distance dispersal events in 2019-nCoV took place from Shenzhen to the USA, Australia, Finland and Japan.