3.1 Sequence alignments among SARS-CoV, MERS-CoV, and 2019-nCoV
First, we found that the sequence consistency between SARS-CoV ORF1ab and MERS-CoV ORF1ab was relatively low over the first 2000 amino acids, markedly higher for amino acids 2000-4000, and very high after amino acid 4000 (see Figure S1). Next, we found that the sequences of SARS-CoV ORF1ab and 2019-nCoV ORF1ab are highly consistent. The only significant mismatch occurs around amino acid 1000. About 25 amino acids in 2019-nCoV ORF1ab have no corresponding sequence in SARS-CoV ORF1ab (see Figure S2). Finally, for MERS-CoV and 2019-nCoV, the mismatch was obvious over the first 1500 amino acids, the consistency improved significantly for amino acids 1500-4000, and it was considerable after amino acid 4000 (see Figure S3). Thus, SARS-CoV ORF1ab and 2019-nCoV ORF1ab are closely related, and the differences between SARS-CoV ORF1ab and MERS-CoV ORF1ab are similar to those between MERS-CoV ORF1ab and 2019-nCoV ORF1ab.
For the S proteins of SARS-CoV and MERS-CoV, the overall similarity was moderate: the consistency of the first 600 amino acids was relatively low, and that of the last 600 amino acids was relatively high (see Figure S4). For the S proteins of SARS-CoV and 2019-nCoV, the overall similarity was quite high, and the amino acids after position 600 in particular showed a high degree of consistency (see Figure S5). For the S proteins of MERS-CoV and 2019-nCoV, the overall consistency was relatively low, and the sequence similarity after position 800 was higher than that before position 800 (see Figure S6).
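The position-dependent trends described above (low consistency near the N-terminus, higher consistency toward the C-terminus) can be quantified by computing identity in windows along a pairwise alignment. The following is a minimal sketch under stated assumptions, not the procedure used to produce Figures S1-S6: the function name `windowed_identity`, the non-overlapping window scheme, and the toy sequences are all hypothetical, and the input is assumed to be two already-aligned strings of equal length with `-` marking gaps.

```python
def windowed_identity(aligned_a, aligned_b, window=10):
    """Return (start_column, percent_identity) for each non-overlapping window.

    Assumption: both inputs are rows of the same pairwise alignment,
    so they have equal length and use '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for start in range(0, len(aligned_a), window):
        columns = list(zip(aligned_a[start:start + window],
                           aligned_b[start:start + window]))
        # A column counts as a match only if both residues are identical
        # and neither is a gap.
        matches = sum(1 for x, y in columns if x == y and x != "-")
        profile.append((start, 100.0 * matches / len(columns)))
    return profile


# Toy alignment (hypothetical sequences, not real viral proteins):
# a divergent first window followed by a fully conserved second window.
seq_a = "MKTLLVAGGA" + "ACDEFGHIKL"
seq_b = "MRSLIV-GCA" + "ACDEFGHIKL"
for start, identity in windowed_identity(seq_a, seq_b, window=10):
    print(f"columns {start}-{start + 9}: {identity:.1f}% identity")
```

For full-length proteins such as ORF1ab, a larger window (for example, several hundred columns) would be more appropriate; the window size here is chosen only to keep the toy example readable.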
We found that the ORF3 protein of SARS-CoV is highly homologous with the ORF3a protein of 2019-nCoV (see Figure S7). The E protein of SARS-CoV is highly homologous with that of 2019-nCoV, while the E protein of MERS-CoV is not very homologous with either (see Figures S8-S10). Likewise, the M proteins of SARS-CoV and 2019-nCoV are highly homologous, while the M protein of MERS-CoV is slightly less consistent with both (see Figures S11-S13). The N protein gives similar results: the N protein of SARS-CoV is highly homologous with that of 2019-nCoV, while the N protein of MERS-CoV is slightly less homologous with those of the other two viruses (see Figures S14-S16).
3.2 Searching for proteins from MERS-CoV which are homologous with those from SARS-CoV and 2019-nCoV
The 2019 novel coronavirus encodes at least 27 proteins, including 15 nonstructural proteins (nsp1-nsp10, nsp12-nsp16), 4 structural proteins (S, E, M, and N) and 8 accessory proteins (3a, 3b, p6, 7a, 7b, 8b, 9b, and ORF14).4,18,19 Although the protein composition of 2019-nCoV is mostly the same as that of SARS-CoV, there are some differences. For example, 2019-nCoV has lost the 8a protein encoded by SARS-CoV. The 8b protein encoded by 2019-nCoV (121 amino acids) is longer than that of SARS-CoV (84 amino acids). The 3b protein also differs significantly between the two viruses. These differences in accessory protein composition mean that 2019-nCoV and the previously emerged SARS-CoV may differ in pathogenesis. They also indicate that the accessory proteins are not preferred targets for the development of broad-spectrum antiviral drugs. As shown in the previous section, the four structural proteins of 2019-nCoV are very close to those of SARS-CoV but quite different from those of MERS-CoV. Consequently, the four structural proteins are also not suitable for the development of broad-spectrum antiviral drugs, although the S protein can be used as a key target for the development of specific vaccines. We therefore focus on the ORF1ab protein. Because the homology between ORF1ab of SARS-CoV and ORF1ab of SARS-CoV-2 (2019-nCoV) is very high, as discussed above, we use ORF1ab of MERS-CoV as a probe to estimate the conservation of ORF1ab among the three viruses considered in this work.
For nonstructural protein 1 (nsp1),20 the search results indicate that the best template for modeling nsp1 of MERS-CoV is 2HSX_A, the NMR structure of nsp1 from SARS-CoV. However, the identity is only 24.32% (see Figure S17), indicating that nsp1 is not highly conserved among different coronaviruses.
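The identity values quoted in this section are percentages of identical residues between a query and a template. As an illustration only, the sketch below computes percent identity from a pairwise alignment in pure Python. It is not the authors' pipeline: the function name `percent_identity`, the `count_gaps` switch, and the toy sequences are hypothetical, and different tools use different denominators (ungapped columns versus full alignment length), so values from this sketch need not match those reported by any particular template search.

```python
def percent_identity(aligned_a, aligned_b, count_gaps=False):
    """Percent of identical residues between two aligned sequences.

    Assumption: inputs are rows of one pairwise alignment ('-' = gap).
    count_gaps=False: denominator is the number of ungapped columns.
    count_gaps=True:  denominator is the full alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a, aligned_b):
        has_gap = x == "-" or y == "-"
        if has_gap and not count_gaps:
            continue  # skip gapped columns entirely
        compared += 1
        if not has_gap and x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy alignment (hypothetical sequences): 8 columns, 2 of them gapped,
# 5 identical residues among the 6 ungapped columns.
query = "MSLV-GKT"
template = "MALVDG-T"
print(f"{percent_identity(query, template):.2f}")                   # ungapped columns
print(f"{percent_identity(query, template, count_gaps=True):.2f}")  # full length
```

The choice of denominator matters most for distantly related pairs such as nsp1, where gapped regions are a larger share of the alignment.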
For nonstructural protein 2 (nsp2), no suitable template was found.
For papain-like proteinase (nsp3),21 we found that the structures of three parts of nsp3 have been determined experimentally. Using the sequences of these three parts, we estimated the suitability of nsp3 as a universal target. The identities between the nsp3 proteins from MERS-CoV and the corresponding proteins from SARS-CoV are approximately 45% (see Figures S18 and S19). This indicates that nsp3 is not highly conserved among different coronaviruses.
For nonstructural protein 4 (nsp4), two templates from feline coronavirus and mouse hepatitis virus are suggested, with identities of 43.01% and 51.14%, respectively (see Figure S20). This indicates that nsp4 is not highly conserved among different coronaviruses.
For proteinase 3CL-Pro (nsp5), we found that the identity between nsp5 from MERS-CoV and that from SARS-CoV is 52.98% (see Figure S21). It should be noted that nsp5 is highly conserved between MERS-CoV and some other coronaviruses. However, these are not highly pathogenic human coronaviruses, so we do not focus on this target here.
For nonstructural protein 6 (nsp6), no suitable template was found.
For nonstructural protein 7 (nsp7), two templates from SARS coronavirus and feline coronavirus are suggested, with identities of 55.42% and 40.96%, respectively (see Figure S22). This indicates that nsp7 is not highly conserved among different coronaviruses.
For nonstructural protein 8 (nsp8), two templates from SARS coronavirus and feline coronavirus are suggested, with identities of 53.30% and 44.85%, respectively (see Figure S23). This indicates that nsp8 is not highly conserved among different coronaviruses.
For nonstructural protein 9 (nsp9), two representative templates from SARS coronavirus and human coronavirus 229E are suggested, with identities of 53.64% and 45.87%, respectively (see Figure S24). This indicates that nsp9 is not highly conserved among different coronaviruses.
For nonstructural protein 10 (nsp10), we found that the identity between nsp10 from MERS-CoV and that from SARS-CoV is 59.42% (see Figure S25). Therefore, we do not consider nsp10 a good choice as a universal target for the design of broad-spectrum antiviral drugs.
For RNA-directed RNA polymerase (nsp12),22 we found that the identity between nsp12 from MERS-CoV and that from SARS-CoV is 72.14% (see Figure S26). This indicates that nsp12 would be a promising target for the development of broad-spectrum antiviral drugs against highly pathogenic human coronaviruses. We also found that the identity between the RNA-dependent RNA polymerase (RdRp) of MERS-CoV and that of foot and mouth disease virus is 14.55%. This shows that the conservation of RdRp among coronaviruses is much higher than that among different types of viruses. We examined the structural biology data currently available for RdRp from these three viruses. For SARS-CoV and SARS-CoV-2, experimental structures of hetero-oligomeric complexes with nsp7 and/or nsp8 are available. For MERS-CoV, no experimental structure has been determined, so the structure must be predicted by homology modeling. The results are summarized in Table 1 and Figure 1.
For helicase (nsp13),23-26 we found that the identity between nsp13 from MERS-CoV and that from SARS-CoV is 72.37% (see Figure S27). This indicates that nsp13 would also be a promising target for the development of broad-spectrum antiviral drugs against highly pathogenic human coronaviruses. We examined the structural biology data currently available for the helicase from these three viruses. For all three viruses, the structures have been determined experimentally. The results are summarized in Table 2 and Figure 2.
For guanine-N7 methyltransferase (nsp14),27-29 we found that the identity between MERS-CoV nsp14 and SARS-CoV nsp14 is 63.22% (see Figure S28). This indicates that nsp14 is also a potential target for the development of broad-spectrum anti-coronavirus drugs. We examined the structural biology data currently available for nsp14, which also carries the proofreading exoribonuclease activity, from these three viruses. For SARS-CoV, the structure has been determined experimentally, while for MERS-CoV and SARS-CoV-2, the structure can only be obtained by homology modeling. The results are summarized in Table 3 and Figure 3.
For uridylate-specific endoribonuclease (nsp15), we found that the identity between MERS-CoV nsp15 and SARS-CoV-2 (2019-nCoV) nsp15 is 51.62%, and the identity between MERS-CoV nsp15 and human coronavirus 229E nsp15 is 47.04% (see Figure S29). Therefore, we do not consider nsp15 a good choice as a universal target for the design of broad-spectrum antiviral drugs.
For 2'-O-methyltransferase (nsp16),28,30 we found that the identity between MERS-CoV nsp16 and SARS-CoV nsp16 is 65.32% (see Figure S30). This indicates that nsp16 is also a potential target for the development of broad-spectrum anti-coronavirus drugs. We examined the structural biology data currently available for 2'-O-methyltransferase from these three viruses. For all three viruses, the protein structure has been determined experimentally. The results are summarized in Table 4 and Figure 4.
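The screening logic of this section can be summarized by collecting the reported identities and keeping the proteins above a cutoff. The sketch below is illustrative only. The identity values are those quoted above for MERS-CoV versus SARS-CoV (nsp3 is the approximate value; nsp15 is the value versus SARS-CoV-2; nsp2, nsp4, and nsp6 are omitted because no SARS-CoV comparison is reported for them). The 60% cutoff is an assumption introduced here, not a threshold stated in the text; it is chosen only because it separates nsp10 (59.42%, not recommended) from nsp14 (63.22%, considered a potential target). The names `REPORTED_IDENTITY`, `ASSUMED_CUTOFF`, and `candidate_targets` are hypothetical.

```python
# Percent identities quoted in this section (MERS-CoV vs SARS-CoV unless noted).
REPORTED_IDENTITY = {
    "nsp1": 24.32,
    "nsp3": 45.0,    # approximate value ("approximately 45%")
    "nsp5": 52.98,
    "nsp7": 55.42,
    "nsp8": 53.30,
    "nsp9": 53.64,
    "nsp10": 59.42,
    "nsp12": 72.14,
    "nsp13": 72.37,
    "nsp14": 63.22,
    "nsp15": 51.62,  # MERS-CoV vs SARS-CoV-2
    "nsp16": 65.32,
}

# Assumption: a 60% cutoff, which is not stated in the text.
ASSUMED_CUTOFF = 60.0


def candidate_targets(identities, cutoff):
    """Return protein names with identity >= cutoff, highest identity first."""
    selected = [name for name, value in identities.items() if value >= cutoff]
    return sorted(selected, key=lambda name: identities[name], reverse=True)


for name in candidate_targets(REPORTED_IDENTITY, ASSUMED_CUTOFF):
    print(f"{name}: {REPORTED_IDENTITY[name]:.2f}%")
```

Under this assumed cutoff the selection reproduces the four proteins highlighted in the text (nsp12, nsp13, nsp14, and nsp16); a different cutoff would change the list.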
3.3 Is it possible to use structural proteins as general targets?
Some HCoVs use cell surface enzymes as receptors, such as ACE2 for SARS-CoV and DPP4 for MERS-CoV. Generally, the S protein of a coronavirus is cleaved into S1 and S2 subunits, and this S1/S2 cleavage is mediated by one or more host proteases. For example, activation of the S protein of SARS-CoV requires sequential cleavage by the endosomal cysteine protease cathepsin L and another trypsin-like serine protease. Unlike SARS-CoV, the S protein of MERS-CoV contains two furin cleavage sites. Host factors may also limit the attachment and entry of HCoVs.
The S protein is therefore very important for virus attachment and entry. Nevertheless, we found that the identity between the MERS-CoV S protein and the SARS-CoV-2 (2019-nCoV) S protein is only 30.26% (see Figure S31), whereas the identity between the SARS-CoV S protein and the SARS-CoV-2 (2019-nCoV) S protein is 76.47% (see Figure S32). This suggests that SARS-CoV and SARS-CoV-2 share the same S protein-mediated molecular mechanism of cell entry, while MERS-CoV uses a different one. Therefore, we think that the S protein, although it is the most important structural protein, should not be used as a general coronavirus target, because it differs so much among different coronaviruses.
Next, we considered the other three structural proteins. For the E protein, the identity between SARS-CoV-2 and SARS-CoV was only 32.76% (see Figure S33). For the M protein, no structural biology data are currently available, so we obtained no useful results. For the N protein, the identity between SARS-CoV-2 and SARS-CoV was less than 60% (see Figure S34). Therefore, these three structural proteins are not ideal targets for the development of broad-spectrum anti-coronavirus drugs.