Temporal evolution of SARS-CoV-2
To gain insight into the temporal evolutionary dynamics of SARS-CoV-2, we ran Markov Chain Monte Carlo (MCMC) analyses, as implemented in the BEAST 1.10.4 package, on the 99 enrolled SARS-CoV-2 genomes. The Generalized Time Reversible model with invariant sites as the site heterogeneity model (GTR+I) was selected as the best-fit nucleotide substitution model by the Akaike Information Criterion (AIC) implemented in jModelTest. The mean evolutionary rate of SARS-CoV-2 was estimated to be 6.14 × 10⁻⁶ subs/site/day (95% HPD: 3.61 × 10⁻⁶ – 8.68 × 10⁻⁶ subs/site/day), corresponding to 2.24 × 10⁻³ subs/site/year (95% HPD: 1.32 × 10⁻³ – 3.17 × 10⁻³ subs/site/year).
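The per-day to per-year conversion quoted above is a simple rescaling, which the following minimal Python sketch reproduces. It assumes a 365-day year (the factor is inferred from the reported numbers and is not stated in the text), and the function name `per_day_to_per_year` is illustrative only.

```python
DAYS_PER_YEAR = 365  # assumption: inferred from the reported values, not stated in the text


def per_day_to_per_year(rate_per_day, days_per_year=DAYS_PER_YEAR):
    """Rescale a substitution rate from subs/site/day to subs/site/year."""
    return rate_per_day * days_per_year


# Values reported in the text (subs/site/day): mean and 95% HPD bounds.
reported = {"mean": 6.14e-6, "hpd_lower": 3.61e-6, "hpd_upper": 8.68e-6}

per_year = {name: per_day_to_per_year(rate) for name, rate in reported.items()}

for name, rate in per_year.items():
    print(f"{name}: {rate:.2e} subs/site/year")
```

Rounded to three significant figures, the rescaled values agree with the per-year estimates reported above (2.24 × 10⁻³, 1.32 × 10⁻³ and 3.17 × 10⁻³ subs/site/year).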
The MCMC reconstruction was summarized as a Maximum Clade Credibility (MCC) tree using the program TreeAnnotator. In the MCC tree (Figure 1), the tMRCA of SARS-CoV-2 was dated to Dec 11, 2019 (95% HPD, Nov 21, 2019 – Dec 24, 2019). Two major clades were also observed in the MCC tree, with a divergence time of Dec 23, 2019 (95% HPD, Dec 18, 2019 – Dec 29, 2019); both clades consist of SARS-CoV-2 strains from Wuhan and other regions of China.
Potentially owing to the strong intervention of the Wuhan city lockdown by the Chinese government, the viral strains circulating in Wuhan were well controlled, while the viral strains outside Wuhan spread worldwide. To emphasize the global transmission of the virus from regions of China outside Wuhan, we excluded the strains circulating in Wuhan and observed that the strains circulating outside Wuhan could be separated into four sub-clades (Figure 1a). The two sub-clades of Clade 1 diverged on Jan 1, 2020 (95% HPD, Dec 27, 2019 – Jan 5, 2020), while the two sub-clades of Clade 2 diverged on Jan 8, 2020 (95% HPD, Jan 3, 2020 – Jan 13, 2020). With respect to the country-specific strains of SARS-CoV-2, we observed that the strains circulating in the USA came from both clades, the strains circulating in the UK and Australia came from Clade 1, and the strains circulating in Singapore, Japan, Germany, France and Italy appeared to come from Clade 2 (Figure 1a, Table S1).
Population dynamics of SARS-CoV-2
To infer the population growth dynamics of SARS-CoV-2, the relative genetic diversity of the virus was reconstructed by Bayesian Skyline Plot (BSP) analysis [16]. The BSP analysis suggested that SARS-CoV-2 maintained a relatively stable effective population size (Ne) during the first month of the outbreak (Dec 23, 2019 to Jan 22, 2020) (Figure 1b). A slow but accelerating reduction in Ne was observed from Jan 22, 2020, with a sharp reduction in the lower 95% HPD bound of Ne from Feb 5, 2020. A sharp reduction in Ne suggests the onset of a bottleneck effect in the virus population size. The bottleneck effect indicates that the currently circulating virus strain was trapped, and that more mutations in the virus genome will occur to help the virus escape, resulting in a leap in the virus population. Although the BSP was generated from a limited sample size, the results suggest the possible onset of a bottleneck effect in the population size of SARS-CoV-2, indicating that more infected cases will occur in the near future owing to the increased mutations in the virus genome.
Clade-/Sub-clade specific genomic mutations of SARS-CoV-2
Although SARS-CoV-2 remains relatively stable, thirteen clade- or sub-clade-specific mutations were observed in the present study (Figure 1a). The mutations at nt 8782 and nt 28144 were clade specific, i.e., C8782T and T28144C occurred only in Clade 1 and not in Clade 2. Only one viral strain in Clade 1 (EPI_ISL_406592 from Guangdong, China) did not possess C8782T, whereas all strains in Clade 1 possessed T28144C. The remaining eleven of the thirteen mutations were sub-clade specific (Figure 1a). Seven mutations were located in Clade 1, among which C29095G and C24034T/T26729C were observed in sub-clades consisting of viral strains from China (outside Wuhan) and the USA, respectively. G28878A and G29742A were observed in a sub-clade of viral strains from Australia and the USA. Four mutations were located in Clade 2, among which C21707T and C28854T were observed in a sub-clade consisting of viral strains from China (outside Wuhan) and the USA. C17373T was observed in a sub-clade of viral strains from China (outside Wuhan), the USA and Singapore. G26144T was observed in a sub-clade of viral strains from the USA, Taiwan, Australia, Sweden, Italy, and Singapore.
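Labels such as C8782T encode the reference base, the 1-based nucleotide position and the observed base. As a minimal sketch of how such labels could be parsed and checked against a strain, the Python example below reports which of the two clade-specific mutations a strain carries. All names (`Substitution`, `parse_substitution`, `carried_markers`) are hypothetical, and the two example strains are made-up illustrations, not data from this study.

```python
import re
from typing import Dict, List, NamedTuple


class Substitution(NamedTuple):
    ref: str       # base in the reference genome
    position: int  # 1-based nucleotide position
    alt: str       # base observed in the mutated strain


_PATTERN = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'C8782T' into (ref, position, alt)."""
    match = _PATTERN.match(label.strip().upper())
    if match is None:
        raise ValueError(f"not a nucleotide substitution label: {label!r}")
    ref, position, alt = match.groups()
    return Substitution(ref, int(position), alt)


# The two clade-specific mutations named in the text.
CLADE1_MARKERS = [parse_substitution(s) for s in ("C8782T", "T28144C")]


def carried_markers(bases_by_position: Dict[int, str],
                    markers: List[Substitution] = CLADE1_MARKERS) -> List[str]:
    """Return the labels of all markers whose mutated base is present."""
    return [
        f"{m.ref}{m.position}{m.alt}"
        for m in markers
        if bases_by_position.get(m.position) == m.alt
    ]


# Hypothetical strains, given as {position: observed base}.
strain_a = {8782: "T", 28144: "C"}  # carries both mutations
strain_b = {8782: "C", 28144: "C"}  # carries T28144C only

print(carried_markers(strain_a))
print(carried_markers(strain_b))
```

The function reports the mutations carried rather than assigning a clade outright, because, as noted above, one Clade 1 strain lacks C8782T; a strict two-marker rule would misplace it.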
Seven of the observed mutations were non-synonymous, i.e., they changed the translated viral protein: two in the nucleocapsid phosphoprotein (C28854T: Ser-Phe; G28878A: Ser-Asn) and one each in the ORF1ab polyprotein (T18488C: Ile-Thr), surface glycoprotein (C21707T: His-Tyr), ORF3a protein (G26144T: Gly-Val), ORF8 protein (T28144C: Leu-Ser), and ORF10 protein (G29742A: Arg-His). Notably, most of the sub-clades possessed one non-synonymous mutation, while one sub-clade consisting of viral strains from Australia and the USA possessed two non-synonymous mutations (G28878A and G29742A). One non-synonymous mutation (G29742A) occurred in the 3’-untranslated region (3’-UTR) of strains circulating in Australia and the USA (Figure 1a, Table 1).
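Whether a nucleotide substitution is synonymous or non-synonymous depends on the codon in which it falls. The Python sketch below illustrates the check using the standard genetic code. The codons in the usage lines are assumptions chosen only to illustrate a Leu-to-Ser change of the kind reported for T28144C and a silent change; they were not read from the genomes analysed here, and the function name is hypothetical.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(codon, offset, alt):
    """Apply a single-base change to a codon and compare the amino acids.

    codon  -- three-letter DNA codon, e.g. "TTA"
    offset -- 0-based position of the changed base within the codon
    alt    -- the new base at that position
    Returns (amino_acid_before, amino_acid_after, is_non_synonymous).
    """
    mutated = codon[:offset] + alt + codon[offset + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    return before, after, before != after


# Illustrative codons (assumed, not taken from the study's sequences).
print(classify_substitution("TTA", 1, "C"))  # TTA -> TCA: ('L', 'S', True)
print(classify_substitution("TCT", 2, "C"))  # TCT -> TCC: ('S', 'S', False)
```

In practice the codon would be taken from the annotated reading frame of the gene containing the mutated position, which this sketch leaves out.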