Point mutations in SARS-CoV-2 variants
To investigate the frequency of point mutations in SARS-CoV-2 variants, we performed phylogenetic network analysis on the 7,804 sequences published in GISAID. These sequences had been collected up to March 23, 2020, and the phylogenetic network analysis identified more than 5,000 point mutation events. Next, we analyzed the locations of these point mutations (Figure 1A). Although the average number of point mutations per 150-nucleotide bin was about 28, several locations showed a higher frequency. To further analyze how point mutations are distributed among genes, we counted the number of point mutations per gene. As shown in Figure 1B, open reading frame (ORF)-1a and ORF-1b contained the most point mutations. However, as shown in Figure 1A, ORF-1a and ORF-1b are much longer than the other regions, which by itself may account for the larger number of mutations; we therefore estimated the rate of point mutations per 100 bases in each gene (Figure 1C). When normalized by gene length, the highest frequency of point mutations occurred in the 5’-untranslated region (UTR) and 3’-UTR. These results indicate that point mutations are present in SARS-CoV-2 variants but do not cluster within the gene coding regions.
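The binning and length normalization described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the toy mutation positions, and the region coordinates are hypothetical and are not taken from the analysis pipeline or from the real genome annotation.

```python
BIN_SIZE = 150  # nucleotides per bin, as stated in the text


def mutations_per_bin(positions, genome_length, bin_size=BIN_SIZE):
    """Count mutation events in consecutive bins of `bin_size` nucleotides.

    `positions` holds 1-based genome coordinates of mutation events.
    """
    n_bins = -(-genome_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        counts[(pos - 1) // bin_size] += 1
    return counts


def rate_per_100_bases(positions, regions):
    """Return mutation events per 100 bases for each region.

    `regions` maps a region name to (start, end), 1-based and inclusive.
    """
    rates = {}
    for name, (start, end) in regions.items():
        n_events = sum(1 for p in positions if start <= p <= end)
        length = end - start + 1
        rates[name] = 100.0 * n_events / length
    return rates


# Toy data (hypothetical): six events on a 300-nt sequence.
toy_positions = [5, 12, 160, 161, 290, 300]
toy_regions = {"5'-UTR": (1, 20), "geneX": (21, 300)}

bins = mutations_per_bin(toy_positions, genome_length=300)
rates = rate_per_100_bases(toy_positions, toy_regions)
print(bins)   # [2, 4]
print(rates)  # the short UTR has fewer events but a higher rate per 100 bases
```

In this toy case the longer region carries more events in absolute terms, yet the short untranslated region has the higher rate once counts are divided by length, which is the effect the normalization is meant to expose.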
Point mutations in SARS-CoV-2 variants are biased with disproportionate mutation to U
Next, we focused on the mutated bases and examined the features of point mutations in SARS-CoV-2 variants. Analysis of the frequency of the substituted base after the point mutation revealed that mutations to uracil (U) occurred approximately four times more often than mutations to adenine (A), cytosine (C), or guanine (G) (Figure 2A). Further analysis revealed that point mutations to U were derived mainly from C (around 2,400 mutations in total) or G (around 1,000 mutations in total), but rarely from A (around 100 mutations) (Figure 2B). Moreover, G-to-A, A-to-G, and U-to-C point mutations were also prominent, each occurring about 500 times. This bias suggests the involvement of host RNA editing, since APOBECs are known to cause C-to-U and G-to-A mutations, whereas ADARs cause A-to-G and U-to-C mutations. The results in Figure 2B are thus partially consistent with the known substrate specificity of APOBECs and/or ADARs. We further analyzed the mutation bias per gene and found that the mutation pattern was similar between genes (Figure 2C, 2D). These results indicate that point mutations in SARS-CoV-2 variants are markedly biased toward U. The mutation patterns were partially consistent with APOBEC-induced and ADAR-induced point mutations, suggesting the involvement of RNA editing enzymes.
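As a minimal sketch of the tally behind this kind of comparison, the snippet below counts substitutions by type and by resulting base. The event list is invented toy data, and the helper names are hypothetical; it assumes each mutation event is available as a (reference base, substituted base) pair, with T in sequence files treated as U.

```python
from collections import Counter

BASES = "ACGU"


def substitution_counts(events):
    """Tally point mutations by type, e.g. 'C>U'.

    `events` is an iterable of (ref_base, alt_base) pairs. T is converted
    to U; pairs with ambiguous bases or no change are ignored.
    """
    counts = Counter()
    for ref, alt in events:
        ref = ref.upper().replace("T", "U")
        alt = alt.upper().replace("T", "U")
        if ref in BASES and alt in BASES and ref != alt:
            counts[f"{ref}>{alt}"] += 1
    return counts


def counts_by_target(counts):
    """Collapse 'X>Y' counts into totals per substituted base Y."""
    totals = Counter()
    for key, n in counts.items():
        totals[key[-1]] += n
    return totals


# Toy events (hypothetical); ("N", "A") is skipped as ambiguous.
toy_events = [("C", "T"), ("C", "T"), ("G", "T"), ("G", "A"),
              ("A", "G"), ("T", "C"), ("N", "A")]

by_type = substitution_counts(toy_events)
by_target = counts_by_target(by_type)
print(by_type["C>U"], by_target["U"])  # 2 3
```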
Context preferences at the mutation site in SARS-CoV-2 variants
The above results suggest the involvement of the host RNA editing machinery. Because RNA editing enzymes show “context preferences”, finding such preferences at the mutation sites would support their involvement. We therefore defined the “context” as the sequence upstream and downstream of the mutated site and analyzed the context preferences. C-to-U and G-to-A mutations are consistent with those caused by APOBECs, and A-to-G and U-to-C mutations are characteristic of ADARs. To examine the involvement of these two kinds of enzymes, we chose four patterns (C-to-U, G-to-A, A-to-G, and U-to-C) and analyzed the bases adjacent to the mutated site, one base upstream (-1) and one base downstream (+1). In C-to-U mutations, the observed proportions of U at position -1 and of G at position +1 were markedly higher than their respective expected proportions (Figure 3A). In G-to-A mutations, C at position -1 and G at position +1 were increased, whereas U at position +1 was decreased (Figure 3B). Similarly, at position +1 of A-to-G mutations, G was increased but A was decreased (Figure 3C). At position -1 of U-to-C mutations, A was increased but U was decreased (Figure 3D). These biases indicate that a specific base is preferred at position +1 or -1 in each of the four point mutation patterns, suggesting context preferences.
Moreover, our results reflected the known context preferences of APOBECs and ADARs. The increase of U at position -1 of C-to-U mutation sites is consistent with the involvement of APOBEC3s (Figure 3A). The increase of G at position +1 of G-to-A mutation sites suggests the involvement of APOBEC3G (Figure 3B): APOBEC3G prefers C at position -1 of C-to-U mutations, so when SARS-CoV-2 is replicated, an APOBEC3G-induced C-to-U mutation on the complementary RNA strand appears in the viral genome as a G-to-A mutation with G at position +1. The increase of G at position +1 of A-to-G mutation sites is consistent with the context preferences of ADARs (Figure 3C).
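The strand argument for G-to-A mutations can be checked on a short toy sequence. The four-base sequence and the helper names below are hypothetical and serve only to illustrate the complementarity reasoning; nothing here models the enzyme itself.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the complementary strand of `rna`, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def index_on_complement(index, length):
    """Map a 0-based index on one strand to the paired base on the other."""
    return length - 1 - index


# Toy genomic fragment (hypothetical): the G at index 1 has a G at +1.
genome = "AGGU"
site = 1

comp = reverse_complement(genome)                # "ACCU"
c_site = index_on_complement(site, len(genome))  # index 2 on the complement

# On the complementary strand the paired base is a C with a C at -1,
# i.e. the context attributed to APOBEC3G in the text.
print(comp[c_site - 1], comp[c_site])            # C C

# A C-to-U change at that site on the complementary strand ...
edited = comp[:c_site] + "U" + comp[c_site + 1:]  # "ACUU"

# ... is copied back into the genomic strand as G-to-A with G at +1.
progeny = reverse_complement(edited)
print(genome, "->", progeny)                     # AGGU -> AAGU
```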
For the most commonly observed point mutation, C-to-U, we expanded the context to the three bases upstream (-3) and downstream (+3) of the mutated site (Figure 3E, 3F). Although A and U were highly abundant in the observed proportions (Figure 3E), the SARS-CoV-2 genome itself contains a high proportion of A and U residues (A: 30%, U: 32%). To exclude this AU-rich bias in the SARS-CoV-2 sequence, we calculated the ratio of the observed proportion to the expected proportion (Figure 3F). We found that U was more frequently present at positions -1 and -2 (p<0.00001), which is consistent with the sequence specificity of APOBEC3s. In addition, G was more commonly found at position +1 (p<0.00001).
These context preferences provide evidence that the RNA editing machinery contributes to the induction of point mutations. Moreover, APOBECs and ADARs are strong candidates for inducing point mutations in SARS-CoV-2 variants.
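The observed-to-expected ratio used in the context analysis can be sketched as below. One assumption is made loudly here: the expected proportion of each base is taken to be its overall proportion in the genome, which the text does not state explicitly. The toy genome, the site list, and the function names are hypothetical, and no significance test is included.

```python
from collections import Counter

BASES = "ACGU"


def base_composition(genome):
    """Proportion of each base in `genome` (used as the expected proportion)."""
    counts = Counter(genome)
    total = sum(counts[b] for b in BASES)
    return {b: counts[b] / total for b in BASES}


def context_enrichment(genome, sites, offsets=(-3, -2, -1, 1, 2, 3)):
    """Observed/expected ratio of each base at each offset from the sites.

    `sites` are 0-based indices of mutated positions in `genome`.
    Assumption: expected proportion = genome-wide base composition.
    """
    expected = base_composition(genome)
    result = {}
    for off in offsets:
        neighbours = Counter(
            genome[s + off] for s in sites if 0 <= s + off < len(genome)
        )
        n = sum(neighbours.values())
        result[off] = {
            b: (neighbours[b] / n) / expected[b]
            for b in BASES
            if n and expected[b] > 0
        }
    return result


# Toy genome (hypothetical) with three C sites treated as C-to-U events.
toy_genome = "UCGAUCGACCGU"
toy_sites = [1, 5, 9]

ratios = context_enrichment(toy_genome, toy_sites, offsets=(-1, 1))
print(round(ratios[-1]["U"], 2))  # 2.67, U over-represented at position -1
print(ratios[1]["G"])             # 4.0, G over-represented at position +1
```

A ratio above 1 means the base occurs next to the mutated site more often than the genome composition alone would predict, which is how the AU-rich background is factored out.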
SARS-CoV-2 variants with increased prevalence of U induce augmented production of inflammatory cytokines
To examine the frequency of point mutations to U within the full length of each RNA sequence, we picked four different sequences from SARS-CoV-2 variants (Figure 4A). These four sequences were derived from Japan, Georgia, France, and Australia. As shown in Figure 4B, the frequency of point mutations to U was much higher than that of point mutations from U to A, G, or C. Several previous studies showed that U-rich ssRNA stimulates innate immune cells through TLR7 signaling to produce inflammatory cytokines [9, 10]. Thus, we hypothesized that the large number of U residues resulting from point mutations enhances the induction of inflammatory cytokines by human macrophages. To test this hypothesis, we analyzed the production of TNF-α and IL-6 in the human monocyte/macrophage cell line THP-1 stimulated with the U-rich regions of SARS-CoV-2 variants (Figure 4B, square symbol). As expected, ssRNA sequences lacking U residues did not upregulate the production of TNF-α (Figure 4C).
The increase in the number of U residues caused by point mutations enhanced cytokine production in variant-1, -3, and -4 compared with stimulation by the reference ssRNA sequence from Wuhan. Although the production of IL-6 was lower than that of TNF-α, it showed a similar tendency (Figure 4D). These results demonstrate that point mutations to U within the SARS-CoV-2 genome increase its ability to stimulate the production of inflammatory cytokines such as TNF-α and IL-6.
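The comparison between mutations to U and mutations from U in a variant can be sketched as a position-by-position count against the reference. This is an illustration only: it assumes the two sequences are already aligned and of equal length, and the sequences and the function name are hypothetical toy examples rather than the variants shown in Figure 4.

```python
BASES = set("ACGU")


def u_gain(reference, variant):
    """Count point mutations to U and from U between two aligned sequences.

    Positions where either sequence has a non-ACGU symbol (gap, N) are
    skipped. Returns (to_u, from_u).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    to_u = from_u = 0
    for ref, var in zip(reference, variant):
        if ref not in BASES or var not in BASES or ref == var:
            continue
        if var == "U":
            to_u += 1
        elif ref == "U":
            from_u += 1
    return to_u, from_u


# Toy aligned sequences (hypothetical).
toy_reference = "ACGUACGU"
toy_variant = "AUGUAUGC"

to_u, from_u = u_gain(toy_reference, toy_variant)
print(to_u, from_u)  # 2 1
print(toy_variant.count("U") - toy_reference.count("U"))  # net gain of 1
```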