We searched for SARS-CoV-2 transcription in 636 human-SARS-CoV-2 RNA-seq datasets, identifying 328 samples with at least 1,000 SARS-CoV-2 reads (S. Table 1). We applied the hyperediting tool12 to these human host samples to trace genuine ADAR hyper-editing (he), retrieving 14,273 SARS-CoV-2 he reads, with an average of 0.036 edited reads per thousand viral reads (S. Table 1). Notably, he along SARS-CoV-2 was identified only by lowering the minimum fraction of edited sites per read to 0.03 instead of 0.05. The he levels correlated poorly with the coverage of SARS-CoV-2 in these samples (Figure 1A, r=0.42, p-value 1.3e-8), and only 49 samples included more than 100 he reads (for a total of 9,568 he reads). In these samples, we showed that he is mostly localized in two conserved hotspots along the SARS-CoV-2 genome: one on the RNA-dependent RNA polymerase (nsp12, position 14221:14331) and the second on nsp6 (position 11058:11162), both encoded within the polycistronic ORF1ab (Figure 1D). Normalizing he by gene expression levels, we demonstrated that he mostly impacted ORF6 (Figure 1C). ORF6 is known to impair the transcriptional induction of ISGs by interacting with STAT1 and STAT213.
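The two quantities used in this paragraph, the rate of he reads per thousand viral reads and the minimum fraction of edited sites per read, can be illustrated with a minimal sketch. The function names and the example numbers are hypothetical, and the per-read filter is a simplified assumption about how such a threshold operates, not the actual logic of the hyperediting tool.

```python
def he_per_thousand(he_reads: int, viral_reads: int) -> float:
    """Hyper-edited reads per thousand viral reads for one sample."""
    if viral_reads <= 0:
        raise ValueError("viral_reads must be positive")
    return 1000.0 * he_reads / viral_reads


def passes_edit_fraction(edited_sites: int, read_length: int,
                         min_fraction: float = 0.03) -> bool:
    """True if the fraction of edited sites in a read reaches the threshold.

    Simplified assumption: the fraction is edited sites over read length.
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return edited_sites / read_length >= min_fraction


# Hypothetical sample: 36 he reads out of 1,000,000 viral reads.
rate = he_per_thousand(36, 1_000_000)

# Hypothetical 100-nt read with 4 edited sites: kept at the 0.03
# threshold, discarded at the 0.05 threshold.
kept_relaxed = passes_edit_fraction(4, 100, min_fraction=0.03)
kept_strict = passes_edit_fraction(4, 100, min_fraction=0.05)
```

Under these assumptions, a read with 4 edited sites in 100 nt is only retained with the relaxed threshold, which is the kind of read that the 0.03 setting recovers.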
The analysis of an additional 227 RNA-seq datasets of SARS-CoV-2 infection in non-human hosts, including hamster, ferret, non-human primates, and mice, revealed relevant he levels exclusively in the mouse samples of one experiment (PRJNA646535, S. Table 1). SARS-CoV-2 infection in these mice was made possible by ACE-transfection, whereas the role of interferon signaling during infection was evaluated by knocking out IFNR or IRF3/714. Notably, the IRF3/7 knock-out mice displayed the highest he levels (1.16‰), followed by control (0.26‰) and IFNR knock-out mice (0.11‰). In these mouse samples, we detected a higher efficiency of he, measured as the number of edited bases per he cluster, compared to SARS-CoV-2 in humans, and the normalized he levels homogeneously impacted SARS-CoV-2 genes (Figure 1C and D).
To verify whether SARS-CoV-2 could interfere with ADAR he of host dsRNAs, we traced he events on the mouse and human genomes. In mice, ADAR he preferentially impacted non-coding genes, although he levels were not influenced by SARS-CoV-2 infection for either protein-coding genes or non-coding elements (S. Figure 1A). Although ADAR expression levels appeared low (<3 TPMs) and only mildly downregulated in knock-out mouse samples, we observed reduced he levels in these latter samples, possibly due to the general reduction of interferon-related genes, which include ADAR (S. Figure 1A). The efficiency of he was similar between SARS-CoV-2 and mouse in all the samples (Figure 1B). Unfortunately, human host reads had been removed from most of the samples with SARS-CoV-2 he. In 7 human samples, we could show that the host he efficiency was higher than that of SARS-CoV-2, with an average of 5 edited bases per cluster compared to 3.5 (Figure 1B).
To further verify the possible implication of ADAR he during SARS-CoV-2 infection in humans, we analyzed post-mortem RNA-seq samples of lung with different SARS-CoV-2 infection levels (N=57)15. We demonstrated that he of host dsRNAs was not influenced by SARS-CoV-2 infection (S. Figure 1B). Similar to mouse samples, ADAR was only mildly modulated by SARS-CoV-2 (1.8x), with mid to low expression levels (<20 TPMs).
According to our results, we speculated that the SARS-CoV-2 genome sequence could confer resistance to ADAR he. The analysis of the dinucleotide composition of ORFs belonging to 70 coronaviruses demonstrated a significant under-representation of the ‘WA’ (W=A/T), ‘TC’ and ‘CG’ motifs (S. Figure 2). While the ‘CG’ motif is known to be under-represented in coronaviruses, in agreement with the effect of the zinc finger antiviral protein (ZAP) and the low-CpG frequency characterizing vertebrate hosts16, the ‘WA’ and ‘TC’ motifs are preferential targets for ADAR and APOBEC3 enzymes, respectively17,18. No significant differences in the inter- and intra-genus under-representation were detectable for ‘WA’ and ‘TC’, except between alpha- and delta-coronaviruses at ‘WA’ (p-value 0.0032, Figure 2, S. Table 2), with ‘WA’ under-representation in ORF1ab and S detected in 96% and 81% of the tested coronavirus genomes, respectively. The second metric that we considered is “the replacement transition fraction”, or repTrFrac, which determines a significantly high mutation susceptibility in an ORF, leading to non-synonymous polymorphisms (nsSNPs). We showed that repTrFrac in ‘WA’ was significantly higher than in ‘TC’ for most of the coronavirus genomes (‘WA’: 78.8 ± 18%; ‘TC’: 31 ± 34.6%, p-value 2.79e-11, Figure 2).
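To make the notion of motif under-representation concrete, the sketch below computes a simple observed/expected ratio for a dinucleotide motif in a single sequence, with the expectation derived from the mononucleotide frequencies alone. This is a swapped-in, simplified measure chosen for illustration; it is not the statistical procedure used for the analysis above, and the function name and toy sequences are hypothetical.

```python
from collections import Counter


def motif_odds_ratio(seq: str, first: str, second: str) -> float:
    """Observed/expected ratio for dinucleotides whose first base is in
    `first` and whose second base is in `second`.

    Values below 1 suggest under-representation relative to the
    mononucleotide composition; values above 1 suggest enrichment.
    """
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    base_counts = Counter(seq)
    # Count overlapping dinucleotides that match the motif.
    observed = sum(
        1 for i in range(n - 1)
        if seq[i] in first and seq[i + 1] in second
    )
    # Expected count if adjacent bases were independent.
    p_first = sum(base_counts[b] for b in first) / n
    p_second = sum(base_counts[b] for b in second) / n
    expected = p_first * p_second * (n - 1)
    if expected == 0:
        raise ValueError("motif bases are absent from the sequence")
    return observed / expected


# 'WA' means A or T followed by A; 'TC' and 'CG' are plain dinucleotides.
toy_orf = "ATGGCTTCAGGATTACGCAATCGTAA"  # hypothetical toy sequence
wa_ratio = motif_odds_ratio(toy_orf, "AT", "A")
tc_ratio = motif_odds_ratio(toy_orf, "T", "C")
cg_ratio = motif_odds_ratio(toy_orf, "C", "G")
```

A ratio computed this way ignores codon usage and amino acid constraints, so it should be read only as a rough indication of the quantity being tested, not as a significance estimate.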
Among beta-coronaviruses, SARS-CoV-2 appears in the upper part of the distribution for ‘WA’ motifs, because of a significant ‘WA’ under-representation in ORFs covering 95% of the genome (ORF1ab, S, ORF3a, M, ORF6, ORF7a, N), whereas for ‘TC’ motifs SARS-CoV-2 is in the lower part of the distribution (ORF1ab and S, Figure 2). We also showed that the ‘WA’ under-representation characterizes most of the non-structural proteins encoded along ORF1ab, except for nsp7, nsp9-12 and nsp16 (Figure 1D).