Coronavirus infection disturbs diverse biological processes in human cells and can stimulate ACE2 expression through IRF1 and STAT1
Coronavirus infection can lead not only to respiratory failure but also to multiple organ dysfunction syndrome, suggesting that coronaviruses affect human cells through common pathways (28). Transcriptome analysis may provide valuable information on how human cells respond to coronavirus entry.
To examine whether coronavirus infection disturbs the expression of specific gene sets in human cells, we analyzed publicly available RNA-seq data from human lung-derived cells infected with MERS-CoV, SARS-CoV, or SARS-CoV-2. By comparing transcriptomes before and after infection, we identified thousands of dysregulated genes (adjusted p-value < 0.05) in each group (Fig. 1A). Among these, 26 genes were commonly upregulated after infection with all three coronaviruses (Fig. 1B), whereas very few genes were commonly downregulated (Fig. 1C). GO analysis of the 26 commonly upregulated genes showed enrichment for inflammation-, immunity- and apoptosis-related pathways (Fig. 1B). Based on the relative content of viral sequences in the transcriptome, the three coronaviruses can infect various human lung-derived cells (Fig. 1D). However, infection at a low viral dose or infection of NHBE cells did not support coronavirus replication (Fig. S1).
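The overlap step described above can be illustrated with a minimal sketch. It assumes that each infection group has already been reduced to per-gene records of log2 fold change and adjusted p-value; the gene names, values, and function names below are hypothetical and are not taken from the datasets analyzed here.

```python
from typing import Dict, Iterable, Set, Tuple

# One differential-expression record: (gene, log2 fold change, adjusted p-value).
DEResult = Tuple[str, float, float]


def upregulated(rows: Iterable[DEResult], alpha: float = 0.05) -> Set[str]:
    """Genes that pass the adjusted p-value cutoff and have a positive fold change."""
    return {gene for gene, log2fc, padj in rows if padj < alpha and log2fc > 0}


def commonly_upregulated(results: Dict[str, Iterable[DEResult]],
                         alpha: float = 0.05) -> Set[str]:
    """Intersect the upregulated gene sets of all infection groups."""
    gene_sets = [upregulated(rows, alpha) for rows in results.values()]
    return set.intersection(*gene_sets) if gene_sets else set()


# Hypothetical toy input; real input would come from a differential-expression tool.
toy = {
    "MERS-CoV": [("GENE_A", 2.1, 0.001), ("GENE_B", 1.3, 0.03), ("GENE_C", -1.0, 0.01)],
    "SARS-CoV": [("GENE_A", 1.8, 0.002), ("GENE_B", 0.9, 0.20), ("GENE_C", -0.7, 0.04)],
    "SARS-CoV-2": [("GENE_A", 3.0, 0.0001), ("GENE_B", 1.1, 0.01), ("GENE_C", 0.5, 0.04)],
}
common = commonly_upregulated(toy)
print(sorted(common))
```

In this toy input only GENE_A is significant and upregulated in all three groups. GENE_B fails the adjusted p-value cutoff in one group, and GENE_C changes direction between groups.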
ACE2 is the cell receptor for SARS-CoV-2 (8, 9). In contrast to the robust expression of ACE2 in Calu-3 cells, ACE2 expression was undetectable in A549 cells, but a low level of ACE2 was observed after SARS-CoV-2 infection (Fig. 1E). This suggests that transcription factors responding to coronavirus infection induce ACE2 expression. A recent report showed that ACE2 can be stimulated by interferon and proposed IRF1- and STAT1-binding sites near the ACE2 transcription start site (Fig. S2) (29). Here, we noticed that expression of both IRF1 and STAT1 increased after SARS-CoV-2 infection, and that ACE2 expression was significantly reduced when IRF1 was depleted in virus-infected human cells or when STAT1 was depleted in interferon-treated human cells (Fig. 1E). These results indicate that IRF1 and STAT1 are essential upstream activators of ACE2 upon virus infection. We therefore propose that SARS-CoV-2 might first enter A549 cells with low efficiency by bulk-phase endocytosis, inducing IRF1 and STAT1 expression, which in turn enhances ACE2 expression to facilitate receptor-mediated viral entry. IRF1 and STAT1 thus appear to be two promising drug targets for limiting coronavirus entry through ACE2.
Coronavirus infection enhanced retrotransposon expression in human lung-derived cells
Next, we asked whether TE expression is affected by coronavirus infection. We first examined the transcriptome of the human lung adenocarcinoma cell line Calu-3 after 24-hr infection with MERS-CoV (30). TE expression was generally activated after coronavirus infection (Fig. 2A). Further examination showed that subfamilies of LINEs, SINEs and LTRs were differentially upregulated by coronavirus (Fig. 2B). LINE-1 is the best-studied autonomous retrotransposon. Most LINE-1 elements are inactivated in somatic cells, but some escape the various silencing mechanisms that have evolved. Hence, we asked whether evolutionarily old and young retrotransposons were affected differently by coronavirus infection. We compared the fold change in expression of specific LINE-1 elements ordered by predicted evolutionary age, and found that older and younger LINE-1 elements were similarly affected (Fig. 2C) (31). One of the major mechanisms of LINE-1 silencing is DNA methylation, so we examined the expression of genes encoding DNA methyltransferases (DNMTs) and ten-eleven translocation (TET) enzymes, which mediate active DNA demethylation. TET genes were generally upregulated after coronavirus infection (Fig. 2D), and increased DNA demethylation activity may lead to demethylation of retrotransposon promoters. This result is consistent with increased retrotransposon expression being caused by genome-wide DNA demethylation. We obtained similar results in MERS-CoV- or SARS-CoV-infected MRC5 cells, which are noncancerous human lung fibroblasts (Fig. 2A-D).
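One way to quantify whether LINE-1 elements of different ages respond differently is a rank correlation between predicted age and fold change. The sketch below is only an illustration of that idea and is not the procedure behind Fig. 2C; the ages and fold changes are hypothetical, and the simple rank formula assumes there are no tied values.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[int]:
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation using the no-ties formula 1 - 6*sum(d^2)/(n*(n^2-1))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Hypothetical LINE-1 subfamilies: predicted age (arbitrary units) and log2 fold change.
ages = [3, 8, 15, 25, 40, 60]
log2fc = [0.9, 1.1, 0.8, 1.2, 1.0, 0.95]
rho = spearman(ages, log2fc)
print(round(rho, 3))
```

A coefficient near zero, as in this toy input, would indicate no strong monotonic relationship between element age and response to infection. A real analysis would also need to handle ties and test significance.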
The recent COVID-19 outbreak is caused by the novel coronavirus SARS-CoV-2. We therefore explored the transcriptomes of SARS-CoV-2-infected A549 and Calu-3 cells. Similar to MERS-CoV and SARS-CoV infection, we found a general increase in the expression of multiple transposable elements (Fig. 2A-B) and no biased effect of SARS-CoV-2 infection on older versus younger LINE-1 elements (Fig. 2C). SARS-CoV-2 infection also upregulated TET gene expression (Fig. 2D). Similarly, SARS-CoV-2 was able to infect human intestinal organoids (Fig. 2E), and increased retrotransposon expression was observed after infection in a time-dependent manner (Fig. 2F).
Therefore, retrotransposon upregulation appears to be a common event induced by coronavirus infection, possibly through enhanced global DNA demethylation activity. Although the three coronaviruses trigger similar upregulation of retrotransposon families, individual retrotransposons are dysregulated differently, which may cause different phenotypes in human cells. Note that the above results were obtained after 24-hr infection, and the impact of long-term infection is expected to be more severe. Moreover, retrotransposons can encode proteins and form retrovirus-like particles (26), so electron microscopy of coronavirus-infected samples may need to distinguish coronavirus particles from retrovirus-like particles arising from retrotransposon upregulation.
Upregulation of retrotransposons may be epigenetically memorized over the long term
We then asked whether retrotransposon upregulation can be inherited long term through several generations of cell division. A mouse model of transgenerational epigenetic inheritance of acquired traits may provide molecular insight into this question.
tRNA-derived small RNAs (tsRNAs) in sperm were reported to transmit abnormal epigenetic information to the preimplantation embryo, and the epigenetic abnormality was further inherited by adult tissue, causing metabolic disorders (32). Two kinds of tsRNAs were previously shown to regulate LTR retrotransposons (33), so we asked whether abnormal retrotransposon activity is heritable during this process. We analyzed the transcriptomes of cleavage-stage mouse embryos and adult islets derived from zygotes injected with sperm tsRNAs from male mice fed a normal or high-fat diet (HFD). LINE, SINE and LTR retrotransposons were all upregulated in 8-cell embryos when HFD tsRNA was injected (Fig. 3A). Notably, LTR retrotransposons were also upregulated in adult islets (Fig. 3B). Further analysis of LTR families indicated that upregulation of ERV1 expression was inherited from the early embryo (Fig. 3C) to the adult islet (Fig. 3D), probably through inheritance of DNA methylation at ERV1 loci. These results therefore indicate that enhanced retrotransposon expression, of ERV1 in this case, can be inherited long term through many rounds of cell division, even from cleavage-stage embryos to adult tissues, with changes in DNA methylation as the potential molecular mechanism (Fig. 3E).
SARS-CoV-2 RNA forms chimeric transcripts with retrotransposon RNA, especially LINE, for potential insertion into the host genome
Coronaviruses are RNA viruses and are not expected to integrate into the host genome by themselves. However, several RNA viruses have been reported to recombine with retrotransposons and thereby invade the host genome (34, 35). Given that SARS-CoV-2 RNA contributes as much as 15.32% of the total transcriptome in infected Calu-3 cells (Fig. 1D), we searched the transcriptome for potential chimeric transcripts of SARS-CoV-2 and cellular RNA, and obtained a subtranscriptome of chimeric reads.
We found that 0.23% of SARS-CoV-2 RNA formed chimeric transcripts with non-TE genes and 0.14% with TEs (Fig. 4A). Surprisingly, TE-virus chimeric reads account for 37.36% of total mapped chimeric reads, whereas TE reads account for only 2.83% of total mapped reads (Fig. 4B), indicating that TEs form chimeric transcripts with SARS-CoV-2 RNA much more efficiently than non-TE genes do. We randomly extracted reads from the subtranscriptome of chimeric transcripts of SARS-CoV-2 and cellular RNA and confirmed the identity of the chimeric reads (Fig. 4C).
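The size of this effect can be expressed as a fold enrichment, that is, the share of TE reads among chimeric reads divided by the share of TE reads among all mapped reads. The short sketch below uses only the two percentages reported above; the function name is ours, and the ratio is a descriptive summary rather than a statistical test.

```python
def fold_enrichment(share_in_subset: float, share_in_background: float) -> float:
    """Ratio of a category's share in a subset to its share in the background."""
    if share_in_background <= 0:
        raise ValueError("background share must be positive")
    return share_in_subset / share_in_background


# Percentages reported in the text (Fig. 4B).
te_share_chimeric = 37.36  # TE-virus reads as % of total mapped chimeric reads
te_share_total = 2.83      # TE reads as % of total mapped reads

enrichment = fold_enrichment(te_share_chimeric, te_share_total)
print(f"TE reads are about {enrichment:.1f}-fold enriched among chimeric reads")
```

With these values, TE reads are roughly 13-fold more frequent among chimeric reads than in the transcriptome as a whole.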
We further analyzed the distribution of TE subfamilies in the total transcriptome and in the subtranscriptome of chimeric reads, and found that reads of LINE, SINE and LTR retrotransposons were all enriched in the subtranscriptome of chimeric reads (Fig. 4D). Unexpectedly, only LINE RNA was more highly represented in the chimeric-read subtranscriptome than in the total transcriptome, and further analysis showed that virus-LINE-1 reads were overrepresented among virus-LINE reads (Fig. 4E). This demonstrates the high efficiency of the LINE family, especially LINE-1, in forming chimeric transcripts with SARS-CoV-2 RNA. LINE-1 is an autonomous retrotransposon with retrotransposition activity, and RNA-RNA ligation mediated by the endogenous RNA ligase RtcB was previously reported to allow LINE-1 to carry other types of RNA for invasion of the host genome (36), so similar mechanisms may apply to SARS-CoV-2 transcripts. Further analysis of genomic DNA from SARS-CoV-2-infected human cells or biopsies will be particularly important to identify whether coronavirus RNA is integrated into the human genome.
Moreover, to identify which regions of SARS-CoV-2 RNA are prone to forming chimeric transcripts with cellular RNA, we aligned the total transcriptome and the subtranscriptome to the SARS-CoV-2 genome and inspected the alignments in IGV. The front and rear parts of the coronavirus RNA, especially the rear part, were overrepresented in chimeric transcripts (Fig. 4F). This suggests that the front and rear parts of the SARS-CoV-2 genome are more likely to be inserted into the human genome for prolonged expression, so that some people who test positive in a nucleic acid test may only have a history of infection and carry silent viral fragments rather than live coronavirus. Taken together, we suggest that primers and probes for SARS-CoV-2 testing be designed in the middle of the SARS-CoV-2 genome.
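A positional bias of this kind can also be summarized numerically rather than by visual inspection. The following sketch is an assumption-laden illustration and not the IGV-based procedure used here: it splits the viral genome into equal bins, computes the fraction of read start positions in each bin for the chimeric and total read sets, and reports their ratio. The read positions are hypothetical, and the genome length is rounded to 30,000 nt for simplicity.

```python
from collections import Counter
from typing import List, Sequence


def bin_fractions(positions: Sequence[int], genome_length: int, n_bins: int = 3) -> List[float]:
    """Fraction of read start positions falling into each equal-width genome bin."""
    if not positions:
        raise ValueError("positions must not be empty")
    counts = Counter(min(pos * n_bins // genome_length, n_bins - 1) for pos in positions)
    return [counts.get(b, 0) / len(positions) for b in range(n_bins)]


def positional_bias(chimeric: Sequence[int], total: Sequence[int],
                    genome_length: int, n_bins: int = 3) -> List[float]:
    """Per-bin ratio of chimeric-read fraction to total-read fraction (nan if no total reads)."""
    chim = bin_fractions(chimeric, genome_length, n_bins)
    tot = bin_fractions(total, genome_length, n_bins)
    return [c / t if t > 0 else float("nan") for c, t in zip(chim, tot)]


# Hypothetical 0-based read start positions on a genome rounded to 30,000 nt.
GENOME_LENGTH = 30000
total_reads = [1000, 5000, 12000, 15000, 18000, 22000, 26000, 29000]
chimeric_reads = [200, 800, 14000, 27000, 28500, 29500, 29900, 29950]

bias = positional_bias(chimeric_reads, total_reads, GENOME_LENGTH)
print([round(b, 2) for b in bias])  # order: front, middle, rear third
```

A ratio above 1 marks a region that contributes more to chimeric reads than its overall coverage would predict. In this toy input the rear third is overrepresented and the middle third is underrepresented.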
The model of coronavirus-retrotransposon interaction
Based on the above analysis, we propose that coronavirus infection may increase retrotransposon expression by modulating TET activity to reduce global DNA methylation. Increased retrotransposon RNA may then form chimeric transcripts with coronavirus RNA and integrate viral genomic fragments into the human genome. Moreover, enforced retrotransposon expression may be harmful and is probably inherited long term (Fig. 5A).
TEs are widely expressed in human tissues (Fig. 5B), with the highest enrichment in early human embryos (Fig. 5C). The cells used in this study are mainly derived from human lung and also robustly express TEs (Fig. 5D). Moreover, the expressed TE subfamilies vary between cell types (Fig. 5E-G), suggesting that global retrotransposon upregulation has extensive but cell-type-specific phenotypic consequences.
The first concern regarding global retrotransposon upregulation is genome instability. Retrotransposition activity is high in the early embryo (26) and in the brain (37) during normal development, so potential integration of coronavirus sequences into the human genome should be scrutinized in these cells. Retrotransposon upregulation has also been reported to correlate positively with tumor progression (38), causing genomic deletion, translocation and duplication (39). In addition, increased expression of the retrotransposon LINE-1 contributes to age-associated inflammation in several tissues (40), vapers and smokers show higher retrotransposon expression and hypomethylation at associated loci (41), and people with neurological disorders may have higher retrotransposon expression and retrotransposition activity (42). These reports not only show that retrotransposon upregulation may cause several diseases, but also suggest that people with a higher basal level of retrotransposon expression may be more susceptible to coronavirus infection and at increased risk of symptomatic infection. In support of this, recent analyses of SARS-CoV-2 patients showed that cancer patients (43) and aged people (44) develop more severe symptoms after infection. Therefore, inhibition of reverse transcriptase activity in human cells may be necessary during pharmaceutical treatment of coronavirus-infected patients, especially those with a higher basal level of retrotransposon expression.
The second concern regarding global retrotransposon upregulation is disturbance of the expression of genes adjacent to retrotransposons. Accumulating evidence shows that retrotransposons are not just genomic fossils but have molecular functions. For example, a physically adjacent retrotransposon activates the promoter of TMEM156 by a readthrough mechanism (Fig. 5H, Fig. S3). In addition, transcripts of LINEs, SINEs and low-complexity repeats physically interact with specific genomic regions to play distinct roles (45).
The third concern regarding global retrotransposon upregulation is whether coronavirus RNA can enter the nucleus and associate with specific genomic regions through sequence homology, similar to the behavior of retrotransposon RNA (21, 45). BLAST analysis at NCBI using the SARS-CoV-2 genome found no similar sequence in the human genome. We further analyzed the SARS-CoV-2 genome with the CENSOR program (46), and all predicted candidate repetitive elements are shorter than 200 bp. Therefore, no evidence supports the idea that SARS-CoV-2 RNA can recognize the human genome through homologous sequences, even if these transcripts enter the nucleus by chance.