Characterization of the regulatory genomic landscape in nasal mucosa by WGBS
Nasal samples from 61 subjects positive for the alpha or beta variants of SARS-CoV-2 presenting to a single center at the time of symptomatic presentation concerning for COVID-19 (4 severe and 57 mild) from April 8, 2020, through June 8, 2020 were included in the study. Demographic features of hospitalized (defined as severe; n = 4) versus non-hospitalized (defined as mild; n = 57) patients are summarized in Table 1. Of note, all hospitalized subjects required intensive care unit admission. Three of the four hospitalized subjects required supplemental oxygen support with two of the four hospitalized subjects requiring intubation and mechanical ventilation.
Table 1
Demographic features of hospitalized versus non-hospitalized COVID-19 positive subjects.
Item
|
Hospitalized (n = 4)
|
Non-hospitalized (n = 57)
|
aAge (years)
|
60 (50, 66.75)
|
37 (27, 51)
|
bGender
Female
Male
|
1 (25)
3 (75)
|
25 (44)
32 (56)
|
bRace
Black
White
Other
|
1 (25)
0 (0)
3 (75)
|
27 (47)
13 (23)
17 (30)
|
a Presented as median (IQR)
b Presented as n (%)
|
WGBS data was generated at high depth, identifying on average 13.2 million CpGs per sample each at > 10X coverage. Hierarchical clustering was performed on the top 25th percentile most variably methylated regions showing no clustering structure for confounders such as self-reported race and gender (Supplemental Fig. 1). Using the combined WGBS datasets of severe and mild COVID-19 cases, we characterized active regulatory regions in nasal mucosa. Specifically, we performed methylation segmentations to extract unmethylated regions (UMR) and low methylated regions (LMR) which are known to correlate with promoter- and enhancer-like elements, respectively [25]. We identified 19,187 UMRs (average 2,366 bp) with methylation across the regions being < 5% and containing on average 117 CpGs per region. LMRs were, as expected, more abundant identifying 43,924 regions with an intermediate methylation status (5–50%) and more CpG-sparse (8 CpGs per region, average 642 bp). We annotated these regions based on publicly available reference maps of regulatory DNA based on DNase I hypersensitive sites (DHSs) across 16 different cell types [32] and found 99% and 98% of UMRs and LMRs, respectively, overlapped a DHS. Of these annotated regions, the vast majority (98%) of the UMRs overlapped with a DHS detected in multiple cell types. In contrast, LMRs were shown to represent to a larger extent cell-specific regulatory DNA with 27% of LMRs (N = 11,976) overlapping a DHS unique to a specific cell type. Of these 11,976 cell-specific LMRs from our aggregated COVID-19 positive samples (severe and mild disease), we noted 11% and 20% being lymphoid and myeloid regulatory elements, respectively (Supplemental Table 1).
To better characterize the cell types in which differential methylation may be associated with COVID-19 severity, we examined the UMRs and LMRs of severe and mild WGBS datasets separately, as well as WGBS data from nasal samples derived from non-infected individuals. While there were not substantial differences in the cell-type proportions of identified DHSs between severe and mild cases of COVID-19 (Supplemental Table 2), there were notable differences between COVID-19 positive (combined severe and mild cases) and COVID-19 negative individuals, particularly in the case of LMRs (Fig. 1, Supplemental Table 1). Most notably, when comparing COVID-19 positive versus negative individuals, we found that 46% compared to 25% of LMRs overlapped with immune cell regulatory elements (X2 = 5189.6, df = 1, p < 2.2 x 10–16). More specifically, in COVID-19 positive compared to COVID-19 negative individuals, 24% versus 15% of LMRs overlapped with lymphoid cell regulatory elements (X2 = 1264.0, df = 1, p < 2.2 x 10–16), and 28% versus 12% of LMRs overlapped with myeloid cell regulatory elements (X2 = 4557.1, df = 1, p < 2.2 x 10–16). While the same trend was seen for UMRs, the magnitude of effect was strikingly stronger when contrasting LMRs in COVID-19 positive versus COVID-19 negative individuals (Fig. 1).
When narrowing our focus to only those UMRs and LMRs appreciated for a single cell type, we noted similar findings. Our comparisons of cell-type specific UMRs and LMRs were relatively similar between those with severe and mild disease (Supplemental Table 2). However, when comparing the cell-specific UMRs and LMRs in SARS-CoV-2 positive as compared to negative individuals, substantial differences were appreciated in LMR distribution. Specifically, when examining cell-specific UMRs in COVID-19 positive versus negative individuals, 8% and 4% of cell-specific UMRs overlapped with immune regulatory elements (p = 0.064) and 0.6% and 3% overlapped with epithelial regulatory elements (p = 0.032) (Supplemental Table 1). Regarding cell-specific LMRs in COVID-19 positive as compared to negative individuals, 31% and 8% of cell-specific LMRs overlapped immune regulatory elements (p = 0.0037), 11% and 4% overlapped with lymphoid regulatory elements (p = 5.6 x 10–16), 20% and 3% overlapped with myeloid regulatory elements (p < 2.2 x 10–16), 2% and 3% overlapped with pulmonary elements (p < 2.2 x 10–16), and 3% and 11% overlapped with epithelial regulatory elements (p < 2.2 x 10–16) (Supplemental Table 1).
In all, these results point towards activation of immune cells in nasal mucosa after SARS-CoV-2 infection and indicate that WGBS can capture methylation signatures specific to these cell types. This suggests that our WGBS analysis of the nasal methylome can be used to infer differential regulation of immune-mediated pathways in the setting of severe as compared to mild SARS-CoV-2 infection.
Preponderance of hypomethylated regions in severe COVID-19 patients
To identify DMRs between individuals suffering from severe COVID-19 (i.e. requiring inpatient admission) versus individuals experiencing mild COVID-19 (i.e. remaining outpatient) we performed tiling window analysis using the WGBS data sets and logistic regression models with the self-reported measures of age, race, and gender included as covariates. To evaluate meaningful methylation differences, the top 10,000 DMRs as ranked by q-value (q-value < 1.20 x 10–15) were selected for further analysis. These DMRs demonstrated a preponderance of hypomethylated regions (n = 7,256) as compared to hypermethylated regions (n = 2,744) in hospitalized versus non-hospitalized subjects (Fig. 2A). Among these regions, 3,599 (49.6%) hypomethylated DMRs fell in intergenic regions as compared to hypermethylated DMRs where only 378 (13.8%) mapped to similar regions (Χ2 = 1,064.4, df = 1, p < 2.2 x 10–16). We then queried genes associated with hypo- and hypermethylated DMRs using Genomic Regions Enrichment of Annotations Tool (GREAT) algorithm [33, 34]. This yielded associations with 487 genes hypomethylated regions and 503 genes in hypermethylated regions (Supplemental Tables 3 and 4). This similarity in gene counts associated with hypo- versus hypermethylated regions despite the substantial difference in DMR suggests interdependency of regulatory elments involved in gene activation upon SARS-CoV-2 infection. Of note, this state of global hypomethylation has been previously appreciated in response to other viral infections [35].
To discern the relevant biologic pathways associated with these differentially methylated genes (DMGs) we performed pathway enrichment analysis. Enrichment of hypomethylated and hypermethylated DMGs in severe versus mild COVID-19 patients were carried out separately. Pathway enrichment analysis using Coronascape [36] was performed on 487 relatively hypomethylated genes with a p-value threshold of ≤ 0.01 (Log10P ≤-2) resulting in 353 Gene Ontology (GO) processes [37, 38], 28 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways [39], 21 Reactome Gene Sets [40], 10 cannonical pathways [41], and 2 CORUM complexes [42] (total 414 enriched pathways) previously associated with SARS-CoV-2 infection (Supplemental Table 5). Of these, associations with p ≤ 10− 3 were manually evaluated (n = 124) and results associated with immune response were extracted (n = 17) (Fig. 2B, C). Themes emerging from these relatively hypomethylated immune-related pathways in severe compared to mild COVID-19 cases included processes related to Th1, Th2 and Th17 cell differentiation, regulation of T lymphocytes and hematopoietic cell lines, regulation of the Notch signaling pathway, cytokine production and chemotaxis, and regulation of macrophages and phagocytosis (Fig. 2B). In the similar analysis of relatively hypermethylated DMGs in severe versus mild COVID-19 cases, 423 GO Biological processes [37, 38], 40 KEGG pathways [39], 57 Reactome Gene Sets [40], and 12 cannonical pathways [41] (total 532 enriched pathways) met the p-value threshold of ≤ 0.01 (LogP ≤-2), while 181 had an associated p-value of ≤ 10− 3 (Supplemental Table 6). Of these 181 enriched pathways, 36 were identified as being associated with the immune response (Fig. 2C). Emerging themes of these enriched immune-related pathways associated with hypermethylated genes in severe versus mild COVID-19 patients included processes associated with cell activation, proliferation, and differentiation including hematopoietic cells from myeloid (e.g., neutrophils) and lymphoid (e.g., T cells) populations, leukocyte migration and adhesion, neutrophil degranulation, adaptive and leukocyte mediated immunity, cytokine signaling, and host stress responses (Fig. 2C). These cumulative findings support previous reports of aberrancies in immune response in the face of mild/moderate as opposed to severe COVID-19 [43–45].
Differentially methylated genes in severe COVID-19 overlap genetic loci
To add validity to our identified differentially methylated gene list in individuals with severe as compared to mild SARS-CoV-2 infection, we used data from a recent GWAS meta-analysis published by the COVID-19 Host Genetics Initiative [46] reporting 23 genetic loci associated with SARS-CoV-2 infection susceptibility and/or COVID-19 disease severity. Within our dataset, we identified differential methylation of eight genes that overlapped with those implicated in this previously published meta-analysis (Table 2). Within our dataset, we found that relative hypermethylation of ABO was present in severely as compared to mildly affected individuals. In severe as compared to mild cases, we noted relative hypomethylation of FDPS, CEP97, HLA-DPA1, OBP2B, MUC5B, PPP1R15A, and NAPSA.
Table 2
DMGs overlapping with known loci associated with COVID-19 risk.
Gene
|
Chromosome
|
Start
|
Stop
|
Methylation Difference (%)
|
q-value
|
FDPS
|
chr1
|
155306501
|
155307000
|
-12.73
|
3.32 x 10–19
|
CEP97
|
chr3
|
101727251
|
101727750
|
-19.01
|
4.03 x 10–16
|
HLA-DPA1
|
chr6
|
33080751
|
33081250
|
-20.08
|
1.07 x 10–16
|
OBP2B
|
chr9
|
133211251
|
133211750
|
-15.64
|
8.28 x 10–16
|
ABO
|
chr9
|
133275751
|
133276250
|
13.43
|
2.14 x 10–29
|
MUC5B
|
chr11
|
1225751
|
1226250
|
-23.41
|
1.11 x 10–16
|
PPP1R15A
|
chr19
|
48871251
|
48871750
|
-12.33
|
7.39 x 10–23
|
NAPSA
|
chr19
|
50369501
|
50370000
|
-35.15
|
3.38 x 10–43
|
Differential methylation of genes within the PI3K/Akt pathway and COVID-19 severity
Upon closer manual evaluation of genes implicated in various enriched GO terms, KEGG pathways, and Reactome gene sets, we noted differential methylation between severe and mild COVID-19 individuals in many genes involved in the PI3K/Akt pathway (Fig. 3). Specifically, in the severe cohort compared to those with mild COVID-19, we noted relative hypomethylation of promoters whose genes have interplay with the PI3K/Akt pathway and SARS-CoV-2 infection, including genes within the NF-κB (e.g., NFKBIA) and Notch (e.g., NOTCH1) signaling pathways (Fig. 3A). Genes mapping to the Notch signaling pathway were similarly appreciated as a significantly hypomethylated GO term in our enrichment analysis of severe as compared to mild COVID-19 patients (Fig. 2B).
Comparing severe versus mild SARS-CoV-2 infection, DMGs included relative hypomethylation of the AKT1 promoter (chr14: 104796001–104796500, methylation difference = -18.38%, q = 2.22 x 10–22) and of its downstream target, the type I interferon signaling molecule, ISG15 (chr1: 1013751–1014250, methylation difference = -32.27%, q = 2.49 x 10–35) (Fig. 3B, C). Further support implicating the PI3K/Akt pathway in the severe COVID-19 phenotype are the relative hypomethylation of ZEB2 (chr2: 144524251–144524750, methylation difference = -24.39%, q = 2.52 x 10–32) and SNAI1(chr20: 49983251–49983750, methylation difference = -28.98%, q = 6.11 x 10–16) promoters in severe versus mild disease. Coordinates and methylation difference of DMRs in genes associated with the PI3k/Akt pathway from our dataset are summarized in Supplemental Table 7.
In order to validate our findings indicating relative upregulation of the PI3K/Akt pathway in severe as compared to mild cases, we utilized the publicly available single cell RNA sequencing (scRNA-seq) data from Chua et al. accessed through the UCSC Cell Browser (https://covid-airways.cells.ucsc.edu) [45, 47]. In the dataset of Chua et al. we were able to examine scRNA-seq data derived from nasopharyngeal samples of individuals with critical cases of COVID-19, moderate cases, and control subjects (Fig. 4A). These data similarly showed greater expression of AKT1, TLR5, and MARK4 in secretory and ciliated cells of individuals with moderate and severe COVID-19 as compared to healthy controls. ZEB2, NOTCH1, and ITGB2, showed increased expression in neutrophils and non-resident macrophage populations of individuals with severe COVID-19. IRF7, NFKBIA, and ISG15 showed upregulation in secretory, ciliated, and squamous cells, as well as in cytotoxic T lymphocytes, T regulatory cells, non-resident macrophages, and neutrophils of severe/moderate cases compared to healthy controls (Fig. 4, Supplemental Fig. 2). These comparative data further strengthen our findings suggesting differential immune cell expression of PI3K/Akt-related genes in the nasopharynx of individuals with severe COVID-19 disease.
Hypermethylation of immune cell surface markers in severe cases of COVID-19
Differential expression of various immune cell surface makers have been noted in the setting of severe SARS-CoV-2 infections, including CD15 (the gene product of FUT4) and CD8 [8, 43, 48–55]. In examination of the most significant differentially methylated bins in severe versus mild COVID-19 cases, we found the FUT4 promoter to be relatively hypermethylated over a long region in hospitalized as compared to non-hospitalized subjects (chr11: 94545001–94552500, q = 2.90 x 10–112) (Fig. 5A). To further evaluate this finding, we visualized this region using the UCSC Genome Browser [56] comparing severe to mildly infected individuals, in addition to pooled samples from healthy controls (n = 7) and found that this relatively hypermethylated state of the FUT4 promoter among severe COVID-19 patients persisted across comparison groups (Fig. 5A). CD15, the gene product of FUT4, is predominately expressed in myeloid cells, as is shown in Fig. 5B, generated using the dataset of Monaco et al. [57] and the Human Protein Atlas (proteinatlas.org) [58]. We additionally noted significant hypermethylation of the CD8A promoter (chr2: 86809001–86809750, q = 1.50 x 10–21). Cumulatively, these data provide evidence that myeloid cell dysfunction is involved in COVID-19 severity, and that differential regulation of cell surface genes (e.g., FUT4, CD8A) potentially play a contributory roles in disease pathogenesis.
Enrichment of ELF4 and ELF5 transcriptional motifs in hypermethylated regions
Table 3: Most significantly enriched transcription factor binding motifs of hypomethylated regions in hospitalized versus non-hospitalized COVID-19 patients.
aFold change represented as % Targeted Sequences/% Background sequences
To better understand the regulatory pathways involved in COVID-19 pathogenesis and severity, we examined transcription factor binding sites among DMRs in severe versus mild COVID-19 samples. The top 25,000 hypo- and hypermethylated bins as determined by q-value between severe and mildly infected individuals were evaluated separately using HOMER motif analysis (Table 3, 4) [59]. Evaluation of hypermethylated regions revealed targets of ELF4 (p = 1 x 10–59, target sequences with motif = 11.34%, background sequences with motif = 8.33%) and ELF5 (p = 1 x 10–54, target sequences with motif = 8.84%, background sequences with motif = 6.31%) as among the most significantly enriched motifs (Table 4).
Table 4: Most significantly enriched transcription factor binding motifs of hypermethylated regions in hospitalized versus non-hospitalized COVID-19 patients.
aFold change represented as % Targeted Sequences/% Background sequences
Among these relatively hypermethylated DMRs in hospitalized as compared to non-hospitalized individuals, 2,829 were found to be targets of ELF4. More specifically, when annotating these regions based on DHSs [32], we found that 11.2% (n = 317) of these ELF4 targets overlapped signatures of myeloid cells as compared to non-ELF4 targets, of which 8.6% (n = 1,906) overlapped with myeloid signatures (X2 = 20.75, df = 1, p = 5.23 x 10− 6). Regarding lymphoid signatures, we found that the percentage of ELF4 and non-ELF4 targets overlapping with lymphoid cell DHSs were roughly equivalent, 9.9% (n = 280) and 9.1% (n = 2,024), respectively (X2 = 1.68, df = 1, p = 0.19). These findings are in keeping with the known preferential upregulation of ELF4 within myeloid cell lines (Supplemental Fig. 3) as well as its known role host antiviral response [60]. The distribution of ELF4 transcription factor binding motifs across cell types is summarized in Supplemental Fig. 4).
We identified 2,199 relatively hypermethylated DMRs as targets of ELF5 in severe as compared to mild COVID-19 individuals. Additionally, we noted relative hypermethylation (i.e., downregulation) of the ELF5 promoter in COVID-19 positive individuals (severe or mild) as compared to COVID-19 negative (n = 7) patients (Fig. 6). Recently, Pietzner et al. [61]suggested several genes to be potentially regulated or co-expressed with ELF5, of which 23 were also identified among the within our dataset as being relatively hypermethylated targets of ELF5 in individuals with severe as compared to mild COVID-19 (Supplemental Table 8). Notably, we found that C1orf116 (chr1: 207031251–207031750, q = 7.85 x 10− 7), PLAC8 (chr4: 83128251–83128750, q = 4.43 x 10− 10), and IFRD1 (chr7: 112421501–112422000, q = 8.15 x 10− 9) were among these relatively hypermethylated ELF5 targets in severe as compared to mild COVID-19, all of which have been implicated in the host response to SARS-CoV-2 infection [62–64].