To investigate potentially functional SNPs in APOBEC3 genes involved in COVID-19 severity, we evaluated the COVID-19 association signals around the 7 APOBEC3 genes (APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, and APOBEC3H) in the two COVID-19 hospitalization GWASs of European and African ancestry (HGI-B2-EUR and HGI-B2-AFR, respectively). Around these 7 APOBEC3 genes, using an arbitrary association threshold of P < 0.01, we obtained 2 SNPs from HGI-B2-AFR and 4 SNPs from HGI-B2-EUR. Of these 6 COVID-19-associated SNPs, rs12168809 (P = 0.002; OR = 1.12; 95% CI [1.04–1.2]) and rs76929059 (P = 0.004; OR = 0.82 [0.71–0.94]) are unique to AA; they are located in the intergenic and promoter regions of APOBEC3A, respectively. The other 4 SNPs, rs2076109 (P = 0.008; OR = 1.04 [1.01–1.08]), rs1807558 (P = 0.01; OR = 1.04 [1.01–1.08]), rs2244104 (P = 0.008; OR = 0.96 [0.93–0.99]), and rs13057307 (P = 0.009; OR = 1.03 [1.01–1.05]), are unique to EA (Fig. 1A). It is important to point out that these SNPs are only nominally significant in AA or EA. In conclusion, 2 and 4 prioritized SNPs close to the APOBEC3 gene cluster were nominally associated with COVID-19 hospitalization in AFR and EUR samples, respectively.
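As a rough consistency check on the statistics reported above, a two-sided P value can be approximated from an odds ratio and its 95% confidence interval. The sketch below is illustrative only and is not the method used by the GWASs: it assumes a Wald test with a confidence interval that is symmetric on the log-odds scale, and the function name `wald_p_from_or` is hypothetical. Because the reported intervals are rounded to two decimals, the recovered P value is only approximate, especially for odds ratios close to 1.

```python
import math


def wald_p_from_or(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate the Wald z statistic and two-sided P value from an OR and its 95% CI.

    Assumption: the CI is symmetric on the log-odds scale (Wald interval).
    """
    log_or = math.log(odds_ratio)
    # A 95% CI on the log scale spans 2 * z_crit standard errors.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided tail probability of the standard normal: erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# rs12168809 in HGI-B2-AFR: OR = 1.12, 95% CI [1.04-1.2] (values from the text).
z, p = wald_p_from_or(1.12, 1.04, 1.20)
print(f"rs12168809: z = {z:.2f}, approximate P = {p:.4f}")
```

For rs12168809 the approximation lands close to the reported P = 0.002.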
To investigate whether these SNPs are polymorphisms specific to populations of particular ancestries, we checked the minor allele frequencies (MAFs) of these 6 SNPs in five major populations (African [AFR], Admixed American [AMR], East Asian [EAS], South Asian [SAS], and European [EUR]) as well as in their corresponding subpopulations (Fig. 1B). We found that the protective SNP rs76929059, derived from HGI-B2-AFR, is polymorphic only in AFR populations (MAF = 0.07). The other SNP from HGI-B2-AFR, rs12168809, is a risk SNP for COVID-19 hospitalization and shows a higher MAF in AFR (MAF = 0.43) than in the other 4 major populations (mean MAF = 0.19 ± 0.057). Of the 4 SNPs that emerged from HGI-B2-EUR, the risk SNP rs13057307 is less frequent in AFR (MAF = 0.24) than in the other 4 major populations (mean MAF = 0.45 ± 0.09); the other 3 SNPs, rs1807558, rs2076109, and rs2244104, display similar MAFs in AFR (MAFs = 0.28, 0.32, and 0.23, respectively) and in the other 4 major populations (mean MAFs = 0.30 ± 0.11, 0.38 ± 0.08, and 0.29 ± 0.05, respectively). These 3 SNPs emerged as candidates only in HGI-B2-EUR, and further evaluation revealed that they were not nominally significant in HGI-B2-AFR. Given the relatively high MAFs of these SNPs across all populations (MAFs ≥ 0.28), if they were truly associated with COVID-19 hospitalization, they would be expected to show association signals close to the P < 0.01 threshold in the HGI-B2-AFR GWAS; however, we did not observe this. Therefore, we prioritized only 3 of the 6 SNPs as top candidates: rs76929059, rs12168809, and rs13057307, of which the first 2 are more frequent in AFR populations.
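The MAF comparison above can be made explicit by expressing each AFR MAF as a deviation from the mean of the other 4 major populations, in units of their reported standard deviation. This is a minimal sketch for illustration, not the procedure used in the study: the cutoff of 2 standard deviations (`CUTOFF`) is an assumption introduced here, and rs76929059 is left out because it is polymorphic only in AFR and no mean ± SD is reported for it.

```python
# Reported values from the text: (AFR MAF, mean MAF of the other 4 major
# populations, SD of that mean). rs76929059 is omitted (AFR-only polymorphism).
maf = {
    "rs12168809": (0.43, 0.19, 0.057),
    "rs13057307": (0.24, 0.45, 0.09),
    "rs1807558": (0.28, 0.30, 0.11),
    "rs2076109": (0.32, 0.38, 0.08),
    "rs2244104": (0.23, 0.29, 0.05),
}

CUTOFF = 2.0  # assumption: |deviation| > 2 SD counts as "different from AFR"


def afr_deviation(afr, mean_other, sd_other):
    """Deviation of the AFR MAF from the other populations, in SD units."""
    return (afr - mean_other) / sd_other


deviations = {snp: afr_deviation(*vals) for snp, vals in maf.items()}
divergent = sorted(snp for snp, d in deviations.items() if abs(d) > CUTOFF)

for snp, d in deviations.items():
    print(f"{snp}: {d:+.2f} SD")
print("AFR MAF clearly differs for:", divergent)
```

Under this assumed cutoff, only rs12168809 (higher in AFR) and rs13057307 (lower in AFR) stand out, in line with the qualitative description above.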
Because these 6 SNPs showed only nominal association with COVID-19 hospitalization, we further evaluated their potential involvement in COVID-19 hospitalization based on their potential regulatory roles in APOBEC3 gene expression. We manually collected cis-eQTL results for these 6 SNPs from GTEx and Haploreg4. All 6 SNPs except rs1807558 are nominally significant eQTLs (Fig. 1C). Among the 3 prioritized candidate SNPs (rs76929059, rs12168809, and rs13057307), rs13057307 is strongly associated with APOBEC3C/D/G expression across multiple GTEx tissues, with subcutaneous fat and nerve fibers being the two tissues in which the higher correlations between rs13057307 and APOBEC3 expression were observed (Fig. 1C). By contrast, rs12168809 (represented by rs5757372, a SNP in high LD with it) is the only eQTL of APOBEC3A/C in blood. The other top prioritized SNP, rs76929059 (represented by rs113819742, a SNP in high LD with it), is an eQTL for multiple APOBEC3 genes: APOBEC3A (brain caudate [basal ganglia]), APOBEC3B (breast mammary tissue, pancreas, brain caudate [basal ganglia], and esophagus muscularis), APOBEC3C (colon sigmoid), and APOBEC3D (ovary, heart atrial appendage, artery coronary, skin sun-exposed lower leg, and esophagus mucosa). Of the remaining 3 SNPs, which show similar MAFs across the major populations, rs1807558 is not an eQTL for any APOBEC3 gene; rs2076109 is an eQTL for APOBEC3B (cultured fibroblasts and thyroid) and APOBEC3F (thyroid); and rs2244104 is an eQTL for APOBEC3C (muscle skeletal, EBV-transformed lymphocytes, lung, adipose subcutaneous, esophagus muscularis, thyroid, nerve tibial, and artery tibial), APOBEC3D (adipose subcutaneous), and APOBEC3F/G (esophagus mucosa). Taken together, all 3 prioritized SNPs are eQTLs of APOBEC3 genes.
Finally, we evaluated APOBEC3 gene expression by ancestry in 49 normal tissues from GTEx and conducted differential expression analysis of these genes between blood samples from COVID-19 patients and healthy controls. APOBEC3C/G were highly expressed in > 20 GTEx tissues (median TPM > 2), whereas the other APOBEC3 genes were only moderately expressed in whole blood, spleen, lung, and cultured fibroblasts (Fig. 2A). We further performed differential expression analysis of the 7 APOBEC3 genes across the 49 GTEx tissues between European Americans (EA) and African Americans (AA), with the significance threshold after multiple-testing adjustment set at P < 0.00015 (0.05 / (7 × 49)). Figure 2B shows the expression profiles of the seven APOBEC3 genes in five major tissues: liver, lung, pancreas, spleen, and whole blood. Only 3 APOBEC3 genes, APOBEC3F (in liver and pancreas), APOBEC3G (in pancreas), and APOBEC3H (in spleen), display significant differential expression. In whole blood, although all 7 APOBEC3 genes show nominally significant (P < 0.05) differential expression between EA and AA, none survived multiple-testing correction. Furthermore, we re-analyzed previously published data to determine the expression of APOBEC3 genes upon SARS-CoV-2 infection in whole blood from COVID-19 patients and healthy controls. Cluster analysis and one-way ANOVA of APOBEC3 gene expression after SARS-CoV-2 infection showed that APOBEC3A, APOBEC3B, APOBEC3G, and APOBEC3H were significantly upregulated in blood samples from patients with COVID-19 compared with healthy controls (Fig. 2C&D).
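The multiple-testing threshold quoted above follows from a Bonferroni correction over every gene-tissue combination. The short sketch below only reproduces that arithmetic; the variable names are illustrative and are not taken from the analysis code.

```python
ALPHA = 0.05      # family-wise significance level
N_GENES = 7       # APOBEC3A/B/C/D/F/G/H
N_TISSUES = 49    # GTEx tissues tested

# Bonferroni correction: divide alpha by the number of gene-tissue tests.
n_tests = N_GENES * N_TISSUES
threshold = ALPHA / n_tests

print(f"{n_tests} tests, Bonferroni threshold = {threshold:.2e}")


def passes_bonferroni(p_value, cutoff=threshold):
    """True if a raw P value survives the Bonferroni-adjusted cutoff."""
    return p_value < cutoff
```

A nominally significant result such as P = 0.04 passes the P < 0.05 cutoff but not this adjusted one, which is why the whole-blood differences between EA and AA do not survive correction.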
Taken together, APOBEC3 genes show distinct expression profiles across multiple tissues: APOBEC3C/G are ubiquitously expressed in > 20 tissues, and three tissues (lung, whole blood, and spleen) show similar expression patterns for the APOBEC3 genes. APOBEC3F/G/H are differentially expressed between EA and AA in specific tissues, and APOBEC3A/B/G/H are upregulated in whole blood upon SARS-CoV-2 infection.