3.1 Screening of CDEGs in COVID-19 and IgAN cohorts
Based on the COVID-19 dataset GSE164805, we obtained a total of 6385 differentially expressed genes (DEGs), including 3369 upregulated and 3016 downregulated DEGs (Figure 2A and B). Based on the IgAN dataset GSE93798, we obtained a total of 341 DEGs, including 100 upregulated and 241 downregulated DEGs (Figure 2C and D). Intersecting the two DEG lists yielded 61 CDEGs shared by COVID-19 and IgAN, shown as a Venn diagram in Figure 2E and listed in Supplementary Table 2.
In addition, we examined the expression patterns of the CDEGs in COVID-19 and IgAN; the P-values of all CDEGs were < 0.05. The heatmaps and boxplots show the expression patterns of the CDEGs (Figure 3A-D).
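The overlap step behind the Venn diagram can be sketched as a set intersection. This is a minimal illustration, not the authors' pipeline: the function name, the input layout (gene symbol mapped to log2 fold change) and the toy values are all assumptions, and the direction check is an optional extra that the text above does not require for a gene to count as a CDEG.

```python
def common_degs(deg_a, deg_b):
    """Intersect two DEG tables given as {gene_symbol: log2_fold_change}.

    Returns (shared, concordant): genes present in both tables, and the
    subset whose fold change has the same sign in both.
    """
    shared = sorted(set(deg_a) & set(deg_b))
    concordant = [g for g in shared if (deg_a[g] > 0) == (deg_b[g] > 0)]
    return shared, concordant


# Hypothetical toy tables; these are NOT values from GSE164805 or GSE93798.
toy_covid = {"GENE_A": 1.8, "GENE_B": -2.1, "GENE_C": 0.9, "GENE_D": 1.2}
toy_igan = {"GENE_A": 1.1, "GENE_B": 1.5, "GENE_E": -0.7}

shared, concordant = common_degs(toy_covid, toy_igan)
print(shared)      # genes significant in both datasets
print(concordant)  # genes changing in the same direction in both
```

Keeping the fold changes alongside the symbols makes it easy to separate genes that merely overlap from genes that also move in the same direction in both diseases.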
3.2 Enrichment analysis of CDEGs
GO enrichment analysis was performed for biological process (BP), cellular component (CC), and molecular function (MF). The top 10 terms in the three categories are shown in Figure 4A. The main enriched BP terms included response to peptide hormone, regulation of ERK1 and ERK2 cascade, positive regulation of nitric oxide biosynthetic process, regulation of nitric oxide biosynthetic process, and regulation of nitric oxide metabolic process. The main enriched CC terms were secretory granule membrane, apical part of cell, specific granule, endocytic vesicle, apical plasma membrane, and collagen-containing extracellular matrix. The main enriched MF terms were anion transmembrane transporter activity, lipid transporter activity, carboxylic acid transmembrane transporter activity, organic acid transmembrane transporter activity, organic anion transmembrane transporter activity, and secondary active transmembrane transporter activity. The circle diagram of the GO results is shown in Figure 4B, and the detailed results are given in Supplementary Table 4. KEGG enrichment analysis showed that the CDEGs were mainly enriched in the Tuberculosis, Cytokine-cytokine receptor interaction, Malaria, Leishmaniasis, and Hematopoietic cell lineage pathways (Figure 4C). The KEGG plot revealed that IL1B, which is involved in 6 pathways, may be a key gene in the crosstalk between COVID-19 and IgAN (Figure 4D). The detailed results are given in Supplementary Table 5.
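For readers unfamiliar with how a GO or KEGG term comes to be called "enriched", over-representation analysis is commonly scored with a one-sided hypergeometric test. The sketch below assumes that test; the text above does not state which test or software was used, and the function name and the numbers in the usage line are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(background, annotated, drawn, hits):
    """P(X >= hits) under the hypergeometric distribution.

    background: number of genes in the background set
    annotated:  background genes annotated to the term
    drawn:      size of the query gene list (e.g. a CDEG list)
    hits:       query genes annotated to the term
    """
    upper = min(annotated, drawn)
    total = comb(background, drawn)
    # math.comb returns 0 when k > n, so impossible draws contribute nothing.
    tail = sum(
        comb(annotated, i) * comb(background - annotated, drawn - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers for illustration only.
p_value = hypergeom_enrichment_p(background=20000, annotated=200, drawn=61, hits=5)
print(p_value)
```

In practice the raw P-values for all tested terms would then be corrected for multiple testing before ranking the top terms.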
3.3 Screening and validation of diagnostic markers using machine learning models
Based on the CDEGs, we constructed two machine learning models, Lasso regression (LASSO) and SVM recursive feature elimination (SVM-RFE), to discover cluster-specific genes with diagnostic significance. The 61 CDEGs were used to fit the LASSO regression model in the IgAN (Figure 5A) and COVID-19 (Figure 5B) datasets, and the model was trained using 10-fold cross-validation. According to lambda.1se, λ values of 0.005823898 for IgAN and 0.204766 for COVID-19 were selected. The SVM-RFE model screened 26 candidate genes from the IgAN dataset (Figure 5C) and 35 candidate genes from the COVID-19 dataset (Figure 5D). The two algorithms identified DNA topoisomerase 2-alpha (TOP2A) as the overlapping gene (Figure 5E).
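The lambda.1se choice mentioned above follows the one-standard-error rule: among all candidate λ values whose mean cross-validation error lies within one standard error of the minimum, take the largest, which gives the sparsest model that is still statistically comparable to the best one. The sketch below illustrates only that selection rule; the function name and the toy error curve are assumptions, and fitting the LASSO path itself is left to a dedicated library.

```python
def select_lambda(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from a cross-validation error curve.

    lambdas: candidate penalty values
    cv_mean: mean CV error for each candidate
    cv_se:   standard error of the CV error for each candidate
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    # Every lambda whose mean error is within one SE of the minimum qualifies;
    # the largest one applies the strongest shrinkage.
    eligible = [lam for lam, err in zip(lambdas, cv_mean) if err <= threshold]
    return lambdas[i_min], max(eligible)


# Hypothetical error curve; these are NOT the values behind Figure 5A or 5B.
toy_lambdas = [0.5, 0.2, 0.1, 0.05, 0.01]
toy_mean = [0.40, 0.30, 0.25, 0.24, 0.27]
toy_se = [0.03, 0.03, 0.02, 0.02, 0.02]

lam_min, lam_1se = select_lambda(toy_lambdas, toy_mean, toy_se)
print(lam_min, lam_1se)
```

Because lambda.1se is never smaller than lambda.min, it retains no more features than the minimum-error model, which suits a biomarker screen that is later intersected with a second selector.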
In the COVID-19 and IgAN cohorts, the bias-corrected lines in the calibration plots are close to the ideal curve, indicating good agreement between predicted and observed outcomes (Figure 6A and B). Finally, we constructed two nomograms based on the COVID-19 and IgAN risk scores, providing clinicians with a quantitative tool for predicting disease risk (Figure 6C and D).
We then examined the expression of the candidate biomarker TOP2A in the COVID-19 datasets (GSE164805 and test set GSE171110) and the IgAN datasets (GSE93798 and test set GSE35487). Compared with the control groups, TOP2A was upregulated in both COVID-19 and IgAN (Figure 7A and B). Finally, we used ROC curves to validate the diagnostic efficacy of TOP2A in the COVID-19 and IgAN datasets; both showed strong discriminative performance (Figure 7C and D).
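The area under a ROC curve for a single marker can be read as the probability that a randomly chosen patient has a higher marker value than a randomly chosen control. The sketch below computes the AUC directly from that pairwise definition; the function name and the toy expression values are assumptions, not data from the datasets above.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison (equivalent to the Mann-Whitney U statistic).

    scores: marker values, e.g. expression of one gene per sample
    labels: 1 for disease, 0 for control
    Ties between a case and a control count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical expression values for illustration only.
toy_scores = [2.1, 3.5, 3.0, 1.2, 1.8, 2.5]
toy_labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(toy_scores, toy_labels)
print(auc)
```

The pairwise form is quadratic in the number of samples, which is acceptable for cohorts of this size; a rank-based implementation would be preferable for large datasets.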
3.4 Constructing PPI network and identifying hub CDEGs
Based on the results of the STRING database, we constructed a PPI network to study the potential interactions among the CDEGs. A network of 61 nodes and 82 edges was obtained (Figure 8A). A total of 15 hub genes (CCL4, EGR1, CD36, IL10RA, CYBB, KLF4, IL1B, LYZ, TGFBI, UCP2, FCGR3B, AZGP1, ATF3, CD44, and HBEGF) were screened out (Figure 8B; Supplementary Table 5). Subsequently, we constructed a gene network using GeneMANIA. Functional annotation of these hub genes shows that they are all related to viral infection responses and are mainly involved in cell chemotaxis, regulation of pri-miRNA transcription by RNA polymerase II, pri-miRNA transcription by RNA polymerase II, antigen processing and presentation of peptide antigen via MHC class I, leukocyte chemotaxis, and negative regulation of viral transcription (Figure 8C).
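One simple way to rank hub candidates in a PPI network is by node degree, that is, the number of distinct interaction partners. The sketch below assumes degree ranking purely for illustration; the text above does not say which topological measure or tool was used to select the 15 hub genes, and the node names and edges are placeholders rather than STRING output.

```python
from collections import Counter


def rank_by_degree(edges, top_n):
    """Rank nodes of an undirected network by degree.

    edges: iterable of (node_a, node_b) pairs; duplicates and self-loops are ignored
    top_n: number of top-ranked nodes to return
    Ties are broken alphabetically so the output is deterministic.
    """
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for edge in unique_edges:
        for node in edge:
            degree[node] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:top_n]


# Placeholder network; these edges are hypothetical.
toy_edges = [
    ("G1", "G2"), ("G1", "G3"), ("G1", "G4"),
    ("G2", "G3"), ("G3", "G2"),  # the same edge listed twice
    ("G5", "G5"),                # self-loop, ignored
]
hubs = rank_by_degree(toy_edges, top_n=2)
print(hubs)
```

Other centrality measures can rank the same network differently, so the choice of measure should be reported alongside the hub list.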
3.5 Identification of ImmuneCDEGs and constructing PPI network
The courses of COVID-19 and IgAN are closely related to immune activation. Therefore, we compared immune-related genes with the CDEGs and designated the overlapping genes as ImmuneCDEGs. A total of 14 ImmuneCDEGs were obtained, including 2 upregulated genes (CCL8 and CYBB) and 4 downregulated genes (APOM, AZGP1, NR4A3, and RBP4) with consistent expression trends in both diseases (Figure 9A-D). We then imported these ImmuneCDEGs into GeneMANIA to construct a gene network (Figure 9E). These genes are mainly involved in response to interleukin-1, cell chemotaxis, myeloid leukocyte migration, leukocyte migration, leukocyte chemotaxis, and response to chemokine.
3.6 Construction and validation of hub Genes-TFs network
We imported the 15 hub genes into the TRRUST database and found that 12 TFs may regulate 11 hub genes (Figure 10A). We then validated the expression patterns of these TFs in COVID-19 and IgAN. Among them, 8 TFs regulate IL1B. HDAC1, IRF8, NFKB1, SP1, SPI1, TCF4, and TP53 were significantly upregulated in IgAN, while JUN was significantly downregulated in IgAN (Figure 10B). PPARG and TCF4 were significantly upregulated in COVID-19, while IRF8, NFKB1, RELA, and YY1 were significantly downregulated in COVID-19 (Figure 10C).
3.7 Relationship between immune cell infiltration and CDEGs subtypes in IgAN
We further explored the immune landscape in COVID-19-related IgAN, using the CIBERSORT algorithm to estimate the proportions of 22 immune cell types in each sample of the IgAN and control groups. Four immune cell types differed significantly in infiltration between the two groups: plasma cells and M2 macrophages were more abundant in the IgAN group, while resting NK cells and neutrophils were more abundant in the control group (Figure 11A and B). Further analysis revealed correlations among the infiltrating immune cells and between the CDEGs and the infiltrating immune cells (Figure 11C and D). Specifically, plasma cells were significantly negatively correlated with CCL8. M2 macrophages were significantly positively correlated with APOD and TGFBI, and significantly negatively correlated with CCL4, IL1B, BHLHE40, and SH3GL2. Neutrophils were significantly positively correlated with CYBB, FCGR3B, and UCP2.
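A correlation between a gene's expression and an estimated immune cell fraction can be computed per gene-cell pair across samples. The sketch below assumes Spearman rank correlation, a common choice for this kind of analysis; the text above does not state which coefficient was used, and the function names and toy vectors are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-sample values for illustration only.
toy_expression = [5.1, 6.3, 7.0, 8.2]
toy_cell_fraction = [0.30, 0.22, 0.15, 0.05]
rho = spearman(toy_expression, toy_cell_fraction)
print(rho)
```

A significance test and multiple-testing correction would normally accompany each coefficient before a pair is reported as significantly correlated.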
3.8 CDEG-based clustering of IgAN and COVID-19 samples and comparison between CDEG clusters
Based on the expression of the CDEGs, we identified subtypes of COVID-19 and IgAN through consensus clustering analysis. The CDEG subtypes divided the IgAN patients in GSE93798 into clusters C1 and C2 (Figure 12A-D) and the COVID-19 patients in GSE164805 into clusters C1 and C2 (Figure 12E-H).
To investigate the diversity of immune features between the CDEG subtypes of IgAN at the immune cell level, we used the CIBERSORT algorithm to estimate the proportions of 22 immune cell types in each sample of the C1 and C2 clusters. IgAN cluster C1 showed infiltration of M2 macrophages and resting mast cells, whereas IgAN cluster C2 showed infiltration of activated mast cells and M1 macrophages (Figure 13A and B).
3.9 Candidate drug prediction of COVID-19-related IgAN
We submitted the 15 hub genes to the BATMAN2.0 database to predict effective traditional Chinese medicine (TCM) herbs and ingredients for treating COVID-19-related IgAN. The analysis showed that GUI ZHI, DU ZHONG, DAN SHEN, GUANG ZAO, DA ZAO, TU FU LING, SHAN ZHU YU, REN SHEN, SHA JI, and SANG YE were the top 10 candidate TCM herbs (Supplementary Table 6). Punicalagin, Paeoniflorin, myricetin, Atorvastatin, arachidonic acid, cannabidiol, Withaferin A, emodin, resveratrol, and curcumin were the top 10 candidate TCM ingredients (Supplementary Table 7). The candidate TCM herbs and ingredients are presented according to their enrichment ratio in Figure 14A and B. Finally, we constructed herb-gene and ingredient-gene networks to show the interactions of the herbs and ingredients with the genes (Figure 14C and D).