Differentially expressed genes
Differentially expressed genes (DEGs) were analyzed in the screening set, and 3743 DEGs were identified (Supplementary File 1). The heat map shows the top 20 DEGs (Fig. 1a). The volcano plot shows 1861 up-regulated genes and 1882 down-regulated genes (Fig. 1b).
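As an illustration of how up- and down-regulated genes are typically separated for a volcano plot, the following is a minimal sketch. The cutoffs (|log2 fold change| of at least 1 and adjusted p below 0.05), the `GeneStat` record, and the toy gene names are assumptions for illustration only; the thresholds actually used in this study are not stated in this section.

```python
from typing import List, NamedTuple, Tuple


class GeneStat(NamedTuple):
    gene: str       # gene symbol (toy names below are hypothetical)
    log2_fc: float  # log2 fold change, DCM vs. control
    adj_p: float    # multiple-testing adjusted p-value


def classify_degs(
    stats: List[GeneStat],
    lfc_cutoff: float = 1.0,  # assumed cutoff, not taken from the paper
    p_cutoff: float = 0.05,   # assumed cutoff, not taken from the paper
) -> Tuple[List[str], List[str]]:
    """Split genes into up- and down-regulated lists."""
    up, down = [], []
    for s in stats:
        if s.adj_p >= p_cutoff:
            continue  # not significant: neither up nor down
        if s.log2_fc >= lfc_cutoff:
            up.append(s.gene)
        elif s.log2_fc <= -lfc_cutoff:
            down.append(s.gene)
    return up, down


toy_stats = [
    GeneStat("geneA", 2.1, 0.001),
    GeneStat("geneB", -1.6, 0.010),
    GeneStat("geneC", 0.2, 0.500),
    GeneStat("geneD", 1.8, 0.200),
]
up_genes, down_genes = classify_degs(toy_stats)
print(len(up_genes), len(down_genes), len(up_genes) + len(down_genes))
```

The total DEG count is the sum of the two lists, in the same way that the 1861 up-regulated and 1882 down-regulated genes reported here add up to 3743.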
WGCNA analysis
Clinical information was matched to the gene expression data, and WGCNA was performed. The samples clustered well, with no outlier samples (Fig. 2a), and the optimal soft-thresholding power was determined to be 6 (Fig. 2b). Modules were defined from the soft threshold and the topological overlap matrix (TOM), with a minimum of 50 genes per module (Fig. 2c). Similar modules were then merged, yielding 8 modules (Fig. 2d). Correlation analysis between module genes and clinical traits showed that the black module, containing 1078 genes, had the strongest positive correlation with the occurrence of DCM (r=0.85), and the red module, containing 265 genes, had the strongest negative correlation (r=-0.64). Taking both as core modules gave 1343 potential core genes.
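The role of the soft threshold can be sketched as follows: pairwise correlations are raised to a power so that weak correlations are pushed towards zero while strong ones are preserved. This minimal example assumes an unsigned network, a_ij = |cor(x_i, x_j)|^beta, with beta = 6 as reported above; the network type, the helper names, and the toy expression values are assumptions, not details taken from the study.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(
    expr: Dict[str, List[float]], beta: int = 6
) -> Dict[Tuple[str, str], float]:
    """Unsigned adjacency: |correlation| raised to the power beta."""
    genes = list(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g in genes
        for h in genes
    }


# Toy expression profiles across five samples (hypothetical values).
toy_expr = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.0, 4.0, 6.0, 8.0, 10.5],
    "g3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
adjacency = soft_threshold_adjacency(toy_expr, beta=6)
print(round(adjacency[("g1", "g2")], 3), round(adjacency[("g1", "g3")], 6))
```

In this toy case the strongly co-expressed pair (g1, g2) keeps an adjacency close to 1, while the weakly correlated pair (g1, g3) is suppressed almost to zero, which is the behaviour the soft threshold is chosen to produce.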
Enrichment analysis
To explore the potential biological mechanisms of DCM, enrichment analysis was performed on the 1343 potential core genes. DO analysis identified diseases that may share pathogenesis with DCM, such as bacterial infectious disease, tuberculosis and sarcoidosis (Fig. 3a). GO analysis showed that T cell activation, regulation of immune effector process, positive regulation of leukocyte activation and other processes were significantly enriched (Fig. 3b). KEGG analysis identified specific pathways, such as Th1 and Th2 cell differentiation, Th17 cell differentiation, and viral protein interaction with cytokine and cytokine receptor (Fig. 3c). These results indicate that immune-related factors may affect the occurrence of DCM. Finally, GSEA was performed on the gene set and expression matrix, and INTERFERON_ALPHA_RESPONSE, INTERFERON_GAMMA_RESPONSE and other pathways were significantly enriched (Fig. 3d). Taken together, these consistent findings point to an important role of immunity in the pathogenesis of DCM.
Identification of hub biomarkers
First, LASSO regression with 10-fold cross-validation was used to remove redundant genes, and 38 candidate genes were retained (Fig. 4a). These 38 genes were then screened further with an SVM machine learning method; the RMSE was lowest when 19 genes were included (Fig. 4b). In addition, a random forest was used to rank the importance of the 38 genes (Fig. 4c). Logistic regression was also performed with the occurrence of DCM as the dependent variable, and the forest plot shows the OR and confidence interval for each gene (Fig. 4d). Finally, the genes identified by these algorithms were intersected, and FRZB and EXT1 were identified as hub biomarkers (Fig. 4e).
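The final overlap step amounts to a set intersection across the gene lists returned by each method. The sketch below is illustrative only: apart from FRZB and EXT1, every gene name is a hypothetical placeholder, and the list contents and sizes do not reproduce the actual outputs of the four algorithms.

```python
from typing import Dict, List, Set


def intersect_gene_sets(selections: Dict[str, Set[str]]) -> List[str]:
    """Return the genes selected by every method, sorted alphabetically."""
    if not selections:
        return []
    common = set.intersection(*selections.values())
    return sorted(common)


# Toy selections; "GENE_*" names are hypothetical placeholders.
toy_selections = {
    "lasso": {"FRZB", "EXT1", "GENE_A", "GENE_B", "GENE_C"},
    "svm": {"FRZB", "EXT1", "GENE_A", "GENE_D"},
    "random_forest": {"FRZB", "EXT1", "GENE_B", "GENE_D"},
    "logistic": {"FRZB", "EXT1", "GENE_C"},
}
hub_genes = intersect_gene_sets(toy_selections)
print(hub_genes)
```

Requiring agreement across all methods is a conservative choice: a gene dropped by any single method is excluded, which keeps the final list short at the cost of possibly discarding genes that only some methods favour.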
Validation of hub biomarkers
In the screening set, ROC analysis and differential expression analysis were performed on the two genes. Both genes showed good predictive performance: EXT1 (AUC=0.946) was significantly more highly expressed in DCM samples, and FRZB (AUC=0.985) was also highly expressed in DCM samples (Fig. 5a-b). In the external validation set, the expression of both genes was consistent with the screening set: they were up-regulated in DCM tissues and showed strong diagnostic performance (EXT1, AUC=0.842; FRZB, AUC=0.954) (Fig. 5c-d). In addition, the regulatory network of the two genes was visualized as a TF-mRNA-miRNA network, and candidate compounds targeting EXT1 and FRZB that might improve symptoms in DCM patients were predicted.
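For a single-gene classifier, the AUC can be read as the probability that a randomly chosen DCM sample has higher expression than a randomly chosen control sample, with ties counted as one half. The following minimal sketch computes it by direct pairwise comparison; the function name and the toy expression values are assumptions for illustration and are not data from this study.

```python
from typing import List


def roc_auc(scores_pos: List[float], scores_neg: List[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Toy expression values for one gene (hypothetical).
dcm_expr = [7.2, 6.8, 7.9, 6.1]
control_expr = [5.9, 6.3, 5.5, 6.0]
auc = roc_auc(dcm_expr, control_expr)
print(auc)  # 15 of the 16 pairs are ordered correctly -> 0.9375
```

This pairwise form is quadratic in the number of samples, which is adequate for small cohorts; a rank-based formulation gives the same value more efficiently on large data sets.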
Analysis of differences in immune microenvironment
Given the prominence of immune-related pathways in the enrichment analysis (Fig. 3), the CIBERSORT algorithm was used to estimate the immune cell content of each sample. The histogram shows the overall landscape of immune cell distribution, and the heat map shows the correlations among the 22 immune cell types in detail. The Wilcoxon test showed the differences in immune cell content between DCM samples and normal myocardial tissues. To identify the core immune cells that alter the immune microenvironment in myocardial tissue, random forest analysis was performed on the 22 immune cell types (Fig. 6a-b). The immune cells identified by the Wilcoxon test and by the random forest were then intersected, and four core immune cell types that may affect the occurrence of DCM were identified (Fig. 6c): Eosinophils, Macrophages M2, T cells CD4 memory resting and B cells naive. Of these, only B cells naive was up-regulated in DCM tissues, whereas Eosinophils, Macrophages M2 and T cells CD4 memory resting were down-regulated (Fig. 6d).
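The group comparison for one immune cell type can be sketched as a two-sided Wilcoxon rank-sum (Mann-Whitney U) test. The version below uses the normal approximation without tie or continuity correction, which is a simplification and may differ from the implementation used in the study; the function name and the toy cell fractions are assumptions for illustration.

```python
import math
from typing import List, Tuple


def rank_sum_test(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Return (U statistic for x, two-sided p-value, normal approximation)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values, then assign the average rank.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, grp) in zip(ranks, pooled) if grp == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u_x, p_value


# Toy cell fractions for one immune cell type (hypothetical values).
dcm_fraction = [0.12, 0.15, 0.11, 0.14, 0.13]
control_fraction = [0.05, 0.07, 0.06, 0.08, 0.04]
u_stat, p_val = rank_sum_test(dcm_fraction, control_fraction)
print(u_stat, p_val < 0.05)
```

With small groups an exact test is generally preferable to the normal approximation; the approximation is used here only to keep the sketch short.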
Correlation analysis of immune cells and hub biomarkers
In DCM tissue, correlations between the 22 immune cell types and the 2 hub biomarkers were analyzed. EXT1 was negatively correlated with NK cells resting and positively correlated with Dendritic cells resting, Mast cells resting and Eosinophils (Fig. 7a). Fig. 7b shows the scatter plot of the correlation between EXT1 and the core immune cell type Eosinophils. In addition, FRZB was positively correlated with Monocytes (Fig. 7c-d).
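A gene-to-cell-type correlation of this kind can be sketched with a rank correlation. The example below assumes Spearman correlation, computed as the Pearson correlation of average ranks; the correlation method used in the study is not stated in this section, and the function names and toy values are hypothetical.

```python
import math
from typing import List


def average_ranks(values: List[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation: Pearson correlation applied to the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Toy data: gene expression and an immune cell fraction in five samples.
gene_expr = [5.1, 6.3, 5.8, 7.0, 6.6]
cell_fraction = [0.01, 0.04, 0.02, 0.05, 0.03]
rho = spearman(gene_expr, cell_fraction)
print(round(rho, 3))  # ranks differ in two samples by one position -> 0.9
```

A rank-based measure is a common choice for estimated cell fractions because they are bounded and often skewed, but Pearson correlation would be computed the same way without the ranking step.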