The increasing morbidity and mortality of ICM have made it a major health concern, so predicting and diagnosing ICM early can increase the chance of timely intervention. However, no previous study has reported diagnostic biomarkers for ICM. It is therefore essential to construct a model for early diagnosis that identifies representative biomarkers for ICM. Such a model might also predict ICM before the disease develops, allowing intervention at that stage. Machine learning techniques have improved considerably in recent years, and gene expression profiles are available from public databases; together, these provide new options for diagnosing and predicting ICM.
The present study gathered microarray expression profiling datasets from the GEO database (GSE1869, GSE5406 and GSE42955) and detected 18 DEGs between ICM and normal samples. GO enrichment analysis (biological process, BP; cellular component, CC; molecular function, MF) and KEGG pathway analysis were then conducted. In addition, GSEA was performed, and the top ten signaling pathways are presented (Fig. 3).
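A DEG screen of this kind usually combines an effect-size filter on log2 fold change with a multiple-testing filter on adjusted p-values. This section does not state the thresholds or the tool used, so the sketch below is illustrative only. It assumes log2-transformed expression values and per-gene raw p-values that have already been computed (for example by limma, which is an assumption here), applies the Benjamini-Hochberg adjustment, and keeps genes passing hypothetical cutoffs of |log2FC| > 1 and FDR < 0.05. The helper names and gene names are hypothetical.

```python
from statistics import fmean


def log2_fold_change(case, control):
    """log2FC for data that are already log2-transformed: difference of group means."""
    return fmean(case) - fmean(control)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (FDR) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(expr, pvalues, fc_cutoff=1.0, fdr_cutoff=0.05):
    """expr: {gene: (case_values, control_values)}; pvalues: {gene: raw p-value}.

    Returns {gene: log2FC} for genes passing both (hypothetical) cutoffs.
    A positive log2FC means higher expression in cases, a negative one lower.
    """
    genes = list(expr)
    adj = benjamini_hochberg([pvalues[g] for g in genes])
    selected = {}
    for gene, q in zip(genes, adj):
        fc = log2_fold_change(*expr[gene])
        if abs(fc) > fc_cutoff and q < fdr_cutoff:
            selected[gene] = fc
    return selected


# Toy usage with hypothetical genes: one up-regulated, one down-regulated, one unchanged.
toy_expr = {
    "GENE_UP": ([8.0, 8.2, 7.8], [6.0, 6.1, 5.9]),
    "GENE_DOWN": ([4.0, 4.0, 4.0], [6.0, 6.0, 6.0]),
    "GENE_FLAT": ([5.0, 5.1], [5.0, 4.9]),
}
toy_p = {"GENE_UP": 0.001, "GENE_DOWN": 0.002, "GENE_FLAT": 0.5}
print(select_degs(toy_expr, toy_p))
```

The sign of the returned log2FC carries the direction of change, which is what separates an up-regulated gene such as COL1A1 from down-regulated ones.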
Lasso is a regression method that uses L1 regularization to improve prediction ability, shrinking uninformative coefficients to zero[19]. RF is a reliable feature selection approach that determines the optimal variables by removing features according to the importance scores produced by the random forest[20]. We therefore performed Lasso regression on the 18 DEGs and obtained 11 key genes distinguishing ICM from normal samples. RF screening of the same 18 DEGs identified 10 hub genes. The genes identified by Lasso regression and by RF were then intersected using a Venn diagram, yielding six hub genes. Next, we used an artificial neural network (ANN) built on these characteristic genes to calculate predictive weights for the six hub genes and to establish a diagnostic model for ICM. We then evaluated the predictive accuracy of this model in the training and validation datasets using the area under the ROC curve (AUC). The results showed that the model had high diagnostic power. Moreover, to our knowledge, the present study is the first attempt to construct a diagnostic model for ICM.
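The Lasso step and the Venn-diagram intersection can be sketched in plain Python. This section does not name the software, penalty value or model family. Lasso for case/control data is often fitted as an L1-penalized logistic regression (for example with glmnet), whereas this sketch uses the simpler least-squares Lasso on a 0/1 label, fitted by cyclic coordinate descent with soft-thresholding. Genes with non-zero coefficients are retained, and the overlap with a separately obtained RF gene list is a plain set intersection. The RF ranking itself is not reproduced here, and all gene names, the penalty `lam`, and the helper functions are hypothetical.

```python
from statistics import fmean, pstdev


def soft_threshold(z, gamma):
    """Proximal operator of the L1 penalty: shrink z toward 0 by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def standardize(column):
    """Center to mean 0 and scale to unit population variance."""
    mu, sd = fmean(column), pstdev(column)
    if sd == 0:
        raise ValueError("constant feature cannot be standardized")
    return [(v - mu) / sd for v in column]


def lasso_coordinate_descent(feature_columns, y, lam, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + lam*||b||_1 on standardized X and centered y."""
    n = len(y)
    cols = [standardize(c) for c in feature_columns]
    y_mean = fmean(y)
    residual = [v - y_mean for v in y]  # equals y - Xb while b is all zeros
    beta = [0.0] * len(cols)
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            # Correlation of feature j with the residual that excludes feature j's own term.
            rho = sum(col[i] * (residual[i] + col[i] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam)  # unit variance, so no extra denominator
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
                beta[j] = new_bj
    return beta


def select_nonzero(names, beta):
    """Genes kept by Lasso are those with non-zero coefficients."""
    return [name for name, b in zip(names, beta) if b != 0.0]


def intersect_selections(lasso_genes, rf_genes):
    """The Venn-diagram overlap is a set intersection."""
    return sorted(set(lasso_genes) & set(rf_genes))


# Toy usage: GENE_A tracks the label, GENE_B does not.
gene_a = [1, 2, 2, 1, 5, 6, 6, 5]
gene_b = [1, 2, 3, 4, 1, 2, 3, 4]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = normal, 1 = ICM
beta = lasso_coordinate_descent([gene_a, gene_b], labels, lam=0.1)
lasso_genes = select_nonzero(["GENE_A", "GENE_B"], beta)
print(intersect_selections(lasso_genes, ["GENE_A", "GENE_C"]))  # hypothetical RF list
```

On these toy columns only the informative feature keeps a non-zero coefficient. In practice `lam` would be chosen by cross-validation rather than fixed by hand.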
First, we identified six hub genes through bioinformatic analysis of 130 ICM and 42 normal samples from the GEO database. COL1A1 expression was notably higher in the ICM group than in the normal group (log2FC > 1.5), whereas the other five genes (FCN3, GLUL, MYOT, SERPINA3 and SLC38A2) were expressed at significantly lower levels in the ICM group than in the normal group (log2FC < -1.5) (Fig. 6A). COL1A1 maps to the long arm of chromosome 17 (17q21.33). More than 400 mutations related to human disease have been identified in the COL1A1 gene, many of which are associated with osteoporosis[21]. In addition, COL1A1 is highly expressed in diverse cancers and regulates a variety of cellular processes, including cell proliferation, metastasis, apoptosis and cisplatin resistance. COL1A1 is also correlated with cancer progression and prognosis, and elevated COL1A1 expression is associated with poor prognosis in cancer patients[22].
Although COL1A1 and the other five DEGs have been linked to various diseases, no previous study has reported an association between these six DEGs and ICM. This is therefore a new finding, and the resulting model is a novel gene-based approach to diagnosing ICM. Furthermore, we presented violin plots (Fig. 6A) showing the expression levels of the six hub DEGs in ICM and normal subjects. All six genes differed significantly between ICM and normal subjects, which suggests that each gene independently has diagnostic value for ICM. Although we observed changes in the levels of the six hub genes in ICM, the causal relationship between their differential expression and ICM, as well as the underlying mechanism, remains unclear. It therefore needs to be clarified whether altered expression of these six genes induces ICM or whether their altered levels result from a self-protective response after ICM develops, and the mechanism underlying this relationship requires further investigation. ICM can result from various risk factors, including hypertension, diabetes, alcohol consumption and smoking.
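Whether a single gene independently has diagnostic value is commonly quantified by that gene's own ROC AUC, the same metric used for the model in the training and validation sets. The sketch below computes AUC through its rank interpretation: the probability that a randomly chosen ICM sample scores higher than a randomly chosen normal sample, with ties counted as one half, which equals the area under the empirical ROC curve. Because five of the six genes are down-regulated in ICM, the helper flips the direction when the AUC falls below 0.5. The function names are hypothetical, and this section does not specify the ROC software the study used.

```python
def auc_mann_whitney(case_scores, control_scores):
    """P(case score > control score), counting ties as 1/2; equals the empirical ROC AUC."""
    wins = 0.0
    for c in case_scores:
        for n in control_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def single_gene_auc(case_expr, control_expr):
    """Direction-aware AUC for one gene, returned with the direction that was used."""
    auc = auc_mann_whitney(case_expr, control_expr)
    if auc >= 0.5:
        return auc, "up"
    # Lower expression in cases: scoring with the negated value gives exactly 1 - AUC.
    return 1.0 - auc, "down"


# Toy usage with hypothetical expression values (ICM first, normal second).
print(single_gene_auc([3.0, 4.0, 5.0], [1.0, 2.0, 3.0]))
print(single_gene_auc([1.0, 2.0], [3.0, 4.0]))
```

Choosing the direction on the same data from which the AUC is computed is mildly optimistic; fixing the direction on the training set and then applying it unchanged to the validation set avoids this.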
Furthermore, an immune abundance analysis of the six hub genes was conducted and plotted (Fig. 7A), and an immune process correlation analysis was performed with the six hub genes (Fig. 7B). A heatmap describing the correlation between the six hub genes and the expression levels of chemotactic factors was generated (Fig. 7C), as was a heatmap describing the relationship between the six hub genes and the expression levels of HLA family members (Fig. 7D).
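The gene-immune heatmaps in Fig. 7 are correlation matrices. This section does not state how immune abundance was estimated or which correlation coefficient was used. Spearman's rank correlation is a common choice for such heatmaps and is assumed in the sketch below, which computes tie-aware ranks and then takes the Pearson correlation of those ranks for every gene-cell pair. Gene names, cell names and helper functions are hypothetical.

```python
import math
from statistics import fmean


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_matrix(gene_expr, immune_abundance):
    """Map (gene, cell type) -> Spearman rho; all value lists share one sample order."""
    return {
        (gene, cell): spearman(g_vals, c_vals)
        for gene, g_vals in gene_expr.items()
        for cell, c_vals in immune_abundance.items()
    }


# Toy usage with hypothetical per-sample values.
genes = {"GENE_A": [1.0, 2.0, 3.0, 4.0]}
cells = {"CELL_X": [0.1, 0.3, 0.2, 0.4], "CELL_Y": [4.0, 3.0, 2.0, 1.0]}
print(correlation_matrix(genes, cells))
```

Each entry of the returned dictionary corresponds to one cell of a heatmap; significance testing and multiple-testing correction, which such figures often add, are not covered by this sketch.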
However, this study has several limitations. First, the training set combined three datasets with small sample sizes, so the diagnostic model needs to be re-established in an independent dataset with a large sample size. Second, our ICM prediction model was constructed only from datasets in the GEO database. Third, further in vitro and in vivo experiments are needed to validate the diagnostic model.
In summary, six genetic biomarkers closely related to ICM were screened by bioinformatic analysis and used to construct a highly efficient ICM diagnostic model. This study lays a foundation for the early diagnosis of ICM and provides reliable biomarkers for predicting ICM. However, further research on the molecular mechanisms involving these six biomarkers is required to confirm their roles in ICM.