Background
Alzheimer's disease (AD) has no obvious clinical symptoms in its early stages, so diagnosis lags seriously behind onset. Studies have shown that most patients pass through a stage of mild cognitive impairment (MCI) before being diagnosed, which means that AD is typically diagnosed much later than it begins. This delay greatly complicates subsequent treatment and patient management, making early diagnosis of AD very important. This paper examines blood biomarkers in AD patients and uses machine learning methods to identify changes in the blood transcriptome during the development of AD and to search for potential blood biomarkers.
Method
Individual-level blood mRNA expression data for 711 participants were downloaded from the GEO database, comprising a control group (CON; 238 participants), an MCI group (189 participants) and an AD group (284 participants). First, we analyzed the subcellular localization, protein types and enriched pathways of the differentially expressed mRNAs in each group and established an artificial intelligence individualized diagnostic model. The xCell tool was then applied to the blood mRNA expression data to estimate the composition of blood cells and obtain quantitative cell-type data. Ratio features were constructed separately for the mRNA data and the xCell data. Feature engineering steps such as collinearity analysis and importance analysis were performed on all features to obtain the best feature collection. Finally, four machine learning algorithms (linear support vector machine (SVM), AdaBoost, random forest and artificial neural network) were used to build models on the optimal feature combination, and their classification performance was evaluated on the test set.
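The ratio-feature and collinearity steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the gene names, expression values, the pairwise-ratio definition, the greedy filtering rule and the 0.9 correlation threshold are all assumptions chosen for illustration, and the importance analysis and model training steps are omitted.

```python
from itertools import combinations
from math import sqrt


def ratio_features(sample, eps=1e-9):
    """Build pairwise ratio features from one sample's {name: value} dict.

    eps guards against division by zero (an assumption of this sketch).
    """
    names = sorted(sample)
    return {f"{a}/{b}": sample[a] / (sample[b] + eps)
            for a, b in combinations(names, 2)}


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def drop_collinear(columns, threshold=0.9):
    """Greedy filter: keep a feature only if its |r| with every
    already-kept feature is at most `threshold`."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: GENE_B is nearly proportional to GENE_A.
samples = [
    {"GENE_A": 2.0, "GENE_B": 4.1, "GENE_C": 1.0},
    {"GENE_A": 3.0, "GENE_B": 5.9, "GENE_C": 5.0},
    {"GENE_A": 5.0, "GENE_B": 10.2, "GENE_C": 2.0},
    {"GENE_A": 7.0, "GENE_B": 13.8, "GENE_C": 9.0},
]

# Append ratio features to each sample, then pivot to per-feature columns.
rows = [{**s, **ratio_features(s)} for s in samples]
columns = {name: [r[name] for r in rows] for name in rows[0]}

selected = drop_collinear(columns)
print(selected)
```

In this toy example GENE_B is dropped because it is almost perfectly correlated with GENE_A. A real analysis would more likely rely on an established library for correlation filtering, importance ranking and the four classifiers.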
Result
Differential analysis of blood mRNAs identified a total of 5625 differentially expressed mRNAs. Feature engineering screening yielded the best feature collection, and the artificial intelligence individualized diagnostic model built on it achieved a classification accuracy of 91.59% on the test set. The AUCs for CON, MCI and AD were 0.9746, 0.9536 and 0.9807, respectively.
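For readers unfamiliar with how an overall accuracy and one AUC per class can be reported for a three-class problem, the sketch below shows one common approach: a one-vs-rest AUC computed from pairwise score comparisons. The labels and predicted probabilities are hypothetical toy values, not the study's data, and the abstract does not state that the authors computed their AUCs this way.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_auc(labels, scores):
    """AUC as the probability that a positive sample scores higher than a
    negative one, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(y_true, proba, classes):
    """One AUC per class: that class is treated as positive, all others as negative."""
    return {
        c: binary_auc([t == c for t in y_true], [row[i] for row in proba])
        for i, c in enumerate(classes)
    }


# Hypothetical test-set labels and predicted class probabilities.
classes = ["CON", "MCI", "AD"]
y_true = ["CON", "CON", "MCI", "MCI", "AD", "AD"]
proba = [
    [0.70, 0.20, 0.10],
    [0.50, 0.40, 0.10],
    [0.30, 0.50, 0.20],
    [0.40, 0.35, 0.25],
    [0.10, 0.30, 0.60],
    [0.20, 0.20, 0.60],
]

# Predicted label = class with the highest probability in each row.
y_pred = [classes[max(range(len(classes)), key=row.__getitem__)] for row in proba]

acc = accuracy(y_true, y_pred)
auc = one_vs_rest_auc(y_true, proba, classes)
print(f"accuracy = {acc:.4f}")
for name, value in auc.items():
    print(f"AUC({name}) = {value:.4f}")
```

The pairwise comparison is quadratic in the number of samples, which is acceptable for a test set of this scale. A rank-based implementation would be preferable for much larger data.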
Conclusion
The 181 features span four dimensions and can accurately classify the CON, MCI and AD groups, suggesting that machine learning methods can capture changes in blood biomarkers in patients with Alzheimer's disease. The cell homeostasis analysis suggested that the homeostasis of NKT cells might be related to AD and that the homeostasis of GMP might be one of the causes of AD.