With the rapid development of computer technology, medical research has shifted in recent decades from traditional empirical medicine to evidence-based and precision medicine, and machine learning has been widely applied in the medical field. Machine learning is a branch of artificial intelligence that analyzes large amounts of data, discovers patterns in the data, and makes predictions and decisions based on those patterns[16, 17]. Therefore, in this study of a prediction model for diabetic kidney disease (DKD) based on the American population, we used three machine learning algorithms (Lasso regression, bidirectional stepwise regression, and random forest) to screen the risk factors for DKD, build a risk prediction model, and test its accuracy, applicability, and clinical value, with the aim of providing new insights and reliable data support for the prevention and treatment of DKD. We found that age, glycosylated hemoglobin (HbA1c), serum albumin (ALB), serum creatinine (Scr), and total serum protein (TP) were five variables that predicted well whether DKD occurred in the population. These five indicators were significantly and independently associated with the occurrence of DKD. In the training data set, 899 of 3066 participants had DKD, representing about 29 million patients with diabetes in the United States[18]. We constructed a nomogram to predict DKD and evaluated its performance using the C-index, Brier score, calibration curve, and decision curve analysis (DCA). The results show that the prediction model performs well in both the training set and the validation set and has good predictive ability for the occurrence of DKD.
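As an illustration of the evaluation metrics named above, the following minimal Python sketch computes the Brier score, the C-index for a binary outcome (which equals the area under the ROC curve), and the net benefit at a single threshold probability, the quantity plotted in decision curve analysis. The function names and the toy data are hypothetical and are not the code or data used in this study.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def c_index_binary(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Share of (case, non-case) pairs in which the case has the higher
    predicted probability; tied predictions count as half."""
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    controls = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for p_case in cases:
        for p_control in controls:
            if p_case > p_control:
                concordant += 1.0
            elif p_case == p_control:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit at one threshold probability:
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = DKD, 0 = no DKD, with model-predicted risks.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.7]

toy_brier = brier_score(y_true, y_prob)
toy_c_index = c_index_binary(y_true, y_prob)
toy_net_benefit = net_benefit(y_true, y_prob, threshold=0.5)
```

A lower Brier score indicates better overall accuracy of the predicted probabilities, a C-index closer to 1 indicates better discrimination, and a decision curve is obtained by repeating the net benefit calculation over a range of threshold probabilities.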
Machine learning can be divided into types such as supervised learning, unsupervised learning, and reinforcement learning, and is widely used in fields such as image recognition, natural language processing, and data mining[19]. Predictive models are widely used in fields such as finance, healthcare, and marketing to support more accurate predictions and decisions[20, 21]. In healthcare, researchers have used large amounts of clinical data to develop numerous predictive models and apply them to clinical practice. For example, Alanazi HO et al. used the NHANES data set to define different time frames and data sets, established multiple machine learning models for diabetes and cardiovascular disease, and evaluated their classification performance. They also developed an integrated model for the two diseases, which achieved high AUC scores and identified related risk factors[22]. In a national health survey study of the Chinese population in 2023, Lasso regression was used to screen for important predictors of hypertension. Logistic regression was then used to develop a diagnostic model for hypertension risk classification, which was visualized as a nomogram. A website was also developed to calculate the exact probability of hypertension and so facilitate personalized prediction[23]. JoonNyung Heo et al. developed three machine learning models in a retrospective study to predict long-term outcomes in patients with ischemic stroke and compared their accuracy with that of the ASTRAL score[24]. The results indicate that machine learning models can accurately predict long-term outcomes for patients with acute stroke, especially the deep neural network model, which is built on multiple layers of complex networks. Its area under the curve was significantly higher than that of the ASTRAL score (0.888 vs 0.839; P < 0.001), and it performed better than the other models.
Diabetic nephropathy is an important complication of diabetes that involves the whole kidney and is related to many factors in patients with diabetes, such as blood glucose level, blood pressure control, and glycosylated hemoglobin level. The clinical diagnosis of DKD is generally based on a sustained increase in the urinary albumin-to-creatinine ratio (UACR) and/or a decrease in the estimated glomerular filtration rate (eGFR) after other chronic kidney diseases have been excluded[25]. Treatment mainly consists of lowering blood glucose, lowering blood pressure, and reducing proteinuria, but these measures can only slow the progression of the disease and cannot cure it. Early prediction of its development is therefore the key to timely intervention and management. Over the years, extensive research has been conducted to identify and understand the predictors of DKD. For example, in one cohort study, a logistic regression model predicting two-year all-cause mortality in elderly patients with DKD was constructed and internally validated. The ROC curve, the Kolmogorov-Smirnov (KS) statistic, and the calibration curve were used to evaluate the predictive performance of the model[26]. However, that study extracted its data from the MIMIC-III database, which may lack some important variables, and its sample size was small. The present study therefore increased the sample size, included more variables, and used different evaluation indicators in order to better predict and evaluate DKD. A cross-sectional study by Islam MR et al. found that the occurrence of DKD is associated with higher Scr levels, which indicate decreased renal function[27]. The results of the present study are consistent with this finding and indicate that the Scr level is one of the important indicators for predicting the occurrence of DKD. In addition, studies have shown that the synthesis of ALB in the liver is significantly reduced in patients with DKD, and that the decrease in ALB levels further worsens their condition[28]. Other studies have also found that a decrease in ALB gene transcription may lead to severe disorders of ALB metabolism in patients with DKD[29, 30]. The present study likewise found that the ALB level is a factor related to DKD, which suggests that special attention should be paid to Scr and ALB levels in patients with DKD in clinical practice. A study based on the Chinese population applied logistic regression to construct a DKD prediction model and evaluated it using the C-index, curve analysis, a forest plot, net reclassification improvement, and internal validation. Its strength lies in the development of a new nomogram with moderate accuracy, which helps clinicians understand the risk of DKD in patients with T2DM[31]. In 2019, Yu Dahai et al. conducted a prospective cohort study and developed a new risk score, based on routine clinical measurements, that quantifies the individual risk of cardiovascular death; the score was externally validated and accurately predicted the absolute risk of 2-year cardiovascular death in Chinese patients with DKD[32].