Machine learning can effectively learn the characteristics of large amounts of data, which provides new research ideas and methods for accurate prediction. Machine learning algorithms include conventional algorithms (K-nearest neighbor, decision tree, support vector machine, etc.) and ensemble algorithms (random forest, XGBoost, extremely randomized trees, etc.). In this study, logistic regression, support vector machine, decision tree and random forest algorithms were used to construct prediction models for HSPN in children. Comparing the precision, accuracy, recall and F1 value of each model showed that the random forest model performed best, with values of 0.83, 0.87, 0.86 and 0.85 respectively, and that it was more stable than the other three models.
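The four metrics compared above all follow from the counts of a binary confusion matrix. The following is a minimal sketch of that calculation, not the code used in this study; the function name `classification_metrics` and the example labels are hypothetical and serve only as an illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary classifier.

    y_true and y_pred are equal-length sequences of class labels;
    `positive` is the label treated as the positive class (assumed: 1 = HSPN).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for eight patients (1 = renal involvement, 0 = none).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because F1 is the harmonic mean of precision and recall, it stays low whenever either of the two is low, which is why it is usually reported alongside accuracy for imbalanced clinical data.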
The ROC curves of the four models were drawn for comparison. The AUC of the random forest model was 0.912, which was also significantly higher than that of the other three models, indicating that the random forest model classified cases more accurately and generalized well. Random forest is a collection of multiple decision trees, which can make up for the weak generalization ability of a single decision tree [11]. This method relies on computers to learn the complex nonlinear interactions between variables by minimizing the error between the observed and predicted results [12]. With low computational overhead, it shows strong performance in many practical tasks.
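The AUC can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison, which is equivalent to the area under the ROC curve; it is an illustration under stated assumptions rather than the procedure used in this study, and the function name `roc_auc` and the example scores are hypothetical.

```python
def roc_auc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities for six patients.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(y_true, scores)
print(round(auc, 3))
```

The pairwise form is quadratic in the number of cases, which is acceptable for a cohort of clinical size; a rank-based formulation gives the same value more efficiently on large datasets.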
Henoch-Schönlein purpura (HSP) is an immune complex-mediated systemic vasculitis and is characteristically self-limited. Its pathogenesis is related to genetic, immune and other factors. Renal involvement is the key determinant of its prognosis. Clinical judgment of renal involvement in children mainly depends on urine tests, renal function tests and renal biopsy. However, because renal biopsy carries relatively high risk and has low acceptance, and routine urine tests lag behind the onset of damage, many scholars in recent years have studied the high-risk factors for renal damage in HSP and methods of preventing it. This work mainly covers the analysis of the epidemiological characteristics, clinical manifestations, auxiliary examinations, treatment and medication of the disease. In this study, features were ranked by the feature importance provided by the random forest model; the top 10 features were: Persistent purpura ≥ 4 weeks, Cr, Clinic time, ALB, WBC, TC, Relapse, TG, Recurrent purpura and EB-DNA. These features may be important risk factors associated with HSPN in children.
Skin purpura is the most common clinical manifestation of HSP in children [13]. Studies have shown that about 78%-100% of children have skin purpura at the onset of disease, and the accuracy of initial diagnosis is high [14]. Persistent purpura usually refers to a rash lasting more than 1 month. Recurrent purpura refers to the recurrence of a typical purpura-like rash in crops (more than 3 times) after the previous rash has completely subsided. Chan H et al [15] found that the risk of renal damage in HSP children with persistent purpura was 1.22-13.25 times higher than that in patients without persistent purpura. Rigante D et al [16] considered a skin rash persisting for more than one month to be an important predictor of renal involvement and disease recurrence in children with HSP. Ma DQ et al [17] found that recurrence of rash ≥ 3 times was a risk factor for renal involvement in children with HSP. A possible explanation is that recurrent or persistent skin purpura reflects recurrent and persistent small-vessel vasculitis, which amplifies the body's inflammatory cascade; immune complex deposition and complement activation remain widespread and sustained, and because the kidney is rich in capillaries, renal involvement follows.
Serum creatinine is an important indicator of renal function, and an increase in serum creatinine caused by decreased creatinine clearance is a sign of renal insufficiency. AlKhater et al [18] showed that elevated serum creatinine was related to renal damage in children with HSP. Although some reports indicate that renal disease appears on average about one month after the onset of symptoms, the risk can last up to six months after the initial symptoms of HSP appear. In the report of Gupta et al [19], 57.8% of the patients developed renal symptoms within 4 weeks, 84.4% within 8 weeks, and the remaining patients went on to develop renal symptoms within 6 months after diagnosis. This indicates the need for adequate follow-up and monitoring of patients to assess renal involvement. It is generally believed that hypoproteinemia in nephropathy is caused by the loss of a large amount of protein in the urine. The results of this study showed that decreased serum albumin was one of the risk factors for renal involvement, which is consistent with the study by Mao et al [20]. This is related to damage to the charge barrier of the glomerular filtration membrane and increased permeability in children with HSPN, which leads to albuminuria.
In recent years, studies have shown that elevated serum total cholesterol (TC) is more common in children with HSP, especially those with renal damage. For example, Xu et al [21] showed that age, creatinine and TC levels were higher in children with HSPN than in children without HSPN (NHSPN). Multivariate logistic regression analysis showed that TC level was one of the independent risk factors for HSPN (P < 0.05), which is consistent with the study of Ma et al [17].
Wang et al [22] showed that patients with an interval of less than 4 days from the onset of symptoms to diagnosis had a higher risk of developing kidney damage and severe kidney disease than patients with an interval of more than 8 days. This risk factor has rarely been reported in previous studies. HSP is a self-limited disease in most cases, but in a small number of patients it may not be self-limited and will progress to renal involvement or severe renal disease. This finding is similar to the view of Davin et al [23]. These results suggest that early diagnosis and early treatment may be beneficial to children with HSP. Recurrence refers to the reappearance of the characteristic manifestations of HSP in children diagnosed with HSP at least 1 month after the disappearance of symptoms. Lei et al [24] defined the interval of recurrence as more than 3 months and included a total of 1002 children, of whom 83.6% had one recurrence and 16.4% had more than 2 recurrences; children with recurrence were more likely to have renal damage (P < 0.05).
Studies have shown that infection is the most common cause of HSP, and in about 40%-70% of children the infection is mainly of the respiratory tract [25-26]. Ma et al [17] showed that an increase in WBC was one of the independent risk factors for HSPN (P < 0.05). Chang et al [27-28] suggested that the mechanism may be tissue damage caused by inflammatory mediators secreted by neutrophils, resulting in swelling and necrosis of the renal vascular endothelium, while activated substances such as oxygen free radicals can attract more WBCs by chemotaxis, aggravating vascular injury and forming a vicious circle. EBV belongs to the γ subfamily of Herpesviridae; it is a linear double-stranded DNA virus, and humans are its only natural host. Viral infection has been reported as the etiology of various renal diseases. EBV infection can directly activate cellular and humoral immunity, leading to EBV infection-related renal injury, and can also promote the formation of circulating antigen-antibody complexes, which deposit on the renal vascular wall and damage renal function [29].
This study has the following limitations: (1) it was a single-center retrospective study with a limited sample size and no external validation, so the results may be biased, and further multicenter, large-sample prospective studies are needed for verification; (2) the examination items differed among children, some features were omitted because of missing values, and some predictive variables may therefore have been left out.
In summary, this study used machine learning algorithms on clinical data to predict HSPN in children, with the aim of intervening on possible clinical risk factors, assisting early clinical diagnosis, improving the prognosis of children, and reducing the harm caused by invasive examination. Prospective intervention studies can be carried out at a later stage to try to establish an in-hospital early warning system for renal damage in children with HSP, so as to provide individualized treatment and prevention for patients. The combination of machine learning models and medical big data may provide new ways to predict risk in children with Henoch-Schönlein purpura.