In recent years, the prevalence of MPP has been rising, with high mortality and complication rates23. Although most children with MPP have a good prognosis after macrolide treatment, children with SMPP respond poorly to macrolides and have a prolonged disease course, which may lead to long-term complications such as bronchiectasis, bronchiolitis obliterans, and permanent pulmonary atelectasis24. Early identification of SMPP therefore supports rational treatment, reduces complications, and optimizes the use of medical resources. Current laboratory techniques for detecting MP infection include culture, serological assays, and various nucleic acid amplification tests, each of which has limitations. Although MP culture is considered the "gold standard" for diagnosing MP infection, it is time-consuming and therefore difficult to use in routine clinical diagnosis. Serological assays and PCR-based assays are thus the main tools for diagnosing MP infection in clinical practice25, 26. SMPP presents distinct diagnostic challenges because of its wide range of clinical manifestations and its overlap in symptoms with other common respiratory diseases. The clinical symptoms associated with MPP in multivariable analysis are similar to those of respiratory viral illnesses, including influenza infection27, 28. In addition, clinical features independently associated with MP detection, such as rales, have traditionally been associated with typical bacterial infections27. Timely identification of pneumonia etiology can therefore improve clinical management, including decisions on antibiotic use.
With the development of AI, ML-based prediction models have been widely used for risk prediction and diagnostic assistance in medicine29–31. The capacity of ML to analyze large and complex clinical datasets holds considerable potential, and a growing number of studies have shown that ML algorithms can outperform traditional statistical methods in building prediction models. In the present study, we assess the value of routine laboratory parameters in the diagnosis and prediction of SMPP using ML algorithms. Among the five algorithms evaluated (XGBoost, LR, LightGBM, KNN, and RF), LightGBM achieves the highest AUC for the diagnosis of SMPP (AUC = 0.968), outperforming the other four. A further advantage of LightGBM is that it can be regularized to limit overfitting and finely tuned for imbalanced datasets. Previous studies mainly used traditional tools such as nomograms to establish MPP prediction models; the use of more complex ML algorithms to build identification and prediction models for SMPP therefore remains relatively underexplored.
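An AUC such as the 0.968 reported above can be read as a ranking probability: the chance that a randomly chosen SMPP patient receives a higher predicted score than a randomly chosen patient with another respiratory disease. The minimal Python sketch below computes AUC directly from that definition (the Mann-Whitney form). The labels and scores are hypothetical and are not data from this study; in practice a library function such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    AUC equals the probability that a randomly chosen positive case
    (label 1, e.g. SMPP) scores higher than a randomly chosen negative
    case (label 0); tied scores count as half a correct ordering.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities from two models on the same six patients.
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]  # every positive ranked above every negative
model_b = [0.9, 0.4, 0.7, 0.5, 0.2, 0.1]  # one positive ranked below one negative
print(roc_auc(labels, model_a))  # 1.0
print(roc_auc(labels, model_b))  # 8 of 9 pairs ordered correctly, about 0.889
```

Comparing algorithms by AUC, as done above, amounts to asking which model orders SMPP and non-SMPP patients most consistently; the pairwise loop is quadratic and is meant only to make that definition concrete.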
Numerous studies have reported a correlation between age and MPP. Children over 5 years of age are more susceptible to MP infection and exhibit more severe MPP symptoms17, 32. The median age of patients with viral pneumonia is significantly lower than that of MPP patients. In the modeling cohort, the median age of SMPP patients is 6 years, which is consistent with the previous finding that MP infection rarely occurs in children under 3 years of age33. Lu et al.34 reported that age, LDH, and ESR were significant predictors of RMPP in a logistic regression model. LDH is widely distributed in body tissues, including the lung. As a non-specific marker of tissue damage and cell death, serum LDH has long been used in the diagnosis and prognostic assessment of pulmonary infectious diseases35. Consistent with previous reports36, 37, the current study finds that LDH is significantly higher in SMPP patients than in patients with other respiratory diseases. CRP is a recognized marker of inflammation; many studies have shown that it is significantly elevated in children with MPP and associated with disease severity. One study found that children with CRP > 15.49 mg/L have a higher risk of developing SMPP.
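Single-marker thresholds such as CRP > 15.49 mg/L are often chosen by maximizing Youden's index (J = sensitivity + specificity − 1) over candidate cutoffs on an ROC curve. The discussion does not say how the cited cutoff was derived, so the following is only a sketch of that common approach, assuming a "value above cutoff means positive" rule; the CRP values below are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def youden_cutoff(
    labels: Sequence[int], values: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Return (cutoff, sensitivity, specificity, J) maximizing Youden's J.

    A patient is classified as positive when value > cutoff, matching the
    "CRP > threshold" form of the rule. Each observed value is tried as a
    candidate cutoff; ties in J keep the lowest cutoff.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    best: Optional[Tuple[float, float, float, float]] = None
    for cut in sorted(set(values)):
        tp = sum(1 for y, v in zip(labels, values) if y == 1 and v > cut)
        tn = sum(1 for y, v in zip(labels, values) if y == 0 and v <= cut)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1.0
        if best is None or j > best[3]:
            best = (cut, sens, spec, j)
    assert best is not None, "need at least one value"
    return best


# Hypothetical CRP values (mg/L); 1 = SMPP, 0 = other respiratory disease.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0]
crp = [30.0, 22.5, 18.0, 9.0, 16.0, 8.0, 5.0, 3.0, 2.0]
print(youden_cutoff(labels, crp))  # cutoff 8.0 with sensitivity 1.0 and specificity 0.8
```

Youden's J weights sensitivity and specificity equally; if missing an SMPP case is judged costlier than a false alarm, a weighted criterion may be more appropriate.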
Weights are numerical parameters representing the importance of different features or inputs in a model, and studying feature weights in relation to disease relevance has attracted growing interest in clinical research. Ranked by feature importance, the ten most important indicators in Model 1 are PT, PTA, APTT, GLU, HGB, AST, LYC, CRP, HBDH, and LDH. In this diagnostic model for SMPP (Model 1), the coagulation indicators PT, PTA, and APTT carry notably greater weight than the other variables. Recent studies have shown that coagulation abnormalities are not uncommon in children with MPP. The specific mechanism of abnormal coagulation in MP infection is unclear, but it may involve MP inducing the excessive synthesis and secretion of cytokines such as interleukins, tumor necrosis factors, and chemokines, leading to local vascular damage, accumulation of metabolites in the affected area, and vascular occlusion38. Studies have also found that abnormal coagulation may be involved in the development of SMPP and may be closely related to the development and prognosis of its complications39, 40. In this study, comparison of coagulation function between the SMPP group and the other respiratory diseases group showed statistically significant differences in PT, PTA, and APTT.
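The discussion does not state how the feature weights of Model 1 were computed; LightGBM, for example, can report split-count or gain-based importance. Permutation importance is a model-agnostic alternative that makes the idea of a feature "weight" concrete: shuffle one feature column, which breaks its link to the outcome, and measure how much the AUC drops. The sketch below assumes a hypothetical scoring function and hypothetical data; it illustrates the technique and is not the study's procedure.

```python
import random
from typing import Callable, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(
    model: Callable[[Sequence[float]], float],
    X: List[List[float]],
    y: Sequence[int],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Mean drop in AUC when each feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = roc_auc(y, [model(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [list(row) for row in X]
            for row, value in zip(X_perm, column):
                row[j] = value
            drops.append(baseline - roc_auc(y, [model(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
X = [[10.0 + i, float(i % 3)] for i in range(10)] + [
    [float(i), float((i + 1) % 3)] for i in range(10)
]
y = [1] * 10 + [0] * 10


def toy_model(row: Sequence[float]) -> float:
    # Hypothetical scoring rule that reads only feature 0.
    return row[0]


print(permutation_importance(toy_model, X, y))
```

Because the toy model never reads feature 1, shuffling that column leaves every score unchanged and its importance is exactly zero, while shuffling feature 0 lowers the AUC. Note that strongly related inputs (for instance, two coagulation measures derived from the same assay) can share or split importance under any of these schemes.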
The frequency of extrapulmonary manifestations of MP infection has reportedly increased in recent decades41. Cardiac and liver complications of MPP are well documented42, 43, although the exact pathological mechanisms by which MP infection causes them remain unclear. Cardiac events and liver involvement are the two most common extrapulmonary manifestations, and multiple factors are involved in their pathogenesis. One study found that TIM1 is associated with CK-MB, whereas TIM3 and TLR2 are associated with ALT, suggesting that cardiac and liver damage caused by MP infection results from a combination of inflammatory cytokines and autoimmune reactions. In most pediatric patients, myocardial damage is mild and is usually diagnosed by myocardial enzyme testing and electrocardiography. Myocardial enzymes, including CK, LDH, CK-MB, and AST, are the main serum enzymes used in the clinical diagnosis of MPP complicated by myocardial damage44. Because AST and LDH are present in many tissues, they lack specificity, which limits their value in diagnosing myocardial damage. CK-MB, by contrast, is a relatively myocardium-specific enzyme found only in small amounts in other tissues, and changes in its activity are closely associated with myocardial cell necrosis45. In Cohort 2 of this study, CK-MB levels in SMPP patients with myocardial damage are significantly higher than in those with liver damage or no damage. CK-MB levels in patients with liver damage are also higher than in those without damage. A previous study has reported asymptomatic elevation of liver enzymes in MPP patients46.
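The discussion does not name the statistical test behind the three-group CK-MB comparison. As an assumption, a rank-based Kruskal-Wallis test is one common choice for comparing a skewed laboratory value across three groups; the sketch below implements it without tie correction, using the fact that for three groups the chi-square reference distribution has 2 degrees of freedom, whose upper-tail probability is exp(−H/2). The CK-MB values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_3(groups: Dict[str, Sequence[float]]) -> Tuple[float, float]:
    """Kruskal-Wallis H and chi-square p-value for exactly three groups (no tie correction)."""
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    pooled = [v for vals in groups.values() for v in vals]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    rank_term = 0.0
    start = 0
    for vals in groups.values():  # same iteration order as when pooling
        group_ranks = ranks[start:start + len(vals)]
        rank_term += sum(group_ranks) ** 2 / len(vals)
        start += len(vals)
    h = 12.0 / (n_total * (n_total + 1)) * rank_term - 3.0 * (n_total + 1)
    p_value = math.exp(-h / 2.0)  # chi-square (df = 2) upper-tail probability
    return h, p_value


# Hypothetical CK-MB values (U/L) for three extrapulmonary-damage groups.
ckmb = {
    "myocardial damage": [45.0, 52.0, 60.0, 38.0],
    "liver damage": [25.0, 30.0, 22.0, 28.0],
    "no damage": [15.0, 18.0, 12.0, 20.0],
}
h, p = kruskal_wallis_3(ckmb)
print(f"H = {h:.3f}, p = {p:.4f}")  # H = 9.846, p = 0.0073
```

A significant H says only that at least one group differs; pairwise statements such as "myocardial damage higher than liver damage" need post-hoc comparisons with multiplicity control. With small groups the chi-square approximation is rough, and `scipy.stats.kruskal` additionally applies a tie correction.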
ALT levels increase significantly after infection, indicating liver involvement during the disease course, and return to normal after macrolide therapy. In Cohort 2, ALT levels in patients with liver damage are significantly higher than in those with myocardial damage or no damage. In the models established in this study, SHAP values explain the contribution of each feature to individual predictions, which makes the results easier to interpret and apply in clinical practice. Weight analysis and SHAP values together show that CK-MB and ALT play important roles in Model 2.
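SHAP values are Shapley values of a model's prediction: each feature's contribution is its average marginal effect over all orders in which features could be added. Libraries such as `shap` compute these efficiently for tree ensembles such as LightGBM. The brute-force sketch below only illustrates the definition, assuming a single reference patient (features outside a coalition take the reference value) and a hypothetical linear risk score; it is not meant to reproduce the study's analysis.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set


def exact_shapley(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of f at x relative to a single baseline vector.

    A feature outside the coalition is replaced by its baseline value.
    All coalitions are enumerated, so the cost grows as 2**n_features.
    """
    n = len(x)

    def value(coalition: Set[int]) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical linear risk score over three standardized lab values.
def risk(z: Sequence[float]) -> float:
    return 2.0 * z[0] + 0.5 * z[1] - 1.0 * z[2]


patient = [3.0, 4.0, 1.0]
reference = [1.0, 2.0, 1.0]
phi = exact_shapley(risk, patient, reference)
print(phi)  # approximately [4.0, 1.0, 0.0]
print(sum(phi), risk(patient) - risk(reference))  # both approximately 5.0
```

For a linear score with a single reference, each Shapley value reduces to weight × (patient value − reference value), and the values always sum to the difference between the patient's prediction and the reference prediction. For models with interactions, such as gradient-boosted trees, the shared effect is split between the interacting features.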
This study has several limitations: (1) it is a retrospective multicenter study with a limited sample size, so long-term, multicenter, prospective studies with larger samples are needed to further establish the value of SMPP prediction models; (2) the models developed in this study may not be applicable to non-Asian populations; and (3) technical and resource limitations precluded evaluating ensemble learning across multiple machine learning models.