Acute respiratory infections (ARIs) are a major cause of morbidity and mortality, especially in developing countries, and contribute substantially to the global disease burden 1. According to the World Health Organization, lower respiratory infections remained the world's deadliest communicable disease in 2019, ranking as the fourth leading cause of death and causing approximately 2.6 million deaths 2. These infections particularly affect vulnerable populations, such as children and older adults 3–5. The coronavirus disease 2019 (COVID-19) pandemic has altered the epidemiology of respiratory viruses and of Mycoplasma pneumoniae (MP) 6.
MP is a leading cause of upper and lower respiratory tract infections in humans and was an important contributor to autumn and winter pneumonia epidemics in Beijing between 2015 and 2020 7. The COVID-19 pandemic has highlighted the need for rapid and accurate diagnostic approaches that can differentiate infectious diseases, particularly other causes of pneumonia whose clinical features overlap with those of COVID-19. Differentiating COVID-19 from other pneumonias, especially in cases of co-infection, is vital for allocating appropriate treatment and improving patient outcomes. In this context, biomarkers play a pivotal role in identifying disease-specific pathways and facilitating the development of targeted therapies.
Biomarkers, including clinical and molecular indicators, are instrumental in diagnosing ARIs, assessing disease severity, and monitoring treatment response 8. For COVID-19 and MP infection, biomarkers such as C-reactive protein (CRP), procalcitonin (PCT), interleukin 6 (IL-6), and white blood cell (WBC) count have been identified as key factors associated with disease progression and outcomes 9. Differences in these biomarkers between patient groups offer insight into the underlying pathophysiological mechanisms and provide a basis for tailored diagnostic strategies 10.
Advances in computational biology and machine learning have provided new tools for analyzing complex medical datasets. Incorporating machine learning into healthcare marks a shift toward more precise diagnostics and improved patient outcomes 11. Among these tools, the Random Forest algorithm stands out for its robustness with high-dimensional data, which makes it well suited to exploring the complex relationships between clinical features and disease states 12. A Random Forest aggregates the votes of many decision trees, each trained on a bootstrap sample of the data with a random subset of features considered at each split, which reduces the variance of any single tree. Such methods can accelerate the discovery of diagnostic markers and thereby improve the precision of medical interventions, including those for ARIs.
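A Random Forest rests on three ingredients: bootstrap sampling of patients, a random subset of candidate features for each split, and a majority vote across trees. The pure-Python sketch below illustrates those ingredients with depth-one trees (decision stumps), a deliberate simplification of full-depth trees. The function names (`fit_stump`, `fit_forest`, `predict`), the parameter values, and the synthetic data are hypothetical and are not the study's implementation; a real analysis would typically use a library implementation with deeper trees.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y, feature_ids):
    """Depth-one tree: pick the (feature, threshold) with the lowest weighted
    Gini impurity, searching only the randomly chosen candidate features."""
    majority = Counter(y).most_common(1)[0][0]
    # Fallback if no valid split exists: send everything left, predict majority.
    best = (float("inf"), feature_ids[0], float("inf"), majority, majority)
    for j in feature_ids:
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # a split must send samples to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[0]:
                best = (score, j, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best[1:]  # (feature, threshold, left_label, right_label)


def fit_forest(X, y, n_trees=25, max_features=2, seed=0):
    """Train each stump on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        feats = rng.sample(range(p), max_features)  # random feature subset
        forest.append(fit_stump(Xb, yb, feats))
    return forest


def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(left if row[j] <= t else right for j, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Synthetic toy data (not study data): features 0 and 1 separate the two
# hypothetical classes "A" and "B"; feature 2 is constant and uninformative.
X = [[1, 1, 7], [2, 3, 7], [3, 2, 7], [4, 5, 7], [5, 4, 7],
     [11, 12, 7], [12, 14, 7], [13, 11, 7], [14, 15, 7], [15, 13, 7]]
y = ["A"] * 5 + ["B"] * 5
forest = fit_forest(X, y)
print(predict(forest, [0, 0, 7]), predict(forest, [20, 20, 7]))
```

Because every random pair of features in this toy example includes at least one informative feature, the vote stays stable even though individual stumps see different bootstrap samples and feature subsets.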
ARIs are a complex syndrome with a wide range of clinical manifestations, which makes accurate diagnosis and targeted therapy difficult. ARIs involving COVID-19 and MP infection, both known for their diverse pathogenicity, are particularly challenging to diagnose. Yet the biomarkers, infection types, and contributing factors associated with ARIs involving these pathogens have not been fully explored in China.
To bridge this knowledge gap, we used the Random Forest algorithm to analyze a cohort of patients with COVID-19, MP infection, or co-infection, focusing on identifying the biomarkers and clinical features that distinguish these conditions. The multidimensional dataset comprised demographic information and clinical parameters. Our investigation combined statistical analysis, machine learning, and feature importance analysis to identify key biomarkers and predictors of infection site in patients with ARIs. To make the Random Forest model interpretable, we used SHAP (SHapley Additive exPlanations) values to quantify the contribution of each biomarker to the model's predictions. A SHAP value assigns each feature its marginal contribution to a prediction, averaged over all possible orders in which the features can be added; interaction effects are therefore shared among the features involved, and the attributions for a given patient sum to the difference between that patient's prediction and the model's average (baseline) output. This yields a unified, per-prediction measure of feature importance and a more detailed view of the model's decision process.
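SHAP values are Shapley values from cooperative game theory applied to model predictions. The sketch below computes them exactly by brute-force enumeration of feature subsets, which is only feasible for a handful of features; SHAP implementations such as TreeSHAP compute these quantities far more efficiently for tree ensembles. Two choices here are assumptions for illustration: the value of a feature subset is the model output averaged over a background dataset with the remaining features taken from the background rows (an "interventional" value function), and `toy_model` is a hypothetical scoring function standing in for a trained model, not the study's Random Forest.

```python
from itertools import combinations
from math import factorial


def exact_shapley(predict, x, background):
    """Exact Shapley values for a single instance x.

    Value of a coalition S: mean model output over the background rows, with
    features in S fixed to x and all other features taken from each row
    (assumed interventional value function). Cost grows as 2**p.
    """
    p = len(x)

    def value(coalition):
        total = 0.0
        for b in background:
            z = [x[j] if j in coalition else b[j] for j in range(p)]
            total += predict(z)
        return total / len(background)

    phi = []
    for i in range(p):
        others = [j for j in range(p) if j != i]
        contribution = 0.0
        for k in range(p):  # coalition sizes 0 .. p-1 (subsets of the others)
            weight = factorial(k) * factorial(p - k - 1) / factorial(p)
            for subset in combinations(others, k):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


def toy_model(z):
    # Hypothetical score with main effects and one interaction term (z0 * z1).
    return 2 * z[0] + 3 * z[1] - z[2] + z[0] * z[1]


x = [1.0, 2.0, 3.0]
background = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
phi = exact_shapley(toy_model, x, background)
baseline = sum(toy_model(b) for b in background) / len(background)
# Efficiency property: attributions sum to f(x) minus the baseline output.
print(phi, sum(phi), toy_model(x) - baseline)
```

For a pure interaction such as `z[0] * z[1]` evaluated at `[1, 1]` against a background of zeros, the credit is split equally between the two features, which is how Shapley-based attributions account for feature interactions.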
This study aims to strengthen the diagnosis of infectious diseases by integrating computational methods with clinical data, with the ultimate goal of improving patient care and therapeutic strategies in the era of precision medicine.