The occurrence and development of T2RF greatly worsens the prognosis of patients with AECOPD, increases mortality and imposes a heavy economic burden [10, 28–30]. Blood gas analysis is usually performed to diagnose T2RF in patients with AECOPD when their clinical manifestations suggest abnormal oxygenation, ventilation or acid-base status. However, objective predictors of T2RF onset are still lacking. In addition, primary medical institutions lack 24-hour laboratory monitoring and other expensive testing equipment. If routinely available biomarkers could predict T2RF in advance, they could help prevent further deterioration of the patient's condition. Therefore, by analysing the clinical data of 1,251 AECOPD patients at admission, this study established a prediction model that uses common predictors to identify AECOPD patients likely to develop T2RF 48 h in advance, and the model achieved excellent predictive performance.
In this study, six risk factors were identified and the optimal prediction model was established. COPD is a progressive, irreversible chronic inflammatory disease with a long course, characterised by a decline in pulmonary function with increasing age [31]. As the disease progresses, obstructive lesions of the respiratory tract and lung tissue worsen. Patients with a long disease course not only show a high detection rate of drug-resistant bacterial strains and high readmission rates [32, 33], but also have severe lung function impairment and greater susceptibility to infection [24]. Notably, the pathogenesis of T2RF is linked to PVF grade: the higher the PVF grade, the lower the patient's airway patency, which aggravates dyspnoea and can induce T2RF [34]. PVF was also the most important predictor in our model, underscoring its value in predicting T2RF. D-D is a plasma marker whose elevation indicates hypercoagulability and secondary fibrinolysis; it can be used to assess the severity of AECOPD and to predict the risk of death [35–37]. Previous studies have found that D-D levels in patients with AECOPD complicated by T2RF are significantly higher than in patients with pulmonary respiratory failure [38]. Our study shows that D-D is also a valuable predictor of T2RF. PCT, a marker of bacterial infection, reflects the severity of infection, is an effective biomarker during AECOPD, and has been widely used to guide the evaluation and treatment of AECOPD patients [39, 40]. PCT levels are low in healthy individuals but rise significantly during acute episodes of respiratory failure, and PCT has been shown to predict the acute onset of respiratory failure [41]. NLR is a simple and widely used marker of inflammation, which plays an important role in COPD.
Studies have shown that patients with high NLR experience more disease complications [42], and that NLR is also an excellent predictor of in-hospital mortality and the need for mechanical ventilation [43–45]. Previous studies of NLR in COPD with respiratory failure determined a cutoff value of 3.54 [46]. Besides NLR, NEUT% is also commonly used to characterise systemic inflammation. It is an important indicator for the diagnosis and prognostic assessment of patients with AECOPD [47], and AECOPD patients who require non-invasive positive pressure ventilation frequently have elevated NEUT% [48], suggesting that an increase in NEUT% may signal the onset of T2RF. Therefore, in this study, NEUT%, PCT, D-D, NLR, PVF and COPD duration were combined to predict the acute onset of T2RF in AECOPD patients; the selection of these indicators is supported by clinical knowledge.
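The two inflammatory indices above are simple ratios from a routine blood count. The following is a minimal sketch of how they might be derived and compared against the NLR cutoff of 3.54 reported in [46]. It assumes absolute counts in 10^9/L and treats values strictly above the cutoff as elevated (an assumption, since the direction of the threshold is not specified here). The names BloodCount, nlr, neut_percent and nlr_above_cutoff are hypothetical helpers and are not part of this study's model.

```python
# Minimal sketch: derive NLR and NEUT% from a routine blood count.
# Assumptions: absolute counts in 10^9/L; the 3.54 cutoff is the value
# reported in a previous study [46], not a threshold fitted in this study;
# "elevated" is taken to mean strictly above the cutoff (hypothetical choice).
from dataclasses import dataclass

NLR_CUTOFF_REF46 = 3.54  # hypothetical constant name; value from [46]


@dataclass
class BloodCount:
    wbc: float          # total white blood cell count, 10^9/L
    neutrophils: float  # absolute neutrophil count, 10^9/L
    lymphocytes: float  # absolute lymphocyte count, 10^9/L


def nlr(bc: BloodCount) -> float:
    """Neutrophil-to-lymphocyte ratio."""
    if bc.lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return bc.neutrophils / bc.lymphocytes


def neut_percent(bc: BloodCount) -> float:
    """Neutrophils as a percentage of total white blood cells."""
    if bc.wbc <= 0:
        raise ValueError("WBC count must be positive")
    return 100.0 * bc.neutrophils / bc.wbc


def nlr_above_cutoff(bc: BloodCount, cutoff: float = NLR_CUTOFF_REF46) -> bool:
    """True if NLR exceeds the reference cutoff (strict inequality assumed)."""
    return nlr(bc) > cutoff


if __name__ == "__main__":
    # Illustrative values only, not patient data from this study.
    sample = BloodCount(wbc=10.0, neutrophils=8.0, lymphocytes=1.6)
    print(round(nlr(sample), 2), round(neut_percent(sample), 1), nlr_above_cutoff(sample))
```

In the study itself, NLR and NEUT% enter the model as continuous predictors; the single-threshold check only illustrates how the literature cutoff is applied.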
In terms of overall model performance, the logistic regression-based nomogram performed worse than the models based on ML algorithms. Logistic regression is widely used in medicine to explore disease risk factors because its odds ratios (ORs) are highly interpretable. However, it tends to underfit when the relationships between predictors and outcome are complex, which limits the overall predictive ability of the model. ML is an emerging artificial intelligence technology that can capture complex nonlinear relationships between predictors and outcomes. In our study, the RF and XGBoost models outperformed both the SVM and the logistic regression models. AUROC and AUPR are commonly used metrics for evaluating ML models; XGBoost achieved the highest AUROC of 0.903 (0.868–0.939), and RF achieved the highest AUPR of 0.704. XGBoost and RF are both high-performing algorithms whose performance has been verified by many researchers. In this study, the ratio of the study group to the control group was 1:4.17, making the dataset imbalanced. XGBoost's scale_pos_weight parameter increases the weight given to the minority (positive) class, which strengthens its ability to handle imbalanced data and helps the model achieve the optimal AUROC. The RF algorithm, developed by Leo Breiman and Adele Cutler and published by Breiman in 2001, achieves high accuracy, can be applied to large datasets and is relatively resistant to overfitting. Many studies have shown that it performs well on most datasets. Only four algorithms were evaluated in this study, so the performance of other algorithms on this dataset remains unknown. Nevertheless, we identified two algorithms that can accurately predict the occurrence of T2RF in AECOPD.
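As a rough illustration of the imbalance handling and the two metrics discussed above, the following stdlib-only Python sketch computes a scale_pos_weight value as the ratio of negative to positive samples, which is the typical value suggested in the XGBoost parameter documentation. It also computes AUROC in its Mann-Whitney form and AUPR summarised as average precision. This is not the authors' evaluation pipeline. The function names and the toy labels and scores are hypothetical.

```python
# Minimal sketch (not the study's pipeline): a class-imbalance weight of the
# kind XGBoost's scale_pos_weight parameter commonly takes, plus stdlib
# versions of AUROC and AUPR (AUPR computed here as average precision).
from itertools import groupby
from typing import Sequence


def scale_pos_weight(y: Sequence[int]) -> float:
    """Ratio of negative (label 0) to positive (label 1) samples."""
    n_pos = sum(1 for label in y if label == 1)
    n_neg = sum(1 for label in y if label == 0)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    return n_neg / n_pos


def auroc(y: Sequence[int], score: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative
    (Mann-Whitney form); tied scores count as half a win."""
    pos = [s for s, label in zip(score, y) if label == 1]
    neg = [s for s, label in zip(score, y) if label == 0]
    if not pos or not neg:
        raise ValueError("need both classes")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y: Sequence[int], score: Sequence[float]) -> float:
    """Sum of (R_k - R_(k-1)) * P_k over descending score thresholds;
    samples with tied scores are treated as a single threshold."""
    n_pos = sum(1 for label in y if label == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    ranked = sorted(zip(score, y), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for _, group in groupby(ranked, key=lambda pair: pair[0]):
        for _, label in group:
            if label == 1:
                tp += 1
            else:
                fp += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    # Hypothetical counts at a 1:4.17 ratio (100 cases vs 417 controls),
    # not the study data.
    labels = [1] * 100 + [0] * 417
    w = scale_pos_weight(labels)
    # w could then be passed as xgboost.XGBClassifier(scale_pos_weight=w).
    print(round(w, 2))

    y_toy = [1, 0, 1, 0, 0]
    s_toy = [0.9, 0.8, 0.7, 0.3, 0.1]
    print(round(auroc(y_toy, s_toy), 3), round(average_precision(y_toy, s_toy), 3))
```

With 100 hypothetical positives and 417 negatives, the weight is about 4.17, mirroring the 1:4.17 ratio. For real analyses, library implementations such as scikit-learn's roc_auc_score and average_precision_score would normally be preferred over these quadratic-time loops.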
However, this study has some limitations. First, it was a retrospective study based on electronic medical record data, so we lack evidence on whether the observed associations can be interpreted as causal relationships. Second, some predictors in the electronic medical record data were recorded and extracted manually, which may bias the results. Third, data on some important predictors were unavailable on our platform, such as the number of acute exacerbations in the previous year, and C-reactive protein had a high rate of missing data; these variables were therefore not included in the model. Finally, this study included data only from southwestern China, so the model may perform worse when applied to other regions.