Background
Random forest (RF) is a powerful ensemble algorithm for medical decision support (MDS). However, the demands for higher accuracy and a smaller ensemble size remain significant challenges for current RF models, particularly in identifying the risk of disease deterioration. To address both demands in this setting, we propose a diversity enhancement random forest (DERF) model.
Methods
We built the DERF model by integrating trees that are both accurate and diverse. First, we calculated the accuracy of each tree on its out-of-bag data and selected the best K trees. Then, we assessed the diversity of these trees using a logarithmic loss function on a validation data set. Next, we applied a greedy stepwise backward search to increase the diversity of the random forest. Finally, we assessed the performance of the proposed DERF model and compared it with existing models, using public benchmark data sets on disease deterioration from KEEL and real data sets collected from tertiary hospitals over the last three years.
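The selection steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes binary labels, assumes each tree's out-of-bag accuracy and validation-set probabilities are already available, and assumes the backward search removes a tree whenever doing so lowers the averaged ensemble's validation log loss. All function names and the toy numbers are hypothetical.

```python
import math

def log_loss(y_true, probs, eps=1e-12):
    """Mean binary log loss; probs are predicted P(y = 1)."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

def ensemble_probs(tree_probs, members):
    """Average the member trees' probabilities for each validation sample."""
    n = len(tree_probs[members[0]])
    return [sum(tree_probs[t][i] for t in members) / len(members) for i in range(n)]

def select_top_k(oob_accuracy, k):
    """Indices of the k trees with the highest out-of-bag accuracy."""
    ranked = sorted(range(len(oob_accuracy)), key=lambda t: oob_accuracy[t], reverse=True)
    return ranked[:k]

def backward_prune(tree_probs, y_val, members):
    """Greedy stepwise backward search (assumed criterion: validation log loss).

    At each step, drop the tree whose removal lowers the ensemble's
    validation log loss the most; stop when no removal improves it.
    """
    members = list(members)
    best = log_loss(y_val, ensemble_probs(tree_probs, members))
    while len(members) > 1:
        candidates = []
        for t in members:
            rest = [m for m in members if m != t]
            candidates.append((log_loss(y_val, ensemble_probs(tree_probs, rest)), t))
        loss, drop = min(candidates)
        if loss >= best:
            break
        members.remove(drop)
        best = loss
    return members, best

# Toy data (hypothetical): four trees, four validation samples.
y_val = [1, 0, 1, 0]
tree_probs = [
    [0.9, 0.4, 0.9, 0.4],  # tree 0: leans towards the positive class
    [0.6, 0.1, 0.6, 0.1],  # tree 1: leans towards the negative class
    [0.4, 0.6, 0.5, 0.5],  # tree 2: close to uninformative
    [0.2, 0.8, 0.3, 0.9],  # tree 3: mostly wrong
]
oob_accuracy = [0.90, 0.85, 0.70, 0.40]

top_k = select_top_k(oob_accuracy, k=3)
kept, loss = backward_prune(tree_probs, y_val, top_k)
print(top_k, kept, round(loss, 4))
```

In this toy case, trees 0 and 1 err in opposite directions, so their average has a lower log loss than either tree alone and the search keeps both while discarding the weaker third tree. This is the sense in which combining accurate but diverse trees can yield a smaller and better ensemble.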
Results
Experiments show that the proposed model improves prediction performance and reduces the ensemble size of the random forest model. Compared with the existing random forest, extremely randomized trees, and ensemble of optimal trees models, the proposed DERF model obtains higher predictive accuracy with a smaller ensemble size.
Conclusion
These results indicate that the proposed DERF model can reduce the size of the ensemble while achieving good classification results in the risk identification of disease deterioration.