This section provides an overview of the data analysis results. It covers the use of the CART algorithm as a research method, a comparison between the accuracy of the CART algorithm and the traditional approach for measuring psychiatric disorders, an assessment of the correct rate of the mental health index under the CART algorithm versus traditional measurement, and the outcomes of hypothesis testing.
5.1. CART algorithm
Classification accuracy suffers when the CART algorithm is applied to multi-level functional data and highly correlated data sets, and feature classification within CART remains under-studied. Some researchers have therefore combined Bayesian theory with the CART method for data classification, normalizing the size of the tree model according to probability-distribution criteria; where the traditional criteria were flawed, these methods have since matured. Building on this work, the performance of the Bayesian-theory-based CART classification algorithm is improved by using prior probabilities and distribution criteria. On that basis, this paper further improves the classification efficiency of the CART algorithm through an attribute-selection algorithm so that the data are classified correctly.
The basic flow of the CART algorithm is shown in Fig. 2 (Kim & Rockova, 2023). The method constructs a hierarchical decision tree by recursive binary splitting. At each split, the feature is chosen that minimizes the Gini index. The CART tree-generation algorithm consists mainly of growing and pruning: the initial tree is grown as large as possible, after which pruning rules vary the tree model and the optimal subtree is selected from a validation sequence based on predictive performance. The selection of feature attributes follows a set of criteria known as splitting criteria, which aim to make the outcome of each split as pure as possible. Multiple conditions can be used at an attribute node; the two criteria of interest here are entropy and the Gini coefficient.
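As a minimal illustration of the Gini splitting criterion described above (the labels here are invented for illustration, not data from the paper), a candidate binary split can be scored as follows:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gini(left, right):
    """Weighted Gini impurity of a candidate binary split; CART keeps
    the split that minimizes this quantity."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

print(gini(["a", "a", "b", "b"]))          # 0.5: maximal for a 50/50 two-class mix
print(split_gini(["a", "a"], ["b", "b"]))  # 0.0: the split separates the classes perfectly
```

A split that drives the weighted impurity of the two children toward zero is exactly the "as clear as possible" outcome the splitting criteria aim for.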
The decision tree is a common classification and regression model. It applies recursion procedurally, introduces decision rules, and answers a sequence of yes/no questions. The CLS, ID3, and CART algorithms differ in their criteria for selecting the feature variable at each node. CLS, the first decision-tree algorithm, chooses node variables by random selection without decision rules, which makes its decisions unstable. The entropy-based ID3 uses information gain as the selection criterion for node feature variables, which resolves the instability caused by random node selection; however, ID3 handles only discrete variables and only classification problems. CART discretizes continuous variables by binary splitting, so the model can be used for both classification and regression and has therefore been widely adopted. When selecting the attribute variable at a node, the CART model uses Gini impurity as the decision indicator to avoid the bias toward multi-valued attributes. Because the CART model learns its decision rules from the training data, many reasonable decision trees can fit the same data, and it is difficult to rank different decision-tree models by their fit to the training sequence alone. Without a feedback-iteration process, choosing a variable greedily at each node can settle on a locally optimal solution.
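To make the ID3/CART contrast concrete, the sketch below compares the two node-selection criteria on invented labels (entropy is ID3's criterion, Gini impurity is CART's; the example data are illustrative only):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, the basis of ID3's information-gain criterion."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity, CART's criterion; similarly shaped but needs no logarithm."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

mixed = ["yes"] * 5 + ["no"] * 5   # maximally impure node
pure = ["yes"] * 10                # pure node
print(entropy(mixed), gini(mixed))  # both maximal for a 50/50 mix
print(entropy(pure), gini(pure))    # both zero for a pure node
```

Both measures peak on evenly mixed nodes and vanish on pure ones; CART's preference for Gini is a matter of cost and of robustness to multi-valued attributes rather than a fundamentally different notion of impurity.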
The workflow of the CART algorithm based on feature selection is shown in Fig. 3 (Sun et al., 2023). Classification and regression trees are the starting point. CART assumes the final tree is a binary tree whose internal nodes test an attribute with "yes" and "no" branches: the left branch is yes and the right branch is no. The final tree corresponds to a recursive dichotomy over each attribute, which partitions the input space (i.e., the attribute space) into finitely many cells and determines a predicted probability distribution on each, that is, a conditional probability distribution given the position in the partition. The procedure comprises tree growing and tree pruning.
Equation (1) indicates that the order of decisions in the decision tree strictly follows this process, Eq. (2) represents the binary split, and Eq. (3) and the subsequent formulas interpret the parameters.
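The equations themselves are not reproduced in this excerpt. For orientation only, the standard textbook forms of CART's splitting and pruning criteria (not taken from the paper's own Eqs. (1)-(3)) are:

```latex
% Gini index of a data set D with class proportions p_k:
\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} p_k^{2}
% Weighted Gini of a binary split of D into D_1 and D_2 on attribute A:
\mathrm{Gini}(D, A) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1)
                    + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)
% Cost-complexity pruning objective, with penalty \alpha on the leaf count |T|:
C_{\alpha}(T) = C(T) + \alpha\,|T|
```

Growing minimizes the weighted Gini at each split; pruning then selects the subtree minimizing the cost-complexity objective on validation data.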
For decision-making, the CART algorithm has the advantages of simple rule revision, insensitivity to missing data, simple modeling, speed, and efficiency. Decision trees are usually used for classification and regression; this paper mainly addresses the regression problem. At present there have been new developments in medicine, agriculture, the environment, petroleum, coal, and electric power, and many fields such as meteorology have studied and applied the method with significant achievements, giving it reference value and guiding significance for research and production practice.
5.2. Accuracy of the CART algorithm versus the traditional approach for psychiatric disorders measurement
The accuracy of the CART algorithm versus the traditional approach to psychiatric measurement is shown in Table 2. This study compared the accuracy of the CART algorithm with that of traditional methods for measuring psychiatric disorders, seeking to determine whether the CART algorithm, a machine learning approach, performs better in diagnosing or assessing psychiatric conditions than the conventional methods used in mental health.
Table 2
Accuracy of the CART algorithm versus the traditional approach for psychiatric disorders measurement
| Dimension | Accuracy of the Traditional Approach | Accuracy of the CART Algorithm |
| --- | --- | --- |
| Somatization | 0.879 | 0.968 |
| Psychotic compulsive symptoms | 0.943 | 0.946 |
| Interpersonal tension | 0.867 | 0.912 |
| Depressive symptoms | 0.841 | 0.887 |
| Anxiety symptoms | 0.912 | 0.966 |
The research involved a thorough examination of the data and the results obtained through both the CART algorithm and traditional approaches to evaluate their precision and reliability in psychiatric disorder measurement. The results show that the CART algorithm is more accurate than traditional approaches.
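As a quick check, the per-dimension accuracy gap can be recomputed directly from the values reported in Table 2:

```python
# Accuracy pairs (traditional, CART) taken directly from Table 2.
accuracy = {
    "Somatization": (0.879, 0.968),
    "Psychotic compulsive symptoms": (0.943, 0.946),
    "Interpersonal tension": (0.867, 0.912),
    "Depressive symptoms": (0.841, 0.887),
    "Anxiety symptoms": (0.912, 0.966),
}
for dim, (trad, cart) in accuracy.items():
    print(f"{dim}: CART ahead by {cart - trad:+.3f}")
# CART leads on every dimension; the largest margin is somatization (+0.089)
# and the smallest is psychotic compulsive symptoms (+0.003).
```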
5.3. Comparison of the correct rate of mental health index based on the CART algorithm and traditional measurement
This subsection compares the accuracy of assessing the mental health index using two different approaches, the CART algorithm and traditional measurement methods, aiming to evaluate which of the two provides a more precise and reliable measurement of the mental health index.
The research involved analyzing and comparing the results obtained from both the CART algorithm and conventional methods to determine which approach measures the mental health index more accurately. The comparison of the correct rate of the mental health index based on the CART algorithm and the traditional measurement is shown in Fig. 4. The data were drawn from previous studies by Battista et al. (2023).
5.4. Hypothesis testing
Table 3 presents the results of the hypothesis testing of this study. The T-value measures how many standard errors the sample estimate lies from the value specified by the null hypothesis. The β-value is the coefficient associated with each hypothesis; it quantifies the strength and direction of the relationship between the variables being tested. The p-value indicates the probability of obtaining results at least as extreme as those observed if the null hypothesis were true. If the p-value is less than the chosen significance level (0.05 or 0.01), the hypothesis is deemed "significant," meaning there is evidence to reject the null hypothesis in favor of the alternative hypothesis.
Table 3
Hypothesis Testing Results
| Hypotheses | T-value | β-value | p-value | Result |
| --- | --- | --- | --- | --- |
| H1 | 8.778 | 0.579 | < 0.01 | Significant |
| H2 | 7.629 | 0.326 | < 0.01 | Significant |
| H3 | 6.984 | 0.340 | < 0.05 | Significant |
| H4 | 9.119 | 0.601 | < 0.01 | Significant |
| H5 | 7.893 | 0.309 | < 0.05 | Significant |
In these hypothesis testing results, all five hypotheses have p-values below the chosen significance levels, indicating that the data collected in the study strongly support each of them. The results suggest that patients in rural areas of China face challenges in accessing mental healthcare (H1), limited mental health knowledge among rural residents may lead to delayed treatment seeking (H2), nurses caring for such patients may experience significant psychological burdens (H3), religious prejudice and lack of mental health information may deter patients from seeking and effectively using medical services (H4), and the use of the CART algorithm in analyzing the health status of patients with severe psychiatric disorders is effective in revealing patterns and factors for more effective mental health strategies and resource allocation (H5).
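The significance rule described above can be sketched against the values in Table 3. Since the p-values are reported only as upper bounds (< 0.01 or < 0.05), the bounds themselves are used here:

```python
# T-values, β coefficients, and p-value bounds taken from Table 3.
results = {
    "H1": {"t": 8.778, "beta": 0.579, "p_bound": 0.01},
    "H2": {"t": 7.629, "beta": 0.326, "p_bound": 0.01},
    "H3": {"t": 6.984, "beta": 0.340, "p_bound": 0.05},
    "H4": {"t": 9.119, "beta": 0.601, "p_bound": 0.01},
    "H5": {"t": 7.893, "beta": 0.309, "p_bound": 0.05},
}
alpha = 0.05  # the study's significance level
for h, r in results.items():
    verdict = "Significant" if r["p_bound"] <= alpha else "Not significant"
    print(h, verdict)  # all five hypotheses print "Significant" at alpha = 0.05
```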
5.5. Case Studies
Chitra and Seenivasagam (2013) addressed the persistent global challenge of cardiovascular disease, the leading cause of death worldwide, by developing a predictive model for early-stage heart disease detection. The study employed a supervised learning algorithm, specifically a Cascaded Neural Network (CNN) classifier, to analyze and classify information from patients' medical records. A set of 13 attributes was utilized as input for the CNN classifier to assess the risk of heart disease, and the performance of the proposed system was benchmarked against the widely recognized Support Vector Machine (SVM) supervised classifier. The research aimed to provide a valuable tool for physicians in enhancing the efficiency of heart disease diagnosis. Tests using medical records from 270 patients demonstrated that the CNN classifier was more efficient than the SVM classifier in predicting the likelihood of heart disease, underscoring the potential of the proposed approach as a valuable asset in early-stage heart disease prediction. Bashir et al. (2014) enhanced the accuracy of heart disease diagnosis by leveraging intelligent data mining tools, given the vast amount of medical data available. Recognizing the significance of heart disease as the leading cause of global mortality in the last decade, the study focused on advancing diagnostic accuracy using data mining techniques. While previous research demonstrated acceptable accuracy levels with individual techniques, this study sought to push the boundaries by proposing a novel framework: a majority-vote-based ensemble of different data mining classifiers, aiming to capitalize on the strengths of diverse models. The research employed the UCI heart disease dataset for evaluation, and the results were compelling.
The ensemble framework exhibited higher sensitivity, specificity, and overall accuracy than the individual techniques, achieving 82% accuracy, 74% sensitivity, and 93% specificity on the heart disease dataset and underscoring the effectiveness of the proposed hybrid model in improving diagnostic precision. Tu et al. (2009) improved the diagnostic accuracy of heart disease by leveraging intelligent medical decision support systems. Focusing on the identification of warning signs in patients, they proposed the use of a bagging algorithm and assessed its efficacy against the widely adopted decision tree algorithm. To achieve this, a comprehensive set of methods was employed, involving the collection and analysis of patient data, application of the bagging algorithm, and comparison with the decision tree algorithm. The study utilized a diverse dataset of heart disease cases and statistical metrics to evaluate the performance of each algorithm. Preliminary results suggested that the bagging algorithm holds promising potential for improving the identification of warning signs associated with heart disease compared with the conventional decision tree algorithm, contributing valuable insights to the ongoing development of intelligent medical decision support systems for enhanced cardiac diagnosis. Abdar et al. (2015) assessed and compared the predictive capabilities of various data mining algorithms in determining the likelihood of individuals developing heart disease, a leading cause of mortality and morbidity nationwide. After thorough feature analysis, the study developed and validated predictive models using five distinct algorithms: CART Decision Tree, Neural Network, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Logistic Regression.
Among these algorithms, the CART Decision Tree demonstrated the highest accuracy, achieving an impressive 93.02%. The KNN, SVM, and Neural Network models exhibited accuracies of 88.37%, 86.05%, and 80.23%, respectively. Notably, the decision tree results were highlighted for their simplicity and applicability, with easily interpretable rules that could be readily understood by diverse clinical practitioners. These findings contribute valuable insights to the field of heart disease prediction and emphasize the effectiveness of data mining techniques in enhancing risk assessment models. Shouman et al. (2012) explored the integration of k-means clustering with decision tree techniques for the diagnosis of heart disease, considering the crucial role of accurate diagnostics in reducing the global prevalence of heart-related fatalities. Recognizing the success of decision trees in this domain, the study aimed to enhance their accuracy by incorporating k-means clustering, a popular technique with known sensitivity to initial centroid selection. Various methods of initial centroid selection, including inlier, outlier, range, random attribute values, and random row approaches, were investigated to discern their impact on diagnostic performance. The methods were applied and evaluated in the context of diagnosing heart disease patients. The results demonstrated that integrating k-means clustering with decision tree methodologies, particularly with the inlier initial centroid selection method, led to a notable enhancement in accuracy compared to the other centroid selection approaches. This finding underscores the potential of this integrated approach as a valuable tool for healthcare professionals in improving the precision of heart disease diagnoses.
Figure 5 presents a comprehensive comparison of specificity, sensitivity, and training accuracy across the various case studies discussed in this research. Specificity, sensitivity, and training accuracy are crucial metrics in evaluating the performance of diagnostic models, especially in the context of heart disease diagnosis. Specificity measures the ability of the model to correctly identify true negatives, sensitivity gauges its capacity to correctly identify true positives, and training accuracy reflects the overall correctness of the model during the training phase. Similarly, in the case of mental health, the integration of data mining techniques, such as combining clustering methods like k-means with decision tree algorithms, holds promise for improving diagnostic accuracy in psychiatric conditions. The specific metrics discussed in Fig. 5, including specificity, sensitivity, and training accuracy, are equally relevant in the context of mental health diagnostics. In mental health, achieving high specificity is crucial for accurately identifying individuals without psychiatric disorders, while sensitivity is important for recognizing those with such conditions. Training accuracy ensures the overall effectiveness of the diagnostic model during its development phase.
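A minimal sketch of these three metrics, computed from an invented confusion matrix (the counts below are illustrative only, not data from Fig. 5 or any cited study):

```python
def metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # true positives correctly flagged
    specificity = tn / (tn + fp)            # true negatives correctly cleared
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return sensitivity, specificity, accuracy

# Hypothetical counts: 45 true positives, 5 false negatives,
# 8 false positives, 42 true negatives.
sens, spec, acc = metrics(tp=45, fn=5, fp=8, tn=42)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
# sensitivity=0.90 specificity=0.84 accuracy=0.87
```

Note that accuracy alone can mask a poor trade-off: a model screening a mostly healthy population can score high accuracy while missing most true cases, which is why sensitivity and specificity are reported separately.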
The findings from the comparative analysis of case studies that used different algorithms provide valuable insights into their performance metrics, such as specificity, sensitivity, and training accuracy. Notably, the Classification and Regression Trees (CART) algorithm exhibited a remarkable specificity of 98.99%, implying a high ability to accurately identify true negatives in the diagnosis of the studied condition. Additionally, CART demonstrated a relatively higher training accuracy of 90.77%, further affirming its efficacy in capturing and learning from the underlying patterns in the data. These superior metrics for CART, as compared to other algorithms like Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Naive Bayes, and Neural Networks, suggest that CART holds promise as a robust and accurate tool for the diagnostic task at hand. Given the importance of specificity in healthcare diagnostics, the decision to adopt the CART algorithm in the current study appears well-founded. The researchers likely adopted CART based on its proven track record in achieving high accuracy and specificity, offering confidence in its ability to contribute to the enhanced precision of the diagnostic model for mental health. By leveraging the strengths of the CART algorithm identified in the comparative analysis, the study aimed to build upon the success observed in prior research on heart disease and further refine the diagnostic capabilities for mental health. This strategic adoption of CART underscores the significance of evidence-based decision-making in algorithm selection, ensuring that the chosen methodology aligns with the specific goals and requirements of the study at hand.