Setting
Cologne is the fourth-largest city in Germany, with a population of about one million inhabitants. In-patient psychiatric care is provided by four hospitals and is organized on a sectoral basis. A single Municipal Court is the deciding authority for all involuntary admissions, detentions, and other coercive measures, which are carried out according to the Mental Health Act of the federal state of North Rhine-Westphalia (PsychKG NRW). The PsychKG NRW applies to individuals who are mentally ill and present an immediate, severe threat to themselves or others (for details, see [17]).
Data Sources
The present study combines data from two different sources.
First, we used the data of the previous retrospective study [17], which analyzed the health records of 5764 cases treated in the four psychiatric hospitals in Cologne in 2011. Because individual patients may have presented as several cases within the study period, we refer to "cases" rather than "patients". The study included data on all 1773 cases under the PsychKG NRW (Mental Health Act) and on 3991 voluntary cases (a random sample out of 8398 voluntary cases). Medical, sociodemographic, and socioeconomic data for every case were extracted from the hospital records: main diagnosis of a mental disorder and concomitant psychiatric diagnoses according to ICD-10 [29], suicidal behavior, previous suicide attempts, previous psychiatric treatment, guardianship, time of admission, age, gender, education, vocational and income status, living area, living situation, marital status, migratory background, etc. For a full list of the data collected and further details, see [17].
For the present study, we added environmental socioeconomic data (ESED) for the living area of each case to this data set. The ESED were obtained from RWI-GEO-GRID [30], which provides small-scale information on various aspects of household structure, economic strength, house types, demography, and mobility [31]. We selected eight variables that reflect economic strength, degree of urbanization, and familial integration. We calculated rates per 100 inhabitants for unemployment, employment, commercial enterprises, buildings, households, residential buildings, and children. In addition, we calculated the average purchasing power (in Euro) per postal code. These data were linked to the original data set via the postal code of each case.
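The linkage step described above can be sketched as a join on postal code. This is a minimal illustration with made-up data; the column names (`postal_code`, `unemployed`, `inhabitants`, `purchasing_power_eur`) are our own placeholders, not the variable names of the study or of RWI-GEO-GRID.

```python
import pandas as pd

# Hypothetical case-level records (one row per case)
cases = pd.DataFrame({
    "case_id": [1, 2, 3],
    "postal_code": ["50667", "50667", "51063"],
})

# Hypothetical ESED per postal code
esed = pd.DataFrame({
    "postal_code": ["50667", "51063"],
    "unemployed": [1200, 800],
    "inhabitants": [30000, 25000],
    "purchasing_power_eur": [22500.0, 19800.0],
})

# Rate per 100 inhabitants, as described in the text
esed["unemployment_rate_per_100"] = 100 * esed["unemployed"] / esed["inhabitants"]

# Left join keeps every case and attaches the ESED of its living area
enriched = cases.merge(esed, on="postal_code", how="left")
print(enriched[["case_id", "unemployment_rate_per_100"]])
```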
Study Design
Group differences in ESED between the voluntarily and involuntarily admitted cases were analyzed with independent-samples Student's t-tests.
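As a minimal sketch of this comparison, the following applies an independent-samples t-test to one hypothetical ESED variable; the data are synthetic and only the procedure mirrors the text.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic unemployment rates per 100 inhabitants for the two groups
voluntary = rng.normal(loc=5.0, scale=1.0, size=200)
involuntary = rng.normal(loc=5.6, scale=1.0, size=120)

# Independent-samples Student's t-test (equal variances assumed by default)
t, p = stats.ttest_ind(voluntary, involuntary)
print(f"t = {t:.2f}, p = {p:.4f}")
```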
In order to determine the most parsimonious model with the best fit, we first analyzed the original data set [17] and compared the CHAID algorithm (Chi-Square Automatic Interaction Detection) used there with other ML approaches, namely CART (Classification and Regression Trees) with and without hyperparameter tuning. Thereafter, we analyzed the data set enriched with the added ESED using the method previously shown to produce the best results (CART with hyperparameter tuning; see below). Preprocessing was performed separately for the training and testing data sets to avoid data leakage. Finally, we present the best model based on the enriched data set in detail.
Methods Of Analysis
Decision trees recursively split the data set into groups based on the best predictor per split. How 'the best predictor' is defined depends on the specific learning algorithm: CHAID chooses predictors according to the chi-square statistic, whereas CART selects the predictor that creates the most homogeneous groups as a result of the split. CART always creates binary splits, whereas CHAID may split the data into as many groups as there are categories in the splitting variable [32, 33].
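The binary-splitting behavior of CART can be seen directly in scikit-learn's `DecisionTreeClassifier`, which implements CART; every internal node has exactly two children. The data here are synthetic and purely illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic binary outcome

# CART: each split partitions a node into exactly two child nodes
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(tree.tree_.node_count, tree.get_depth())
```

Because every split is binary, a fitted CART always has an odd total node count (n internal nodes plus n + 1 leaves).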
We identified four aspects of the previous CHAID analysis as potential areas for optimization. We therefore compared the original CHAID analysis [17] to four new models (see Table 1 for an overview of the models we analyzed):
Table 1
Model Name | Class Imbalance | Algorithm | Hyperparameter Tuning (HT) | ESED |
Model 1 | Weighting | CHAID | - | - |
Model 2 | Imputation | CHAID | - | - |
Model 3 | Imputation | CART | - | - |
Model 4 | Imputation | CART | ✓ | - |
Model 5 | Imputation | CART | ✓ | ✓ |
Model 1. CHAID with Weighting (CHAID analysis by [17])
When using machine learning algorithms to predict a dependent variable, in our case (in)voluntary hospitalization, the number of observations per category of the dependent variable should be balanced. In our case, this means that the algorithm required roughly equal numbers of involuntary and voluntary admissions; otherwise, it would have been biased toward predicting the majority class, i.e. the larger class. There are various methods to deal with such imbalances between the categories of the dependent variable. The previous CHAID analysis of this data set used class weights as a means of balancing the data [17].
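The original weighting approach was implemented in SPSS; as an illustration of the same idea in the scikit-learn stack used later in this study, the `class_weight` parameter reweights classes inversely to their frequency, so each class contributes equally to the impurity computation. The data below are synthetic.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
# 4:1 imbalance between the two classes
y = np.r_[np.zeros(400, dtype=int), np.ones(100, dtype=int)]

# "balanced" weights each class by n_samples / (n_classes * class_count),
# the weighting analogue of balancing the data set
weighted = DecisionTreeClassifier(class_weight="balanced", random_state=0)
weighted.fit(X, y)
print(weighted.predict(X).shape)
```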
Model 2. CHAID with Imputation: For our current analysis, we used random oversampling to balance the classes of our dependent variable. Random oversampling duplicates randomly chosen data points of the minority class until the ratio between the classes is 1:1. For the calculation of the CHAID model, this oversampling was performed on the entire training data set. Notably, the exhaustive CHAID calculated for this analysis used the same parameters as the one calculated previously [17], i.e. a maximum depth of three, a minimum of 100 cases for a group to be created, and a minimum of 150 cases for an existing group to be split.
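Random oversampling can be sketched with `sklearn.utils.resample`: minority-class rows are drawn with replacement and appended until the classes are 1:1. Synthetic data for illustration.

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.r_[np.zeros(70, dtype=int), np.ones(30, dtype=int)]  # 70:30 imbalance

# Draw extra minority-class rows with replacement until classes match
X_min, y_min = X[y == 1], y[y == 1]
X_extra, y_extra = resample(X_min, y_min, replace=True,
                            n_samples=70 - 30, random_state=0)

X_bal = np.vstack([X, X_extra])
y_bal = np.concatenate([y, y_extra])
print(np.bincount(y_bal))  # → [70 70]
```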
Model 3. CART with default parameters
We created a model using default parameters in order to establish a benchmark for the subsequent Model 4. For this CART model, as for Models 4 and 5, random oversampling was performed within each fold of the cross-validation, where applicable.
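Performing the oversampling inside each fold, rather than once up front, ensures that no duplicated copy of a test-fold case leaks into the training data. A sketch of this per-fold procedure, on synthetic data, might look as follows:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.r_[np.zeros(140, dtype=int), np.ones(60, dtype=int)]

aucs = []
for train_idx, test_idx in StratifiedKFold(n_splits=10, shuffle=True,
                                           random_state=0).split(X, y):
    X_tr, y_tr = X[train_idx], y[train_idx]
    # Oversample the minority class inside the training fold only
    X_min, y_min = X_tr[y_tr == 1], y_tr[y_tr == 1]
    n_extra = (y_tr == 0).sum() - (y_tr == 1).sum()
    X_up, y_up = resample(X_min, y_min, replace=True,
                          n_samples=n_extra, random_state=0)
    X_tr = np.vstack([X_tr, X_up])
    y_tr = np.concatenate([y_tr, y_up])

    # The test fold is left untouched
    clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
    aucs.append(roc_auc_score(y[test_idx],
                              clf.predict_proba(X[test_idx])[:, 1]))

print(f"mean AUROC: {np.mean(aucs):.2f}")
```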
Model 4. CART with hyperparameter tuning (HT)
HT is a method for model optimization: it exhaustively applies combinations of given parameter values in order to find the model that produces the best fit. In other words, HT is a grid search; ours covered four parameters (Table 2). Maximum depth refers to the number of splits that can be performed per branch within a decision tree. Minimum number of cases per group and per split refer to the number of cases required for a terminal node (group) and for the node preceding it (split), respectively. Minimum impurity decrease refers to the amount by which the Gini impurity must be reduced in order for a split to occur. Gini impurity denotes the chance of classifying a randomly chosen case incorrectly. For instance, when a node comprises 10 involuntarily and 10 voluntarily treated cases, the chance of classifying an observation incorrectly is 50%; in this case, the Gini impurity is 0.5. The lower the Gini impurity, the purer the node and the better the classification. Ideally, a decision tree would produce only end nodes with a Gini impurity of 0.
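The Gini arithmetic in the example above can be written out explicitly; `gini` here is a hypothetical helper, not part of the study's code.

```python
def gini(counts):
    """Gini impurity of a node with the given per-class case counts:
    1 minus the sum of squared class proportions."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini([10, 10]))  # 10 involuntary, 10 voluntary cases → 0.5
print(gini([20, 0]))   # pure node → 0.0
```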
Table 2
Parameter | Values |
Maximum depth of the tree | 3, 4, 5, 6, 7, 8 |
Minimum number of cases per group | 25, 50, 75, 100, 125 |
Minimum number of cases per split | 50, 100, 150, 200, 250 |
Minimum impurity decrease | 0.0001, 0.001, 0.01, 0.1, 0 |
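The grid search over the Table 2 parameters can be sketched with scikit-learn's `GridSearchCV`, using CART (`DecisionTreeClassifier`) and AUROC as the scoring metric as described below. The mapping of the paper's parameter names to scikit-learn arguments (`min_samples_leaf` for cases per group, `min_samples_split` for cases per split) is our assumption, and the data are synthetic.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Parameter grid mirroring Table 2
param_grid = {
    "max_depth": [3, 4, 5, 6, 7, 8],
    "min_samples_leaf": [25, 50, 75, 100, 125],     # cases per group
    "min_samples_split": [50, 100, 150, 200, 250],  # cases per split
    "min_impurity_decrease": [0.0001, 0.001, 0.01, 0.1, 0],
}

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] > 0).astype(int)  # synthetic separable outcome

# Exhaustive search over all parameter combinations, scored by AUROC
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid, scoring="roc_auc", cv=10)
search.fit(X, y)
print(search.best_params_)
```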
Model 5. CART with HT and environmental socioeconomic data (ESED)
To assess whether the inclusion of ESED improves the existing models, the ESED were added to the data set. Thereafter, the enriched data set was analyzed with the method previously shown to produce the best results (Model 4).
Software Packages And Data Handling
We used IBM® SPSS Statistics® (Version 24) for the CHAID analyses and the open-source machine learning library scikit-learn (Version 0.21.1) in Python (Version 3.7.1) for the CART analyses. Prior to any model fitting, we split the sample into a training/testing set and a validation set at a ratio of 70:30. The validation set was used only in the last step, to validate the models. To evaluate the steps that require validation during model building (i.e. HT), k-fold cross-validation (k = 10) was performed on the training/testing data set. The chosen evaluation metric was the area under the receiver operating characteristic curve (AUROC).
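The data handling described above can be sketched as follows: a 70:30 split into a training/testing set and a hold-out validation set, 10-fold cross-validation on the former, and AUROC as the metric. The data are synthetic and the simple `DecisionTreeClassifier` stands in for the tuned final model.

```python
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 1] > 0).astype(int)

# 70:30 split; the validation set is held out until the very end
X_tt, X_val, y_tt, y_val = train_test_split(X, y, test_size=0.30,
                                            stratify=y, random_state=0)

# 10-fold cross-validation on the training/testing set, scored by AUROC
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
cv_auc = cross_val_score(clf, X_tt, y_tt, cv=10, scoring="roc_auc")

# One final evaluation on the held-out validation set
clf.fit(X_tt, y_tt)
val_auc = roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1])
print(f"CV AUROC: {cv_auc.mean():.2f}, validation AUROC: {val_auc:.2f}")
```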