We evaluated the performance of a novel suicide risk assessment tool (OxMIS) in 137 112 people in Finland diagnosed with a schizophrenia-spectrum disorder or bipolar disorder. The area under the receiver operating characteristic curve (AUC) was 0.70, indicating good discrimination. This means that if we repeatedly selected two people at random from our sample, one who had died by suicide and one who had not, the tool would assign the higher predicted suicide risk to the person who had died by suicide in 70% of such pairs.
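The pairwise interpretation of the AUC given above can be made concrete with a minimal Python sketch. This is not the analysis code used in this study; the function names and risk values are hypothetical and chosen only for illustration. The sketch computes the AUC in two equivalent ways: by enumerating every case/non-case pair (ties counted as half concordant), which mirrors the "select two people at random" description, and by the rank-based Mann-Whitney formula, which avoids enumerating pairs.

```python
from itertools import product


def pairwise_auc(risk_cases, risk_noncases):
    """AUC as the proportion of case/non-case pairs in which the case has the
    higher predicted risk; tied pairs count as half concordant."""
    if not risk_cases or not risk_noncases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for r_case, r_non in product(risk_cases, risk_noncases):
        if r_case > r_non:
            concordant += 1.0
        elif r_case == r_non:
            concordant += 0.5
    return concordant / (len(risk_cases) * len(risk_noncases))


def rank_auc(risk_cases, risk_noncases):
    """Same quantity via the Mann-Whitney U statistic, using average ranks
    for tied predicted risks; avoids enumerating every pair."""
    n1, n0 = len(risk_cases), len(risk_noncases)
    if n1 == 0 or n0 == 0:
        raise ValueError("need at least one case and one non-case")
    # Pool all predictions, tagging cases with 1 and non-cases with 0.
    pooled = sorted([(r, 1) for r in risk_cases] + [(r, 0) for r in risk_noncases])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over the run of tied risk values starting at i.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    rank_sum_cases = sum(rank for rank, (_, is_case) in zip(ranks, pooled) if is_case)
    u = rank_sum_cases - n1 * (n1 + 1) / 2  # Mann-Whitney U for the cases
    return u / (n1 * n0)


# Hypothetical predicted risks (illustrative only, not study data).
cases = [0.04, 0.02, 0.01]             # people who died by suicide
noncases = [0.03, 0.01, 0.005, 0.002]  # people who did not

print(pairwise_auc(cases, noncases))  # 9.5 of 12 pairs concordant, about 0.79
print(rank_auc(cases, noncases))      # same value via ranks
```

With roughly 1475 cases and more than 135 000 non-cases, full pairwise enumeration would need about 2 × 10^8 comparisons, which is why the rank-based form is usually preferred for samples of this size; both return the same value.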
Consistent with the original OxMIS study, we found that the model overestimated risk for patients at extremely high predicted risk (i.e., those with a predicted suicide risk of > 5%), in whom calibration was poor. However, this affected only a small proportion (1.3%) of our sample. In a complementary sensitivity analysis, calibration in these patients improved when their predicted risk was capped at 5%. Our findings therefore suggest that specific suicide risk predictions above 5% are unlikely to be accurate and should be reported as > 5%, consistent with how the OxMIS online calculator currently reports risks.
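The capping step in the sensitivity analysis can be illustrated with a minimal Python sketch. It rests on stated assumptions and is not the study's analysis: calibration is summarised here by a single observed/expected (O/E) ratio, one common summary chosen for simplicity (the study's assessment may have used other measures, such as calibration plots), and the subgroup of 100 patients with four deaths is entirely hypothetical.

```python
# Minimal sketch (hypothetical data): compare the observed/expected ratio in a
# high-risk subgroup before and after capping predicted risks at 5%.
CAP = 0.05  # ceiling matching the "> 5%" reporting threshold in the text


def cap_predictions(preds, cap=CAP):
    """Replace any predicted risk above `cap` with `cap`."""
    return [min(p, cap) for p in preds]


def observed_expected_ratio(preds, outcomes):
    """O/E ratio: observed events divided by the sum of predicted risks.
    Values below 1 indicate that the model overestimates risk."""
    expected = sum(preds)
    if expected == 0:
        raise ValueError("expected number of events is zero")
    return sum(outcomes) / expected


# Hypothetical high-risk subgroup (illustrative only, not study data):
# 100 patients with predicted risks above 5%, of whom 4 died by suicide.
preds = [0.06, 0.08, 0.10, 0.12, 0.20] * 20
outcomes = [1] * 4 + [0] * 96

oe_raw = observed_expected_ratio(preds, outcomes)
oe_capped = observed_expected_ratio(cap_predictions(preds), outcomes)
print(f"O/E uncapped: {oe_raw:.2f}")     # about 0.36
print(f"O/E capped:   {oe_capped:.2f}")  # about 0.80
```

In this toy example, capping moves the O/E ratio from roughly 0.36 towards 1, mirroring the direction of the improvement described above; the size of the change depends entirely on the made-up numbers.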
Studies of suicide risk prediction models commonly report only a limited set of performance metrics. Furthermore, a recent systematic review of clinical prediction models in psychiatry found that only 16% of studies had been validated in wholly independent samples and reported measures of discrimination in both development and validation samples.5 Importantly, the same review found that nearly four in five (78%) of these studies reported poorer out-of-sample discrimination. In contrast, OxMIS performed similarly to its Swedish validation, with no clear differences between the reported AUCs (range 0.70–0.71).
The strengths of our study include the use of national registers, which allowed us to study suicide risk in all individuals diagnosed with SMI in Finland over a 22-year period. In contrast to many validation studies,24 this approach also enabled us to test OxMIS in an external validation sample more than twice as large as the derivation sample (n = 137 112 vs. n = 58 777), providing high statistical power. The Finnish predictor definitions were comparable to those in the original study, and there were no missing data. There are two methodological limitations to consider. First, our crime data consisted of criminal conviction records in which assault, manslaughter, and murder were predefined as violent crimes. Our definition of 'previous violent crime', one of the 17 predictors, was therefore narrower than that of the original OxMIS study, which also included violent property crimes, sexual crimes, and unlawful threats. Nonetheless, the influence of this misclassification appears to have been minimal, as the tool performed similarly in both countries. Second, although we had complete coverage of inpatient care episodes throughout the follow-up period, outpatient care data were available only from 1998 onwards, so the prevalence of predictors derived from patient data was likely higher in younger cohorts owing to left truncation. However, despite differences in the birth cohorts included and in outpatient data coverage (which began in 2001 in Sweden), our results were similar to those of the original OxMIS study.
External validation is only one, albeit key, component of a comprehensive evaluation of any risk prediction model and tool. Other important considerations include the feasibility of implementing the tool in clinical practice (e.g., the availability of the data needed to calculate risks), transparency in its development and reporting, and its acceptability among clinicians.25 Furthermore, any such tool needs to be linked to interventions if outcomes are to improve. For OxMIS, one such link is safety planning, which the tool encourages because every patient receives a risk score. At the same time, potential harms need to be considered, and OxMIS should be used to support rather than replace clinical decision-making, which must take into account individual, more proximal, and contextual factors.
In conclusion, we have conducted a large external validation of a prediction model for suicide (OxMIS). In contrast to most external validation studies in psychiatry, we pre-specified predictors and outcomes, ensured adequate statistical power (with 1475 outcomes), published a research protocol, and reported our findings transparently, including measures of both discrimination and calibration. We further reduced the risk of bias by using nationwide registry data with no missing data for the predictors included in the model. With further research on feasibility and on how to link risk scores to interventions, OxMIS could help mental health services reduce suicide rates in people with SMI.