Machine Learning to Predict Long and Short Term Fracture Risk in Postmenopausal Women

doi:10.21203/rs.3.rs-1735686/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

Purpose

Fractures in older adults are a significant cause of morbidity and mortality, particularly for post-menopausal women with osteoporosis. Prevention is key for managing fractures in this population and may include identifying individuals at high fracture risk and providing therapeutic treatment to mitigate risk. This study aimed to develop a machine learning fracture risk prediction tool to overcome the limitations of existing methods by incorporating additional risk factors and providing short-term risk predictions.

Methods

We developed a machine learning model to predict risk of major osteoporotic fractures and femur (hip) fractures in a retrospective cohort of post-menopausal women. Models were trained to generate predictions at 3, 5, and 10 year prediction windows. The model used only ICD codes, basic demographics, vital sign measurements, lab results and medication usage from a proprietary national longitudinal electronic health record repository to make predictions.

Results

The algorithms obtained area under the receiver operating characteristic curve (AUROC) values of 0.83, 0.81, and 0.79 for prediction of major osteoporotic fractures at 3, 5, and 10 year windows, respectively. The algorithms also obtained AUROC values of 0.79, 0.75, and 0.75 for prediction of femur fractures at 3, 5, and 10 year windows, respectively. For all models, when sensitivity was fixed at 0.80, average specificity was 0.615.

Conclusion

Machine learning clinical decision support may inform clinical efforts to detect high-risk individuals early and mitigate their risk, and may help establish clinical research cohorts with well-defined patient populations.

Artificial Intelligence and Machine Learning

menopause

fractures

artificial intelligence

osteoporotic fracture

Fractures remain one of the leading causes of morbidity and mortality among older adults.^1–3 Fractures also contribute to substantial healthcare spending, with studies estimating direct costs of up to $42,000 per hospitalization and more than $20 billion in total direct costs by the year 2025.^4,5 Although epidemiologic studies have estimated that age-adjusted fracture rates have stabilized or declined for many fracture sites, including the femur (ie, hip), overall fracture incidence and associated spending are predicted to rise as the population continues to age.⁶ Identification and treatment of individuals at risk for osteoporosis and osteoporotic fractures has therefore been identified as a crucial strategy for preventing fractures and their sequelae in older adults.^7,8

The FRAX calculator (https://www.sheffield.ac.uk/FRAX/index.aspx) is one of the most widely used tools for identifying individuals at risk of experiencing a femur or other osteoporotic fracture within the next 10 years. FRAX was designed to overcome the limitations of using only bone mineral density (BMD) to identify patients at risk of fracture, as BMD has been found to provide poor sensitivity when used to predict fracture risk.^9,10 The FRAX model incorporates information on patient demographics and basic osteoporosis risk factors such as alcohol and glucocorticoid use, and optionally incorporates BMD measurements. To account for variability in international rates of osteoporotic fractures,¹⁰ country-specific models have been developed and are used nearly 6 million times per year across the globe.¹¹ FRAX has been shown to have reasonable performance in a number of studies^12–15 and is widely used in both clinical and research settings.

Despite its performance and widespread use, the FRAX tool has a number of noted limitations.^16,17 Although incorporation of BMD measurements improves FRAX performance, these measurements are unlikely to be available for many patients, including younger patients or patients without documented osteoporosis risk factors. Another limitation of FRAX is that it provides a single, 10-year risk prediction window. While 10-year risk can be translated with relative ease to one-year risk estimates in healthy individuals with low mortality rates,^18,19 the 10-year window may not be easily interpretable for older individuals or those with increased mortality risk. Fracture risk assessments with shorter outlooks also offer physicians the opportunity to take immediate action to mitigate these risks, which may include therapeutic interventions, altering a patient’s prescriptions to limit those which increase fall risk, and lifestyle changes.²⁰ Understanding this patient outlook may help guide discussions between patients and physicians about fracture risk mitigation to determine which method may be best to ensure adherence, which may include adding bisphosphonates to a patient’s medication regimen, changing medications to avoid those that induce dizziness, or adding weight bearing or strength exercises to an exercise regimen.²⁰

Perhaps as a result of these limitations, FRAX has been noted to perform most accurately in identifying those patients who will not experience a fracture while failing to identify a substantial number of patients who would experience fractures.¹³ FRAX has also been found to have varying performance across different geographic locations and patient characteristics (eg, age), with AUROCs ranging from 0.65 to 0.81.^21–23 Crandall et al.²¹ reported a FRAX AUROC of 0.65–0.66 in a US study on women aged 50–79 using both observational and prospectively collected data, whereas a study on Spanish women aged 50–90 reported a FRAX AUROC of 0.812 for predicting major osteoporotic fractures using data obtained from bone density testing and a participant questionnaire.²⁴ Using data extracted from electronic medical records to calculate FRAX without BMD measurements, Goldshtein et al. demonstrated AUROCs of 0.65 and 0.82 for major osteoporotic fracture and hip fractures, respectively.²⁴

Other risk assessment tools have been validated for the purpose of estimating osteoporotic fracture risk in postmenopausal women.²⁵ The GARVAN Fracture Risk Calculator (GARVAN-FRC), designed for use on individuals > 60 years of age, excludes many risk factors, such as parental history of fracture, smoking, and alcohol intake as inputs and has achieved an AUROC of 0.63–0.88.^15,26,27 The QFracture, a tool based on a UK prospective open cohort study, shows hip fracture results that are comparable to FRAX.²⁵ QFracture requires the inclusion of BMD.²⁵ For use in older populations, a 5 or 10 year prediction window may also limit utility of these tools.²⁷

Machine learning (ML) and artificial intelligence have been explored in previous research for osteoporotic fracture prediction and management.^26,28,29 ML models are algorithms that are trained over a set of data to recognize specific patterns and high-dimensional relationships to make predictions on new data/patients. An ML-based clinical decision support (CDS) may assist clinicians to stratify risk for older adult females, regardless of a woman’s history of osteoporosis and use of therapeutics for osteoporosis. It may also guide individualized clinical management, which may include drug holidays or initiation of bisphosphonates, or to form well-defined patient cohorts for clinical studies. Our research examines the ability of our CDS to aid with osteoporotic fracture prediction, with the development and validation of a machine learning algorithm (MLA) to predict 3-, 5-, and 10-year major osteoporotic fracture and femur (hip) fracture risk in a population of women aged 45 to 79 years.

Data Processing

A retrospective analysis was performed on 731,056 patients using a commercial database containing electronic health record (EHR) data collected between 2007 and 2020 from over 700 healthcare sites across the United States. Information regarding patient demographics, vital signs, lab measurements and past diagnoses was extracted for algorithm development. All model features used as input are presented in Table 1. This patient data has been de-identified in compliance with the Health Insurance Portability and Accountability Act (HIPAA). As such, this research does not constitute human subjects research as per the definition put forth in 45 Code of Federal Regulations 46 and did not require Institutional Review Board approval.³⁰

Cohort Definition and Algorithm Runtime

Patients were included in the study cohort if they were of female sex, at least 45 years of age at the start of the algorithm prediction window, and had at least 10.5 continuous years of data available. Sex was determined from self-identification in the patient health record. The age threshold was selected to ensure that the vast majority of postmenopausal women were captured in the data. To determine how many years of medical data a patient had, the EHR entries from the patient’s first month active and last month active were used as a proxy for when medical data collection started and ended for each patient. Therefore, only patients whose difference between their first month active and last month active was greater than 10.5 years were included in the study sample.

The algorithm was designed to utilize six months of medical data prior to the start of the patient’s 10-year prediction window. The end of the patient’s 10 year prediction window was equivalent to their last month active and the start of the prediction window was 10 years before the last month active. Data collection and algorithm prediction windows are illustrated in Fig 1. Although we did not pose an upper age limit, all participants who were 80 years or older were filtered out because they did not have the required data for the 10-year follow-up period.
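The inclusion criteria and window definitions above can be sketched in a few lines. This is a minimal illustration and not the authors' code: the function and field names are hypothetical, months are handled as (year, month) pairs, and age at run time is approximated from birth year alone, which the limitations section notes is the only age information in the dataset.

```python
from dataclasses import dataclass

MIN_SPAN_MONTHS = 126     # more than 10.5 years between first and last month active
PREDICTION_MONTHS = 120   # 10-year prediction window
LOOKBACK_MONTHS = 6       # data collection window before the prediction window
MIN_AGE = 45


def month_index(year, month):
    """Map a calendar month to a running integer so months can be subtracted."""
    return year * 12 + (month - 1)


def from_index(idx):
    """Inverse of month_index: running integer back to (year, month)."""
    return idx // 12, idx % 12 + 1


@dataclass
class Windows:
    data_start: tuple        # start of the six-month data collection window
    prediction_start: tuple  # algorithm run time
    prediction_end: tuple    # last month active


def define_windows(first_active, last_active, birth_year, sex):
    """Return Windows if the patient meets the inclusion criteria, else None."""
    first_idx = month_index(*first_active)
    last_idx = month_index(*last_active)
    if sex != "female" or last_idx - first_idx <= MIN_SPAN_MONTHS:
        return None
    start_idx = last_idx - PREDICTION_MONTHS
    start_year, _ = from_index(start_idx)
    if start_year - birth_year < MIN_AGE:  # approximate: only birth year is known
        return None
    return Windows(
        data_start=from_index(start_idx - LOOKBACK_MONTHS),
        prediction_start=from_index(start_idx),
        prediction_end=from_index(last_idx),
    )


windows = define_windows((2008, 1), (2019, 6), birth_year=1950, sex="female")
```

Anchoring the prediction window to the last month active, as the text describes, means the 3- and 5-year windows could be derived in the same way by changing `PREDICTION_MONTHS`; how the authors positioned those shorter windows is not stated, so that part is left out of the sketch.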

Participants with fractures were defined as having at least one recorded femur fracture or major osteoporotic fracture during the 10 year window. A fracture diagnosis for the all fracture algorithm and femur (hip) fracture algorithm was determined by the presence of International Classification of Diseases (ICD-9 and ICD-10) codes (Supplementary Table 1). Femur fractures and all major osteoporotic fractures (defined as fractures occurring to the femur, forearm/wrist, shoulder, or clinical spine) were assessed as separate outcomes and were predicted separately.³¹

Patient characteristics (ie, demographics and medical history) were compared across those who did and did not experience a major osteoporotic fracture during the study period. For each variable, the proportions of patients with fracture and patients without fracture were compared using a two-proportions z-test.
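A two-proportions z-test of the kind described above can be written with the standard library alone. The sketch below uses the pooled-variance form with a two-sided p-value; whether the authors used exactly this variant is an assumption, and the counts in the usage line are made up for illustration (only the group sizes, 14,048 and 731,056 minus 14,048, come from the text).

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportions z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts of patients with some characteristic in each group.
z, p = two_proportion_z_test(x1=2100, n1=14048, x2=64500, n2=717008)
```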

Machine Learning Models

A total of six ML models were developed to separately predict risk of major osteoporotic fractures or femur (ie, hip) fractures at each of the 3, 5, and 10 year prediction windows. ML techniques are known to be capable of mining reliable predictive patterns and higher-order interactions in high-dimensional EHR data where these patterns cannot be captured with conventional statistical approaches. Each of these ML models was built from gradient-boosted decision trees using the eXtreme Gradient Boosting (XGBoost) method in Python.³² We chose XGBoost-based prediction models for this study because XGBoost typically performs better than logistic regression when handling a large number of input features, particularly those with non-linear relationships to the outcomes, and because it is robust to missing data.³³ Gradient boosting combines results from many decision trees to generate prediction scores.³⁴ This method aims to minimize the model’s loss function by successively adding weak learners, in this case decision trees, using gradient descent. Each decision tree divides the patient cohort into progressively smaller groups such that each branch of the tree creates two new groups that have been split on a selected feature value. This process ultimately results in a set of branchless nodes, or “leaves,” and all the patients under a given leaf are assigned the same risk score by the model. XGBoost handles missing values natively: during training, a default branch is identified at each split by choosing the side that minimizes loss, and missing values are routed down that branch. Feature imputation was therefore not performed during the training and testing of the six models presented in this paper.
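The default-branch idea can be illustrated without the library. The sketch below is a simplified, hypothetical rendering rather than XGBoost's implementation. For one candidate threshold it sends the missing rows left and then right, scores each option with the second-order split gain used in gradient boosting (regularisation term `lam`, complexity penalty omitted), and keeps the better direction. The toy gradients assume logistic loss with an initial prediction of 0.5.

```python
def split_gain(g_left, h_left, g_right, h_right, lam=1.0):
    """Second-order gain of a split from summed gradients (g) and Hessians (h)."""
    def score(g, h):
        return g * g / (h + lam)
    total = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - total)


def best_default_direction(values, grads, hess, threshold, lam=1.0):
    """Pick the branch for missing values (None) that gives the larger gain."""
    sums = {"left": [0.0, 0.0], "right": [0.0, 0.0], "missing": [0.0, 0.0]}
    for v, g, h in zip(values, grads, hess):
        if v is None:
            key = "missing"
        else:
            key = "left" if v < threshold else "right"
        sums[key][0] += g
        sums[key][1] += h
    (gl, hl), (gr, hr), (gm, hm) = sums["left"], sums["right"], sums["missing"]
    gain_if_left = split_gain(gl + gm, hl + hm, gr, hr, lam)
    gain_if_right = split_gain(gl, hl, gr + gm, hr + hm, lam)
    if gain_if_left >= gain_if_right:
        return "left", gain_if_left
    return "right", gain_if_right


# Toy data: low values had no fracture, high and missing values had a fracture.
# Logistic loss at p = 0.5 gives gradient p - y and Hessian p * (1 - p) = 0.25.
values = [1.0, 2.0, None, 8.0, 9.0, None]
labels = [0, 0, 1, 1, 1, 1]
grads = [0.5 - y for y in labels]
hess = [0.25] * len(labels)
direction, gain = best_default_direction(values, grads, hess, threshold=5.0)
```

In this toy case the rows with missing values resemble the high-value rows, so routing them to the right-hand branch yields the larger gain.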

All six models were provided with the same inputs. Inputs were selected to include commonly available clinical measurements and known risk factors for pathological fractures. The mean, standard deviation, and most recent value of each vital sign and laboratory measurement over the six month data collection window were considered as model inputs. Medication history and past medical conditions were encoded as binary categorical variables (present vs absent in the EHR). Medical condition and medication history were obtained by searching the full patient record prior to the algorithm run time and were not restricted to the six month data collection window. Race and ethnicity were represented using one-hot encoding. Height, weight and body mass index (BMI) for each patient were determined using the values obtained at the patient encounter closest to the end of the six month assessment window (patient's algorithm runtime).
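As an illustration of the three summary statistics per measurement, the sketch below reduces one vital sign or laboratory series from the six-month window to mean, standard deviation, and most recent value. The function name and input layout are hypothetical. The use of the sample standard deviation, and of NaN when a value cannot be computed (so that the tree model's missing-value handling can take over), are assumptions not stated in the text.

```python
import math
import statistics


def summarize(measurements):
    """Summarise (month_offset, value) pairs from the data collection window."""
    if not measurements:
        return {"mean": math.nan, "std": math.nan, "last": math.nan}
    ordered = sorted(measurements)          # chronological order by month offset
    values = [value for _, value in ordered]
    # Sample standard deviation is undefined for a single measurement.
    std = statistics.stdev(values) if len(values) > 1 else math.nan
    return {"mean": statistics.fmean(values), "std": std, "last": values[-1]}


# Hypothetical systolic blood pressure readings at months 1, 3, and 5.
features = summarize([(3, 120.0), (1, 130.0), (5, 110.0)])
```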

Prior to training each model, the data was randomly divided using a 70:30 split, in which 70% of the data was used to train the model while the remaining 30% was set aside as a hold-out test set. Model hyperparameters were tuned using 5-fold cross validation on the training portion of the data. Final hyperparameters for each of the six models are listed in Supplementary Table 2. The algorithm produced predicted probabilities, which were dichotomized at the threshold of the chosen operating point: a score greater than the threshold indicates that the patient is likely to experience a fracture within the prediction window. Each model’s AUROC was reported with its 95% confidence interval.
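The relationship between predicted probabilities, the operating point, and AUROC can be made concrete with a small sketch. It is illustrative only: the scores are invented, the helper names are hypothetical, and the rule "flag when the score is at or above the threshold" is a simplifying assumption. The abstract states that sensitivity was fixed at 0.80, which is the target used here.

```python
import math


def auroc(labels, scores):
    """Probability that a random case outranks a random control (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_for_sensitivity(labels, scores, target=0.80):
    """Highest threshold at which at least `target` of the cases are flagged."""
    pos = sorted((s for y, s in zip(labels, scores) if y == 1), reverse=True)
    needed = math.ceil(target * len(pos) - 1e-9)  # guard against float round-off
    return pos[needed - 1]


def sensitivity_specificity(labels, scores, threshold):
    """Dichotomise scores (flag if score >= threshold) and compute both rates."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg


# Invented hold-out scores: 1 = fracture within the window, 0 = no fracture.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2, 0.6, 0.3, 0.3, 0.1, 0.05]
area = auroc(labels, scores)
cutoff = threshold_for_sensitivity(labels, scores, target=0.80)
sens, spec = sensitivity_specificity(labels, scores, cutoff)
```

Fixing sensitivity first and reading off specificity, as done here, makes models with different score distributions comparable at the same operating point.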

Each XGBoost model was calibrated. With a well-calibrated model, the predicted probability for a class reflects how often that class is actually observed.³⁵ Calibration used isotonic regression, implemented with the scikit-learn package.^36,37 Supplementary Figure 1 shows the true probability vs. predicted probability before and after calibration. Model performance was evaluated on the 30% hold-out test set not seen during the model training process. The models were assessed on six metrics: AUROC, sensitivity, specificity, positive and negative likelihood ratios (LR+ and LR-), and diagnostic odds ratio (DOR). LR+ applies to positive (case) classification and is defined as the probability of classifying fracture in a person who develops fracture divided by the probability of classifying fracture in a person who does not develop fracture. LR- applies to negative (control) classification and is defined analogously, as the probability of classifying no fracture in a person who develops fracture divided by the probability of classifying no fracture in a person who does not develop fracture. The DOR is defined as the ratio of LR+ to LR-. DOR has the advantage of presenting a single indicator of a model's overall performance while being independent of the fracture prevalence in the study population. Feature importance for each of the XGBoost models was assessed using SHapley Additive exPlanations (SHAP) values³⁸ to identify the features that contributed the most towards the models’ predictions. SHAP values of each feature for each patient in the dataset were calculated and a plot (SHAP plot) was generated to evaluate the effect of differing values for the features on the models’ predictions.
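To show what isotonic calibration does, the sketch below implements the pool-adjacent-violators procedure directly rather than calling scikit-learn. It is a simplified stand-in, not the authors' pipeline: it assumes distinct raw scores, and it maps new scores with a step-function lookup instead of the interpolation a library implementation may use. The toy scores and outcomes are invented.

```python
import bisect


def isotonic_fit(scores, labels):
    """Fit a non-decreasing map from raw score to observed event rate."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    blocks = []  # each block holds [sum of labels, count]
    for i in order:
        blocks.append([float(labels[i]), 1])
        # Merge backwards while the previous block's mean exceeds the current one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    xs = [scores[i] for i in order]
    return xs, fitted


def isotonic_predict(xs, fitted, score):
    """Calibrated probability for a new raw score (step-function lookup)."""
    idx = bisect.bisect_right(xs, score) - 1
    return fitted[max(idx, 0)]


raw = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
outcome = [0, 1, 0, 0, 1, 1]
xs, fitted = isotonic_fit(raw, outcome)
calibrated = isotonic_predict(xs, fitted, 0.35)
```

Because the fitted map is monotone, calibration of this kind changes the probabilities but preserves the ranking of patients up to ties, so AUROC is largely unaffected.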

The study sample comprised 731,056 patients aged 45 to 79 years. Of these, 14,048 had a major osteoporotic fracture within a 10-year prediction window. In general, patients who experienced a fracture were older, more likely to be white and non-Hispanic, and more likely to have rheumatoid arthritis, osteoarthritis, osteopenia, and a history of cancer (Table 2). All characteristics differed significantly between the two groups (p < 0.01 in all cases). Femur fracture patients represent a subset of the major osteoporotic fracture population presented in Table 2.

The AUROC plots for the major osteoporotic fracture models and the femur fracture models are displayed in Fig. 2. The complete performance metrics for all six trained models on the hold-out test sets are reported in Table 3. These metrics included AUROC, sensitivity, specificity, positive likelihood ratio (LR+), negative likelihood ratio (LR-), and diagnostic odds ratio (DOR).

The major osteoporotic fracture model with a three-year prediction window achieved the best overall performance (AUROC: 0.830, sensitivity: 0.801, specificity: 0.726). The three-year femur fracture prediction model performed better than the other femur fracture models (AUROC: 0.786, sensitivity: 0.812, specificity: 0.612).
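The likelihood ratios and diagnostic odds ratio follow directly from sensitivity and specificity. The short sketch below applies the standard formulas to the sensitivity and specificity quoted above for the three-year major osteoporotic fracture model. The values it produces are derived from those rounded figures and may differ slightly from the entries in Table 3.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-, DOR) from sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)   # P(flag | fracture) / P(flag | no fracture)
    lr_neg = (1 - sensitivity) / specificity   # P(no flag | fracture) / P(no flag | no fracture)
    return lr_pos, lr_neg, lr_pos / lr_neg


# Three-year major osteoporotic fracture model, as reported in the text.
lr_pos, lr_neg, dor = likelihood_ratios(sensitivity=0.801, specificity=0.726)
```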

Feature importance plots were generated for all six models using the TreeSHAP algorithm. The SHAP summary plots for the major osteoporotic fracture model and femur fracture model with 10 year prediction windows are shown in Fig. 3. Older age, presence of primary osteoporosis, history of bisphosphonate use, mean bilirubin, and white race were the features most predictive of major osteoporotic fracture within 10 years. Similar features were predictive of femur fracture within 10 years, with older age, lower weight, history of bisphosphonate use, white race, and low BMI being most predictive of femur fractures.
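For readers unfamiliar with SHAP, the sketch below computes exact Shapley values for a two-feature toy risk function by averaging each feature's marginal contribution over all feature orderings. It illustrates the quantity that TreeSHAP computes efficiently for tree ensembles, but it is not TreeSHAP itself. The toy model, its coefficients, and the single reference patient used as a baseline are all hypothetical.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x` relative to `baseline`."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for j in order:
            current[j] = x[j]            # switch feature j from baseline to patient value
            updated = model(current)
            phi[j] += updated - previous
            previous = updated
    return [total / len(orderings) for total in phi]


def toy_risk(features):
    """Hypothetical risk score with an age x bisphosphonate-history interaction."""
    age, bisphosphonate = features
    return 0.01 * age + 0.2 * bisphosphonate + 0.005 * age * bisphosphonate


patient = [75, 1]      # age 75, history of bisphosphonate use
reference = [60, 0]    # hypothetical baseline patient
phi = shapley_values(toy_risk, patient, reference)
```

The attributions sum to the difference between the patient's score and the baseline score, which is the additivity property that makes SHAP summary plots interpretable feature by feature.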

In this study, we developed six machine learning models to predict femur fractures and major osteoporotic fractures at 3, 5, and 10 year prediction intervals in women aged 45-79. All algorithms obtained a high AUROC, consistently high sensitivity (>0.80), and moderate to high specificity (0.522 to 0.726) for fracture predictions, with the strongest performance in the three-year prediction window for both the major osteoporotic and femur fracture prediction models. This suggests that accurate and interpretable predictions can be made in populations where a 10-year window may not be meaningful or clinically useful. Predictions were made using commonly available EHR information and six months of patient data. History of bisphosphonate use was also determined to be an important indicator of fracture risk in all models. These results support that the algorithms developed in this study may be able to help guide clinical decision-making surrounding individual-level fracture risk in women, even in those who have undergone treatment for fracture prevention. Only females were included in our study because osteoporotic risk factors are more complex in women, particularly in the postmenopausal state, than they are in men.^39,40 Therefore, women may gain the most benefit from these prediction models, as fractures in older women constitute a serious health risk.

Though individuals who are being treated with therapeutics for osteoporosis are known to be at higher risk for fractures than the general population, this tool maintains its clinical utility from a cost/benefit perspective. These assessments would allow clinicians to understand a person's fracture risk while on osteoporosis medication to determine future medication use. It may also help a clinician to determine which patients may be candidates for a drug holiday while ensuring that those that are at higher risk for fractures continue treatment.⁴¹

Explicitly including osteoporotic medications ensures that the algorithm is valid for individuals with a history of pharmacological fracture prevention treatment, which is a population for whom risk stratification is imperative but for whom FRAX is not validated. Though the clinical efficacy of these therapies in modifying fracture rates has been demonstrated in previous clinical trials, fewer studies have examined long-term fracture risk effects of these drugs, particularly in combination with other clinical risk factors.⁴¹ Therefore, long-term osteoporotic fracture risk (even after the osteoporosis diagnosis has been established and treatment has begun) has clinical utility for identifying individuals who continue to stay at higher fracture risk due to additional factors.

The algorithms demonstrated better performance for prediction of major osteoporotic fractures than the AUCs reported for FRAX by Crandall et al.²¹ and met or exceeded previously reported performance of the FRAX tool in other studies, including those conducted on comparable populations.^21–23,28,31

In addition to traditional risk assessment tools, machine learning has been explored for its utility as a risk stratification tool for osteoporotic fractures.^26,28,29 ML-based tools for this purpose hold the potential for individualized fracture prediction. ML-based tools that use EHR data may draw upon routinely assessed high-dimensional variables (vital signs, demographic measurements, and comorbidities) that influence the risk of osteoporotic fracture to make personalized assessments. In other health applications, it has been demonstrated that MLAs are particularly suitable for identifying interacting risk factors in high-dimensional EHR data that are otherwise difficult to capture with conventional statistical models.^42–44

Several studies have been conducted to evaluate the ability of ML to stratify osteoporotic fracture risk using various data sources and ML methods. Almog et al. employed natural language processing (NLP) methods to assess fracture risk within a one to two year lookahead period.²⁶ Sequential, longitudinal ICD data drawn from an 11 year window was used for analysis and eligible patients were > 50 years of age with two years of data available.²⁶ ICD code vectorization with long short-term memory (LSTM) achieved an AUROC of 0.812.²⁶ Kong et al. developed and compared a novel gradient-boosted machine learning model, CatBoost, with two additional common ML methods for fracture predictions. Non-traditional risk factors, such as lifestyle or economic status, were incorporated into the analysis.²⁸ CatBoost was the best performing of the three models; however, its highest area under the curve (AUC) value was only 0.688 for total fracture prediction when all available data were incorporated.²⁸ Performance decreased slightly for prediction of fractures in the hip and vertebrae.²⁸ For this study, we selected the XGBoost algorithm due to its ability to handle missing data and imbalanced classes in addition to being explainable with SHAP plots. We additionally explored more sophisticated models that had greater flexibility with regard to inputs from the time series data of patients.

Limitations of this study are as follows. The performances of the fracture prediction algorithms were not assessed in prospective settings due to the retrospective nature of the dataset. Osteoporotic fracture risk factors were identified solely via EHR data. Some relevant information, such as menopause status and the distinction between type 1 and type 2 diabetes, was not available in our dataset. Other relevant variables, such as fall history, are not well documented in EHR. The exclusion of disorders that impair neurological function and increase fracture risk (eg, dementia, stroke, etc.) is also a limitation of the present work.⁴⁵ Furthermore, we cannot guarantee that the database includes all patient-related events occurring during the follow-up (10.5 years) period. Due to dataset limitations, we were not able to consider the therapy dose or duration, recency of fractures, or androgen depletion therapy or hormone antagonist therapy. Our dataset does not allow us to determine exact age as it includes birth years only. Because of this, we used +/- 1 year from the actual age. We cannot predict how our algorithm may perform in other patient populations or in populations with different data availability.

Although we did not pose an upper age limit, all participants who were 80 years or older, who are known to be at high fracture risk, were filtered out because they did not have the required data for the 10-year follow-up period. It is possible that this introduced bias into the sample, which should be addressed in future work. While the exclusion of subjects without at least 10 years of follow-up was required for the performance evaluation of the long-term prediction model, it also created a biased selection of healthier younger subjects by excluding individuals who died after the start of the follow-up.⁴⁶ By not adjusting for competing mortality, we may have overestimated the 10-year fracture probability. In future studies, non-parametric statistical methods can be used to adjust for the higher risk of mortality within the observation window that exists for older individuals.⁴⁶ Further bias may have been introduced through the use of a six month window of data collection, as this may have led to the exclusion of healthier individuals who did not require any medical interventions in that six month window.

There was a slight discrepancy between the predicted and observed probabilities (see calibration curves), indicating that our model may overestimate the predicted fracture risk in some individuals or subgroups due to noise in the training data and differences in patient characteristics between training and test datasets. Finally, our study included a long prediction window, during which external factors, such as regulatory clearance for new pharmaceutical treatments for osteoporosis, may have had an impact on patient outcomes in ways not fully captured by algorithm performance. Future directions include using a longer window of patient data to generate predictions, validating the MLA in populations of men and individuals aged >80 to determine how performance is impacted, validating the algorithm in multiple geographic locations to account for localized risk factors, conducting a prospective validation, and examining possible confounding factors that may influence the performance and accuracy of the algorithm.

Ethical approval and Informed Consent: This article does not contain any studies with human participants or animals performed by any of the authors. For this type of study formal consent is not required.

Data Availability: The data used in this study is proprietary, and therefore, not publicly available.

Funding: n/a

Competing Interests: Dr. Mao owns stock in Dascena. Dr. Hoffman holds stock options in Dascena. All authors have a financial relationship with Dascena as employees or contractors of the company.

Author Contributions: AR: Conceptualization, methodology, formal analysis, and writing (original draft). AS: Conceptualization, methodology, and writing (original draft). JM: Conceptualization, validation, writing (review and editing). GB: Investigation and writing (original and revised draft). SS: Investigation and writing (revised draft). JH: Project administration and writing (review and editing). QM: Supervision.

Abbreviations: bone mineral density (BMD), GARVAN Fracture Risk Calculator (GARVAN-FRC), machine learning (ML), clinical decision support (CDS), machine learning algorithm (MLA), electronic health record (EHR), Health Insurance Portability and Accountability Act (HIPAA), International Classification of Diseases (ICD), eXtreme Gradient Boosting (XGBoost), body mass index (BMI), SHapley Additive exPlanations (SHAP), positive likelihood ratio (LR+), negative likelihood ratio (LR-), diagnostic odds ratio (DOR), natural language processing (NLP), long short-term memory (LSTM), area under the curve (AUC), area under the receiver operating characteristic (AUROC) curve, major osteoporotic (MO)

1. Ioannidis G, Papaioannou A, Hopman WM, et al. Relation between fractures and mortality: results from the Canadian Multicentre Osteoporosis Study. CMAJ Can Med Assoc J. 2009;181(5):265-271. doi:10.1503/cmaj.081720

2. Teng GG, Curtis JR, Saag KG. Mortality and osteoporotic fractures: is the link causal, and is it modifiable? Clin Exp Rheumatol. 2008;26(5 Suppl 51):S125-S137.

3. Bliuc D, Nguyen ND, Nguyen TV, Eisman JA, Center JR. Compound risk of high mortality following osteoporotic fracture and refracture in elderly women and men. J Bone Miner Res. 2013;28(11):2317-2324. doi:10.1002/jbmr.1968

4. Heinrich S, Rapp K, Rissmann U, Becker C, König HH. Cost of falls in old age: a systematic review. Osteoporos Int J Establ Result Coop Eur Found Osteoporos Natl Osteoporos Found USA. 2010;21(6):891-902. doi:10.1007/s00198-009-1100-1

5. Burge R, Dawson-Hughes B, Solomon DH, Wong JB, King A, Tosteson A. Incidence and economic burden of osteoporosis-related fractures in the United States, 2005-2025. J Bone Miner Res Off J Am Soc Bone Miner Res. 2007;22(3):465-475. doi:10.1359/jbmr.061113

6. Amin S, Achenbach SJ, Atkinson EJ, Khosla S, Melton LJ. Trends in Fracture Incidence: A Population-Based Study Over 20 Years. J Bone Miner Res Off J Am Soc Bone Miner Res. 2014;29(3):581-589. doi:10.1002/jbmr.2072

7. Woolf AD, Åkesson K. Preventing fractures in elderly people. BMJ. 2003;327(7406):89-95.

8. Wilkins CH, Birge SJ. Prevention of osteoporotic fractures in the elderly. Am J Med. 2005;118(11):1190-1195. doi:10.1016/j.amjmed.2005.06.046

9. Assessment of fracture risk and its application to screening for postmenopausal osteoporosis. Report of a WHO Study Group. World Health Organ Tech Rep Ser. 1994;843:1-129.

10. Kanis JA, Johnell O, De Laet C, Jonsson B, Oden A, Ogelsby AK. International variations in hip fracture probabilities: implications for risk assessment. J Bone Miner Res Off J Am Soc Bone Miner Res. 2002;17(7):1237-1244. doi:10.1359/jbmr.2002.17.7.1237

11. Kanis JA, Johansson H, Harvey NC, McCloskey EV. A brief history of FRAX. Arch Osteoporos. 2018;13(1):118. doi:10.1007/s11657-018-0510-0

12. Kanis JA, Johnell O, Oden A, Johansson H, McCloskey E. FRAX and the assessment of fracture probability in men and women from the UK. Osteoporos Int J Establ Result Coop Eur Found Osteoporos Natl Osteoporos Found USA. 2008;19(4):385-397. doi:10.1007/s00198-007-0543-5

13. Jiang X, Gruner M, Trémollieres F, et al. Diagnostic accuracy of FRAX in predicting the 10-year risk of osteoporotic fractures using the USA treatment thresholds: A systematic review and meta-analysis. Bone. 2017;99:20-25. doi:10.1016/j.bone.2017.02.008

14. Leslie WD, Majumdar SR, Morin SN, et al. Performance of FRAX in clinical practice according to sex and osteoporosis definitions: the Manitoba BMD registry. Osteoporos Int J Establ Result Coop Eur Found Osteoporos Natl Osteoporos Found USA. 2018;29(3):759-767. doi:10.1007/s00198-018-4415-y

15. Gourlay ML, Ritter VS, Fine JP, et al. Comparison of fracture risk assessment tools in older men without prior hip or spine fracture: the MrOS study. Arch Osteoporos. 2017;12(1):91. doi:10.1007/s11657-017-0389-1

16. Silverman SL, Calderon AD. The Utility and Limitations of FRAX: A US Perspective. Curr Osteoporos Rep. 2010;8(4):192-197. doi:10.1007/s11914-010-0032-1

17. Watts NB, Ettinger B, LeBoff MS. FRAX facts. J Bone Miner Res Off J Am Soc Bone Miner Res. 2009;24(6):975-979. doi:10.1359/jbmr.090402

18. FRAX - FAQ. FRAX Fracture Risk Assessment Tool. Accessed March 10, 2021. https://www.sheffield.ac.uk/FRAX/faq.aspx

19. Leslie WD, Majumdar SR, Morin SN, et al. FRAX for fracture prediction shorter and longer than 10 years: the Manitoba BMD registry. Osteoporos Int. 2017;28(9):2557-2564. doi:10.1007/s00198-017-4091-3

20. Moritz M, Knezevich E, Spangler M. Updates in the Treatment of Postmenopausal Osteoporosis. Accessed March 12, 2021. https://www.uspharmacist.com/article/updates-in-the-treatment-of-postmenopausal-osteoporosis

21. Crandall CJ, Larson J, Cauley JA, et al. Do Additional Clinical Risk Factors Improve the Performance of Fracture Risk Assessment Tool (FRAX) Among Postmenopausal Women? Findings From the Women’s Health Initiative Observational Study and Clinical Trials. JBMR Plus. 2019;3(12):e10239. doi:10.1002/jbm4.10239

22. Ensrud KE, Lui LY, Taylor BC, et al. A Comparison of Prediction Models for Fractures in Older Women: Is More Better? Arch Intern Med. 2009;169(22):2087-2094. doi:10.1001/archinternmed.2009.404

23. Azagra Ledesma R, Prieto-Alhambra D, Encabo Duró G, et al. Usefulness of FRAX tool for the management of osteoporosis in the Spanish female population. Med Clin (Barc). 2011;136(14):613-619. doi:10.1016/j.medcli.2010.09.043

24. Goldshtein I, Gerber Y, Ish-Shalom S, Leshno M. Fracture Risk Assessment With FRAX Using Real-World Data in a Population-Based Cohort From Israel. Am J Epidemiol. 2018;187(1):94-102. doi:10.1093/aje/kwx128

25. Kanis JA, Harvey NC, Johansson H, Odén A, McCloskey EV, Leslie WD. Overview of fracture prediction tools. J Clin Densitom Off J Int Soc Clin Densitom. 2017;20(3):444-450. doi:10.1016/j.jocd.2017.06.013

26. Almog YA, Rai A, Zhang P, et al. Deep Learning With Electronic Health Records for Short-Term Fracture Risk Identification: Crystal Bone Algorithm Development and Validation. J Med Internet Res. 2020;22(10):e22550. doi:10.2196/22550

27. Liebowitz J. If It’s Broken, Fix It: Can an Automated System Predict Short-Term Fracture Risk? The Rheumatologist. Accessed March 9, 2021. https://www.the-rheumatologist.org/article/if-its-broken-fix-it-can-an-automated-system-predict-short-term-fracture-risk/

28. Kong SH, Ahn D, Kim B (Raymond), et al. A Novel Fracture Prediction Model Using Machine Learning in a Community-Based Cohort. JBMR Plus. 2020;4(3):e10337. doi:10.1002/jbm4.10337

29. de Vries BCS, Hegeman JH, Nijmeijer W, Geerdink J, Seifert C, Groothuis-Oudshoorn CGM. Comparing three machine learning approaches to design a risk assessment tool for future fractures: predicting a subsequent major osteoporotic fracture in fracture patients with osteopenia and osteoporosis. Osteoporos Int J Establ Result Coop Eur Found Osteoporos Natl Osteoporos Found USA. 2021;32(3):437-449. doi:10.1007/s00198-020-05735-z

30. Office for Human Research Protections (OHRP). Exemptions (2018 Requirements). HHS.gov. Published March 8, 2021. Accessed January 21, 2022. https://www.hhs.gov/ohrp/regulations-and-policy/regulations/45-cfr-46/common-rule-subpart-a-46104/index.html

31. Briot K, Paternotte S, Kolta S, et al. FRAX®: Prediction of Major Osteoporotic Fractures in Women from the General Population: The OPUS Study. PLoS ONE. 2013;8(12). doi:10.1371/journal.pone.0083436

32. Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM; 2016:785-794. doi:10.1145/2939672.2939785

33. Rusdah DA, Murfi H. XGBoost in handling missing values for life insurance risk prediction. SN Appl Sci. 2020;2(8):1336. doi:10.1007/s42452-020-3128-y

34. Natekin A, Knoll A. Gradient boosting machines, a tutorial. Front Neurorobotics. 2013;7. doi:10.3389/fnbot.2013.00021

35. Guo C, Pleiss G, Sun Y, Weinberger KQ. On calibration of modern neural networks. In: Proceedings of the 34th International Conference on Machine Learning, Volume 70. 2017:1321-1330.

36. Zadrozny B, Elkan C. Transforming Classifier Scores into Accurate Multiclass Probability Estimates. Published online 2002.

37. About us — scikit-learn 0.16.1 documentation. Accessed November 30, 2021. https://scikit-learn.org/0.16/about.html

38. Lundberg S, Lee SI. A Unified Approach to Interpreting Model Predictions. arXiv:1705.07874 [cs, stat]. Published online November 24, 2017. Accessed May 3, 2021. http://arxiv.org/abs/1705.07874

39. Cawthon PM. Gender Differences in Osteoporosis and Fractures. Clin Orthop Relat Res. 2011;469(7):1900-1905. doi:10.1007/s11999-011-1780-7

40. Forbes AP. Fuller Albright. His concept of postmenopausal osteoporosis and what came of it. Clin Orthop. 1991;(269):128-141.

41. Adams AL, Adams JL, Raebel MA, et al. Bisphosphonate Drug Holiday and Fracture Risk: A Population-Based Cohort Study. J Bone Miner Res. 2018;33(7):1252-1259. doi:10.1002/jbmr.3420

42. Zhang L, Wang Y, Niu M, Wang C, Wang Z. Machine learning for characterizing risk of type 2 diabetes mellitus in a rural Chinese population: the Henan Rural Cohort Study. Sci Rep. 2020;10(1):4406. doi:10.1038/s41598-020-61123-x

43. Developing an Explainable Machine Learning-Based Personalised Dementia Risk Prediction Model: A Transfer Learning Approach With Ensemble Learning Algorithms. Front Big Data. Accessed September 17, 2021. https://www.frontiersin.org/articles/10.3389/fdata.2021.613047/full

44. Behravan H, Hartikainen JM, Tengström M, Kosma VM, Mannermaa A. Predicting breast cancer risk using interacting genetic and demographic factors and machine learning. Sci Rep. 2020;10(1):11044. doi:10.1038/s41598-020-66907-9

45. Kelly RR, Sidles SJ, LaRue AC. Effects of Neurological Disorders on Bone Health. Front Psychol. 2020;11. doi:10.3389/fpsyg.2020.612366

46. Leslie WD, Lix LM, Wu X, On behalf of the Manitoba Bone Density Program. Competing mortality and fracture risk assessment. Osteoporos Int. 2013;24(2):681-688. doi:10.1007/s00198-012-2051-5

Table 1: Electronic health record information used to predict major osteoporotic fractures and femur (hip) fractures.

Category	Feature
Demographics	Age
	Race
	Ethnicity

Vital Signs	Diastolic blood pressure
	Heart rate
	Respiratory rate
	Systolic blood pressure
	Temperature

Medical History	Alcohol use disorder
	Cancer
	Celiac disease
	Chronic kidney disease
	Crohn's disease or ulcerative colitis
	Current smoking status
	Diabetes
	Family history of osteoporosis
	Hepatic fibrosis and cirrhosis
	Osteoarthritis
	Osteomalacia
	Osteopenia
	Primary/age-related osteoporosis
	Prior fractures
	Prior falls
	Renal failure
	Rheumatoid arthritis
	Secondary osteoporosis

History of Medications	Bisphosphonates or denosumab
	Teriparatide, abaloparatide, or romosozumab
	Glucocorticoids
	Estrogen
	Chemotherapy

Laboratory Measurements	25-hydroxyvitamin D
	Alanine aminotransferase
	Alkaline phosphatase
	Aspartate aminotransferase
	Bilirubin
	Blood urea nitrogen (BUN)
	C-reactive protein
	High sensitivity C-reactive protein
	Calcium
	Creatinine
	Hemoglobin
	Homocysteine
	Peripheral oxygen saturation (SpO₂)
	Platelet count
	Red blood cell (RBC) count
	Sodium
	White blood cell (WBC) count

Clinical Measurements	Height
	Weight
	Body mass index
	T-score (Femoral neck)
	T-score (Femur)
	T-score (Forearm)
	T-score (Hip)
	T-score (Radius)
	T-score (Spine)
	T-score (Wrist)

Table 2. Characteristics of the study cohort.

	Characteristic	Major osteoporotic fracture patients (%), n = 14,048	Non-fracture patients (%), n = 717,008
Age (years)	Median (IQR)	72 (62-77)	58 (51-68)
	45-49	597 (4.25%)	132478 (18.48%)
	50-54	921 (6.56%)	142204 (19.83%)
	55-59	1203 (8.56%)	116789 (16.29%)
	60-64	1478 (10.52%)	94792 (13.22%)
	65-69	1872 (13.33%)	77582 (10.82%)
	70-74	2221 (15.81%)	60998 (8.51%)
	75+	5756 (40.97%)	92165 (12.85%)

Ethnicity	Hispanic	259 (1.84%)	23767 (3.31%)
	Not Hispanic	13237 (94.23%)	647014 (90.24%)
	Unknown	552 (3.93%)	46227 (6.45%)

Race	White	13258 (94.38%)	629553 (87.80%)
	African American	339 (2.41%)	48617 (6.78%)
	Asian	95 (0.68%)	7927 (1.11%)
	Other/Unknown	356 (2.53%)	30911 (4.31%)

Comorbidities	Rheumatoid Arthritis	584 (4.16%)	11265 (1.57%)
	Osteoarthritis	2672 (19.02%)	71181 (9.93%)
	Osteomalacia	27 (0.19%)	410 (0.06%)
	Osteopenia	1740 (12.39%)	50237 (7.01%)
	Cancer	1676 (11.93%)	47528 (6.63%)
	Chronic Kidney Disease	73 (0.52%)	1821 (0.25%)
	Diabetes	1829 (13.02%)	76275 (10.64%)
	Hepatic Fibrosis and Cirrhosis	60 (0.43%)	1986 (0.28%)
	Celiac Disease	131 (0.93%)	5243 (0.73%)
	Crohn’s Disease or Ulcerative Colitis	40 (0.28%)	1372 (0.19%)
	Renal Failure	223 (1.59%)	5188 (0.72%)
	Primary/Age-Related Osteoporosis	3031 (21.58%)	43079 (6.01%)
	Prior Fractures	577 (4.11%)	883 (0.12%)
	Secondary Osteoporosis	280 (1.99%)	3378 (0.47%)
	Prior Falls	581 (4.14%)	8827 (1.23%)

Table 3. Comprehensive results for the performance of the six trained models on the hold-out test set.

	Major Osteoporotic Fracture Models			Femur (hip) Fracture Models
	10 year	5 year	3 year	10 year	5 year	3 year
AUROC (95% CI)	0.792 (0.786-0.800)	0.806 (0.796-0.815)	0.830 (0.818-0.844)	0.747 (0.730-0.763)	0.745 (0.702-0.787)	0.786 (0.725-0.842)
Sensitivity (95% CI)	0.800 (0.787-0.813)	0.800 (0.782-0.817)	0.801 (0.776-0.826)	0.800 (0.774-0.827)	0.803 (0.732-0.871)	0.812 (0.712-0.908)
Specificity (95% CI)	0.628 (0.626-0.630)	0.659 (0.657-0.661)	0.726 (0.724-0.727)	0.541 (0.539-0.543)	0.522 (0.520-0.525)	0.612 (0.610-0.614)
Positive likelihood ratio (LR+)	2.152	2.347	2.917	1.744	1.681	2.089
Negative likelihood ratio (LR-)	0.318	0.302	0.275	0.370	0.377	0.308
Diagnostic odds ratio (DOR)	6.767	7.747	10.613	4.718	4.460	6.782
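
The likelihood ratios and diagnostic odds ratio in Table 3 follow from sensitivity and specificity by the standard definitions LR+ = sensitivity / (1 - specificity), LR- = (1 - sensitivity) / specificity, and DOR = LR+ / LR-. The minimal Python sketch below recomputes them from the rounded values in the table. It is an illustration only, not the authors' analysis code: the class name `OperatingPoint` and the dictionary `TABLE_3` are hypothetical, and because the inputs are rounded to three decimals, the recomputed values can differ from the reported ones in the second or third decimal place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    """Sensitivity and specificity of one model at its chosen threshold."""

    sensitivity: float
    specificity: float

    @property
    def lr_positive(self) -> float:
        # LR+ = true positive rate / false positive rate
        return self.sensitivity / (1.0 - self.specificity)

    @property
    def lr_negative(self) -> float:
        # LR- = false negative rate / true negative rate
        return (1.0 - self.sensitivity) / self.specificity

    @property
    def diagnostic_odds_ratio(self) -> float:
        # DOR = LR+ / LR-
        return self.lr_positive / self.lr_negative


# Rounded (sensitivity, specificity) pairs copied from Table 3.
# The dictionary keys are labels chosen for this example.
TABLE_3 = {
    "MOF 10 year": OperatingPoint(0.800, 0.628),
    "MOF 5 year": OperatingPoint(0.800, 0.659),
    "MOF 3 year": OperatingPoint(0.801, 0.726),
    "Femur 10 year": OperatingPoint(0.800, 0.541),
    "Femur 5 year": OperatingPoint(0.803, 0.522),
    "Femur 3 year": OperatingPoint(0.812, 0.612),
}

for name, point in TABLE_3.items():
    print(
        f"{name}: LR+ = {point.lr_positive:.3f}, "
        f"LR- = {point.lr_negative:.3f}, "
        f"DOR = {point.diagnostic_odds_ratio:.3f}"
    )

# Unweighted mean specificity across the six models.
mean_specificity = sum(p.specificity for p in TABLE_3.values()) / len(TABLE_3)
print(f"Mean specificity: {mean_specificity:.3f}")
```

The unweighted mean of the six specificities computed at the end corresponds to the average specificity quoted in the abstract for sensitivity fixed at approximately 0.80.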

SupplementaryMaterials.docx
