We presented the IDEARS platform, which uses the state-of-the-art machine learning algorithms XGBoost and SHAP to rank risk factors for PD using the world’s largest and most comprehensive prospective community study, the UK Biobank. Ageing is widely recognised as by far the most significant factor in predicting PD; we therefore chose to age-normalise our datasets to uncover a hierarchy of feature importance that is age-independent. Our model demonstrated that gender was the most important feature, with PD being more prevalent in males, which led us to split subsequent analyses by gender to uncover gender-specific feature importances. Our unbiased machine learning approach uncovered a novel set of features most associated with PD. Interestingly, several well-established risk factors thought to have a high level of association with PD were not among the most important features in our model (e.g., pesticide exposure, smoking status, traumatic brain injury and caffeine consumption).
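As an illustration of how a SHAP-based feature ranking can be formed, the sketch below orders features by their mean absolute SHAP value across participants. It is a minimal example under stated assumptions, not the IDEARS implementation: the function name `rank_features_by_mean_abs_shap`, the use of the absolute value, and the toy numbers are all hypothetical, and in practice the SHAP matrix would come from an explainer applied to the fitted model.

```python
from statistics import fmean


def rank_features_by_mean_abs_shap(shap_rows, feature_names):
    """Return (feature, mean |SHAP|) pairs, most important first.

    shap_rows: one list of SHAP values per participant, aligned with
    feature_names. Taking the absolute value is an assumption of this sketch.
    """
    if not shap_rows:
        raise ValueError("shap_rows must contain at least one participant")
    scores = []
    for j, name in enumerate(feature_names):
        column = [abs(row[j]) for row in shap_rows]
        scores.append((name, fmean(column)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy SHAP values for three participants (made-up numbers, not UKB data).
toy_features = ["gender", "igf1", "urate"]
toy_shap = [
    [0.40, 0.10, -0.05],
    [-0.30, 0.20, -0.10],
    [0.50, -0.30, 0.00],
]

ranking = rank_features_by_mean_abs_shap(toy_shap, toy_features)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The sign of each SHAP value (risk-increasing or protective) is discarded by the ranking itself, so direction has to be read from the per-participant values separately.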
Of note is the importance of insulin-like growth factor 1 (IGF-1), which ranked among the top 3 most important features, based on mean SHAP score, in the combined dataset and in the separate male and female lists. On deeper inspection of the data, it was clear that IGF-1 levels were elevated in males and females up to 10 years before disease onset. IGF-1 is an endocrine, paracrine and autocrine hormone that is a primary mediator of the effects of growth hormone. Major functions of IGF-1 include insulin-like activity, cell proliferation and survival, antioxidant effects and neuroprotection. In vivo studies have demonstrated that IGF-1 deficiency results in increased oxidative stress, inflammation, neuronal cell death and cognitive deficits that can be improved by exogenous IGF-1 16,17. It is well documented that IGF-1 is elevated in serum at diagnosis in PD patients, and levels at this time correlate with disease severity5,7. To account for the discrepancy between the beneficial effects of IGF-1 and the fact that it is increased in PD, it has been hypothesised that IGF-1 signalling is defective in PD, resulting in a decrease in its neuroprotective effects and a reduction in the brain’s ability to buffer oxidative damage. Moreover, IGF-1 signalling is known to be dysregulated by both toxin-induced inflammation and central obesity5,18,19, which is consistent with our model identifying prospective biomarkers predictive of greater PD risk in these categories. Therefore, higher-than-average IGF-1 levels years before diagnosis may be indicative of a compensatory mechanism in response to dysregulated IGF-1 signalling. Our findings suggest that IGF-1 should be further considered as a prognostic biomarker for PD risk.
AST:ALT was elevated up to 10 years before, and after, PD diagnosis in males but not in females; this is consistent with elevated ALT being protective in the male SHAP list. Elevated AST:ALT ratios between 1 and 2 are indicative of non-alcoholic fatty liver disease (NAFLD) or non-alcoholic steatohepatitis (NASH), whilst levels > 2 are indicative of alcoholic liver disease20,21; the moderate increases in the male UKB PD cohort may therefore be indicative of NAFLD/NASH, although some individuals in the PD group have levels above 2. A recent study of NAFLD and PD found that there was greater risk of PD in females with NAFLD22, and an earlier study found that NASH in males and females with hepatitis B and C infection led to a greater PD risk23. With that said, NAFLD is associated with cardiovascular disease and metabolic disorders, which does not fully align with our other findings (see below)24. Whilst more research on NAFLD and PD is required, our findings indicate that elevated AST:ALT may be a useful prospective biomarker of PD in males.
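To make the ratio bands concrete, the following sketch computes AST:ALT from two enzyme measurements and assigns it to the 1–2 band or to the region above the threshold of 2 discussed above. The function names, the treatment of values exactly equal to 1 or 2, and the example enzyme values are assumptions made for illustration; the bands are descriptive only and are not a diagnostic rule.

```python
def ast_alt_ratio(ast_u_per_l, alt_u_per_l):
    """Return the AST:ALT ratio from two enzyme activities in the same units."""
    if alt_u_per_l <= 0:
        raise ValueError("ALT must be positive to form a ratio")
    return ast_u_per_l / alt_u_per_l


def ratio_band(ratio):
    """Assign a ratio to a descriptive band.

    Boundary handling (1 and 2 both fall in the '1-2' band) is an
    assumption of this sketch.
    """
    if ratio > 2:
        return "above 2"
    if ratio >= 1:
        return "1-2"
    return "below 1"


# Hypothetical enzyme values in U/L, chosen only to exercise each band.
examples = [(24.0, 30.0), (45.0, 30.0), (90.0, 30.0)]
bands = [ratio_band(ast_alt_ratio(ast, alt)) for ast, alt in examples]
print(bands)
```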
The IDEARS model identified several features associated with cardiovascular health and body adiposity. Total and LDL cholesterol levels were reduced in PD in males 10 years before diagnosis, but only 5 years before diagnosis in females. This observation is in keeping with a large population-based study of 261,638 statin-free individuals, which identified that males who had lower levels of total and LDL cholesterol were at a greater risk of developing PD; however, there were no significant differences in females8. Given lower LDL levels, PD patients have shown a reduced risk of myocardial infarction and stroke25,26, and it has been hypothesised that the reduced cholesterol levels may be due to non-motor peripheral symptoms, such as constipation, that can manifest before motor symptoms appear8.
Cardiovascular health is also strongly linked to metabolic regulation, and there are mixed findings on the co-morbidity of type 2 diabetes and PD, with some studies showing an increased prevalence27 and others showing a reduced prevalence26,28. As mentioned above, HbA1c is higher 10 years before PD onset in males but reduced 0–5 years before diagnosis, although the proportion of the PD group with HbA1c in the diabetic range is slightly higher than in the non-PD group. Therefore, further research is needed to investigate the possible associations between diabetes and PD.
In keeping with previous literature, the IDEARS platform identified that increasing concentrations of urate are associated with a lower risk of PD29,30. It is thought that urate reduces the risk of neurodegenerative diseases through its iron-chelating properties, antioxidant quenching of superoxide and hydroxyl free radicals, and its role as an electron donor that increases the antioxidant activity of enzymes such as superoxide dismutase31. IDEARS identified increased creatinine levels in the urine of both sexes before and after PD diagnosis, which may be indicative of poor kidney function beginning in the pre-symptomatic phase. However, this finding is at odds with a large Swedish study that found a slight reduction in serum creatinine levels from 1 year before diagnosis onwards32, whilst another, smaller study found no change in serum creatinine in PD33. Therefore, whilst decreased urate levels might be a useful biomarker for PD, further investigations are required to understand the relationship between creatinine and PD.
Several epidemiological studies have linked central adiposity to PD34,35, which is consistent with output from the IDEARS model, in which waist circumference was ranked 14th and was significantly increased before diagnosis in females. Although this observation may be at odds with better cardiovascular and metabolic health in general, body fat distribution is likely a key factor, and increased adiposity has also been hypothesised to modulate IGF-1 signalling5,18. Clearly, more research is required to better understand the complex interactions between body adiposity and the risk of PD.
Several features relating to the immune system were identified by the IDEARS model. Specifically, an increase in neutrophil count, a decrease in lymphocyte count and an increase in the neutrophil-to-lymphocyte ratio (NLR) were all observed both 10 years before and at diagnosis in males, whilst only NLR followed the same pattern in females. An elevated neutrophil count is associated with the occurrence, progression and severity of inflammation or infection, whereas lymphocyte count, as part of the adaptive immune response, is heavily depressed by stress. Thus, NLR is considered a compound biomarker of inflammation and stress, and it is therefore perhaps not surprising that NLR is the most robust and consistent example of a prospective biomarker of PD risk from the IDEARS model. A recent study demonstrated similar findings, with increased NLR in 100 PD patients but no change in Alzheimer’s disease6. Increased neutrophil count and NLR are in keeping with the literature showing that inflammation and infection are risk factors for PD. NLR may therefore be considered a useful prospective biomarker for the risk of PD; however, as it is associated with many other chronic diseases, it should be used in combination with other biomarkers identified by our model.
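The compound nature of NLR can be seen in a short sketch: because it is the neutrophil count divided by the lymphocyte count, either a rise in neutrophils or a fall in lymphocytes increases it. The function name and the cell counts below are hypothetical and chosen only to illustrate the arithmetic; both counts are assumed to be in the same units (for example, 10^9 cells/L).

```python
def neutrophil_lymphocyte_ratio(neutrophils, lymphocytes):
    """Return NLR: neutrophil count divided by lymphocyte count (same units)."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils / lymphocytes


# Made-up counts: a baseline and two ways the ratio can increase.
baseline = neutrophil_lymphocyte_ratio(4.0, 2.0)
more_neutrophils = neutrophil_lymphocyte_ratio(5.0, 2.0)
fewer_lymphocytes = neutrophil_lymphocyte_ratio(4.0, 1.6)

print(baseline, more_neutrophils, fewer_lymphocytes)
```

Both perturbations raise the ratio above the baseline, which is why NLR can reflect inflammation (neutrophils) and stress (lymphocytes) at once.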
Unexpectedly, C-reactive protein, a marker of acute inflammation, appeared as protective in the SHAP list; however, on closer inspection it was unchanged in the PD cohort before or after diagnosis in both sexes. The appearance of C-reactive protein as protective may be due to its complex relationships with other inflammatory markers, which may reflect more chronic inflammation. The absence of any change in C-reactive protein is at odds with a recent meta-analysis that found an increase in C-reactive protein in PD36. This may be because the UKB dataset includes 2,719 subjects with PD, compared to a combined 2,691 subjects across twenty studies in the meta-analysis. Moreover, findings in those studies were highly variable: two showed no change in C-reactive protein, ten showed a small increase and eight showed a large increase. Therefore, the usefulness of C-reactive protein as a biomarker of PD remains an open question.
Epidemiological studies have revealed that viral (e.g. influenza, HSV, hepatitis) and bacterial (e.g. C. pneumoniae and H. pylori) infections are associated with an increased risk of developing PD23,37–40. Inflammatory conditions, such as head trauma, allergic rhinitis and exaggerated allergic reactions following insect stings, have been linked to an increased risk of developing PD41–44. Neuroinflammation is also a common pathological hallmark seen in the PD brain45–48. Conversely, long-term use of non-steroidal anti-inflammatory drugs (NSAIDs) reduces the risk of developing PD49–52. Our analysis clearly demonstrates a protective effect of ibuprofen use in the UKB participants, which was more pronounced at higher NLR.
The reduction in lymphocyte count well before PD in our study is consistent with two recent studies, including one that used the UKB dataset (thus validating our approach)11,29, as well as a meta-analysis that showed decreased numbers of CD3+ and CD4+ lymphocyte subsets in intermediate and late-stage PD, whilst a decrease in CD8+ T lymphocytes was also observed53. It is interesting to observe that this reduction in lymphocyte count occurs up to 10 years prior to diagnosis in males, but only 5 years before in females, and therefore may be a better prospective marker in men. It is noteworthy that ‘suffers from nerves’ (19th overall) and self-reported nervous feeling (8th in females) were highly ranked risk factors in the IDEARS model, and therefore the PD group may have higher-than-average stress levels, which could depress lymphocyte counts. More detailed analyses of CD4+ T lymphocyte subsets suggest that they are skewed towards proinflammatory phenotypes (i.e., increased Th1 and Th17, and reduced Th2 and Tregs) in PD patients54–56. The inflammatory milieu in PD is a likely contributor to the decreased IGF-1 signalling mentioned previously5,16,17. Overall, these findings imply that a predisposition to PD may be established by conditions that induce peripheral inflammation (injury/infection) and stress, or in individuals with an immune system skewed towards inflammation.
Given that PD is an age-related motor disease, it was unsurprising that the IDEARS model identified overall health rating and number of treatments/medications taken as highly ranked features, with both indicative of overall frailty. A deeper analysis of other frailty-related features revealed that reduced hand grip strength and decreased walking pace can be considered early markers of motor dysfunction; given that both measures are significantly reduced in both sexes 10 years before diagnosis, they should be considered useful clinical measures to predict the risk of PD onset. Existing literature has identified the importance of these factors. Hand grip strength and reduced dexterity have been reported as predictors of motor symptom severity in PD57. Slow walking speed has been correlated with both advanced age and PD severity58, and it is also one of the first complaints in the early stages of the disease59. An increased number of ICD conditions at baseline and reduced forced vital capacity were also apparent years before diagnosis in both sexes, and are indicators of general ill health and multiple co-morbidities in PD patients. Arthritis, hypertension, atrial fibrillation, depression, back problems and cataracts are commonly reported co-morbidities of PD27,60, and require a wide range of treatments.
Other significant gender differences were observed, with parental PD being more important for men than women. Since idiopathic PD has a phenotype that strongly overlaps with monogenic forms of the disease61, this may suggest a greater genetic component in idiopathic PD in males. Conversely, vitamin D was shown to be more protective against PD in women than in men. Vitamin deficiency has been linked to neurodegenerative diseases, and a deficiency in vitamin D in particular has been linked to reduced dopamine levels and alpha-synuclein accumulation, which are pathological hallmarks of PD62. Vitamin D has been shown to have neuroprotective, anti-inflammatory and antioxidant effects in vitro63; however, a recent meta-analysis could not conclude that vitamin D supplementation has clear benefits in reducing PD risk64.
The application of a novel methodology in the IDEARS platform has enabled us to examine a much larger range of variables without a priori assumptions. The advantage of using XGBoost and SHAP in this context lies in the ability to consider a large number of variables and accurately determine their importance in the model while implicitly modelling interactions between variables, resulting in a demonstrably higher AUC. The disadvantage is the black-box nature of this approach. We have sought to mitigate this by providing a separate univariate analysis of individual variables. In addition to the power of determining the most significant risk factors driving PD, this approach could be used separately to provide a risk score which would be more accurate than existing methods.
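For readers less familiar with AUC as a measure of how well a risk score separates cases from non-cases, the sketch below computes it directly from its pairwise definition: the proportion of (case, non-case) pairs in which the case receives the higher score, with ties counted as half. This is a generic illustration with made-up labels and scores, not the evaluation code used for IDEARS, and the function name `roc_auc` is an assumption of this example.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of cases and non-cases.

    labels: 1 for a case, 0 for a non-case. scores: model risk scores.
    Runs in O(cases * non-cases), which is fine for a small illustration.
    """
    case_scores = [s for y, s in zip(labels, scores) if y == 1]
    control_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties count as half a win
    return wins / (len(case_scores) * len(control_scores))


# Toy example: two non-cases followed by two cases (made-up scores).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(toy_labels, toy_scores)
print(auc)
```

An AUC of 0.5 corresponds to a score that ranks cases no better than chance, and 1.0 to a score that ranks every case above every non-case.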