Main findings
This was a comprehensive analysis of the prevalence of cardio-renal-metabolic (CRM) and mental health (MH) conditions in 12 million patients in a primary care electronic health records (EHRs) database. There was a high burden of depression, anxiety, and hypertension across the population. As expected, the recorded prevalence of most conditions increased with deprivation and age, although mental health conditions were potentially under-represented in children. Most CRM conditions, schizophrenia and substance misuse were more prevalent in men, whilst anxiety, depression, bipolar disorder and eating disorders were more common in women. Hypertension and diabetes were twice as prevalent in black patients as in white patients, and diabetes was three times as common in Asian patients as in white patients. However, black and Asian patients generally had a lower recorded prevalence of cardiovascular disease (aortic aneurysms, AF, PVD, HF, heart valve disorder, IHD, stroke) than white patients. Mental health conditions were recorded in EHRs twice as frequently in patients of white ethnicity as in those of black or Asian ethnicity; the exceptions were PTSD and schizophrenia, which were 33% more prevalent and twice as prevalent, respectively, in patients of black ethnicity.
Prevalence estimates in the EHR database for most clinically detected CRM conditions, as well as for depression, anxiety, bipolar disorder, and schizophrenia, were broadly similar to or greater than the self-reported doctor-diagnosed prevalence reported in the Health Survey for England (HSE) and the Adult Psychiatric Morbidity Survey (APMS). This suggests these conditions are well represented in EHRs. However, the EHR prevalence of hypertension, diabetes, and depression differed sizably from the estimates of studies that screened for these conditions. Screen-detected prevalence estimates for PTSD, bipolar disorder and eating disorders were 4–6 times higher than the prevalence of these conditions in primary care EHRs, potentially reflecting a substantial burden of underdiagnosed or less well documented MH morbidity.
Comparisons with other literature
In the EHR, the risk factors for cardiovascular disease (i.e., hypertension and diabetes) were more prevalent in black and Asian people than in white people, but paradoxically this was not typically matched by a higher prevalence of cardiovascular disease itself (i.e., PVD, aortic aneurysms, stroke, IHD). This pattern has also been reported in other cohort studies analysing variation in the prevalence of aortic aneurysms and peripheral artery disease by ethnicity. [43, 44] We found that AF was recorded twice as frequently in white patients as in black and Asian patients. A previous cross-sectional analysis also found a lower recorded prevalence of AF in the records of African American patients than in those of white American patients, but no difference in prevalence with systematic unbiased testing. [45] Potential explanations include differential uptake of screening in the case of aortic aneurysms, and under-diagnosis of PVD in Asian people due to language barriers or lower health literacy regarding PVD symptoms. [17]
These disparities may also reflect the higher premature death rate from IHD in Asian people compared with white people, such that susceptible Asian people may not survive long enough to develop symptoms of PVD.[17] For mental health conditions, prevalence was typically markedly lower in those aged over 70 than in those aged 40–50, which may reflect earlier mortality among those diagnosed with these conditions at younger ages.[46] The reduced prevalence in the oldest adults is especially notable for eating disorders (see Additional File 2; Supplementary Fig. 3), the MH condition with the highest mortality rate. [47] When interpreting the analyses of prevalence by socio-demographic factors, it is important to note that patients who had died before the index date were excluded from the sample, so those with non-fatal disease may be over-represented among the survivors.
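To make the survivorship point concrete, the following is a minimal Python sketch of point prevalence at an index date, assuming a simplified patient record. The `Patient` fields, the `point_prevalence` function and the 1 January 2020 index date are hypothetical illustrations, not CPRD Aurum variables or the study's own code; registration status and other eligibility criteria are ignored for brevity.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Patient:
    # Hypothetical minimal record; these are not CPRD Aurum field names.
    patient_id: int
    diagnosis_date: Optional[date]  # earliest code from the condition codelist, if any
    death_date: Optional[date]


def point_prevalence(patients: List[Patient], index_date: date) -> float:
    """Share of patients alive at index_date who have a diagnosis recorded before it."""
    # Patients who died before the index date leave the denominator entirely,
    # so earlier cases of fatal disease can never be counted.
    alive = [p for p in patients if p.death_date is None or p.death_date >= index_date]
    cases = [p for p in alive
             if p.diagnosis_date is not None and p.diagnosis_date < index_date]
    return len(cases) / len(alive) if alive else float("nan")


cohort = [
    Patient(1, date(2010, 5, 1), None),              # living case
    Patient(2, date(2012, 3, 1), date(2015, 6, 1)),  # case who died before the index date
    Patient(3, None, None),
    Patient(4, None, None),
]
print(point_prevalence(cohort, date(2020, 1, 1)))
```

In this toy cohort patient 2 is excluded, giving a prevalence of 1/3 rather than the 2/4 that would result if deceased patients were retained.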
The prevalence of MH conditions recorded in the primary care EHR was very low in children compared with adults. Depression was recorded 40 times more frequently in 17–30-year-olds than in under-16-year-olds. The latest Mental Health of Children and Young People in England survey found that one in six people aged 6–16 years had a "probable" MH condition. [48] However, this figure covers a wide range of mental health symptoms, from mood and anxiety to attention and hyperactivity, rather than specific diagnoses. Nevertheless, the true prevalence of MH conditions in children is likely to be considerably under-represented in EHRs. Qualitative research suggests that GPs report feeling ill-equipped to diagnose MH conditions in children, and there are considerable challenges in accessing child and adolescent mental health specialists. [49–51]
The gap between screen-detected prevalence and primary care EHR prevalence was larger for MH conditions than for CRM conditions, notably for depression, bipolar disorder, eating disorders and PTSD. PTSD showed the largest discrepancy between EHR prevalence and both screen-detected and self-reported doctor-diagnosed prevalence, which suggests that this condition may be especially under-recognised and under-diagnosed. Many people with symptoms of common MH conditions do not present to primary care. [11] However, self-reported screening questionnaires also consistently overestimate the prevalence of MH conditions in epidemiological studies, [52] so CPRD Aurum and other EHR databases may be more reliable for case detection of these conditions. Results from the SAIL EHR databank showed that the ten-year prevalence of depression and/or anxiety was 16.2% and that of anxiety/depression symptom codes was 21.4%, which is similar to our estimate for depression (16.0% (95%CI 16.0–16.0%)).[53]
Women had double the recorded rates of depression and anxiety compared with men in the primary care EHR. However, in surveys screening for symptoms of depression and anxiety, the prevalence of these conditions is only around 25–50% higher in women.[11] In the EHR, depression and anxiety were three times as common in those of white ethnicity as in those of black or Asian ethnicity. However, in screening studies, symptoms of depression and anxiety were more common in people of black and Asian ethnicity. [11] Like previous studies, we found that black people were twice as likely to be diagnosed with schizophrenia as people of other ethnicities.[11] Research in this area is limited by small sample sizes. However, it is recognised that there are considerable barriers to accessing mental healthcare for people from black and minority ethnic communities, which may lead to under-diagnosis in primary care. [54] These disparities between screen-detected and EHR prevalence of mental health conditions likely reflect patterns of help-seeking behaviour and barriers to access, both of which are influenced by gender and ethnicity. [54, 55]
There was also an overall gap between screen-detected prevalence in HSE and CPRD Aurum prevalence for diabetes and hypertension, whilst doctor-diagnosed prevalence estimates were similar. [20] However, it is important to note that the screening methods used in HSE are not diagnostic: for example, a single raised HbA1c measurement was used to estimate the prevalence of diabetes, whereas clinical guidelines state that two raised HbA1c measurements are required to confirm the diagnosis.
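The difference between the two case definitions can be sketched in Python as follows. This is a minimal illustration only: the 48 mmol/mol (6.5%) HbA1c threshold is assumed here as the diagnostic cut-off, the functions and the three-patient population are hypothetical, and real guideline criteria (for example, the role of symptoms and the timing of repeat tests) are simplified away.

```python
from datetime import date
from typing import List, Set, Tuple

HBA1C_THRESHOLD = 48.0  # mmol/mol (6.5%); assumed diagnostic cut-off

Result = Tuple[date, float]  # (test date, HbA1c in mmol/mol)


def raised_dates(results: List[Result]) -> Set[date]:
    """Distinct dates on which HbA1c was at or above the threshold."""
    return {d for d, value in results if value >= HBA1C_THRESHOLD}


def screen_positive(results: List[Result]) -> bool:
    """Survey-style definition: one raised measurement is enough."""
    return len(raised_dates(results)) >= 1


def confirmed_diabetes(results: List[Result]) -> bool:
    """Guideline-style definition: raised on at least two separate dates."""
    return len(raised_dates(results)) >= 2


population = {
    "A": [(date(2019, 1, 10), 50.0)],                           # single raised value
    "B": [(date(2018, 2, 1), 52.0), (date(2018, 6, 1), 55.0)],  # raised twice
    "C": [(date(2019, 3, 5), 41.0)],                            # normal
}
screened = sum(screen_positive(r) for r in population.values()) / len(population)
confirmed = sum(confirmed_diabetes(r) for r in population.values()) / len(population)
print(f"screen-detected {screened:.2f}, confirmed {confirmed:.2f}")
```

Patient A counts as a case only under the single-measurement definition, which illustrates how a screening definition can yield a higher prevalence than confirmed clinical diagnoses.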
Applying the HSE screening methods to clinical biomarkers recorded in the EHR, such as blood creatinine and blood pressure, produced prevalence estimates for hypertension and CKD similar to those from HSE.[20] These biomarkers may be useful for some studies of short-term outcomes. A previous study in CPRD Gold found that clinical codes underestimate the prevalence of CKD and concluded that a combination of codes and test results is most appropriate for detecting CKD. [56] However, for studies investigating multimorbidity and the accumulation of disease over several years, clinical codes are likely to be more specific and more reflective of long-term conditions. Furthermore, cases detected by biomarkers but missed by clinical codes may be milder, and including them may dilute the case pool.
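A combined codes-plus-tests definition of the kind discussed above might look like the following Python sketch. It is hypothetical throughout: it assumes a CKD case is either a recorded CKD code or an eGFR below 60 mL/min/1.73 m² persisting for at least 90 days with no normal value in between (a simplification of the usual three-month persistence criterion), and the function names are illustrative rather than taken from any study.

```python
from datetime import date
from typing import List, Tuple

EGFR_THRESHOLD = 60.0  # mL/min/1.73 m^2; assumed cut-off
MIN_DAYS = 90          # assumed persistence window


def ckd_by_egfr(results: List[Tuple[date, float]]) -> bool:
    """True if low eGFR persists for MIN_DAYS without an intervening normal value."""
    run_start = None
    for test_date, egfr in sorted(results):
        if egfr < EGFR_THRESHOLD:
            if run_start is None:
                run_start = test_date  # start of a run of low values
            elif (test_date - run_start).days >= MIN_DAYS:
                return True
        else:
            run_start = None  # a normal result breaks the run
    return False


def ckd_case(has_ckd_code: bool, results: List[Tuple[date, float]]) -> bool:
    """Codes-or-tests definition: either source is sufficient."""
    return has_ckd_code or ckd_by_egfr(results)


# No CKD code, but two low eGFR values 120 days apart: a biomarker-only case.
tests = [(date(2019, 1, 1), 55.0), (date(2019, 5, 1), 52.0)]
print(ckd_case(False, tests))
```

Patients picked up only through the eGFR branch are exactly the group the text suggests may be milder, so a study could report prevalence with and without that branch.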
The prevalence of all CRM and MH conditions in CPRD Aurum was typically 5–50% higher than the prevalence reported in other UK primary care EHR databases (predominantly QOF data). Our codelists were more comprehensive than QOF codelists; for example, the codelists for heart failure and depression included more codes related to interventions, abnormal test results, disease monitoring, and referral to secondary care services. For both of these conditions, the prevalence estimates in CPRD Aurum were similar to the self-reported doctor-diagnosed prevalence estimates. Therefore, our codelists may be more sensitive but less specific than QOF codelists.
A diagnosis of anxiety was more prevalent in CPRD Aurum data (15.8% (95%CI 15.8–15.8%)) than in a previous analysis of THIN data (7.2% (95%CI 7.1–7.2%)).[30] However, the THIN analysis reported the prevalence of anxiety codes entered between 2002 and 2004 only, whereas we included any case recorded before 2020. The prevalence of doctor-diagnosed generalised anxiety disorder was also higher in CPRD Aurum (9.4% (95%CI 9.4–9.4%)) than the self-reported doctor-diagnosed prevalence of generalised anxiety in HSE (5.5% (95%CI 4.9–6.1%)). [34] By some margin, the most frequently used code within our anxiety codelist was “Anxiety with depression”, reflecting the established overlap between these two conditions.
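The very narrow confidence intervals reported for CPRD Aurum (for example, 15.8% (95%CI 15.8–15.8%)) follow from the size of the denominator. The Python sketch below uses a normal-approximation (Wald) interval and a round denominator of 12 million, both assumptions for illustration; the study's actual denominators and interval method may differ.

```python
import math
from typing import Tuple


def wald_ci(p: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation 95% confidence interval for a proportion."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def format_prevalence(p: float, n: int) -> str:
    """Format as in the text: percentage to one decimal place with its 95% CI."""
    lo, hi = wald_ci(p, n)
    return f"{100 * p:.1f}% (95%CI {100 * lo:.1f}–{100 * hi:.1f}%)"


print(format_prevalence(0.158, 12_000_000))  # → 15.8% (95%CI 15.8–15.8%)
print(format_prevalence(0.055, 5_000))       # hypothetical small sample
```

With 12 million patients the half-width is about 0.02 percentage points, so both limits round to 15.8%; with a hypothetical sample of 5,000 the same formula gives a half-width of roughly 0.6 percentage points for a 5.5% prevalence.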
As in previous studies, the prevalence of all conditions increased with increasing socio-economic deprivation (with the exception of eating disorders).[57] A recent systematic review showed no consistent pattern of association between socio-economic status and eating disorders, but that historically those in more affluent groups were more likely to access diagnosis and treatment, which may explain the inverse association between social deprivation and eating disorders. [58]
The prevalence of alcohol misuse in CPRD Aurum among over-16-year-olds (5.4% (95%CI 5.4–5.4%)) was considerably higher than both the self-reported doctor-diagnosed prevalence of alcohol misuse in HSE (1.2% (95%CI 1.0–1.5%)) and the screen-detected prevalence of alcohol misuse in the same age group (3.1% (95%CI 2.7–3.5%)). Survey participants may under-report their true drinking practices, whilst GPs may enter clinical codes for alcohol misuse without conveying the extent of their concerns to patients. [59] On the other hand, substance misuse appears to be under-diagnosed in CPRD Aurum compared with self-reported substance misuse: the prevalence in CPRD Aurum (2.1% (95%CI 2.1–2.1%)) was lower than the screen-detected prevalence of drug dependence in the APMS analysis (3.1% (95%CI 2.7–3.5%)), which is in keeping with findings from other studies.[60]
Strengths and limitations
The CPRD Aurum database contains EHRs from over 12 million patients, forming a nationally representative sample of the UK population. For half of the 18 conditions (almost all of the CRM conditions), primary care clinicians have been financially incentivised through the QOF system since 2004 to record diagnosis codes accurately in EHRs.
Our codelists for identifying conditions within CPRD Aurum were created through a rigorous and systematic process by a team of experienced clinicians, building on a strong foundation of previous research using clinical codes in EHRs. Our findings suggest that these codelists have high sensitivity for detecting the majority of CRM and MH conditions within EHRs.
The literature review followed a pragmatic rather than a systematic review methodology, as it would not have been feasible to conduct a systematic review for each of the 18 conditions. However, most of the comparisons are with the latest official UK government-commissioned studies or audits of disease prevalence (e.g., QOF, HSE, APMS, National Diabetes Audit). [32] Comparisons with studies that rely on self-reported health status (e.g., HSE) are subject to response bias, which may have influenced their findings.
For pragmatic reasons, only age (and sex in the case of aortic aneurysms) was used to stratify CPRD Aurum data for comparison with prevalence estimates from the literature. Where disease prevalence has changed over time, not least because of the ageing population, comparisons with prevalence estimates from older studies are considerably less certain. Caution should be taken when analysing the prevalence of conditions by ethnicity, given that these categories aggregate very diverse communities, cultural practices and countries of ethnic origin. Researchers who wish to examine specific conditions or sub-populations in more depth may need to explore these factors in greater detail.