This study aimed to assess the implications of the underrepresentation of non-European populations in genomics by conducting a two-sample MR analysis of lipid traits, blood pressure, BMI, kidney function and T2D on CAD across diverse ancestry groups. The MR results showed that when the MR methodology was adjusted, albeit inappropriately, to be more inclusive of non-European data, the observed direction of causality for non-European populations became inconsistent with what is currently known from previous studies in European populations. This was seen in the MR analysis of lipid traits, whereby an increase in LDLC and TC in both the East Asian and South Asian cohorts appeared to have a protective effect against CAD liability, rather than the increased risk of developing CAD which both exposures are known to confer (2, 17, 19).
In order to accommodate the non-European GWAS summary statistics for the analysis, alpha thresholds of P < 5x10⁻⁸, 10⁻⁷, 10⁻⁶ and 10⁻⁵ were selected. We found that when the stricter thresholds were employed, at least one of the non-European populations had insufficient data to conduct a two-sample MR analysis for each selected exposure except lipid traits, due to the limited number of identified SNPs and instrumental variables. By performing the same analysis at more relaxed thresholds, we were able to identify more instrumental variables for the MR analysis; our resulting point estimates remained generally the same, but weak instrument bias increased. This need to employ a more lenient P-value threshold for instrument selection is not uncommon when conducting an MR analysis using GWAS summary statistics, with previously conducted MR analyses of other exposures and outcomes facing similar challenges (47, 48). However, it is important to acknowledge in such cases that relaxing the alpha threshold is not appropriate for routine MR analysis, as the introduction of weak instrument bias and violation of necessary MR assumptions may result in substantial error in claims of causal effects, as seen from the results of this analysis across the different alpha thresholds.
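The effect of the threshold on the number of available instruments can be illustrated with a minimal sketch. The thresholds below are read literally from the text; the SNP identifiers and P-values are invented toy data, and the function names are hypothetical. A real pipeline would also clump SNPs for linkage disequilibrium, which is omitted here.

```python
from typing import Dict, Iterable, List, Tuple

# Thresholds as written in the text (assumption: read literally).
THRESHOLDS = (5e-8, 1e-7, 1e-6, 1e-5)

SnpRecord = Tuple[str, float]  # (SNP identifier, exposure GWAS P-value)


def select_instruments(snps: Iterable[SnpRecord], threshold: float) -> List[str]:
    """Return identifiers of SNPs whose exposure P-value is below the threshold."""
    return [rsid for rsid, p_value in snps if p_value < threshold]


def count_by_threshold(
    snps: Iterable[SnpRecord], thresholds: Iterable[float] = THRESHOLDS
) -> Dict[float, int]:
    """Count candidate instruments at each threshold (no LD clumping)."""
    snp_list = list(snps)
    return {t: len(select_instruments(snp_list, t)) for t in thresholds}


# Toy data, invented purely for illustration.
toy_snps = [
    ("rs_a", 3e-9),
    ("rs_b", 8e-8),
    ("rs_c", 4e-7),
    ("rs_d", 6e-6),
    ("rs_e", 2e-4),
]

counts = count_by_threshold(toy_snps)
for threshold, n in counts.items():
    print(f"P < {threshold:g}: {n} candidate instrument(s)")
```

Each relaxation of the threshold admits SNPs with weaker evidence of association with the exposure, which is why the gain in instrument count comes at the cost of instrument strength.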
Additionally, despite relaxing the inclusion criteria to identify a sufficient number of SNPs for the analysis, non-European populations had fewer instruments available and greater explained exposure variance than European ancestry populations, with an average of 684 more SNPs identified in European populations than the highest number of SNPs identified in the non-European ancestry datasets (Supplementary Material 2). This finding highlights the underrepresentation of non-European populations in genomics, with previous studies also limited by power and the lack of sufficiently large GWAS summary statistics when using GWAS data for CAD research or other MR analyses (10, 49, 50). Data for blood pressure in African populations are known to be available from the MVP; however, due to a separate ongoing analysis, we were not able to include them in this study.
Findings in all MR analyses performed in European populations were consistent with previous MR and observational studies. By contrast, the direction of causality that we identified in the lipid trait analysis in both East Asian and South Asian populations is inconsistent with what is currently known from previous MR and observational studies in diverse ancestry populations, as both LDLC and TC are globally well established as primary targets for interventions and treatments aimed at reducing overall CAD risk (19, 51–53).
A possible explanation for the unexpected MR outcomes is the relaxed alpha threshold, which violated the first MR assumption. The inclusion of variants not strongly associated with the exposure introduced error into the analysis, which may have biased the results and therefore produced causal effects in the opposite direction to those well established in other epidemiological studies. The presence of weak instrument bias is supported by the small F-statistics identified for non-European populations, which decreased as the alpha threshold became more relaxed (Table 1; Supplementary Material 6). A second explanation lies in the size and quality of the outcome (CAD) GWAS for each ancestry group, as accurate estimates of the SNP-outcome effects are as important as those of the SNP-exposure effects. An overfitting bias is another potential explanation for the observed direction of causation, due to sample overlap between biobanks included in GWAS data sources. In an effort to increase population sample sizes and power to detect variants with smaller effects, an increasing number of GWAS have been performed as meta-analyses of several biobanks (54, 55). Although this provides sufficiently large samples, there is an increased risk of substantial participant overlap between exposure and outcome data sources, as well as variable quality in the GWAS sources of both the exposure and outcome data (Supplementary Material 1).
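As an illustration of how instrument strength can be screened from summary statistics, the sketch below uses the common per-SNP approximation F ≈ (β/SE)², where β and SE are the SNP-exposure effect estimate and its standard error. This is an assumption for illustration: the text does not state how the reported F-statistics were computed. The cutoff of 10 is the conventional rule of thumb for flagging weak instruments, and the input values are invented toy data.

```python
from statistics import mean
from typing import Dict, Iterable, Tuple


def snp_f_statistic(beta_exposure: float, se_exposure: float) -> float:
    """Approximate per-SNP F-statistic from summary data: (beta / SE)^2."""
    return (beta_exposure / se_exposure) ** 2


def summarize_instrument_strength(
    instruments: Iterable[Tuple[float, float]], weak_cutoff: float = 10.0
) -> Dict[str, float]:
    """Summarize mean F and the number of instruments below the weak cutoff.

    `instruments` holds (beta_exposure, se_exposure) pairs.
    `weak_cutoff` defaults to the conventional rule of thumb of 10.
    """
    f_values = [snp_f_statistic(beta, se) for beta, se in instruments]
    return {
        "mean_f": mean(f_values),
        "n_weak": sum(f < weak_cutoff for f in f_values),
        "n_total": len(f_values),
    }


# Toy (beta, SE) pairs, invented for illustration only.
toy_instruments = [(0.10, 0.01), (0.05, 0.02), (0.03, 0.015)]
summary = summarize_instrument_strength(toy_instruments)
print(summary)
```

Under this approximation, SNPs admitted only at relaxed P-value thresholds have smaller ratios of effect size to standard error and therefore lower F-statistics, consistent with the decline described above.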
Besides methodological issues, distinct lipoprotein profile patterns have been found to be associated with region-specific ancestries (56). LDLC levels have been observed to be lower in East Asian ancestry groups than in Europeans, with a “treat-to-target” LDLC treatment strategy recommended for East Asian patients with CAD (57, 58). Additionally, individuals of South Asian ancestry have an overall smaller particle size and increased amount of LDLC compared with European populations, possibly meaning that the total measured amount does not capture the amount truly present in the population (59–63). Ruuth and colleagues found that LDLC is more prone to aggregation in healthy South Asian individuals than in Europeans, resulting in increased build-up in the arterial wall and a consequently increased lifetime risk of developing cardiovascular diseases such as CAD (64). Additionally, the INTERHEART study found that, irrespective of measured LDLC levels, the higher levels of ApoB in individuals of South Asian ancestry compared with other ancestry groups correlated with a higher total atherogenic lipoprotein burden (62). These differences may contribute towards variation in the observed function of lipid traits, and therefore warrant further studies into how they affect the aetiology of dyslipidaemia and its subsequent contribution towards CAD risk in South Asian populations.
It has also previously been found that associations exist between small birth size and an increased rate of cardiovascular and metabolic disease in later life (65). This “foetal origins” hypothesis cannot be overlooked as another contributing factor to the differences observed in disease liability across diverse ancestry groups (66). In the context of CAD, evidence suggests that maternal nutrition, specifically maternal malnutrition, contributes towards both the occurrence and early onset of CAD in offspring (67, 68). If participants included in any of the selected GWAS were exposed to famine during gestation, it is possible that they were born with a greater predisposition to CAD than other seemingly comparable populations.
Understanding the genetic basis of complex multifactorial diseases such as CAD is crucial for addressing global health disparities. However, if genetic research is restricted to populations of European ancestry, it may not capture the full spectrum of genetic factors contributing to disease pathophysiology in diverse populations. This lack of understanding regarding the transferability of MR findings across diverse ancestry populations raises questions about the applicability of current genetic insights to individuals from non-European populations. Consequently, both policymakers and funding agencies need to advocate for more diverse research studies in order to ensure that developed public health interventions and healthcare policies account for the unique genetic characteristics of different populations. Until then, clinicians and policymakers need to exercise caution when applying findings from genetic information derived from predominantly European studies to patients or populations from other ancestries.
It is important to note that findings from MR studies should not be used in isolation and should instead be considered as supplementary evidence for findings from other epidemiological methodologies, such as observational studies or randomized controlled trials. Integrating evidence from diverse methodological approaches helps to mitigate the potential biases and limitations of each method, which in the case of MR are the strict assumptions that must be met for the analysis to be valid. In the context of this study, we found that performing a routine two-sample MR analysis in non-European populations was not possible unless we relaxed the alpha thresholds for instrument discovery, resulting in causal effect estimates that differed from what is currently known from previous observational and MR studies, as well as high levels of weak instrument bias.
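To make concrete how a pooled two-sample MR estimate and its direction are obtained, the following is a minimal sketch of a fixed-effect inverse-variance weighted (IVW) estimator, a common default in two-sample MR. This section does not state which estimator was used, so IVW is an assumption here. The function name and toy values are hypothetical, and the SNP effects are assumed to be already harmonized to the same effect allele.

```python
import math
from typing import Iterable, Tuple

# Each instrument: (beta_exposure, beta_outcome, se_outcome), harmonized.
Instrument = Tuple[float, float, float]


def ivw_estimate(instruments: Iterable[Instrument]) -> Tuple[float, float]:
    """Fixed-effect IVW causal estimate and its standard error.

    Equivalent to a weighted regression of SNP-outcome effects on
    SNP-exposure effects through the origin, with weights 1 / se_outcome^2.
    """
    numerator = 0.0
    denominator = 0.0
    for beta_x, beta_y, se_y in instruments:
        weight = 1.0 / se_y**2
        numerator += beta_x * beta_y * weight
        denominator += beta_x**2 * weight
    estimate = numerator / denominator
    standard_error = math.sqrt(1.0 / denominator)
    return estimate, standard_error


# Toy data, invented for illustration: every SNP implies a ratio of 0.5.
risk_increasing = [(0.10, 0.05, 0.01), (0.20, 0.10, 0.02), (0.05, 0.025, 0.01)]
# Same SNPs with the outcome effects sign-flipped: an apparently protective effect.
protective = [(bx, -by, se) for bx, by, se in risk_increasing]

est_risk, se_risk = ivw_estimate(risk_increasing)
est_prot, se_prot = ivw_estimate(protective)
print(f"risk-increasing: {est_risk:.3f} (SE {se_risk:.3f})")
print(f"protective:      {est_prot:.3f} (SE {se_prot:.3f})")
```

A positive estimate on the log-odds scale corresponds to increased CAD liability per unit increase in the exposure, and a negative estimate to an apparently protective effect. The sign depends entirely on the SNP-exposure and SNP-outcome estimates, which is why imprecise or weakly associated instruments can distort it.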
Limitations
This study has several limitations within the MR analysis itself. Firstly, we included lenient P-value thresholds of P < 5x10⁻⁵, 10⁻⁶ and 10⁻⁷ to select significant SNPs when developing our genetic instruments. These significance values were selected to ensure that a sufficient number of instruments from non-European ancestry populations would be identified; however, this may also have resulted in the inclusion of weaker or even invalid instruments in the analysis. The presence of weak instrument bias gives rise to a number of further limitations and challenges, including reduced statistical power, an inflated type I error rate (false positives), bias due to either underestimation or overestimation of true effects, and larger standard errors due to inefficient estimation. In the current genetic research landscape, overcoming weak instrument bias in underrepresented non-European populations remains a challenge. If individual-level data are available, a one-sample MR which employs a stricter alpha threshold may produce more robust findings despite smaller sample sizes in both exposure and outcome cohorts; however, the choice of MR analysis in non-European populations is dictated largely by the availability of suitable datasets.
Due to the limited availability of large-scale GWAS data in populations of non-European ancestry, potential bias may have been introduced when comparing same-ancestry populations from different geographical regions. Demographically heterogeneous populations of the same ancestry may have different genetic architectures, or differ in the influence of environment on either the exposures or the outcome, resulting in bias being introduced into the MR analysis. Smaller sample sizes in non-European ancestry cohorts may also have contributed to the observed variations in causal effects across ancestry populations, with smaller sample sizes resulting in lower statistical power and less precise estimates. Horizontal pleiotropy is also a limitation of this study: despite the findings from the sensitivity analysis, we cannot exclude its possible presence and influence. This was highlighted by the identification of a number of genetic instruments with a low F-statistic, indicating weak instrument bias.
A further methodological limitation is the possibility of sample overlap between the data sources selected for the exposure and outcome variables included in the analysis. Although an effort was made to ensure that independent populations were selected for the two-sample MR, it should not be overlooked that some data may have been shared across consortia, especially for non-European populations.
Strengths
In this analysis, we were able to compare the feasibility of conducting a two-sample MR analysis of the same set of exposures on CAD across diverse ancestry populations. Employing known causal effects derived from European data as benchmarks, we were able to assess the implications of employing non-European data to investigate the same questions of effect. To the best of our knowledge, this is the first study to conduct such a comparison on a large scale across four major ancestry groups.
Our findings of insufficient power in GWAS summary statistic data and limited feasibility of running a two-sample MR in non-European populations highlight gaps and future directions for research. The most apparent gap in this research area relates to the need for the development of more large-scale consortia. Collaborative efforts between researchers to pool resources and GWAS data are expected to improve both the feasibility and reliability of MR studies in non-European populations. Refining the MR methodology to be more inclusive of diverse ancestry data is another consideration for future work. Establishing protocols which standardize data harmonization and the handling of population stratification across different ancestry populations is expected to allow more robust analyses to be conducted in diverse populations.