Genetic Insights into Coronary Artery Disease in Underrepresented Populations: Assessing Two-Sample Mendelian Randomization across Diverse Ancestry Populations

doi:10.21203/rs.3.rs-4435794/v1

This work is licensed under a CC BY 4.0 License

Version 1

Understanding the causal effect of modifiable risk factors on disease is crucial for shaping public health policies, identifying targets for intervention and advancing our global understanding of health and disease. However, the underrepresentation of non-European ancestries in research has raised important questions regarding the transferability and reliability of genetic findings on a global scale.

In this study, we investigated the feasibility of conducting a two-sample Mendelian randomization (MR) analysis in populations of diverse ancestries, focusing on both methodological challenges and biological differences when data from European, East Asian, South Asian and African ancestry populations were compared. Using data generated from large-scale genome-wide association studies (GWAS), we compared the causal effects of lipid traits, blood pressure, body-mass index, type-2 diabetes and kidney function on coronary artery disease.

Because too few single-nucleotide polymorphisms were identified in non-European data when strict alpha thresholds were employed, we were not able to conduct MR analyses across all ancestry populations until the threshold was relaxed. We found that allowing a lenient inclusion threshold, and thereby extending the MR methodology to be more inclusive of non-European data, increased weak instrument bias, which led to imprecise estimates and a reduced ability to detect true causal effects.

Notably, our results showed causal associations that are inconsistent with established findings, specifically for lipid traits in South Asian populations compared with European ancestry populations. These findings reiterate the urgent need for independent large-scale GWAS in non-European populations, both to improve the power and reliability of MR studies and to develop methods which take population-specific effects into account.

Mendelian randomization (MR) is an epidemiological methodology that circumvents limitations of observational studies in order to estimate the causal relationship between an exposure and a disease of interest. The MR study design limits susceptibility to reverse causation and confounding by exploiting the naturally occurring random allocation of genetic variants at conception as proxies for modifiable risk variables (1–3). Three key assumptions need to be met for an MR analysis to be valid. Firstly, the genetic instruments employed as instrumental variables (IVs) are robustly related to the exposure of interest (relevance); secondly, there are no confounders of the association between the genetic instrument(s) and the outcome (independence); and finally, the relationship between the genetic instruments and the outcome is mediated only through the selected exposure (exclusion restriction) (2, 4, 5).

Following the availability of data generated from large-scale genome-wide association studies (GWAS), studies employing MR methodologies to better understand the development of common critical diseases have been gaining prominence. However, the majority of MR analyses have been limited to populations of European ancestry (6–8). This is important because differences in environment, allele frequencies and patterns of linkage disequilibrium across diverse ancestry populations raise questions about whether findings can be usefully translated beyond individuals of European ancestry (4, 9–11). The underrepresentation of diverse ancestry populations, and the consequent lack of well-powered data, further highlights the need to understand potential methodological issues of running a two-sample MR analysis in populations of non-European ancestry.

This study aimed to assess the implications of the underrepresentation of diverse ancestry data in the context of MR analyses and to evaluate the feasibility of running a two-sample MR analysis in non-European populations. To do this, we chose to focus on exposures and outcomes for which the direction of causal association has already been well-established in European populations and where it is reasonable to expect similar causal associations in non-European populations.

Therefore, this study employed a two-sample MR approach to compare the effects of major lipid traits, blood pressure, body-mass index (BMI), type-2 diabetes mellitus (T2D) and kidney function on coronary artery disease (CAD) in populations of European, East Asian, South Asian and African ancestries.

Data Sources

Despite ongoing efforts to improve disease identification, treatment and prevention, CAD remains the leading cause of death worldwide, with around 1 in 6 deaths caused by CAD globally and approximately 17.9 million deaths due to the disease in 2021 (11–14). Researchers have investigated the causal effects of a number of exposures on the development of CAD, including anthropometric characteristics, hepatic traits, reproductive factors and cardiometabolic traits (15–19).

We used summary statistics obtained from large-scale GWAS to identify genetic association estimates for the analysis. Data sources for both exposure and outcome variables included the GWAS Catalog, Asian Genetic Epidemiology Network (AGEN), Million Veteran Program (MVP), Global Lipids Genetics Consortium (GLGC), Biobank Japan (BBJ) and the UK Biobank (UKB) (20–37) (Supplementary Material 1). In line with the two-sample MR methodology, to the best of our ability with the data available, both exposure and outcome data sources were independent from each other in order to minimize participant overlap in the analysis (38). Relevant participant consent and ethical approval were obtained by each of the respective original studies.

Genetic Instruments

Genetic instruments for each MR analysis were selected following a series of steps. Firstly, single-nucleotide polymorphisms (SNPs) at significance thresholds of P < 5×10⁻⁸, P < 5×10⁻⁷, P < 5×10⁻⁶ and P < 5×10⁻⁵ were respectively selected from GWAS summary statistics of each exposure trait.

In accordance with the first MR assumption, a strict threshold should be employed in order to identify variants with a robust association with the exposure. However, more lenient threshold values were included here to allow for the comparison of both the number and strength of instrumental variables identified for each ancestry group (39, 40). When a data source has a small sample size or limited available instruments, as is seen more commonly in non-European datasets, researchers may consider employing less stringent alpha thresholds to allow more variants to be included in the analysis. Relaxing alpha thresholds beyond appropriate genome-wide thresholds is not advocated, as it increases type I error, but it can be instructive to do so strictly for exploratory analyses. We also note that, given variation in the length of LD patterns among ancestries, appropriate alpha thresholds will differ; for example, it was recently estimated that an alpha of 5×10⁻⁹ was most appropriate for a Ugandan cohort (41). The effects of weak instrument bias and potential type I error introduced by relaxing the inclusion threshold were investigated when performing the two-sample MR.
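The effect of the inclusion threshold on the number of candidate instruments can be illustrated with a minimal sketch. The SNP identifiers and p-values below are hypothetical toy values, not data from this study, and the helper names `select_instruments` and `count_by_threshold` are illustrative assumptions rather than functions from any package.

```python
# Hypothetical toy summary statistics: (SNP id, p-value of the SNP-exposure association).
# Illustrative values only; they are not taken from the study.
toy_gwas = [
    ("rs_a", 3e-9),
    ("rs_b", 2e-7),
    ("rs_c", 4e-6),
    ("rs_d", 1e-5),
    ("rs_e", 0.03),
]

# The four inclusion thresholds compared in the text.
THRESHOLDS = [5e-8, 5e-7, 5e-6, 5e-5]


def select_instruments(gwas, threshold):
    """Return the SNP ids whose exposure p-value is strictly below the threshold."""
    return [snp for snp, p_value in gwas if p_value < threshold]


def count_by_threshold(gwas, thresholds=THRESHOLDS):
    """Map each threshold to the number of candidate instruments it admits."""
    return {t: len(select_instruments(gwas, t)) for t in thresholds}


counts = count_by_threshold(toy_gwas)
for threshold, n_snps in counts.items():
    print(f"P < {threshold:g}: {n_snps} candidate instrument(s)")
```

In this toy example each relaxation of the threshold admits one more SNP, which mirrors the trade-off described above: more instruments become available, but each additional instrument is more weakly associated with the exposure.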

Independent SNPs were identified as instruments for each exposure by linkage disequilibrium (LD) clumping using a window of 1000 kb and r² < 0.01. This was conducted before searching for the identified genetic instruments in the respective CAD outcome summary statistics for each ancestry population. No proxy SNPs were required for this analysis. Finally, we formatted both exposure and outcome files to comply with the TwoSampleMR R package, used the “harmonise_data()” function to harmonize the clumped SNPs and inferred the forward strand alleles using allele frequency information for palindromic SNPs with the “subset()” function.
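To make the harmonization step concrete, the following is a simplified sketch of allele alignment between exposure and outcome summary statistics. It is not the TwoSampleMR implementation: the function names are hypothetical, and palindromic SNPs (A/T or C/G) are simply flagged as ambiguous here, whereas the analysis described above resolved them using allele frequency information.

```python
# Complementary bases, used to detect strand flips.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_palindromic(effect_allele, other_allele):
    """A SNP is palindromic when its two alleles are each other's complement."""
    return COMPLEMENT[effect_allele] == other_allele


def harmonise_beta(exp_alleles, out_alleles, out_beta):
    """Align the outcome effect to the exposure effect allele.

    exp_alleles and out_alleles are (effect_allele, other_allele) tuples.
    Returns the aligned outcome beta, or None when the SNP is palindromic
    (ambiguous without allele frequencies) or the alleles are incompatible.
    """
    exp_ea, exp_oa = exp_alleles
    out_ea, out_oa = out_alleles
    if is_palindromic(exp_ea, exp_oa):
        return None
    # Try the outcome alleles as reported, then on the opposite strand.
    candidates = (
        (out_ea, out_oa),
        (COMPLEMENT[out_ea], COMPLEMENT[out_oa]),
    )
    for ea, oa in candidates:
        if (ea, oa) == (exp_ea, exp_oa):
            return out_beta       # same effect allele: keep the sign
        if (ea, oa) == (exp_oa, exp_ea):
            return -out_beta      # alleles swapped: flip the sign
    return None


print(harmonise_beta(("A", "G"), ("G", "A"), 0.1))  # swapped alleles
print(harmonise_beta(("A", "T"), ("T", "A"), 0.1))  # palindromic SNP
```

The sign flip matters because every SNP-outcome effect must refer to the same allele as the corresponding SNP-exposure effect before the ratio of the two is taken.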

Mendelian Randomization Analysis

For two-sample MR analyses, we employed the random effects inverse-variance weighted (IVW) methodology as the primary analysis. Respective analyses were performed with the instrumental variables selected from each of the selected alpha discovery thresholds where possible.
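For readers less familiar with the estimator, a minimal sketch of a multiplicative random-effects IVW calculation from summary statistics is given below. It assumes first-order weights (SNP-exposure effect squared over the SNP-outcome variance) and inflates the standard error only when Cochran's Q indicates over-dispersion; the function name and the toy inputs are hypothetical, and the analyses in this study were run with the R packages named in the text, not with this code.

```python
import math


def ivw_random_effects(beta_exp, beta_out, se_out):
    """Multiplicative random-effects IVW estimate from summary statistics.

    beta_exp: SNP-exposure effects; beta_out: SNP-outcome effects (harmonized);
    se_out: standard errors of the SNP-outcome effects.
    Returns (estimate, standard error, Cochran's Q).
    """
    if not (len(beta_exp) == len(beta_out) == len(se_out)):
        raise ValueError("inputs must have the same length")
    n = len(beta_exp)
    if n < 2:
        raise ValueError("this sketch needs at least two instruments")

    # Per-SNP Wald ratios and their first-order inverse-variance weights.
    ratios = [by / bx for bx, by in zip(beta_exp, beta_out)]
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exp, se_out)]
    total_weight = sum(weights)

    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    se_fixed = math.sqrt(1.0 / total_weight)

    # Cochran's Q measures heterogeneity between the per-SNP ratios.
    q_stat = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    # Inflate the standard error only if there is over-dispersion.
    scale = max(1.0, math.sqrt(q_stat / (n - 1)))
    return estimate, se_fixed * scale, q_stat


# Toy inputs in which every SNP implies the same causal effect of 0.5.
beta, se, q = ivw_random_effects([0.1, 0.2, 0.3], [0.05, 0.10, 0.15], [0.01, 0.01, 0.01])
print(f"log-odds estimate = {beta:.3f}, SE = {se:.4f}, Q = {q:.3g}")
```

With only a handful of weak instruments the total weight is small, so the standard error is large; this is one way to see why analyses with few available SNPs yield wide confidence intervals.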

In order to assess the robustness of the findings and allow for less stringent assumptions in the methodology, MR-Egger and weighted median methods were performed as part of the sensitivity analysis. These methods were selected because MR-Egger allows for directional pleiotropy by including an intercept term in the regression to estimate the pleiotropic effect, while the weighted median methodology provides robust estimates if at least 50% of the weight in the analysis comes from valid instruments. Together, these methods aid the interpretation of the causal inferences made in a two-sample MR analysis.
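The two sensitivity estimators can be sketched from summary statistics as follows. This is an illustrative implementation under stated assumptions, not the code used in the study: MR-Egger is written as a weighted least-squares regression with weights 1/se², after orienting each SNP so that its exposure effect is positive, and the weighted median interpolates the weighted empirical distribution of the per-SNP ratios using first-order weights. Function names and inputs are hypothetical, and standard errors are omitted for brevity.

```python
def mr_egger(beta_exp, beta_out, se_out):
    """Return (slope, intercept) of the MR-Egger weighted regression."""
    # Orient every SNP so that its exposure effect is positive.
    oriented = [(-bx, -by) if bx < 0 else (bx, by) for bx, by in zip(beta_exp, beta_out)]
    xs = [bx for bx, _ in oriented]
    ys = [by for _, by in oriented]
    ws = [1.0 / se ** 2 for se in se_out]

    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))

    slope = (sw * sxy - sx * sy) / (sw * sxx - sx ** 2)
    intercept = (sy - slope * sx) / sw  # non-zero intercept suggests directional pleiotropy
    return slope, intercept


def weighted_median(beta_exp, beta_out, se_out):
    """Weighted median of the per-SNP ratio estimates."""
    if len(beta_exp) < 3:
        raise ValueError("this sketch needs at least three instruments")
    pairs = sorted(
        (by / bx, bx ** 2 / se ** 2)
        for bx, by, se in zip(beta_exp, beta_out, se_out)
    )
    ratios = [r for r, _ in pairs]
    weights = [w for _, w in pairs]
    total = sum(weights)

    # Mid-point cumulative weights, standardized to lie in (0, 1).
    cumulative, running = [], 0.0
    for w in weights:
        running += w
        cumulative.append((running - 0.5 * w) / total)

    # Last position whose standardized weight is below one half.
    below = max(i for i, s in enumerate(cumulative) if s < 0.5)
    fraction = (0.5 - cumulative[below]) / (cumulative[below + 1] - cumulative[below])
    return ratios[below] + (ratios[below + 1] - ratios[below]) * fraction


# Toy data with a true slope of 0.5 and a pleiotropic intercept of 0.02.
bx = [0.1, 0.2, 0.3, 0.4]
by = [0.02 + 0.5 * x for x in bx]
print(mr_egger(bx, by, [0.01, 0.02, 0.01, 0.02]))
```

Comparing the sign of the MR-Egger slope with the sign of the IVW estimate is one simple check of whether the two methods agree on the direction of effect.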

To quantify the strength of the instruments, F-statistics for each exposure were calculated using the methods described by Burgess et al and the MR-Base database (42–43; Supplementary Material 6). We additionally performed further sensitivity analyses, including heterogeneity testing, an MR-PRESSO analysis, a leave-one-out analysis, a SNP-specific forest plot and a funnel plot, in order to validate the causal relationship observed for respective exposures in each ancestry population.
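Two commonly used ways of computing an instrument-strength F-statistic are sketched below. Which exact formula was applied in this study is not stated beyond the cited references, so both are shown as assumptions: a per-SNP approximation from summary statistics, and the form based on the proportion of exposure variance explained (R²). The function names and numeric inputs are hypothetical. A conventional rule of thumb treats values below about 10 as indicating a weak instrument.

```python
def f_stat_from_summary(beta, se):
    """Approximate per-SNP F-statistic from the SNP-exposure effect and its SE."""
    return (beta / se) ** 2


def f_stat_from_r2(r2, n_samples, n_instruments):
    """F-statistic from variance explained, sample size and number of instruments."""
    return ((n_samples - n_instruments - 1) / n_instruments) * (r2 / (1.0 - r2))


def mean_f_stat(betas, ses):
    """Mean of the approximate per-SNP F-statistics across a set of instruments."""
    values = [f_stat_from_summary(b, s) for b, s in zip(betas, ses)]
    return sum(values) / len(values)


# Hypothetical instruments: the second is weaker than the first.
print(f_stat_from_summary(0.10, 0.02))
print(f_stat_from_summary(0.06, 0.02))
print(mean_f_stat([0.10, 0.06], [0.02, 0.02]))
print(f_stat_from_r2(0.01, 1002, 1))
```

Because the per-SNP approximation is the squared z-score of the SNP-exposure association, admitting SNPs with larger p-values necessarily lowers the mean F-statistic.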

We used the TwoSampleMR and MendelianRandomization packages to perform each of the analyses in R version 4.2.3 (43–46).

Genetic Instruments

The number of SNPs available for the discovery of instrumental variables increased as the alpha threshold was relaxed (Supplementary Material 2). When an alpha threshold of 10⁻⁸ was employed, 10 MR analyses could not be run because no instrumental variables were available, and 3 analyses could not be run because there were too few instrumental variables for a two-sample IVW MR to be performed (Table 1). Similarly, when a more relaxed alpha threshold of 10⁻⁷ was selected, 6 analyses produced an insufficient number of available SNPs, and 3 had no available instrumental variables. Except for the blood pressure analysis, all of the unsuccessful analyses were in populations of non-European ancestries, specifically South Asian and African for T2D; East Asian, South Asian and African for BMI; East Asian for kidney function; and South Asian for both diastolic blood pressure (DBP) and systolic blood pressure (SBP). The alpha threshold of 10⁻⁶ was the strictest threshold that yielded a sufficient number of instrumental variables for all 33 potential MR analyses to be performed and compared.

Table 1. Number of instrumental variables available, mean F-statistics and GWAS sample sizes for two-sample Mendelian randomization analysis of selected exposures on coronary artery disease in the respective ancestry populations when alpha thresholds of 10⁻⁸, 10⁻⁷, 10⁻⁶ and 10⁻⁵ were employed. East Asian (EAS); European (EUR); African (AFR) and South Asian (SAS).

Two-Sample Mendelian Randomization Analysis

When using a strict alpha threshold (P < 5×10⁻⁸) to identify robust IVs for our exposures, we observed evidence to suggest that high-density lipoprotein cholesterol (HDLC), low-density lipoprotein cholesterol (LDLC), total cholesterol (TC), log-transformed triglycerides (logTG), T2D and BMI each have a causal influence on CAD. Increases in HDLC decrease the risk of CAD (OR 0.663; 95% CI 0.612–0.718), while increases in LDLC, TC, logTG and BMI each increase the risk of CAD (Fig. 1, 2 and Supplementary Material 3). However, these observations are limited to analyses conducted in the EUR ancestry group (Fig. 1). No IVs were available for diastolic and systolic blood pressure in the EUR ancestry group, and while kidney function had 23 instruments for EUR, this analysis was influenced by weak instrument bias (Table 1). All other ancestry groups had no or far fewer IVs and, in all instances, a greater possibility of weak instrument bias.

Nevertheless, notable differences are observed between non-European and European populations. Confidence intervals for East Asian ancestry populations do not overlap those for European populations when LDLC (OR 0.984; 95% CI 0.615–1.576), TC (OR 0.661; 95% CI 0.403–1.082), logTG (OR 1.100; 95% CI 0.771–1.567) and T2D (OR 0.858; 95% CI 0.699–1.052) were selected as exposure variables (Figs. 1 and 2). Despite this, the MR results for all non-European ancestry populations produced confidence intervals which overlap the null.

MR analyses performed with an alpha inclusion threshold of 5×10⁻⁶ allow for a complete comparison of results across all ancestry groups for each exposure category. In these results, confidence intervals for South Asian ancestry populations do not overlap those for European populations when HDLC (OR 1.242; 95% CI 0.874–1.768), T2D (OR 1.162; 95% CI 0.895–1.508) and BMI (OR 0.422; 95% CI 0.159–1.120) were selected as exposure variables, while the point estimates and confidence intervals of TC, logTG and LDLC in the East Asian cohort still do not overlap those of European populations.

A key observation in this analysis is that when we relaxed the inclusion criteria for associated SNPs, we obtained results from non-European populations in the opposite direction to what is currently well-established in research from European populations (Supplementary Material 3). This is most clearly seen in the point estimates from the South Asian population when lipid traits were selected as exposures. For HDLC, a direction of effect opposite to that known in European populations was observed when the lenient thresholds of 10⁻⁵ (OR 1.284; 95% CI 1.040–1.587) and 10⁻⁶ (OR 1.242; 95% CI 0.874–1.768) were employed, whereas the 10⁻⁷ threshold (OR 0.944; 95% CI 0.576–1.547) showed the same direction of effect as in European populations. The direction of effect for LDLC and TC in South Asian ancestry populations is also opposite to that in Europeans at the relaxed thresholds of 10⁻⁵ (OR 0.993; 95% CI 0.810–1.219) and of 10⁻⁵ (OR 0.967; 95% CI 0.791–1.182) and 10⁻⁶ (OR 0.906; 95% CI 0.382–2.149), respectively. However, as there are insufficient data for the analysis at the stricter thresholds, we are not able to observe how applying stricter alpha thresholds affects the direction of the results (Fig. 2).

The IVW results from the most relaxed analysis of HDLC on CAD in the South Asian ancestry cohort do not overlap the null, an observation that may lead researchers to suggest that the effects of HDLC on CAD are likely to differ between South Asian and European ancestry groups. The results suggest that an increase in HDLC is protective against CAD in European individuals, while an increase in HDLC causes CAD in South Asian populations. However, invalid and weak instruments limit the validity of this result, and this finding reiterates the need for caution when relaxing alpha thresholds in an MR analysis (Supplementary Material 6).
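Whether a reported interval excludes the null can be checked directly from the odds ratios quoted above for HDLC in the South Asian cohort. The sketch below also back-calculates the standard error of the log odds ratio from a 95% confidence interval, assuming the interval was constructed on the log scale with a normal approximation (an assumption, since the construction is not stated); the function names are hypothetical.

```python
import math

# HDLC on CAD in the South Asian cohort: (threshold label, OR, lower, upper),
# as reported in the text.
sas_hdlc = [
    ("10^-5", 1.284, 1.040, 1.587),
    ("10^-6", 1.242, 0.874, 1.768),
    ("10^-7", 0.944, 0.576, 1.547),
]


def excludes_null(lower, upper, null_value=1.0):
    """True when the confidence interval for an odds ratio does not contain 1."""
    return lower > null_value or upper < null_value


def se_log_or(lower, upper, z=1.96):
    """Standard error of log(OR) implied by a symmetric interval on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


for label, odds_ratio, lower, upper in sas_hdlc:
    direction = "risk-increasing" if odds_ratio > 1.0 else "protective"
    print(
        f"threshold {label}: OR {odds_ratio} ({direction}), "
        f"excludes null = {excludes_null(lower, upper)}, "
        f"SE(log OR) = {se_log_or(lower, upper):.3f}"
    )
```

Only the interval at the most relaxed threshold excludes the null, which is the result the paragraph above cautions against over-interpreting.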

In addition, the wide range of p-values across ancestry groups, from 0 for HDLC, LDLC, TC, logTG and T2D in European ancestry populations to 0.948 for LDLC in East Asian ancestry populations, indicates limited statistical power for the majority of exposures on CAD in non-European ancestry groups (Supplementary Material 3). With the stricter alpha threshold of 10⁻⁸, the only exposures identified as associated with CAD were observed in the EUR ancestry analyses. Given that we used exposures and an outcome for which it is reasonable to assume that biology would be similar across human groups, these results highlight the imbalances in data availability across ancestry groups.

Sensitivity Analysis

To determine whether the observed causal associations are sensitive to model assumptions, we compared the IVW estimates with those from the MR-Egger and weighted median models. The observed associations between genetically proxied exposure variables and the risk of CAD showed a great deal of heterogeneity among ancestries. These associations were generally consistent across ancestry populations for the analysis with an inclusion threshold of 10⁻⁶, except for logTG and T2D in African ancestry populations, BMI and kidney function in East Asian ancestry populations, and TC, logTG and SBP in South Asian ancestry populations, which all showed a different direction of effect between the IVW and MR-Egger results (Supplementary Material 3). The difference in results between the two methods implies potential directional bias in the analysis, specifically highlighting the presence of horizontal pleiotropy.

A detailed summary of all sensitivity analyses conducted for each exposure variable for the two-sample MR analyses conducted with an alpha threshold of P < 5×10⁻⁸ is shown in Supplementary Material 4, 5, 6 and 7.

This study aimed to assess the implications of the underrepresentation of non-European populations in genomics by conducting a two-sample MR analysis of lipid traits, blood pressure, BMI, kidney function and T2D on CAD across diverse ancestry groups. The MR results showed that when the MR methodology was adjusted, albeit inappropriately, to be more inclusive of non-European data, the observed direction of causality for non-European populations became inconsistent with what is currently known from previous studies in European populations. This was seen in the MR analysis of lipid traits, whereby an increase in LDLC and TC in both the East Asian and South Asian cohorts resulted in a protective effect against CAD liability, rather than the increased risk of developing CAD which both exposures are known to confer (2, 17, 19).

In order to accommodate the non-European GWAS summary statistics for the analysis, alpha thresholds of P < 5×10⁻⁸, 10⁻⁷, 10⁻⁶ and 10⁻⁵ were selected. We found that when the stricter thresholds were employed, at least one of the non-European populations had insufficient data to conduct a two-sample MR analysis for each selected exposure except lipid traits, due to the limited number of identified SNPs and instrumental variables. By performing the same analysis with more relaxed thresholds, we were able to identify more instrumental variables for the MR analysis; our resulting point estimates remained generally the same, but weak instrument bias increased. The need to employ a more lenient p-value threshold for instrument selection is not uncommon when conducting an MR analysis using GWAS summary statistics, and previously conducted MR analyses of other exposures and outcomes have faced similar challenges (47, 48). However, it is very important to acknowledge in such cases that relaxing the alpha threshold is not appropriate for routine MR analysis, as the introduction of weak instrument bias and the violation of necessary MR assumptions may result in substantial error in claims of causal effects, as seen from the results of this analysis across the different alpha thresholds.

Additionally, despite relaxing the inclusion criteria to identify a sufficient number of SNPs for the analysis, non-European populations had fewer instruments available and greater explained exposure variance than European ancestry populations, with an average of 684 more SNPs identified in European populations than the highest number of SNPs identified in the non-European ancestry datasets (Supplementary Material 2). This finding highlights the underrepresentation of non-European populations in genomics; previous studies have also been limited by power and by the lack of sufficiently large GWAS summary statistics when using GWAS for CAD research or other MR analyses (10, 49, 50). Data for blood pressure in African populations are known to be available from the MVP; however, due to a separate ongoing analysis, we were not able to include them in this study.

Findings in all MR analyses performed in European populations were consistent with previous MR and observational studies. By comparison, the direction of causality that we identified in the lipid trait analysis in both East Asian and South Asian populations is inconsistent with what is currently known from previous MR and observational studies in diverse ancestry populations, as both LDLC and TC are globally well-established as primary targets for interventions and treatments aimed at reducing overall CAD risk (19, 51–53).

A possible explanation for the unexpected MR outcomes is the relaxed alpha threshold, which violated the first MR assumption. The inclusion of variants not strongly associated with the exposure introduced error into the analysis, which may have biased the results and therefore produced causal effects in the opposite direction to what is well-established from other epidemiological studies. The presence of weak instrument bias is supported by the small F-statistics identified for non-European populations, which decreased as the alpha threshold became more relaxed (Table 1; Supplementary Material 6). A second explanation lies in the size and quality of the outcome (CAD) GWAS for each ancestry group. Having accurate estimates of the SNP-outcome effects is as important as having accurate SNP-exposure effects. Overfitting bias is another potential explanation for the observed direction of causation, due to sample overlap between biobanks included in GWAS data sources. In an effort to increase population sample sizes and the power to detect variants with smaller effects, an increasing number of GWAS have been performed as meta-analyses of several biobanks (54, 55). Although this provides sufficiently large samples, there is an increased risk of substantial participant overlap between exposure and outcome data sources, as well as variable quality in the GWAS sources of both the exposure and outcome data (Supplementary Material 1).

Besides methodological issues, distinct lipoprotein profile patterns have been found to be associated with region-specific ancestries (56). LDLC levels have been observed to be lower in East Asian ancestry groups compared with Europeans, with a “treat-to-target” LDLC treatment strategy recommended for East Asian patients with CAD (57, 58). Additionally, individuals of South Asian ancestry have an overall smaller particle size and increased amount of LDLC compared with European populations, possibly meaning that the total measured amount does not capture the amount truly in the population (59–63). Ruuth and colleagues found that LDLC is more prone to aggregation in healthy South Asian individuals compared with Europeans, resulting in an increased build-up in the arterial wall and a consequent increased lifetime risk of developing cardiovascular diseases such as CAD (64). Additionally, the INTERHEART study found that, irrespective of the measured LDLC levels, the higher levels of ApoB compared with other ancestry groups correlated with individuals of South Asian ancestry having higher total atherogenic lipoprotein levels (62). These differences may contribute towards variation in the observed function of lipid traits, and therefore warrant further studies into how they affect the aetiology of dyslipidaemia and its subsequent contribution towards CAD risk in South Asian populations.

It has also previously been found that associations exist between small birth size and an increased rate of cardiovascular and metabolic disease in later life (65). This “foetal origins” hypothesis cannot be overlooked as another contributing factor to the differences observed in disease liability across diverse ancestry groups (66). In the context of CAD, evidence suggests that maternal nutrition, specifically maternal malnutrition, contributes towards both the occurrence and early onset of CAD in offspring (67, 68). If participants included in any of the selected GWAS were exposed to famine during gestation, it is possible they were already born with a predisposition to CAD compared with other seemingly comparable populations.

Understanding the genetic basis of complex multifactorial diseases such as CAD is crucial for addressing global health disparities. However, if genetic research is restricted to populations of European ancestry, it may not capture the full spectrum of genetic factors contributing to disease pathophysiology in diverse populations. This lack of understanding regarding the transferability of MR findings across diverse ancestry populations raises questions about the applicability of current genetic insights to individuals from non-European populations. Consequently, both policymakers and funding agencies need to advocate for more diverse research studies in order to ensure that developed public health interventions and healthcare policies account for the unique genetic characteristics of different populations. Until then, clinicians and policymakers need to exercise caution when applying findings from genetic information derived from predominantly European studies to patients or populations from other ancestries.

It is important to note that findings from MR studies should not be used in isolation and instead should be considered as supplementary evidence alongside findings from other epidemiological methodologies, such as observational studies or randomized controlled trials. Integrating evidence from diverse methodological approaches helps to mitigate the potential biases and limitations of each method, which in the case of MR are the strict assumptions that must be met for the analysis to be valid. In the context of this study, we found that performing a routine two-sample MR analysis in non-European populations was not possible unless we relaxed the alpha thresholds for instrument discovery, which resulted in high levels of weak instrument bias and in causal effect estimates that differ from what is currently known from previous observational and MR studies.

Limitations

This study has several limitations within the MR analysis itself. Firstly, we included lenient P-value thresholds of P < 5×10⁻⁵, 10⁻⁶ and 10⁻⁷ to select significant SNPs when developing our genetic instruments. These significance values were selected in order to ensure that a sufficient number of instruments from non-European ancestry populations would be identified; however, this may have also resulted in the inclusion of weaker or even invalid instruments in the analysis. The presence of weak instrument bias gives rise to a number of further limitations and challenges, including reduced statistical power, an inflated type I error rate (false positives), bias due to either underestimation or overestimation of true effects, and larger standard errors due to inefficient estimation. In the current genetic research landscape, overcoming weak instrument bias in underrepresented non-European populations remains a challenge. If individual-level data are available, a one-sample MR which employs a stricter alpha threshold may produce more robust findings despite smaller sample sizes in both exposure and outcome cohorts; however, the choice of MR analysis in non-European populations is dictated largely by the availability of suitable datasets.

Due to the limited availability of large-scale GWAS data in populations of non-European ancestry, potential bias may have been introduced when comparing same-ancestry populations from different geographical regions. Demographically heterogeneous populations of the same ancestry may have different genetic architectures or influence of environment on either the exposures or the outcome, resulting in bias being introduced into the MR analysis. Smaller sample sizes in non-European ancestry cohorts may have also contributed to the observed variations in causal effects across ancestry populations, with smaller sample sizes resulting in lower statistical power and less precise estimates. The presence of horizontal pleiotropy is also a limitation of this study. Despite findings from the sensitivity analysis, we cannot exclude the possible presence and influence of horizontal pleiotropy in this study. This was highlighted by the identification of a number of genetic instruments with a low F-statistic, indicating weak instrument bias.

A further methodological limitation is possible sample overlap between the data sources selected for the exposure and outcome variables included in the analysis. Although an effort was made to ensure that independent populations were selected for the two-sample MR, it should not be overlooked that some data may have been shared across consortia, especially for non-European populations.

Strengths

In this analysis we were able to compare the feasibility of conducting a two-sample MR analysis across diverse ancestry populations for the same set of exposures on CAD. Employing known causal effects derived from European data as benchmarks, we were able to assess the implications of employing non-European data to investigate the same questions of effect. To the best of our knowledge, this is the first study to conduct such a comparison on a large scale across four major ancestry groups.

Our findings of insufficient power in GWAS summary statistic data and limited feasibility of running a two-sample MR in non-European populations highlight gaps and future directions for research. The most apparent gap in this research area is the need to develop more large-scale consortia. Collaborative efforts between researchers to pool resources and GWAS data are expected to improve both the feasibility and reliability of MR studies in non-European populations. Refining the MR methodology to be more inclusive of diverse ancestry data is another future consideration. Establishing protocols which standardize data harmonization and the handling of population stratification across different ancestry populations is expected to allow more robust analyses to be conducted in diverse populations.

Overall, we found that despite the increasing availability of GWAS summary statistics in populations of non-European ancestries, the feasibility of running a two-sample MR that meets the necessary MR assumptions while having sufficient power is still limited. Our findings highlighted the impracticality of running a routine large-scale two-sample MR analysis in non-European ancestry populations today, as the error and bias introduced during the preliminary steps of data sourcing and genetic variant identification made the results almost implausible. This emphasizes the need for larger-scale genomic studies in non-European populations in the near future, in order to increase sample sizes and power, facilitate research and improve the reliability of MR for common complex traits such as CAD across diverse populations.

Data availability

The genome-wide association summary statistics data used in this study are publicly available at https://www.ebi.ac.uk/gwas/downloads/summary-statistics. The processed data generated in this study are provided in the Supplementary Information and Supplementary Data.

Code availability

We used the publicly available TwoSampleMR R package; its code and documentation are publicly available at https://mrcieu.github.io/TwoSampleMR/articles/introduction.html. Other software programs used are listed and described in the Methods.

Acknowledgments

The authors thank the Million Veteran Program (MVP) staff, researchers, and volunteers who have contributed to MVP, and especially the participants who previously served their country in the military and now generously agreed to enroll in the study (see https://www.research.va.gov/mvp/ for more details). The citation for MVP is Gaziano, J.M. et al. Million Veteran Program: A mega-biobank to study genetic influences on health and disease. J Clin Epidemiol 70, 214-23 (2016). This research is based on data from the Million Veteran Program, Office of Research and Development, Veterans Health Administration, and was supported by the Veterans Administration (VA) Cooperative Studies Program (CSP) award #G002. Data were accessed through approved dbGaP proposal #30287, entitled “Genomic determinant of Complex Diseases in African ancestry individuals”. SF is supported by the Wellcome Trust grant 220740/Z/20/Z.

Authors Contribution Statement

SS, SF and DN conceptualized the study. SF and DN designed and supervised the study. SS performed the main analyses and wrote the first draft of the manuscript. SF, DAH, CK and CT read and reviewed the first draft and provided critical feedback on the paper.

Conflicts of Interest

The authors declare no competing interests.

References

Thomas M, Su YR, Rosenthal EA, Sakoda LC, Schmit SL, Timofeeva MN, Chen Z, Fernandez-Rozadilla C, Law PJ, Murphy N, Carreras-Torres R. Combining Asian-European Genome-Wide Association Studies of Colorectal Cancer Improves Risk Prediction Across Race and Ethnicity. medRxiv. 2023:2023-01.
Lee SH, Lee JY, Kim GH, Jung KJ, Lee S, Kim HC, Jee SH. Two-sample Mendelian randomization study of lipid levels and ischemic heart disease. Korean Circulation Journal. 2020 Oct 1;50(10):940-8.
Burgess S, Smith GD, Davies NM, Dudbridge F, Gill D, Glymour MM, Hartwig FP, Kutalik Z, Holmes MV, Minelli C, Morrison JV. Guidelines for performing Mendelian randomization investigations: update for summer 2023. Wellcome open research. 2019;4.
Peterson RE, Kuchenbaecker K, Walters RK, Chen CY, Popejoy AB, Periyasamy S, Lam M, Iyegbe C, Strawbridge RJ, Brick L, Carey CE. Genome-wide association studies in ancestrally diverse populations: opportunities, methods, pitfalls, and recommendations. Cell. 2019 Oct 17;179(3):589-603.
Burgess S, Small DS, Thompson SG. A review of instrumental variable estimators for Mendelian randomization. Statistical Methods in Medical Research. 2017 Oct;26(5):2333-55.
Fatumo S, Choudhury A. African American genomes don't capture Africa's genetic diversity. Nature. 2023 May;617(7959):35.
Soremekun O, Karhunen V, He Y, Rajasundaram S, Liu B, Gkatzionis A, Soremekun C, Udosen B, Musa H, Silva S, Kintu C. Lipid traits and type 2 diabetes risk in African ancestry individuals: A Mendelian Randomization study. EBioMedicine. 2022 Apr 1;78.
Chen Z, Schunkert H. Genetics of coronary artery disease in the post‐GWAS era. Journal of Internal Medicine. 2021 Nov;290(5):980-92.
Mester R, Hou K, Ding Y, Meeks G, Burch KS, Bhattacharya A, Henn BM, Pasaniuc B. Impact of cross-ancestry genetic architecture on GWASs in admixed populations. The American Journal of Human Genetics. 2023 Jun 1;110(6):927-39.
Fatumo S, Karhunen V, Chikowore T, Sounkou T, Udosen B, Ezenwa C, Nakabuye M, Soremekun O, Daghlas I, Ryan DK, Taylor A. Metabolic traits and stroke risk in individuals of African ancestry: Mendelian randomization analysis. Stroke. 2021 Aug;52(8):2680-4.
Musunuru K, Kathiresan S. Genetics of common, complex coronary artery disease. Cell. 2019 Mar 21;177(1):132-45.
Lindstrom M, DeCleene N, Dorsey H, Fuster V, Johnson CO, LeGrand KE, Mensah GA, Razo C, Stark B, Varieur Turco J, Roth GA. Global burden of cardiovascular diseases and risks collaboration, 1990-2021. Journal of the American College of Cardiology. 2022 Dec 20;80(25):2372-425.
World Health Organization (WHO). Cardiovascular diseases (CVDs). World Health Organization (WHO). 2021 [May 5 2023]. Available at: https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds)
Malakar AK, Choudhury D, Halder B, Paul P, Uddin A, Chakraborty S. A review on coronary artery disease, its risk factors, and therapeutics. Journal of cellular physiology. 2019 Oct;234(10):16812-23.
Grace C, Hopewell JC, Watkins H, Farrall M, Goel A. Robust estimates of heritable coronary disease risk in individuals with type 2 diabetes. Genetic Epidemiology. 2022 Feb;46(1):51-62.
Hu X, Zhuang XD, Mei WY, Liu G, Du ZM, Liao XX, Li Y. Exploring the causal pathway from body mass index to coronary heart disease: a network Mendelian randomization study. Therapeutic Advances in Chronic Disease. 2020 May;11:2040622320909040.
Richardson TG, Sanderson E, Palmer TM, Ala-Korpela M, Ference BA, Davey Smith G, Holmes MV. Evaluating the relationship between circulating lipoprotein lipids and apolipoproteins with risk of coronary heart disease: A multivariable Mendelian randomisation analysis. PLoS medicine. 2020 Mar 23;17(3):e1003062.
Geng T, Smith CE, Li C, Huang T. Childhood BMI and adult type 2 diabetes, coronary artery diseases, chronic kidney disease, and cardiometabolic traits: a Mendelian randomization analysis. Diabetes care. 2018 May 1;41(5):1089-96.
Holmes MV, Asselbergs FW, Palmer TM, Drenos F, Lanktree MB, Nelson CP, Dale CE, Padmanabhan S, Finan C, Swerdlow DI, Tragante V. Mendelian randomization of blood lipids for coronary heart disease. European heart journal. 2015 Mar 1;36(9):539-50.
Sollis E, Mosaku A, Abid A, Buniello A, Cerezo M, Gil L, Groza T, Güneş O, Hall P, Hayhurst J, Ibrahim A. The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic acids research. 2023 Jan 6;51(D1):D977-85.
Spracklen CN, Chen P, Kim YJ, Wang X, Cai H, Li S, Long J, Wu Y, Wang YX, Takeuchi F, Wu JY. Association analyses of East Asian individuals and trans-ancestry analyses with European individuals reveal new loci associated with cholesterol and triglyceride levels. Human molecular genetics. 2017 May 1;26(9):1770-84.
Graham SE, Clarke SL, Wu KH, Kanoni S, Zajac GJ, Ramdas S, Surakka I, Ntalla I, Vedantam S, Winkler TW, Locke AE. The power of genetic diversity in genome-wide association studies of lipids. Nature. 2021 Dec 23;600(7890):675-9.
Matsunaga H, Ito K, Akiyama M, Takahashi A, Koyama S, Nomura S, Ieki H, Ozaki K, Onouchi Y, Sakaue S, Suna S. Transethnic meta-analysis of genome-wide association studies identifies three new loci and characterizes population-specific differences for coronary artery disease. Circulation: Genomic and Precision Medicine. 2020 Jun;13(3):e002670.
Cho YS, Chen CH, Hu C, Long J, Hee Ong RT, Sim X, Takeuchi F, Wu Y, Go MJ, Yamauchi T, Chang YC. Meta-analysis of genome-wide association studies identifies eight new loci for type 2 diabetes in east Asians. Nature genetics. 2012 Jan;44(1):67-72.
Gaziano JM, Concato J, Brophy M, Fiore L, Pyarajan S, Breeling J, Whitbourne S, Deen J, Shannon C, Humphries D, Guarino P. Million Veteran Program: A mega-biobank to study genetic influences on health and disease. Journal of clinical epidemiology. 2016 Feb 1;70:214-23.
Schunkert H, König IR, Kathiresan S, Reilly MP, Assimes TL, Holm H, Preuss M, Stewart AF, Barbalic M, Gieger C, Absher D. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nature genetics. 2011 Apr;43(4):333-8.
Loh M, Zhang W, Ng HK, Schmid K, Lamri A, Tong L, Ahmad M, Lee JJ, Ng MC, Petty LE, Spracklen CN. Identification of genetic effects underlying type 2 diabetes in South Asian and European populations. Communications Biology. 2022 Apr 7;5(1):329.
Scott RA, Scott LJ, Mägi R, Marullo L, Gaulton KJ, Kaakinen M, Pervjakova N, Pers TH, Johnson AD, Eicher JD, Jackson AU. An expanded genome-wide association study of type 2 diabetes in Europeans. Diabetes. 2017 Nov 1;66(11):2888-902.
Wong HS, Tsai SY, Chu HW, Lin MR, Lin GH, Tai YT, Shen CY, Chang WC. Genome-wide association study identifies genetic risk loci for adiposity in a Taiwanese population. PLoS Genetics. 2022 Jan 20;18(1):e1009952.
Turcot V, Lu Y, Highland HM, Schurmann C, Justice AE, Fine RS, Bradfield JP, Esko T, Giri A, Graff M, Guo X. Protein-altering variants associated with body mass index implicate pathways that control energy intake and expenditure in obesity. Nature genetics. 2018 Jan;50(1):26-41.
Gurdasani D, Carstensen T, Fatumo S, Chen G, Franklin CS, Prado-Martinez J, Bouman H, Abascal F, Haber M, Tachmazidou I, Mathieson I. Uganda genome resource enables insights into population history and genomic discovery in Africa. Cell. 2019 Oct 31;179(4):984-1002.
Walters RG, Millwood IY, Lin K, Valle DS, McDonnell P, Hacker A, Avery D, Edris A, Fry H, Cai N, Kretzschmar WW. Genotyping and population characteristics of the China Kadoorie Biobank. Cell Genomics. 2023 Aug 9;3(8).
Wuttke M, Li Y, Li M, Sieber KB, Feitosa MF, Gorski M, Tin A, Wang L, Chu AY, Hoppmann A, Kirsten H. A catalog of genetic loci associated with kidney function from analyses of a million individuals. Nature genetics. 2019 Jun;51(6):957-72.
Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, Downey P, Elliott P, Green J, Landray M, Liu B. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS medicine. 2015 Mar 31;12(3):e1001779.
Locke AE, Steinberg KM, Chiang CW, Service SK, Havulinna AS, Stell L, Pirinen M, Abel HJ, Chiang CC, Fulton RS, Jackson AU. Exome sequencing of Finnish isolates enhances rare-variant association power. Nature. 2019 Aug 15;572(7769):323-8.
Finer S, Martin HC, Khan A, Hunt KA, MacLaughlin B, Ahmed Z, Ashcroft R, Durham C, MacArthur DG, McCarthy MI, Robson J. Cohort Profile: East London Genes & Health (ELGH), a community-based population genomics and health study in British Bangladeshi and British Pakistani people. International journal of epidemiology. 2020 Feb 1;49(1):20-1i.
Nelson CP, Goel A, Butterworth AS, Kanoni S, Webb TR, Marouli E, Zeng L, Ntalla I, Lai FY, Hopewell JC, Giannakopoulou O. Association analyses based on false discovery rate implicate new loci for coronary artery disease. Nature genetics. 2017 Sep;49(9):1385-91.
Burgess S, Davies NM, Thompson SG. Bias due to participant overlap in two‐sample Mendelian randomization. Genetic epidemiology. 2016 Nov;40(7):597-608.
Wasserstein RL, Schirm AL, Lazar NA. Moving to a world beyond “p< 0.05”. The American Statistician. 2019 Mar 29;73(sup1):1-9.
Thakur P, Jha V. Potential effects of lowering the threshold of statistical significance in the field of chronic rhinosinusitis-A meta-research on published randomized controlled trials over last decade. Brazilian Journal of Otorhinolaryngology. 2023 Jan 20;88:83-9.
Fatumo S, Mugisha J, Soremekun OS, Kalungi A, Mayanja R, Kintu C, Makanga R, Kakande A, Abaasa A, Asiki G, Kalyesubula R. Uganda Genome Resource: A rich research database for genomic studies of communicable and non-communicable diseases in Africa. Cell Genomics. 2022 Nov 9;2(11).
Burgess S, Thompson SG, Crp Chd Genetics Collaboration. Avoiding bias from weak instruments in Mendelian randomization studies. International journal of epidemiology. 2011 Jun 1;40(3):755-64.
Hemani G, Zheng J, Elsworth B, Wade KH, Haberland V, Baird D, Laurin C, Burgess S, Bowden J, Langdon R, Tan VY. The MR-Base platform supports systematic causal inference across the human phenome. elife. 2018 May 30;7:e34408.
R Core Team (2022). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
Yavorska OO, Burgess S. MendelianRandomization: an R package for performing Mendelian randomization analyses using summarized data. International journal of epidemiology. 2017 Dec 1;46(6):1734-9.
Burgess S, Bowden J, Fall T, Ingelsson E, Thompson SG. Sensitivity analyses for robust causal inference from Mendelian randomization analyses with multiple genetic variants. Epidemiology (Cambridge, Mass.). 2017 Jan;28(1):30.
Zhang X, Wen Z, Xing Z, Zhou X, Yang Z, Dong R, Yang J. The causal relationship between osteoarthritis and bladder cancer: A Mendelian randomization study. Cancer Medicine. 2023 Dec 15.
Soremekun O, Musanabaganwa C, Uwineza A, Ardissino M, Rajasundaram S, Wani AH, Jansen S, Mutabaruka J, Rutembesa E, Soremekun C, Cheickna C. A Mendelian randomization study of genetic liability to post-traumatic stress disorder and risk of ischemic stroke. Translational psychiatry. 2023 Jul 1;13(1):237.
Silva S, Nitsch D, Fatumo S. Genome-wide association studies on coronary artery disease: A systematic review and implications for populations of different ancestries. Plos one. 2023 Nov 29;18(11):e0294341.
Ke W, Rand KA, Conti DV, Setiawan VW, Stram DO, Wilkens L, Le Marchand L, Assimes TL, Haiman CA. Evaluation of 71 coronary artery disease risk variants in a multiethnic cohort. Frontiers in Cardiovascular Medicine. 2018 Mar 14;5:19.
Makshood M, Post WS, Kanaya AM. Lipids in South Asians: epidemiology and management. Current cardiovascular risk reports. 2019 Aug;13:1-1.
Barzi F, Patel A, Woodward M, Lawes CM, Ohkubo T, Gu D, Lam TH, Ueshima H; Asia Pacific Cohort Studies Collaboration. A comparison of lipid variables as predictors of cardiovascular disease in the Asia Pacific region. Annals of epidemiology. 2005 May 1;15(5):405-13.
Heart Protection Study Collaborative Group. MRC/BHF Heart Protection Study of cholesterol lowering with simvastatin in 20 536 high-risk individuals: a randomised placebo-controlled trial. The Lancet. 2002 Jul 6;360(9326):7-22.
Sadreev II, Elsworth BL, Mitchell RE, Paternoster L, Sanderson E, Davies NM, Millard LA, Smith GD, Haycock PC, Bowden J, Gaunt TR. Navigating sample overlap, winner’s curse and weak instrument bias in Mendelian randomization studies using the UK Biobank. MedRxiv. 2021 Jul 1:2021-06.
Fang S, Hemani G, Richardson TG, Gaunt TR, Davey Smith G. Evaluating and implementing block jackknife resampling Mendelian randomization to mitigate bias induced by overlapping samples. Human Molecular Genetics. 2023 Jan 15;32(2):192-203.
Zhang L, Qiao Q, Tuomilehto J, Janus ED, Lam TH, Ramachandran A, Mohan V, Stehouwer CD, Dong Y, Nakagami T, Onat A. Distinct ethnic differences in lipid profiles across glucose categories. The Journal of Clinical Endocrinology & Metabolism. 2010 Apr 1;95(4):1793-801.
Doi T, Langsted A, Nordestgaard BG. Lipoproteins, cholesterol, and atherosclerotic cardiovascular disease in East Asians and Europeans. Journal of Atherosclerosis and Thrombosis. 2023 Nov 1;30(11):1525-46.
Hong SJ, Lee YJ, Lee SJ, Hong BK, Kang WC, Lee JY, Lee JB, Yang TH, Yoon J, Ahn CM, Kim JS. Treat-to-Target or High-Intensity Statin in Patients With Coronary Artery Disease: A Randomized Clinical Trial. JAMA. 2023 Apr 4;329(13):1078-87.
Allaire J, Vors C, Couture P, Lamarche B. LDL particle number and size and cardiovascular risk: anything new under the sun?. Current opinion in lipidology. 2017 Jun 1;28(3):261-6.
Bilen O, Kamal A, Virani SS. Lipoprotein abnormalities in South Asians and its association with cardiovascular disease: current state and future directions. World journal of cardiology. 2016 Mar 3;8(3):247.
St-Pierre AC, Cantin B, Dagenais GR, Mauriege P, Bernard PM, Després JP, Lamarche B. Low-density lipoprotein subfractions and the long-term risk of ischemic heart disease in men: 13-year follow-up data from the Quebec Cardiovascular Study. Arteriosclerosis, thrombosis, and vascular biology. 2005 Mar 1;25(3):553-9.
Yusuf S, Hawken S, Ôunpuu S, Dans T, Avezum A, Lanas F, McQueen M, Budaj A, Pais P, Varigos J, Lisheng L. Effect of potentially modifiable risk factors associated with myocardial infarction in 52 countries (the INTERHEART study): case-control study. The lancet. 2004 Sep 11;364(9438):937
Kulkarni KR, Markovitz JH, Nanda NC, Segrest JP. Increased prevalence of smaller and denser LDL particles in Asian Indians. Arteriosclerosis, thrombosis, and vascular biology. 1999 Nov;19(11):2749-55.
Ruuth M, Janssen LG, Äikäs L, Tigistu-Sahle F, Nahon KJ, Ritvos O, Ruhanen H, Käkelä R, Boon MR, Öörni K, Rensen PC. LDL aggregation susceptibility is higher in healthy South Asian compared with white Caucasian men. Journal of clinical lipidology. 2019 Nov 1;13(6):910-9.
Roseboom TJ, Painter RC, van Abeelen AF, Veenendaal MV, de Rooij SR. Hungry in the womb: what are the consequences? Lessons from the Dutch famine. Maturitas. 2011 Oct 1;70(2):141-5.
Barker DJ, Osmond C, Golding J, Kuh D, Wadsworth M. Growth in utero, blood pressure in childhood and adult life, and mortality from cardiovascular disease. BMJ: British Medical Journal. 1989 Mar 3;298(6673):564.
Roseboom TJ, van der Meulen JH, Osmond C, Barker DJ, Ravelli AC, Schroeder-Tanka JM, van Montfrans GA, Michels RP, Bleker OP. Coronary heart disease after prenatal exposure to the Dutch famine, 1944–45. Heart. 2000 Dec 1;84(6):595-8.
Painter RC, de Rooij SR, Bossuyt PM, Simmers TA, Osmond C, Barker DJ, Bleker OP, Roseboom TJ. Early onset of coronary artery disease after prenatal exposure to the Dutch famine. The American journal of clinical nutrition. 2006 Aug 1;84(2):322-7.
