Shared heritability among the six CVDs
Following the harmonization and filtering of SNPs shared between GWAS summary statistics, we used LDSC to calculate SNP-based heritability (h2SNP) and genome-wide genetic correlations (rgs). The power of GWASs and the strength of their genetic signal varied considerably among the six CVDs, with the median h2SNP being 1.4% (range: 0.6%-3.2%, Fig. 2a and Supplementary Table 2a). Estimated rgs were moderate to strong (median = 0.435, range: 0.148–0.677) and always positively significant at a Bonferroni-corrected significance level of P = 3.33×10− 3 (Fig. 2b and Supplementary Table 2b). Substantial rgs were observed both between pairs of CVDs with low h2SNP (for example, for heart failure [HF] and Stroke, h2SNP is 0.6% and 0.8%; rg = 0.552) as well as relatively high h2SNP (for example, for atrial fibrillation [AF] and coronary artery disease [CAD], h2SNP is 2.5% and 3.2%; rg = 0.197).
Among the six CVDs, HF and CAD exhibited augmented estimated higher estimated polygenicity, indicative of more extensive involvement of genetically correlated genomic regions compared to other CVDs. Conversely, PAD displayed superior discoverability while exhibiting less pronounced polygenicity in relation to other CVDs. Consequently, PAD displayed relatively fewer genomic loci causally associated with varying effect sizes, in contrast to other CVDs (Supplementary Table 3b). Notably, the estimated sample size needed to achieve 90% h2SNP was over 24 times smaller for PAD compared to HF (Supplementary Fig. 1). Overall, there were substantial polygenic overlaps among the six CVDs with the median Dice coefficient being 0.362 (range: 0.207–0.732, Fig. 2c, Supplementary Fig. 2, and Supplementary Table 3b), both in the presence of moderate positive genetic correlation (CAD and HF) and minimal genetic correlation (AF and PAD). For example, the polygenic overlap between CAD and HF was particularly striking, with 1,397 (SD = 254) shared variants, representing 93.3% variants influencing CAD and 60.9% variants influencing HF, mirroring their robust positive genetic correlations (rg = 0.699, SE = 0.0129). Comparatively fewer shared variants (74, SD = 11) were noted between AF and PAD, representing 14.0% and 18.9% of variants influencing AF and PAD, respectively, with a weak positive genetic correlation (rg = 0.140, SE = 0.0150).
To identify genomic regions harboring variants associated with multiple CVDs, local genetic correlations were calculated within regions displaying univariate signals for CVDs using the LAVA approach. Among 896 defined genomic regions, 495 (55.25%) exhibited univariate genetic signals (P < 1 × 10− 4) for multiple CVDs, with 21 regions (2.34%) displaying signals for more than half of the CVDs (Fig. 2d and Supplementary Table 4). Yet, only 24 distinct genomic regions reached significance for local genetic correlations after Bonferroni correction, and merely 2 regions displayed significant correlation for more than two CVDs (P = 0.05 / no. of bivariate tests = 0.05 / 577 = 8.67 × 10 − 5; Fig. 2e and Supplementary Table 5).
The investigation extended to colocalization analysis through GWAS-PW to evaluate whether paired CVD associations emanated from the same signal. GWAS-PW identified 196 unique genomic regions suggestive of variants influencing pairs of CVDs (PPA_3 > 0.9, Supplementary Table 6), signifying shared regions and causal variants. Notably, 50 unique genomic regions demonstrated robust for loci shared by more than two traits. Strikingly, these regions encompassed many genomic loci not deemed significant in the bivariate LAVA analysis of CVD pairs. Meanwhile, only 3 unique genomic regions (chr4: 109978983–111733579, chr9: 21674033–23091193, and chr9: 136042491–136770926) were significantly captured by both bivariate LAVA and GWAS-PW (Supplementary Fig. 3).
Subsequently, the construction of the latent causal variable (LCV) model allowed the estimation of partial genetic causality between significantly correlated CVD pairs. Convincing evidence (|GCP| > 0.6, P < 0.05 / no. of CVD pairs = 0.05 / 15 = 3.33 × 10− 3) for partial genetic causality emerged for two CVD pairs (Supplementary Table 7). By considering the sign of the genetic correlation, it was plausible to infer that CVD could elevate HF risk, while VTE might have an adverse impact on CVD. Notably, the LCV results hinted at a partial causal influence of VTE (GCP = 0.649, P = 6.31 × 10− 3) on Stroke, although it did not withstand multiple testing corrections (P < 3.33 × 10− 3). Furthermore, a trait pair (AF-HF, GCP = 0.575, P = 1.07 × 10− 3) exhibited trends towards partial genetic causation, suggesting AF might partially influence VTE, but fell short of the stringent GCP threshold (|GCP| > 0.6). To validate the reproducibility of partial causal association findings for these four CVD pairs, LHC-MR method was employed, yet all trait pairs demonstrated bias due to reverse causality (Supplementary Fig. 4 and Supplementary Table 8).
In all, these findings reinforced the genetic resemblance among CVDs, implicating the plausible presence of shared SNPs, genes, or biological pathways contributing to general CVD predisposition.
Functional annotation of the MTAG-identified loci
In previous studies, cross-trait meta-analysis has proven valuable in augmenting the statistical power to uncover pleiotropic SNPs, shared genetic underpinnings, and common biological annotations across distinct psychiatric disorders and immunological diseases. In line with this, we undertook cross-trait GWAS meta-analyses employing the MTAG methodology, with the overarching objective of enhancing the identification of novel genetic associations for each trait by pinpointing pleiotropic SNPs that contribute to the genetic susceptibility of cardiovascular diseases. As a result, we observed substantial augmentation in the maximal effective sample sizes, with increments from1,030,836 to 1,076,012 for AF; 1,165,690 to 1,237,236 for CAD; 1,500,861 to 1,631,058 for VTE; 977,323 to 1,219,100 for HF; 511,634 to 843,568 for PAD; and 1,308,460 to 1,735,909 for Stroke (Table 1 and Fig. 3a-f). Importantly, examination of heritability Z-scores for MTAG results indicated greater polygenic signals compared to the original GWAS outcomes. Furthermore, the LDSC intercepts were approximately one, suggesting that the increase in mean χ2 statistics was due to polygenicity and not due to stratification or other confounding biases. The Q-Q plots and λGC of all MTAG and GWAS results are shown in Supplementary Figs. 5 and 6. The λGC values for AF, CAD, VTE, HF, PAD, and Stroke ranged from 1.096 to 1.421 in the GWAS results, and from 1.014 to 1.446 in the MTAG results, suggesting no evidence of inflation was found. Supplementary Note 2 and 3 listed Detailed information about novel genomic loci and mapped genes based on MTAG results of six CVDs (Supplementary Figs. 7 and 8, Supplementary Table 11–18).
Table 1
Statistical summary of GWAS and MTAG results.
| AF | CAD | VTE | HF | PAD | Stroke |
GWAS | MTAG | GWAS | MTAG | GWAS | MTAG | GWAS | MTAG | GWAS | MTAG | GWAS | MTAG |
mean χ2 | 1.5451 | 1.4888 | 1.7648 | 1.8020 | 1.5698 | 1.4830 | 1.1591 | 1.2062 | 1.1025 | 1.1253 | 1.2280 | 1.1944 |
h2 | 0.0250 | 0.0245 | 0.0324 | 0.0345 | 0.0182 | 0.0183 | 0.0080 | 0.0118 | 0.0094 | 0.0157 | 0.0060 | 0.0084 |
λGC | 1.2932 | 1.2697 | 1.4210 | 1.4460 | 1.3203 | 1.2498 | 1.1270 | 1.1555 | 1.0957 | 1.0135 | 1.1876 | 1.1459 |
Intercept | 1.0538 | 1.0025 | 1.0218 | 1.0067 | 1.0511 | 0.9501 | 1.0073 | 0.9797 | 1.0157 | 0.9795 | 1.0738 | 0.9750 |
Ratio | 0.0978 | 0.0052 | 0.0302 | 0.0083 | 0.0898 | < 0 | 0.0416 | < 0 | 0.1560 | < 0 | 0.3245 | < 0 |
Sample size | 1,030,836 | 1,076,012 | 1,165,690 | 1,237,236 | 1,500,861 | 1,631,058 | 977,323 | 1,219,100 | 511,634 | 843,568 | 1,308,460 | 1,735,909 |
Total SNPs | 6,925,287 | 5,782,223 | 6,925,287 | 5,782,223 | 6,925,287 | 5,782,223 | 6,925,287 | 5,785,906 | 6,925,287 | 5,785,906 | 6,925,287 | 5,785,906 |
nSig SNPs | 10,784 | 9,422 | 15,789 | 13,739 | 9,070 | 7,155 | 272 | 540 | 92 | 562 | 941 | 713 |
nGenomic locus | 117 | 120 | 203 | 194 | 91 | 80 | 11 | 13 | 4 | 48 | 24 | 20 |
nLead SNPs | 749 | 668 | 1049 | 992 | 765 | 646 | 21 | 42 | 8 | 78 | 50 | 53 |
nInd Sig SNPs | 224 | 214 | 362 | 339 | 222 | 196 | 12 | 14 | 5 | 55 | 25 | 22 |
nGenes mapped by position | 397 | 375 | 662 | 642 | 447 | 399 | 38 | 47 | 21 | 73 | 67 | 71 |
nGenes mapped by MAGMA | 298 | 281 | 551 | 558 | 401 | 334 | 20 | 49 | 5 | 27 | 43 | 53 |
nGenes mapped by eqtl | 352 | 329 | 647 | 651 | 345 | 319 | 30 | 39 | 7 | 59 | 56 | 60 |
nGenes mapped by TWAS | 155 | 147 | 284 | 295 | 204 | 189 | 11 | 14 | 5 | 15 | 16 | 16 |
mean χ2, Mean chi-squared value; h2, SNP-based heritability; λGC, LambdaGC of 1 indicates no inflation of the median χ2 association statistic; Intercept, LD score intercept of 1 indicates that inflation is not due to population stratification biases; Ratio, (intercept-1) / (mean chi-squared value-1), the ratio indicates what percentage of the inflation can be ascribed to other sources than polygenicity; nSig SNPs, Number of genome-wide significant SNPs (P < 5 × 10− 8); nGenomic locus, Number of the genomic locus; nLead SNPs, Number of lead SNPs in the genomic locus (P < 5 × 10− 8 and r2 < 0.1); nInd Sig SNPs, Number of the independent significant SNPs in the genomic locus (P < 5 × 10− 8 and r2 < 0.6); nGenes mapped by position, Number of genes mapped by position using FUMA; nGenes mapped by MAGMA, Number of genes mapped by MAGMA; nGenes mapped by eqtl, Number of genes mapped by eqtl using FUMA; nGenes mapped by TWAS, Number of genes mapped by TWAS. |
Characterizing shared biological mechanisms among CVDs
In light of the challenges posed by prior GWAS results in discerning shared biological mechanisms among CVDs, our subsequent efforts involved a comprehensive assessment of genetic overlap within the MTAG results of the six CVDs, spanning genomic loci, SNPs, genes, and pathways.
Among the noteworthy observations, a total of 40 pleiotropic loci showed genetic signals for multiple CVDs, with 9 (22.0%) demonstrating this phenomenon for more than half of the considered CVDs (Supplementary Table 19). However, when testing whether these overlapping loci shared causal variants through multitrait colocalization analysis from HyPrColoc, only 13 (3.32%) regions exhibited strong colocalization evidence (PP > 0.70). Out of these, 11 regions were corroborated by Metasoft analysis (Supplementary Fig. 9).
The most prominent pleiotropic locus we identified is in an intron of the lipoprotein(a) gene (LPA) on 6q25.3. This locus, encompassing the shared causal SNP rs10455872, demonstrated colocalization across all CVDs except MTAG_VTE. Notably, this causal variant has been established as linked to elevated lipoprotein (a) (Lp[a]) levels, in addition to an increased risk of CVDs across the general population21. Although this region displayed associations solely with MTAG_CAD and MTAG_PAD (Pmeta = 1.11× 10− 143, Fig. 4a) through MetaSoft analysis, it underscored the significant influence of this locus. Another notable region, 7p21.1 surrounding SNP rs8084351, located at the intergenic region of twist family bHLH transcription factor 1 gene [TWIST1] and histone deacetylase 9 [HDAC9]), exhibited the second-most pronounced pleiotropic association. This region demonstrated a strong connection with prevalent vascular diseases, including MTAG_CAD, MTAG_PAD, and MTAG_Stroke)(Pmeta = 2.20× 10− 41, Fig. 4b). The signal in our analysis was associated with artery aorta eQTLs for TWIST1 rather than HDAC9 (eQTL association FDR for TWIST1 = 5.54 × 10− 4), supporting TWIST1 as a plausible candidate gene. Tests have demonstrated that the rs2107595 locus can regulate the expression of TWIST1 and provide evidence for a functional role in vascular smooth muscle cells22. Further exploration led to the identification of the most pleiotropic novel locus at the11p11.2 region, positioned upstream of hydroxysteroid 17-beta dehydrogenase 12 gene (HSD17B12). This locus exhibited association with MTAG_CAD, MTAG_HF, and MTAG_Stroke. Although it fell short of the stringent PP threshold (PP = 0.626, less than 0.7), the shared causal SNP rs7944241 showcased noteworthy significance (Pmeta = 6.27× 10− 28, Fig. 4c). HSD17B12, a gene predominantly expressed in platelets, plays a pivotal role in lipid metabolism and fatty acid biosynthesis23.
A pivotal aspect of our analysis encompassed investigating the presence of opposite directional effects of shared causal variants among the 13 pleiotropic loci across various diseases. We identified only one locus with evidence of antagonistic effects on two or more CVDs. This locus, encompassing rs28929474, an exonic variant located at the Z-allele of serpin family A member 1 gene (SERPINA1) within the 14q32.31 region, showed opposing directional effects on MTAG_CAD and MTAG_VTE. However, this result did not surpass the significant threshold in Metasoft analysis (Pmeta = 0.260 > 0.05; Fig. 4d, Supplementary Fig. 10). Notably, all remaining 12 loci exhibited the same directional effect on diseases, including 10 susceptible loci and 2 protective loci, in line with their strong genome-wide genetic correlation.
Of the 819 Lead SNPs, 1,063 susceptibility genes by MAGMA, and 588 tissue-specific genes by TWAS that achieved genome-wide significance (GWS) for any of the six CVDs, only a fraction of these demonstrated significance across multiple CVDs. Specifically, only 92 (11.23%) Lead SNPs (P < 5 × 10− 8; Supplementary Table 20), 160 (15.05%) susceptibility genes (P < 2.84 × 10− 6), and 73 (12.41%) tissue-specific genes exhibited GWS for more than one CVD. Of these, only 18 (2.20%) Lead SNPs, 20 (1.88%) susceptibility genes, and 4 (0.68%) tissue-specific genes reached GWS for more than half of CVDs (Supplementary Tables 18 and 20). In conclusion, the observation that more than one CVD showed genetic signals in the given overlapped region with high-supporting evidence of colocalization is further supported by direct comparisons of SNP- and gene-based results among the six CVDs.
While the observation of a relatively limited number of overlapping genes and regions among cardiovascular diseases is apparent, the distinct genes associated with these conditions could potentially converge within shared biological pathways or display enrichment in similar tissues or functional categories. To explore this notion, we undertook a comprehensive analysis, encompassing gene set assessment based on MAGMA-derived susceptibility genes, pathway enrichment analysis using tissue-specific genes from Metascape, and a closer examination of tissues and functional characteristics utilizing SNP-based heritability data from LDSC-SEG. Following meticulous correction for 9,398 tests across the six CVDs (that is, 7,744 and 1,654 gene sets for GO BP and Reactome, respectively; αBON = .05 / 9,398 / 6 = 8.87 × 10− 7; Supplementary Table 21), we observed little convergence among sets of CVDs for gene sets, of which only 3 gene sets (involved in circulatory system development, circulatory system process, and heart development) were enriched in more than one CVD and no gene set showed significant enrichment for more than two traits. Subsequently, more lenient enrichment tests of gene function using Metascape confirmed little convergence across CVDs’ tissue-specific genes (Supplementary Table 22). Similarly, After correcting separately for tissues or functional categories and six CVDs (that is, 49 tissues and 489 functional categories; αBON for tissues = .05 / 49 / 6 = 1.70 × 10− 4; αBON for functional categories = .05 / 489 / 6 = 1.70 × 10− 5; Supplementary Figs. 11 and 12 and Supplementary Tables 23 and 24), no test surpassed the significance threshold. Taking a less stringent approach, only correcting separately for tissues or functional categories, there were still no convergence among CVDs for tissues and functional categories. To assess whether certain functional genomic categories (i.e. regulatory or functional features in specific tissues or cell types) show enrichment among multiple CVDs, we performed functional enrichment analysis using GARFIELD. After correction for the 1,005 tested functional categories across all CVDs (αBON = .05 / 1,005 / 6 = 8.29 × 10− 6; Supplementary Fig. 13 and Supplementary Table 25), much more significant enrichment of genetic signals was observed for exon region as well as for 217 (21.6%) DNaseI hypersensitive sites, 69 (6.87%) chromatin accessibility peaks, 34 (3.38%) histone-modified regions, 16 (1.59%) transcription-factor footprints, 8 (0.796%) chromatin states, 1 TFBS, and 1 FAIRE of different tissues or cell types in more than one CVD except MTAG_HF, MTAG_PAD, and MTAG_Stroke. This finding suggested that a considerable part of our characterizing shared signals of functional categories was primarily driven by AF, CAD, and VTE, the CVDs with the strongest genetic signals, hindering interpretation of the characterizing shared signals of functional categories as representing ‘general liability to CVDs’. Similarly, Of the 92 Lead SNPs, 160 susceptibility genes by MAGMA, and 73 tissue-specific genes by TWAS that were GWS for more than one CVD, 57 (61.96%) Lead SNPs, 58 (36.25%) susceptibility genes, and 53 (72.60%) tissue-specific genes were GWS for more than one CVD except MTAG_HF, MTAG_PAD, and MTAG_Stroke.
Ultimately, these findings corroborate our hypothesis, indicating that the analytic signal characterizing shared biological mechanisms was notably influenced by AF, CAD, and VTE, rather than universally representing the genetic variance inherent in multiple CVDs.
Causal proteins identified by Proteome-wide MR and Colocalization analysis
We conducted an in-depth examination of the MR associations between 1,773 proteins with available index cis-acting variants (cis-pQTL) and the risk of CVD outcomes using the SMR approach. Through this analysis, we identified 1202 unique protein-cardiovascular-disease pairs at the marginal significance level (P < 0.05 for SMR analysis), including 223 protein-MTAG_AF pairs, 323 protein-MTAG_CAD pairs, 265 protein-MTAG_VTE pairs, 148 protein-MTAG_HF pairs, 105 protein-MTAG_PAD pairs, and 138 protein-MTAG_Stroke pairs (Supplementary Table 26). Subsequently, following the removal of associations that did not pass the HEIDI test, as well as employing sensitivity analysis using multi-SNPs-SMR and multiple testing corrections with a 5% false discovery rate (FDR), we identified a total of 173 proteins associated with CVDs. Among these, 20, 70, 47, 1, 1, and 10 proteins were significantly associated with MTAG_AF, MTAG_CAD, MTAG_VTE, MTAG_HF, MTAG_PAD, and MTAG_Stroke, respectively (Fig. 5a and Supplementary Table 26). Further investigation was conducted to ascertain whether the circulating proteins identified in the analysis shared causal variants with CVDs. Strong evidence of colocalization was observed between 43 proteins and CVDs (Supplementary Fig. 14 and Supplementary Table 26). More specifically, MTAG_AF demonstrated high support for colocalization with 6 proteins including LRIG1, GUSB, DUSP13B, SPON1, CFL2, and TNFSF12. For MTAG_CAD and MTAG_VTE, we found 15 and 17 proteins exhibited robust colocalization evidence. Likewise, MTAG_HF, MTAG_PAD, and MTAG_Stroke demonstrated high support for colocalization with 1, 1, and 3 proteins, respectively, including SWAP70 for MTAG_HF, PCSK9 for MTAG_PAD, and HINT1, TMEM106B, and SWAP70 for MTAG_Stroke. A total of 12 proteins were identified in more than one CVD, of these, 4 proteins showed strong evidence of colocalization including CALB2 and F2 for MTAG_CAD and MTAG_VTE, SWAP70 for MTAG_HF and MTAG_Stroke, and TMEM106B for MTAG_CAD and MTAG_Stroke. Only F2 for MTAG_CAD and MTAG_VTE were validated with high support of evidence (PP > 0.70) using multitrait colocalization analysis from HyPrColoc, where index SNP rs3136516 (an intronic variant in the F2 gene) was also identified as the shared causal variant (Fig. 5b-d).
Druggability of identified causal proteins
To leverage this genetic information into new potential therapeutic targets for CVDs, we investigated the potential druggability of 39 causal proteins pinpointed in colocalization analysis. Our investigation unearthed a collection of 39 distinct drugs that target 7 unique proteins, as outlined in Supplementary Table 27. Of these drugs, a notable subset consisted of 11 medications designed to target 3 unique proteins specifically tailored for the treatment of the corresponding CVDs. This includes F2, which pertains to for both MTAG_CAD and MTAG_VTE, as well as F11 for MTAG_VTE, and PCSK9 for MTAG_PAD. A prominent example is the drug Dabigatran/Dabigatran etexilate, a direct thrombin inhibitor that targets coagulation factor II (F2). This agent has garnered approval for treating various CVDs such as VTE and CAD24. However, it's important to note that despite Dabigatran's favorable risk-benefit profile, it might be associated with major bleeding events, including gastrointestinal bleeding. For the coagulation factor XI inhibitor targeting F11, namely Abelacimab and Milvexian, evidence suggests heightened effectiveness and safety in treating VTE25. Additionally, monoclonal antibodies Alirocumab and Evolocumab, both PCSK9 inhibitors, have gained approval for reducing low-density lipoprotein (LDL) cholesterol levels, thereby safeguarding against an array of CVDs, including PAD26. Drugs initially designed to address autoimmune conditions, targeting SYK and TNFSF12, also emerge as potential therapeutic avenues for VTE27,28. Despite a lack of drug-related data for 32 unique proteins within the OpenTargets database, a compelling 18 (56.25%) of them present as promising candidates for future treatment prospects, exhibiting highly reliable druggability tiers. Whether agents modulating these proteins could be repurposed for the prevention or treatment of CVDs still needs to be examined in clinical trials.