We conducted a single-stage BP GWAS meta-analysis of common SNPs in individuals of European ancestry. SBP, DBP, and PP GWAS summary statistics from each study came from linear regression models of SNP associations adjusted for age at BP measurement, age², sex, BMI, and the top 10 genetic principal components. Inferences were limited to SNPs with imputation quality (INFO) scores of 0.1 or higher, Hardy-Weinberg equilibrium p-values greater than or equal to 1×10⁻⁶, and MAF greater than or equal to 1%. PP was calculated in each study as the difference between SBP and DBP.
Study Populations
The total sample size for this investigation was up to 1,028,980 adults from the meta-analysis of four BP GWAS datasets: UKB, ICBP, MVP, and BioVU. Characteristics of these studies are presented in Supplementary Table 1. Because ICBP is itself a large meta-analysis of 77 studies, descriptive characteristics were not available for it.
UKB
UKB includes ~500,000 volunteers aged 40-69 years ascertained through NHS registers. Following informed consent, participants completed a standardized questionnaire on life course exposures, medical history, and treatments and underwent a standardized portfolio of phenotypic tests, including two BP measurements taken seated after two minutes of rest using an appropriate cuff and an Omron HEM-7015IT digital BP monitor. A manual sphygmomanometer was used if the standard automated device could not be employed. Body mass index (BMI) was calculated as weight (kg) divided by height squared (m²), with weight measured using an electronic weighing scale (Tanita BC-418). Participants undergo longitudinal life course linkage to electronic health data, including Hospital Episode Statistics and Office for National Statistics cause of death data.
DNA extraction and genotyping for UKB has been previously described11. Briefly, UKB genetic data includes genotypes for 488,377 individuals. DNA was extracted from stored blood samples and genotyping was carried out by Affymetrix Research Services Laboratory. 49,950 participants involved in the UK Biobank Lung Exome Variant Evaluation (UK BiLEVE) study were genotyped at 807,411 markers using the Affymetrix UK BiLEVE Axiom Array and 438,427 participants were genotyped using the Affymetrix UK Biobank Axiom Array (825,927 markers), which shares 95% of marker content with the UK BiLEVE Axiom Array.
Variants were imputed centrally by UK Biobank using a reference panel that merged the UK10K and 1000 Genomes Phase 3 panel as well as the Haplotype Reference Consortium (HRC) panel12. For the current analysis, only SNPs imputed from the HRC panel were analyzed (N=39,235,157), of which ~7.1 million SNPs with minor allele frequency (MAF) >1% and imputation quality INFO >0.1 were included in the GWAS.
For the UKB GWAS, we calculated the mean SBP and DBP values from two automated (N=418,755) or two manual (N=25,888) BP measurements. For individuals with one manual and one automated BP measurement (N=13,521), we used the mean of these two values. For individuals with only one available BP measurement (N=413), we used this single value. After genetic and phenotypic data QC and exclusion of pregnant women (N=372) and individuals who had withdrawn consent (N=36), the analysis included 458,577 and 458,575 self-reported European-ancestry individuals for SBP and DBP, respectively. For measures taken while a participant was on an antihypertensive medication, we added 15 mm Hg to SBP and 10 mm Hg to DBP. We performed linear mixed model (LMM) association testing under an additive genetic model of the three (untransformed) continuous, medication-adjusted BP traits (SBP, DBP, PP) for all measured and imputed genetic variants in dosage format using the BOLT-LMM (v2.3) software.
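As an illustrative sketch only (not the study's pipeline), the phenotype derivation described above can be written in a few lines of Python. The function names are hypothetical, and computing PP from the medication-adjusted values is an assumption here, taken from the Lifelines description later in the text.

```python
def mean_bp(readings):
    """Mean of the available readings (two readings, or a single one)."""
    return sum(readings) / len(readings)


def adjust_bp(sbp, dbp, on_medication):
    """Return medication-adjusted (SBP, DBP, PP) in mm Hg.

    Adds 15 mm Hg to SBP and 10 mm Hg to DBP for treated individuals.
    PP is taken as adjusted SBP minus adjusted DBP (assumption).
    """
    if on_medication:
        sbp += 15.0
        dbp += 10.0
    return sbp, dbp, sbp - dbp


# Example: two SBP/DBP readings from a treated participant.
sbp = mean_bp([128.0, 132.0])
dbp = mean_bp([78.0, 82.0])
adjusted = adjust_bp(sbp, dbp, on_medication=True)
```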
ICBP
ICBP GWAS is an international consortium to investigate BP genetics and has been previously described elsewhere6,13. All study participants were of European descent and were imputed to either the 1000 Genomes Project Phase 1 integrated release version 3 [March 2012] all ancestry reference panel or the HRC panel. The final ICBP GWAS dataset included 77 studies comprising data from 299,024 individuals. Three quantitative BP traits were analyzed: SBP, DBP, and PP. Within each study, BP measures were adjusted for medication use by adding 15 and 10 mm Hg to SBP and DBP, respectively.
Prior to meta-analysis of all 77 ICBP GWAS studies, we undertook central quality control checks across all studies. This included checks to ensure allele frequency consistency (across studies and with reference populations), checks of effect size and standard error distributions (i.e., to highlight phenotype issues) and generation of quantile-quantile (QQ) plots and genomic inflation factor lambdas to check for over- or under-inflation of test statistics. Genomic control was applied (if lambda>1) at study-level. Variants with imputation quality <0.1 were excluded prior to meta-analysis. EasyQC was used for the quality control process14. Finally, data were filtered to SNPs with MAF ≥1% and effective sample size (reflecting the quality of genotype imputation) >60% of the total effective sample size. Meta-analysis was performed using METAL software employing inverse variance weighted fixed-effects models15. Between-study heterogeneity was assessed using the Cochran’s Q statistic and we performed additional filtering removing heterogeneous variants with Cochran’s Q p<1x10-4.
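For readers unfamiliar with the procedure, the following is a minimal Python sketch of an inverse-variance weighted fixed-effects estimate and Cochran's Q for a single SNP. It illustrates the standard formulas only; it is not METAL, and the function name is hypothetical. Q is compared against a chi-square distribution with (number of studies minus 1) degrees of freedom, which is not computed here.

```python
import math


def ivw_fixed_effects(betas, ses):
    """Pool per-study effects for one SNP.

    Returns (pooled beta, pooled SE, Cochran's Q).
    """
    weights = [1.0 / se ** 2 for se in ses]           # w_i = 1 / SE_i^2
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    # Q = sum_i w_i * (beta_i - pooled beta)^2
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, q


beta, se, q = ivw_fixed_effects([1.0, 3.0], [1.0, 1.0])
```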
MVP Study
The MVP study is a large cohort of fully consented participants who were recruited from the patient populations of 63 Department of Veterans Affairs (VA) medical facilities. Summary statistics from the analysis of 220,501 self-reported non-Hispanic white participants were included in our meta-analysis. These results have been previously reported by Giri et al4. Briefly, DNA was extracted from whole blood and genotyped using a custom Affymetrix array (Axiom Biobank; Thermo Fisher Scientific Inc, Waltham, MA, USA). Genotype calling and QC were performed centrally and genotypes were phased using EAGLE v216 and imputed from the 1000 Genomes Project phase 3 version 5 reference panel using Minimac317 software. Participants included adults (age ≥18 years) with non-Emergency Department outpatient SBP and DBP measures available in their electronic health record. For individuals with three or more measures available, the median SBP and corresponding DBP were used in analysis. For rare cases where fewer than three measures were available, the lowest available SBP and corresponding DBP were used. We observed an average of 220 measures across individuals. In individuals in whom the median SBP value was observed at multiple clinical encounters on distinct dates, we used the earliest of those measures to identify the DBP, age, BMI, and anti-hypertensive treatment status of the individual at that time. Measures were ineligible if they occurred at or after an International Classification of Diseases Ninth Revision (ICD-9) code from the groups 585 (chronic kidney disease), 405 (secondary hypertension), or 428 (heart failure). If pain scores were available, BP measures taken during encounters when a pain score ≥5 was recorded were also ineligible. BP measures were adjusted for medication use by adding 15 and 10 mm Hg to SBP and DBP, respectively.
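The selection rule for the index BP measure can be sketched as follows. This is a hedged illustration with a hypothetical function name and record layout. The text does not say how the median is resolved when the number of measures is even, so the lower of the two middle values is assumed here, which keeps the median an observed reading. Ties at the lowest SBP are also resolved by earliest date, which is likewise an assumption.

```python
from statistics import median_low


def select_index_measure(measures):
    """Pick one eligible measure per individual.

    measures: list of (iso_date, sbp, dbp) tuples, already filtered
    for eligibility. Returns the chosen (iso_date, sbp, dbp).
    """
    sbps = [m[1] for m in measures]
    if len(measures) >= 3:
        target = median_low(sbps)   # assumption: lower median for even counts
    else:
        target = min(sbps)          # fewer than three measures: lowest SBP
    matches = [m for m in measures if m[1] == target]
    # ISO dates sort chronologically, so min() gives the earliest encounter.
    return min(matches, key=lambda m: m[0])


records = [
    ("2010-01-01", 130, 80),
    ("2011-01-01", 120, 70),
    ("2012-01-01", 130, 85),
    ("2013-01-01", 140, 90),
]
chosen = select_index_measure(records)
```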
Linear regression association tests were conducted using additive models for untransformed medication-adjusted BP traits (SBP, DBP, PP) using SNPTEST-v2.5.4-beta18.
BioVU
The BioVU DNA Repository is a deidentified database of electronic health records that are linked to patient DNA samples at Vanderbilt University Medical Center. Summary statistics from the analysis of 50,649 self-reported non-Hispanic white participants were included in our meta-analysis. A detailed description of the database and how it is maintained has been published elsewhere10. BioVU participant DNA samples were genotyped on a custom Illumina Multi-Ethnic Genotyping Array (MEGA-ex; Illumina Inc., San Diego, CA, USA). Quality control (QC) was conducted, excluding samples or variants with missingness rates above 2%. Samples were also excluded if consent had been revoked, sample was duplicated, or failed sex concordance checks. Imputation was performed on the Michigan Imputation Server v1.2.417 using Minimac4 and the Haplotype Reference Consortium panel v1.112.
Among BioVU participants, we selected unrelated adults (age ≥ 18 years) of self-reported European ancestry and used the earliest median eligible non-Emergency Department outpatient measured SBP in the electronic health record, and the corresponding DBP. For individuals with fewer than three measurements available (N=2,933), the lowest available SBP and corresponding DBP were used. On average, there were 69 SBP measures per individual. Measures were considered ineligible if they occurred at or after an ICD-9/10 billing code from the groups 585/N18 (chronic kidney disease), 405/I15 (secondary hypertension), or 428/I50 (heart failure). For measures taken while a patient was on an antihypertensive medication, we added 15 mm Hg to SBP and 10 mm Hg to DBP. We performed linear regression association tests with additive models for untransformed medication-adjusted BP traits (SBP, DBP, PP) using SNPTEST-v2.5.4-beta18.
Study-level QC
We applied a harmonized QC procedure for each BP trait in all four studies (i.e., 12 GWAS datasets in total) using the GWASInspector R package19. The 1000 Genomes Project reference panel20, supplemented with the Haplotype Reference Consortium data panel12, was used as the reference dataset for appropriate flipping and/or switching of the alleles, checking for allele frequency concordance with the 1000 Genomes reference, annotating dbSNP rs accession numbers, and constructing harmonized identifiers for meta-analyses. SNP effect sizes from ICBP were considered as the reference to validate the reported effect sizes from the other three GWAS datasets (Supplementary Figures 1-3)7.
The following criteria were then used for filtering the GWAS datasets: i) SNPs only (i.e., no insertions/deletions, copy number variants, etc.); ii) MAF greater than or equal to 1%; iii) imputation quality (INFO) greater than 0.1; iv) HWE p-value greater than or equal to 1x10-6. Effective sample size (N_EFFECTIVE) was calculated as the product of total sample size and INFO for each SNP.
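The four filters and the effective sample size calculation above can be expressed compactly. The sketch below is illustrative only; the dictionary keys and function names are hypothetical and do not reflect the GWASInspector interface.

```python
NUCLEOTIDES = {"A", "C", "G", "T"}


def n_effective(n_total, info):
    """Effective sample size: total sample size multiplied by INFO."""
    return n_total * info


def passes_study_qc(variant):
    """Apply the study-level filters to one variant record (a dict)."""
    # i) SNPs only: both alleles must be single nucleotides.
    is_snp = variant["ref"] in NUCLEOTIDES and variant["alt"] in NUCLEOTIDES
    return (
        is_snp
        and variant["maf"] >= 0.01      # ii) MAF >= 1%
        and variant["info"] > 0.1       # iii) INFO > 0.1
        and variant["hwe_p"] >= 1e-6    # iv) HWE p >= 1x10-6
    )


good = {"ref": "A", "alt": "G", "maf": 0.05, "info": 0.95, "hwe_p": 0.3}
indel = {"ref": "A", "alt": "AT", "maf": 0.05, "info": 0.95, "hwe_p": 0.3}
```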
Meta-analysis
We initially applied LD Score Regression21 (LDSR) to the summary statistics for three of our four component datasets (UKB, MVP, and BioVU) to calculate the LDSR intercepts that were used to correct for pre-meta-analysis genomic inflation. ICBP summary statistics, as a meta-analysis of 77 independent cohorts, were previously corrected for genomic inflation5. HapMap322 SNP alleles and pre-calculated LD scores from 1000 Genomes Project20 European reference data supplied with the package were used to calculate LDSR intercepts. Observed LDSR intercepts for SBP, DBP, and PP, respectively, were as follows: 1.2177, 1.2195, and 1.1851 for UKB; 1.0530, 1.0247, and 1.0413 for MVP; and 1.0288, 1.0127, and 1.0207 for BioVU. Inverse-variance weighted fixed-effects meta-analysis of common (MAF≥0.01) bi-allelic SNPs with imputation quality (INFO) greater than or equal to 0.1 across our four studies was performed using METAL15 software. No further GC correction was applied to the results of the meta-analysis combining our four datasets.
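The text does not state how the LDSR intercepts were applied. A common convention, assumed here for illustration, is to divide each chi-square statistic by the intercept, which is equivalent to multiplying each standard error by the square root of the intercept. The function name is hypothetical.

```python
import math


def correct_se(se, ldsr_intercept):
    """Inflate a standard error by sqrt(intercept).

    Assumption: chi-square statistics are divided by the LDSR intercept,
    so z-scores shrink by sqrt(intercept) and SEs grow by the same factor.
    """
    return se * math.sqrt(ldsr_intercept)


# UKB SBP intercept reported above; the input SE of 0.10 is made up.
ukb_sbp_se = correct_se(0.10, 1.2177)
```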
QC of the meta-analysis results
Similar to study-level QC, we used the GWASInspector R package19 to ensure standardization and perform QC of post-meta-analysis summary statistics. Analyses included: i) checks of allele frequency concordance with the 1000 Genomes reference and concordance of effect sizes with ICBP (Supplementary Figure 4); ii) evaluation of Q-Q plots and genomic inflation factors (Supplementary Figure 4); and iii) evaluation of bivariate scatterplots of key summary statistics to identify patterns indicating the presence of low-quality SNPs (Supplementary Figure 5).
These analyses revealed the presence of SNPs in our data with low effective sample sizes and large standard errors, as well as a sub-peak of SNPs with higher effective sample sizes and large standard errors. Based on these observations, we applied a filtering threshold for SNPs that were present in at least three of our four studies or SNPs that reached an effective sample size greater than or equal to 60% of the maximum (Supplementary Figures 6-8). Application of these criteria to achieve an optimal balance between quality of retained SNPs and sample size resulted in 7,584,058 SNPs available for analysis.
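Stated as a rule, a SNP is retained if either condition holds. The following one-function sketch makes the logic explicit. The function name is hypothetical, and "the maximum" is assumed to mean the largest effective sample size observed across SNPs.

```python
def keep_after_meta_qc(n_studies, n_eff, max_n_eff):
    """Keep a SNP seen in >= 3 of 4 studies, or with N_eff >= 60% of the maximum."""
    return n_studies >= 3 or n_eff >= 0.6 * max_n_eff


kept = keep_after_meta_qc(n_studies=2, n_eff=700_000, max_n_eff=1_000_000)
```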
Distinguishing known from novel loci
Published BP SNPs
We collated published BP GWAS and compiled all ~3,800 unique BP SNPs reported to date (Supplementary Tables 2-3). In many BP-GWAS papers, the list of previously reported BP variants has focused on lead sentinel variants with validated evidence from independent replication. To expand to a fully comprehensive list of known variants, we curated a list of all published common and rare variants, including results from studies conducted in non-European ancestries, all types of methodological analyses including interaction analyses, results from both one-stage and two-stage study designs, and secondary variants reported from conditional or fine-mapping analyses. We began with the list of all 984 SNPs from the total of 901 previously known and novel loci reported from Evangelou et al5, then added to these: i) any secondary SNPs reported from conditional analyses in publications up to 20185,7,9,23; ii) SNPs reported from a large one-stage discovery analysis prior to 20188; iii) SNPs reported by Giri et al 20194; and iv) all other SNPs from GWAS published between 2018 and the end of 202024–30. We removed duplicated SNPs to generate a unique set of ~3,800 SNPs. Subsequent checks of our results in GWAS Catalog31 and PhenoScanner32 confirmed that all published BP variants had been successfully captured. For QC purposes, we compared the allele frequencies and the resulting effect estimates of these published SNPs in our GWAS meta-analysis data with the published data.
Linkage disequilibrium analyses
Linkage disequilibrium (LD) was calculated using PLINK33 with 1000 Genomes Project20 phase 3 version 5 European reference genotypes. LD proxies were captured for the ~3,800 previously reported BP SNPs at an r2 threshold >0.8 and a maximum distance of 500 kb. Further, we identified the most strongly associated SNP within 500 kb of each known SNP regardless of LD (i.e., “distance proxies”). The strongest trait-specific associations of these previously reported SNPs, their best LD proxies, and best distance proxies in our meta-analyses are presented in Supplementary Table 4.
We partitioned our data into known and unknown subsets. To identify the “unknown” portion of our GWAS results, we removed the following from each of our meta-analyses: previously reported SNPs; SNPs within 500 kb of previously reported SNPs; LD proxies for previously reported SNPs at an r² threshold >0.1 and a maximum distance of 5 Mb; and SNPs within the HLA region of chromosome 6 (25-34 Mb). QQ plots of all SNPs versus unknown SNPs are shown in Supplementary Figure 9.
Reporting criteria for novel loci
All remaining SNPs reaching genome-wide significance (p<5x10-8) and consistent direction of effect in all available studies were clumped into 1 Mb regions and the most significant SNP for any trait was selected from each region as a sentinel variant for the locus. Novel sentinel SNPs were checked for pairwise-LD against all other novel sentinel SNPs at an r2 >0.1 to confirm independence. Considering our one-stage study design, a significance threshold of p<5x10-9 was imposed for primary reporting of novel sentinel SNPs, with additional loci subsequently reported at the traditional p<5x10-8 significance threshold.
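The text does not specify how the 1 Mb regions were delimited. One plausible reading, sketched below under that assumption, is a greedy distance-based clumping in which the most significant remaining SNP becomes a sentinel and all SNPs within ±500 kb on the same chromosome are assigned to it. The function name and tuple layout are hypothetical, and the direction-of-effect and pairwise-LD checks are omitted.

```python
def distance_clump(snps, window_bp=500_000, p_threshold=5e-8):
    """Greedy distance-based clumping.

    snps: list of (snp_id, chrom, pos, p), where p is the smallest
    p-value across traits. Returns sentinel ids, most significant first.
    """
    candidates = sorted(
        (s for s in snps if s[3] < p_threshold), key=lambda s: s[3]
    )
    sentinels = []
    for snp_id, chrom, pos, p in candidates:
        # Accept only if no already-chosen sentinel lies within the window.
        if all(chrom != c or abs(pos - q) > window_bp for _, c, q, _ in sentinels):
            sentinels.append((snp_id, chrom, pos, p))
    return [s[0] for s in sentinels]


example = [
    ("rs1", 1, 1_000_000, 1e-12),
    ("rs2", 1, 1_200_000, 1e-9),   # within 500 kb of rs1, absorbed
    ("rs3", 1, 3_000_000, 1e-10),
    ("rs4", 2, 1_000_000, 1e-7),   # not genome-wide significant
]
sentinel_ids = distance_clump(example)
```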
Categorizing known variants into independent loci
Similarly, previously reported SNPs, their best LD proxy if the SNP was unavailable in our data, or the best distance proxy if neither was available, were clumped into 1 Mb regions and the most significant SNP for any trait was selected. Selected SNPs were then checked for pairwise-LD against all other selected SNPs at an r2 >0.1 to confirm independence. The most significant SNP for any trait was selected within each LD block, and these independent SNPs were designated as known sentinel SNPs.
LDSR approach for determination of polygenicity
We applied LDSR to each of our three meta-analyses (SBP, DBP, and PP), as well as the novel proportion of each meta-analysis, and compared these values with genomic inflation factors to determine if inflation of our test statistics was due to population substructure or polygenicity.
Annotation of variant functions and shared associations for novel signals
Novel signals were extended to their linked variants (r2 >0.5) using an in silico sequencing approach34. PLINK33 was used for LD calculations and ANNOVAR35 software to annotate the nearest genes for novel signals and to annotate variant functions. Then the extended loci (r2 >0.8) were used to search the GWAS Catalog31 as well as PhenoScanner32 for shared associations (p<5x10-8).
Conditional Analysis
Genome-wide joint conditional analysis was performed using GCTA-COJO36 specifying a five Mb LD window and a genome-wide significance threshold of 5x10-8, and using UKB European-ancestry sample genotypes as the LD reference. For each of our three BP traits, summary statistics were analyzed by chromosome to build a stepwise joint conditional model that selected independently associated SNPs. Pairwise-LD was calculated in both the 1000 Genomes Project20 phase 3 version 5 European reference genotypes and UKB European-ancestry sample genotypes. SNPs in LD (r2>0.1 in either UKB or 1000 Genomes reference at ±5 Mb) with known or novel sentinel SNPs from our primary analysis or in LD with known SNPs not available in our data were excluded. Among SNPs identified in conditional analysis, the most significant SNP for any trait was selected within each LD block, and these independent SNPs were designated as secondary SNPs. Secondary SNPs were further evaluated to determine if they fell within the novel portion of our data.
Lifelines Cohort Study genotype and phenotype data
For our study, genetic risk score (GRS) is defined as a risk score comprised of SNPs reaching genome-wide significance (p<5x10-8) in our analyses or in previously published studies and polygenic risk score (PRS) as a full genome-wide risk score optimized at a selected p-value threshold that explains the maximum trait variance. We calculated GRS and PRS and assessed variance explained in Lifelines data (Supplementary Figure 10). Both GRS and PRS were calculated as the sum of an individual's risk alleles, weighted by BP trait-specific risk allele effect sizes.
The Lifelines cohort is a large prospective population-based cohort study performed in 167,729 individuals living in the North of the Netherlands with a unique three generation design, aiming at investigating risk factors for multifactorial diseases37. It was approved by the medical ethics committee of the University Medical Center Groningen and conducted in accordance with Helsinki Declaration Guidelines. All participants signed an informed consent form prior to enrollment.
A subset of 38,030 volunteers were genotyped using the Infinium Global Screening Array MultiEthnic Disease Version, according to manufacturer’s instructions, at the Human Genomics Facility of the Erasmus Medical Center, Rotterdam and the Department of Genetics, University Medical Center Groningen. Standard QC was performed on both samples and markers. Samples with a genotyping call rate<99%, outliers for heterozygosity and sex mismatches were excluded, as well as samples that did not show consistent information between reported familial information and observed identity-by-descent sharing with family members, and between genotypes available from this and previous studies. Variants with a genotyping call rate <99%, Hardy-Weinberg equilibrium P <1×10−6 or excess of Mendelian errors in families (>1% of the parent-offspring pairs) were removed. A total of 36,339 samples and 571,420 autosomal and X-chromosome markers passed quality checks. The genotyping dataset was then imputed at the Sanger imputation server1 using the HRC panel v1.1.
From the set of 36,339 samples, we selected 10,782 unrelated individuals who are also independent from Lifelines samples that were included in a previous ICBP meta-GWAS5. After excluding 552 children (age<18 years), 12 pregnant women, five individuals without SBP or DBP, and three individuals without BMI, a final total number of 10,210 individuals were included for analyses.
In Lifelines, BP was measured every minute during a period of ten minutes using an automated DINAMAP Monitor (GE Healthcare) and the average of the final three readings was recorded for SBP and DBP. Participants with a measured BP ≥140/90 mm Hg irrespective of treatment and those taking antihypertensive medication (ATC codes C02, C03, C07, C08, C09) irrespective of BP were defined as having hypertension. In continuous trait analyses, 15 mm Hg was added to SBP and 10 mm Hg was added to DBP for 1,236 individuals who were taking antihypertensive medication. PP was calculated using these medication-adjusted BP values.
GRS and PRS construction and percentage of variance explained
To calculate the percentage of BP variance explained by genetic variants in an independent dataset, we generated the residuals from a regression of each BP trait against sex, age, age² and body mass index in 10,210 Lifelines individuals. We then fit a second linear model for the trait residuals with the top ten principal components and a third linear model for the trait residuals with ten principal components plus GRS. The difference in the adjusted R² between the third and the second model is the estimate of the percentage of variance of the dependent (BP) variable explained by the GRS. To evaluate the contribution of previously reported BP loci, as well as novel and secondary loci detected in our analyses, to observed variance in BP traits, and to test the predictive value of our genome-wide results, we constructed four different GRS and a PRS: (i) GRS of 1,723 pairwise-independent (LD-pruned with r²<0.1) SNPs from published known loci; (ii) GRS of 113 sentinel SNPs at genome-wide significant (p<5×10⁻⁸) novel loci; (iii) GRS of 1,723 known SNPs plus 113 sentinel SNPs at genome-wide significant novel loci; (iv) GRS of 1,723 known SNPs plus 113 SNPs from novel loci plus 267 secondary SNPs; and (v) full PRS at the optimally selected p-value threshold (1×10⁻³, 0.01, 0.01 for SBP, DBP, and PP, respectively) that maximized variance explained in Lifelines data.
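The variance-explained calculation reduces to a difference of two adjusted R² values. The sketch below assumes the residual sums of squares (RSS) of the second and third models and the total sum of squares (TSS) of the trait residuals are already available from a regression routine. The function names are hypothetical.

```python
def adjusted_r2(rss, tss, n, p):
    """Adjusted R^2 for a linear model with p predictors fit to n samples."""
    r2 = 1.0 - rss / tss
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def variance_explained_by_score(rss_pcs, rss_pcs_score, tss, n, n_pcs=10):
    """Adjusted R^2 of (PCs + score) minus adjusted R^2 of (PCs only)."""
    with_score = adjusted_r2(rss_pcs_score, tss, n, n_pcs + 1)
    pcs_only = adjusted_r2(rss_pcs, tss, n, n_pcs)
    return with_score - pcs_only


# Made-up sums of squares: the score removes 5% of residual variance.
gain = variance_explained_by_score(100.0, 95.0, 100.0, n=10210)
```

Because the adjusted R² penalizes the extra parameter, a score that explains nothing yields a slightly negative difference rather than zero.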
We generated GRS and PRS by multiplying the risk allele dosages for each SNP by its respective effect size as weight, and then summed all SNPs in the score. The four different GRS included the same set of SNPs for all three BP traits (SBP, DBP, and PP), but were weighted by the trait-specific beta coefficients from the GWAS results for SBP, DBP, and PP. Summary statistics for all SNPs in the GRS are displayed in Supplementary Table 5.
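As a minimal sketch of the weighted sum described above (with hypothetical names), a score for one individual can be computed as follows. It assumes dosages are already aligned to the risk allele used for the weights, and it silently skips SNPs missing from the individual's data, which is a simplification.

```python
def risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages for one individual.

    dosages: dict snp_id -> risk-allele dosage in [0, 2]
    weights: dict snp_id -> trait-specific effect size (beta)
    """
    return sum(
        dosages[snp] * beta for snp, beta in weights.items() if snp in dosages
    )


sbp_weights = {"rsA": 0.5, "rsB": 0.2, "rsC": 1.0}   # made-up betas
person = {"rsA": 2.0, "rsB": 1.0, "rsC": 0.0}
score = risk_score(person, sbp_weights)
```

The same SNP set with DBP- or PP-specific weights gives the corresponding trait-specific score.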
For each BP trait, we calculated full PRS by the clumping and thresholding approach38. Summary statistics of final GWAS results for each trait and the LD reference panel of 503 European ancestry samples from 1000 Genomes phase 320 were used. SNPs with ambiguous strands (A/T or C/G) were removed for the score derivation. An LD-driven clumping procedure was then performed by PLINK version 1.90 (r2<0.1, 1000 kb window). Finally, the PRS were generated at 17 selected P-value thresholds (1x10-8, 5x10-8, 1x10-7, 5x10-7, 1x10-6, 5x10-6, 1x10-5, 5x10-5, 1x10-4, 5x10-4, 1x10-3, 5x10-3, 0.01, 0.05, 0.1, 0.5, 1) and the optimal PRS with the maximum of variance explained in Lifelines data were selected. Summary statistics for all SNPs in optimal PRS of SBP, DBP, and PP are displayed in Supplementary Tables 6a-c.
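The thresholding step can be illustrated with a short sketch. The names are hypothetical, LD clumping is assumed to have been done already (for example with PLINK), and inclusion at p less than or equal to the threshold is an assumption.

```python
P_THRESHOLDS = [
    1e-8, 5e-8, 1e-7, 5e-7, 1e-6, 5e-6, 1e-5, 5e-5, 1e-4,
    5e-4, 1e-3, 5e-3, 0.01, 0.05, 0.1, 0.5, 1,
]


def is_strand_ambiguous(a1, a2):
    """A/T and C/G SNPs cannot be oriented by allele labels alone."""
    return {a1, a2} in ({"A", "T"}, {"C", "G"})


def snps_at_threshold(clumped_p, p_max):
    """Ids of clumped SNPs with p <= p_max (clumped_p: dict id -> p)."""
    return [snp for snp, p in clumped_p.items() if p <= p_max]


def best_threshold(r2_by_threshold):
    """Threshold whose score explains the most variance in the target data."""
    return max(r2_by_threshold, key=r2_by_threshold.get)


clumped = {"rs1": 1e-9, "rs2": 3e-4, "rs3": 0.2}
selected = snps_at_threshold(clumped, 1e-3)
```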
Decile analyses of full BP PRS for SBP, DBP, PP, and hypertension in Lifelines
To evaluate to what extent BP PRS were predictive for SBP, DBP, PP, and hypertension, we selected the optimal PRS of SBP, DBP, and PP for decile analyses of their respective traits and modeled the joint effect of the optimal PRS for SBP and DBP for hypertension analyses. Then we applied linear and logistic regression with adjustment for sex to compare BP levels and risk of hypertension, respectively, in all deciles versus the bottom decile of the PRS distribution of 10,210 Lifelines individuals. P-values were calculated from the normal distribution for BP traits and from a chi-square distribution with two degrees of freedom for hypertension.
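The decile comparison can be illustrated with a simplified sketch. It assigns rank-based deciles and reports unadjusted mean differences against the bottom decile. The study used sex-adjusted regression, so this is a stand-in for intuition only, with hypothetical function names.

```python
def assign_deciles(scores):
    """Rank-based decile (1 = lowest, 10 = highest) for each score."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    deciles = [0] * n
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // n + 1
    return deciles


def mean_difference_vs_bottom(values, deciles):
    """Mean trait value in each decile minus the mean in decile 1."""
    groups = {}
    for value, decile in zip(values, deciles):
        groups.setdefault(decile, []).append(value)
    bottom = sum(groups[1]) / len(groups[1])
    return {
        d: sum(vs) / len(vs) - bottom
        for d, vs in sorted(groups.items())
        if d != 1
    }


prs = list(range(20))                      # toy scores
dec = assign_deciles(prs)
sbp = [100.0 + d for d in dec]             # toy trait rising with decile
diffs = mean_difference_vs_bottom(sbp, dec)
```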
Hypertension model performance and calibration assessment in Lifelines
Hypertension model discrimination and calibration were examined by calculating the area under the receiver operating characteristic curve (AUROC)39,40 and the Brier score41,42, respectively. These analyses were implemented using the pROC R-package43 with 10-fold cross-validation. An AUROC value of 0.5 indicates no discriminative ability, while a value of 1 indicates perfect discrimination. The Brier score is the average squared difference between predicted probability and observed outcome, with values approaching zero indicating better calibration. The cut-off value of hypertension odds used to predict high risk was identified using the Youden index, the point on the ROC curve where the sum of sensitivity and specificity is maximized. Statistics were calculated for two models: 1) a model including covariates used in GWAS-meta-analyses (sex, age, age², BMI; model 1), and 2) a model including covariates and optimized PRS for SBP and DBP (model 2).
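For readers who want the definitions in executable form, the following pure-Python sketch computes the three quantities from predicted probabilities and binary outcomes. It is not the pROC implementation used in the study, it omits cross-validation, and it works on probabilities rather than odds. All names are hypothetical.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def auroc(probs, outcomes):
    """Probability that a random case scores above a random control."""
    cases = [p for p, y in zip(probs, outcomes) if y == 1]
    controls = [p for p, y in zip(probs, outcomes) if y == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases for k in controls
    )
    return wins / (len(cases) * len(controls))


def youden_cutoff(probs, outcomes):
    """Cut-off maximizing J = sensitivity + specificity - 1."""
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, outcomes) if p >= cut and y == 1)
        tn = sum(1 for p, y in zip(probs, outcomes) if p < cut and y == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


p_hat = [0.1, 0.4, 0.35, 0.8]
y_obs = [0, 0, 1, 1]
```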
Decile analyses of full BP PRS for hypertension in UKB African-ancestry individuals
To evaluate to what extent BP PRS were predictive for hypertension in non-European ancestry individuals, we modeled the joint effect of the optimal PRS for SBP and DBP on odds of hypertension in UKB African-ancestry individuals.
SNPs were imputed centrally by UKB using a reference panel that merged the UK10K and 1000 Genomes Phase 3 panel as well as the Haplotype Reference Consortium (HRC) panel. For our analysis, only SNPs imputed from the HRC panel were considered. Data were limited to genotyped and imputed variants with imputation INFO scores of 0.4 or higher, HWE p-values >5×10⁻⁸, and MAF >0.01. The hypertension phenotype was provided by UKB.
We performed our analysis using sex-adjusted logistic regression to compare risk of hypertension in all deciles versus the bottom decile of the distribution of 3,341 UKB self-reported “Black” individuals. P-values were calculated from a chi-square distribution with two degrees of freedom.
Comparison of REML methods to calculate heritability
The SNP-based heritability (h²SNP) of BP traits has previously been calculated within the N~457k UKB cohort GWAS dataset using the restricted maximum likelihood (REML) method BOLT-REML44, e.g. with h²SNP=21.3% for SBP5. To check consistency across different software, and to compare with previously published results, we calculated h²SNP of SBP within the UKB BP-GWAS dataset using GCTA-GREML36. The full imputed genetic data were converted from BGEN dosage format into hard-call genotyped PLINK format. SNPs were filtered according to MAF >1% and high imputation quality with INFO ≥0.9 from the central UKB QC, and then restricted to only the set of SNPs present in our full meta-analysis dataset. Because of the large amount of memory that GCTA software requires, we selected a representative subset from UKB for our analysis. We calculated percentiles of principal components PC1 and PC2 of all individuals from the centrally provided UKB QC data and extracted the most homogeneous subset of individuals centered around the median data-points, with both PC1 and PC2 within the 40-60th percentile range, resulting in a subset sample size of N=19,410. Within GCTA, the genetic relatedness matrix (GRM) was generated for each autosome separately, then merged together and filtered for relatedness according to a 0.2 cut-off to remove any first- and second-degree relatives. Then h²SNP for SBP was calculated with adjustment for the same covariates applied to the UKB BP-GWAS, namely: sex, age, age², BMI, genotyping chip array and the top ten PCs. One-tailed p-values were calculated from the h²SNP and SE results in base R.
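Two steps of this analysis lend themselves to a short sketch: selecting the central PC1/PC2 subset and deriving a one-tailed p-value from an estimate and its standard error. The sketch is in Python rather than R, the names are hypothetical, and the p-value assumes a normal (Wald) approximation, which the text does not state explicitly.

```python
import math
from statistics import quantiles


def central_subset(pcs, lower=40, upper=60):
    """Ids whose PC1 and PC2 both fall within the given percentile range.

    pcs: dict id -> (pc1, pc2)
    """
    bounds = []
    for k in (0, 1):
        cuts = quantiles([v[k] for v in pcs.values()], n=100, method="inclusive")
        bounds.append((cuts[lower - 1], cuts[upper - 1]))
    return [
        i for i, v in pcs.items()
        if all(lo <= v[k] <= hi for k, (lo, hi) in enumerate(bounds))
    ]


def one_tailed_p(estimate, se):
    """Upper-tail p-value for H0: estimate = 0 (normal approximation)."""
    z = estimate / se
    return 0.5 * math.erfc(z / math.sqrt(2.0))


toy_pcs = {i: (float(i), float(100 - i)) for i in range(101)}
subset = central_subset(toy_pcs)
```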
Heritability analyses in Lifelines data
We used GCTA-GREML45 to calculate h²SNP for BP in exactly the same Lifelines dataset as in the variance explained analyses (N=10,210). SNPs in Lifelines were restricted to the same list of SNPs used in the UKB GCTA-GREML45 analyses. Then h²SNP for SBP, DBP, and PP was calculated with adjustment for sex, age, age², BMI, and ten PCs.
In Silico Transcriptome-wide association study
Genetically Predicted Gene Expression Analysis
Our in silico transcriptome-wide association study (TWAS) was performed using S-PrediXcan46, an approach that imputes genetically predicted gene expression in a given tissue and tests predicted expression for association with a GWAS outcome using SNP-level summary statistics. For this study, input included summary statistics from each of the meta-analyses (SBP, DBP, and PP) and gene-expression references for five tissues from GTEx47 v7 including aorta, tibial artery, left ventricle, atrial appendage, and whole blood. Our analyses incorporated covariance matrices based on 1000 Genomes20 European populations to account for LD structure. Bonferroni-corrected significance threshold was 1.55x10-6 to account for the total number of gene models assessed across all tissues in these analyses.
Colocalization analysis
The hypothesis that a single variant underlies GWAS and eQTL associations at a given locus (i.e. colocalization) was tested using coloc48, a Bayesian gene-level test that evaluates GWAS and eQTL association summary statistics at each SNP at the locus and provides gene- and SNP-level posterior probabilities for colocalization. For this analysis, input included results for common variants in our study and eQTL summary statistics corresponding to the gene-expression references used in S-PrediXcan analysis, restricting to only variants included in the S-PrediXcan models. Output includes posterior probabilities for the null hypothesis (PP.H0) that SNPs at the locus are associated with neither gene expression nor the outcome (i.e. SBP, DBP or PP), the first alternative hypothesis (PP.H1) that SNPs are associated with expression but not the outcome, the second alternative hypothesis (PP.H2) that SNPs are associated with the outcome but not expression, the third alternative hypothesis (PP.H3) that SNPs are associated with both expression and the outcome but not colocalized, and the fourth alternative hypothesis (PP.H4) that SNPs associated with both expression and the outcome are colocalized. Also included are annotations of the SNP with the highest PP.H4 at each locus and the corresponding posterior probability. A PP.H4 greater than 90% was considered evidence of colocalization.
Data availability statement
Full summary statistics of our analyses are available from the study authors upon request. Summary statistics for sentinel SNPs for each BP-trait, as well as optimized PRS, are available in Supplementary Tables. Statistically significant reports for S-PrediXcan results for all 5 tissues for all BP-traits evaluated are also made available in the Supplementary Tables.
Ethics statement
Our study is based on meta-analysis of previously published, publicly available data for which appropriate site-specific Institutional Review Boards and ethical review at local institutions have previously approved use of this data.