Study subjects: The Tehran Cardiometabolic Genetic Study (TCGS) is a family-based genetic study conducted within the Tehran Lipid and Glucose Study (TLGS), the oldest Iranian cohort. In total, more than 20,000 participants have been followed in 7 phases since 1999, and at each phase they underwent a clinical examination covering more than 230 metabolism-related traits. The Medical Ethics Committee of Shahid Beheshti University of Medical Sciences approved this study. All participants gave written informed consent to participate in the original cohort and in TCGS; for younger participants, written informed consent was obtained from their parents or guardians. Details of the clinical investigation protocols can be found in the original TCGS and TLGS papers [14,15].
In the present study, 17,462 participants from five phases of TCGS (1999-2014) were included to investigate hypertension (HTN). Details of DNA sample collection, the genotype quality control process, phenotype measurements, covariate imputation, case definition, and selection criteria are provided in the Supplementary file.
Study Design: Based on critical points in the age trajectory of blood pressure (BP) changes, all subjects aged 18 years or above were included in the present study, and the mean values of systolic and diastolic BP (SBP and DBP) across follow-up visits were used as quantitative traits in the genome-wide association (GWA) analysis. For the binary trait analysis, incident HTN cases and a random sample of healthy individuals with two or more follow-up records were included (Supplementary file, Figures S1 and S2). Age, body mass index (BMI), waist circumference (WC), and insulin resistance were included as covariates in the GWA analyses of both quantitative and binary traits; their missing values were first imputed with the Expectation-Maximization with Bootstrapping (EMB) algorithm implemented in the Amelia package in R [16] (Supplementary file, Figure S3).
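To make the EMB idea concrete, the sketch below assumes the simplest possible case: one fully observed covariate (BMI) and one partially missing covariate (WC) that are jointly normal. In that case the EM estimates for WC given BMI coincide with ordinary least squares on the complete rows (up to the residual-variance divisor), so each bootstrap replicate reduces to an OLS fit followed by random draws from the conditional normal distribution. This is not the Amelia implementation, which runs full multivariate EM under arbitrary missingness patterns; the function names and example values are hypothetical.

```python
import random
import statistics


def fit_conditional(xs, ys):
    """OLS fit of y on x over rows where y is observed.

    Returns (intercept, slope, residual SD), or None when there are fewer
    than 3 observed rows or x has no spread (the fit would be undefined).
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    if len(pairs) < 3:
        return None
    mx = statistics.fmean(x for x, _ in pairs)
    my = statistics.fmean(y for _, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    if sxx == 0:
        return None
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in pairs)
    return intercept, slope, (rss / (len(pairs) - 2)) ** 0.5


def emb_impute(xs, ys, m=5, seed=2024):
    """Return m completed copies of ys; observed values are left untouched."""
    if fit_conditional(xs, ys) is None:
        raise ValueError("need at least 3 observed values with varying x")
    rng = random.Random(seed)
    n = len(xs)
    completed = []
    while len(completed) < m:
        # Bootstrap: resample rows with replacement, then re-estimate parameters.
        idx = [rng.randrange(n) for _ in range(n)]
        fit = fit_conditional([xs[i] for i in idx], [ys[i] for i in idx])
        if fit is None:
            continue  # degenerate resample; draw another one
        a, b, sd = fit
        # Draw each missing y from its estimated conditional normal distribution.
        completed.append(
            [y if y is not None else a + b * x + rng.gauss(0.0, sd) for x, y in zip(xs, ys)]
        )
    return completed


# Hypothetical covariate values; None marks a missing WC measurement.
bmi = [22.0, 25.5, 30.1, 27.3, 24.8, 29.0, 21.5, 26.4]
wc = [78.0, 88.0, None, 93.0, 85.0, None, 76.0, 90.0]
datasets = emb_impute(bmi, wc, m=5)
```

Each of the `m` completed datasets can then be analysed separately, which is how bootstrapped imputation propagates uncertainty about the missing values.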
Quality control of genotypes: To minimize the loss of individuals and markers and thereby preserve statistical power, quality control (QC) of the genotyping data was performed per individual before per marker, using PLINK version 1.9 and R [17]. Accordingly, a standard QC pipeline was applied to 652,919 SNPs, with an average inter-marker distance of 4 kilobases, in 7,694 adults after excluding individuals with discordance between genetically inferred and self-reported sex, genotype rate ≤10%, missing phenotype, genotype failure rate ≥3%, or outlying heterozygosity (F statistic beyond ±3 standard deviations). Moreover, related subjects with Identity By Descent (IBD) ≥ 18.5% were excluded from the study. At the variant level, SNPs with minor allele frequency (MAF) < 1%, missing genotype calls > 5%, or deviation from Hardy-Weinberg Equilibrium (HWE) at P < 9×10⁻⁶ in the presence of BP traits were filtered out (Supplementary file, Figure S4).
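The per-individual and per-marker filters above can be sketched in plain Python, assuming genotypes coded as alternate-allele counts (0, 1, 2, or None for a missing call). The HWE P-value follows the exact test of Wigginton, Cutler and Abecasis; the heterozygosity F statistic is a simplified method-of-moments estimate similar to what PLINK's `--het` reports, although PLINK's exact expected-count computation may differ. Thresholds default to the values stated in the text; the function names are hypothetical, and the actual pipeline ran in PLINK 1.9.

```python
import statistics


def hwe_exact(n_het, n_hom1, n_hom2):
    """Exact two-sided HWE P-value for one biallelic SNP."""
    n_homr, n_homc = sorted((n_hom1, n_hom2))
    n_gen = n_het + n_homr + n_homc
    if n_gen == 0:
        return 1.0
    rare = 2 * n_homr + n_het  # copies of the rarer allele
    probs = [0.0] * (rare + 1)
    # Start at the most likely heterozygote count, with the same parity as `rare`.
    mid = rare * (2 * n_gen - rare) // (2 * n_gen)
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0
    # Walk down: fewer heterozygotes (relative probabilities).
    hets, homr = mid, (rare - mid) // 2
    homc = n_gen - hets - homr
    while hets > 1:
        probs[hets - 2] = probs[hets] * hets * (hets - 1) / (4.0 * (homr + 1) * (homc + 1))
        hets, homr, homc = hets - 2, homr + 1, homc + 1
    # Walk up: more heterozygotes.
    hets, homr = mid, (rare - mid) // 2
    homc = n_gen - hets - homr
    while hets <= rare - 2:
        probs[hets + 2] = probs[hets] * 4.0 * homr * homc / ((hets + 2) * (hets + 1))
        hets, homr, homc = hets + 2, homr - 1, homc - 1
    total = sum(probs)
    p_obs = probs[n_het]
    # Two-sided P: probability of all tables no more likely than the observed one.
    return min(1.0, sum(p for p in probs if p <= p_obs) / total)


def marker_passes(calls, maf_min=0.01, max_missing=0.05, hwe_p_min=9e-6):
    """Per-marker QC: keep SNPs with MAF >= 1%, missingness <= 5%, HWE P >= 9e-6."""
    called = [g for g in calls if g is not None]
    if not called:
        return False
    missing_rate = (len(calls) - len(called)) / len(calls)
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    p_hwe = hwe_exact(called.count(1), called.count(0), called.count(2))
    return maf >= maf_min and missing_rate <= max_missing and p_hwe >= hwe_p_min


def inbreeding_f(genotypes, alt_freqs):
    """Method-of-moments F: (observed - expected homozygotes) / (markers - expected)."""
    obs_hom = exp_hom = 0.0
    n = 0
    for g, p in zip(genotypes, alt_freqs):
        if g is None:
            continue  # skip missing calls
        n += 1
        if g != 1:
            obs_hom += 1
        exp_hom += 1 - 2 * p * (1 - p)  # expected homozygosity under HWE
    return (obs_hom - exp_hom) / (n - exp_hom)


def het_outliers(f_values, k=3.0):
    """Indices of individuals whose F lies outside mean +/- k SD."""
    mu = statistics.fmean(f_values)
    sd = statistics.stdev(f_values)
    return [i for i, f in enumerate(f_values) if abs(f - mu) > k * sd]


# Hypothetical marker with 25/50/25 genotypes, close to HWE proportions.
example_calls = [0] * 25 + [1] * 50 + [2] * 25
print(marker_passes(example_calls))  # True
```

Calling `marker_passes` only after failing individuals have been removed mirrors the per-individual-before-per-marker order described above.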
Statistical analysis: After checking all covariates for collinearity (r² > 0.8), linear and logistic regression association tests were performed. Additive and overdominant inheritance models for autosomal SNPs were tested in the GWAS of the quantitative and binary traits, respectively, in PLINK v1.9. The conventional genome-wide significance threshold of P < 5×10⁻⁸ was used. The genomic inflation factor was computed for each analysis, and observed versus expected P values were plotted in Q-Q plots to check for population stratification. Finally, the predictive accuracy of the initial GWAS results for the quantitative traits was evaluated with four regression-based multivariate analyses [18]. For these, a Polygenic Risk Score (PRS) was calculated after adjusting for the covariates, and the proportion of variance explained by the genomic variants (R²) was computed. Moreover, the discriminative ability of the PRS was assessed for each analysis, stratified by sex.
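The quantities named in this paragraph can be illustrated with a few small functions: overdominant genotype coding, the genomic inflation factor λ (the median 1-df chi-square statistic divided by its null median, about 0.455), a weighted-sum PRS, variance explained computed as a squared correlation with a covariate-adjusted trait, and the AUC as a measure of discrimination. This is a generic sketch under those standard definitions, not a reproduction of the four regression-based analyses of [18]; covariate adjustment uses a single covariate for brevity, and all names are hypothetical.

```python
from statistics import NormalDist, fmean, median


def overdominant(g):
    """Overdominant coding of an alt-allele count: heterozygote 1, either homozygote 0."""
    return 1 if g == 1 else 0


def lambda_gc(pvalues):
    """Genomic inflation factor from two-sided P values in (0, 1]."""
    z = NormalDist()
    chi2 = [z.inv_cdf(1 - p / 2) ** 2 for p in pvalues]  # 1-df chi-square statistics
    return median(chi2) / (z.inv_cdf(0.75) ** 2)  # null median is about 0.455


def residualize(y, x):
    """Residuals of y after OLS adjustment for a single covariate x."""
    mx, my = fmean(x), fmean(y)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [c - my - b * (a - mx) for a, c in zip(x, y)]


def prs(dosages, betas):
    """Weighted allele-count score per person: sum over SNPs of beta_j * dosage_ij."""
    return [sum(b * d for b, d in zip(betas, row)) for row in dosages]


def r_squared(x, y):
    """Squared Pearson correlation, i.e. the share of variance in y explained by x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def auc(scores, labels):
    """ROC AUC: probability that a case outscores a control (ties count 1/2)."""
    cases = [s for s, lab in zip(scores, labels) if lab == 1]
    controls = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0 for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

In a sex-stratified assessment, `auc` (or `r_squared` on `residualize`d traits) would simply be evaluated separately within men and women.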
Confirmation study: The confirmation study was conducted on 1,618 participants (age range 1 to 93 years) from 210 TCGS families selected for familial aggregation of HTN, i.e., at least two affected members. The same QC processes as in the initial GWAS were applied in the confirmation study. After removing Mendelian errors and pruning out SNPs in linkage disequilibrium (LD) with a correlation coefficient > 0.2, the effects of the independent SNPs that were significant in the initial GWAS were tested, adjusting for the same covariate sets, with a two-level Haseman-Elston regression model in SAGE version 6.4 [19].
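For readers unfamiliar with Haseman-Elston regression, the sketch below shows its classic single-level form for sib pairs: the squared trait difference of each pair is regressed on the estimated proportion of alleles the pair shares identical by descent (IBD) at a marker. The two-level, covariate-adjusted model fitted in SAGE extends this idea and is not reproduced here; the function name and example pairs are hypothetical.

```python
from statistics import fmean


def haseman_elston(pairs):
    """Classic Haseman-Elston regression for sib pairs.

    pairs: (trait_sib1, trait_sib2, pi_hat), where pi_hat is the estimated
    proportion of alleles shared IBD at the marker.
    Returns (intercept, slope). A negative slope suggests linkage: sibs that
    share more alleles IBD tend to have more similar trait values.
    """
    y = [(t1 - t2) ** 2 for t1, t2, _ in pairs]  # squared trait difference
    x = [pi for _, _, pi in pairs]
    mx, my = fmean(x), fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


# Hypothetical SBP values (mmHg) for three sib pairs with IBD sharing 1, 0.5 and 0.
example_pairs = [(120.0, 120.0, 1.0), (120.0, 121.0, 0.5), (120.0, 122.0, 0.0)]
intercept, slope = haseman_elston(example_pairs)
print(intercept, slope)  # slope is negative (about -4)
```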
Post GWAS: Three consecutive steps were followed to explore the probable functions and pathophysiologic pathways underlying the effects on BP of the discovered variants with P values below 1×10⁻⁴. In the first step, chromosomal coordinates, genes, transcripts, and variant consequences on protein sequence were annotated with the Ensembl Variant Effect Predictor [20]. In the second step, GWAS Catalog information was retrieved to identify reported associations of the specified loci with BP traits [21]. Moreover, we sought to map known BP loci by assessing their functional consequences in Ensembl [22]. Where a detected locus coincided with a previously reported one, the LDlink browser of the National Cancer Institute (http://analysistools.nci.nih.gov/LDlink/) was used to check LD between the detected and previously reported variant(s) in three populations: South Asians, East Asians, and Europeans. In the third step, detailed information on all loci was retrieved separately from Open Targets POST GWAS, including disease associations, protein interactions, pathways, similar targets based on diseases in common, and baseline RNA and protein expression by anatomical system and organ [23]. Finally, the overall association score was retrieved for each locus and for other genes whose protein products interact with those of the new loci in protein interaction networks [24,25]. The Open Targets Platform derives this score from 20 data sources; it ranges from 0 to 1, where 0 implies no evidence and 1 corresponds to the most reliable supporting evidence, based on the frequency, severity, and significance of the association [26].
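Two mechanical parts of this workflow can be sketched under stated assumptions. The first function selects variants with P < 1×10⁻⁴ and writes them in the Ensembl VEP default input format (chromosome, start, end, REF/ALT alleles, strand, identifier), handling single-nucleotide variants only. The second illustrates how evidence from several data sources can be combined into one score between 0 and 1: it assumes an unweighted harmonic sum normalized by its maximum, which follows the general approach described in Open Targets documentation, while the platform's actual per-source weights and normalization may differ. All field names and example values are hypothetical.

```python
def vep_default_lines(hits, p_max=1e-4):
    """Format SNVs with P < p_max as Ensembl VEP default-format input lines.

    Each hit is a dict with hypothetical keys: chrom, pos, ref, alt, snp_id, p.
    Columns: chromosome, start, end, REF/ALT, strand, identifier (start == end for SNVs).
    """
    return [
        f"{h['chrom']}\t{h['pos']}\t{h['pos']}\t{h['ref']}/{h['alt']}\t+\t{h['snp_id']}"
        for h in hits
        if h["p"] < p_max
    ]


def harmonic_sum(scores):
    """Combine per-source evidence scores in [0, 1] into one score in [0, 1].

    Scores are sorted in decreasing order and the i-th is weighted by 1/i**2, so
    the strongest sources dominate and additional weaker evidence adds
    progressively less. Dividing by the value obtained when every source scores 1
    keeps the result in [0, 1].
    """
    ordered = sorted(scores, reverse=True)
    raw = sum(s / i**2 for i, s in enumerate(ordered, start=1))
    max_raw = sum(1 / i**2 for i in range(1, len(ordered) + 1))
    return raw / max_raw if ordered else 0.0


hits = [
    {"chrom": "3", "pos": 12345, "ref": "A", "alt": "G", "snp_id": "hypothetical_snp_1", "p": 3e-5},
    {"chrom": "7", "pos": 67890, "ref": "C", "alt": "T", "snp_id": "hypothetical_snp_2", "p": 0.02},
]
vep_lines = vep_default_lines(hits)  # only hypothetical_snp_1 passes P < 1e-4
print(vep_lines)
print(harmonic_sum([0.9, 0.2, 0.05]))
```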