This work was approved by the Institutional Review Board of Washington University in St. Louis and complies with all relevant ethical regulations. For each of the participating cohorts, the appropriate ethics review board approved the data collection and all participants provided informed consent.
Data Harmonization
Data from each cohort (Supplementary Tables S1-S2) were harmonized following this centralized protocol. Data were stratified by population group, based on self-reported ancestry and individual cohort definitions (AFR: African, EAS: East Asian, EUR: European, HIS: Hispanic/Latinos, SAS: South Asian), and sex (combined sex, female sex, male sex). Analyses considered 3 primary blood pressure (BP) traits as outcome variables (SBP: systolic, DBP: diastolic, PP: pulse pressure) and 2 dichotomous lifestyle exposures (LTST: long total sleep time, STST: short total sleep time). Genetic variants (G) were restricted to autosomal chromosomes 1–22 imputation quality≥0.3, and minor allele frequency≥0.1%. Age was restricted to ≥18 years, and reported total sleep time constrained within 3 and 14 hours. In scenarios of multiple visits, the single visit with largest sample size was utilized and in case-control study designs, cases and controls were required to be analyzed separately. For BP outcome measures, if multiple readings were taken in a single visit the mean was used. All BP values were winsorized at 6 standard deviations from the mean. BP values were adjusted for reported use of anti-hypertensive medications as follows: SBP (+ 15 mmHg) and DBP (+ 10 mmHg). PP was derived as SBP – DBP. In the case of studies with known between-sample relatedness, null model residuals (regressing BP traits on a kinship matrix/genetic covariance matrix) were denoted as the BP outcome. STST and LTST were derived from total sleep time (TST) by regressing TST on age, sex, age×sex and using the residuals’ 20th and 80th percentiles as cutoffs (STST = 1 if ≤ 20th percentile, LTST = 1 if ≥ 80th percentile, STST = 0 if > 20th percentile, LTST = 0 if < 80th percentile). Covariates included population-group specific principal components, cohort-specific confounders (study center), age, age2, sex, age×S/LTST, age2×S/LTST, and sex×S/LTST. Samples with missing data were excluded.
Genome-wide Gene-Sleep Interaction Analysis
After data harmonization, each population-group specific cohort ran 2 regression models (M1 and M2) for 18 phenotype-exposure-sex combinations (3 phenotypes x 2 exposures x 3 sex groups: combined sex, female sex, male sex). Below E denotes the lifestyle exposure (STST or LTST), Y denotes the BP outcome (SBP, DBP, or PP), C1 denotes the vector of covariates incorporating E (age, age2, S/LTST, age*S/LTST, age2*S/LTST, sex, sex*S/LTST), and C2 denotes the subset without incorporating E (age, age2, sex). Female-specific and male-specific analyses were not adjusted for sex. Specialized software choice included LinGxEScanR v1.0 (https://github.com/USCbiostats/LinGxEScanR), GEM v1.4.1 (https://github.com/large-scale-gxe-methods/GEM), and/or MMAP (latest version available) (https://github.com/MMAP/MMAP.github.io) with robust standard errors (SEs) enforced(9). One degree of freedom (df) tests for the marginal effect (βM2_G), the main effect (βM1_G), and the interaction effect (βM1_GxE) were conducted; alongside the 2df joint test that simultaneously assesses the main effect and the interaction effect (βM1_G, βM1_GxE)(10).
Model 1 (Primary GxE Model of Interest)
(1)M1: Y = βM1_0 + βEE + βM1_GG + βM1_GxE E⋅G + βM1_C1C1
Model 2 (Marginal Effect Model for Comparison)
(2) M2: Y = βM2_0 + βM2_GG + βM2_C2C2
Summary statistics were centrally processed after individual studies submitted results. EasyQC2 software (www.genepi-regensburg.de/easyqc2) was used to perform quality control (QC) on resultant data(11). Data were filtered for degrees of freedom≥20 calculated as minor allele count * imputation quality (e.g. MACxR2 provided by each cohort) within the unexposed, the exposed, and the total sample. Missing or invalid/out of range values for statistics and duplicated or monomorphic variants were discarded. hg19 genomic coordinates were lifted over to hg38 genomic coordinates. Allele frequency discrepancies relative to TOPMed-imputed 1000G reference panels (Trans-Omics for Precision Medicine imputed 1000Genomes) were assessed for each specific population group, along with genomic control (GC) lambda inflation. Meta-level quality control was conducted within groups based on population group, with evaluation of unwanted centering of the outcome variable, outlying cohorts highlighting unstable numerical computation, or alarming inflation.
Meta-Analysis
Meta-analysis was designed as the following paradigm. Cross-population meta-analysis (CPMA) was designed to be combine all population group results, with additional focused population-group specific and sex-specific analyses. This resulted in 18 total meta-analyses to be run: 6 population groups (CPMA, EUR, HIS, EAS, AFR, SAS) and 3 sex groups (combined sex, female sex, male sex). To accomplish this, METAL software was first used to run all meta-analyses within each specific population group for the marginal effect (βM2_G), main effect (βM1_G), interaction effect (βM1_GxE), and joint effect (βM1_G, βM1_GxE) with GC correction for inflation(12). Inverse-variance weights were used and Manning et. al’s method for the 2df joint test (13). CPMA was subsequently executed on the resultant population-group specific METAL output results with GC correction.
Genome-wide Significant Loci Identification
EasyStrata2 software was used to prioritize top loci from significant results identified from the 1df interaction and 2df joint tests(14). GC correction for population-group specific results was applied. Variants found within 1 Mb distance of the major histocompatibility complex (MHC) region were excluded. Either minimum sample size (N > 20000) or multiple cohorts (≥3) was required as necessary criteria for processing results from a specific sex-stratified, and/or population group-stratified meta-analysis.
Significant variants were identified using the following threshold criteria: (i) i variants with significant interaction effect (PM1_GxE<5e-9, FDR < 0.05); (ii) j variants with significant joint effect (PM1_G,GxE<5e-9, FDR < 0.05) were filtered as top variants; and (iii) k top variants for the interaction effect were identified using a 2-step method – identifying first z variants by the marginal effect (PM2_G<1e-5) and then filtering these by the interaction effect (PM1_GxE<0.05/NG, FDRGxE<0.05) where NG is the number of independent tests calculated using principal components analysis on the z variants. This 2-step method was incorporated to increase power for detecting interactions(15). This design was executed to maintain both stringent threshold criteria and incorporate false discovery correction implemented by the Benjamini-Hochberg method.
All such i + j + k significant variants were narrowed down to loci based on 500 kilobase (kb) regions. Finally, within these regions independent lead variants were identified as the top significant variant within the locus, subsequently defining variants in LD as those with linkage disequilibrium (LD) r2 threshold < 0.1 using TOPMed-imputed 1000G reference panels. If variants were missing in the LD panels, then the most significant variant within each 500kb region was retained for combined sex meta-analyses results.
Prioritizing Novel Sleep Duration Interaction Loci
Significant independent loci were subsequently filtered to prioritize gene-sleep duration interaction loci. From the 1df interaction test, X interaction loci were prioritized as those not found within 1Mb of previously identified gene-sleep duration loci for BP(8). Loci were annotated as whether novel for BP genetic architecture, or not, by checking for overlap with 1Mb of previous GWAS variants (Supplementary Table S3).
For the 2df test, first loci were filtered to those variants not found within 1Mb of previous GWAS identified variants for BP traits, and with insignificant marginal effect (PM2_G >5e-09, FDRM2_G >0.05). From these variants, Y loci were prioritized as driven by interaction if they harbored a stronger interaction effect relative to the main effect (PM1_GxE < PM1_G), and Z loci deemed as supported (but not driven) by interaction if this was not true.
Thus, collectively X + Y gene-sleep duration interaction loci were highlighted, alongside secondarily Z loci supported by interaction.
Heterogeneity by Sex
To test for interaction effects showing evidence of heterogeneity by sex (p < 0.05/Q), two-sample Z-tests assuming independence, were conducted for each of the top interaction loci and adjusted for multiple testing.
Mapped Protein Coding Genes
Gene mapping prioritized protein-coding genes for downstream interpretation. Variants directly overlapping protein-coding gene regions were top priority criteria for gene assignment. For intergenic variants nearest distance to transcription start site (TSS) or gene start/end site was queried from Open Target Genetics v22.10(16) or MyGene.Info using Python package mygene v3.2.2 (https://github.com/biothings/mygene.py). Variant mapping annotations were additionally noted from Open Target Genetics, Functional Annotation of Variants – Online Resource v2.0 (FAVOR)(17), HaploReg v4.2 (https://pubs.broadinstitute.org/mammals/haploreg/haploreg.php), BRAVO variant browser (https://bravo.sph.umich.edu/freeze8/hg38/), Functional Mapping and Annotation of Genome-wide Association Studies v1.5.6 (FUMA)(18), and MyGene.Info.
Variant Annotations
FAVOR was queried to annotate deleteriousness or functionality scores(17), and RegulomeDB v2.2 was used to extract aggregate regulatory function evidence scores, along with chromatin state, DNA accessibility, overlap with transcription factor (TF) binding sites or TF motifs, and expression quantitative trait loci (eQTL)(19). FUMA’s SNP2GENE pipeline was used to annotate a comprehensive list of genes for each top locus, incorporating positional, chromatin interaction (FDR < = 1e-6, 250bp upstream − 500 bp downstream of TSS), and GTEXv8 eQTL evidence (agreeing with RegulomeDB) with the top variant or its variants in LD (r2 > 0.1 within 500kb)(18).
At the variant level, PheWeb, Open Target Genetics, Common Metabolic Diseases Knowledge Portal (https://hugeamp.org/), and Oxford Brain Imaging Genetics Server (BIG40) were queried for significant trait associations (p < 5e-08) from past GWAS(16, 20, 21). At the gene level, International Mouse Phenotyping Consortium release 19.1 (IMPC), Online Mendelian Inheritance in Man (OMIM; https://omim.org/), PheWeb, Phenotype-Genotype Integrator (PheGenI), and Open Target Genetics were queried for phenotypic annotations from mice knockout study results, involvement in Mendelian disorders, and significant trait associations (p < 5e-08) (16, 20, 22, 23). All STST and LTST mapped protein-coding genes were then queried using FUMA’s GENE2FUNC pipeline to identify significant (adjusted p-value < 0.05) pathways and traits(18). STRING v12.0 was additionally queried using medium confidence threshold (0.4) to note significantly (FDR < 0.05) enriched traits or pathways to compare and contrast LTST and STST loci (24).
Druggability Analysis
The Drug-Gene Interaction database (v4.2.0) was first utilized to identify druggability potential, with genes also annotated for implicated pathways and functions using the Kyoto Encyclopedia of Genes and Genomes database. Druggability target categories were annotated and all interacting drugs queried from reports across 43 databases (BaderLabGenes, CarisMolecularIntelligence, dGene, FoundationOneGenes, GO, HingoraniCasas, HopkinsGroom, HumanProteinAtlas, IDG, MskImpact, Oncomine, Pharos, RussLampel, Tempus, CGI, CIViC, COSMIC, CancerCommons, ChemblDrugs, ChemblInteractions, ClearityFoundationBiomarkers, ClearityFoundationClinicalTrial, DTC, DoCM, DrugBank, Ensembl, Entrez, FDA, GuideToPharmacology, JACX-CKB, MyCancerGenome, MyCancerGenomeClinicalTrial, NCI, OncoKB, PharmGKB, TALC, TEND, TTD, TdgClinicalTrial, Wikidata). Protein targets for available active ligands in ChEMBL were also noted. Gene targets were looked up in the druggable genome using the most recent druggable genome list established from the NIH Illuminating the Druggable Genome Project (https://github.com/druggablegenome/IDGTargets) available through the Pharos web platform. Lastly, FDA-approved drugs, late-stage clinical trials and disease indications were queried in the DrugBank, ChEMBL, ClinicalTrials.gov databases to provide results for the top MESH and DrugBank indications and clinical trials.