UK Biobank
Our primary dataset consists of ~450,000 individuals from the UK Biobank (UKB) with available WES data and linked health records. We only included unrelated (KING coefficient < 0.177 corresponding up to 2nd degree) individuals of European origin. To assign European ancestry, we used flashpca253 and constructed 10 principal components (PCs) for all UKB participants and participants in 1000 Genomes v3 data using ~20,000 SNPs as input. With the resulting PCs, we trained a random forest classifier based on the gold standard 1000G v3 ethnicities. We combined samples from CEU, TSI, GBR, and IBS into a European clade. Each UKB individual was classified using the random forest classifier and those having a probability of >95% of being European were put in the European ancestry. Descriptive statistics of the studied individuals are provided in Suppl. Table 2.
Phenotype definitions
We ascertained ischemic stroke status based on hospital episode stay (HES) data, death certificates and self-report using published ICD10 codes (I63-I63.9, I64.X)54. Focusing on the European subset of the UKB, we identified 8,557 cases with any ischemic stroke (AIS) and 415,947 controls. This information was used to construct a classical case/control (CC) phenotype. Non-ischemic stroke cases were removed from the analysis. We used published definitions for coronary artery disease (ICD10: I21, I22, I23, I24, I25.1, I25.2, I25.5, I25.6, I25.8, I25.9, OPCS4: K40, K41, K42, K43, K44, K45, K46, K49, K50.1, K50.2, K50.4, K75)55.
LT-FH phenotype
The UKB holds information on paternal, maternal or sibling stroke (coded in UKB fields 20107, 20110 and 20111, respectively) and the number of siblings (UKB fields 1883 and 1873). We used this information to construct the LT-FH phenotype as previously described19. Specifically, we used information on proband stroke, paternal stroke, maternal stroke, sibling stroke and number of siblings (with a maximum of 5 considered) to construct the continuous LT-FH phenotype. We used a heritability of 0.17 as proposed in the original publication19, 56. While the UKB family history fields do not distinguish between ischemic and non-ischemic stroke, we considered a family history of any stroke as being equivalent to ischemic stroke.
Rare variant burden test
For the rare variant aggregation tests, we used regenie21. Due to low power in single variant association tests, we used gene aggregation burden as our primary outcome. To create rare variant aggregation masks, we used five different minor allele frequency (MAF) bins (<0.01, <0.001, <0.0001, <1E-5, singletons).
We further determined the functional consequences of exome-wide variants using the Variant Effect Predictor (VEP) tool v10157. For the whole-exome burden test on stroke analysis, we selected rare variants (MAF < 0.01) that are either predicted to be damaging by REVEL58 (REVEL score > 0.5) or predicted to exert a high-confidence loss-of-function (LoF) effect using the LoFTEE59 plugin in VEP. We used the GrCh38 refFlat definition of genes as provided by the UCSC genome annotation database. In regenie, we used all members of the burden, SKAT60, 61 and ACAT62 families with sex, age and 10 PCs as covariates. We deemed p-values < 2.5E−6 as being significant, corresponding to a Bonferroni correction for 20,000 genes.
Replication in Biobank Japan
We sought to replicate the association of a rare variant burden in HTRA1 and stroke in the Biobank Japan also using a CC and LT-FH approach20. BioBank Japan Project was a multi-institutional hospital-based biobank that collected DNA and clinical information from about 200,000 patients with 47 common diseases between 2003 and 2007 (1st cohort). The 180K Participants were genotyped with the Illumina HumanOmniExpressExome BeadChip or a combination of the Illumina HumanOmniExpress and HumanExome BeadChip. Quality control of participants and genotypes was performed following the previous study63, 64. After pre-phasing without an external reference using EAGLE v2.4.165, the genotype data for individuals of the East Asian ancestry were imputed with JEWEL3k (3,256 Japanese whole-genome sequence data (WGS) + 2,504 1000G Phase3 v5) reference panel66 using Minimac4 v1.0.267. Imputed variants with Rsq < 0.3 were removed and the coordinates were lifted from hg19 to hg38 using GWASLab v3.3.1068. For both CC and LT-FH approaches, we excluded individuals who lacked family history information. For cases, we ascertained ischemic stroke status via interviews and reviews of medical records using a standardized questionnaire. Individuals without ischemic stroke, cerebral aneurysm, intracerebral hemorrhage, subarachnoid hemorrhage, or unruptured cerebral aneurysm were selected as controls. We identified 18,865 cases with any ischemic stroke (AIS) and 124,284 controls. For LT-FH phenotype, we further used the information on paternal or maternal ischemic stroke status from interviews of medical records. This status for siblings was not available in Biobank Japan. Rare variant burden test was conducted using the same methods as in UKB.
Biochemical characterization of UKB variants
Variant selection: Focusing on missense variants targeting the HTRA1 protease domain (amino acids 157-370), UKB variants were selected for biochemical assessment based on residue conservation in human HTRAs according to Clustal Omega69. Specifically, all variants strictly conserved in human HTRA1-4 (n= 48), 29 out of 30 variants conserved in 3 human HTRAs (1 variant not included in the HTRA1 trimer structure and thus not selected), and 2 out of 42 variants conserved in ≤ 2 human HTRAs including R227W, the most frequent exonic mutation in the UKB, were selected for analysis (Figure 1).
Protease activity measurements: Eukaryotic expression plasmids encoding either full-length human HTRA1 (aa 1-480) fused to a C-terminal Myc-(His)6 tag or the N-terminal region of human LTBP1s (aa 1-689) fused to a C-terminal V5-(His)6 tag have been previously described13, 16. Mutagenesis was conducted using the QuickChange Lightning Site-Directed Mutagenesis kit (Agilent Technologies).
Human embryonic kidney (HEK-293E) cells were grown in Dulbecco’s modified Eagle‘s medium (DMEM) containing GlutaMAX, 10% (v/v) fetal calf serum (FCS), 100 U/ml penicillin and 100 μg/ml streptomycin (all from Invitrogen) at 37°C and 5% CO2. Cells were seeded in poly-L-lysine (Invitrogen) coated 6-well-plates, transfected with Lipofectamine 2000 (Invitrogen), and maintained in serum-free medium for 48 h. Secretomes containing the protein of interest were collected and centrifuged for 10 min at 400 g to remove debris. Where indicated, cells were lysed in 50 mM Tris, 150 mM NaCl, 1% (v/v) triton X-100, 1% (w/v) sodium deoxycholate, 0.1% (w/v) sodium dodecyl sulphate, pH 7.4 for 20 min at 4°C and samples centrifuged 20 min at 16,000 g. Secretomes from non-transfected cells and from cells overexpressing wild type (wt) HTRA1 or the inactive variant S328A were included in each run as controls.
HTRA1 levels were evaluated by anti-Myc immunoblot (Santa Cruz Biotechnology, #sc-40, 1:2,500) using horseradish peroxidase-coupled secondary antibodies (Dako), immobilon Western kit reagents (Millipore) and the Fusion FX7 documentation system (Vilbert Lourmat). Actin (Sigma-Aldrich Ab #A2066; dilution 1:500) served as loading control for the analysis of cell lysates.
Secretomes from cells overexpressing LTBP1 (5 µl) were exposed to secretomes from non-transfected cells, or from cells overexpressing HTRA1 (20 µl) for 24 h at 37°C and samples were analyzed by anti-V5 immunoblot (Invitrogen, #R960-25, 1:10,000). Where appropriate, wt or mutant HTRA1-containing secretome was diluted in secretome from non-transfected cells to measure HTRA1 activity in samples containing comparable HTRA1 concentrations. Datapoints exhibiting low residual intact LTBP1 or low mutant HTRA1 levels compared to wt HTRA1 (see Source Data) were excluded.
LTBP1 and HTRA1 signals were quantified using ImageJ. LTBP1 processing was evaluated as the ratio cleaved / intact LTBP1 and HTRA1 protease activity was estimated as the ratio LTBP1 processing / HTRA1 levels. The activity of wt HTRA1 was set to 1. Activity thresholds for follow-up analyses were arbitrarily set as follows: (i) activity estimates <0.25, (ii) >0.25 to <0.5, (iii) >0.5 to <1.0 and (iv) >1.
Trimer interface prediction
We used the PyMOL plugin “InterfaceResidues.py” (https://pymolwiki.org/index.php/InterfaceResidues) to predict the HTRA1 trimer interface.
Correlation of effects
We constructed Pearson correlation coefficients between (i) effect sizes from single-variant analyses of logWMH (white matter hyperintensity) volume in the UKB vs. effect sizes constructed from the average HTRA1 activity ratio and (ii) effect sizes from single-variant analyses of LT-FH the UKB vs. effect sizes constructed from the average HTRA1 activity. All analyses were performed using R 4.3.0.
Predicting individual HTRA1 activity
To assess the association of HTRA1 activity with cardiovascular endpoints, we genetically proxied HTRA1 activity for each individual in the UKB. For this, we set HTRA1 activity for individuals without a rare HTRA1 protease domain mutation to 100%. For mutation carriers we multiplied the measured average HTRA1 activity from the in-vitro assays with 100%. This activity estimate was used as an exposure in our analysis. We used Generalized Additive Models (GAM) to show association between this exposure and the LT-FH phenotype and logWMH volume, respectively.
PheWAS
To explore the association of predicted HTRA1 activity with the full range of phenotypes encoded in the UKB, we used DeepPheWAS to assign participants to standardized Phecodes.22 We used all ICD10 codes (main position, secondary position, death records) from the UKB. We excluded Phecodes with <100 cases and Phecodes that are male- or female-specific. Individuals were assigned a case status if >1 ICD10 code mapped to the respective Phecode. Individuals meeting the pre-specified exclusion criteria were removed from the analysis, otherwise the individual was assigned a control status. We used logistic regression with age, sex and 10 PCs as covariate to test rare variant carrier status (0/1) against the phenotype of interest. Results with p<0.05 were corrected with Firth’s correction. We considered results with FDR<5% as statistically significant.
Colocalization of common variants
We obtained summary statistics for any ischemic stroke and small vessel stroke from the GIGASTROKE consortium4, for coronary artery disease (CAD) from the CARDIoGRAMplusC4D Consortium5, for lacunar stroke from the UK DNA Lacunar Stroke studies 1 and 2 and from collaborators within the International Stroke Genetics Consortium27, and for migraine through the International Headache Genetics Consortium (IHGC)14. We used an interval of +-150kb from HTRA1 and included all SNPs that were available for analysis for all phenotypes. We used Hyprcoloc28 and synchronized effect sizes and standard errors to perform colocalization and report the resulting posterior probability of colocalization.
Blood eQTL and pQTL data were obtained from the eQTLgen consortium29 and deCODE30, respectively. We used Primo31 to perform pleiotropy analyses between eQTL, pQTL and disease phenotype data and report posterior probability of pleiotropy. We used an interval of +-150kb from rs2672592 and included all SNPs that were available for analysis for all phenotypes.
Stratification of individuals into disease risk classes
Using imputed genotype dosage of rs2672592 and rare HTRA1 protease domain carrier status, we used logistic regression with age, sex and 10 PCs as covariates. We set the lowest risk category (rs2672592 TT and no rare variant) as the reference category and all odds ratios are reported compared to this category.
Display items
Figures were prepared with R version 4.3.0, ggplot2, Pymol, Adobe Illustrator, Microsoft Power Point and BioRender.com.