2.1 Study design and ethical statement
To explore the relationship between EPVS in different brain regions and IS, its subtypes, and TIA, we conduct a Mendelian randomization (MR) analysis. Figure 1 outlines the study design. In the forward MR, extensive white matter, hippocampal, and basal ganglia perivascular space burdens are treated as exposures, and their causal effects on IS, its subtypes, and TIA are investigated separately. We also use multivariable MR (MVMR) to adjust for confounding factors, and we then conduct a meta-analysis to estimate the overall effect of EPVS on IS, its subtypes, and TIA across different data sources. The MR analysis rests on three core assumptions: 1) the IVs are strongly associated with the exposure,11 2) the IVs are not associated with confounders of the exposure-outcome relationship,11 and 3) the IVs affect the outcome only through the exposure.12
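The role of the three assumptions can be seen in the single-SNP Wald ratio, the building block of the IVW estimator described in Section 2.4. The Python sketch below uses hypothetical summary statistics and is only an illustration; it is not the analysis code used in this study.

```python
def wald_ratio(beta_exp: float, beta_out: float, se_out: float) -> tuple[float, float]:
    """Single-SNP causal estimate (Wald ratio) with a first-order standard error.

    The ratio is interpretable as a causal effect only if the SNP satisfies the
    three IV assumptions: it is associated with the exposure, it is independent
    of exposure-outcome confounders, and it affects the outcome only via the exposure.
    """
    if beta_exp == 0:
        # Assumption 1 fails: the SNP carries no information about the exposure.
        raise ValueError("SNP is not associated with the exposure")
    estimate = beta_out / beta_exp
    se = se_out / abs(beta_exp)  # ignores uncertainty in beta_exp (first-order approximation)
    return estimate, se


# Hypothetical SNP effects on the exposure and the outcome (not data from this study).
est, se = wald_ratio(beta_exp=0.20, beta_out=0.05, se_out=0.01)
print(f"Wald ratio: {est:.3f} (SE {se:.3f})")  # Wald ratio: 0.250 (SE 0.050)
```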
Ethical approval is not required for this study because it uses publicly available summary-level data from studies that were approved by the relevant institutional ethics committees. The study is reported according to the Strengthening the Reporting of Observational Studies in Epidemiology Using Mendelian Randomization (STROBE-MR) guidelines (Supplementary Table S1).10
2.2 Data sources
Summary statistics for EPVS in each region are obtained from a genome-wide association meta-analysis of 18 cohorts, covering over 8 million SNPs (minor allele frequency ≥ 1%) in up to 40,095 participants (mean age 66.3 ± 8.6 years, 51.7% female). To account for differences in PVS quantification methods, image acquisition, and participant characteristics across cohorts, PVS burden was dichotomized using the threshold closest to the upper quartile of the PVS distribution. In total, 9,607 of 39,822, 9,189 of 40,000, and 9,339 of 40,095 participants were classified as having extensive PVS in the white matter, hippocampus, and basal ganglia, respectively.13
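As a simple arithmetic check, the counts above correspond to roughly a quarter of participants in each region, consistent with thresholds set near the upper quartile. The snippet only restates the reported numbers.

```python
# Participants classified as having extensive PVS, and participants with data,
# as reported above for each brain region.
counts = {
    "white matter": (9_607, 39_822),
    "hippocampus": (9_189, 40_000),
    "basal ganglia": (9_339, 40_095),
}


def extensive_fraction(n_extensive: int, n_total: int) -> float:
    """Proportion of participants above the threshold for extensive PVS."""
    return n_extensive / n_total


for region, (n_extensive, n_total) in counts.items():
    print(f"{region}: {extensive_fraction(n_extensive, n_total):.1%}")
# white matter: 24.1%
# hippocampus: 23.0%
# basal ganglia: 23.3%
```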
Summary statistics for IS, its subtypes, and TIA are obtained from several publicly available genome-wide association study (GWAS) datasets. IS data are derived from the MEGASTROKE consortium (Ncase = 34,217, total sample size = 440,328),14 a meta-analysis including the UK Biobank (Ncase = 11,929, total sample size = 484,121),15 and the FinnGen database (Ncase = 10,551, total sample size = 212,774). GWAS summary statistics for three IS subtypes defined by the TOAST classification, namely large-artery atherosclerosis (Ncase = 4,373, total sample size = 150,765), cardioembolism (Ncase = 7,193, total sample size = 211,763), and small-vessel stroke (Ncase = 5,386, total sample size = 198,048), are obtained from the MEGASTROKE consortium.14 Lacunar stroke data (Ncase = 6,030, total sample size = 225,419) come from a GWAS whose cases were recruited from acute stroke hospital admissions and outpatient services in Europe, the United States, South America, and Australia. That study meta-analyzed data from patients with MRI-confirmed lacunar stroke together with existing GWAS datasets; the MRI-confirmed patients were recruited from UK hospitals as part of the UK DNA Lacunar Stroke Study and from collaborators in the International Stroke Genetics Consortium.16 GWAS data for TIA are obtained from the FinnGen database and the UK Biobank,17 both of which have performed GWAS in large populations to identify risk loci.
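The number of controls in each IS source follows directly from the reported figures. The sketch below subtracts cases from the total sample size and reports the case proportion, using only the numbers given above; the source labels are shorthand introduced here for illustration.

```python
# (cases, total sample size) for the three IS sources listed above.
ischemic_stroke_sources = {
    "MEGASTROKE (ref. 14)": (34_217, 440_328),
    "UK Biobank meta-analysis (ref. 15)": (11_929, 484_121),
    "FinnGen": (10_551, 212_774),
}


def controls_and_case_share(n_cases: int, n_total: int) -> tuple[int, float]:
    """Return the implied number of controls and the proportion of cases."""
    return n_total - n_cases, n_cases / n_total


for source, (n_cases, n_total) in ischemic_stroke_sources.items():
    n_controls, share = controls_and_case_share(n_cases, n_total)
    print(f"{source}: {n_controls:,} controls, {share:.1%} cases")
```

The large differences in case proportion across sources are one reason the source-specific estimates are later combined by meta-analysis rather than compared directly.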
2.3 Selection of IVs
In the forward MR analysis, EPVS in each region is treated as the exposure, and IS, its subtypes, and TIA are treated as outcomes. To satisfy assumption 1, we select SNPs across the genome that are associated with EPVS at each location (P < 1×10⁻⁵) and clump them for linkage disequilibrium (LD) (r² < 0.01 within a 5,000 kb window) so that the selected IVs are independent. To satisfy assumption 2, we use MVMR to control for common confounders of IS and TIA, including obesity, hypertension, diabetes, and alcohol use. To satisfy assumption 3, we further exclude SNPs associated with IS, its subtypes, or TIA (P < 1×10⁻⁵). Instrument strength is quantified with the F statistic, $F = \frac{R^2}{1 - R^2} \times \frac{N - K - 1}{K}$, where N is the sample size of the exposure GWAS, K is the number of SNPs, and R² is the proportion of exposure variance explained by the SNPs, calculated as $R^2 = 2 \times \mathrm{MAF} \times (1 - \mathrm{MAF}) \times (\beta / \mathrm{SD})^2$, where MAF is the minor allele frequency, β is the effect size of the allele, and SD is the standard deviation of the exposure.18 IVs with F < 10 are excluded as weak instruments. During harmonization, SNPs with alleles incompatible between the exposure and outcome datasets, as well as palindromic SNPs with intermediate allele frequencies, are removed. The SNPs that pass all these filters are used in the final causal analysis.
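The IV filters above can be sketched as a few small functions. This is a minimal Python illustration under stated assumptions: the SNP identifiers, the LD lookup table, and the 0.42 cut-off for "intermediate" palindromic allele frequencies are hypothetical, and in practice these steps are performed with an LD reference panel and tools such as the TwoSampleMR R package.

```python
def clump(snps, ld_r2, r2_max=0.01, window_kb=5000):
    """Greedy LD clumping: keep the strongest SNP, drop SNPs in LD with a kept SNP.

    snps: list of (rsid, chromosome, position_bp, p_value) passing P < 1e-5.
    ld_r2: dict mapping frozenset({rsid_a, rsid_b}) -> r^2 (hypothetical toy LD;
    in practice r^2 comes from a reference panel).
    """
    index_snps = []
    for rsid, chrom, pos, p in sorted(snps, key=lambda s: s[3]):  # strongest first
        in_ld = any(
            chrom == k_chrom
            and abs(pos - k_pos) <= window_kb * 1_000
            and ld_r2.get(frozenset((rsid, k_id)), 0.0) > r2_max
            for k_id, k_chrom, k_pos, _ in index_snps
        )
        if not in_ld:
            index_snps.append((rsid, chrom, pos, p))
    return [s[0] for s in index_snps]


def variance_explained(beta, maf, sd=1.0):
    """R^2 = 2 * MAF * (1 - MAF) * (beta / SD)^2, as in the formula above."""
    return 2 * maf * (1 - maf) * (beta / sd) ** 2


def f_statistic(r2, n, k):
    """F = R^2 / (1 - R^2) * (N - K - 1) / K."""
    return (r2 / (1 - r2)) * (n - k - 1) / k


def is_ambiguous_palindrome(allele1, allele2, maf, maf_cutoff=0.42):
    """Flag A/T or C/G SNPs whose MAF is too close to 0.5 to resolve the strand.

    maf_cutoff=0.42 is an assumed value for illustration, not taken from the study.
    """
    palindromic = {allele1.upper(), allele2.upper()} in ({"A", "T"}, {"C", "G"})
    return palindromic and maf > maf_cutoff


# Toy example with hypothetical SNPs and LD values.
snps = [
    ("rs1", 1, 1_000_000, 1e-8),
    ("rs2", 1, 1_200_000, 1e-6),  # in LD with rs1, so it is clumped away
    ("rs3", 2, 5_000_000, 5e-6),
]
ld = {frozenset(("rs1", "rs2")): 0.30}
kept = clump(snps, ld)

r2 = variance_explained(beta=0.05, maf=0.30)
f = f_statistic(r2, n=40_000, k=1)
print(kept, round(f, 1), is_ambiguous_palindrome("A", "T", maf=0.45))
# ['rs1', 'rs3'] 42.0 True
```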
2.4 MR analysis
Inverse variance weighted (IVW) regression is the primary MR method. When all selected SNPs are valid IVs, IVW provides the most precise estimate of the causal effect.19 Bayesian weighted MR, weighted median (WM), weighted mode, and simple mode are used as supplementary methods. Bayesian weighted MR explicitly accounts for the uncertainty arising from weak effects of polygenic traits and can detect outliers, mitigating violations of the IV assumptions caused by pleiotropy.20 The WM method gives a consistent causal estimate when at least half of the weight in the analysis comes from valid IVs.21 The weighted mode remains consistent when the largest weighted group of SNPs with similar causal estimates consists of valid instruments, even if other IVs are invalid.22 The simple mode estimates the causal effect as the mode of the unweighted empirical density of the SNP-specific estimates.23 To improve robustness, we require the β estimates from all methods to have the same direction and the IVW and Bayesian weighted MR estimates to be statistically significant. P-values are corrected for multiple testing using the false discovery rate (FDR). A causal relationship between EPVS and an outcome is considered significant when P < 0.05 and PFDR < 0.05, and suggestive when P < 0.05 but PFDR ≥ 0.05.
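To make the contrast between the primary and supplementary estimators concrete, the following Python sketch implements a fixed-effect IVW estimate, a weighted median of SNP-specific ratio estimates, and an FDR adjustment. The data are hypothetical, and the Benjamini-Hochberg procedure is an assumption, since the text specifies only FDR correction; the study's analyses were run with the R packages listed in Section 2.8.

```python
def ivw(bx, by, se_y):
    """Fixed-effect IVW estimate: weighted regression of by on bx through the origin."""
    num = sum(x * y / s**2 for x, y, s in zip(bx, by, se_y))
    den = sum(x**2 / s**2 for x, s in zip(bx, se_y))
    return num / den, den ** -0.5  # estimate, standard error


def weighted_median(bx, by, se_y):
    """Weighted median of SNP-specific ratio estimates (first-order weights)."""
    ratios = [y / x for x, y in zip(bx, by)]
    weights = [(x / s) ** 2 for x, s in zip(bx, se_y)]  # 1 / SE(ratio)^2
    order = sorted(range(len(ratios)), key=lambda i: ratios[i])
    b = [ratios[i] for i in order]
    w = [weights[i] for i in order]
    total = sum(w)
    # Standardized cumulative weight at the midpoint of each ordered estimate.
    s, cum = [], 0.0
    for wi in w:
        s.append((cum + wi / 2) / total)
        cum += wi
    if 0.5 <= s[0]:
        return b[0]
    for j in range(len(s) - 1):
        if s[j] < 0.5 <= s[j + 1]:
            # Linear interpolation between neighbouring ordered estimates.
            return b[j] + (b[j + 1] - b[j]) * (0.5 - s[j]) / (s[j + 1] - s[j])
    return b[-1]


def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted P-values (assumed FDR procedure)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical data: three SNPs share a ratio of 0.5; one pleiotropic SNP has 2.0.
bx = [0.10, 0.20, 0.30, 0.15]
by = [0.05, 0.10, 0.15, 0.30]
se_y = [0.01, 0.01, 0.01, 0.01]
beta_ivw, se_ivw = ivw(bx, by, se_y)
beta_wm = weighted_median(bx, by, se_y)
print(round(beta_ivw, 3), round(beta_wm, 3))  # 0.708 0.5
print(bh_fdr([0.01, 0.04, 0.03]))
```

With one pleiotropic SNP, the IVW estimate is pulled up to about 0.708, whereas the weighted median stays at 0.5; agreement in direction across such estimators is what the robustness criterion above checks.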
2.5 Sensitivity analysis
Heterogeneity is assessed with Cochran's Q statistic from both IVW and MR-Egger regression; P < 0.05 indicates significant heterogeneity, in which case a random-effects model is used for causal inference.24 Horizontal pleiotropy is evaluated with the MR-Egger intercept test: a non-zero intercept indicates directional pleiotropy, and P > 0.05 suggests no evidence of horizontal pleiotropy, supporting the robustness of the MR results.25 The MR-PRESSO global test detects horizontal pleiotropy (P < 0.05), and outlier SNPs identified by MR-PRESSO are removed before the analysis is repeated.26 Leave-one-out analysis assesses the influence of each individual SNP on the MR results. If the MR-Egger intercept test still gives P < 0.05 after outlier SNPs are removed, the MR results are considered unreliable. In the forward MR analysis, the MR Steiger test is used to confirm the direction of effect: it checks whether the genetic variants explain more variance in the exposure than in the outcome, as expected if the exposure causes the outcome, which helps rule out reverse causation.27 Finally, reverse MR analysis, using the same SNP selection criteria as the forward MR, is performed to examine potential bidirectional effects between EPVS and IS, its subtypes, and TIA.
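The heterogeneity, pleiotropy, and leave-one-out checks can be sketched as follows. This Python sketch uses hypothetical data and fixed-effect weights; flooring the MR-Egger residual standard error at 1 is an assumed convention, and P-values (χ² with K − 1 degrees of freedom for Q, t with K − 2 degrees of freedom for the intercept) are omitted to keep the example dependency-free. MR-PRESSO and MR Steiger are not reproduced here.

```python
import math


def ivw_estimate(bx, by, se_y):
    """Fixed-effect IVW slope through the origin."""
    num = sum(x * y / s**2 for x, y, s in zip(bx, by, se_y))
    den = sum(x**2 / s**2 for x, s in zip(bx, se_y))
    return num / den


def cochran_q(bx, by, se_y):
    """Cochran's Q around the IVW estimate, with I^2 = max(0, (Q - df) / Q)."""
    beta = ivw_estimate(bx, by, se_y)
    q = sum((y - beta * x) ** 2 / s**2 for x, y, s in zip(bx, by, se_y))
    df = len(bx) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2


def mr_egger(bx, by, se_y):
    """Weighted regression of by on bx with an intercept (MR-Egger).

    SNPs are oriented so that bx > 0. Returns (intercept, slope, intercept SE).
    The residual standard error is floored at 1 (assumed convention).
    """
    pairs = [(abs(x), y if x >= 0 else -y) for x, y in zip(bx, by)]
    w = [1 / s**2 for s in se_y]
    sw = sum(w)
    sx = sum(wi * x for wi, (x, _) in zip(w, pairs))
    sy = sum(wi * y for wi, (_, y) in zip(w, pairs))
    sxx = sum(wi * x * x for wi, (x, _) in zip(w, pairs))
    sxy = sum(wi * x * y for wi, (x, y) in zip(w, pairs))
    det = sw * sxx - sx**2
    slope = (sw * sxy - sx * sy) / det
    intercept = (sy - slope * sx) / sw
    rss = sum(wi * (y - intercept - slope * x) ** 2 for wi, (x, y) in zip(w, pairs))
    sigma = max(1.0, math.sqrt(rss / (len(pairs) - 2)))
    se_intercept = sigma * math.sqrt(sxx / det)
    return intercept, slope, se_intercept


def leave_one_out(bx, by, se_y):
    """IVW estimate recomputed after dropping each SNP in turn."""
    idx = range(len(bx))
    return [
        ivw_estimate([bx[j] for j in idx if j != i],
                     [by[j] for j in idx if j != i],
                     [se_y[j] for j in idx if j != i])
        for i in idx
    ]


# Hypothetical data generated with a pleiotropic intercept of 0.02 and slope 0.5.
bx = [0.10, 0.20, 0.30, 0.40]
by = [0.02 + 0.5 * x for x in bx]
se_y = [0.01] * 4
q, df, i2 = cochran_q(bx, by, se_y)
intercept, slope, se_int = mr_egger(bx, by, se_y)
print(f"Q={q:.2f} (df={df}), I2={i2:.2f}")
print(f"Egger intercept={intercept:.3f} (SE {se_int:.3f}), slope={slope:.3f}")
print([round(b, 3) for b in leave_one_out(bx, by, se_y)])
```

Because the toy data were generated with an intercept of 0.02, MR-Egger recovers the slope of 0.5, whereas IVW, forced through the origin, gives about 0.567.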
2.6 MVMR analysis
MVMR estimates the direct causal effect of each exposure on the outcome while accounting for the other exposures in the model.28 Therefore, after the univariable MR (UVMR) analysis, we perform MVMR to adjust for potential confounders (obesity, hypertension, type 2 diabetes, and ongoing alcohol addiction) and to examine the independent effect of EPVS at each location on IS, its subtypes, and TIA. Multivariable IVW, multivariable MR-Egger, and multivariable median methods are applied, with multivariable IVW as the primary method. To assess the stability of the results, heterogeneity is evaluated with Cochran's Q statistic and I², and horizontal pleiotropy with the MR-Egger intercept test.
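Multivariable IVW extends the univariable IVW regression by including all exposures as regressors at once, so each coefficient is a direct effect conditional on the others. The Python sketch below, with hypothetical data and without standard errors, illustrates the estimator; it is not the MendelianRandomization package implementation.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def mvmr_ivw(bx, by, se_y):
    """Multivariable IVW: weighted regression of by on all exposures, no intercept.

    bx: one row per SNP, one column per exposure (e.g. EPVS and a confounder).
    Returns the direct-effect estimate for each exposure.
    """
    p = len(bx[0])
    w = [1 / s**2 for s in se_y]
    xtwx = [[sum(wi * row[i] * row[j] for wi, row in zip(w, bx)) for j in range(p)]
            for i in range(p)]
    xtwy = [sum(wi * row[i] * y for wi, row, y in zip(w, bx, by)) for i in range(p)]
    return solve_linear(xtwx, xtwy)


# Hypothetical SNP effects on two exposures; outcome effects built as 0.3*x1 + 0.2*x2.
bx = [[0.10, 0.20], [0.20, 0.05], [0.05, 0.10], [0.30, 0.15]]
by = [0.3 * x1 + 0.2 * x2 for x1, x2 in bx]
se_y = [0.01, 0.02, 0.01, 0.015]
print([round(b, 3) for b in mvmr_ivw(bx, by, se_y)])  # [0.3, 0.2]
```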
2.7 Meta-analysis
Because the outcome data come from several GWAS sources, we conduct a meta-analysis of the source-specific MR estimates to reduce source-related bias and to estimate the overall effect of EPVS at each location on IS, its subtypes, and TIA. Heterogeneity among the pooled estimates is assessed with the I² statistic.
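A minimal sketch of pooling source-specific MR estimates follows. It assumes inverse-variance fixed-effect weighting (the text does not state whether a fixed- or random-effects model is used) and uses hypothetical log-odds ratios.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I^2."""
    w = [1 / s**2 for s in ses]
    pooled = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    se_pooled = math.sqrt(1 / sum(w))
    q = sum(wi * (b - pooled) ** 2 for wi, b in zip(w, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, se_pooled, i2


# Hypothetical log-odds ratios for one exposure-outcome pair from three GWAS sources.
betas = [0.10, 0.12, 0.08]
ses = [0.05, 0.04, 0.06]
pooled, se_pooled, i2 = fixed_effect_meta(betas, ses)
low, high = pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled
print(f"OR {math.exp(pooled):.3f} (95% CI {math.exp(low):.3f}-{math.exp(high):.3f}), I2 = {i2:.0%}")
```

Here Q is smaller than its degrees of freedom, so I² is truncated to 0%, indicating no detectable heterogeneity among the three hypothetical sources.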
2.8 Linkage disequilibrium score regression and directionality tests
We apply linkage disequilibrium score regression (LDSC) to the GWAS summary statistics to estimate SNP-based heritability and genetic correlations. LD scores are computed from the 1000 Genomes Project reference panel. We use LDSC to further evaluate the genetic correlations between EPVS at different locations and IS and its subtypes.29
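The idea behind LDSC is that, under a polygenic model, a SNP's expected association χ² statistic increases linearly with its LD score: E[χ²_j] = N h² ℓ_j / M + N a + 1, where N is the GWAS sample size, M the number of SNPs, ℓ_j the LD score of SNP j, and a reflects confounding such as population stratification. The Python sketch below fits this line by ordinary least squares on hypothetical, noise-free data to recover h². It omits the regression weights, block jackknife, and cross-trait extension that the LDSC software uses, including for genetic correlation.

```python
def ldsc_h2(ld_scores, chi2, n, m):
    """Unweighted LD score regression: chi2_j = N * h2 * l_j / M + intercept.

    Returns (h2, intercept). Real LDSC uses regression weights and a block
    jackknife for standard errors; both are omitted in this sketch.
    """
    k = len(ld_scores)
    mean_l = sum(ld_scores) / k
    mean_c = sum(chi2) / k
    cov = sum((score - mean_l) * (c - mean_c) for score, c in zip(ld_scores, chi2))
    var = sum((score - mean_l) ** 2 for score in ld_scores)
    slope = cov / var
    intercept = mean_c - slope * mean_l
    return slope * m / n, intercept


# Hypothetical noise-free data: N = 10,000, M = 1,000, h2 = 0.2, intercept = 1.
n, m = 10_000, 1_000
ld_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
chi2 = [1 + n * 0.2 * score / m for score in ld_scores]
h2, intercept = ldsc_h2(ld_scores, chi2, n, m)
print(round(h2, 3), round(intercept, 3))  # 0.2 1.0
```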
All MR analyses are performed in R (version 4.3.0) using the "TwoSampleMR",23 "MRPRESSO",26 and "MendelianRandomization" R packages.