Exposure
In this study, we used three CSF biomarkers of AD (Aβ, p-tau, and t-tau) as exposures to investigate their causal relationship with the outcome of interest. Meta-analysed GWAS summary statistics for these biomarkers were obtained from 3,146 individuals of European ancestry in nine studies (Knight ADRC, the Charles F. and Joanne Knight Alzheimer’s Disease Research Center; ADNI1, Alzheimer’s Disease Neuroimaging Initiative phase 1; ADNI2, Alzheimer’s Disease Neuroimaging Initiative phase 2; BIOCARD, Predictors of Cognitive Decline Among Normal Individuals; HB, Saarland University in Homburg/Saar, Germany; MAYO, Mayo Clinic; SWEDEN, Skåne University Hospital; UPENN, Perelman School of Medicine at the University of Pennsylvania; UW, the University of Washington) [9]. These GWASs have the largest sample size to date for Aβ, p-tau, and t-tau measured in CSF. Because each phenotype was log-transformed to approximate a normal distribution, the effect per single-nucleotide polymorphism (SNP) in the GWAS summary statistics is expressed as a standardised beta coefficient.
Outcome
Our outcome of interest was late-onset AD (LOAD), defined as AD with onset at 65 years of age or older. We used summary-level data from the stage 1 meta-analysis of LOAD GWASs, obtained from the National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site [10]. The meta-analysis combined data from four consortia: the Alzheimer Disease Genetics Consortium; the European Alzheimer's Disease Initiative; the Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium; and the Genetic and Environmental Risk in AD/Defining Genetic, Polygenic and Environmental Risk for Alzheimer’s Disease Consortium. It comprised 46 case-control studies with 63,926 individuals of European ancestry (21,982 LOAD cases and 41,944 cognitively normal controls).
Selection of instruments for MR
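We applied the following procedures to select genetic variants that best satisfy the three IV assumptions of MR [11]. Under these assumptions, each variant must:

- be associated with the exposure;
- be independent of confounders of the exposure-outcome relationship; and
- affect the outcome only through the exposure.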
First, we selected top SNPs at a relaxed threshold (p < 1 × 10⁻⁵). Recent MR analyses have used this threshold when the exposure GWAS yielded only a small number of genome-wide significant SNPs [12]. The data used in the present study are the largest on CSF biomarkers to date [9]. Even so, CSF biomarkers are expensive, require an invasive procedure, and depend on skilled professionals. This makes it difficult to collect samples large enough to identify many independent SNPs at genome-wide significance (p < 5 × 10⁻⁸). We therefore relaxed the threshold to compensate for the moderate sample size.
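The first selection step amounts to a p-value filter on the exposure summary statistics. The sketch below assumes that each row carries an rsID, beta, SE, and p-value under hypothetical field names. It illustrates the filter only and is not the pipeline used in this study.

```python
from dataclasses import dataclass

# Relaxed instrument-selection threshold, with the conventional
# genome-wide significance level kept for comparison.
RELAXED_P = 1e-5
GENOME_WIDE_P = 5e-8


@dataclass
class SnpStat:
    """One row of exposure GWAS summary statistics (hypothetical field names)."""
    rsid: str
    beta: float
    se: float
    p: float


def select_candidates(rows, threshold=RELAXED_P):
    """Keep SNPs whose exposure-association p-value is below `threshold`."""
    return [r for r in rows if r.p < threshold]


# Toy summary statistics (invented numbers, for illustration only).
rows = [
    SnpStat("rs_a", 0.30, 0.05, 2e-9),
    SnpStat("rs_b", 0.12, 0.03, 4e-6),
    SnpStat("rs_c", 0.05, 0.04, 0.2),
]
print([r.rsid for r in select_candidates(rows)])                 # ['rs_a', 'rs_b']
print([r.rsid for r in select_candidates(rows, GENOME_WIDE_P)])  # ['rs_a']
```

In this toy input, only one SNP survives the genome-wide threshold, but two survive the relaxed one. That is the trade-off the paragraph describes.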
Second, among the SNPs passing the relaxed threshold, we selected independent variants using a linkage disequilibrium (LD) cut-off of r² < 0·001 [13]. LD between SNPs was calculated from European individuals in the 1000 Genomes Project. If a SNP was not available in the outcome summary statistics, we replaced it with an LD proxy in high LD with it (r² ≥ 0·8) in European populations, identified using LDlink (https://ldlink.nci.nih.gov/). If no such proxy was found, the SNP was excluded from the IV set.
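The independence and proxy steps can be sketched as a greedy clumping pass followed by a proxy lookup. This is a simplified illustration under stated assumptions:

- Pairwise r² values and proxy lists are supplied as plain dictionaries. These are hypothetical structures standing in for 1000 Genomes European LD data and LDlink output.
- Unlike production clumping tools, the sketch applies no physical-distance window.

```python
def clump(snps, pvalues, r2, r2_max=0.001):
    """Greedy LD clumping.

    snps: rsIDs that passed the p-value threshold.
    pvalues: dict rsID -> exposure p-value.
    r2: dict frozenset({a, b}) -> LD r^2 (hypothetical lookup; missing pairs
        are treated as r^2 = 0).
    Visits SNPs from most to least significant and keeps a SNP only if its
    r^2 with every already-kept SNP is below r2_max.
    """
    kept = []
    for snp in sorted(snps, key=lambda s: pvalues[s]):
        if all(r2.get(frozenset((snp, k)), 0.0) < r2_max for k in kept):
            kept.append(snp)
    return kept


def map_to_outcome(snps, outcome_snps, proxies, r2_min=0.8):
    """Return {exposure SNP: SNP to look up in the outcome data}.

    proxies: dict rsID -> list of (proxy rsID, r^2) pairs (hypothetical
    structure). A SNP missing from the outcome is replaced by its
    highest-r^2 proxy that reaches r2_min and is present in the outcome;
    if there is none, the SNP is dropped.
    """
    mapping = {}
    for snp in snps:
        if snp in outcome_snps:
            mapping[snp] = snp
            continue
        candidates = [(p, r) for p, r in proxies.get(snp, [])
                      if r >= r2_min and p in outcome_snps]
        if candidates:
            mapping[snp] = max(candidates, key=lambda c: c[1])[0]
    return mapping


# Invented example: rs2 is in LD with the more significant rs1 and is pruned;
# rs3 is absent from the outcome data and is replaced by its proxy rs9.
pvals = {"rs1": 1e-8, "rs2": 1e-6, "rs3": 5e-6}
kept = clump(list(pvals), pvals, {frozenset(("rs1", "rs2")): 0.4})
print(kept)
print(map_to_outcome(kept, {"rs1", "rs9"}, {"rs3": [("rs8", 0.95), ("rs9", 0.85)]}))
```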
Third, we removed SNPs whose alleles could not be harmonised, that is, SNPs whose alleles in the exposure and outcome data were not identical. For example, a SNP was excluded if its effect and non-effect alleles were T/C in the exposure data and T/G in the outcome data [13].
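Allele harmonisation for a single SNP can be sketched in three cases:

- If the allele coding matches, keep the outcome effect.
- If the effect and non-effect alleles are swapped, flip the sign of the outcome effect.
- If the allele pairs differ (as in the T/C vs T/G example), drop the SNP.

The function name is hypothetical. The sketch deliberately ignores strand flips and palindromic SNPs, which harmonisation tools usually handle separately.

```python
def harmonise(exp_ea, exp_oa, out_ea, out_oa, out_beta):
    """Express the outcome effect per copy of the exposure effect allele.

    Returns the aligned outcome beta, or None when the allele pairs differ
    and the SNP should be excluded. Strand flips and palindromic (A/T, G/C)
    SNPs are not handled in this sketch.
    """
    exp_ea, exp_oa = exp_ea.upper(), exp_oa.upper()
    out_ea, out_oa = out_ea.upper(), out_oa.upper()
    if (exp_ea, exp_oa) == (out_ea, out_oa):
        return out_beta       # same coding: keep as is
    if (exp_ea, exp_oa) == (out_oa, out_ea):
        return -out_beta      # effect allele swapped: flip the sign
    return None               # incompatible alleles: exclude the SNP


print(harmonise("T", "C", "C", "T", 0.2))  # -0.2
print(harmonise("T", "C", "T", "G", 0.2))  # None
```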
Fourth, to check for horizontal pleiotropy among the IVs, we applied the MR Pleiotropy RESidual Sum and Outlier (MR-PRESSO) test, which detects pleiotropic outliers among the exposure-associated variants [14]. In our analysis, MR-PRESSO flagged more than 50% of the IVs as outliers, which suggests that the flagged variants might not reflect true horizontal pleiotropy. Therefore, instead of removing the MR-PRESSO outliers, we considered excluding SNPs with a known direct pleiotropic effect on LOAD, the outcome of interest. Many previous studies have reported multiple pleiotropic effects of the apolipoprotein E (APOE) region. Pleiotropic SNPs may enter the instrument set either as MR-PRESSO outliers or as variants in the APOE region. Including them can bias the estimate upwards or downwards through horizontal pleiotropy and yield an inaccurate causal estimate [15]. Among the IVs for the three CSF biomarkers, rs769449 was the only variant in the APOE region, and it is strongly associated with LOAD [16]. We therefore repeated the MR analysis after excluding rs769449 as a sensitivity analysis. To further test for directional horizontal pleiotropy, we performed the MR-Egger intercept test with the intercept unconstrained. The MR-Egger intercept is a statistical estimate of the directional pleiotropic effect, which can bias MR estimates. The selected genetic variants are listed in Additional file 1: Tables S1–S3.
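The MR-Egger intercept test and the rs769449 exclusion can be sketched together. The sketch assumes harmonised per-SNP effects on the exposure (bx) and outcome (by), plus outcome standard errors. It works in three steps:

- Orient every SNP so that its exposure effect is positive.
- Fit a weighted regression with an unconstrained intercept.
- Rescale the SEs by the residual standard error when that exceeds 1.

These conventions are our assumptions about a typical implementation, not the exact TwoSampleMR code. The toy numbers are invented.

```python
import math


def mr_egger(bx, by, se_y):
    """MR-Egger: weighted regression of by on bx with an unconstrained intercept.

    Weights are 1 / se_y**2. SNPs are first oriented so that every bx >= 0.
    Returns (intercept, intercept_se, slope, slope_se).
    """
    pairs = [(abs(x), y if x >= 0 else -y, 1.0 / s ** 2)
             for x, y, s in zip(bx, by, se_y)]
    n = len(pairs)
    if n < 3:
        raise ValueError("MR-Egger needs at least three SNPs")
    sw = sum(w for _, _, w in pairs)
    xbar = sum(w * x for x, _, w in pairs) / sw
    ybar = sum(w * y for _, y, w in pairs) / sw
    sxx = sum(w * (x - xbar) ** 2 for x, _, w in pairs)
    sxy = sum(w * (x - xbar) * (y - ybar) for x, y, w in pairs)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # Residual standard error of the weighted fit (n - 2 degrees of freedom).
    rss = sum(w * (y - intercept - slope * x) ** 2 for x, y, w in pairs)
    sigma = math.sqrt(rss / (n - 2))
    # Inflate, but never shrink, the SEs relative to their fixed-effect value.
    scale = max(sigma, 1.0)
    slope_se = scale / math.sqrt(sxx)
    intercept_se = scale * math.sqrt(1.0 / sw + xbar ** 2 / sxx)
    return intercept, intercept_se, slope, slope_se


# Toy data (invented): rsID -> (bx, by, se_y).
data = {
    "rs1": (0.10, 0.07, 0.02),
    "rs2": (0.20, 0.12, 0.02),
    "rs3": (0.30, 0.17, 0.02),
    "rs769449": (0.25, 0.40, 0.02),  # stands in for the APOE variant
}
# Sensitivity analysis: refit without the APOE variant.
kept = {k: v for k, v in data.items() if k != "rs769449"}
bx, by, se = zip(*kept.values())
icpt, icpt_se, slope, slope_se = mr_egger(bx, by, se)
# Intercept test statistic; compare with a t distribution on J - 2 df.
print(f"intercept={icpt:.3f} (t={icpt / icpt_se:.2f}), slope={slope:.3f}")
```

A p-value for the intercept test would come from a t distribution with J − 2 degrees of freedom, where J is the number of SNPs. The Python standard library does not provide that distribution, so the sketch reports only the test statistic.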
Two-sample MR (TSMR) method
TSMR uses GWAS summary statistics from two large samples, which allows more robustly associated genetic instruments to be used than in one-sample MR [7]. We performed TSMR using the TwoSampleMR R package (version 0·4·22) from the MR-Base platform [13]. To assess the robustness of the estimated causal effects of the exposures on LOAD risk, we applied several methods: inverse-variance weighted (IVW), MR-Egger regression, simple median, weighted median, and weighted mode. These methods differ in their sensitivity to heterogeneity, their susceptibility to bias, and their statistical power.

We chose IVW as the primary method because it gives reliable results in the presence of heterogeneity and is appropriate when many SNPs are used. Its standard error (SE) was estimated with a multiplicative random-effects model. However, the IVW estimate can be biased even if only one IV is invalid. We therefore also performed MR-Egger regression, which allows all IVs to be invalid under the InSIDE (Instrument Strength Independent of Direct Effect) assumption [17]. The intercept of the MR-Egger regression estimates the average directional pleiotropic effect. The null hypothesis of the MR-Egger intercept test is that the intercept equals zero. If this null is rejected, there is evidence of directional pleiotropy, and the MR-Egger slope is then preferred over the IVW estimate.

We also used two median-based estimators, which do not require the InSIDE assumption. The simple median is consistent when more than 50% of the IVs are valid, and the weighted median is consistent when more than 50% of the weight comes from valid IVs. The weighted mode gives a single causal estimate based on the largest subset of IVs with similar causal effects [18].
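The primary IVW estimate and the two median-based estimators can be written compactly from per-SNP summary statistics. The sketch below rests on these assumptions:

- Effects are already harmonised.
- Weights are first-order (bx²/se_y² for each ratio estimate).
- A multiplicative random-effects model is implemented by inflating the fixed-effect SE with the residual standard error when that exceeds 1. This is one common implementation, not necessarily the package's exact code.
- The weighted median interpolates linearly between neighbouring ordered ratio estimates.

Treat the sketch as an illustration rather than a reference implementation.

```python
import math


def ivw(bx, by, se_y):
    """IVW estimate: weighted regression of by on bx through the origin.

    Weights are 1 / se_y**2. The SE uses a multiplicative random-effects
    model: the fixed-effect SE is scaled up by the residual standard error
    when that exceeds 1 (over-dispersion from heterogeneity).
    """
    w = [1.0 / s ** 2 for s in se_y]
    sxx = sum(wi * x * x for wi, x in zip(w, bx))
    beta = sum(wi * x * y for wi, x, y in zip(w, bx, by)) / sxx
    n = len(bx)
    rss = sum(wi * (y - beta * x) ** 2 for wi, x, y in zip(w, bx, by))
    sigma = math.sqrt(rss / (n - 1)) if n > 1 else 0.0
    return beta, max(sigma, 1.0) / math.sqrt(sxx)


def _weighted_median(values, weights):
    """Median of a weighted empirical distribution, linearly interpolated."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    b = [values[i] for i in order]
    w = [weights[i] for i in order]
    total = sum(w)
    cum, p = 0.0, []
    for wi in w:
        cum += wi
        p.append((cum - wi / 2) / total)  # percentile at the centre of each step
    below = [j for j, pj in enumerate(p) if pj < 0.5]
    if not below:
        return b[0]
    j = below[-1]
    if j == len(b) - 1:
        return b[-1]
    return b[j] + (b[j + 1] - b[j]) * (0.5 - p[j]) / (p[j + 1] - p[j])


def median_estimates(bx, by, se_y):
    """Simple and weighted median of the per-SNP ratio estimates by / bx."""
    ratios = [y / x for x, y in zip(bx, by)]
    simple = _weighted_median(ratios, [1.0] * len(ratios))
    weighted = _weighted_median(ratios, [x ** 2 / s ** 2 for x, s in zip(bx, se_y)])
    return simple, weighted


# Invented data in which one SNP (the last) has an outlying ratio estimate.
bx, by, se = [0.10, 0.20, 0.30, 0.15], [0.05, 0.11, 0.14, 0.30], [0.02] * 4
print(ivw(bx, by, se))
print(median_estimates(bx, by, se))
```

In the toy data, the median-based estimates stay close to the majority ratio (about 0.5). The IVW estimate is pulled towards the outlier, which illustrates why the median-based methods are useful sensitivity analyses.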
We used a forest plot to visualise the heterogeneity among the instruments, which may arise from horizontal pleiotropy, and the contribution of each instrument to the overall estimate [13].
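A forest plot of this kind shows one Wald ratio (by/bx) per SNP, with its confidence interval. The text-based sketch below assumes harmonised effects and a first-order SE of se_y/|bx|. The function names are hypothetical and the data are invented.

```python
def wald_ratios(snps, bx, by, se_y, z=1.96):
    """Per-SNP Wald ratio by/bx with first-order SE se_y/|bx| and 95% CI.

    Returns (snp, estimate, lower, upper) tuples sorted by estimate.
    """
    rows = []
    for snp, x, y, s in zip(snps, bx, by, se_y):
        est, se = y / x, s / abs(x)
        rows.append((snp, est, est - z * se, est + z * se))
    return sorted(rows, key=lambda r: r[1])


def ascii_forest(rows, width=41):
    """Draw a text forest plot: '-' spans each CI, 'o' marks the estimate,
    and ':' marks zero (the null) when it lies in the plotted range."""
    lo = min(r[2] for r in rows)
    hi = max(r[3] for r in rows)
    span = (hi - lo) or 1.0

    def col(v):
        # Map a value onto a character column in [0, width - 1].
        return round((v - lo) / span * (width - 1))

    lines = []
    for snp, est, a, b in rows:
        bar = [" "] * width
        if lo <= 0 <= hi:
            bar[col(0)] = ":"
        for i in range(col(a), col(b) + 1):
            bar[i] = "-"
        bar[col(est)] = "o"
        lines.append(f"{snp:>10} |{''.join(bar)}| {est:.3f} ({a:.3f}, {b:.3f})")
    return "\n".join(lines)


rows = wald_ratios(["rs_a", "rs_b", "rs_c"], [0.10, 0.20, 0.15],
                   [0.05, 0.02, -0.03], [0.01, 0.01, 0.02])
print(ascii_forest(rows))
```

Widely scattered per-SNP estimates with non-overlapping intervals are the visual signature of the heterogeneity the plot is meant to reveal.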
Power calculation
We calculated the statistical power of the MR analysis using an online tool (https://sb452.shinyapps.io/power/) [19]. The tool takes four inputs: the proportion of variance in the exposure explained by the genetic instruments (R²), the true causal effect of the exposure on the outcome, the sample size, and the ratio of cases to controls in the outcome data. R² was obtained from the MR-Steiger directionality test. The true causal effect was estimated from the observed odds ratios (ORs) between the CSF biomarkers and LOAD risk.
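The power calculation can be illustrated with a standard asymptotic approximation. With a standardised exposure, the causal log-OR estimate has a variance of roughly 1/(N · R² · φ(1 − φ)), where φ is the proportion of cases. A case:control ratio k converts to φ = k/(1 + k). The sketch below applies that approximation to a two-sided test. It is an assumption that the online tool computes power in exactly this way. The case fraction in the usage line comes from the LOAD sample sizes above, but the R² and OR values are placeholders, not results of this study.

```python
from statistics import NormalDist


def mr_power(n, r2, log_or, case_fraction, alpha=0.05):
    """Approximate power of a two-sided MR test with a binary outcome.

    n: outcome sample size; r2: variance in the standardised exposure
    explained by the instruments; log_or: true causal log odds ratio per SD
    of exposure; case_fraction: proportion of cases in the outcome sample.
    Uses Var(estimate) ~= 1 / (n * r2 * phi * (1 - phi)).
    """
    nd = NormalDist()
    crit = nd.inv_cdf(1 - alpha / 2)
    ncp = abs(log_or) * (n * r2 * case_fraction * (1 - case_fraction)) ** 0.5
    # Probability of rejecting in either tail under the alternative.
    return nd.cdf(ncp - crit) + nd.cdf(-ncp - crit)


# Placeholder inputs: R^2 = 0.02 and OR = 1.10 are hypothetical.
import math
phi = 21982 / 63926
print(round(mr_power(63926, 0.02, math.log(1.10), phi), 3))
```

With a true effect of zero, the function returns the type I error rate (0.05). Power then increases with N, R², and the magnitude of the log OR, as the tool's inputs suggest.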
Role of the funding source
The funders of this study had no role in study design, data collection, data analysis, or data interpretation. The corresponding authors had full access to all the data in the study and had final responsibility for the decision to submit for publication.