Design of MR
MR is a statistical method for assessing causal relationships while limiting bias from confounding. At its core is the use of genetic variants as instrumental variables (IVs). According to MR theory, IVs must satisfy three key assumptions: (1) they are strongly associated with the exposure; (2) they are independent of confounders of the exposure-outcome relationship, reflecting the random allocation of alleles; and (3) they affect the outcome only through the exposure, not through any alternative pathway[15]. Single nucleotide polymorphisms (SNPs) strongly associated with gut microbiota in a study by the MiBioGen Consortium, involving 13,266 individuals, were selected as IVs[16]. Our main objective was to explore the potential causal effects of gut microbiota (exposures) on AA (outcome). Exposure and outcome data were obtained from separate, independent samples. No additional ethical approval was required for the current study because the original investigations had already obtained it. Figure 1A illustrates the MR assumptions, and Figure 1B gives an overview of the two-sample MR study design.
Data Sources
Genetic variants associated with gut microbiota composition were taken from the most comprehensive genome-wide meta-analysis of the gut microbiome, which advanced the understanding of gut microbiota diversity and its connection to the human genome. The analysis, led by the MiBioGen consortium, involved 18,340 individuals across 24 cohorts, most of whom were of European ancestry (13,266 participants)[16]. Microbial composition was profiled by targeting the V4, V3-4, and V1-2 variable regions of the 16S rRNA gene, with direct taxonomic binning used for classification. Microbiota quantitative trait loci (mbQTL) mapping was used to identify host genetic variants that influence gut microbiota. We used the mbQTL mapping results covering 131 genus-level taxa[17]. Summary statistics for AA were obtained from the publicly accessible GWAS platform (https://gwas.mrcieu.ac.uk/) and comprised 211,428 samples, mainly European participants of the FinnGen consortium [18]. The diagnostic criteria for AA were based on ICD-8, ICD-9, and ICD-10. The average age at the onset of the initial event was 41.9 years. The AA data came from a consortium separate from the contributors to the gut microbiota GWAS. To limit potential bias due to population heterogeneity, we extracted SNPs from GWAS of primarily European ancestry.
Selection of IVs
IVs were selected according to predetermined criteria [19]. SNPs associated with each genus at the locus-wide significance level (p < 1e-5) were considered potential IVs. Linkage disequilibrium (LD) was calculated using European samples from the 1000 Genomes Project, and only SNPs with r2 < 0.001 were retained. SNPs with a minor allele frequency (MAF) of 0.01 or less were excluded. SNPs strongly associated with the outcome were discarded. When a selected SNP was not available in the outcome data, proxy SNPs were sought automatically. For palindromic SNPs, forward-strand alleles were inferred from allele frequency (Figure 1B). Following previous research, we used 1,232 SNPs as IVs for 119 bacterial genera in our MR investigation[17].
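The selection thresholds above can be illustrated with a minimal Python sketch. This is not the code used in the study, which relied on R packages. The record fields (`rsid`, `pval`, `eaf`, `ea`, `oa`), the toy SNPs, and the pairwise `r2_lookup` table are all hypothetical. Real LD clumping also restricts comparisons to a physical distance window and draws r2 from a reference panel, both of which are omitted here.

```python
# Illustrative sketch: field names, toy records, and the r2 lookup are assumptions.
P_THRESHOLD = 1e-5     # locus-wide significance threshold stated in the text
MAF_THRESHOLD = 0.01   # SNPs with MAF <= 0.01 are excluded
R2_THRESHOLD = 0.001   # LD threshold stated in the text

PALINDROMIC = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}


def is_palindromic(effect_allele, other_allele):
    """True when the allele pair reads the same on both strands (A/T or C/G)."""
    return (effect_allele.upper(), other_allele.upper()) in PALINDROMIC


def filter_by_p_and_maf(snps):
    """Keep SNPs with p < 1e-5 and MAF > 0.01, flagging palindromic ones."""
    kept = []
    for snp in snps:
        maf = min(snp["eaf"], 1.0 - snp["eaf"])
        if snp["pval"] < P_THRESHOLD and maf > MAF_THRESHOLD:
            kept.append({**snp, "maf": maf,
                         "palindromic": is_palindromic(snp["ea"], snp["oa"])})
    return kept


def clump(snps, r2_lookup):
    """Greedy LD clumping: walk SNPs from the smallest p-value upward and keep
    a SNP only if its r2 with every already-kept SNP is below the threshold.
    Pairs missing from r2_lookup are treated as unlinked (r2 = 0)."""
    kept = []
    for snp in sorted(snps, key=lambda s: s["pval"]):
        linked = any(
            r2_lookup.get(frozenset((snp["rsid"], k["rsid"])), 0.0) >= R2_THRESHOLD
            for k in kept
        )
        if not linked:
            kept.append(snp)
    return kept


# Hypothetical exposure summary statistics for one genus.
toy_snps = [
    {"rsid": "rs1", "pval": 2e-6, "eaf": 0.30, "ea": "A", "oa": "G"},
    {"rsid": "rs2", "pval": 5e-4, "eaf": 0.40, "ea": "C", "oa": "T"},   # fails p
    {"rsid": "rs3", "pval": 1e-7, "eaf": 0.005, "ea": "G", "oa": "A"},  # fails MAF
    {"rsid": "rs4", "pval": 3e-6, "eaf": 0.45, "ea": "A", "oa": "T"},   # palindromic
    {"rsid": "rs5", "pval": 8e-6, "eaf": 0.80, "ea": "C", "oa": "T"},   # in LD with rs1
]
toy_r2 = {frozenset(("rs1", "rs5")): 0.4}

filtered = filter_by_p_and_maf(toy_snps)
independent = clump(filtered, toy_r2)
print([s["rsid"] for s in independent])
```

In this toy example, rs2 and rs3 fail the significance and MAF filters, and rs5 is dropped during clumping because it is in LD with the more significant rs1.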
Statistical Analysis
We used several MR methods, including inverse variance weighted (IVW), maximum likelihood (ML), MR-Egger, weighted median, and weighted mode, to assess potential causal links between gut microbiota and AA [17].
IVW, often the first choice for two-sample MR analyses when all IVs are strong and valid, was used as the primary analytical method because of its efficiency. IVW estimates are unbiased in the absence of horizontal pleiotropy. When significant heterogeneity was present, we used a random-effects IVW model; otherwise, a fixed-effects model was applied [20]. ML provides reliable parameter estimates with favorable statistical properties, especially in large samples, and allows models to be compared through their likelihood values. However, ML can be computationally complex and carries a risk of overfitting [21]. MR-Egger regression, which relies on the instrument strength independent of direct effect (InSIDE) assumption, detects pleiotropy through its intercept term. An intercept that does not differ significantly from zero indicates no directional horizontal pleiotropy, in which case the MR-Egger estimate is consistent with the IVW estimate [22]. The weighted median method yields consistent estimates when more than half of the IVs satisfy the MR assumptions but can be biased when fewer than half do [23]. The weighted mode method weights each IV's effect estimate and can be accurate, but it requires a substantial number of valid genetic IVs and extensive computation. If the selected mode is driven by a cluster of similar yet invalid genetic IVs, the estimate may be biased [24]. For a causal estimate to be considered robust, the IVW method and at least one other method had to yield significant results in the same direction, and the estimates from any non-significant methods had to agree in direction with the IVW estimate[25].
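To make the two estimators most central to this paragraph concrete, the following Python sketch computes a fixed-effects IVW estimate from per-SNP Wald ratios and an MR-Egger intercept and slope by weighted least squares. It is an illustration under stated assumptions rather than the study's implementation: the function names and toy effect sizes are hypothetical, ratio standard errors use the common first-order approximation se_y / |beta_x|, and no standard errors or p-values are computed for the MR-Egger terms.

```python
import math


def ivw_fixed(beta_x, beta_y, se_y):
    """Fixed-effects IVW estimate: the inverse-variance weighted mean of the
    per-SNP Wald ratios beta_y / beta_x, with weights beta_x^2 / se_y^2."""
    weights = [bx * bx / (sy * sy) for bx, sy in zip(beta_x, se_y)]
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    return estimate, math.sqrt(1.0 / total)


def mr_egger(beta_x, beta_y, se_y):
    """MR-Egger point estimates: weighted regression of SNP-outcome effects on
    SNP-exposure effects WITH an intercept (weights 1 / se_y^2). SNPs are first
    oriented so that every SNP-exposure effect is positive."""
    signs = [1.0 if bx >= 0 else -1.0 for bx in beta_x]
    x = [abs(bx) for bx in beta_x]
    y = [s * by for s, by in zip(signs, beta_y)]
    w = [1.0 / (sy * sy) for sy in se_y]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar   # a non-zero value suggests directional pleiotropy
    return intercept, slope


# Hypothetical harmonised effects in which every SNP implies the same causal
# effect (beta_y = 0.5 * beta_x), i.e. no pleiotropy and no heterogeneity.
toy_beta_x = [0.10, 0.20, 0.15, 0.30]
toy_beta_y = [0.05, 0.10, 0.075, 0.15]
toy_se_y = [0.02, 0.03, 0.025, 0.04]

ivw_estimate, ivw_se = ivw_fixed(toy_beta_x, toy_beta_y, toy_se_y)
egger_intercept, egger_slope = mr_egger(toy_beta_x, toy_beta_y, toy_se_y)
print(ivw_estimate, ivw_se, egger_intercept, egger_slope)
```

Because the toy data contain no pleiotropy, the MR-Egger intercept is essentially zero and the MR-Egger slope coincides with the IVW estimate, which is the situation the text describes for a null intercept.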
To assess the robustness of our results, we conducted several sensitivity analyses, including tests of heterogeneity and pleiotropy, leave-one-out analysis, and the MR pleiotropy residual sum and outlier (MR-PRESSO) test[26]. Cochran’s Q test evaluated heterogeneity among the IVs used in the IVW method, and a p-value below 0.05 was considered indicative of significant heterogeneity [27]. The MR-Egger intercept test was used to investigate pleiotropy, and a p-value greater than 0.05 indicated no evidence of pleiotropy. The MR-PRESSO test identified outliers and assessed their effect. Leave-one-out analysis was conducted by removing each SNP in turn and re-estimating the effect. If the estimate remained on the same side of zero after each omission, the causal association was considered robust. The F-statistic was used to measure IV strength, and an F-statistic above 10 indicated that weak-instrument bias was unlikely [28]. The q-value procedure was used for false discovery rate (FDR) correction, with a significance threshold of q less than 0.10. An association between a gut microbiota genus and AA was considered suggestive when p was less than 0.05 but q was equal to or greater than 0.10[29].
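Three of these checks reduce to short calculations, sketched below in Python as an illustration rather than as the study's code. The per-SNP F-statistic uses the common approximation (beta_x / se_x)^2, which is an assumption because the text does not state the formula used. Cochran's Q is computed around the fixed-effects IVW estimate, and the leave-one-out step recomputes that estimate with each SNP removed. The function names and toy data are hypothetical. The p-value for Q comes from a chi-square distribution with one fewer degree of freedom than the number of SNPs (for example via `scipy.stats.chi2.sf`), which is left out to keep the sketch dependency-free.

```python
def f_statistics(beta_x, se_x):
    """Approximate per-SNP instrument strength: F = (beta_x / se_x)^2."""
    return [(bx / sx) ** 2 for bx, sx in zip(beta_x, se_x)]


def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effects IVW point estimate from per-SNP Wald ratios."""
    weights = [bx * bx / (sy * sy) for bx, sy in zip(beta_x, se_y)]
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]
    return sum(w * r for w, r in zip(weights, ratios)) / sum(weights)


def cochran_q(beta_x, beta_y, se_y):
    """Cochran's Q: weighted squared deviation of each Wald ratio from the
    IVW estimate. Returns (Q, degrees of freedom)."""
    estimate = ivw_estimate(beta_x, beta_y, se_y)
    q = sum(
        (bx * bx / (sy * sy)) * (by / bx - estimate) ** 2
        for bx, by, sy in zip(beta_x, beta_y, se_y)
    )
    return q, len(beta_x) - 1


def leave_one_out(beta_x, beta_y, se_y):
    """IVW estimate recomputed with each SNP omitted in turn."""
    estimates = []
    for i in range(len(beta_x)):
        keep = [j for j in range(len(beta_x)) if j != i]
        estimates.append(ivw_estimate(
            [beta_x[j] for j in keep],
            [beta_y[j] for j in keep],
            [se_y[j] for j in keep],
        ))
    return estimates


# Hypothetical harmonised summary statistics for four IVs.
toy_beta_x = [0.10, 0.20, 0.15, 0.30]
toy_se_x = [0.02, 0.03, 0.025, 0.04]
toy_beta_y = [0.06, 0.09, 0.08, 0.14]
toy_se_y = [0.02, 0.03, 0.025, 0.04]

f_values = f_statistics(toy_beta_x, toy_se_x)
q_stat, q_df = cochran_q(toy_beta_x, toy_beta_y, toy_se_y)
loo = leave_one_out(toy_beta_x, toy_beta_y, toy_se_y)

strong_instruments = all(f > 10 for f in f_values)
same_direction = all(e > 0 for e in loo) or all(e < 0 for e in loo)
print(f_values, q_stat, q_df, loo, strong_instruments, same_direction)
```

Here `same_direction` encodes the robustness rule from the text: the estimate must stay on the same side of zero whichever SNP is removed.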
All statistical analyses were performed using the "TwoSampleMR" and "MRPRESSO" packages in R software (version 4.2.3).