Study design
For CSF, we performed MGWAS using a three-stage study design: discovery, replication, and meta-analysis. The discovery stage included 1,224 unrelated non-Hispanic white (NHW) samples from the Knight Alzheimer Disease Research Center (Knight-ADRC), Dominantly Inherited Alzheimer Network (DIAN) and the memory and disorder unit at the university hospital Mutua de Terrasa, Spain (Barcelona-1) cohorts (Supplementary Fig. 1). The replication stage included a total of 1,378 unrelated NHW samples from the Alzheimer's Disease Neuroimaging Initiative (ADNI), Fundació ACE Alzheimer Center (ACE) and the Wisconsin CSF study cohorts (WADRC and WRAP)14 (detailed information on the cohorts can be found in Supplementary Table 1). Metabolomics data for all cohorts were generated using the Metabolon HD4 platform. After rigorous quality control (QC; see methods and Supplementary Fig. 2, Supplementary Table 2), a total of 440 metabolites passed QC (329 non-xenobiotics, 59 xenobiotics and 52 minimally characterized metabolites; minimally characterized metabolites include unknown metabolites and partially characterized metabolites). To compare the genetic architecture of metabolites in CSF with that in brain, we also performed a large-scale meta-analysis of brain MGWAS that included a total of 1,016 unrelated NHW participants from three cohorts (Knight-ADRC and DIAN, N = 405; ROSMAP, N = 415; MAYO, N = 196; Supplementary Fig. 1). A total of 962 metabolites (779 non-xenobiotics, 74 xenobiotics and 109 minimally characterized metabolites) were included in the analyses (Supplementary Fig. 2, Supplementary Table 3). Of these 962 metabolites, 360 passed QC in CSF and therefore 602 were uniquely analyzed in brain (Supplementary Fig. 3). Following the identification of associations, we performed deep characterization and functional annotation to identify the effector genes.
In this study, we not only compared the genetic architecture of CSF, brain, blood and urine, but also determined novel vs. known associations and loci by comparing our results with the largest CSF, plasma and urine MGWAS available at the moment of the study. Furthermore, the genetic regulators for brain and CSF metabolites were used to identify genetically dysregulated metabolites (via TWAS/Fusion) and to uncover metabolites causal for 12 neurological and 15 non-neurological traits or disorders through colocalization and MR (Supplementary Fig. 1, Supplementary Table 4). Each of these traits and disorders either has been linked to the central nervous system or is a risk factor for brain disorders.
CSF and brain MGWAS identify hundreds of novel and tissue-specific metabolite associations
We performed the largest MGWAS to date in both CSF (N = 2,602) and brain (N = 1,016) tissues. In CSF, significant associations (metabolite-genetic locus pairs) were defined as those with: 1) nominal significance in discovery and replication; 2) effects in the same direction; and 3) study-wide significance in meta-analysis (P < 2.79×10−10; Supplementary Fig. 1; see material and methods for additional details). Among 440 CSF metabolites we identified a total of 192 associations for 144 metabolites at 102 distinct loci (Fig. 1a&b, Supplementary Fig. 4a&5a, Supplementary Tables 5&6). The effect sizes of the index variants showed high correlation between discovery and replication (r = 0.97; Supplementary Fig. 6), indicating high replicability and that both stages contributed to the association. Of these 192 associations, 173 associations belong to 130 well characterized metabolites (123 non-xenobiotics and 7 xenobiotics) and 19 associations belong to 14 minimally characterized metabolites (Supplementary Tables 2&5).
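The three criteria for a significant CSF association can be sketched as a small predicate. This is only an illustration, not the study's pipeline: the `StageResult` type and the function name are hypothetical, and the nominal threshold of 0.05 is an assumption because this section does not define "nominal" (see material and methods).

```python
from dataclasses import dataclass

STUDY_WIDE_P = 2.79e-10  # study-wide threshold stated in the text
NOMINAL_P = 0.05         # ASSUMPTION: "nominal" is not defined in this section


@dataclass
class StageResult:
    """Effect estimate and P value for one variant-metabolite pair in one stage."""
    beta: float
    p: float


def is_significant(discovery, replication, meta,
                   nominal_p=NOMINAL_P, study_wide_p=STUDY_WIDE_P):
    """Apply the three criteria: nominal in both stages, same direction, study-wide in meta."""
    nominal_in_both = discovery.p < nominal_p and replication.p < nominal_p
    same_direction = discovery.beta * replication.beta > 0
    study_wide = meta.p < study_wide_p
    return nominal_in_both and same_direction and study_wide
```

A pair with concordant, nominally significant effects in both stages still fails if its meta-analysis P value does not reach 2.79×10−10.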
Given the potentially complex nature of certain associations, we performed conditional analysis to identify independent associations within the identified regions. To be considered an independent signal, a variant was required to pass the study-wide threshold (P < 2.79×10−10) in the conditional analyses. Of the 192 associations, 13 had two independent signals, and seven had three independent signals (Supplementary Fig. 7a). Therefore, in CSF we identified a total of 219 independent association signals for 144 metabolites at 102 loci. Of the seven association regions with three signals, two were novel (Supplementary Table 5). One of them was an association between methylmalonate (MMA) and the ACSF3 region (16q24.3). Its primary signal included a missense variant (rs11547019 - p.Ala17Pro) for malonyl-CoA synthetase family member 3. The two additional signals did not include any SNP that modifies protein sequence but harbored known expression quantitative trait loci (eQTLs; P < 10−4) for ACSF3. In the other case, three independent signals were found at the CNDP1 gene region, which was associated with homocarnosine, a substrate of the enzyme encoded by CNDP1. While the primary signal (rs56042934) was intronic to CNDP1 with unknown mechanism of action, the two other signals cause benign (rs73973908) and deleterious (the lead variant rs140836083) missense changes to the enzyme. In addition, the number of independent signals may be underestimated, because the study-wide threshold can be too stringent for conditional analyses. In fact, associations with higher significance were more likely to have complex regions based on Student's t-test (Student's t = 3.24, P = 4×10−3; Supplementary Fig. 7b). Regardless, these results indicated that several independent signals may regulate metabolite levels at the same locus through multiple independent events (some signals change protein sequence or even protein function, and others alter protein level by modifying mRNA level)17,21.
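The total of 219 signals follows directly from the counts above: each association contributes one primary signal, and an association with k independent signals contributes k − 1 more. A minimal arithmetic check, with a hypothetical helper name:

```python
def total_independent_signals(n_associations, multi_signal_counts):
    """Total signals = one per association plus (k - 1) extra for each
    association that has k independent signals.

    multi_signal_counts maps k (signals per association) to the number of
    associations with that many signals.
    """
    extra = sum((k - 1) * n for k, n in multi_signal_counts.items())
    return n_associations + extra


# CSF counts from the text: 192 associations, 13 with two signals, 7 with three.
csf_signals = total_independent_signals(192, {2: 13, 3: 7})
print(csf_signals)
```

With the CSF counts this gives 192 + 13 + 14 = 219, matching the reported total.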
As our study was enriched for Alzheimer’s disease (AD) patients, our results may have been affected by participants’ health status. We therefore conducted sensitivity analyses by performing MGWAS including either only healthy (N = 883) or only AD individuals (N = 769), based on biomarker status (amyloid/tau/neurodegeneration (ATN) classification; see extended results). The effect sizes of the index variants showed high correlation with those from analyses that included either controls only (r = 0.96; P < 2.2×10−16) or ADs only (r = 0.97; P < 2.2×10−16), indicating that disease status minimally affected our identified associations (Supplementary Fig. 8; extended results).
In brain MGWAS, only associations that had the same direction of effect in two or more of the three cohorts (for shared metabolites) and had study-wide significance in the meta-analyses were considered significant (Supplementary Fig. 1). The brain MGWAS (n = 1,016 individuals and 962 metabolites) identified 35 associations for 34 metabolites at 27 loci (Fig. 1a&c, Supplementary Fig. 4a&5a, Supplementary Fig. 9, Supplementary Tables 7&8, see extended results). Conditional analysis identified one additional independent signal for cytidine (Supplementary Fig. 7c). Therefore, we identified a total of 36 independent association signals for 34 metabolites at 27 loci. Of these associations, 16 were identified in both CSF and brain (PP.H4.abf > 0.8, Supplementary Table 9). Many more associations were identified in CSF than in brain, which could be due either to CSF and brain having different genetic architectures or simply to CSF having higher statistical power because of its larger sample size. To address this, we examined the study-wide significant associations (180 associations for 133 metabolites) of the 360 metabolites present in both tissues. The effect sizes for these 180 associations showed a high correlation between CSF and brain (r = 0.81, p < 2.2×10−16; Supplementary Fig. 10f; extended results), indicating that the genetic architecture of metabolite levels is similar between CSF and brain. Additionally, we applied the mashr16 approach to compare study-wide associations between CSF and brain in direction and magnitude. We found that 90% of associations had the same direction of effect and 57% of associations shared effects in both direction and magnitude.
We then performed additional analyses comparing the overall genetic architecture of metabolite levels across four different tissues: brain, CSF, blood and urine, using mashr (Supplementary Fig. 11). We used the latest blood and urine MGWAS available at the time of the study9,12. For this comparison, we focused on the 247 metabolites that were tested across all tissues and their genome-wide significant signals. In all tissue pairs, more than 80% of associations were consistent in direction, with the highest percentage in CSF and blood (91%; Supplementary Fig. 11b, Supplementary Table 10). When both direction consistency and magnitude similarity (within 2-fold) were considered, CSF and brain showed the highest overlap of associations (57%), followed by blood and urine (32%). Additionally, brain and urine had more direction-specific associations (effect direction in one tissue differing from that in the other tissues) than other tissues, indicating unique genetic regulation in brain metabolism and renal function (urine metabolite levels). These findings emphasized the need to analyze brain-related tissues in MGWAS in order to better understand neurological diseases.
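The two sharing measures used here (same direction, and same direction with magnitudes within 2-fold) can be illustrated with a simplified sketch. It works on raw effect estimates for a pair of tissues; mashr itself operates on posterior estimates, so this is an approximation of the idea rather than the mashr procedure, and the function name and toy effects are hypothetical.

```python
def pairwise_sharing(effects_a, effects_b, factor=2.0):
    """Return (fraction sharing direction, fraction sharing direction and magnitude).

    Two effects share magnitude when they have the same sign and their ratio
    lies within [1/factor, factor].
    """
    pairs = list(zip(effects_a, effects_b))
    if not pairs:
        raise ValueError("at least one pair of effects is required")
    same_sign = [a * b > 0 for a, b in pairs]
    # a * b > 0 guarantees b != 0, so the ratio below is safe to compute.
    same_magnitude = [a * b > 0 and 1.0 / factor <= a / b <= factor
                      for a, b in pairs]
    n = len(pairs)
    return sum(same_sign) / n, sum(same_magnitude) / n


# Toy effects (hypothetical values, not study data).
tissue_1 = [0.5, -0.4, 0.2, 0.3]
tissue_2 = [0.4, -0.1, -0.2, 0.7]
sign_share, magnitude_share = pairwise_sharing(tissue_1, tissue_2)
print(sign_share, magnitude_share)  # 0.75 0.25
```

In the toy data three of four effects agree in sign, but only the first pair also has magnitudes within 2-fold of each other.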
Finally, we examined whether the CSF and brain MGWAS led to any novel signals by comparing our results with the large blood and urine studies and with the CSF study whose data were included in our meta-analysis1,8,9,12,14. Of the 219 independent association signals (192 associations) in CSF, 88 signals (70 association regions) had been previously reported in at least one of the five large-scale Metabolon-platform based studies (PP.H4.abf > 0.6; Fig. 1d; Supplementary Fig. 12). We found that 97.7% of the 131 novel association signals (at 113 novel regions and 9 reported regions) originated from metabolites previously examined in blood or urine, suggesting that our associations may be specific to CSF. The 131 novel signals corresponded to 24 novel loci and 49 previously reported loci that were associated with different metabolites (Fig. 1d; Supplementary Fig. 13).
For brain, 16 independent signals (16 associations) of the 36 signals (35 associations) have been reported in previous studies (PP.H4.abf > 0.6; Fig. 1d, Supplementary Fig. 13). Therefore, we identified 20 novel association signals (at 13 novel association regions and six reported regions) for 18 metabolites, of which six signals (4 novel loci) were for metabolites not analyzed in any previous study. In addition, these 20 novel association signals correspond to seven novel loci and six reported loci associated with different metabolites than the ones identified here (Fig. 1d; Supplementary Fig. 13, see extended results).
Pleiotropic loci and polygenic metabolites
Pleiotropic analyses can be instrumental in identifying metabolites that are part of the same metabolic reaction or are unknown substrates or products of one specific reaction. In CSF, 43 of the 102 identified loci were associated with more than one metabolite (Supplementary Table 6). Most of these loci were associated with two (23 loci) or three (10 loci) metabolites, although there were two loci with four metabolites, four loci with five metabolites, two loci with six, one locus associated with seven metabolites, and one locus associated with ten metabolites (Fig. 2a, b; Supplementary Table 6; Supplementary Fig. 14). The most pleiotropic CSF locus, located at the SLC13A3/ADA gene region, was associated with ten metabolites (Fig. 2b, Supplementary Fig. 14a). This region was complex, with two independent signals (based on r2 > 0.8): rs406383, intronic to ADA, was associated with N1-methyladenosine, and rs439143, intronic to SLC13A3, was associated with nine different metabolites. Of these nine metabolites, seven were potential direct substrates of the transporter encoded by SLC13A3, being either amino acid derivatives or Krebs cycle components22, and the others were carnitine molecules secondary to Krebs cycle components.22
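A pleiotropy tally of this kind can be derived from a list of metabolite-locus pairs by counting distinct metabolites per locus. The sketch below uses hypothetical locus and metabolite labels for the toy input, then checks that the reported distribution sums to the 43 pleiotropic CSF loci.

```python
from collections import Counter, defaultdict


def metabolites_per_locus(associations):
    """Map each locus to the set of metabolites associated with it."""
    by_locus = defaultdict(set)
    for metabolite, locus in associations:
        by_locus[locus].add(metabolite)
    return by_locus


def pleiotropy_distribution(associations):
    """Count loci by number of associated metabolites, keeping only loci with more than one."""
    sizes = Counter(len(mets) for mets in metabolites_per_locus(associations).values())
    return {k: v for k, v in sizes.items() if k > 1}


# Toy input with hypothetical labels (not study data).
toy = [("m1", "locusA"), ("m2", "locusA"), ("m3", "locusB"),
       ("m1", "locusC"), ("m4", "locusC"), ("m5", "locusC")]
toy_distribution = pleiotropy_distribution(toy)

# Distribution reported in the text: metabolites per locus -> number of loci.
reported = {2: 23, 3: 10, 4: 2, 5: 4, 6: 2, 7: 1, 10: 1}
reported_pleiotropic_loci = sum(reported.values())
print(toy_distribution, reported_pleiotropic_loci)
```

The same grouping with the roles of locus and metabolite swapped gives the per-metabolite locus counts used to describe polygenic metabolites.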
In brain, six of the 27 loci were pleiotropic and associated with either two (four loci) or three (two loci) metabolites (Fig. 2d-f). All six loci were identified as pleiotropic in CSF as well. The metabolites associated with these loci were often shared by both tissues, while additional metabolites were identified from brain due to either metabolites uniquely analyzed in brain or brain-tissue-specific associations not identified in CSF (see extended results).
The pleiotropic nature of many loci corresponds to known biological mechanisms, as in the case of the SLC13A3/ADA locus and of CPS1, which encodes an enzyme catalyzing the first step of the urea cycle. The variants in the CPS1 region were associated with metabolites (Supplementary Fig. 14b, n = 6; e.g., homoarginine, glycine, glutamine degradant, among others) that are part of the urea cycle or alternative ammonia elimination pathways23. The APOE/APOC1 locus was associated with five lipid metabolites, including cholesterol and four phosphatidylcholines (1,2-dipalmitoyl-GPC (16:0/16:0), 1-myristoyl-2-palmitoyl-GPC (14:0/16:0), 1-palmitoyl-2-stearoyl-GPC (16:0/18:0), 1-palmitoyl-2-palmitoleoyl-GPC (16:0/16:1)). Apolipoprotein E is known to interact with lipoproteins and to function as a carrier of cholesterol and phosphatidylcholines24. APOE variants are among the major genetic risk factors for Alzheimer’s disease (AD), and cholesterol has been associated with AD development downstream of Aβ and Tau pathology. Several studies also indicate that phosphatidylcholines may lower the risk for dementia and AD25–27. These five metabolites were also predicted to be associated with AD based on the MWAS analyses, and all of them were found to be lower in the CSF of AD patients based on differential abundance analysis (p < 0.05; Extended data Table 1, and extended results).
In addition, many metabolites were polygenic, meaning that multiple loci were associated with the same metabolite. In CSF, of the 144 metabolites with study-wide association(s), 37 metabolites were associated with multiple loci: 29 metabolites were associated with two loci, six metabolites with three, one metabolite, methylsuccinoylcarnitine, with four, and one metabolite, bilirubin (E,E), with five loci (Fig. 2c, Supplementary Table 5). The nominated effector genes (see section “In silico functional annotation of the CSF and brain associations”) for bilirubin (E,E) were UGT1A6 (2q37.1), GYPA (4q31.21), TWISTNB (7p21.1), FAS (10q23.31), and SPRY2 (13q31.1). UGT1A6 encodes an enzyme that transforms bilirubin into water-soluble molecules, and GYPA encodes the major intrinsic membrane protein of the erythrocyte, where bilirubin is generated28. Mutations in FAS lead to an autoimmune lymphoproliferative syndrome (ALPS) that is associated with hyperbilirubinemia29. The role of SPRY2 in bilirubin metabolism is unknown, but these findings suggest that it is also part of the pathways that produce or regulate bilirubin. The metabolite methylsuccinoylcarnitine was associated with four loci, whose signals were predicted to affect CPT2 (1p32.3), SUCLG2 (3p14.1), ACADS (12q24.31), and SLC13A3 (20q13.12) (Fig. 2c, Supplementary Table 5). Its association with CPT2 has been reported previously, yet the mechanism is unknown. The other three loci were novel, and their nominated functional genes were implicated in the metabolism of methylsuccinoylcarnitine. SUCLG2 encodes a succinyl-CoA ligase, and the metabolite is involved in succinyl-CoA pathways. Mutations in ACADS cause short-chain acyl-CoA dehydrogenase deficiency, and methylsuccinate levels are altered in this disorder30. The protein encoded by SLC13A3 can transport succinate, which is a building block for methylsuccinoylcarnitine.
This is the first time these genes have been linked to bilirubin and methylsuccinoylcarnitine levels. Additional functional analyses will be needed to characterize them in the context of these polygenic metabolites.
In silico functional annotation of the CSF and brain associations
To identify the effector gene for each association, we applied two complementary strategies (Fig. 3a, b). The first strategy is based on the ProGeM31 program, which incorporates both genetic annotation and broad metabolism relevance; it prioritizes a gene if 1) the associated signal (the sentinel variant and its tagged variants (r2 > 0.8)) leads to a change in protein sequence, 2) the gene in the locus belongs to metabolic pathways, 3) the gene has an eQTL that overlaps with the association signal, and 4) it is the nearest gene to the sentinel variant (Supplementary Fig. 15)31. The second strategy is based on manually curated biological knowledge, which relies on metabolite-gene relationships from the KEGG32, GeneCards33, and HMDB34 databases.
For CSF, the ProGeM strategy nominated 130 genes for 219 signals and the knowledge-based strategy nominated 89 genes for 165 signals (Fig. 3a). For brain, the ProGeM strategy nominated 29 genes for 36 signals and the knowledge-based strategy nominated 17 genes for 23 signals (Fig. 3b). Both strategies provided consistent predictions, with the same effector gene being nominated in 83.6% and 78.3% of CSF and brain associations, respectively (Fig. 3a, b). In case of discordance (27 CSF and 5 brain associations), the gene nominated by the biological knowledge-based strategy was prioritized over the ProGeM nomination, as we confirmed that the gene nominated by the ProGeM strategy was not biologically meaningful to the metabolite (see extended results).
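The reconciliation rule and the concordance percentages can be reproduced with a short sketch. Two points are assumptions rather than statements from the text: that concordance is computed over the signals with a knowledge-based nomination (165 in CSF, 23 in brain), which is the denominator that reproduces the reported percentages, and that signals without a knowledge-based nomination keep the ProGeM gene. Function names and gene labels are hypothetical.

```python
def resolve_effector_gene(progem_gene, knowledge_gene):
    """Pick the final nominee for one signal.

    The knowledge-based nomination wins whenever it exists; otherwise the
    ProGeM nomination is kept (ASSUMPTION for signals lacking curated evidence).
    """
    if knowledge_gene is None:
        return progem_gene
    return knowledge_gene


def concordance_percent(n_with_both, n_discordant):
    """Percentage of signals where both strategies nominate the same gene."""
    return round(100.0 * (n_with_both - n_discordant) / n_with_both, 1)


csf_concordance = concordance_percent(165, 27)
brain_concordance = concordance_percent(23, 5)
print(csf_concordance, brain_concordance)  # 83.6 78.3
```

The printed values match the 83.6% and 78.3% reported above, which supports the assumed denominators.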
Once the effector gene was nominated for each association signal, we categorized the associations first based on the location and consequence of variants relative to the effector gene, and subsequently based on eQTLs for the effector gene. Categorizing by consequence to the nominated genes, the association included protein-sequence-altering variants (missense or splice acceptor variants) in 28.3% of CSF (62 association signals mapped to 39 genes) and 19.4% of brain associations (seven association signals mapped to 6 genes; Fig. 3c, d, Supplementary Fig. 16, Supplementary Table 5). Of these, 25 of the 62 CSF and three of the seven brain associations were predicted by SIFT and PolyPhen35,36 to be deleterious to protein function, and of these, ten CSF and two brain deleterious associations are novel. Based on CSF, loss-of-function or deleterious variants had higher effect sizes (deleterious vs. benign missense: t = 3.3, p = 2×10−3; deleterious vs. non-coding: t = 3.1, p = 6×10−3) and lower minor allele frequencies (deleterious vs. benign missense: t = -3.2, p = 2×10−3; deleterious vs. non-coding: t = -3.2, p = 3×10−3) than those that were non-coding or were predicted to be benign (Fig. 3e-h). We identified that 58.9% of the CSF (129 signals mapped to 81 genes) and 72.2% of the brain association signals (26 signals mapped to 19 genes) included an eQTL (Supplementary Table 5). In 15.1% of the CSF association signals (29 signals in 21 genes) and in 11.4% of the brain association signals, the same prioritized gene was supported by both altered protein sequence and eQTL variant evidence (Supplementary Tables 5&7).
Among the nominated effector genes, 91.8% in CSF (87.9% of unique genes) and 77.8% in brain (75.0% of unique genes) encoded enzymes or transporters (Supplementary Fig. 17c, d). In addition, 42.9% and 34.4% of the total nominated effector genes for the CSF and brain association signals, respectively, correspond to cis-proteins, defined as enzymes and transporters (production, degradation, transport) for a specific metabolite (Supplementary Fig. 17a, b).
Pathway analyses using the KEGG database with the nominated effector genes found enrichment of many metabolism-related processes. The 127 unique nominated effector genes in CSF were enriched for branched chain amino acid degradation (map00280: P = 6.6×10−12), alanine, aspartate and glutamate metabolism (map00250: P = 7.5×10−9), and pyrimidine metabolism (map00240: P = 2.0×10−8; Supplementary Fig. 17e). The 29 unique genes in brain were enriched for pyrimidine metabolism (map00240: P = 1.3×10−5) and drug metabolism (map00983: P = 4.9×10−5; Supplementary Fig. 17f).
Then, we investigated whether the nominated genes showed an enrichment for any specific brain cell type. The cell type specificity was determined for each gene based on gene expression (see material and methods)37. We found that the nominated effector genes for the CSF associations were enriched for astrocytes (log2FC = 1.66, p = 4.7×10−5, Supplementary Fig. 18), which are key regulators of brain energy metabolism38.
Insights into brain-related phenotypes using genetically regulated metabolites
Metabolism dysregulation, observed in many disorders, can be part of the causal pathway and potentially a good target for intervention. The plasma MGWAS by Chen et al. identified 95 causal relationships for 12 phenotypes (five phenotypes included in this study), including O-sulfo-l-tyrosine for PD and the ratio of choline phosphate/choline for AD2. The recent urine MGWAS study identified 684 relationships between 110 metabolites and 68 phenotypes (no phenotype overlapped with this study) through colocalization analyses9. An earlier CSF MGWAS study14 identified 19 metabolite-trait pairs for multiple neurological and psychiatric disorders, including attention deficit hyperactivity disorder (ADHD)-malate and schizophrenia-N-delta-acetylornithine14, through metabolome-wide association (MWAS) analyses.
Here, we integrate our CSF and brain MGWAS data to identify potential biomarkers and causal metabolites for 27 brain and wellness-related traits or disorders (Alzheimer’s disease, alcoholism, cognitive performance, among others; Supplementary Table 4), by combining 1) MWAS (FUSION-approach39), 2) colocalization40,41, 3) MR42 and 4) drug repositioning43. Previous studies have used MR, colocalization or MWAS approaches individually. Here, for the first time, we integrated all three approaches along with drug repositioning to identify causal and druggable metabolites for complex traits2,9,13.
To identify metabolites dysregulated in those traits, the FUSION approach was used to build metabolite level prediction models based on study-wide significant associations and to perform association analysis between predicted metabolite levels and phenotypes. The weights for predicting metabolites were calculated for 92.4% (133/144) of CSF metabolites and 85.3% (29/34) of brain metabolites that had at least one heritable association region (Supplementary Table 11). Through this approach, we identified 62 CSF metabolite levels associated with 19 phenotypes including ADHD, alcoholism, bipolar disorder (128 metabolite-phenotype pairs), and nine brain metabolites associated with 12 phenotypes (22 metabolite-phenotype pairs; Fig. 4, Supplementary Figs. 19 & 20, and Supplementary Tables 12 & 13). Both CSF and brain analyses identified seven metabolite-trait pairs, including four metabolites (succinylcarnitine (C4-DC), N6-methyllysine, methylsuccinate, ethylmalonate) and six traits (AD, baldness, educational attainment, major depressive disorder, schizophrenia, smoking initiation). Across tissues, these associations showed consistent effects in both direction and magnitude (Supplementary Table 14). In total, we identified 140 unique metabolite-trait pairs in CSF and/or brain, of which only five were reported in the previous CSF MWAS study (Supplementary Table 15)14. Therefore, the remaining 135 metabolite-trait pairs are novel. We found that the trait waist-to-hip ratio adjusted for BMI (WHRadjBMI) was associated with 21 metabolites (the largest number), educational attainment with 19 metabolites, cognitive performance with 13 metabolites, schizophrenia with ten metabolites, and Alzheimer’s disease with nine metabolites.
To investigate whether the metabolite-phenotype associations identified through MWAS had the same functional variant for the metabolite and the phenotype, we performed colocalization analysis. Of the 128 CSF metabolite-trait (disease) associations, 26 pairs showed colocalization between the two traits (PP.H4 > 0.6; Fig. 4, Supplementary Table 16). Of the 22 brain associations, seven pairs showed colocalization (PP.H4 > 0.6; Fig. 4, Supplementary Table 17, see extended results).
To infer causal metabolites, we performed MR excluding highly pleiotropic regions (associated with > 5 metabolites)44,45. In CSF, we identified 38 metabolites causal for 22 traits after FDR correction (78 pairs; Supplementary Table 18). For brain, we identified 11 causal metabolites for 10 traits (20 pairs; Supplementary Table 19). In total, we identified 92 causal effects involving 46 metabolites and 22 phenotypes from both tissues. There were five causal relationships identified in both tissues, including for example succinylcarnitine for AD and HDL (Supplementary Table 20). In addition, we conducted a sensitivity analysis by performing MR using a more stringent method which removed all genetic regions associated with more than one metabolite. The sensitivity analysis identified 46 causal relationships between 20 metabolites and 18 phenotypes from both tissues (Supplementary Table 21–24, Supplementary Fig. 21).
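The difference between the standard and the stringent MR settings is only the pleiotropy cutoff applied when selecting instrument regions. A minimal sketch of that filter follows; the function name, locus labels and metabolite labels are hypothetical, and it covers only the region-exclusion step, not the MR estimation itself.

```python
def select_instrument_loci(locus_to_metabolites, max_metabolites):
    """Keep loci associated with at most `max_metabolites` distinct metabolites."""
    return {locus for locus, metabolites in locus_to_metabolites.items()
            if len(set(metabolites)) <= max_metabolites}


# Toy data with hypothetical labels (not study data).
toy_loci = {
    "locusA": ["m%d" % i for i in range(10)],  # highly pleiotropic: 10 metabolites
    "locusB": ["m1", "m2"],
    "locusC": ["m3"],
}

# Standard analysis: exclude regions associated with more than 5 metabolites.
standard = select_instrument_loci(toy_loci, max_metabolites=5)
# Stringent sensitivity analysis: exclude regions associated with more than 1 metabolite.
stringent = select_instrument_loci(toy_loci, max_metabolites=1)
print(sorted(standard), sorted(stringent))
```

In the toy data the standard cutoff drops only the ten-metabolite locus, while the stringent cutoff keeps a single locus, which illustrates why the sensitivity analysis yields fewer causal relationships.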
The differences between the findings from the standard and stringent MR analyses come from how pleiotropic regions were defined. However, in some of these scenarios pleiotropic effects may identify relevant biological processes. For example, when a finding points to an enzyme that catalyzes a specific metabolic reaction, changes in the activity of the enzyme will affect at least two metabolites: the direct substrate and the direct product. In some situations, it may affect more analytes if multiple substrates and products are involved. This could be the case where a signal (led by rs17279437) affected ACADS46, which encodes an enzyme in beta-oxidation, where fatty acids carried by carnitines are broken down to produce energy. This signal was associated with various metabolites involved in beta-oxidation pathways, like acylcarnitine related molecules, methylsuccinate47 and methylsuccinoylcarnitine, and fatty acids such as ethylmalonate. In other scenarios, the signal may be driven by a metabolite channel or transporter, where genetic variants that decrease the activity of this transporter will lead to changes in the levels of several metabolites. For example, the signal (led by rs17279437) that affected the transporter-encoding gene SLC6A2048 was associated with multiple substrates of the transporter, such as proline, betaine, and dimethylglycine. Therefore, although each single metabolite might not be causal, what leads to the disease may be the dysregulation of a specific metabolic process. However, these events are identified as sources of pleiotropic effects and were therefore removed from the MR analyses, leading to many false negative findings. Therefore, for MGWAS it may be necessary to reconsider the definition of pleiotropy by incorporating biological knowledge.
To conclude, the intertwined nature of metabolic pathways often resulted in pleiotropic effects of signals, creating challenges for MR approaches, which may redirect us to identify metabolic reactions rather than metabolites themselves.
In any case, we examined how many of the metabolite-trait pairs we found in our MWAS and MR were also reported in previous CSF, plasma or urine studies2,14. We replicated one of the three findings of the previous CSF study, which was brain ethylmalonate’s causal effect on schizophrenia, while the other two (N-delta-acetylornithine’s causal effect on cognitive performance and schizophrenia) were not replicated because of the pleiotropic signal at NAT8. Among the 95 causal metabolite (or metabolite ratio)-phenotype relationships identified by the plasma study of Chen et al.2, we were able to analyze six pairs (three phenotypes and four metabolites), but were unable to replicate these findings: four due to tissue-specific findings, one due to a difference in study power, and one due to a difference in instrumental variable selection (see extended results). At the same time, our analyses identified five causal metabolite-disease pairs that were previously tested but not found to be significant and therefore represent novel associations. These included bilirubin (Z,Z) for type 2 diabetes (T2D) and N-acetylhistidine for WHRadjBMI (Supplementary Table 18). These findings were driven by tissue-specific MGWAS findings, highlighting the need not just to perform larger studies on plasma, but to expand these studies to additional tissues.
Then we integrated the findings from the three analyses: MWAS, colocalization and MR. In CSF, 26 metabolite-trait pairs were significant for MWAS and MR, including nine pairs with colocalization evidence (Supplementary Table 25). These included six metabolites (Fig. 4) for seven traits (alcoholism, bipolar disorder, WHRadjBMI, brain volume, cognitive performance, PD, and T2D). In brain, 11 pairs were significant in both MWAS and MR, of which two pairs, N6-methyllysine and N6,N6-dimethyllysine with baldness, also showed colocalization (Supplementary Table 26).
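The integration step amounts to intersecting the sets of metabolite-trait pairs supported by each analysis. The sketch below separates pairs supported by all three analyses from pairs supported by MWAS plus exactly one of the other two; the function name and the pair labels are hypothetical placeholders, not study results.

```python
def classify_pairs(mwas, mr, coloc):
    """Split metabolite-trait pairs by level of support.

    Returns (pairs supported by MWAS, MR and colocalization,
             pairs supported by MWAS plus exactly one of MR or colocalization).
    """
    mwas, mr, coloc = set(mwas), set(mr), set(coloc)
    all_three = mwas & mr & coloc
    two_of_three = ((mwas & mr) | (mwas & coloc)) - all_three
    return all_three, two_of_three


# Toy pairs with hypothetical labels.
mwas_pairs = {("metabolite_1", "trait_A"), ("metabolite_2", "trait_A"),
              ("metabolite_3", "trait_B"), ("metabolite_4", "trait_C")}
mr_pairs = {("metabolite_1", "trait_A"), ("metabolite_2", "trait_A")}
coloc_pairs = {("metabolite_1", "trait_A"), ("metabolite_3", "trait_B")}

high_confidence, suggestive = classify_pairs(mwas_pairs, mr_pairs, coloc_pairs)
print(sorted(high_confidence), sorted(suggestive))
```

Here one toy pair has support from all three analyses, two have support from MWAS plus one other analysis, and the pair found only by MWAS falls into neither group.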
For cognitive performance, we found causal associations with lower levels of two metabolites, 6-oxopiperidine-2-carboxylate and 3-hydroxyisobutyrate, based on all three analyses: MWAS, MR and colocalization (Fig. 4 & Supplementary Table 25). These associations were not found in the previous plasma studies because the genetic associations with these metabolites are CSF-specific. Some previous studies supported these metabolites influencing cognition. 6-oxopiperidine-2-carboxylate and 3-hydroxyisobutyrate have been linked to cognition in AD or epilepsy-specific studies49,50.
We also found that higher levels of mannose may be causal to alcoholism, T2D, bipolar disorder, and PD based on MWAS, MR and colocalization analyses (Fig. 4 & Supplementary Table 25). The nominated effector gene for mannose, GCKR, was shown to affect both lipids and carbohydrates, including sphingolipids, glycerolipids, and serine (a key connector of amino acids to lipids and carbohydrates)51, suggesting that lipid and carbohydrate pathways may play an important role in alcoholism, T2D and PD. Mannose is known to be involved in alcohol metabolism, as it had an anti-steatosis role in alcoholic liver disease52,53. High fasting mannose levels have been associated with insulin resistance in diabetic individuals independent of obesity level54. In addition, MBL2 (which encodes mannose-binding lectin 2), which is implicated in mannose metabolism, has been found to be associated with bipolar disorder in genetic studies, supporting the causal role of this metabolite in bipolar disorder55–57. Our analyses indicate that lower mannose levels were associated with higher risk of bipolar disorder, and thus mannose, an available supplemental substance, may be useful to study as a potential intervention for bipolar disorder. Moreover, based on the literature, this is the first time that mannose has been causally associated with PD.
Besides mannose, lower galactosylglycerol levels were potentially causal to Parkinson's disease (PD) through a signal that colocalized at GALC (Fig. 4 & Supplementary Table 25). The gene GALC encodes galactosylceramidase, which removes galactose from ceramide derivatives, while the role of galactosylceramidase in galactosylglycerol metabolism in PD remains unknown. Knockout of GALC prevented alpha-synuclein accumulation in a PD mouse model, indicating that the enzyme galactosylceramidase may accelerate the development of PD by reducing galactosylglycerol58.
Additionally, lower xanthine level was predicted to have a causal effect on WHRadjBMI, with support from MWAS, MR, and colocalization analyses (Fig. 4 & Supplementary Table 25). A high level of xanthine oxidase activity, which reduces xanthine level, has been observed in obese individuals59. Therefore, xanthine as a dietary supplement could be tested as a potential intervention for individuals who suffer from obesity. Finally, an unknown metabolite, X-24228, was found to be causal to brain volume, with its signals colocalized at a novel locus, CLDN16. This gene plays a role in cell adhesion, which is a crucial component in brain development60.
In brain, we identified N6-methyllysine and N6,N6-dimethyllysine as causal for baldness, according to MWAS, MR, and colocalization analyses (Fig. 4 & Supplementary Table 26). PYROXD2 was the effector gene for both metabolites. Interestingly, CSF N6-methyllysine levels neither shared a causal signal with baldness nor had a causal effect on baldness, which could be explained by the tissue-specific association of N6-methyllysine (Supplementary Table 9).
Overall, we identified 11 high-confidence metabolite-trait causal relationships (nine in CSF and two in brain) that were supported by MWAS, MR and colocalization analyses. These associations were novel because either the MGWAS signal was tissue-specific or the metabolite had not been analyzed in other tissues. Previous studies performed only one or two of these three types of analyses to identify metabolites implicated in or causal for traits. Here, we report metabolite-trait pairs that were significant in all three approaches, which makes our analyses more stringent. Many more metabolite-trait pairs were found when requiring support from only two of these approaches: an additional 43 metabolite-trait associations (34 in CSF and 14 in brain) were supported by two approaches (MWAS + Coloc; MWAS + MR), such as succinylcarnitine and adenine for AD, and (N(1) + N(8))-acetylspermidine and 5-methylthioadenosine (MTA) for brain volume, among others (Extended results, Supplementary Table 27). Of these 43 additional pairs, 41 were novel and therefore warrant future exploration.
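The tiering described above (support from all three analyses versus support from MWAS plus one of MR or colocalization) can be illustrated with a minimal sketch. This is not the pipeline used in the study: the function name `tier_pairs`, the input format (sets of significant metabolite-trait pairs per analysis) and the placeholder pair names are assumptions made only for illustration, and the significance thresholds that define each input set are not modeled.

```python
from collections import defaultdict

def tier_pairs(mwas, mr, coloc):
    """Group (metabolite, trait) pairs by which analyses support them.

    Each argument is a set of (metabolite, trait) tuples already deemed
    significant by that analysis (hypothetical inputs, thresholds not modeled).
    """
    support = defaultdict(set)
    for name, pairs in (("MWAS", mwas), ("MR", mr), ("Coloc", coloc)):
        for pair in pairs:
            support[pair].add(name)

    # High-confidence tier: supported by all three analyses.
    high_confidence = sorted(p for p, s in support.items() if len(s) == 3)
    # Two-approach tier as described in the text: MWAS + Coloc or MWAS + MR.
    two_approach = sorted(
        p for p, s in support.items() if len(s) == 2 and "MWAS" in s
    )
    return high_confidence, two_approach

# Placeholder pairs, not results from the study.
mwas = {("m1", "t1"), ("m2", "t1"), ("m3", "t2")}
mr = {("m1", "t1"), ("m3", "t2"), ("m4", "t3")}
coloc = {("m1", "t1"), ("m2", "t1"), ("m4", "t3")}
high, two = tier_pairs(mwas, mr, coloc)
```

In this toy input, a pair supported only by MR and colocalization is placed in neither tier, mirroring the two combinations named in the text.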
Druggable metabolites
The metabolites identified in these analyses could be drug targets for improving disease outcomes or achieving desired phenotypes. Based on the DrugBank database43, 18 of the 67 metabolites identified in the MWAS analyses were of pharmacological interest (Supplementary Table 28): six metabolites are either approved drugs or targeted by approved drugs. Betaine, higher levels of which were associated with ADHD, Autism, and WHRadjBMI, is used for the treatment of homocystinuria to decrease elevated homocysteine blood levels61. Valine, which was also positively associated with ADHD, Autism and WHRadjBMI, is a crucial component of parenteral nutrition, and treatments such as Aminosyn II 7% are approved for premature infants62. Asparaginase treatments, such as pegaspargase, reduce asparagine levels and are approved for acute lymphoblastic leukemia; lower asparagine levels were associated with WHRadjBMI. Adenine (rejuvesol treatment), higher levels of which were found in AD, ADHD, and smaller brain volume, is approved for Sickle Cell Disease (SCD)63, suggesting that adenine may be implicated in multiple complex traits but with opposite effects. Therefore, if adenine is to be targeted for therapeutic intervention, it will be important to track potential increased risk for other traits. Statins are FDA-approved drugs that lower cholesterol levels in cardiovascular disease and other conditions64. We found higher cholesterol levels to be associated with T2D and WHRadjBMI, and therefore statins could also be explored for the treatment of T2D and obesity. In addition, five metabolites are commercial dietary supplements, and three others are at experimental stages (see Extended results for a detailed description of these findings; Supplementary Table 28).
These results indicate that some of the metabolites identified as potential causal factors for these traits are druggable. At the same time, because metabolic regulation is complex, changing the levels of those metabolites may also increase the risk of other diseases, and close monitoring of such secondary effects will therefore be needed.
In addition, in some cases the nominated effector gene, rather than the metabolite itself, may also be druggable. For example, Tipiracil, an approved drug for gastric or colorectal malignancies, can inhibit the transferase activity of thymidine phosphorylase, which leads to higher levels of 2'-deoxyuridine. Our MWAS showed that 2'-deoxyuridine was lower in inflammatory bowel disease (IBD) through the TYMP locus (rs140522, p = 3.98 × 10^−57), and consistently, a recent study showed that a high uridine/2'-deoxyuridine ratio was causal for IBD2. Thus, the increase in 2'-deoxyuridine levels induced by Tipiracil might provide therapeutic benefits in IBD, although experimental evidence would be needed to support this hypothesis. In another example, Belinostat and Panobinostat are approved inhibitors of histone deacetylase 10, which regulates polyamine substrates including (N(1) + N(8))-acetylspermidine and diacetylspermidine (rs61748567, p = 5.56 × 10^−15; rs143617749, p = 1.54 × 10^−33). Higher levels of these metabolites were associated with smaller brain size based on MWAS. CNS injury was associated with an increase in N1-acetylspermidine levels in rat brain, indicating a link between polyamine acetylation and impaired brain function65. Therefore, increases in (N(1) + N(8))-acetylspermidine and diacetylspermidine levels caused by Belinostat or Panobinostat may lead to reduced brain size as a side effect.