The workflow of the study
The overall study workflow is shown in Fig. 1. To identify potentially causal proteins, we systematically linked high-throughput blood proteomics to AD. First, we combined plasma protein data from the UK Biobank Pharma Proteomics Project (UKB-PPP) with summary statistics from the latest AD GWAS by Bellenguez et al.[18] (111,326 AD cases and 677,663 controls) to perform a proteome-wide association study (PWAS) within a genetically informed framework. PWAS is designed to detect gene-phenotype associations mediated by changes in protein levels: it integrates genetically predicted protein levels with GWAS results to estimate the associations between genetically regulated protein levels and a trait of interest. We then used summary data-based Mendelian randomization with its accompanying test for heterogeneity in dependent instruments (SMR and HEIDI)[20] and Bayesian colocalization analysis (COLOC)[29] to provide additional support for the PWAS-identified proteins. Second, we performed a prospective cohort study to identify AD-associated proteins based on baseline protein levels, followed by sensitivity analyses. In addition, we examined associations between the potentially causal proteins and early-stage phenotypes of AD, including mild cognitive disorder and brain imaging features. Lastly, we placed the newly identified proteins in a protein-protein interaction network together with well-studied AD-related proteins.
PWAS of Alzheimer’s disease
To identify genetically regulated proteins associated with AD, we performed a PWAS integrating the AD GWAS results with human plasma proteome profiles. We used data from the latest large-scale AD GWAS, which included 111,326 AD cases and 677,663 controls of European descent. The UK Biobank Pharma Proteomics Project, comprising 45,540 participants of European ancestry, served as the data source for SNP-protein associations. After quality control, 2,817 proteins were included in the analysis; 1,146 of them were predictable (Pearson's correlation r > 0.1), and their prediction models were built by elastic net regression (Supplementary Table 1). The PWAS identified 30 proteins whose genetically predicted levels were associated with AD (false discovery rate, FDR < 0.05; Fig. 2, Table 1 and Supplementary Table 1). Of these 30 proteins, 14 (CR1, PILRA, ACE, EPHX2, CD2AP, TREM2, CD55, GRN, IL34, LPA, SIRPA, ACHE, TREML2, and C1R)[30–45] have previously been reported to be associated with AD. The remaining 16 (PILRB, FES, CR2, C1S, LRRC37A2, PRSS53, HAVCR2, TNFSF13B, HDGF, GC, LRP11, ITGAL, IDUA, DHRS4L2, SH3BP1, and ZBTB16) were newly identified (Table 1). In summary, we confirmed the associations of 14 known proteins with AD and identified 16 additional proteins whose genetically determined plasma abundance was associated with AD risk.
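The PWAS test statistic is not spelled out in this section. As a minimal sketch, assuming a FUSION-style weighted burden test (a common choice for TWAS/PWAS, but an assumption here), each protein's Z score combines the elastic-net SNP weights w, the AD GWAS Z scores z, and a reference LD matrix Σ as Z = wᵀz / √(wᵀΣw). The function name `pwas_z` and the toy inputs are hypothetical.

```python
import numpy as np


def pwas_z(weights, gwas_z, ld):
    """FUSION-style weighted burden Z score for one protein (assumed framework).

    weights: elastic-net SNP weights from the protein prediction model
    gwas_z:  AD GWAS Z scores for the same SNPs, aligned to the same effect allele
    ld:      SNP-by-SNP correlation matrix from an LD reference panel
    """
    w = np.asarray(weights, dtype=float)
    z = np.asarray(gwas_z, dtype=float)
    sigma = np.asarray(ld, dtype=float)
    numerator = w @ z            # weighted sum of GWAS evidence
    variance = w @ sigma @ w     # variance of that weighted sum under the null
    return numerator / np.sqrt(variance)


# Toy example with two SNPs (hypothetical numbers, not study data)
print(pwas_z([0.3, 0.1], [4.0, 2.0], [[1.0, 0.2], [0.2, 1.0]]))
```

When a single SNP carries all the weight, the statistic reduces to that SNP's GWAS Z score; positive LD between weighted SNPs inflates the denominator, so correlated SNPs are not double-counted.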
Table 1
PWAS-identified AD-associated proteins.
Protein | CHR | Z score | Effect size | P value | Newly identified
---|---|---|---|---|---
CR1 | 1 | 9.407 | 0.164 | < 1E-320 | No[32]
PILRB | 7 | 8.529 | 0.081 | < 1E-320 | Yes
PILRA | 7 | 8.373 | 0.090 | < 1E-320 | No[46]
ACE | 17 | -7.961 | -0.170 | 1.78E-15 | No[35]
EPHX2 | 8 | 7.714 | 0.667 | 1.22E-14 | No[35]
TREM2 | 6 | -7.518 | -0.444 | 5.57E-14 | No[44]
CD2AP | 6 | 7.153 | 0.266 | 8.48E-13 | No[43]
GRN | 17 | -5.501 | -0.323 | 3.77E-08 | No[38]
CD55 | 1 | -5.384 | -0.127 | 7.29E-08 | No[41]
IL34 | 16 | -5.071 | -0.063 | 3.96E-07 | No[33]
LPA | 6 | -4.575 | -0.052 | 4.77E-06 | No[31]
HAVCR2 | 5 | -4.422 | -0.238 | 9.76E-06 | Yes
PRSS53 | 16 | -4.354 | -0.051 | 1.34E-05 | Yes
C1S | 12 | 4.351 | 0.097 | 1.36E-05 | Yes
SIRPA | 20 | 4.162 | 0.029 | 3.16E-05 | No[47]
TREML2 | 6 | 4.098 | 0.098 | 4.17E-05 | No[42]
CR2 | 1 | 4.060 | 0.136 | 4.90E-05 | Yes
TNFSF13B | 13 | -4.046 | -0.141 | 5.21E-05 | Yes
LRRC37A2 | 17 | -4.027 | -0.047 | 5.64E-05 | Yes
ACHE | 7 | -3.971 | -0.101 | 7.15E-05 | No[30]
HDGF | 1 | 3.774 | 0.029 | 1.61E-04 | Yes
IDUA | 4 | 3.735 | 0.053 | 1.88E-04 | Yes
GC | 4 | 3.629 | 0.034 | 2.85E-04 | Yes
C1R | 12 | 3.568 | 0.109 | 3.60E-04 | No[40]
LRP11 | 6 | -3.516 | -0.053 | 4.39E-04 | Yes
DHRS4L2 | 14 | -3.468 | -0.127 | 5.24E-04 | Yes
ITGAL | 16 | -3.441 | -0.235 | 5.80E-04 | Yes
SH3BP1 | 22 | 3.404 | 0.148 | 6.64E-04 | Yes
ZBTB16 | 11 | 3.321 | 0.088 | 8.98E-04 | Yes
FES | 15 | 3.256 | 0.228 | 1.13E-03 | Yes
This table gives the Z scores, effect sizes, and P values of the AD PWAS associations for the proteins that passed FDR correction (FDR < 0.05).
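The P values and the FDR threshold in Table 1 follow standard conventions. The sketch below assumes the P values come from a two-sided normal test of the Z score and that the FDR step uses the Benjamini-Hochberg procedure; the section does not name the FDR method, and the function names are illustrative.

```python
import math


def two_sided_p(z):
    """Two-sided P value for a standard-normal Z score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted P values (q values), returned in input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    q = [0.0] * n
    running_min = 1.0
    # walk from the largest P value down, keeping q values monotone
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        q[i] = running_min
    return q


print(two_sided_p(-5.501))  # GRN Z score from Table 1
print(bh_fdr([0.01, 0.04, 0.03, 0.2]))
```

For the GRN row, `two_sided_p(-5.501)` gives about 3.8E-08, in line with the tabulated 3.77E-08. In practice the adjustment runs over every protein tested, and proteins with q < 0.05 are retained.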
Additionally, to determine whether the genes encoding these 30 PWAS-identified proteins show tissue-specific expression, we visualized their expression profiles across 54 tissues from the GTEx project as a heatmap. Average log-transformed transcripts per million (TPM) are shown in Fig. 3 and Supplementary Table 13. The heatmap revealed distinct expression patterns: some genes were highly expressed in brain tissues and others in non-brain tissues. Notably, PILRB and HDGF were expressed in nearly all examined tissues. This ubiquitous expression suggests that the regulation of these genes may be systemic rather than tissue-specific, extending to brain tissues.
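Each heatmap cell summarizes many GTEx samples. A minimal sketch of that summary follows, assuming a log2(TPM + 1) transform applied per sample before averaging within each tissue; the section states only that TPM values were log-transformed and averaged, and the input layout and function name are hypothetical.

```python
import math
from collections import defaultdict


def mean_log_tpm(samples):
    """Average log2(TPM + 1) per tissue for one gene.

    samples: iterable of (tissue, tpm) pairs, one per GTEx sample.
    Returns a dict mapping tissue -> mean log-transformed expression.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tissue, tpm in samples:
        totals[tissue] += math.log2(tpm + 1.0)  # pseudo-count keeps TPM = 0 finite
        counts[tissue] += 1
    return {tissue: totals[tissue] / counts[tissue] for tissue in totals}


# Hypothetical samples for a single gene
print(mean_log_tpm([("Brain - Cortex", 1.0), ("Brain - Cortex", 3.0), ("Liver", 0.0)]))
```

Stacking one such dictionary per gene, row by row, yields the gene-by-tissue matrix that the heatmap displays.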
Mendelian randomization and colocalization analyses support the role of plasma proteins in AD
PWAS can yield false-positive associations due to linkage or pleiotropy. To further prioritize the associations identified by PWAS, we performed SMR with the HEIDI test[20] and Bayesian colocalization analysis[29], which rest on different assumptions for causal inference. We first performed a pQTL analysis using the genotype and plasma protein data.
We then used the pQTL results and the same AD GWAS data for SMR and HEIDI analysis. SMR supported most of the 30 PWAS-identified proteins (25/30, SMR P < 0.05; Table 2 and Supplementary Table 2), whereas the HEIDI test, a post-filtering step in SMR that uses multiple SNPs in a cis-pQTL region to distinguish pleiotropy from linkage, suggested that 8 of these 25 associations may be attributable to linkage (HEIDI P < 0.05; Table 2 and Supplementary Table 2). Taken together, SMR and HEIDI supported a role in AD risk for 17 proteins (DHRS4L2, ITGAL, C1R, CD2AP, HDGF, IL34, GRN, C1S, TNFSF13B, ZBTB16, TREM2, PILRB, GC, FES, ACHE, LRP11, and SIRPA), ten of which were newly identified.
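For a single top cis-pQTL, the SMR estimate and test described by Zhu et al. have a closed form: the causal effect is b_xy = b_zy / b_zx, and T = z_zy² z_zx² / (z_zy² + z_zx²) is approximately χ² with one degree of freedom, where z_zy and z_zx are the SNP's Z scores in the AD GWAS and the pQTL analysis. The sketch below implements only that step; HEIDI, which tests heterogeneity across multiple SNPs in the region, is omitted, and the function name and inputs are illustrative.

```python
import math


def smr_test(b_gwas, se_gwas, b_pqtl, se_pqtl):
    """Single-SNP SMR estimate and test (HEIDI not included).

    b_gwas, se_gwas: SNP effect on AD (e.g. log odds) and its standard error
    b_pqtl, se_pqtl: SNP effect on plasma protein level and its standard error
    Returns (b_xy, T_SMR, P_SMR).
    """
    z_y = b_gwas / se_gwas
    z_x = b_pqtl / se_pqtl
    b_xy = b_gwas / b_pqtl                      # Wald-ratio effect of protein on AD
    t_smr = (z_y ** 2 * z_x ** 2) / (z_y ** 2 + z_x ** 2)
    p_smr = math.erfc(math.sqrt(t_smr / 2.0))   # upper tail of chi-square with 1 df
    return b_xy, t_smr, p_smr


print(smr_test(0.2, 0.05, 0.1, 0.025))  # hypothetical summary statistics
```

Since 1/T = 1/z_zy² + 1/z_zx², T can never exceed the smaller of the two squared Z scores: a strong pQTL cannot compensate for a weak GWAS signal, or vice versa.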
We further performed Bayesian colocalization analysis with COLOC. COLOC indicated shared causal signals between the pQTL and AD GWAS for TREM2, GRN, C1R, DHRS4L2, and SH3BP1. Notably, four of these (TREM2, GRN, C1R, and DHRS4L2) were also supported by the SMR and HEIDI analysis (Table 2 and Supplementary Table 3).
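COLOC compares five hypotheses for a region (H0: no association; H1: protein only; H2: AD only; H3: two distinct causal variants; H4: one shared causal variant) and reports the posterior probability of H4 (PPH4, Table 2). The sketch below re-implements the approximate-Bayes-factor logic behind the coloc package's `coloc.abf` in NumPy. It assumes Wakefield approximate Bayes factors with a prior effect SD of 0.15 and coloc's default priors p1 = p2 = 1e-4 and p12 = 1e-5; the priors used in this study are not stated here, and the function names are our own.

```python
import numpy as np


def log_abf(beta, se, prior_sd=0.15):
    """Wakefield log approximate Bayes factor for each SNP in the region."""
    beta = np.asarray(beta, dtype=float)
    se = np.asarray(se, dtype=float)
    v = se ** 2                 # sampling variance of each estimate
    w = prior_sd ** 2           # prior variance of the true effect
    r = w / (v + w)
    z = beta / se
    return 0.5 * (np.log(1.0 - r) + r * z ** 2)


def _logsum(x):
    """log(sum(exp(x))) computed stably."""
    return np.logaddexp.reduce(np.asarray(x, dtype=float))


def _logdiff(x, y):
    """log(exp(x) - exp(y)); returns -inf when the difference is not positive."""
    if y >= x:
        return -np.inf
    return x + np.log(-np.expm1(y - x))


def coloc_posteriors(labf_pqtl, labf_gwas, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 from per-SNP log ABFs of both traits."""
    l1 = np.asarray(labf_pqtl, dtype=float)
    l2 = np.asarray(labf_gwas, dtype=float)
    log_h = np.array([
        0.0,                                      # H0: neither trait
        np.log(p1) + _logsum(l1),                 # H1: protein only
        np.log(p2) + _logsum(l2),                 # H2: AD only
        np.log(p1) + np.log(p2)
        + _logdiff(_logsum(l1) + _logsum(l2), _logsum(l1 + l2)),  # H3: distinct variants
        np.log(p12) + _logsum(l1 + l2),           # H4: shared variant
    ])
    return np.exp(log_h - _logsum(log_h))


se = np.full(3, 0.02)
shared = coloc_posteriors(log_abf(np.array([8.0, 0.5, 0.2]) * se, se),
                          log_abf(np.array([7.0, 0.3, 0.1]) * se, se))
print("PPH4 =", shared[4])
```

Moving the AD signal to the third SNP, so that the two traits have different lead variants, shifts nearly all posterior mass to H3, which is the pattern that argues against colocalization.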
In summary, 18 of the 30 proteins were supported by at least one additional genetically informed approach with stricter assumptions, supporting the robustness of these findings.
Table 2
COLOC and SMR analysis of the 30 significant proteins in AD PWAS.
Protein | CHR | COLOC PPH4 | SMR P | HEIDI P
---|---|---|---|---
CR1 | 1 | 8.22E-26 | 3.88E-29 | 4.81E-06
PILRA | 7 | 8.45E-04 | 1.75E-15 | 1.29E-03
PILRB | 7 | 1.82E-10 | 1.70E-15 | 3.98E-01
ACE | 17 | 2.25E-09 | 3.49E-11 | 2.92E-02
EPHX2 | 8 | 2.00E-23 | 1.51E-06 | 8.15E-03
TREM2 | 6 | 1.00E+00 | 3.05E-09 | 3.89E-01
CD2AP | 6 | 8.96E-02 | 8.72E-13 | 1.98E-01
GRN | 17 | 9.92E-01 | 4.45E-06 | 2.61E-01
CD55 | 1 | 1.01E-04 | 5.48E-08 | 2.06E-03
IL34 | 16 | 4.83E-03 | 1.23E-05 | 2.32E-01
LPA | 6 | 5.45E-02 | 1.48E-01 | 3.81E-02
HAVCR2 | 5 | 3.70E-02 | 5.79E-02 | 5.56E-02
PRSS53 | 16 | 8.60E-05 | 8.10E-06 | 2.64E-04
C1S | 12 | 6.02E-03 | 2.03E-05 | 2.66E-01
SIRPA | 20 | 8.22E-03 | 1.42E-05 | 9.57E-01
TREML2 | 6 | 3.66E-08 | 6.90E-01 | 1.59E-08
CR2 | 1 | 2.49E-22 | 4.64E-01 | 1.06E-04
TNFSF13B | 13 | 2.57E-01 | 7.93E-03 | 2.83E-01
LRRC37A2 | 17 | 9.95E-03 | 9.91E-01 | NA
ACHE | 7 | 9.74E-02 | 1.35E-03 | 5.84E-01
HDGF | 1 | 1.04E-02 | 4.87E-04 | 2.32E-01
IDUA | 4 | 5.44E-06 | 1.23E-07 | 7.50E-04
GC | 4 | 5.65E-03 | 2.68E-04 | 4.89E-01
C1R | 12 | 9.43E-01 | 2.14E-05 | 1.79E-01
LRP11 | 6 | 1.65E-02 | 9.02E-04 | 8.07E-01
DHRS4L2 | 14 | 5.59E-01 | 2.77E-04 | 1.32E-01
ITGAL | 16 | 5.37E-02 | 1.42E-02 | 1.58E-01
SH3BP1 | 22 | 5.00E-01 | 7.85E-04 | 1.23E-02
ZBTB16 | 11 | 3.46E-02 | 1.43E-04 | 3.78E-01
FES | 15 | 7.14E-02 | 1.68E-02 | 5.63E-01
For the 30 proteins identified by the AD PWAS, the Bayesian posterior probability of pQTL-GWAS colocalization is given as the regional PPH4, along with the P values of the SMR and HEIDI tests. NA indicates too few variants for the HEIDI test.
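The counts reported for SMR, HEIDI, and COLOC can be re-derived from Table 2. The sketch below treats a protein as SMR/HEIDI-supported when SMR P < 0.05 and HEIDI P ≥ 0.05, and as COLOC-supported when PPH4 ≥ 0.5. That PPH4 cutoff is our assumption: it reproduces the five proteins named in the text, but the study's threshold is not stated here. The newly identified set is taken from Table 1.

```python
# (PPH4, SMR P, HEIDI P) transcribed from Table 2; None marks HEIDI "NA"
TABLE2 = {
    "CR1": (8.22e-26, 3.88e-29, 4.81e-06), "PILRA": (8.45e-04, 1.75e-15, 1.29e-03),
    "PILRB": (1.82e-10, 1.70e-15, 3.98e-01), "ACE": (2.25e-09, 3.49e-11, 2.92e-02),
    "EPHX2": (2.00e-23, 1.51e-06, 8.15e-03), "TREM2": (1.00e+00, 3.05e-09, 3.89e-01),
    "CD2AP": (8.96e-02, 8.72e-13, 1.98e-01), "GRN": (9.92e-01, 4.45e-06, 2.61e-01),
    "CD55": (1.01e-04, 5.48e-08, 2.06e-03), "IL34": (4.83e-03, 1.23e-05, 2.32e-01),
    "LPA": (5.45e-02, 1.48e-01, 3.81e-02), "HAVCR2": (3.70e-02, 5.79e-02, 5.56e-02),
    "PRSS53": (8.60e-05, 8.10e-06, 2.64e-04), "C1S": (6.02e-03, 2.03e-05, 2.66e-01),
    "SIRPA": (8.22e-03, 1.42e-05, 9.57e-01), "TREML2": (3.66e-08, 6.90e-01, 1.59e-08),
    "CR2": (2.49e-22, 4.64e-01, 1.06e-04), "TNFSF13B": (2.57e-01, 7.93e-03, 2.83e-01),
    "LRRC37A2": (9.95e-03, 9.91e-01, None), "ACHE": (9.74e-02, 1.35e-03, 5.84e-01),
    "HDGF": (1.04e-02, 4.87e-04, 2.32e-01), "IDUA": (5.44e-06, 1.23e-07, 7.50e-04),
    "GC": (5.65e-03, 2.68e-04, 4.89e-01), "C1R": (9.43e-01, 2.14e-05, 1.79e-01),
    "LRP11": (1.65e-02, 9.02e-04, 8.07e-01), "DHRS4L2": (5.59e-01, 2.77e-04, 1.32e-01),
    "ITGAL": (5.37e-02, 1.42e-02, 1.58e-01), "SH3BP1": (5.00e-01, 7.85e-04, 1.23e-02),
    "ZBTB16": (3.46e-02, 1.43e-04, 3.78e-01), "FES": (7.14e-02, 1.68e-02, 5.63e-01),
}
# Proteins marked "Yes" (newly identified) in Table 1
NEW = {"PILRB", "HAVCR2", "PRSS53", "C1S", "CR2", "TNFSF13B", "LRRC37A2", "HDGF",
       "IDUA", "GC", "LRP11", "DHRS4L2", "ITGAL", "SH3BP1", "ZBTB16", "FES"}

# SMR significant and no evidence of heterogeneity (linkage) from HEIDI
smr_heidi = {p for p, (_, smr, heidi) in TABLE2.items()
             if smr < 0.05 and heidi is not None and heidi >= 0.05}
coloc = {p for p, (pph4, _, _) in TABLE2.items() if pph4 >= 0.5}  # assumed cutoff
supported = smr_heidi | coloc

print(len(smr_heidi), len(coloc), len(supported))
print(sorted(smr_heidi & NEW))
```

Intersecting the SMR/HEIDI-supported set with the Table 1 "newly identified" set shows how many of the supported proteins are novel candidates.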
Prospective cohort study of AD proteomics
We performed a prospective cohort study to identify AD-associated plasma proteins. Participants without a medical record or self-report of AD diagnosis at baseline were included, and those diagnosed with AD within the first two years of follow-up were excluded. In total, 45,511 participants with plasma proteins measured by Olink were included. During a mean follow-up of 13.7 years, 449 incident AD cases were observed. Using Cox proportional hazards regression adjusted for age and sex, we tested the association between each protein and AD risk. As shown in Fig. 4A and Supplementary Table 4, 204 proteins were associated with AD (P < 0.05), led by GFAP (HR = 1.959, 95% CI = 1.763–2.178, P = 9.86E-36) and APOE (HR = 0.533, 95% CI = 0.484–0.587, P = 1.78E-37), and 23 proteins passed multiple-testing correction (FDR < 0.05). Notably, among the 18 potentially causal proteins identified by PWAS and SMR/COLOC (Table 2), PILRB (HR = 1.123, 95% CI = 1.023–1.232, P = 1.49E-02) and FES (HR = 1.138, 95% CI = 1.032–1.255, P = 9.57E-03) were also associated with AD in the cohort study, with effects in the same direction as in the PWAS (Supplementary Tables 4 and 5). Consistent results were obtained with a logistic regression model that ignores the time to event.
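To make concrete what each per-protein model estimates, the sketch below fits a Cox proportional hazards model by Newton-Raphson on the partial likelihood (Breslow handling of ties) in NumPy, then converts a coefficient to an HR, 95% CI, and Wald P value. It runs on simulated data with a hypothetical standardized protein level, age, and sex as covariates. It is not the study's code; in practice an established implementation such as R's survival package would be used.

```python
import math
import numpy as np


def cox_ph(time, event, X, max_iter=50, tol=1e-10):
    """Cox proportional hazards fit by Newton-Raphson (Breslow ties).

    time: follow-up time; event: 1 = incident AD, 0 = censored
    X:    n x p covariates (e.g. protein level, age, sex)
    Returns (coefficients, standard errors).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    X = np.asarray(X, dtype=float)
    order = np.argsort(-time, kind="stable")        # longest follow-up first
    t, d, X = time[order], event[order], X[order]
    # last index of each tied block, so every risk set includes all ties
    last = np.searchsorted(-t, -t, side="right") - 1
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        w = np.exp(X @ beta)
        s0 = np.cumsum(w)[last]                                   # risk-set totals
        s1 = np.cumsum(w[:, None] * X, axis=0)[last]
        s2 = np.cumsum(w[:, None, None] * X[:, :, None] * X[:, None, :], axis=0)[last]
        xbar = s1 / s0[:, None]
        score = (d[:, None] * (X - xbar)).sum(axis=0)
        info = (d[:, None, None]
                * (s2 / s0[:, None, None] - xbar[:, :, None] * xbar[:, None, :])).sum(axis=0)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se


def hazard_ratio(beta, se, z_crit=1.959964):
    """HR, 95% CI, and two-sided Wald P value for one log-hazard coefficient."""
    hr = math.exp(beta)
    lower, upper = math.exp(beta - z_crit * se), math.exp(beta + z_crit * se)
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return hr, lower, upper, p


# Simulated cohort (hypothetical): protein and age standardized, sex coded 0/1
rng = np.random.default_rng(0)
n = 3000
X = np.column_stack([rng.normal(size=n), rng.normal(size=n), rng.integers(0, 2, size=n)])
true_beta = np.array([0.5, 0.3, -0.2])
t_event = rng.exponential(1.0 / (0.1 * np.exp(X @ true_beta)))
t_censor = rng.uniform(0.0, 20.0, size=n)
time = np.minimum(t_event, t_censor)
event = (t_event <= t_censor).astype(int)

beta, se = cox_ph(time, event, X)
print("protein HR, 95% CI, P:", hazard_ratio(beta[0], se[0]))
```

As a check on the conversion, the GFAP estimate above (HR = 1.959, 95% CI = 1.763–2.178) corresponds to a log-hazard coefficient of about 0.672 with a standard error of about 0.054, which gives a Wald P value of about 1E-35, in line with the reported 9.86E-36.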
In sensitivity analyses, we excluded individuals with diseases at baseline that may influence plasma protein levels (including cancer, type 2 diabetes, stroke, depression, epilepsy, schizophrenia, and cardiovascular-related diseases), adjusted for education and physical activity in addition to age and sex, and re-fitted the Cox proportional hazards and logistic regression models. The results were consistent with the primary findings (Fig. 4B, Supplementary Tables 6 and 7). GFAP (HR = 2.366, 95% CI = 1.954–2.865, P = 1.12E-18) and APOE (HR = 0.519, 95% CI = 0.441–0.611, P = 3.51E-15) again showed the most significant associations in the Cox model (Supplementary Table 6). Baseline plasma PILRB abundance remained positively associated with AD risk in both the Cox and logistic models (Cox: HR = 1.286, 95% CI = 1.078–1.535, P = 5.31E-03; logistic: OR = 1.291, 95% CI = 1.078–1.546, P = 5.40E-03; Supplementary Tables 6 and 7).
Association study between proteins and early traits of AD
To identify proteins associated with early traits of AD, we first fitted a Cox regression model for each protein with mild cognitive disorder as the outcome. A total of 409 proteins were nominally associated with mild cognitive disorder (P < 0.05, Supplementary Table 8), of which 25 passed FDR correction (FDR < 0.05). Among the nominally associated proteins, PILRB and GRN also had genetic evidence from PWAS and SMR/COLOC (Table 2). For example, higher plasma PILRB abundance was associated with an increased risk of mild cognitive disorder (HR = 1.178, P = 2.80E-02), consistent with (1) the association between higher genetically determined PILRB abundance and increased AD risk (PWAS: β = 0.081, P < 1E-320) and (2) the positive association between measured PILRB and AD risk in the cohort study (HR = 1.123, P = 1.49E-02).
In parallel, we tested associations between the proteins and brain imaging features, including whole hippocampal volume and total volume of white matter hyperintensities, using linear regression on log-transformed imaging measures. We found 233 proteins associated with total hippocampal volume (P < 0.05, Supplementary Tables 9 and 10), two of which (TREM2 and HDGF) were also prioritized for AD risk by PWAS and SMR/COLOC (Table 2). Levels of both TREM2 and HDGF were negatively associated with hippocampal volume (TREM2: β = -0.0025, P = 4.43E-02; HDGF: β = -0.0025, P = 2.33E-02). In addition, 652 proteins were associated with total volume of white matter hyperintensities (P < 0.05). Among them, TREM2 and GRN, both supported by SMR/COLOC as potentially causal proteins for AD (Table 2), were positively associated with white matter hyperintensity volume (TREM2: β = 0.0483, P = 8.31E-04; GRN: β = 0.0269, P = 4.54E-02).
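The imaging associations come from linear regression of a log-transformed imaging measure on the protein level. A minimal ordinary-least-squares sketch in NumPy follows. It assumes a natural-log transformation and optional adjustment covariates (the covariate set for the imaging models is not listed in this section), and it runs on simulated data with hypothetical names.

```python
import math
import numpy as np


def protein_imaging_assoc(protein, volume, covariates=None):
    """OLS of log(volume) on protein level (+ optional covariates).

    Returns (beta, se, two-sided P) for the protein term, using a normal
    approximation that is adequate for large imaging samples.
    """
    y = np.log(np.asarray(volume, dtype=float))
    cols = [np.ones_like(y), np.asarray(protein, dtype=float)]
    if covariates is not None:
        cols.append(np.asarray(covariates, dtype=float).reshape(len(y), -1))
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])   # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    p = math.erfc(abs(coef[1] / se[1]) / math.sqrt(2.0))
    return coef[1], se[1], p


# Simulated example: higher protein level, slightly smaller hippocampus
rng = np.random.default_rng(1)
protein = rng.normal(size=2000)
age = rng.normal(size=2000)
volume = np.exp(8.0 - 0.05 * protein - 0.03 * age + rng.normal(scale=0.1, size=2000))
print(protein_imaging_assoc(protein, volume, covariates=age))
```

Because the outcome is on the log scale, a small coefficient β corresponds to roughly a 100·β percent change in volume per one-unit increase in the protein measure; if a natural log was used, the TREM2 estimate of -0.0025 would mean about 0.25% smaller hippocampal volume per unit.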
AD-associated proteins substantiated by multiple lines of evidence
Taking PWAS as the primary analysis, we integrated evidence from complementary analyses, including SMR/COLOC and the association studies of AD, mild cognitive disorder, and brain imaging features. We highlight five proteins supported by multiple lines of evidence (Fig. 5).
PILRB showed consistent evidence that higher genetically determined abundance is associated with an increased risk of AD. The SMR and HEIDI analysis further prioritized this association by incorporating pQTL data. The positive association was also supported by the cohort analyses of both AD and mild cognitive disorder, which used the measured baseline protein level as the exposure.
Both genetically and non-genetically informed association tests suggested FES as a risk factor for AD. The PWAS showed a positive association between genetically predicted FES abundance and AD. The SMR and HEIDI analysis argued against a false-positive finding driven by distinct causal variants for the pQTL and the AD GWAS signal (HEIDI P = 0.563). The cohort study of AD further supported the role of FES using directly measured, rather than genetically predicted, protein levels in UK Biobank participants.
The genetically informed PWAS showed that higher plasma HDGF abundance was associated with a higher risk of AD. The non-genetically informed cross-sectional analysis showed that higher plasma HDGF levels were associated with smaller whole hippocampal volume, suggesting a possible detrimental role of HDGF in memory-related brain structure. In addition, a meta-analysis by Bai et al. of seven AD proteomic datasets from brain tissue and six from cerebrospinal fluid (CSF) found higher HDGF levels in both brain tissue and CSF of AD patients than in healthy controls[48]. These consistent results further support a potential role of HDGF in AD.
Protein-protein interaction
To connect the newly identified AD-associated proteins with well-established AD-associated proteins and place them in the context of existing knowledge, we performed a protein-protein interaction (PPI) analysis using STRING[28], embedding the 16 newly identified PWAS proteins together with 20 well-studied AD-related proteins (ABCA7, ABI3, ADAM10, APBB3, APOE, APP, BIN1, CASP7, CR1, PILRA, TREM2, EPHA1, MS4A6A, PICALM, PLCG2, PSEN1, PSEN2, RIN3, SORL1, and SPI1)[21–27]. TREM2, GRN, FES, and PILRB showed extensive interactions with these 20 AD-related proteins (Supplementary Table 11). TREM2 had the most interactions, with 16 proteins, all of which are AD-related proteins reported in the literature (Fig. 6 and Supplementary Table 11). PILRB was found to interact with ABCA7, a well-known AD-related protein involved in maintaining immune homeostasis. ABCA7 dysfunction may impair microglial function and increase amyloid deposition, which in turn contributes to the development of AD. In brief, the PPI analysis linked several newly identified proteins to AD-related proteins and suggested potential pathways through which these novel candidates may contribute to AD pathology.
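The interaction counts reported above (for example, TREM2 interacting with 16 AD-related proteins) amount to counting each protein's distinct neighbours within a reference set in the STRING network. A minimal sketch follows. It assumes the network has already been exported from STRING as a list of protein-name pairs filtered at the chosen confidence score (the cutoff is not stated here), and it uses placeholder names rather than study data.

```python
from collections import defaultdict


def partners_in_reference(edges, query, reference):
    """For each query protein, list its distinct interaction partners in `reference`.

    edges:     iterable of (protein_a, protein_b) pairs; direction is ignored
    query:     proteins to summarize (e.g. newly identified candidates)
    reference: set of well-studied AD-related proteins
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {q: sorted(neighbours[q] & reference) for q in query}


# Placeholder network, not STRING output
edges = [("Q1", "R1"), ("Q1", "R2"), ("R1", "Q1"), ("Q2", "R3"), ("Q1", "Q2")]
counts = partners_in_reference(edges, ["Q1", "Q2"], {"R1", "R2", "R3"})
print(counts)
print(max(counts, key=lambda q: len(counts[q])))  # protein with the most partners
```

Storing neighbours in sets merges duplicate and reversed edges, so each partner is counted once regardless of how the export lists it.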