Genetic and Multi-omic Risk Assessment of Alzheimer’s Disease Implicates Core Associated Biological Domains

doi:10.21203/rs.3.rs-2895726/v1

Research Article

This work is licensed under a CC BY 4.0 License

Journal publication: published 31 Mar, 2024 in Alzheimer's & Dementia: Translational Research & Clinical Interventions

Version 1

Background: Alzheimer’s disease (AD) is the predominant dementia globally, with heterogeneous presentation and penetrance of clinical symptoms, variable presence of mixed pathologies, potential disease subtypes, and numerous associated endophenotypes. However, there is no methodology to objectively rank endophenotypes for disease risk, nor to enumerate the genes associated with each endophenotype at a genome scale. Consequently, therapeutic development is challenged by the uncertainty of which endophenotypic areas, and specific subordinate targets, to prioritize for further translational research.

Methods: Here we report the development of an informatic pipeline that ranks genes for AD risk genome-wide and organizes them into disease-associated endophenotypes, which we call AD biological domains. The AD risk ranking draws from genetic association studies, predicted variant impact, and linkage with dementia-associated phenotypes to create a genetic risk score. This is paired with a multi-omic risk score utilizing extensive sets of both transcriptomic and proteomic studies to identify systems-level changes in expression associated with AD. These two elements combined constitute our target risk score (TRS), which ranks AD risk genome-wide. The ranked genes are organized into endophenotypic space through the development of 19 biological domains associated with AD in the described genetics and genomics studies and accompanying literature. The biological domains are constructed from exhaustive gene ontology (GO) term compilations, allowing automated assignment of genes into objectively defined disease-associated biology. This rank-and-organize approach, performed genome-wide, allows the characterization of aggregations of AD risk across biological domains.

Results: The top AD-risk associated biological domains are Synapse, Immune Response, Lipid Metabolism, Mitochondrial Metabolism, Structural Stabilization, and Proteostasis, with slightly lower levels of risk enrichment present within the other 13 biological domains. Synapse and Mitochondrial Metabolism are the most down-regulated biological domains, with mitochondrial function being the most enriched, while Immune Response is the most up-regulated biological domain.

Conclusions: The TRS-ranked genes, organized into the biological domains, provide an objective methodology that can be automated into workflows to localize risk within specific biological endophenotypes and to drill down into the most significantly associated sets of GO terms and annotated genes for potential therapeutic targets.

Alzheimer's disease is a complex heterogeneous neurodegenerative disease defined by the extracellular aggregation of amyloid plaques and the intracellular accumulation of neurofibrillary tangles composed of paired helical filaments of hyperphosphorylated tau protein^1–4. However, while amyloid and tau are hallmarks of the disease, large-scale multi-omic analyses are pointing to the complexity of interwoven biological processes associated with AD pathogenesis. Over a decade ago the National Institute on Aging (NIA) and the Alzheimer’s Association (AA) put together a joint initiative to capture the complexity of AD in the form of a disease ontology, the Common Alzheimer's and Related Dementias Research Ontology (CADRO)⁵. The goal behind CADRO’s development was to objectively articulate the biological processes and cell types involved in AD pathology and progression. CADRO is used extensively to characterize therapeutic development in AD^6–12 and has since been used to track the shift in number and focus of emerging clinical trials. For example, in 2016 most disease-modifying therapies in phase III clinical trials were amyloid-targeting molecules¹². However, by 2022 fewer than one-third of disease-modifying therapies in phase III clinical trials were targeting amyloid, with the remainder working across a much broader biological space that includes synaptic targets, neuroprotective agents, metabolic factors, and immune modulators⁶. The diversification is even greater for therapeutic targets in phase I and II clinical trials⁶. Establishing a diverse target portfolio enhances the potential translational impact; the availability of therapeutic targets implicated in an array of disease-linked biology maximizes the potential to intervene through distinct mechanisms, which may be necessary to address the heterogeneous AD population and which could potentially work in coordination¹³.
Clinical trials employ CADRO classification to identify the mechanism of action of a therapeutic under investigation. However, the alignment between the gene target of a therapeutic and its ontological classifier is performed manually based upon the judgment of domain experts and cannot be scaled genome-wide without computationally amenable definitions.

A driving force behind the diversification of the AD target portfolio is an expanding view of AD biology due, in part, to recent efforts that have amassed a wealth of disease-relevant molecular data from a variety of patient cohorts. The Accelerating Medicines Partnership for Alzheimer’s Disease (AMP-AD) consortium, for example, has generated multiple omics datasets from postmortem brain samples (including genomic, transcriptomic, proteomic, metabolomic, etc.) and made these data openly available on the AD Knowledge Portal¹⁴. These systems-level investigations into AD are a rapidly increasing information domain and each new study contributes large datasets that provide an unbiased view of disease processes across different biological layers. However, each of these datasets can suggest hundreds of genes as potential new therapeutic targets without clear priority. Genome-wide association studies (GWAS) alone have identified over 75 risk loci^15–19 and analyses of transcriptomic^20–29 and proteomic^30–35 data have identified dozens of co-expression modules that consist of hundreds to thousands of genes or proteins each. There are currently over 600 targets that have been nominated by AMP-AD researchers for further therapeutic development (agora.adknowledgeportal.org). Furthermore, these studies each implicate a diverse set of biological processes, pathways, and endophenotypes that are altered in the genesis of, and response to, the late-onset progressive neurodegeneration in AD. The difficulty in performing a unified analysis of these divergent datasets is two-fold: (1) there is no objective and unbiased manner by which to categorize genes into specific AD endophenotypes and (2) there is no integrated, genome-wide methodology to assess and assign AD associated risk.

In this paper we describe data integration across modalities to score, rank, and organize potential AD therapeutic targets at a genome-wide scale, providing the largest resource to rank and organize AD targets ever developed. First, we identified 19 biological domains that capture the preponderance of AD-associated endophenotypes and defined them using an exhaustive set of Gene Ontology (GO) terms, with the intent to keep each domain siloed in a biologically coherent fashion. The extensive sets of GO terms used per biological domain, and the multiplicity of genes annotated to any GO term, provide a classification methodology that spans much of the genome. This provides an objective and unbiased organizational strategy to identify gene targets and to assess which AD endophenotypes are especially risk enriched. Second, we developed a Target Risk Score (TRS), which quantifies dimensions of risk based on genetic association as well as signatures of differential expression in transcriptomic and proteomic data. We show that these tools can be applied to assess which specific genes within large datasets are elevated in disease risk, and to group the most risk-enriched genes within common biological domains, providing a framework for analysis that can be employed across research studies. While we observe that AD risk distributes across all 19 biological domains, we find that the biological domains demonstrating the greatest AD risk association are Synapse, Immune Response, Lipid Metabolism, Mitochondrial Metabolism, Structural Stabilization, and Proteostasis. Each domain can be examined in more detail by elaborating specific elements of a biological process that are particularly enriched in AD risk; for example, we identify electron transport chain complex I related factors within Mitochondrial Metabolism as one such focal point.
The system described here is the most comprehensive to date, providing genomic coverage of risk mapped onto known AD endophenotypes and spanning 27 genetic association studies, transcriptomic signatures from 1,699 brains, proteomic signatures from 1,188 brains, as well as 7,127 Gene Ontology terms structured within the 19 biological domain classifications. These tools are openly available to the research community as a part of the Target Enablement to Accelerate Therapy Development in AD (TREAT-AD) efforts to facilitate the continued diversification of the AD drug development pipeline.

Alzheimer’s Disease Biological Domains & Enrichment Analysis

Methodological Overview. The development of the biological domains broadly encompasses two distinct processes: the selection and the definition of each biological domain. The selection of the biological domains is guided by the attempt to exhaustively identify the endophenotypes and biological areas linked to AD pathogenesis. As AD is a heterogeneous neurodegenerative pathology with multiple interacting biological events either stemming from, or contributing to, the central disease sequelae, there have been repeated efforts to exhaustively categorize the subpathologies and endophenotypes in AD. One of the most developed resources is CADRO, developed by the National Institute on Aging in association with the Alzheimer’s Association⁵. As CADRO is already in standard use for drug development classification, we leveraged this resource to help guide the initial stages of identification of relevant biological domains of AD. However, as the focus of the biological domains is upon the identification of subprocesses and pathologies in AD that may cut across cell types, the inclusion of CADRO terms involved a rearrangement of the structure to facilitate cell-type autonomy of disease processes. For example, autophagy is made an independent biological domain as it does not occur exclusively within immune cells. The identification of biological domains was expanded beyond CADRO to be maximally inclusive of data derived from large-scale consortia studies in different areas of disease relevance. The expansion goal is two-fold: first, to be as comprehensive as possible across AD research; and second, to align with our own scoring criteria. Consequently, we focused upon the categorization of processes implicated by genome-wide association studies (GWAS) and large-scale multi-omic investigations. We also categorized the AD hypothesis literature to ensure we were not missing any key concepts or fields of study.
While this scope lends itself to an unbounded examination of disease-linked biological traits, we attempted to keep the biological domain definitions broad enough to capture large areas of related disease processes and to constrain the studies leveraged to the primary publications within each area of the field. The process is detailed below.

Genetic Considerations in Selecting Biological Domains. In genetics, we focused on key GWAS, the newer genome-wide association study by proxy (GWAX)^15-19,36-42 using parental disease status, and whole exome sequencing studies^43-57 conducted over the last decade. The identification of potential genetic risk associated with individual genes is represented in the genetics score (detailed below), and the goal here is not to recapitulate the scoring methodology, but to assess the potential biological contexts of the imputed genes’ biological function. The characterization of gene function was completed by examining each gene's functional classification within UniProt and Entrez Gene, the linked biological process gene ontology terms, and the description within the primary literature. We examined those genes that were validated through expression quantitative trait loci (eQTL) or protein quantitative trait loci (pQTL)^58,59, the lead statistically associated gene, and the gene set analysis results obtained via multi-marker analysis of genomic annotations (MAGMA)⁶⁰ performed in most of the above-mentioned studies. While there are a multitude of caveats to the interpretation of the genetically identified loci, recently reviewed by Goate et al⁶¹, our goal was to capture potential biological relevance of the genetic observations, providing a biological space for future gene validation results. We acknowledge this approach increases sensitivity of the biologically associated areas at the potential cost of decreases in specificity, but as the goal is to create a broad hypothesis space, we deemed this trade-off acceptable at the present time.
The genetic investigation recapitulated many of the biological domains identified within CADRO (Supp Table 1) and reclassified within our structure as Immune Response, Endolysosomal Trafficking, APP Metabolism, Tau Homeostasis, Lipid Metabolism, Synapse, and Epigenetics (recently reviewed^61-63).

Genomic Consideration in Selecting the Biological Domains. The genomic branch of our investigation involved the interrogation of the primary large postmortem transcriptomic or proteomic studies performed in the last five years. There are a multitude of brain proteomic studies associated with TREAT-AD or AMP-AD investigations that have identified functional modules of co-expressed genes that associate with various parameters of neurodegenerative pathology^21,30,32,33,35,64-72. These modules generally center upon specific biological functions, and hence are amenable to biological domain subordination. We created one novel biological domain to describe a set of genes involved in binding extracellular matrix, forming cell-cell junctions, or translating interactions from the extracellular milieu to the intracellular cytoskeleton. We named this biological domain Structural Stabilization as these proteins play a role in intracellular and intercellular structure critical to cellular processes, where shape, or the modification of cellular shape, may be essential for biological function. The leading modules in many studies were in alignment with core features called out within the CADRO ontology, which we took as dual validation of the biological significance of these domains: Synapse, Mitochondrial Metabolism, Oxidative Stress, Proteostasis, Immune Response, Endolysosome, APP Metabolism, Vasculature, Lipid Metabolism and Tau Homeostasis. The nomenclature we use varies from that provided by CADRO, but the conceptual overlap is high, with 14 of the 19 biological domains having some instantiation within CADRO (Supp Table 1). Several genomics studies implicated RNA splicing factors as potentially involved in disease state, prompting the development of the RNA Spliceosome biological domain^33,64,73,74.

Literature Consideration in Selecting the Biological Domains. We interrogated the literature for other biological domains through a focused search of Alzheimer’s related hypothesis papers, specifically querying for the co-occurrence of “Alzheimer” and “hypothesis” within the title. The search produced 463 articles, 51 of which were excluded for topics not pertaining to molecular causes of disease, such as lifestyle factors that may protect against disease risk, imaging observations, or isolated speculation about systems-level therapeutics. Of the remaining 412 papers (Supplementary Material), the largest groups were the amyloid hypothesis (n=151), which maps onto the APP Metabolism biological domain, and synaptic hypotheses (n=90), which map onto the Synapse biological domain, followed by Immune Response (n=32) related disease mechanisms (Supp Fig 7). Nested within the synaptic hypotheses articles were topics covering the cholinergic hypothesis, as well as hypotheses focused upon other specific neurotransmitter systems, such as glutamate, serotonin and dopamine. All of the identified biological domains were supported by at least one hypothesis paper; however, several hypotheses were identified for which we did not previously have coverage. These included Cell Cycle (n=12), DNA Repair (n=6), and Metal Binding and Homeostasis (n=28), the latter of which consisted of a set of hypotheses focused predominantly upon iron and copper. Following the identification of significant GO term enrichment within each (see Figures 2-4), we elected to retain these biological domains. The literature identified numerous hypotheses we elected not to include at this time. These include the calcium cascade hypotheses (n=15), diabetes hypotheses (n=11), and microbial infection hypotheses (n=26). The calcium cascade hypothesis was not included as it significantly overlaps with the Apoptosis, Mitochondrial Metabolism and Proteostasis biological domains.
The diabetes hypothesis was not included as glucose metabolic processes downstream of transport are already modeled within Mitochondrial Metabolism, and we are attempting to keep the biological domains discrete and as siloed as possible. The microbial infection hypothesis was not developed as a biological domain, as the main endogenous genetic signal for this domain is already resident in the host response related terms within the Immune Response biological domain, and we are not examining exogenous genomic information within our analysis framework.

Conclusions on the Selection of the Biological Domains. The 19 biological domains identified in this work through the examination of CADRO and the areas discussed above elucidate a set of endophenotypes that have already been noted in the field, as each domain has at least one supporting hypothesis paper associated with it (Supp Fig 7). Each of these biological domains as implemented, the details of which are discussed below, demonstrates strong AD risk signal via our scoring methodology in genetics, genomics, or both. We decided to conclude the development of biological domains with these core domains because the GSEA-enriched GO terms (methodology below) that did not participate in the characterized biological domains appeared to be uncategorizable into disease endophenotypic space for one of several reasons: (1) the term was high-level but non-specific (such as ATP Metabolic Process), (2) it involved cellular localization information that did not relate to specific disease-related processes (such as Cell Leading Edge), or (3) it related to molecular process information that was either generic or not categorizable within a specific biological domain (such as tyrosine kinase). Consequently, we deemed the current build sufficient for our initial release. We will continue to look for additional disease-linked processes that intersect with the current biological endophenotypes, as the majority of ranked genes belong to multiple domains.

Implementation of the Biological Domains. The implementation of the biological domains for drug target identification studies and future biological investigations of AD pathogenic mechanisms required a strategy that was (i) objective, (ii) automatable, (iii) easily intelligible, and (iv) communally modifiable. Based on these criteria, we elected to use an exhaustive elaboration of gene ontology (GO) terms associated with each biological domain as the core definition. A five-part development cycle was followed for the instantiation of each domain (Supp Fig 8). The first step was to identify the starting query terms to employ in searching the ontology. For example, for Immune Response, “innate immune” was one of the high-level query terms invoked. These query terms linked to key points within the ontology that allowed a local search of parent and child terms. In the second step, the terms falling within the conceptual biological space of the biological domain were collected and aggregated. The third step involved the use of these terms within the EBI GO infrastructure to expand the set of linked terms and identify GO annotation terms that may have been missed by the query-based ontology evaluation. These terms were incorporated into the GO term definition space. The fourth step involved the manual examination of each term within the biological domain definition architecture to ensure that query expansion had not identified spurious terms, or child terms that were the junction of two parents from inside and outside the biological domain space and were no longer consistent with the biological domain. This took extensive and iterative rounds of review. Once the domains were adequately defined by the GO term collection, the fifth and final step was performed, and gene-level annotation was populated from BioMart for each GO term within each biological domain. In this manner we created a fluid system for traversing between high-level biological concepts and individual constituent genes.
For each of the GO terms enumerated within the biological domains, the genes annotated to that term were retrieved from Ensembl BioMart using the biomaRt R package^75,76.
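The two-level structure described above (domain to GO terms, GO term to annotated genes) can be illustrated with a minimal Python sketch. The GO identifiers, gene names, and domain assignments below are placeholders invented for illustration and are not taken from the published domain definitions; the function names are likewise hypothetical.

```python
from collections import defaultdict

# Placeholder inputs: these IDs and genes are illustrative only,
# not the curated definitions described in the text.
domain_to_go = {
    "Mitochondrial Metabolism": {"GO:TERM_A", "GO:TERM_B"},
    "Autophagy": {"GO:TERM_B", "GO:TERM_C"},
}
go_to_genes = {
    "GO:TERM_A": {"GENE1", "GENE2"},
    "GO:TERM_B": {"GENE2", "GENE3"},
    "GO:TERM_C": {"GENE4"},
}


def genes_by_domain(domain_to_go, go_to_genes):
    """Union the genes annotated to every GO term within each domain."""
    result = {}
    for domain, terms in domain_to_go.items():
        genes = set()
        for term in terms:
            genes |= go_to_genes.get(term, set())
        result[domain] = genes
    return result


def domains_by_gene(domain_genes):
    """Invert the mapping so each gene lists the domains it belongs to."""
    out = defaultdict(set)
    for domain, genes in domain_genes.items():
        for gene in genes:
            out[gene].add(domain)
    return dict(out)


domain_genes = genes_by_domain(domain_to_go, go_to_genes)
gene_domains = domains_by_gene(domain_genes)
```

Because a single GO term may sit in more than one domain, and a gene may be annotated to many terms, the inverted mapping naturally assigns some genes to several domains.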

Target Risk Score (TRS) Development and Process

The goal behind the generation of the TREAT-AD Target Risk Score (TRS) is to develop a scoring infrastructure to objectively, and with minimum bias, assess AD risk association genome wide, leveraging and integrating all available data types. The contributing data types may evolve over time. Here we have initiated the process drawing from genetics, transcriptomics, and proteomics. The composite scoring method delineated below enables us to rank all genes for linkage with AD.

Genetic Risk Score Component

The genetic component of AD risk, or genetics score, queries genetic evidence attributable to all loci identified by Ensembl (GRCh38, version 104) as “gene”, “pseudogene” or “ncRNA_gene”, resulting in a total of 60,664 loci. The score is based on evidence retrieved from genome-wide association studies (GWAS) and GWAS by proxy (GWAX) studies^15,16,36,77-89 as well as quantitative trait locus (QTL) studies^58,59 (see Supp Table 3). For consistency across the different GWAS used, nominally significant variants (unadjusted p value < 0.05) from anywhere within a 200 kb window surrounding a given target gene’s coordinates are assigned to the gene. For QTL studies, significant variants (FDR < 0.05) must affect the expression of the identified target to be assigned. For each study type (such as GWAS or QTL), the score incorporates the number of studies with significant genetic variants assigned to the target, the minimum significance value of identified variants assigned to the target across studies of that type, and the mean rank of the minimum significance values of identified variants across all studies of that type.
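As a rough illustration of the windowed variant-to-gene assignment, the sketch below collects, for each gene, the smallest p-value among nominally significant variants that fall inside the window. It assumes the 200 kb is applied as a flank on each side of the gene, which is one possible reading of the text (the window could equally be 200 kb in total width); the class and function names are hypothetical.

```python
from dataclasses import dataclass

FLANK_BP = 200_000   # assumption: 200 kb added on EACH side of the gene
P_THRESHOLD = 0.05   # nominal significance, unadjusted


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    start: int
    end: int


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    p_value: float


def min_p_per_gene(genes, variants, flank=FLANK_BP, p_threshold=P_THRESHOLD):
    """Return {gene_id: smallest nominally significant p-value in the window}."""
    best = {}
    for gene in genes:
        lo, hi = gene.start - flank, gene.end + flank
        for v in variants:
            in_window = v.chrom == gene.chrom and lo <= v.pos <= hi
            if in_window and v.p_value < p_threshold:
                current = best.get(gene.gene_id)
                if current is None or v.p_value < current:
                    best[gene.gene_id] = v.p_value
    return best


# Toy coordinates, invented for illustration.
genes = [Gene("G1", "1", 1_000_000, 1_010_000),
         Gene("G2", "1", 5_000_000, 5_020_000)]
variants = [Variant("1", 900_000, 1e-4),     # inside G1 window, significant
            Variant("1", 1_100_000, 0.2),    # inside G1 window, not significant
            Variant("1", 5_300_000, 1e-9),   # outside G2 window
            Variant("2", 1_005_000, 1e-6)]   # different chromosome
assigned = min_p_per_gene(genes, variants)
```

The nested loop is quadratic and only suitable for toy data; a genome-scale version would use an interval index, and a variant lying within the windows of several genes would be assigned to each of them.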

The score also includes further functional characterization of identified variants, both coding and noncoding. The severity of coding and splice-site variants was assessed with ANNOVAR⁹⁰ (version 2020-06-07) using the dbnsfp35a⁹¹ and dbscsnv11⁹² databases. For each variant, the average rank score predictions of deleterious coding and splice site variants were calculated, and the maximum rank score across all variants assigned to a target is reported. In addition, the number and fraction of deleterious coding variants per gene are included, where a variant is called deleterious when at least 3 predictors classified the variant as deleterious. Finally, the coding variant summary score also includes the propensity for each gene to accommodate deleterious coding variants (based on the gnomAD LOEUF score)⁹³. For noncoding variants, only those identified in one of the queried QTL studies are considered. The severity of noncoding variants was assessed using both the RegulomeDB (regulomedb.org, accessed Dec 2021)⁹⁴ probability score as well as the DeepSEA (hb.flatironinstitute.org/deepsea, Beluga DNA sequence model)⁹⁵ mean -log e-value (MLE) variant score.

The score also incorporates phenotypic evidence supporting a given target from both human and animal model sources. Phenotypes for human genes and orthologs were accessed via the Monarch Initiative API^96,97. Human phenotypes for each target were extracted from the Human Phenotype Ontology (hpo.jax.org)⁹⁸ and the number of phenotypes in common with AD (MONDO:0004975) or dementia (MONDO:0001627) is normalized against all phenotypic abnormalities annotated to a gene. Phenotypes of orthologs to human genes were extracted from the Unified Phenotype Ontology (uPheno) (www.ebi.ac.uk/ols/ontologies/upheno, accessed June 2022) and the number of phenotypes in common with AD and dementia is normalized against all phenotypic abnormalities annotated to all orthologs of a gene. Finally, the score also includes whether a target has a model in development through the Model Organism Development and Evaluation for Late-onset Alzheimer’s Disease (MODEL-AD) consortium (model-ad.org/strain-table, accessed June 2022) as well as the maximum correlation between mouse model gene expression and AMP-AD transcriptional module gene expression⁹⁹.

The Genetic Risk Score for a target (Supp Table 4) is then calculated as the sum of the inverse rank for each of the following evidence categories, scaled to a total of 3 points: number of GWAS studies, minimum GWAS p-value across studies, mean rank of the minimum GWAS p-value across studies, number of QTL studies, minimum QTL false discovery rate (FDR) across studies, mean rank of the minimum QTL FDR across studies, coding variant summary, noncoding variant summary, human phenotype score, model organism phenotype score, and MODEL-AD strain and correlation.
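One way to read "sum of the inverse rank, scaled to a total of 3 points" is sketched below. Several details are assumptions not stated in the text: rank 1 denotes the strongest evidence, the inverse rank is taken literally as 1/rank, tied values share a rank, genes with no evidence in a category contribute zero, and the sum is rescaled so that a gene ranked first in every category would receive exactly 3 points. The function names and toy values are hypothetical.

```python
def inverse_rank(values, higher_is_better=True):
    """Map {gene: value} to {gene: 1/rank}; rank 1 is the strongest evidence.

    Ties share a rank (dense ranking), which is an assumption of this sketch.
    """
    ordered = sorted(set(values.values()), reverse=higher_is_better)
    rank_of = {v: i + 1 for i, v in enumerate(ordered)}
    return {gene: 1.0 / rank_of[v] for gene, v in values.items()}


def genetic_score(categories, max_points=3.0):
    """Sum inverse ranks over evidence categories and rescale to max_points.

    categories: list of (values_by_gene, higher_is_better) pairs.
    Genes absent from a category contribute nothing for that category.
    """
    totals = {}
    for values, higher_is_better in categories:
        for gene, inv in inverse_rank(values, higher_is_better).items():
            totals[gene] = totals.get(gene, 0.0) + inv
    scale = max_points / len(categories)
    return {gene: total * scale for gene, total in totals.items()}


# Toy evidence for two of the listed categories (values invented).
n_gwas_studies = {"A": 5, "B": 2, "C": 1}   # more studies = stronger
min_gwas_p = {"A": 1e-12, "B": 1e-3}        # smaller p = stronger
scores = genetic_score([(n_gwas_studies, True), (min_gwas_p, False)])
```

With only two toy categories the maximum of 3 points is reached by gene A, which ranks first in both; the full score would sum over all of the evidence categories listed above.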

Multi-omic Risk Score Component

Transcriptomic Weight. A ratio of means meta-analysis with a random effects model^100,101 was applied to transcriptomics data from RNA-Seq profiling from 8 neocortical tissues to identify differentially expressed features between cases and controls (see Supp Table 6). Feature rank was determined by ordering the absolute fold change between cases and controls and rank was converted to a decimal statistic between zero and one. A logistic regression model was implemented to predict a feature rank from absolute log fold change. This predicted 0 to 1 weight for each feature served as the gene feature’s input transcriptomic weight value to determine a gene feature’s genomic harness weight value and corresponding TREAT-AD genomic score value.
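The rank-then-smooth step can be sketched as follows: absolute log fold changes are converted to rank fractions in (0, 1], and a logistic curve is fitted to predict that fraction from the absolute log fold change. The text does not specify the fitting procedure, so this sketch assumes a single-predictor logistic model with a fractional response fitted by Newton's method; all names and toy values are hypothetical.

```python
import math


def rank_fraction(abs_lfc):
    """Convert absolute log fold changes to ranks scaled into (0, 1].

    The largest change receives 1.0; ties are broken by input order.
    """
    n = len(abs_lfc)
    order = sorted(range(n), key=lambda i: abs_lfc[i])
    frac = [0.0] * n
    for position, i in enumerate(order, start=1):
        frac[i] = position / n
    return frac


def _sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, iterations=50):
    """Fit y ~ sigmoid(a + b*x) by Newton's method; y may be fractional."""
    a = b = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = _sigmoid(a + b * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Toy absolute log fold changes (invented values).
abs_lfc = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6]
ranks = rank_fraction(abs_lfc)
a, b = fit_logistic(abs_lfc, ranks)
weights = [_sigmoid(a + b * x) for x in abs_lfc]
```

The fitted curve yields a smooth weight strictly between 0 and 1 that increases with the magnitude of differential expression, so features with similar fold changes receive similar weights rather than arbitrary adjacent ranks.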

Proteomic Weight. A ratio of means meta-analysis with a random effects model^100,101 was applied to proteomics data from both label-free quantitation (LFQ) and Tandem Mass Tagging (TMT) shotgun profiling methods generated from 8 neocortical tissues to identify differentially expressed features between cases and controls (see Supp Table 7). Feature rank was determined by ordering the absolute fold change between cases and controls and rank was converted to a decimal statistic between zero and one. A logistic regression model was implemented to predict a feature rank from absolute log fold change. This predicted 0 to 1 weight for each feature served as the gene feature’s input proteomic weight value to determine a gene feature’s genomic harness weight value and corresponding TREAT-AD genomic score value.

Multi-omic Harness. To harness both sets of weights into a single value between zero and two to constitute the genomics portion of a gene feature’s contribution to the overall target risk score, a weighted adjustment by weight-modality was applied. Beyond collapsing the weight values into a single statistic, the omics harness is designed to weight proteomics more heavily than transcriptomics to account for the practicality of therapeutic intervention strategies at the protein level rather than the transcript level. Proteomic and transcriptomic weights were combined by ENSG gene identifiers. When multiple proteomic identifiers mapped to a single ENSG, the greatest weight value was selected from the isoforms that were significant (FDR < 0.05). When multiple isoforms mapped to an ENSG identifier and none was significantly associated with disease, a random isoform’s weight was selected. A scoring harness was applied to combine the transcriptomic and proteomic weights of each ENSG gene identifier (Supp Table 8). Genes were ranked by their harness value and ranks were converted to a decimal between 0 and 1. As with the transcriptomic and proteomic weights, a binomial model was fitted to predict the 0-1 adjusted rank from the second-degree polynomial of the log harness values. This model was used to compute a predicted genomics weight from zero to one, where one corresponds to a greater harness value. For genes with no statistically significant RNA or protein values this omics weight was set to zero. The omics weight was then multiplied by 2 to attain the points value the Multi-omics Risk Score Component contributed to a gene’s overall score.
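Two of the rules in this paragraph, the isoform selection per ENSG identifier and the final conversion to points, can be sketched directly. The harness formula itself is given in Supp Table 8 and is not reproduced here. The identifiers, weights, function names, and the fixed random seed below are illustrative assumptions.

```python
import random


def collapse_isoforms(isoforms, fdr_threshold=0.05, rng=None):
    """Pick one proteomic weight per ENSG identifier.

    isoforms: iterable of (ensg_id, weight, fdr) tuples.
    If any isoform is significant, take the largest significant weight;
    otherwise pick one isoform's weight at random.
    """
    rng = rng or random.Random(0)  # fixed seed only for reproducibility here
    by_gene = {}
    for ensg, weight, fdr in isoforms:
        by_gene.setdefault(ensg, []).append((weight, fdr))
    collapsed = {}
    for ensg, entries in by_gene.items():
        significant = [w for w, fdr in entries if fdr < fdr_threshold]
        if significant:
            collapsed[ensg] = max(significant)
        else:
            collapsed[ensg] = rng.choice(entries)[0]
    return collapsed


def omics_points(omics_weight, has_significant_signal):
    """Convert a 0-1 omics weight to the 0-2 point contribution."""
    return 2.0 * omics_weight if has_significant_signal else 0.0


# Placeholder identifiers and values, invented for illustration.
isoforms = [("ENSG_A", 0.9, 0.20),   # largest weight but not significant
            ("ENSG_A", 0.6, 0.01),
            ("ENSG_A", 0.4, 0.03),
            ("ENSG_B", 0.3, 0.50),
            ("ENSG_B", 0.7, 0.90)]
collapsed = collapse_isoforms(isoforms)
```

Note that for `ENSG_A` the non-significant isoform with weight 0.9 is ignored in favor of the largest significant weight, 0.6, which follows the selection rule as stated.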

GSEA Analysis using the Biological Domains

To assess the relative enrichment of different biological domains and their constituent GO terms, gene set enrichment analysis (GSEA) was performed using the gseGO function from the clusterProfiler R package¹⁰² (version 4.1.4) and the results were then categorized into biological domains based on the GO ID of enriched terms. The input for each enrichment analysis was the set of non-zero target scores, in descending order. We performed this analysis separately for each component score: genetics, multi-omics, and combined target risk. The displayed results include the normalized enrichment score (NES) as well as the Benjamini-Hochberg corrected p-value (p adj) for GO terms annotated to each biological domain. For co-expression module enrichment analyses, the identities of genes and proteins in each module were used for GO term enrichment analysis using the enrichGO function from the clusterProfiler package and the results were then categorized into biological domains based on the GO ID of enriched terms.
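The analysis itself was run in R with clusterProfiler; the Python sketch below only illustrates the two bookkeeping steps around it: preparing the ranked input (non-zero scores in descending order) and sorting enriched GO terms into biological domains by GO ID. The GO identifiers, scores, and enrichment values are placeholders, and the function names are hypothetical.

```python
def ranked_input(scores):
    """Keep non-zero scores and order genes from highest to lowest score."""
    nonzero = [(gene, s) for gene, s in scores.items() if s != 0]
    return sorted(nonzero, key=lambda item: item[1], reverse=True)


def categorize_by_domain(enriched_terms, domain_to_go):
    """Group enriched GO terms by the domain(s) whose definition contains them.

    enriched_terms: list of dicts with at least a 'go_id' key.
    Returns (rows grouped by domain, rows matching no domain).
    """
    go_to_domains = {}
    for domain, terms in domain_to_go.items():
        for term in terms:
            go_to_domains.setdefault(term, []).append(domain)
    by_domain, unassigned = {}, []
    for row in enriched_terms:
        domains = go_to_domains.get(row["go_id"])
        if not domains:
            unassigned.append(row)
            continue
        for domain in domains:
            by_domain.setdefault(domain, []).append(row)
    return by_domain, unassigned


# Placeholder data, invented for illustration.
scores = {"G1": 4.2, "G2": 0.0, "G3": 1.3, "G4": 2.8}
domain_to_go = {
    "Synapse": {"GO:X1"},
    "Autophagy": {"GO:X3"},
    "Mitochondrial Metabolism": {"GO:X3"},
}
enriched = [
    {"go_id": "GO:X1", "nes": 2.1, "p_adj": 0.001},
    {"go_id": "GO:X3", "nes": -1.8, "p_adj": 0.020},
    {"go_id": "GO:X9", "nes": 1.5, "p_adj": 0.040},
]
ranked = ranked_input(scores)
by_domain, unassigned = categorize_by_domain(enriched, domain_to_go)
```

Terms shared between domains are reported under each domain, and terms outside every domain definition are kept separately rather than discarded.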

Other data

Several other data sets are used in this work. The list of AD GWAS hits is derived from integrating genes identified in three sources: the supplementary table that accompanies Neuner et al 2020¹⁰³, Supplementary Table 5 from Bellenguez et al 2022 that identifies all genome-wide significant loci¹⁵ and the list of AD loci with genetic evidence compiled by the ADSP Gene Verification Committee (adsp.niagads.org/index.php/gvc-top-hits-list/, accessed July 2022). The Open Targets^104-107 disease association scores for AD (https://platform.opentargets.org/disease/MONDO_0004975/associations), including data type scores, were accessed using the Open Targets API (accessed October 2022). The identities of currently nominated targets from the AMP-AD consortium, listed on the Agora site (agora.adknowledgeportal.org/genes/, https://www.synapse.org/#!Synapse:syn12540368) were accessed using the synapseR R client¹⁰⁸.

Alzheimer’s Disease Biological Domains

The primary goal of defining a structured set of biological domains is to standardize areas of disease-associated biology that can serve as a common reference point for the analysis of large data sets. The CADRO ontology, discussed in the methods section, provided a useful starting place for the development of the biological domains, having been designed for a related purpose, and regularly employed to categorize clinical investigations^6,109. Leveraging CADRO as a starting place and augmenting this ontology based on literature curation, we established a set of nineteen biological domains that cover the AD endophenotypic space to objectify the processes involved in AD pathogenesis (Supp Table 1). In total we use 7,127 unique GO terms (16.4% of all terms in the GO) to annotate the biological domains, and the number of GO terms annotated to each biological domain varies enormously (Fig. 1A, Supp Table 2). The ‘Synapse’ domain requires the largest number of GO terms to define, with 1,379 constituent GO terms, while ‘Tau Homeostasis’ is the smallest with only 10 GO terms. ‘APP Metabolism’ is the second smallest biological domain; the two smallest biological domains focus on gene-centric processes, requiring fewer terms to annotate than larger domains with broader biological focus, such as ‘Lipid Metabolism’ or ‘Proteostasis’. Each biological domain was designed to be discrete from the others. This is reflected in the sparse overlap of shared GO terms between domains (Fig. 1A). The few GO terms that are present in more than one domain are truly inextricable (e.g. the term “mitophagy”, defined as “The selective autophagy process in which a mitochondrion is degraded by macroautophagy”, legitimately resides in both the Mitochondrial Metabolism and Autophagy domains), in which case the repetition was allowed, as it represents a meaningful intersection of biological areas.

The genes annotated to the biological domains reveal a different pattern, with high levels of promiscuity between the domains (Fig. 1B). The number of genes annotated to each domain is roughly proportional to the number of GO terms per domain, with almost two orders of magnitude separating the largest domain ‘Proteostasis’ and the smallest ‘Tau Homeostasis’. Many genes are annotated to multiple GO terms subordinate to multiple biological domains. This may represent a convergence of biologically related processes, or the pleiotropy associated with a given gene’s function. While a plurality of annotated genes (30%) is only annotated to a single biological domain, many participate in multiple domains each (Fig. 1C), and genes annotated to more biological domains tend to have higher overall Target Risk Scores (see below) (Pearson r = 0.158, p = 2.8 × 10^−80).
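The relationship between GO terms, biological domains, and genes described above can be illustrated with a minimal sketch. The domain definitions, term names, and gene names below are toy placeholders chosen for illustration (the actual definitions are in Supp Tables 1 and 2), and the helper functions are hypothetical rather than the pipeline's own code.

```python
# Toy domain definitions keyed by GO term name (illustrative only).
DOMAIN_TO_GO = {
    "Mitochondrial Metabolism": {"mitophagy", "mitochondrion"},
    "Autophagy": {"mitophagy", "autophagy"},
    "Synapse": {"synapse"},
}

# Hypothetical gene-to-GO-term annotations.
GENE_TO_GO = {
    "GENE_A": {"mitophagy"},
    "GENE_B": {"synapse", "mitochondrion"},
    "GENE_C": {"autophagy"},
}


def domains_per_gene(domain_to_go, gene_to_go):
    """Assign each gene to every domain sharing at least one GO term with it."""
    assigned = {}
    for gene, terms in gene_to_go.items():
        hits = {d for d, d_terms in domain_to_go.items() if terms & d_terms}
        if hits:
            assigned[gene] = hits
    return assigned


def shared_terms(domain_to_go, domain_a, domain_b):
    """Return the GO terms that two domains have in common."""
    return domain_to_go[domain_a] & domain_to_go[domain_b]


if __name__ == "__main__":
    for gene, domains in sorted(domains_per_gene(DOMAIN_TO_GO, GENE_TO_GO).items()):
        print(gene, len(domains), sorted(domains))
```

In this toy input the domains share a single term, yet two of the three genes land in two domains each, which mirrors the contrast between sparse GO term overlap and frequent gene overlap.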

Target Risk Score Overview

The TREAT-AD Target Risk Score (TRS) is a metric designed to rank potential disease involvement of specific genes based on multiple independent lines of evidence to objectify the prioritization of potential targets and disease-linked biology. The TRS has two component elements: genetic risk and multi-omic risk, each derived from a meta-analysis harmonizing multiple datasets. In the sections below, we discuss the results for each component, with genetic risk weighted more heavily, receiving up to 3 points, while multi-omic risk has a maximum of 2 points. The rationale for providing more weight to the genetics score component reflects the greater success in clinical trials for targets with genetic support^110–112.

Genetic Risk Score Component

The genetic risk score component is a summary of genetic evidence supporting the target gene's association with late-onset Alzheimer's disease from multiple genetic studies. The score is based on genetic evidence retrieved from both genome wide association (GWAS) and GWAS by proxy (GWAX) along with QTL studies. In total, 27 different Alzheimer’s association data sources were queried (Supp Table 3). These datasets are not fully independent, as the patient cohorts used partially overlap between studies, and as such many genes are scored with associations across multiple studies (Supp Fig. 1A). Given the differences between the populations sampled and methodologies applied, we chose a windowed approach to assign variants to genes rather than accounting for the various linkage disequilibrium blocks in the source data. Furthermore, as the goal is to score as many targets as possible for genetic association with AD, we selected a permissive significance threshold for variant inclusion in order to stratify targets that are nominally significant in association with AD but do not meet the more standard criterion for genome-wide significance (e.g. p < 5 × 10^−8). Had we restricted our analyses to genes that meet this threshold, only 1,706 of the 60,664 assessed genes (2.81%) would be included (Supp Fig. 1B). Using this standard, the vast majority of the genome would be omitted from consideration, including many genes that have notable association with AD but have not yet reached genome-wide significance for various reasons, such as a lack of statistical power and/or the limited diversity in cohort ancestry. Performing gene set enrichment analyses (GSEA) using the combined GWAS score to rank genes (Supp Fig. 1C) enriches GO terms from the Immune Response, Synapse, and Lipid Metabolism biological domains (Supp Fig. 1D).
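A windowed assignment of variants to genes under a permissive threshold can be sketched as follows. This is an illustration only: the window size (10 kb), the nominal threshold (p ≤ 0.05), the coordinates, and the gene names are all assumptions made for the example, not values taken from the methods.

```python
GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold


def assign_variants_to_genes(variants, genes, window=10_000, p_max=0.05):
    """Map each gene to the smallest p-value among variants near it.

    variants: iterable of (chrom, position, p_value)
    genes:    iterable of (name, chrom, start, end)
    window:   bases added on each side of the gene (assumed value)
    p_max:    permissive inclusion threshold (assumed value)
    """
    best = {}
    for chrom, pos, p in variants:
        if p > p_max:
            continue  # variant is not even nominally associated
        for name, g_chrom, start, end in genes:
            if chrom == g_chrom and start - window <= pos <= end + window:
                best[name] = min(p, best.get(name, 1.0))
    return best


if __name__ == "__main__":
    genes = [
        ("GENE_A", "1", 100_000, 120_000),
        ("GENE_B", "1", 125_000, 140_000),
        ("GENE_C", "2", 100_000, 110_000),
    ]
    variants = [
        ("1", 122_000, 1e-9),   # falls in the windows of GENE_A and GENE_B
        ("1", 90_500, 0.01),    # nominal hit upstream of GENE_A
        ("2", 500_000, 1e-12),  # too far from GENE_C
        ("1", 130_000, 0.2),    # fails the permissive threshold
    ]
    for gene, p in sorted(assign_variants_to_genes(variants, genes).items()):
        print(gene, p, "genome-wide" if p < GENOME_WIDE_P else "nominal")
```

Because windows can overlap, a single variant may be assigned to more than one gene, which is a known trade-off of window-based assignment compared with approaches that model linkage disequilibrium.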

We analyzed genes with identified eQTL or pQTL from studies where genetic variation is associated with expression differences in postmortem AD brains. In contrast with the GWA studies, only 492 genes are identified with a significant QTL in all three study populations used (Supp Fig. 1E-G), which is consistent with previous reports of limited overlap between brain eQTL and pQTL⁵⁸. Accordingly, GSEA using the integrated QTL signal across the studies to rank genes enriches GO terms from only 5 biological domains - namely Cell Cycle, Proteostasis, Oxidative Stress, Mitochondrial Metabolism, and Immune Response (Supp Fig. 1H).

In addition to accounting for the association of a gene with AD traits, we also sought to assess the predicted severity of identified variants. We collected 7,198,607 unique variants with at least a nominal association to AD or related phenotypes, including 50,833 variants predicted to affect the coding sequence of a target gene (Supp Fig. 2A) and 1,365,759 variants associated with altered gene or protein expression (eQTL or pQTL) in postmortem AD brain (Supp Fig. 2D). Variant severity is scored differently for coding and noncoding variants; while the preponderance of identified variants are noncoding (99.29%), the prediction of functional effects of variation is more straightforward for coding variants. Coding variant severity was assessed for 17,892 distinct genes, 13,088 of which (73.2%) are identified with at least one deleterious coding variant. In general, genes with larger CDS lengths tend to have a higher number of deleterious coding variants (Supp Fig. 2B). In addition, there are 939 genes that are among the least tolerant of loss-of-function variation (bottom 10% gnomAD LOEUF) where we identify at least one deleterious coding variant associated with AD. The coding variant summary is enriched for genes in Structural Stabilization, Proteostasis, Immune Response, Endolysosome, Synapse, Epigenetic, and Mitochondrial Metabolism biological domains (Supp Fig. 2C). Noncoding variant severity was assessed for variants identified through QTL studies (namely, those associated with altered expression of the target gene or protein) for a total of 15,812 distinct genes. The noncoding variant summary score integrates information from the RegulomeDB probability score and DeepSEA mean -log e-value (Supp Fig. 2E). The noncoding variant summary is enriched for terms from the Immune Response and Proteostasis biological domains (Supp Fig. 2F). 
For the 7,394 genes assessed for both coding and noncoding variant severity, there is a very weak positive correlation between the coding and noncoding variant summary scores (Pearson r = 0.05, p = 1.19 × 10^−5, Supp Fig. 2G).

The genetic risk score also incorporates phenotypic evidence supporting a given target’s potential to impact AD relevant phenotypes from both human and model organism sources. The human phenotype score leverages the Human Phenotype Ontology to query the phenotypic abnormalities annotated to each gene. In total, 4,131 genes have at least one phenotype in common with AD (MONDO:0004975) or dementia (MONDO:0001627) (Supp Fig. 3A), and these are enriched for terms annotated to 15 biological domains, but predominantly to the Synapse, Mitochondrial Metabolism, Lipid Metabolism, and Proteostasis biological domains (Supp Fig. 3B). For relevant ortholog phenotypes we queried the uPheno ontology⁹⁷ and scored 8,901 genes with orthologs that have at least one phenotype in common with AD or dementia (Supp Fig. 3C). Genes with a strong ortholog phenotype score are enriched in GO terms from 17 biological domains, with Synapse, Immune Response, and Lipid Metabolism being the three with the largest number of enriched GO terms (Supp Fig. 3D). For the 3,063 genes with relevant phenotypes detected in both human and model organism resources there is a relatively weak positive correlation in phenotype scores (Pearson r = 0.20, p = 6.7 × 10^−30, Supp Fig. 3E). We provide an additional weight to genes that have been positively identified within MODEL-AD studies to be associated with LOAD¹¹³. MODEL-AD is an NIA funded effort to develop new animal models that better represent features of human AD pathophysiology based on signatures from AD GWAS. We include in our score whether genes have a model in development by MODEL-AD as well as the maximum correlation between human transcriptomic module gene expression from AMP-AD and mouse model gene expression, where available.

The genetic risk score was calculated for a total of 60,664 genes. After tallying all genetic evidence sources, we found support for 52,375 genes, though the majority have only weak or sparse evidence connecting the locus to Alzheimer’s risk. To limit spurious associations, the list was reduced to only the top 25 percent of ranked genes (15,166 genes) or those scored in the multi-omics analysis (see below). This resulted in a final list of 24,278 scored genes (Supp Table 4). The genes re-introduced to genetic scoring based on their inclusion in the multi-omic analyses are represented as the lower mode of the distribution (Fig. 2A, between 0–1). Genes contained within known AD GWAS loci are enriched among the top scores (Fig. 2A). Genes identified as high in AD risk by Open Targets (https://www.opentargets.org/)^105–107,114, a large-scale effort to rank genes based on genetic support for translational relevance, were also among the top scoring genes (Fig. 2A). Comparing the TREAT-AD genetics score with the Open Targets genetic association score for AD (Fig. 2B) reveals a very weak positive correlation (Pearson r = 0.185, p = 4.4 × 10^−4) which is stronger when only considering known AD GWAS genes (Pearson r = 0.321, p = 6.7 × 10^−3). In general, most genes receive a relatively higher score from the TREAT-AD genetics score. GSEA using the genetics score enriches GO terms from 17 of 19 biological domains; the biological domains with the largest number of enriched GO terms are Synapse, Lipid Metabolism, and Structural Stabilization (Fig. 2C-D, Supp Table 5). The Open Targets genetic association score enriches GO terms from 10 biological domains, with terms in the APP Metabolism domain by far the most significantly enriched and Synapse, Immune Response, and Lipid Metabolism being the domains with most terms enriched (Supp Fig. 4C). The relative emphasis of APP Metabolism in the Open Targets score likely reflects the inclusion of evidence from early onset autosomal dominant forms of AD in which causal variants are clustered in the Presenilin genes (PSEN1, PSEN2) involved in proteolytic processing of APP, whereas the TREAT-AD genetics score draws primarily from genetic associations of late onset sporadic forms of the disease. Notably Synapse, Lipid Metabolism, Structural Stabilization, and Immune Response are among the biological domains with the largest number of enriched GO terms for both scores.
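The gene list reduction described in this section (retain the top quartile by genetic score, plus any gene scored in the multi-omic analysis) can be expressed as a short sketch. The function name and the toy scores are hypothetical, and restricting re-introduced genes to those with a genetic score is an assumption of this example.

```python
def select_scored_genes(genetic_scores, omics_scored, top_fraction=0.25):
    """Keep genes in the top fraction by genetic score, or scored in multi-omics.

    genetic_scores: dict mapping gene -> genetic risk score
    omics_scored:   iterable of genes that received a multi-omic score
    """
    ranked = sorted(genetic_scores, key=genetic_scores.get, reverse=True)
    n_top = int(len(ranked) * top_fraction)
    top = set(ranked[:n_top])
    # Re-introduce lower-ranked genes that were scored in the multi-omic analysis.
    reintroduced = set(omics_scored) & set(genetic_scores)
    return top | reintroduced


if __name__ == "__main__":
    scores = {"g1": 2.5, "g2": 1.0, "g3": 0.2, "g4": 0.1}
    print(sorted(select_scored_genes(scores, {"g4"})))
```

Applied to 60,664 genes, a top fraction of 0.25 yields the 15,166 genes quoted above; the remainder of the final list comes from the multi-omic union.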

Multi-omic Risk Score Component

The multi-omic risk score is a summary metric encapsulating available evidence supporting that target gene expression is altered in the brains of AD patients. The score makes use of proteomic and transcriptomic datasets generated as part of the consortium efforts of AMP-AD (Supp Tables 6 and 7). For each data modality (i.e. transcriptomic or proteomic), a meta-analysis of samples is used to generate weights for significantly differentially expressed genes based on the observed fold changes (Fig. 3A & 3B). Using ratio of means meta-analysis with a random effects model for each data modality, we analyzed the directionality of expression change due to AD using GSEA (Fig. 3C & 3D). This analysis demonstrates that Synapse and Mitochondrial Metabolism are the biological domains with the largest number of down-regulated GO terms and Immune Response and Structural Stabilization are the biological domains with the largest number of up-regulated GO terms – across both transcriptomics (Fig. 3C) and proteomics (Fig. 3D).
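A random effects pooling of per-study log ratios of means can be sketched as below. The text does not state which between-study variance estimator was used, so the DerSimonian-Laird estimator here is an assumption, as are the function name and the example inputs.

```python
import math


def random_effects_pool(effects, variances):
    """Pool per-study effects (e.g. log ratios of means) under a random effects model.

    Uses the DerSimonian-Laird estimate of between-study variance (tau^2),
    which is an assumed choice for this sketch.
    Returns (pooled_effect, standard_error, tau_squared).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]               # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


if __name__ == "__main__":
    # Hypothetical log ratios of means (AD vs control) from three cohorts.
    pooled, se, tau2 = random_effects_pool([0.30, 0.10, 0.22], [0.01, 0.02, 0.015])
    print(f"pooled log ratio = {pooled:.3f}, SE = {se:.3f}, tau^2 = {tau2:.4f}")
```

When the studies agree exactly, tau^2 is zero and the result reduces to the fixed effect inverse-variance estimate; heterogeneity inflates tau^2 and widens the standard error.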

The calculated weights for each modality are combined using a scoring harness (Supp Table 8) that yields a higher score for targets with (A) evidence of differential expression at both the protein level and the RNA level, followed by (B) those targets that are only significantly differentially expressed at the protein level, followed lastly by (C) those that are only significantly differentially expressed at the RNA level (Fig. 3E). The rationale for this harness is two-fold: first, concordant evidence from multiple data modalities leads to higher confidence that a target gene’s expression is altered in AD brains, and second, that protein levels more accurately relate to the state of the biology within an in vivo system^35,115,116. Importantly the effect of the harness is tuned to prioritize this imposed hierarchy without disregarding results specific to a single profiling modality. This reflects a balance between higher relevance of proteomic evidence to disease state versus the increased sensitivity of detection for transcriptomics.
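One way to impose the hierarchy described above is a tiered scheme in which each evidence class occupies its own score band. The sketch below is hypothetical: the band offsets, band width, and the assumption that modality weights lie in [0, 1] (with 0 meaning "not significantly differentially expressed") are illustrative choices, not the values in Supp Table 8.

```python
# Hypothetical band offsets; each band has the same width.
BAND_WIDTH = 0.6
OFFSET_RNA_ONLY = 0.0
OFFSET_PROTEIN_ONLY = 0.7
OFFSET_BOTH = 1.4


def multiomic_score(rna_weight, protein_weight):
    """Combine modality weights so that both > protein only > RNA only.

    Each weight is assumed to be in [0, 1]; 0 means no significant change.
    The maximum possible score is 2.
    """
    for w in (rna_weight, protein_weight):
        if not 0.0 <= w <= 1.0:
            raise ValueError("weights must lie in [0, 1]")
    if rna_weight > 0 and protein_weight > 0:
        return OFFSET_BOTH + BAND_WIDTH * (rna_weight + protein_weight) / 2
    if protein_weight > 0:
        return OFFSET_PROTEIN_ONLY + BAND_WIDTH * protein_weight
    if rna_weight > 0:
        return OFFSET_RNA_ONLY + BAND_WIDTH * rna_weight
    return 0.0


if __name__ == "__main__":
    print(multiomic_score(0.8, 0.9))  # evidence in both modalities
    print(multiomic_score(0.0, 0.9))  # protein only
    print(multiomic_score(0.8, 0.0))  # RNA only
```

A strict banding like this guarantees the ordering regardless of effect size, while single-modality results still receive a non-zero score; a softer weighting would trade that guarantee for finer sensitivity to effect magnitude.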

We compare the distribution of all scored targets (Supp Table 9) to those scored by the Open Targets Platform^105–107,114 and those nominated for follow-up by the AMP-AD consortium^59,117. The 605 targets nominated by AMP-AD investigators tend to have higher multi-omics scores than the population of all targets (Fig. 3F), and there is a very weak correlation between the multi-omics score and the number of nominations received by a given target (r = 0.153; Supp Fig. 5E). Numerous targets receive only one nomination yet are ranked among the highest multi-omics scores. Conversely, there are many targets that received several nominations with a low multi-omics score. This likely reflects the fact that AMP-AD investigators use diverse methods to identify targets, beyond differential expression analysis, and that some modalities (e.g. metabolomics) are not included within the multi-omic score. Future efforts will include work to integrate additional data modalities into the score. Using the multi-omics score to rank genes to perform GSEA results in the enrichment of GO terms from 18 biological domains, with DNA Repair the only domain with no terms enriched. The biological domains with the largest number of enriched terms include Synapse, Immune Response, Structural Stabilization, and Mitochondrial Metabolism (Fig. 3G-H, Supp Table 10). Terms from the Mitochondrial Metabolism domain are both among the most significantly enriched (Fig. 3H) and have the highest normalized enrichment score (NES) (Fig. 3G). Interestingly, the top terms within Mitochondrial Metabolism focus upon mitochondrial translation and complex I of the electron transport chain (Fig. 3G), showing these terms are the most significantly down-regulated biological processes associated with LOAD.

Composite Target Risk Score

The TRS is a metric derived by summing the component risk scores (i.e. genetic and multi-omic scores). The maximum observed score for any target is 4.74 out of a total of 5 (Supp Table 11). As with the component scores, the top TRS scores are enriched for GWAS loci, AMP-AD nominated targets, and targets considered by the Open Targets platform (Fig. 4A). Considering both the genetic and multi-omic scores for each target (Fig. 4B), the targets with the top TRS tend to have relatively higher genetic scores. When we compare the TREAT-AD overall TRS with the Open Targets target score (Supp Fig. 4A) we see that many targets receive a relatively higher score from the TRS, likely due to the unique inclusion of disease-relevant transcriptomic and proteomic datasets from the AMP-AD consortium in the TRS. AD GWAS loci are generally scored highly by both the TREAT-AD TRS and the OpenTargets target score metrics. GSEA using the overall TRS enriches 3,142 GO terms, including 1,358 (43.5%) annotated to at least one of each of the 19 biological domains (Supp Table 12). The biological domains with the largest number of enriched GO terms are Synapse, Immune Response, and Lipid Metabolism (Fig. 4D). In comparison, the Open Targets target score enriches terms from 16 of 19 biological domains, and the domains with the largest number of enriched terms are also Synapse, Lipid Metabolism, and Immune Response (Supp Fig. 4B).
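The composite score is a simple sum of its two bounded components (genetic risk up to 3 points, multi-omic risk up to 2 points). A minimal sketch of the summation and ranking, using hypothetical gene names and component values:

```python
GENETIC_MAX = 3.0    # maximum genetic risk contribution
MULTIOMIC_MAX = 2.0  # maximum multi-omic risk contribution


def target_risk_score(genetic, multiomic):
    """Sum the component scores after checking each is within its range."""
    if not 0.0 <= genetic <= GENETIC_MAX:
        raise ValueError("genetic score must lie in [0, 3]")
    if not 0.0 <= multiomic <= MULTIOMIC_MAX:
        raise ValueError("multi-omic score must lie in [0, 2]")
    return genetic + multiomic


def rank_targets(components):
    """Rank genes by TRS, highest first.

    components: dict mapping gene -> (genetic_score, multiomic_score)
    """
    scored = {gene: target_risk_score(g, m) for gene, (g, m) in components.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    toy = {"GENE_A": (2.5, 0.5), "GENE_B": (1.0, 1.75), "GENE_C": (0.25, 0.25)}
    for gene, trs in rank_targets(toy):
        print(gene, trs)
```

Because the genetic component has the larger range, targets at the top of such a ranking tend to carry relatively higher genetic scores, consistent with the pattern noted for Fig. 4B.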

Considering all GO terms within any of the biological domains that are significantly enriched with the TRS, some have more significant enrichment from GSEA using the genetics score versus the omics score to rank genes (Fig. 5A, Supp Tables 5, 10, and 12). For example, from the Lipid Metabolism biological domain the “fatty acid metabolic process” term is enriched using both scores, but for the genetics score-based tests the adjusted p-value for the term is 5.9 × 10^−6 while for the omics score-based test the adjusted p-value is 7.9 × 10^−3. The inverse is true for the “membrane raft” term, which is more significant using the multi-omics scores (adjusted p = 1.3 × 10^−5) compared to the genetics scores (adjusted p = 1.8 × 10^−2). A similar dichotomy is observed for terms associated with developmental processes from the Immune Response domain, especially “myeloid leukocyte differentiation”, having relatively more significant enrichment from the genetics score whereas corresponding terms for mature structures and associated processes (e.g. “leukocyte degranulation”) have a relatively more significant enrichment from the multi-omics score (Fig. 5B). Other biological domain resident dichotomies are more consistent across risk modalities. For example, considering all presynaptic and postsynaptic terms from the Synapse biological domain, there are more postsynaptic terms enriched (40 terms) relative to presynaptic terms (9 terms) and the postsynaptic terms are enriched with more significance across the TRS as well as the genetic and multi-omic score components (Fig. 5C). The optimal targets reflect a parity of ranking between the two contributing modalities, with accumulation of both genetic and multi-omic risk. For example, enrichment of genetic risk without multi-omic risk can indicate risk in developmental processes that are not present in later stages of disease or postmortem tissue, whereas enrichment of multi-omic risk without genetic risk can indicate processes that are changing as a response to disease pathology but are not causal to underlying disease etiology. Moreover, processes that are risk enriched across two distinct measures increase our confidence of disease association. For example, while the multi-omics score enriches terms related to mitochondrial electron transport chain complexes I and III, the genetics score only enriches terms related to complex I (Fig. 5D), yet both measures point to the centrality of electron transport chain related events in our scored AD risk. GO terms close to the diagonal dashed line (Fig. 5A), as seen in Structural Stabilization with the term “cell-substrate adhesion”, suggest that the risk associated with genes in these terms is embedded equivalently in each data modality and therefore these terms are worth consideration for further resource development. The GO terms that are significantly enriched but do not associate with any biological domain (i.e. Figure 5A, ‘none’) seem to reflect either biological process categorization that is too general to be mapped into endophenotypic space, such as “ATP Metabolic Process”, or terms that map to high order positions, or do not imply any specific associated biological process, in the molecular function or cell component ontologies within GO, such as “Cell Leading Edge” (Fig. 5A), also rendering them uncategorizable in endophenotypic space.

Example Use Case: AMP-AD Co-expression Modules Risk Scores and Biological Domains

To demonstrate how this information can be useful more broadly, we apply these approaches to published co-expression modules for transcripts^20,22 and proteins³⁵ produced by AMP-AD consortia members. For each module we calculated the median TRS and genetics and multi-omics component scores (Supp Table 13), performed GO term enrichment analyses using the identities of the genes in each module, and categorized enriched GO terms into biological domains. The STGblue module has both the highest median TRS and the highest median multi-omic scores of the 30 transcript co-expression modules from Wan et al, while the DLPFCblue module has the highest median genetics score (Fig. 6A). Among the 45 protein co-expression modules from Johnson et al, the M11 Cell-ECM Interaction module has the highest median TRS and the highest median multi-omics score while M27 Extracellular Matrix has the highest median genetics score (Fig. 6B). The GO term enrichments for the three proteomics modules with the top median TRS are primarily from the Structural Stabilization (Fig. 6C & 6D) and Mitochondrial Metabolism (Fig. 6E) biological domains. Comparing the GO term enrichments for the modules from each study with the highest median genetics scores and highest median multi-omics scores, we note several common enrichments across modules from all three studies. The GO terms enriched for proteins in modules with the highest median multi-omics (M11, Supp Fig. 6B) and highest median genetics scores (M27, Supp Fig. 6C) are predominantly from the Structural Stabilization biological domain. This is also similar for the genes in the transcriptomic sub-modules from Milind et al, where the enriched GO terms from the sub-module with the highest median multi-omics score (IFGturquoise_2, Supp Fig. 6F) are primarily from the Structural Stabilization biological domain. The transcriptomic sub-module with the highest median genetics score (PHGturquoise_2, Supp Fig. 6E) is enriched for many terms from the Immune Response biological domain. The biological domains enriched for each of the top-scoring transcriptomic modules from Wan et al are very similar: primarily Immune Response, Structural Stabilization, Vasculature, and Lipid Metabolism, in varying orders (Supp Fig. 6H and I). In Wan et al these modules were associated with microglial, pericyte and endothelial cell types relevant to these biological domains. The large number of biological domains enriched is likely because these modules are large (between 504 and 4,673 genes), compared with the proteomic modules (between 28 and 2,231 proteins) and the transcriptomic sub-modules (between 77 and 2,090 genes), so more terms are enriched for each module and the overall pattern of enrichment is less specific. The top risk-enriched co-expression modules from each study implicate terms from the Structural Stabilization and Immune Response biological domains. Moreover, there are GO terms that are shared among these co-expression modules, including “focal adhesion” and “extracellular matrix” for the Structural Stabilization biological domain and “neutrophil degranulation” and “cytokine-mediated signaling pathway” for the Immune Response biological domain.
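The per-module summary used in this use case (the median score across a module's member genes) can be sketched in a few lines. The module names, gene names, and scores below are hypothetical, and skipping genes without a score is an assumption of this example.

```python
from statistics import median


def module_medians(modules, scores):
    """Return the median score of each module's scored member genes.

    modules: dict mapping module name -> iterable of gene names
    scores:  dict mapping gene name -> score (TRS or a component score)
    Modules with no scored genes are omitted.
    """
    result = {}
    for name, genes in modules.items():
        values = [scores[g] for g in genes if g in scores]
        if values:
            result[name] = median(values)
    return result


def top_module(modules, scores):
    """Name of the module with the highest median score."""
    medians = module_medians(modules, scores)
    return max(medians, key=medians.get)


if __name__ == "__main__":
    modules = {
        "module_x": ["GENE_A", "GENE_B", "GENE_C"],
        "module_y": ["GENE_C", "GENE_D"],
    }
    trs = {"GENE_A": 4.0, "GENE_B": 3.0, "GENE_C": 1.0, "GENE_D": 2.0}
    print(module_medians(modules, trs), top_module(modules, trs))
```

Running the same function on the genetics and multi-omics component scores separately gives the three per-module summaries that are compared across studies.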

The work presented here represents the most comprehensive effort to date to employ an unbiased computational system of AD risk assessment in coordination with an objective, systematic and comprehensive alignment with AD endophenotypes. Broadly, we have executed this task in two parts. First, we have developed an AD risk scoring paradigm that draws from multiple genetic association studies, as well as large multi-omic datasets consisting of 1,699 brains with transcriptomic data (Supplementary Table 6) and 1,188 brains with proteomic data (Supplementary Table 7), each drawn from multiple large brain bank studies, to objectively rank genes genome wide. Second, we have developed 19 biological domains that align with known AD endophenotypes, defining each with a specific and exhaustive set of GO terms and annotated gene sets. This enables the organization of risk associated genes into objective and unbiased biological processes, to facilitate the characterization of risk enriched areas for therapeutic development. The goal is to provide the scientific community with an evaluative resource that can be employed across studies and modalities to help integrate knowledge about potential domains and targets for future investigation as well as provide a unified framework for defining exactly what is meant by an AD endophenotype. The biological domains described in this work have already been adopted as a common organizational reference tool by several groups, including the AD Knowledge Portal¹⁴ (https://adknowledgeportal.synapse.org), the AD Atlas¹¹⁸ (https://adatlas.helmholtz-muenchen.de), Agora (agora.adknowledgeportal.org), the AMP-AD Consortium, the TREAT-AD Consortium, and others. The expanding utilization of these biological domain endophenotypic definitions will provide the opportunity for increased interoperability and harmonization across large platform research efforts. Both methodologies (risk scoring and biological domain annotation) are easily updated as new information becomes available, as they draw from open-source communal resources central to AD biological investigation.

The quantitative scoring of genetic and multi-omic components of the TRS enables us to perform enrichment tests on both the composite score and the individual score components. In this way we can interrogate genes across the biological domains for cumulative risk and assess whether the risk comes primarily from one data modality. These analyses (Fig. 2–4) highlight the observation that the different modalities emphasize independent biological areas of disease risk, but employed in unison are a powerful tool for mapping disease risk into specific biological areas within objectively defined disease-associated endophenotypes (e.g. Figure 5A, Supp Tables 5, 10, and 12).

Using the biological domain framework, we investigated which biological domains were the most up- and down-regulated based on post-mortem differential transcriptomic expression and proteomic abundance analyses. The top two down-regulated areas of AD-linked biology are Mitochondrial Metabolism and Synapse (Fig. 3C and 3D), which are also the biological domains with the most significantly enriched terms in both the genetics and multi-omics score as well as the overall TRS. The down-regulation of synaptic genes spans both pre- and postsynaptic gene sets; however, there are 3–4 times as many postsynaptic GO terms enriched, and postsynaptic terms are more significantly enriched across all component risk scores (Fig. 5C). This aligns with previous research into the role postsynaptic mechanisms play in the maintenance of dendrites and synaptic plasticity, lost during the cognitive decline of AD^119–121. Numerous studies have demonstrated elements of mitochondrial dysfunction and hypometabolism in AD, corroborating the data-driven narrative emerging from these studies^1,122–124. Furthermore, these two domains of AD-linked biology are associated with cognitive stability in aging⁶⁹, and may be coordinately progressive in AD^123–125. One of the long-term objectives of our approach is to be able to detect frameworks of convergent biology involved in disease progression. While the analysis presented here only points to the independent identification of mitochondrial hypometabolism and synaptic dysfunction or loss, future work employing a broader set of analytical approaches may be able to use risk enrichment in portions of the biological domains to highlight aspects of interacting biology at the intersection of these domains.

There are multiple up-regulated biological domains demonstrated within our analysis, chief among them Immune Response, Structural Stabilization, Lipid Metabolism and Proteostasis (Fig. 3C and 3D). The investigation of microglial activation in AD pathological progression has received increased attention recently, with respect to both Janus faces of microglia: the homeostatic and degenerative roles. A meaningful discussion of these areas is beyond the scope of this work, but the topic has been extensively reviewed in recent years^126–130. Similarly, structural stabilization is observed in many recent brain proteomic studies, suggesting a compelling role for heparin binding proteins, extracellular factors, and cytoskeleton associated proteins in either resilience or progression of AD^35,68. The field of metabolomics focuses extensively upon dysfunction in lipid metabolism in AD, which is mapped out within the AD Atlas (adatlas.helmholtz-muenchen.de) of the AD Metabolomics Consortium^131–133. Proteostasis is highly implicated in AD as well, suggesting that complex linkages between autophagy and proteome stability, changes in chaperone mediated protein processing, and ER stress responses may all play a role in AD^134–137. The identification of sets of biological domains that are up-regulated and align with AD risk provides defined areas of biology to explore further for both disease driving mechanisms and compensatory processes that may facilitate future therapeutic development.

In order to assess how our ranking system compares to resources that perform a similar function, we benchmarked our scoring process with corresponding rankings from the Open Targets platform^104–107. The two methodologies share several features in common; both approaches integrate results from genetic association studies to implicate genes with variation that contributes to disease risk, both capture expression differences relevant to the disease, and both identify animal models with phenotypic relevance to disease. These similarities are reflected in the correlation observed between the two scores (e.g. Supp Fig. 4A). The TRS is distinct in several important ways. The TRS includes both transcriptomic and proteomic datasets derived from multiple brain tissues from numerous brain banks via our partners in the AMP-AD Consortium. Similarly, information pertaining to relevant AD mouse models from the MODEL-AD centers is included. Finally, the efforts to stratify risk assessment into discrete and objective endophenotypic disease areas is unique to our approach. While this is not part of the objective scoring methodology, it does enable us to use the scoring metric to aggregate risk into higher-level hypotheses about specific biological processes. We benchmarked the scoring output of Open Targets and the TRS and observed very similar patterns of enrichment (e.g. Supp Fig. 4B), with the Synapse, Lipid Metabolism, and Immune Response biological domains among the strongest enrichments for each score. However, the enrichments using the TRS implicate Mitochondrial Metabolism (120 enriched terms) and Tau Homeostasis (2 enriched terms) more strongly than the enrichments based on the Open Targets score (1 enriched term and 0 enriched terms, respectively).

Finally, we used the TRS and biological domain framework to assess sets of co-expression modules produced from large scale genomic AD investigations. We found highly similar biological domain enrichments for the top scoring co-expression modules. The proteins and transcripts from co-expression modules and submodules that implicate extracellular matrix and cell junction biology (M11, STGblue, and IFGturquoise_2) are enriched for terms from the Structural Stabilization biological domain, share enrichments in the “focal adhesion” and “extracellular matrix” GO terms, and tend to have high median multi-omics scores relative to other modules. The proteins and transcripts from co-expression modules and submodules predominantly implicating immune function (M21, DLPFCblue, and PHGturquoise_2) are enriched for terms from the Immune Response biological domain, share enrichments in the “neutrophil degranulation” and “interferon-gamma-mediated signaling pathway” GO terms, among others, and tend to have high median genetics scores relative to other modules. Thus, using a common framework to rank and organize the genes and proteins that emerge from these systems-level studies highlights common processes associated with increased risk emerging from multimodal platforms of investigation.

This work is the largest integrated effort to combine genetic and multi-omic AD risk scoring with an automatable system of endophenotypic genetic characterization. The dual processes of TRS ranking genes across the genome and assembling the risk areas into biological domains point to a consistency between our objective analyses and observations made across the field, which supports our approach to identifying focal areas of AD risk. The advantage of our system is that we can utilize the comprehensive representation of AD risk genes organized in an unbiased fashion into specific areas of biology to expand the study of disease domain transitions, by identifying potential points of convergence between interacting domains, and examining the genetic entities at those cross-roads. This approach is utilized by the TREAT-AD center led by Emory-Sage-SGC to help identify specific dark targets for future exploration as potential therapeutics; the informatics and material resources developed will be made openly available to the AD scientific community. These approaches will continue to be refined and expanded based upon newly emerging data and input from the scientific community. We hope these resources and analytical techniques may help the field foster current efforts leading to growth of novel translational approaches, or the repurposing of therapeutics developed in divergent fields, for use in the treatment of AD. The objective identification of the ranked areas of disease risk, scored genome wide and organized into defined biological domains, highlights the significance of multiple domains of biology for translational development that current resources can help home in on at the level of specific subdomains, such as mitochondrial complex I related factors and postsynaptic targets, as well as up-regulated targets in the Immune Response and Structural Stabilization biological domains.

AD: Alzheimer’s disease

TRS: Target Risk Score

CADRO: Common Alzheimer’s and Related Dementias Research Ontology

NIA: National Institute on Aging

AA: Alzheimer’s Association

GO: Gene Ontology

GWAS: Genome-wide Association Study

GWAX: Genome-wide association study by proxy

AMP-AD: Accelerating Medicines Partnership for Alzheimer’s Disease

TREAT-AD: Target Enablement to Accelerate Therapy Development in AD

QTL: quantitative trait locus

eQTL: expression quantitative trait loci

pQTL: protein quantitative trait loci

MAGMA: multi-marker analysis of genomic annotations

GSEA: Gene set enrichment analysis

APP: amyloid beta precursor protein

ATP: adenosine triphosphate

DNA: deoxyribonucleic acid

RNA: ribonucleic acid

EBI: European Bioinformatics Institute

FDR: False discovery rate

MLE: mean -log e-value

MODEL-AD: Model Organism Development and Evaluation for Late-onset Alzheimer’s Disease

LFQ: label-free quantitation

TMT: Tandem Mass Tagging

LOAD: late-onset Alzheimer’s disease

STG: Superior temporal gyrus

DLPFC: dorsolateral prefrontal cortex

PHG: parahippocampal gyrus

ADSP: Alzheimer’s disease sequencing project

ROSMAP: Religious Orders Study/Memory and Aging Project

MSBB: Mount Sinai Brain Bank

NIH: National Institutes of Health

Data Availability Statement

All data generated during this study are included in this published article, including supplementary information files, and are available via the AD Knowledge Portal: https://adknowledgeportal.org. The data analyzed in the generation of the various metrics are also available via the AD Knowledge Portal. Data are available for general research use according to the following requirements for data access and data attribution (https://adknowledgeportal.org/DataAccess/Instructions).

Code Availability Statement

Custom code used to analyze data is available via GitHub (https://github.com/caryga/AD_TargetRank). The details of software packages used during the completion of this work, including the software or database version or the date of API access, are included in the methods section.

Acknowledgements

The authors would like to acknowledge the support of Drs. Lea T. Grinberg, Joshua M. Shulman, David Li-Kroeger, Jessica E. Young, Suman Jayadev, Ranjita Betarbet, and Benoit Lehallier for insightful discussions and suggestions during the development of these resources. The Target Enablement to Accelerate Therapy Development for Alzheimer’s Disease (TREAT-AD) Consortium was established by the National Institute on Aging (NIA). The research reported in this manuscript was led by the Emory-Sage-SGC TREAT center and supported by grant U54AG065187 from the NIA. Certain data used in this study were prepared, archived, and distributed by the National Institute on Aging Alzheimer’s Disease Data Storage Site (NIAGADS) at the University of Pennsylvania (U24AG041689), funded by the NIA; detailed citations and accessions can be found in Supplementary Table 3. Data used in this study were provided by the Accelerating Medicines Partnership Program for Alzheimer's Disease (AMP-AD) Consortium members listed below:

Mayo RNAseq Study: Study data were provided by the following sources: The Mayo Clinic Alzheimer's Disease Genetic Studies, led by Dr. Nilufer Ertekin-Taner and Dr. Steven G. Younkin, Mayo Clinic, Jacksonville, FL using samples from the Mayo Clinic Study of Aging, the Mayo Clinic Alzheimer's Disease Research Center, and the Mayo Clinic Brain Bank. Data collection was supported through funding by NIA grants P50 AG016574, R01 AG032990, U01 AG046139, R01 AG018023, U01 AG006576, U01 AG006786, R01 AG025711, R01 AG017216, R01 AG003949, NINDS grant R01 NS080820, CurePSP Foundation, and support from Mayo Foundation. Study data include samples collected through the Sun Health Research Institute Brain and Body Donation Program of Sun City, Arizona. The Brain and Body Donation Program is supported by the National Institute of Neurological Disorders and Stroke (U24 NS072026 National Brain and Tissue Resource for Parkinson's Disease and Related Disorders), the National Institute on Aging (P30 AG19610 Arizona Alzheimer's Disease Core Center), the Arizona Department of Health Services (contract 211002, Arizona Alzheimer's Research Center), the Arizona Biomedical Research Commission (contracts 4001, 0011, 05-901 and 1001 to the Arizona Parkinson's Disease Consortium) and the Michael J. Fox Foundation for Parkinson's Research.

Religious Orders Study/Memory and Aging Project (ROSMAP): We are grateful to the participants in the Religious Orders Study and the Memory and Aging Project. This work is supported by the US National Institutes of Health [U01 AG046152, R01 AG043617, R01 AG042210, R01 AG036042, R01 AG036836, R01 AG032990, R01 AG18023, RC2 AG036547, P50 AG016574, U01 ES017155, KL2 RR024151, K25 AG041906-01, R01 AG30146, P30 AG10161, R01 AG17917, R01 AG15819, K08 AG034290, P30 AG10161 and R01 AG11101].

Mount Sinai Brain Bank (MSBB): This work was supported by the grants R01AG046170, RF1AG054014, RF1AG057440 and R01AG057907 from the NIH/National Institute on Aging (NIA). R01AG046170 is a component of the AMP-AD Target Discovery and Preclinical Validation Project. Brain tissue collection and characterization was supported by NIH HHSN271201300031C.

Ethics approval and consent to participate: all human data has been employed in an ethically appropriate manner as overseen by the governance team at Sage Bionetworks.
Consent for publication: all authors consent to publication
Availability of data and material: the data employed in this analysis is fully available as described in the data availability section of the paper.
Competing interests: none of the authors of this work have any competing interests to disclose.
Funding: Funding for this work comes from multiple sources, most notably the TREAT-AD Sage-Emory-SGC center grant as noted within the acknowledgements section.
Authors' contributions: Greg Cary and Jesse Wiley are the co-first authors, having played seminal roles in the technology development, implementation and analysis. Jake Gockley played an instrumental role in the development of the transcriptomics and proteomics scoring algorithm. Stephen Keegan worked with the scoring and biological domain models. Sruthi Ganesh was involved with biological domain development. Robert Butler and Laura Heath facilitated the data processing and access, and provided key technical and biological insight into the pipeline. Lara Mangravite, Allen Levey, Frank Longo and Ben Logsdon contributed to the overall design and leadership of the consortia effort and the tactical assessment of the informatics needs. Anna Greenwood and Greg Carter led the Sage Bionetworks and Jackson Laboratories informatics development efforts and provided project oversight and leadership throughout the project.
Acknowledgements: see the acknowledgements section at the end of the body of the manuscript.

Knopman, D. S. et al. Alzheimer disease. Nat Rev Dis Primers 7, 33, doi:10.1038/s41572-021-00269-y (2021).
Ryan, N. S., Rossor, M. N. & Fox, N. C. Alzheimer's disease in the 100 years since Alzheimer's death. Brain 138, 3816-3821, doi:10.1093/brain/awv316 (2015).
Hardy, J. A hundred years of Alzheimer's disease research. Neuron 52, 3-13, doi:10.1016/j.neuron.2006.09.016 (2006).
Jack, C. R., Jr. et al. NIA-AA Research Framework: Toward a biological definition of Alzheimer's disease. Alzheimers Dement 14, 535-562, doi:10.1016/j.jalz.2018.02.018 (2018).
Refolo, L. M. et al. Common Alzheimer's Disease Research Ontology: National Institute on Aging and Alzheimer's Association collaborative project. Alzheimers Dement 8, 372-375, doi:10.1016/j.jalz.2012.05.2115 (2012).
Cummings, J. et al. Alzheimer's disease drug development pipeline: 2022. Alzheimers Dement (N Y) 8, e12295, doi:10.1002/trc2.12295 (2022).
Cummings, J., Lee, G., Zhong, K., Fonseca, J. & Taghva, K. Alzheimer's disease drug development pipeline: 2021. Alzheimers Dement (N Y) 7, e12179, doi:10.1002/trc2.12179 (2021).
Cummings, J., Lee, G., Ritter, A., Sabbagh, M. & Zhong, K. Alzheimer's disease drug development pipeline: 2020. Alzheimers Dement (N Y) 6, e12050, doi:10.1002/trc2.12050 (2020).
Cummings, J., Lee, G., Ritter, A., Sabbagh, M. & Zhong, K. Alzheimer's disease drug development pipeline: 2019. Alzheimers Dement (N Y) 5, 272-293, doi:10.1016/j.trci.2019.05.008 (2019).
Cummings, J., Lee, G., Ritter, A. & Zhong, K. Alzheimer's disease drug development pipeline: 2018. Alzheimers Dement (N Y) 4, 195-214, doi:10.1016/j.trci.2018.03.009 (2018).
Cummings, J., Lee, G., Mortsdorf, T., Ritter, A. & Zhong, K. Alzheimer's disease drug development pipeline: 2017. Alzheimers Dement (N Y) 3, 367-384, doi:10.1016/j.trci.2017.05.002 (2017).
Cummings, J., Morstorf, T. & Lee, G. Alzheimer's drug-development pipeline: 2016. Alzheimers Dement (N Y) 2, 222-232, doi:10.1016/j.trci.2016.07.001 (2016).
Gauthier, S. et al. Combination Therapy for Alzheimer's Disease: Perspectives of the EU/US CTAD Task Force. J Prev Alzheimers Dis 6, 164-168, doi:10.14283/jpad.2019.12 (2019).
Greenwood, A. K. et al. The AD Knowledge Portal: A Repository for Multi-Omic Data on Alzheimer's Disease and Aging. Curr Protoc Hum Genet 108, e105, doi:10.1002/cphg.105 (2020).
Bellenguez, C. et al. New insights into the genetic etiology of Alzheimer's disease and related dementias. Nat Genet 54, 412-436, doi:10.1038/s41588-022-01024-z (2022).
Wightman, D. P. et al. A genome-wide association study with 1,126,563 individuals identifies new risk loci for Alzheimer's disease. Nat Genet 53, 1276-1282, doi:10.1038/s41588-021-00921-z (2021).
Kunkle, B. W. et al. Genetic meta-analysis of diagnosed Alzheimer's disease identifies new risk loci and implicates Abeta, tau, immunity and lipid processing. Nat Genet 51, 414-430, doi:10.1038/s41588-019-0358-2 (2019).
Jansen, I. E. et al. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer's disease risk. Nat Genet 51, 404-413, doi:10.1038/s41588-018-0311-9 (2019).
Lambert, J. C. et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer's disease. Nat Genet 45, 1452-1458, doi:10.1038/ng.2802 (2013).
Wan, Y. W. et al. Meta-Analysis of the Alzheimer's Disease Human Brain Transcriptome and Functional Dissection in Mouse Models. Cell Rep 32, 107908, doi:10.1016/j.celrep.2020.107908 (2020).
Mukherjee, S. et al. Molecular estimation of neurodegeneration pseudotime in older brains. Nat Commun 11, 5781, doi:10.1038/s41467-020-19622-y (2020).
Milind, N. et al. Transcriptomic stratification of late-onset Alzheimer's cases reveals novel genetic modifiers of disease pathology. PLoS Genet 16, e1008775, doi:10.1371/journal.pgen.1008775 (2020).
Allen, M. et al. Human whole genome genotype and transcriptome data for Alzheimer's and other neurodegenerative diseases. Sci Data 3, 160089, doi:10.1038/sdata.2016.89 (2016).
Shi, Y. et al. Transcriptomic Analyses for Identification and Prioritization of Genes Associated With Alzheimer's Disease in Humans. Front Bioeng Biotechnol 8, 31, doi:10.3389/fbioe.2020.00031 (2020).
Roussarie, J. P. et al. Selective Neuronal Vulnerability in Alzheimer's Disease: A Network-Based Analysis. Neuron 107, 821-835.e812, doi:10.1016/j.neuron.2020.06.010 (2020).
Morabito, S., Miyoshi, E., Michael, N. & Swarup, V. Integrative genomics approach identifies conserved transcriptomic networks in Alzheimer's disease. Hum Mol Genet 29, 2899-2919, doi:10.1093/hmg/ddaa182 (2020).
Patel, H., Dobson, R. J. B. & Newhouse, S. J. A Meta-Analysis of Alzheimer's Disease Brain Transcriptomic Data. J Alzheimers Dis 68, 1635-1656, doi:10.3233/jad-181085 (2019).
Hong, G. et al. A Qualitative Analysis Based on Relative Expression Orderings Identifies Transcriptional Subgroups for Alzheimer's Disease. Curr Alzheimer Res 16, 1175-1182, doi:10.2174/1567205016666191122125035 (2019).
Hatcher, C., Relton, C. L., Gaunt, T. R. & Richardson, T. G. Leveraging brain cortex-derived molecular data to elucidate epigenetic and transcriptomic drivers of complex traits and disease. Transl Psychiatry 9, 105, doi:10.1038/s41398-019-0437-2 (2019).
Wingo, T. S. et al. Integrating human brain proteomes with genome-wide association data implicates novel proteins in post-traumatic stress disorder. Mol Psychiatry, doi:10.1038/s41380-022-01544-4 (2022).
Gao, Y. et al. Proteomic analysis of human hippocampal subfields provides new insights into the pathogenesis of Alzheimer's disease and the role of glial cells. Brain Pathol, e13047, doi:10.1111/bpa.13047 (2022).
Swarup, V. et al. Identification of evolutionarily conserved gene networks mediating neurodegenerative dementia. Nat Med 25, 152-164, doi:10.1038/s41591-018-0223-3 (2019).
Johnson, E. C. B. et al. Deep proteomic network analysis of Alzheimer's disease brain reveals alterations in RNA binding proteins and RNA splicing associated with disease. Mol Neurodegener 13, 52, doi:10.1186/s13024-018-0282-4 (2018).
Seyfried, N. T. et al. A Multi-network Approach Identifies Protein-Specific Co-expression in Asymptomatic and Symptomatic Alzheimer's Disease. Cell Syst 4, 60-72.e64, doi:10.1016/j.cels.2016.11.006 (2017).
Johnson, E. C. B. et al. Large-scale deep multi-layer analysis of Alzheimer's disease brain reveals strong proteomic disease-related changes not observed at the RNA level. Nat Neurosci 25, 213-225, doi:10.1038/s41593-021-00999-y (2022).
Schwartzentruber, J. et al. Genome-wide meta-analysis, fine-mapping and integrative prioritization implicate new Alzheimer's disease risk genes. Nat Genet 53, 392-402, doi:10.1038/s41588-020-00776-w (2021).
Kunkle, B. W. et al. Novel Alzheimer Disease Risk Loci and Pathways in African American Individuals Using the African Genome Resources Panel: A Meta-analysis. JAMA Neurol 78, 102-113, doi:10.1001/jamaneurol.2020.3536 (2021).
Moreno-Grau, S. et al. Genome-wide association analysis of dementia and its clinical endophenotypes reveal novel loci associated with Alzheimer's disease and three causality networks: The GR@ACE project. Alzheimers Dement 15, 1333-1347, doi:10.1016/j.jalz.2019.06.4950 (2019).
Witoelar, A. et al. Meta-analysis of Alzheimer's disease on 9,751 samples from Norway and IGAP study identifies four risk loci. Sci Rep 8, 18088, doi:10.1038/s41598-018-36429-6 (2018).
Vojinovic, D. et al. Genome-wide association study of 23,500 individuals identifies 7 loci associated with brain ventricular volume. Nat Commun 9, 3945, doi:10.1038/s41467-018-06234-w (2018).
Davies, G. et al. Study of 300,486 individuals identifies 148 independent genetic loci influencing general cognitive function. Nat Commun 9, 2098, doi:10.1038/s41467-018-04362-x (2018).
Marioni, R. E. et al. GWAS on family history of Alzheimer's disease. Transl Psychiatry 8, 99, doi:10.1038/s41398-018-0150-6 (2018).
Qin, W. et al. Exome sequencing revealed PDE11A as a novel candidate gene for early-onset Alzheimer's disease. Hum Mol Genet 30, 811-822, doi:10.1093/hmg/ddab090 (2021).
Fan, K. H. et al. Whole-Exome Sequencing Analysis of Alzheimer's Disease in Non-APOE*4 Carriers. J Alzheimers Dis 76, 1553-1565, doi:10.3233/jad-200037 (2020).
Curtis, D., Bakaya, K., Sharma, L. & Bandyopadhyay, S. Weighted burden analysis of exome-sequenced late-onset Alzheimer's cases and controls provides further evidence for a role for PSEN1 and suggests involvement of the PI3K/Akt/GSK-3β and WNT signalling pathways. Ann Hum Genet 84, 291-302, doi:10.1111/ahg.12375 (2020).
Bis, J. C. et al. Correction: Whole exome sequencing study identifies novel rare and common Alzheimer's-Associated variants involved in immune response and transcriptional regulation. Mol Psychiatry 25, 1901-1903, doi:10.1038/s41380-019-0529-7 (2020).
Bis, J. C. et al. Whole exome sequencing study identifies novel rare and common Alzheimer's-Associated variants involved in immune response and transcriptional regulation. Mol Psychiatry 25, 1859-1875, doi:10.1038/s41380-018-0112-7 (2020).
Ma, Y. et al. Analysis of Whole-Exome Sequencing Data for Alzheimer Disease Stratified by APOE Genotype. JAMA Neurol, doi:10.1001/jamaneurol.2019.1456 (2019).
Jiang, B. et al. Mutation screening in Chinese patients with familial Alzheimer's disease by whole-exome sequencing. Neurobiol Aging 76, 215.e215-215.e221, doi:10.1016/j.neurobiolaging.2018.11.024 (2019).
Raghavan, N. S. et al. Whole-exome sequencing in 20,197 persons for rare variants in Alzheimer's disease. Ann Clin Transl Neurol 5, 832-842, doi:10.1002/acn3.582 (2018).
Patel, T. et al. Whole-exome sequencing of the BDR cohort: evidence to support the role of the PILRA gene in Alzheimer's disease. Neuropathol Appl Neurobiol 44, 506-521, doi:10.1111/nan.12452 (2018).
N'Songo, A. et al. African American exome sequencing identifies potential risk variants at Alzheimer disease loci. Neurol Genet 3, e141, doi:10.1212/nxg.0000000000000141 (2017).
Cukier, H. N. et al. Exome Sequencing of Extended Families with Alzheimer's Disease Identifies Novel Genes Implicated in Cell Immunity and Neuronal Function. J Alzheimers Dis Parkinsonism 7, doi:10.4172/2161-0460.1000355 (2017).
Nicolas, G. et al. Screening of dementia genes by whole-exome sequencing in early-onset Alzheimer disease: input and lessons. Eur J Hum Genet 24, 710-716, doi:10.1038/ejhg.2015.173 (2016).
Bras, J. et al. Exome sequencing in a consanguineous family clinically diagnosed with early-onset Alzheimer's disease identifies a homozygous CTSF mutation. Neurobiol Aging 46, 236.e231-236, doi:10.1016/j.neurobiolaging.2016.06.018 (2016).
Saad, M., Brkanac, Z. & Wijsman, E. M. Family-based genome scan for age at onset of late-onset Alzheimer's disease in whole exome sequencing data. Genes Brain Behav 14, 607-617, doi:10.1111/gbb.12250 (2015).
Guerreiro, R. J. et al. Exome sequencing reveals an unexpected genetic cause of disease: NOTCH3 mutation in a Turkish family with Alzheimer's disease. Neurobiol Aging 33, 1008.e1017-1023, doi:10.1016/j.neurobiolaging.2011.10.009 (2012).
Robins, C. et al. Genetic control of the human brain proteome. Am J Hum Genet 108, 400-410, doi:10.1016/j.ajhg.2021.01.012 (2021).
Sieberts, S. K. et al. Large eQTL meta-analysis reveals differing patterns between cerebral cortical and cerebellar brain regions. Sci Data 7, 340, doi:10.1038/s41597-020-00642-8 (2020).
de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput Biol 11, e1004219, doi:10.1371/journal.pcbi.1004219 (2015).
Andrews, S. J., Fulton-Howard, B. & Goate, A. Interpretation of risk loci from genome-wide association studies of Alzheimer's disease. Lancet Neurol 19, 326-335, doi:10.1016/s1474-4422(19)30435-1 (2020).
Van Acker, Z. P., Perdok, A., Bretou, M. & Annaert, W. The microglial lysosomal system in Alzheimer's disease: Guardian against proteinopathy. Ageing Res Rev 71, 101444, doi:10.1016/j.arr.2021.101444 (2021).
Van Acker, Z. P., Bretou, M. & Annaert, W. Endo-lysosomal dysregulations and late-onset Alzheimer's disease: impact of genetic risk factors. Mol Neurodegener 14, 20, doi:10.1186/s13024-019-0323-7 (2019).
Swarup, V. et al. Identification of Conserved Proteomic Networks in Neurodegenerative Dementia. Cell Rep 31, 107807, doi:10.1016/j.celrep.2020.107807 (2020).
Wingo, A. P. et al. Integrating human brain proteomes with genome-wide association data implicates new proteins in Alzheimer's disease pathogenesis. Nat Genet 53, 143-146, doi:10.1038/s41588-020-00773-z (2021).
Rayaprolu, S. et al. Systems-based proteomics to resolve the biology of Alzheimer's disease beyond amyloid and tau. Neuropsychopharmacology 46, 98-115, doi:10.1038/s41386-020-00840-3 (2021).
Zhou, M. et al. Targeted mass spectrometry to quantify brain-derived cerebrospinal fluid biomarkers in Alzheimer's disease. Clin Proteomics 17, 19, doi:10.1186/s12014-020-09285-8 (2020).
Johnson, E. C. B. et al. Large-scale proteomic analysis of Alzheimer's disease brain and cerebrospinal fluid reveals early changes in energy metabolism associated with microglia and astrocyte activation. Nat Med 26, 769-780, doi:10.1038/s41591-020-0815-6 (2020).
Wingo, A. P. et al. Large-scale proteomic analysis of human brain identifies proteins associated with cognitive trajectory in advanced age. Nat Commun 10, 1619, doi:10.1038/s41467-019-09613-z (2019).
Wang, M. et al. The Mount Sinai cohort of large-scale genomic, transcriptomic and proteomic data in Alzheimer's disease. Sci Data 5, 180185, doi:10.1038/sdata.2018.185 (2018).
Allen, M. et al. Conserved brain myelination networks are altered in Alzheimer's and other neurodegenerative diseases. Alzheimers Dement 14, 352-366, doi:10.1016/j.jalz.2017.09.012 (2018).
McKenzie, A. T. et al. Multiscale network modeling of oligodendrocytes reveals molecular components of myelin dysregulation in Alzheimer's disease. Mol Neurodegener 12, 82, doi:10.1186/s13024-017-0219-3 (2017).
Wingo, A. P. et al. Shared proteomic effects of cerebral atherosclerosis and Alzheimer's disease on the human brain. Nat Neurosci 23, 696-700, doi:10.1038/s41593-020-0635-5 (2020).
Huang, J. L. et al. Comprehensive analysis of differentially expressed profiles of Alzheimer's disease associated circular RNAs in an Alzheimer's disease mouse model. Aging (Albany NY) 10, 253-265, doi:10.18632/aging.101387 (2018).
Durinck, S. et al. BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics 21, 3439-3440, doi:10.1093/bioinformatics/bti525 (2005).
Durinck, S., Spellman, P. T., Birney, E. & Huber, W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat Protoc 4, 1184-1191, doi:10.1038/nprot.2009.97 (2009).
Li, Y. et al. Transfer learning-trained convolutional neural networks identify novel MRI biomarkers of Alzheimer's disease progression. Alzheimers Dement (Amst) 13, e12140, doi:10.1002/dad2.12140 (2021).
Santos, L. R. D., Almeida, J. F. F., Pimassoni, L. H. S., Morelato, R. L. & Paula, F. The combined risk effect among BIN1, CLU, and APOE genes in Alzheimer's disease. Genet Mol Biol 43, e20180320, doi:10.1590/1678-4685-gmb-2018-0320 (2020).
Kunkle, B. W. et al. Genetic meta-analysis of diagnosed Alzheimer's disease identifies new risk loci and implicates Aβ, tau, immunity and lipid processing. Nat Genet 51, 414-430, doi:10.1038/s41588-019-0358-2 (2019).
Buniello, A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res 47, D1005-d1012, doi:10.1093/nar/gky1120 (2019).
Vardarajan, B. N. et al. Whole genome sequencing of Caribbean Hispanic families with late-onset Alzheimer's disease. Ann Clin Transl Neurol 5, 406-417, doi:10.1002/acn3.537 (2018).
Ligthart, S. et al. Genome Analyses of >200,000 Individuals Identify 58 Loci for Chronic Inflammation and Highlight Pathways that Link Inflammation and Complex Disorders. Am J Hum Genet 103, 691-706, doi:10.1016/j.ajhg.2018.09.009 (2018).
Dourlen, P., Chapuis, J. & Lambert, J. C. Using High-Throughput Animal or Cell-Based Models to Functionally Characterize GWAS Signals. Curr Genet Med Rep 6, 107-115, doi:10.1007/s40142-018-0141-1 (2018).
Jun, G. R. et al. Transethnic genome-wide scan identifies novel Alzheimer's disease loci. Alzheimers Dement 13, 727-738, doi:10.1016/j.jalz.2016.12.012 (2017).
Deming, Y. et al. Genome-wide association study identifies four novel loci associated with Alzheimer's endophenotypes and disease modifiers. Acta Neuropathol 133, 839-856, doi:10.1007/s00401-017-1685-y (2017).
Chapuis, J. et al. Genome-wide, high-content siRNA screening identifies the Alzheimer's genetic risk factor FERMT2 as a major modulator of APP metabolism. Acta Neuropathol 133, 955-966, doi:10.1007/s00401-016-1652-z (2017).
Ibrahim-Verbaas, C. A. et al. GWAS for executive function and processing speed suggests involvement of the CADM2 gene. Mol Psychiatry 21, 189-197, doi:10.1038/mp.2015.37 (2016).
Chouraki, V. et al. Evaluation of a Genetic Risk Score to Improve Risk Prediction for Alzheimer's Disease. J Alzheimers Dis 53, 921-932, doi:10.3233/jad-150749 (2016).
Lambert, J. C. et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer's disease. Nat Genet 45, 1452-1458, doi:10.1038/ng.2802 (2013).
Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 38, e164, doi:10.1093/nar/gkq603 (2010).
Liu, X., Wu, C., Li, C. & Boerwinkle, E. dbNSFP v3.0: A One-Stop Database of Functional Predictions and Annotations for Human Nonsynonymous and Splice-Site SNVs. Hum Mutat 37, 235-241, doi:10.1002/humu.22932 (2016).
Jian, X., Boerwinkle, E. & Liu, X. In silico prediction of splice-altering single nucleotide variants in the human genome. Nucleic Acids Res 42, 13534-13544, doi:10.1093/nar/gku1206 (2014).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434-443, doi:10.1038/s41586-020-2308-7 (2020).
Boyle, A. P. et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res 22, 1790-1797, doi:10.1101/gr.137323.112 (2012).
Zhou, J. et al. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat Genet 50, 1171-1179, doi:10.1038/s41588-018-0160-6 (2018).
Shefchek, K. A. et al. The Monarch Initiative in 2019: an integrative data and analytic platform connecting phenotypes to genotypes across species. Nucleic Acids Res 48, D704-d715, doi:10.1093/nar/gkz997 (2020).
McMurry, J. A. et al. Navigating the Phenotype Frontier: The Monarch Initiative. Genetics 203, 1491-1495, doi:10.1534/genetics.116.188870 (2016).
Köhler, S. et al. The Human Phenotype Ontology in 2021. Nucleic Acids Res 49, D1207-d1217, doi:10.1093/nar/gkaa1043 (2021).
Preuss, C. et al. A novel systems biology approach to evaluate mouse models of late-onset Alzheimer's disease. Mol Neurodegener 15, 67, doi:10.1186/s13024-020-00412-5 (2020).
Friedrich, J. O., Adhikari, N. K. & Beyene, J. The ratio of means method as an alternative to mean differences for analyzing continuous outcome variables in meta-analysis: a simulation study. BMC Med Res Methodol 8, 32, doi:10.1186/1471-2288-8-32 (2008).
Larry V. Hedges, J. G., Peter S. Curtis. THE META-ANALYSIS OF RESPONSE RATIOS IN EXPERIMENTAL ECOLOGY. Ecology 80, 1150-1156 (1999).
Yu, G., Wang, L. G., Han, Y. & He, Q. Y. clusterProfiler: an R package for comparing biological themes among gene clusters. Omics 16, 284-287, doi:10.1089/omi.2011.0118 (2012).
Neuner, S. M., Tcw, J. & Goate, A. M. Genetic architecture of Alzheimer's disease. Neurobiol Dis 143, 104976, doi:10.1016/j.nbd.2020.104976 (2020).
Mountjoy, E. et al. An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci. Nat Genet 53, 1527-1533, doi:10.1038/s41588-021-00945-5 (2021).
Ghoussaini, M. et al. Open Targets Genetics: systematic identification of trait-associated genes using large-scale genetics and functional genomics. Nucleic Acids Res 49, D1311-d1320, doi:10.1093/nar/gkaa840 (2021).
Ochoa, D. et al. Open Targets Platform: supporting systematic drug-target identification and prioritisation. Nucleic Acids Res 49, D1302-d1310, doi:10.1093/nar/gkaa1027 (2021).
Peat, G. et al. The open targets post-GWAS analysis pipeline. Bioinformatics 36, 2936-2937, doi:10.1093/bioinformatics/btaa020 (2020).
Ladia, B. H. K. synapser: R language bindings for Synapse API. https:www.synapse.org R package version 0.10.101 (2021).
Zhou, Y. et al. AlzGPS: a genome-wide positioning systems platform to catalyze multi-omics for Alzheimer's drug discovery. Alzheimers Res Ther 13, 24, doi:10.1186/s13195-020-00760-w (2021).
Nelson, M. R. et al. The support of human genetic evidence for approved drug indications. Nat Genet 47, 856-860, doi:10.1038/ng.3314 (2015).
King, E. A., Davis, J. W. & Degner, J. F. Are drug targets with genetic support twice as likely to be approved? Revised estimates of the impact of genetic support for drug mechanisms on the probability of drug approval. PLoS Genet 15, e1008489, doi:10.1371/journal.pgen.1008489 (2019).
Ochoa, D. et al. Human genetics evidence supports two-thirds of the 2021 FDA-approved drugs. Nat Rev Drug Discov, doi:10.1038/d41573-022-00120-3 (2022).
Pandey, R. S. et al. Genetic perturbations of disease risk genes in mice capture transcriptomic signatures of late-onset Alzheimer's disease. Mol Neurodegener 14, 50, doi:10.1186/s13024-019-0351-3 (2019).
Carvalho-Silva, D. et al. Open Targets Platform: new developments and updates two years on. Nucleic Acids Res 47, D1056-d1065, doi:10.1093/nar/gky1133 (2019).
Christopher, J. A., Geladaki, A., Dawson, C. S., Vennard, O. L. & Lilley, K. S. Subcellular Transcriptomics and Proteomics: A Comparative Methods Review. Mol Cell Proteomics 21, 100186, doi:10.1016/j.mcpro.2021.100186 (2022).
Wang, D. et al. A deep proteome and transcriptome abundance atlas of 29 healthy human tissues. Mol Syst Biol 15, e8503, doi:10.15252/msb.20188503 (2019).
Baird, D. A. et al. Identifying drug targets for neurological and psychiatric disease via genetics and the brain transcriptome. PLoS Genet 17, e1009224, doi:10.1371/journal.pgen.1009224 (2021).
Wörheide, M. K., J; Nataf, S; Nho, K; Greenwood, AK; Wiley, JC; Wu, T; Huynh, K; Weinisch, P; Römisch-Margl, W; Lehner, N; Baumbach, J; Meikle, PJ; Saykin, AJ; Doraiswamy, PM; van Duijn, C; Suhre, K; Kaddurah-Daouk, R; Kastenmüller, G; Arnold M. An Integrated Molecular Atlas of Alzheimer's Disease. medRxiv, doi:doi: 10.1101/2021.09.14.21263565 (2022).
Dejanovic, B. et al. Changes in the Synaptic Proteome in Tauopathy and Rescue of Tau-Induced Synapse Loss by C1q Antibodies. Neuron 100, 1322-1336 e1327, doi:10.1016/j.neuron.2018.10.014 (2018).
Shao, C. Y., Mirra, S. S., Sait, H. B., Sacktor, T. C. & Sigurdsson, E. M. Postsynaptic degeneration as revealed by PSD-95 reduction occurs after advanced Abeta and tau pathology in transgenic mouse models of Alzheimer's disease. Acta Neuropathol 122, 285-292, doi:10.1007/s00401-011-0843-x (2011).
Gong, Y. & Lippa, C. F. Review: disruption of the postsynaptic density in Alzheimer's disease and other neurodegenerative dementias. Am J Alzheimers Dis Other Demen 25, 547-555, doi:10.1177/1533317510382893 (2010).
Ashleigh, T., Swerdlow, R. H. & Beal, M. F. The role of mitochondrial dysfunction in Alzheimer's disease pathogenesis. Alzheimers Dement, doi:10.1002/alz.12683 (2022).
Torres, A. K. et al. Synaptic Mitochondria: An Early Target of Amyloid-β and Tau in Alzheimer's Disease. J Alzheimers Dis 84, 1391-1414, doi:10.3233/jad-215139 (2021).
Fessel, J. Does synaptic hypometabolism or synaptic dysfunction, originate cognitive loss? Analysis of the evidence. Alzheimers Dement (N Y) 7, e12177, doi:10.1002/trc2.12177 (2021).
Morton, H. et al. Defective mitophagy and synaptic degeneration in Alzheimer's disease: Focus on aging, mitochondria and synapse. Free Radic Biol Med 172, 652-667, doi:10.1016/j.freeradbiomed.2021.07.013 (2021).
Pons, V. & Rivest, S. Targeting Systemic Innate Immune Cells as a Therapeutic Avenue for Alzheimer Disease. Pharmacol Rev 74, 1-17, doi:10.1124/pharmrev.121.000400 (2022).
McManus, R. M. The Role of Immunity in Alzheimer's Disease. Adv Biol (Weinh) 6, e2101166, doi:10.1002/adbi.202101166 (2022).
De Sousa, R. A. L. Reactive gliosis in Alzheimer's disease: a crucial role for cognitive impairment and memory loss. Metab Brain Dis 37, 851-857, doi:10.1007/s11011-022-00953-2 (2022).
Ahn, K., Lee, S. J. & Mook-Jung, I. White matter-associated microglia: New players in brain aging and neurodegenerative diseases. Ageing Res Rev 75, 101574, doi:10.1016/j.arr.2022.101574 (2022).
Ahmad, M. A. et al. Neuroinflammation: A Potential Risk for Dementia. Int J Mol Sci 23, doi:10.3390/ijms23020616 (2022).
Abubakar, M. B. et al. Alzheimer's Disease: An Update and Insights Into Pathophysiology. Front Aging Neurosci 14, 742408, doi:10.3389/fnagi.2022.742408 (2022).
Sriwichaiin, S., Chattipakorn, N. & Chattipakorn, S. C. Metabolomic Alterations in the Blood and Brain in Association with Alzheimer's Disease: Evidence from in vivo to Clinical Studies. J Alzheimers Dis 84, 23-50, doi:10.3233/jad-210737 (2021).
Sun, Y. et al. Metabolism: A Novel Shared Link between Diabetes Mellitus and Alzheimer's Disease. J Diabetes Res 2020, 4981814, doi:10.1155/2020/4981814 (2020).
Fernández, D., Geisse, A., Bernales, J. I., Lira, A. & Osorio, F. The Unfolded Protein Response in Immune Cells as an Emerging Regulator of Neuroinflammation. Front Aging Neurosci 13, 682633, doi:10.3389/fnagi.2021.682633 (2021).
Bourdenx, M. et al. Chaperone-mediated autophagy prevents collapse of the neuronal metastable proteome. Cell 184, 2696-2714.e25, doi:10.1016/j.cell.2021.03.048 (2021).
Koopman, M. B. & Rüdiger, S. G. D. Alzheimer Cells on Their Way to Derailment Show Selective Changes in Protein Quality Control Network. Front Mol Biosci 7, 214, doi:10.3389/fmolb.2020.00214 (2020).
Dematteis, G. et al. Proteomic analysis links alterations of bioenergetics, mitochondria-ER interactions and proteostasis in hippocampal astrocytes from 3xTg-AD mice. Cell Death Dis 11, 645, doi:10.1038/s41419-020-02911-1 (2020).

SupplementalFigures3.docx

