Harmonization of Biomarkers of Aging and Datasets
We harmonized a comprehensive set of 39 well-established epigenetic, transcriptomic, and clinical biomarkers (Table 1) and implemented these BoAs in Biolearn, representing the largest collection of BoAs in a single package to date. We have validated the implementation of these biomarkers with their respective developers to ensure accuracy and reliability. The epigenetic biomarkers encompass a wide range of categories, including: (1) Chronological clocks: Horvath’s multi-tissue clock and Hannum’s blood clock 3,17; (2) Healthspan and mortality-related clocks: GrimAge, GrimAge2, PhenoAge, and Zhang clock 10,11,18,19; (3) Biomarkers of the rate of aging: DunedinPoAm38 and DunedinPACE 5,20; (4) Causality-enriched clocks: Ying’s CausAge, DamAge, and AdaptAge 12; and (5) Various other clocks, including DNAm-based biomarkers and disease predictors, transcriptomic clocks, and clinical clocks (Table 1).
Table 1
Harmonized biomarkers in Biolearn.
Biomarker | Year | Tissue | Predicts | Omic Type |
HorvathV1 3 | 2013 | Multi-tissue | Age (Years) | DNA Methylation |
Hannum 17 | 2013 | Blood | Age (Years) | DNA Methylation |
Lin 39 | 2016 | Blood | Age (Years) | DNA Methylation |
PhenoAge 11 | 2018 | Blood | Age (Years) | DNA Methylation |
HorvathV2 40 | 2018 | Skin, blood | Age (Years) | DNA Methylation |
PEDBE 41 | 2019 | Buccal | Age (Years) | DNA Methylation |
Zhang_10 19 | 2019 | Blood | Mortality Risk | DNA Methylation |
GrimAge 10 | 2019 | Blood | Age Adjusted by Mortality Risk (Years) | DNA Methylation |
GrimAge2 18 | 2022 | Blood | Age Adjusted by Mortality Risk (Years) | DNA Methylation |
DunedinPoAm38 20 | 2020 | Blood | Aging Rate (Years/Year) | DNA Methylation |
DunedinPACE 5 | 2022 | Blood | Aging Rate (Years/Year) | DNA Methylation |
DNAmTL 42 | 2019 | Blood, Adipose | Telomere Length | DNA Methylation |
Knight 43 | 2016 | Cord Blood | Gestational Age | DNA Methylation |
LeeControl 44 | 2019 | Placenta | Gestational Age | DNA Methylation |
LeeRefinedRobust 44 | 2019 | Placenta | Gestational Age | DNA Methylation |
LeeRobust 44 | 2019 | Placenta | Gestational Age | DNA Methylation |
YingCausAge 12 | 2022 | Blood | Age (Years) | DNA Methylation |
YingDamAge 12 | 2022 | Blood | Age (Years) | DNA Methylation |
YingAdaptAge 12 | 2022 | Blood | Age (Years) | DNA Methylation |
SmokingMcCartney 45 | 2018 | Blood | Smoking Status | DNA Methylation |
AlcoholMcCartney 45 | 2018 | Blood | Alcohol Consumption | DNA Methylation |
BMI_McCartney 45 | 2018 | Blood | BMI | DNA Methylation |
EducationMcCartney 45 | 2018 | Blood | Educational Attainment | DNA Methylation |
TotalCholesterolMcCartney 45 | 2018 | Blood | Total Cholesterol | DNA Methylation |
HDLCholesterolMcCartney 45 | 2018 | Blood | HDL Cholesterol | DNA Methylation |
LDLCholesterolMcCartney 45 | 2018 | Blood | LDL with Remnant Cholesterol | DNA Methylation |
BodyFatMcCartney 45 | 2018 | Blood | Percentage Body Fat | DNA Methylation |
BMI_Reed 46 | 2020 | Blood | BMI | DNA Methylation |
ProstateCancerKirby 47 | 2017 | Prostate | Prostate Cancer Status | DNA Methylation |
HepatoXu 48 | 2017 | Circulating DNA | Hepatocellular Carcinoma Status | DNA Methylation |
CVD_Westerman 49 | 2020 | Blood | Coronary Heart Disease Status | DNA Methylation |
AD_Bahado-Singh 50 | 2021 | Blood | Alzheimer’s Disease Status | DNA Methylation |
DepressionBarbu 51 | 2021 | Blood | Depression Risk | DNA Methylation |
Phenotypic Age 11 | 2018 | Blood | Phenotypic Age | Clinical Biomarker |
Mahalanobis Distance 34 | 2023 | Blood | Mahalanobis Distance | Clinical Biomarker |
Peters_tAge 30 | 2015 | Blood | Age (Years) | RNA |
Multispecies-blood_tAge | 2024 | Blood | Relative Age (Years) | RNA |
Human-blood_tAge | 2024 | Blood | Relative Age (Years) | RNA |
Human-multitissue_tAge | 2024 | Multi-tissue | Relative Age (Years) | RNA |
The harmonization of these diverse biomarkers is crucial for enabling consistent and reproducible analyses across different datasets, facilitating cross-population validation studies, and advancing our understanding of the aging process. To achieve this, all biomarkers were formatted into standardized input structures, ensuring their consistent application across disparate datasets. The harmonization process involved collecting and unifying the annotation of clock specifications, such as tissue type, predicted age range, and source references. This meticulous approach guarantees transparent and reproducible analyses, enabling researchers to readily compare and interpret results across different studies and populations.
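As a rough illustration of what a standardized biomarker specification can look like, the sketch below pairs the annotation fields from Table 1 (year, tissue, predicted quantity) with a coefficient table and a single prediction function. The class name, field names, and coefficients are hypothetical and chosen for illustration; this is not Biolearn's actual API or data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearClock:
    """Hypothetical container for a harmonized linear clock (not Biolearn's API)."""
    name: str
    year: int
    tissue: str
    predicts: str
    intercept: float
    weights: dict  # CpG ID -> coefficient

    def predict(self, sample: dict) -> float:
        """Return intercept + weighted sum of the clock's CpG beta values."""
        missing = [cpg for cpg in self.weights if cpg not in sample]
        if missing:
            raise KeyError(f"{self.name}: missing CpG sites {missing}")
        return self.intercept + sum(w * sample[cpg] for cpg, w in self.weights.items())


# Toy coefficients and CpG IDs, invented for illustration only.
toy_clock = LinearClock(
    name="ToyClock", year=2024, tissue="Blood", predicts="Age (Years)",
    intercept=20.0, weights={"cg0001": 40.0, "cg0002": -10.0},
)

# One sample: CpG ID -> methylation beta value; extra sites are ignored.
sample = {"cg0001": 0.5, "cg0002": 0.2, "cg0003": 0.9}
predicted_age = toy_clock.predict(sample)
print(toy_clock.name, toy_clock.tissue, predicted_age)
```

Keeping the annotation and the coefficients in one record means every clock can be applied through the same call, whatever its original publication format.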
To further support ongoing research in this field, we developed and implemented an open-source framework. This framework provides a standardized format for epigenetic biomarkers, facilitating the seamless integration and comparison of any future aging clocks and biomarkers that are developed. By providing a unified platform for the harmonization and analysis of aging biomarkers, this framework aims to foster collaboration and innovation. Researchers can easily contribute new biomarkers, compare their performance against existing ones, and explore their potential applications in various datasets. This collaborative approach is essential for accelerating progress in the field and developing more accurate and robust biomarkers of aging.
To facilitate cross-population validation studies using publicly available data, we used Biolearn to integrate and structure multiple public datasets (Table 2). The structured datasets were refined to enable a shared analysis platform, addressing the challenges of data heterogeneity and formatting inconsistencies 21,22. With this capacity, Biolearn serves as the backend of ClockBase for epigenetic age computation 22, enabling the systematic harmonization of over 200,000 human samples from Gene Expression Omnibus (GEO) array data.
Table 2
Harmonized datasets in Biolearn.
ID | Title | Format | Samples | Age Present | Sex Present |
GSE40279 | Genome-wide Methylation Profiles Reveal Quantitative Views o… | Illumina450k | 656 | Yes | Yes |
GSE19711 | Genome wide DNA methylation profiling of United Kingdom Ovar… | Illumina27k | 540 | Yes | No |
GSE51057 | Methylome Analysis and Epigenetic Changes Associated with Me… | Illumina450k | 329 | Yes | Yes |
GSE42861 | Differential DNA methylation in Rheumatoid arthritis | Illumina450k | 689 | Yes | Yes |
GSE41169 | Blood DNA methylation profiles in a Dutch population | Illumina450k | 95 | Yes | Yes |
GSE51032 | EPIC-Italy at HuGeF | Illumina450k | 845 | Yes | No |
GSE73103 | Many obesity-associated SNPs strongly associate with DNA met… | Illumina450k | 355 | Yes | Yes |
GSE69270 | Aging-associated DNA methylation changes in middle-aged indi… | Illumina450k | 184 | Yes | No |
GSE36054 | Methylation Profiling of Blood DNA from Healthy Children | Illumina450k | 192 | No | No |
GSE64495 | DNA methylation profiles of human blood samples from a sever… | Illumina450k | 113 | Yes | Yes |
GSE30870 | DNA methylomes of Newborns and Nonagenarians | Illumina450k | 40 | Yes | No |
GSE52588 | Identification of a DNA methylation signature in blood from … | Illumina450k | 87 | Yes | Yes |
GSE157131 | Methylation data from stored peripheral blood leukocytes fro… | IlluminaEPIC | 946 | Yes | Yes |
GSE132203 | DNA Methylation (EPIC) from the Grady Trauma Project | IlluminaEPIC | 795 | Yes | Yes |
GSE134080 | RNASeq whole blood of Dutch 500FG cohort | IlluminaHiSeq2500 | 100 | Yes | Yes |
NHANES | National Health and Nutrition Examination Survey | Phenotypic | 2877 | Yes | Yes |
FHS | Framingham Heart Study | Phenotypic | 4434 | Yes | Yes |
Quality Control, Imputation, and Deconvolution
Biolearn provides a comprehensive toolkit for data preprocessing, normalization, and cell-type deconvolution (Fig. 1a). Quality control metrics, such as sample deviation from the population mean, missingness, and the number of sites with a high percentage of missingness, can be readily visualized (Fig. 1b). This functionality enables researchers to identify potential outliers or problematic samples and make informed decisions about data inclusion and exclusion criteria.
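The three quality control metrics named above can be sketched in a few lines. The example below is a minimal illustration on a toy matrix and does not reproduce Biolearn's implementation; the function names, the 0.5 missingness threshold, and the use of mean absolute deviation are assumptions made for the example.

```python
from statistics import mean

# Toy methylation matrix: rows are CpG sites, columns are samples.
# None marks a missing measurement. All values are invented.
beta = {
    "cg0001": [0.10, 0.12, None, 0.11],
    "cg0002": [0.80, 0.82, 0.50, None],
    "cg0003": [0.40, None, None, None],
}
n_samples = 4


def sample_missingness(matrix, n):
    """Fraction of sites that are missing in each sample."""
    return [sum(row[j] is None for row in matrix.values()) / len(matrix)
            for j in range(n)]


def high_missing_sites(matrix, threshold=0.5):
    """Sites whose fraction of missing samples exceeds the threshold."""
    return [cpg for cpg, row in matrix.items()
            if sum(v is None for v in row) / len(row) > threshold]


def sample_deviation(matrix, n):
    """Mean absolute deviation of each sample from the per-site population mean."""
    site_means = {cpg: mean(v for v in row if v is not None)
                  for cpg, row in matrix.items()}
    deviations = []
    for j in range(n):
        diffs = [abs(row[j] - site_means[cpg])
                 for cpg, row in matrix.items() if row[j] is not None]
        deviations.append(mean(diffs) if diffs else float("nan"))
    return deviations


missing_per_sample = sample_missingness(beta, n_samples)
flagged_sites = high_missing_sites(beta)
deviation_per_sample = sample_deviation(beta, n_samples)
print(missing_per_sample, flagged_sites, deviation_per_sample)
```

In this toy matrix the third sample deviates most from the population mean, which is the kind of outlier a QC plot would surface.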
Moreover, Biolearn enables the prediction of sample sex from DNA methylation data with high accuracy, which can be compared against actual sex distributions (Fig. 1c,d)23. This feature is particularly useful for identifying potential sample mislabeling or investigating sex-specific effects in aging research. We also implemented predictors of common traits, including smoking, BMI, and epigenetic scores for diseases like Down Syndrome (Fig. 1e)24. These predictors allow researchers to explore the associations between aging biomarkers and various lifestyle factors or disease conditions, providing valuable insights into the complex interplay between aging and health.
Missing DNA methylation data can be easily imputed with different methods in just a few lines of code (Fig. 1f,g) 25, ensuring that researchers can make the most of the available data and minimize the impact of missing values on their analyses. Biolearn also facilitates the integration of multiple datasets for large-scale analyses, such as the comparison of DNA methylation levels of CpG sites across different datasets (Fig. 1h). This feature allows researchers to investigate the consistency and reproducibility of aging biomarkers across diverse populations and experimental settings, strengthening the robustness and generalizability of their findings.
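One of the simplest imputation strategies is to fill each missing value with the mean of the observed values at the same CpG site. The sketch below illustrates that idea only; the function name is hypothetical, and the methods Biolearn actually offers may differ.

```python
from statistics import mean


def impute_site_mean(matrix):
    """Replace None entries with the mean of the observed values at that site."""
    imputed = {}
    for cpg, row in matrix.items():
        observed = [v for v in row if v is not None]
        if not observed:
            raise ValueError(f"{cpg} has no observed values to average")
        fill = mean(observed)
        imputed[cpg] = [fill if v is None else v for v in row]
    return imputed


# Toy matrix (rows: CpG sites, columns: samples); values are invented.
beta = {
    "cg0001": [0.2, None, 0.4],
    "cg0002": [0.9, 0.7, None],
}
filled = impute_site_mean(beta)
print(filled)
```

Site-mean imputation preserves each site's average but shrinks its variance, so it is best suited to datasets with low missingness.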
In addition to these features, Biolearn offers a deconvolution tool that estimates the proportion of cell types in a given sample based on a single bulk-level methylation measurement. Biolearn provides two modes for deconvolution, optimized for the 450K (DeconvoluteBlood450K) and EPIC (DeconvoluteBloodEPIC) methylation platforms, respectively 26. These modes are designed for estimating cell proportions in blood methylation samples and account for technology-specific biases that can affect the accuracy of deconvolution 27. The reference methylation matrices for each mode consist of methylation profiles for six cell types representing the most abundant cell types found in the blood: neutrophils, monocytes, natural killer cells, B cells, CD4+ T cells, and CD8+ T cells 28,29.
We benchmarked the accuracy of our deconvolution tool using datasets with known cell proportions assessed via fluorescence-activated cell sorting (FACS) and in vitro cell mixing 29. Both deconvolution methods generated accurate predictions that matched known cell proportions (Fig. 1i,j). These results demonstrate the reliability and utility of Biolearn’s deconvolution feature, which can help researchers account for cellular heterogeneity in their analyses and gain insights into the cell type-specific contributions to aging biomarkers.
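Reference-based deconvolution treats a bulk methylation profile as a non-negative mixture of cell-type reference profiles. The sketch below solves that problem with multiplicative-update non-negative least squares, a generic technique substituted here for illustration; it is not necessarily the algorithm Biolearn uses. The three-cell-type reference matrix and the mixing proportions are invented.

```python
def deconvolute(reference, bulk, n_iter=1000):
    """Estimate cell-type proportions from one bulk methylation profile.

    reference: one row per CpG, one column per cell type (beta values).
    bulk: one beta value per CpG.
    Minimizes ||reference @ x - bulk||^2 subject to x >= 0 using
    multiplicative updates, then rescales x to sum to 1.
    """
    n_types = len(reference[0])
    x = [1.0 / n_types] * n_types
    # Precompute A^T b and A^T A.
    atb = [sum(row[k] * b for row, b in zip(reference, bulk)) for k in range(n_types)]
    ata = [[sum(row[i] * row[j] for row in reference) for j in range(n_types)]
           for i in range(n_types)]
    for _ in range(n_iter):
        atax = [sum(ata[i][j] * x[j] for j in range(n_types)) for i in range(n_types)]
        x = [x[i] * atb[i] / atax[i] if atax[i] > 0 else 0.0 for i in range(n_types)]
    total = sum(x)
    return [v / total for v in x]


# Invented reference profiles for three hypothetical cell types at six CpGs.
reference = [
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.9],
    [0.8, 0.2, 0.5],
    [0.2, 0.7, 0.3],
    [0.5, 0.5, 0.1],
]
true_proportions = [0.6, 0.3, 0.1]

# Simulate a noise-free bulk sample as the weighted sum of the references.
bulk = [sum(r * p for r, p in zip(row, true_proportions)) for row in reference]

estimated = deconvolute(reference, bulk)
print([round(p, 3) for p in estimated])
```

Comparing the estimate with the proportions used to build the mixture mirrors, in miniature, the benchmarking against known cell proportions described above.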
Systematic Evaluation of the Biomarkers of Aging
To demonstrate the utility of Biolearn in facilitating the systematic evaluation of aging biomarkers, we conducted a comprehensive benchmarking analysis of various epigenetic aging clocks across multiple datasets. By leveraging Biolearn’s harmonized dataset library and standardized clock implementations, we assessed the performance, robustness, and generalizability of these clocks in diverse biological contexts and populations (Fig. 2). Our analysis included a wide range of datasets, spanning the Human Aging Rates Study (GSE40279, N = 656), EPIC-Italy (GSE51032, N = 845), RA Case-control Cohort (GSE42861, N = 689), Dutch Schizophrenia Case-control Cohort (GSE41169, N = 95), Obesity Genetics Study (GSE73103, N = 355), Developmental Disorder Study (GSE64495, N = 113), African American GENOA (GSE157131, N = 1218), and Grady Trauma Project (GSE132203, N = 795) (Fig. 2a). These diverse datasets allowed us to evaluate the performance of the biomarkers across various age ranges, ethnicities, and disease states. Biolearn’s user-friendly interface and efficient data handling capabilities streamlined data loading and implementation of aging clocks (Fig. 2b). This highlights the library’s potential to accelerate the development and validation of novel aging biomarkers by providing a standardized framework for their evaluation.
The benchmarking results (Fig. 2a-c) showed that the HorvathV2 clock (the skin and blood clock) predicted chronological age most accurately, with a mean R2 of 0.88 across all datasets, followed by the Hannum clock (R2 = 0.81), the HorvathV1 clock (R2 = 0.78), and the YingCausAge clock (R2 = 0.77). Among the clocks evaluated, these four were the most robust and generalizable across the biological contexts and age-related conditions represented in these datasets. Note that the GrimAgeV1 and GrimAgeV2 clocks take the chronological age of the sample as an input; they therefore cannot be compared directly with the other clocks on this metric. These results underscore the importance of systematic evaluation in identifying the most suitable biomarkers for specific research questions or clinical applications.
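For readers who wish to reproduce this kind of accuracy metric, the sketch below computes Pearson's correlation between predicted and chronological age and squares it. It assumes that R2 denotes the squared Pearson correlation, which the text does not state explicitly, and the ages are invented.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented example: chronological ages and one clock's predictions.
chronological = [20, 35, 50, 65, 80]
predicted = [24, 33, 52, 61, 83]

r = pearson_r(chronological, predicted)
r_squared = r ** 2
print(f"r = {r:.3f}, R2 = {r_squared:.3f}")
```

Averaging this value over datasets gives a per-clock summary comparable in spirit to the mean R2 reported above.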
Overall, our findings demonstrate the value of Biolearn in enabling the systematic evaluation of epigenetic aging clocks across multiple datasets. By providing a standardized framework for clock implementation and evaluation, Biolearn facilitates the identification of robust and generalizable aging biomarkers, paving the way for their translation into clinical settings and advancing our understanding of the aging process.
Mortality and Morbidity Risk Analysis
It is also important to evaluate how well epigenetic aging biomarkers predict aging-associated outcomes such as mortality risk. Here, we conducted a comprehensive evaluation by applying 17 representative epigenetic clock models to the Normative Aging Study (NAS) dataset (N = 1,488, 38.8% deceased) and the Massachusetts General Brigham (MGB) cohort (N = 500, 8.8% deceased) and comparing their performance in predicting mortality risk (Fig. 3). The biomarkers showed a strong correlation with chronological age in both the NAS (Fig. 3a) and MGB (Fig. 3b) cohorts.
We then examined their performance in predicting mortality risk using Cox Proportional Hazards analysis, adjusted for age and sex (Fig. 3c, d). In the NAS cohort (Fig. 3c), the top-performing clock in terms of hazard ratio (per standard deviation increase of the biomarker) was DunedinPoAm38 (HR = 1.38, P = 9.48e-18), followed closely by GrimAgeV2 (HR = 1.35, P = 1.21e-18) and DunedinPACE (HR = 1.35, P = 3.04e-19). Other clocks with strong predictive value include Zhang_10 (HR = 1.28, P = 9.66e-12), GrimAgeV1 (HR = 1.29, P = 7.05e-14), YingDamAge (HR = 1.25, P = 1.48e-11), PhenoAge (HR = 1.20, P = 8.96e-08), Hannum (HR = 1.13, P = 3.91e-04), and YingCausAge (HR = 1.09, P = 7.37e-03). In the MGB cohort (Fig. 3d), the top-performing clocks in predicting mortality risk were GrimAgeV2 (HR = 2.08, P = 2.64e-03), PhenoAge (HR = 2.03, P = 2.52e-02), and GrimAgeV1 (HR = 1.84, P = 1.46e-02), followed by Zhang_10 (HR = 1.48, P = 7.96e-05) and DunedinPACE (HR = 1.46, P = 7.40e-04).
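Reporting hazard ratios per standard deviation puts biomarkers with different units on a common scale. The sketch below shows two equivalent routes: standardizing the biomarker to z-scores before fitting, or rescaling a per-unit Cox coefficient by the biomarker's standard deviation. The Cox fit itself is not shown, and the coefficient and biomarker values are invented.

```python
from math import exp
from statistics import mean, stdev


def z_scores(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def hr_per_sd(beta_per_unit, values):
    """Convert a per-unit log-hazard coefficient into a hazard ratio per SD."""
    return exp(beta_per_unit * stdev(values))


# Invented biomarker values (years) and an invented per-year Cox coefficient.
biomarker = [50.0, 55.0, 60.0, 65.0, 70.0]
beta_per_year = 0.04

standardized = z_scores(biomarker)
hr = hr_per_sd(beta_per_year, biomarker)
print(f"HR per SD increase = {hr:.2f}")
```

Fitting the model on the z-scored biomarker yields the same hazard ratio directly as the exponentiated coefficient.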
We further assessed the association between the predictive power of the epigenetic clocks for chronological age and for mortality risk. Interestingly, in both cohorts, we observed a negative but non-significant correlation between each clock’s Pearson’s R with chronological age and its hazard ratio for mortality risk (Fig. 3e). This suggests that the predictive power of epigenetic clocks for mortality risk, after adjusting for age and sex, is independent of their ability to predict chronological age, highlighting the importance of interpreting the meaning of age deviation (AgeDev) with caution for aging biomarkers. We also observed strongly heterogeneous associations of epigenetic clocks with mortality risk across cohorts. The analysis demonstrates the utility of Biolearn in facilitating systematic evaluation of epigenetic clocks in predicting mortality across multiple cohorts, emphasizing the importance of systematic evaluation in identifying the most suitable biomarkers for specific applications.
Besides mortality, it is also important to evaluate aging biomarkers in predicting various other clinically relevant aging outcomes. We analyzed the associations between 14 aging biomarkers and six event categories (Stroke, Dementia, Operation, Lifespan, Cancer, and Healthspan, which is defined by the first incidence of any event) in the NAS cohort (Fig. 4a). To assess the predictive power of these biomarkers, we performed Cox Proportional Hazards analyses, adjusted for age, and calculated the hazard ratios (HR) per standard deviation increase for each biomarker (Fig. 4b-f).
Across five clinical outcomes tested, DunedinPACE was the strongest predictor for three of the outcomes, namely healthspan (HR = 1.18), dementia (HR = 1.40), and stroke (HR = 1.50). PhenoAge was the strongest predictor for surgery (HR = 1.24), and HorvathV2 was the strongest predictor for cancer (HR = 1.12). These results suggest considerable heterogeneity of aging biomarkers in predicting different clinical outcomes.
We further investigated the associations between the AgeDev term of aging biomarkers after adjusting for age in the NAS cohort and found strong positive correlations among most epigenetic clocks (Fig. 4g). We observed two main clusters: (1) PhenoAge, GrimAgeV1, GrimAgeV2, YingCausAge, HorvathV1, Lin, Hannum, and HorvathV2; and (2) YingDamAge, DunedinPoAm38, Zhang_10, and DunedinPACE. The DNAmTL telomere length clock and YingAdaptAge do not cluster with other clocks, suggesting their unique biological underpinnings.
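A common way to obtain an age-adjusted deviation term is to regress the biomarker on chronological age and keep the residuals. The sketch below assumes AgeDev is defined this way, which is an interpretation of "after adjusting for age" rather than a definition given in the text; the ages and clock values are invented.

```python
def age_deviation(ages, biomarker):
    """Residuals of an ordinary least-squares fit of biomarker on age."""
    n = len(ages)
    mean_age = sum(ages) / n
    mean_bio = sum(biomarker) / n
    sxy = sum((a - mean_age) * (b - mean_bio) for a, b in zip(ages, biomarker))
    sxx = sum((a - mean_age) ** 2 for a in ages)
    slope = sxy / sxx
    intercept = mean_bio - slope * mean_age
    return [b - (intercept + slope * a) for a, b in zip(ages, biomarker)]


# Invented chronological ages and clock outputs for five individuals.
ages = [40, 50, 60, 70, 80]
clock = [43, 49, 64, 68, 85]

age_dev = age_deviation(ages, clock)
print([round(d, 2) for d in age_dev])
```

By construction these residuals are uncorrelated with age, so correlations among the AgeDev terms of different clocks reflect shared signal beyond chronological age.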
Lastly, we compared the predictive power of aging biomarkers for healthspan and lifespan (Fig. 4h). In general, we observed a significant positive correlation using an inverse-variance-weighted approach (weighted correlation coefficient = 0.58, P = 0.012). DunedinPoAm38, GrimAgeV2, GrimAgeV1, and Zhang_10 demonstrated strong associations with healthspan and lifespan, indicating their potential as comprehensive aging biomarkers. These findings highlight the utility of Biolearn in facilitating the systematic evaluation of aging biomarkers for predicting various health outcomes. The results provide insights into the comparative performance of these biomarkers and their potential applications in clinical settings and aging research.
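A weighted correlation of this kind can be sketched as a Pearson correlation in which each biomarker contributes in proportion to a weight. The text does not specify how the inverse-variance weights were constructed, so the weights below are placeholders; in practice they might be the reciprocal of the variance of each hazard-ratio estimate, which is an assumption.

```python
from math import sqrt


def weighted_pearson(x, y, w):
    """Pearson correlation with non-negative observation weights."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    return cov / sqrt(var_x * var_y)


# Invented values: three points on a line plus one outlier.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, -10.0]

# Placeholder weights; a zero weight removes the outlier's influence.
weights = [1.0, 1.0, 1.0, 0.0]

r_weighted = weighted_pearson(x, y, weights)
r_unweighted = weighted_pearson(x, y, [1.0] * len(x))
print(round(r_weighted, 3), round(r_unweighted, 3))
```

Down-weighting imprecise estimates in this way keeps a few noisy hazard ratios from dominating the comparison between healthspan and lifespan associations.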
Transcriptomic and Clinical Biomarkers
To demonstrate Biolearn’s multi-omic capabilities, we also evaluated the performance of transcriptomic and phenotypic aging biomarkers. We include four transcriptomic age predictors: tAge.Peters 30, tAge.Multispecies.Blood, tAge.Human.Multi-tissue, and tAge.Human.Blood 31,32. We applied these predictors to the JenAge RNA-Seq dataset (Jena Centre for Systems Biology of Ageing, Illumina TruSeq 2.0, Whole Blood, N = 62) 33. The tAge predictors showed strong correlations with chronological age, with Pearson’s R ranging from 0.68 (Peters) to 0.90 (Human.Multi-tissue) (Fig. 5a), highlighting their potential as robust aging biomarkers. All these predictors are implemented in Biolearn with simple and easy-to-use functions (Fig. 5b).
Next, we investigated the performance of the Phenotypic Age predictor 11, a blood-test-based biomarker, using the NHANES 2010 dataset. We calculated the Phenotypic Age for each individual using Biolearn’s implementation of the predictor and compared it to chronological age (Fig. 5c). We found a strong linear relationship between Phenotypic Age and chronological age (Pearson’s R = 0.96, P < 2.2e-16), indicating the predictor’s ability to capture age-related changes in clinical biomarkers.
To assess the predictive power of Phenotypic Age on mortality risk, we performed a survival analysis on the NHANES 2010 dataset. Individuals were stratified into five groups based on their AgeDev (Fig. 5d,e). We found that individuals with higher AgeDev had a significantly higher mortality risk (HR = 1.58, 95% CI: 1.31–1.91, P = 1.08e-06), while those with lower AgeDev had a lower risk (HR = 0.60, 95% CI: 0.49–0.74, P = 6.19e-07). These results underscore the predictive power of Phenotypic Age in assessing mortality risk and demonstrate Biolearn’s capability to facilitate such analyses.
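The stratification step can be sketched as a rank-based split into five equal-sized groups. Whether the groups used in the analysis were quintiles is an assumption, and the AgeDev values below are invented.

```python
def quintile_groups(values):
    """Assign each value to a group 1..5 by rank, keeping group sizes balanced."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, idx in enumerate(order):
        groups[idx] = rank * 5 // len(values) + 1
    return groups


# Invented AgeDev values (years) for ten individuals.
age_dev = [-6.0, 4.5, 0.2, -1.3, 2.8, -3.7, 7.9, 1.1, -0.4, 5.6]

groups = quintile_groups(age_dev)
print(groups)
```

Group 1 holds the individuals with the lowest AgeDev and group 5 those with the highest, which are the strata a survival analysis would then compare.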
To investigate the association between aging and cell type proportions, we performed deconvolution analysis on the NAS cohort blood samples. This analysis revealed distinct changes for different cell types over the life course (Extended Fig. 1). Neutrophil and natural killer cell proportions showed significant positive correlations with age (R = 0.09, P = 5.64e-04 and R = 0.14, P = 1.03e-07, respectively). In contrast, B cell, CD4 T cell, and CD8 T cell proportions exhibited significant negative correlations with age (R = -0.09, P = 2.45e-04; R = -0.15, P = 6.07e-09; and R = -0.05, P = 4.01e-02, respectively). Monocyte proportion did not show a significant correlation with age (R = 0.02, P = 4.07e-01).
We assessed the predictive power of cell type proportions on various health outcomes in the NAS cohort (Extended Fig. 2). For healthspan, natural killer cell proportion was a significant protective factor (HR = 0.93, 95% CI: 0.88–0.99, P = 1.55e-02), while CD8 T cell proportion was a slight risk factor (HR = 1.06, 95% CI: 1.00–1.12, P = 4.65e-02). Similarly, for lifespan, natural killer cell proportion was a significant protective factor (HR = 0.91, 95% CI: 0.85–0.98, P = 1.03e-02), and CD8 T cell proportion was a significant risk factor (HR = 1.07, 95% CI: 1.01–1.14, P = 2.59e-02). Natural killer cell proportion was also a significant protective factor for dementia (HR = 0.87, 95% CI: 0.77–0.98, P = 2.03e-02). For stroke risk, neutrophil proportion was a significant risk factor (HR = 1.33, 95% CI: 1.11–1.60, P = 2.18e-03), while natural killer cell proportion was a significant protective factor (HR = 0.70, 95% CI: 0.55–0.89, P = 4.21e-03). Similarly, for surgical events, neutrophil proportion was a significant risk factor (HR = 1.16, 95% CI: 1.05–1.28, P = 3.99e-03), and natural killer cell proportion was a significant protective factor (HR = 0.89, 95% CI: 0.80–0.99, P = 2.58e-02). No significant associations were found between cell type proportions and cancer risk.
The integration of transcriptomic and phenotypic biomarkers in Biolearn enables researchers to investigate aging processes from different biological perspectives. The strong performance of the tAge predictors and Phenotypic Age in their respective datasets showcases the potential of multi-omic approaches in uncovering the complex mechanisms underlying aging. By leveraging Biolearn’s comprehensive framework, researchers can gain valuable insights into the interplay between different biological layers and their contributions to the aging process, ultimately facilitating the development of targeted interventions and personalized aging management strategies.