Methodology overview
Briefly, the first step in the methodology was to gather existing DNAm datasets from public open-access and controlled-access data repositories to assemble an exhaustive database of DNAm profiles in blood (Fig. 1, Extended Data Fig. 1). We collected 56 whole blood datasets comprising a combined 32,136 samples (Extended Data Fig. 1). This large database of human methylomes spanned a broad age range (6–101 years) and provided a solid foundation for quantifying DMPs, VMPs, and entropy in each dataset.
Our analysis began by identifying linear age-related DNAm changes at individual CpGs (i.e. DMPs and VMPs). To identify DMPs, we conducted an independent EWAS in each dataset by fitting a linear model that regressed DNAm against age and against covariates known to modulate DNAm levels (i.e. sex, ethnicity, batch, body mass index (BMI)) (Supplementary Table 1). We then extracted the summary statistics (i.e. the age-related change in DNAm level) from each EWAS and conducted an inverse-variance-weighted fixed-effects meta-analysis of differential methylation and age. To identify VMPs, we followed a similar approach, applying the Breusch-Pagan test for heteroscedasticity in each independent dataset; this test models the change in DNAm variance as a function of age. The summary statistics extracted from each dataset (i.e. the age-related change in DNAm variance) were then pooled using a sample-size-based fixed-effects meta-analysis of variable methylation and age. We then compared age-related CpGs that were only DMPs (homoscedastic DMPs), those that were only VMPs (constant VMPs), and those that were both DMPs and VMPs (DMPs-VMPs). In addition, we compared DMPs that carried different degrees of information, as defined by Shannon entropy: DMPs that trend towards the mean (entropic DMPs) versus those that trend away from the mean towards fully methylated or unmethylated states (anti-entropic DMPs).
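The two pooling schemes described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's pipeline. It assumes each dataset supplies either an age slope and its standard error for a CpG (DMP analysis) or a signed Z-score and a sample size (VMP analysis). It also uses √n weights for the sample-size-based scheme, as in METAL's sample-size option. All function names are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal Z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


def ivw_fixed_effects(betas, ses):
    """Inverse-variance-weighted fixed-effects meta-analysis (hypothetical helper).

    betas: per-dataset age slopes for one CpG; ses: their standard errors.
    Returns (pooled slope, pooled SE, Z, p).
    """
    weights = [1.0 / se ** 2 for se in ses]  # more precise datasets weigh more
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    z = pooled / pooled_se
    return pooled, pooled_se, z, two_sided_p(z)


def sample_size_weighted_z(zs, ns):
    """Sample-size-based (weighted Stouffer) meta-analysis (hypothetical helper).

    zs: signed per-dataset Z-scores (sign = direction of the variance change);
    ns: per-dataset sample sizes. Weights are assumed to be sqrt(n).
    Returns (pooled Z, p).
    """
    weights = [math.sqrt(n) for n in ns]
    z = sum(w * zi for w, zi in zip(weights, zs)) / math.sqrt(sum(w ** 2 for w in weights))
    return z, two_sided_p(z)
```

For example, two datasets with slopes 0.1 and 0.3 and equal standard errors of 0.1 pool to a slope of 0.2 with an SE of about 0.071. Datasets with unequal precision pull the pooled estimate towards the more precise one.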
Finally, we performed a comprehensive analysis of genome-wide Shannon entropy. We took the same statistical approach as for the DMP and VMP analyses described above. First, we estimated the strength of the association between age and Shannon entropy in each independent cohort. Then, we pooled these effect sizes across cohorts using a fixed-effects meta-analysis to obtain an overall effect size for the change in entropy per decade of age (Supplementary Table 2). We also calculated entropy separately on the age-related CpGs (i.e. the complete list of DMPs and VMPs) and on the remaining non-age-related CpGs. We meta-analysed these results to compare the change in entropy with age between the two sets of CpGs, thereby evaluating whether age-related shifts drive the changes in entropy. We then took a more granular approach and investigated whether different classes of DMPs and VMPs (homoscedastic DMPs, constant VMPs, DMPs-VMPs, or entropic vs anti-entropic DMPs) contribute differently to entropy. This allowed us to determine whether differential shifts in DNAm, variable shifts, or both increase entropy.
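The per-cohort step can be illustrated with a least-squares fit of sample entropy on age, rescaled to a change per decade. This sketch makes two simplifying assumptions: it uses one entropy value per sample, and it fits age alone, omitting any covariates used in the actual cohort models. The function name is hypothetical. The returned slope and standard error are the quantities that an inverse-variance fixed-effects meta-analysis would then pool across cohorts.

```python
import math


def entropy_slope_per_decade(ages, entropy):
    """OLS slope of entropy on age, rescaled to change per 10 years, with its SE.

    ages: sample ages in years; entropy: one entropy value per sample.
    Returns (slope per decade, standard error per decade).
    """
    n = len(ages)
    mean_age = sum(ages) / n
    mean_s = sum(entropy) / n
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (s - mean_s) for a, s in zip(ages, entropy))
    slope = sxy / sxx  # change in entropy per year
    intercept = mean_s - slope * mean_age
    rss = sum((s - intercept - slope * a) ** 2 for a, s in zip(ages, entropy))
    se = math.sqrt(rss / (n - 2) / sxx)  # SE of the per-year slope
    return 10 * slope, 10 * se  # rescale both to a per-decade effect
```

Reporting per decade rather than per year only rescales the slope and its SE by 10, which keeps very small effects (such as 0.0005 per decade) readable.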
Throughout the pipeline, we compared these different classes of age-related changes in terms of genomic location (chromatin states profiled in PBMCs [10], annotated using a comprehensive annotation of the Illumina methylation arrays [11]), biological pathways (gene ontology (GO) terms, human phenotype ontology (HPO) terms, canonical pathways (CP), expression signatures of genetic and chemical perturbations (CGP), and immunologic signatures (C7)), and overrepresentation in various epigenetic clocks (chronological age: Horvath’s pan-tissue clock [12], Hannum’s clock [4], the blood clock developed by Zhang et al. (2019) [13], the centenarian clock [14], and the mammalian universal clock [15]; biological age: PhenoAge [16]; pace of aging: DunedinPoAm [17] and DunedinPACE [18]).
We also explored the effect of cellular heterogeneity on these age-associated signatures. CpGs that determine cell identity are typically lowly methylated in a given cell type but highly methylated in other cell types. Consequently, the overall methylation fraction in bulk tissue at these cell-type-specific CpGs is highly sensitive to changes in the relative proportions of different cell types. Ageing is associated with an increase in monocytes, neutrophils, basophils, NK cells, and CD4+ and CD8+ memory T cells, with a concomitant decrease in naïve B cells, regulatory T cells, and CD4+ and CD8+ naïve T cells [19]. We estimated the proportions of granulocytes, monocytes, natural killer (NK) cells, CD4+ T cells, CD8+ T cells and B cells in each sample using a reference-based deconvolution method [20]. We then repeated all the above-mentioned analyses after adjusting the linear models for blood cell type proportions.
Ageing is associated with widespread changes in DNAm levels and increases in DNAm variance in blood
With the unprecedented statistical power afforded by more than 32,000 samples from 56 datasets, we found that nearly half of all tested sites (333,300 CpGs; 48%) were DMPs at a stringent FDR < 0.005. Two-thirds of DMPs (66%) decreased in DNAm level with age (‘hypoDMPs’), while the remaining third increased (‘hyperDMPs’) (Fig. 2A). HyperDMPs increased by an average of 0.027% methylation fraction per year of age, with a maximum increase of 0.46% per year for cg26079664. HypoDMPs decreased by an average of 0.034% per year, with a maximum decrease of 0.55% per year for cg10501210. Our meta-analysis identified DMPs that were highly consistent across datasets. For example, cg16867657, which lies in the promoter of ELOVL2 and has been associated with ageing in a plethora of studies [21–24], was estimated to gain 0.45% DNAm per year of age across the different datasets (Fig. 2B).
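Calling DMPs from meta-analysis output amounts to a multiple-testing correction followed by a split on the sign of the pooled slope. The sketch below assumes the Benjamini-Hochberg procedure, since the text specifies the FDR threshold but not the procedure. It takes hypothetical lists of CpG IDs, pooled slopes and p-values, and the function names are illustrative.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # caps q-values at 1 and enforces monotonicity
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_dmps(cpgs, slopes, pvalues, fdr=0.005):
    """Split significant CpGs into hypoDMPs (negative slope) and hyperDMPs (positive slope)."""
    q = benjamini_hochberg(pvalues)
    hypo = [c for c, b, qi in zip(cpgs, slopes, q) if qi < fdr and b < 0]
    hyper = [c for c, b, qi in zip(cpgs, slopes, q) if qi < fdr and b > 0]
    return hypo, hyper
```

With a stringent threshold such as 0.005, only CpGs whose adjusted p-values fall below it are retained, and the slope sign alone decides the hypo/hyper label.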
There was an inverse relationship between the average methylation fraction of a CpG and its direction of change during ageing. DMPs with usually high DNAm levels (> 75% on average) were overwhelmingly hypoDMPs, while DMPs with usually low DNAm levels (< 25% on average) were overwhelmingly hyperDMPs (Extended Data Fig. 2). In contrast, DMPs with intermediate DNAm levels trended towards high and low DNAm levels with comparable frequency. For example, in the BIOS dataset (n = 1,408), 21% of DMPs had an ‘intermediate’ methylation level, of which 40% gained and 60% lost methylation with age (Extended Data Fig. 2).
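The stratification above can be reproduced by binning each DMP by its average methylation fraction and cross-tabulating the bins against the direction of change. This is a minimal sketch. It assumes beta values on a 0 to 1 scale and the 25% and 75% cut-offs stated in the text, and the function names are hypothetical.

```python
from collections import Counter


def methylation_class(mean_beta, low=0.25, high=0.75):
    """Bin a CpG by its average methylation fraction (beta value, 0 to 1)."""
    if mean_beta < low:
        return "low"
    if mean_beta > high:
        return "high"
    return "intermediate"  # boundaries (exactly 0.25 or 0.75) count as intermediate


def direction_by_level(mean_betas, slopes):
    """Count hyper- (slope > 0) and hypo- (slope < 0) DMPs within each methylation class.

    Assumes every input CpG is a DMP, i.e. its slope is non-zero.
    """
    counts = Counter()
    for beta, slope in zip(mean_betas, slopes):
        direction = "hyper" if slope > 0 else "hypo"
        counts[(methylation_class(beta), direction)] += 1
    return counts
```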
HypoDMPs were over-represented in quiescent chromatin regions and in regions weakly repressed by Polycomb complexes. HyperDMPs, in contrast, were over-represented in bivalent promoters and enhancers as well as in regions repressed by Polycomb complexes (χ² test p-value < 2.2e-16) (Extended Data Fig. 3A). Despite being located in distinct chromatin states, hypoDMPs and hyperDMPs were found in similar genes (Fisher’s exact test p-value < 2.2e-16). These genes were related to, for example, signal transduction and signalling (GO), developmental conditions (HPO), and naïve-to-memory T-cell signatures (MSigDB immunologic gene sets) (Extended Data Fig. 3B). With the exception of the universal pan-mammalian clock, all chronological and biological age clocks were enriched for both hypo- and hyperDMPs. The enrichment of these two classes did not differ between biological and chronological clocks (Fisher’s exact test FDR < 0.005, Extended Data Fig. 3C). Pace of aging clocks did not show any enrichment for DMPs (Fisher’s exact test FDR > 0.005, Extended Data Fig. 3C).
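Each clock-enrichment test is a 2×2 comparison: whether a CpG belongs to the clock, and whether it belongs to a given DMP class. The following stdlib-only sketch implements a two-sided Fisher's exact test by summing hypergeometric probabilities. It is illustrative only; at genome scale, `scipy.stats.fisher_exact` would be the practical choice. The table layout is an assumption (rows: in clock / not in clock; columns: in DMP class / not in DMP class), and the function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = {
        x: comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)
        for x in range(lo, hi + 1)
    }
    p_obs = probs[a]
    # small relative tolerance so ties in probability are not lost to rounding
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))


def odds_ratio(a, b, c, d):
    """Sample odds ratio of the 2x2 table; > 1 indicates enrichment."""
    return (a * d) / (b * c) if b * c else float("inf")
```

For the table [[3, 1], [1, 3]], the sketch returns a two-sided p-value of 34/70 (about 0.486) and an odds ratio of 9.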
These results remained largely unchanged when the meta-analysis was adjusted for cellular heterogeneity (Pearson’s correlation of meta-analysis effect sizes = 0.94, p-value < 2.2e-16) (Extended Data Fig. 4A).
We then meta-analysed the same 56 whole blood datasets to identify changes in methylation variability (VMPs) during ageing. We identified 243,958 VMPs (37% of tested CpGs) at FDR < 0.005, nearly all of which (99%) increased in variance with age. The magnitude of these age-related changes in variance is small. For example, the average increase in variance across all datasets for the most significant VMP, cg21899500, is 0.01% per year of age (Fig. 3A). There was a large overlap between DMPs and VMPs (Fisher’s exact test p-value < 2.2e-16): a CpG site whose average DNAm level changed during ageing was also more likely to show an increase in variance with age. We identified 196,192 DMPs-VMPs, 137,108 homoscedastic DMPs (i.e. DMPs only), and 47,766 constant VMPs (i.e. VMPs only) (Fig. 3B). Among the DMPs-VMPs, 73,357 (37%) increased in both average methylation and variance, and 122,835 (63%) decreased in average methylation but increased in variance.
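The per-dataset VMP step relies on the Breusch-Pagan test described in the methodology overview. This is a minimal sketch under the following assumptions:

- It uses Koenker's studentized variant (LM = n·R² from regressing squared residuals on age). The text does not state which variant was used.
- Age is the only regressor, whereas the study's models also included covariates such as sex, ethnicity, batch and BMI.
- The function names are hypothetical.

To feed a sample-size-based meta-analysis, the p-value would additionally be converted to a Z-score signed by the direction of the variance change; that conversion is not shown.

```python
import math


def _ols_fit(x, y):
    """Simple linear regression y ~ a + b*x; returns (a, b)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def breusch_pagan_age(ages, betas):
    """Koenker's studentized Breusch-Pagan test with age as the only regressor.

    Returns (LM statistic, p-value, slope of squared residuals on age).
    A positive slope means DNAm variance increases with age.
    """
    a, b = _ols_fit(ages, betas)  # mean model: DNAm ~ age
    sq_resid = [(y - a - b * x) ** 2 for x, y in zip(ages, betas)]
    a2, b2 = _ols_fit(ages, sq_resid)  # auxiliary model: squared residuals ~ age
    fitted = [a2 + b2 * x for x in ages]
    mean_sq = sum(sq_resid) / len(sq_resid)
    ss_tot = sum((e - mean_sq) ** 2 for e in sq_resid)
    ss_reg = sum((f - mean_sq) ** 2 for f in fitted)
    lm = len(ages) * ss_reg / ss_tot  # n * R^2 of the auxiliary regression
    p = math.erfc(math.sqrt(lm / 2))  # chi-square survival function with 1 df
    return lm, p, b2
```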
We compared the distributions of homoscedastic DMPs, constant VMPs, and DMPs-VMPs across chromatin states profiled in PBMCs [10]. Constant VMPs were enriched in enhancers and Polycomb regions; DMPs-VMPs in bivalent promoters and enhancers as well as Polycomb regions; and homoscedastic DMPs in quiescent regions (χ² test p-value < 2.2e-16) (Extended Data Fig. 5A). The three classes of age-related changes were related to a plethora of pathways, very similar to those of hypo- and hyperDMPs (Extended Data Fig. 5B). All chronological and biological age clocks were strongly enriched for DMPs-VMPs, while pace of aging clocks were not enriched for any class of age-related CpGs (Extended Data Fig. 5C). Homoscedastic DMPs were overrepresented in the pan-tissue clock but depleted in Zhang et al.’s clock. Constant VMPs were depleted in two chronological clocks but overrepresented in PhenoAge.
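The chromatin-state comparisons are contingency-table tests. The sketch below computes the Pearson χ² statistic, its degrees of freedom, and per-cell Pearson residuals. The sign of each residual indicates over-representation (positive) or under-representation (negative) of a CpG class in a chromatin state. The function is a hypothetical helper. The p-value can be obtained from `scipy.stats.chi2.sf(stat, dof)`, or the whole test can be run with `scipy.stats.chi2_contingency`.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Rows: CpG classes (e.g. homoscedastic DMPs, constant VMPs, DMPs-VMPs);
    columns: chromatin states. Returns (statistic, degrees of freedom, residuals).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            # expected count if CpG class and chromatin state were independent
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
            res_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(res_row)
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof, residuals
```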
We repeated the VMP meta-analysis after adjusting for blood cell type composition. As for the DMP analysis, results remained largely unchanged (Pearson’s correlation of meta-analysis Z-scores = 0.93, p-value < 2.2e-16) (Extended Data Fig. 4B). However, VMPs appeared more sensitive than DMPs to confounding by cell type proportions: more than a third of VMPs (37%) were significant only in the meta-analysis not adjusted for cell types. Conversely, an additional 5,913 CpGs (4% of the VMPs identified after adjustment) were classified as VMPs only after adjustment for cell types. In total, we identified 159,166 VMPs (22% of tested CpGs) after correcting for cell type composition.
Entropy increases in the ageing blood methylome, driven by the cumulative changes in differential but not variable methylation at entropic CpGs
We determined whether the ageing blood methylome increases in entropy (‘chaos’) with age, and what type of epigenetic changes underpin this phenomenon. Entropy captures the amount of information encoded by the epigenome. If a CpG is highly (~ 100%) or lowly (~ 0%) methylated, its methylation state is highly “predictable” across all cells in a given sample; conversely, if a CpG has a methylation fraction closer to 50%, it is deemed “unpredictable” across cells within a sample. Because the methylation state of genes helps determine cellular identity, and therefore cellular function, this loss of predictability is biologically relevant. Entropy (i.e. ‘chaos’) increases when many CpGs throughout the genome drift towards a methylation fraction of 50%. An entropy of 0 means that every CpG is methylated at either 0% or 100%, and an entropy of 1 means that every CpG is methylated at exactly 50% [3]. In these two opposite scenarios, the methylome of the sample is either entirely predictable or entirely unpredictable.
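One normalised formulation consistent with the bounds described above averages the binary Shannon entropy over the N CpGs of a sample:

S = −(1/N) Σᵢ [βᵢ log₂ βᵢ + (1 − βᵢ) log₂(1 − βᵢ)]

Here βᵢ is the methylation fraction of CpG i. This form is assumed for illustration; it equals 0 when every CpG is at 0% or 100% and 1 when every CpG is at 50%. The function name is hypothetical.

```python
import math


def methylome_entropy(betas):
    """Normalised Shannon entropy of one sample's methylome.

    betas: methylation fractions in [0, 1], one per CpG.
    Returns 0 when every CpG is fully (un)methylated and 1 when every CpG is at 0.5.
    """
    total = 0.0
    for b in betas:
        for p in (b, 1.0 - b):
            if p > 0:  # convention: 0 * log2(0) = 0
                total -= p * math.log2(p)
    return total / len(betas)
```

Computing entropy on a subset of CpGs, such as only age-related or only non-age-related sites, amounts to passing only those β values.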
When taking all CpGs into account (both age- and non-age-related CpGs), we observed a very small but significant increase in entropy of 0.0005 per decade of age (p-value < 0.0001), with substantial heterogeneity between cohorts (I² = 88%) (Extended Data Fig. 6).
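The I² statistic measures the share of between-cohort variation in effect sizes that exceeds what sampling error alone would produce. This is a minimal sketch using Cochran's Q under fixed-effects (inverse-variance) weights. The inputs are hypothetical per-cohort slopes and standard errors.

```python
def cochran_q_i2(betas, ses):
    """Cochran's Q and the I^2 heterogeneity statistic (as a fraction, 0 to 1).

    betas: per-cohort effect sizes (e.g. change in entropy per decade);
    ses: their standard errors.
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2 = (Q - df) / Q, truncated at 0 when Q does not exceed its expectation
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2
```

Under this definition, an I² of 88% means that most of the spread in per-cohort estimates reflects genuine differences between cohorts rather than sampling noise.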
Because an increase in entropy with age reflects a drift towards a methylation fraction of 50% across many CpGs, we hypothesised that the increase in entropy would be driven by age-related CpGs (DMPs and/or VMPs). We re-calculated entropy in each sample from each dataset, taking into account only the methylation levels at age-related CpGs (Supplementary Table 2). As a ‘control’, we also re-calculated entropy in each sample from each dataset using only the methylation levels at non-age-related CpGs. In line with our hypothesis, non-age-related CpGs did not contribute to the global increase in entropy with age: their meta-analysis effect size was a slight decrease of 0.0003 in entropy per decade of age (p-value < 0.0001, I² = 46%) (Fig. 4A). In contrast, entropy at age-related CpGs increased by 0.002 per decade of age (p-value < 0.0001, I² = 85%) (Fig. 4A). Moreover, the baseline entropy (i.e. the entropy value at the youngest age in a particular dataset) was lower at non-age-related sites than at age-related sites (Fig. 4B). This can be explained by the higher proportion of CpGs with intermediate methylation levels among age-related sites (~ 20%) than among non-age-related sites (~ 1%).
To dissect the respective contributions of differential and variable methylation to changes in entropy, we selected two datasets with a large sample size and broad age range (BIOS and FHS) (Supplementary Table 1). In each, we calculated entropy separately on homoscedastic DMPs, DMPs-VMPs and constant VMPs (Extended Data Fig. 7A, Supplementary Table 3), and regressed entropy against age in each category. DMPs-VMPs displayed the largest significant increase in entropy during ageing. Entropy also increased significantly at homoscedastic DMPs, but it was lower, both overall and at baseline, than at DMPs-VMPs. This reflects the types of CpG affected by differential methylation versus a change in variance. While both DMPs and VMPs affect CpGs whose methylation levels start high or low, VMPs include a greater proportion of CpGs with intermediate DNAm levels at baseline (~ 28% of VMPs are intermediately methylated vs ~ 20% of DMPs). Although overall entropy at CpGs that are only VMPs (i.e. constant VMPs) is high, we observed a decrease in entropy at these sites that was significant in only one of the two examined datasets (Extended Data Fig. 7A, Supplementary Table 3C). This suggests that it is differential shifts in DNAm towards the mean that drive the overall increase in entropy with age.
To further investigate the contribution of DMPs to changes in entropy, we used the BIOS blood dataset, which has a large sample size and samples distributed across a broad age range (Extended Data Fig. 1, Supplementary Table 1). Although the majority of DMPs (~ 73%) converge to the mean with age, over a quarter (~ 27%) diverge away from the mean towards high and low methylation fractions (Fig. 5A). To determine the effect of this divergence on entropy, we recalculated entropy separately on the converging and diverging DMPs. Remarkably, we found a highly significant increase in entropy at the converging sites (0.005 per decade of age, p-value < 2.2e-16). This stood in stark contrast with the diverging sites, at which entropy significantly decreased with age. These sites could be considered “anti-entropic”, since they become more predictable with age (Fig. 5B). We validated these results in a second dataset, GSE128235, and found highly concordant results. The proportions of DMPs that converge to (72%) and diverge from (28%) the mean with age were similar. Entropy also increased significantly at the converging sites (0.005 per decade of age, p-value < 2.2e-16) and decreased significantly at the diverging sites (−0.004 per decade of age, p-value < 2.2e-16).
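The converging/diverging split can be expressed as a comparison between the direction of a DMP's age slope and its position relative to the midpoint. This sketch makes several assumptions:

- “The mean” refers to a methylation fraction of 0.5, where per-CpG entropy is maximal.
- Each DMP is summarised by its baseline methylation fraction and its age slope.
- CpGs that might cross the midpoint during the observed age range are ignored.

The function name and labels are hypothetical.

```python
def entropy_direction(baseline_beta, slope, midpoint=0.5):
    """Label a DMP as 'entropic' (moving towards the midpoint) or 'anti-entropic'.

    baseline_beta: methylation fraction at the youngest age (0 to 1).
    slope: age-related change in methylation fraction per year.
    midpoint: assumed reference point; 0.5 maximises per-CpG entropy.
    """
    if slope == 0:
        return "no change"
    if baseline_beta == midpoint:
        # already at maximal entropy: any change moves away from the midpoint
        return "anti-entropic"
    # moving up from below the midpoint, or down from above it, means converging
    towards = (slope > 0) == (baseline_beta < midpoint)
    return "entropic" if towards else "anti-entropic"
```

For instance, a highly methylated CpG that loses methylation with age (baseline 0.9, negative slope) is labelled entropic, whereas the same CpG gaining methylation is labelled anti-entropic.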
We then investigated the distribution of entropic DMPs, anti-entropic DMPs and non-DMPs across chromatin states in PBMCs (Fig. 5A). Entropic DMPs were overrepresented in bivalent promoters and enhancers, regions bound by Polycomb proteins, and quiescent states (χ² test p-value < 2.2e-16). In contrast, anti-entropic DMPs were overwhelmingly found in regions of strong transcription and in enhancers (χ² test p-value < 2.2e-16) (Fig. 5B).
We hypothesised that cell type heterogeneity would bias estimates of age-related changes in entropy upwards (i.e. the increase in entropy with age would be inflated by age-related changes in cell type proportions). We first repeated the analyses after adjusting the DNAm profiles for blood cell types in each dataset (see Methods). The effect sizes (i.e. the change in entropy per decade of age) before and after adjustment for cell type proportions were moderately correlated (r = 0.53, p-value = 2.9e-5) (Extended Data Fig. 8A). Half of the datasets displayed effect sizes that declined in magnitude after adjustment (Extended Data Fig. 8B). However, the overall meta-analysis effect size remained unchanged after adjustment (0.0005 change in entropy per decade of age) (Extended Data Fig. 8C).
We then examined age-related changes in entropy in datasets of isolated cell types, reasoning that if age-related changes in entropy were driven solely by changes in cell type proportions, we would see no increase in entropy in these datasets. We looked at monocytes (GSE56046), CD4+ T cells (GSE59065, GSE56581 and GSE137593), CD8+ T cells (GSE59065), and B cells (GSE137594) (Supplementary Table 4). In nearly all datasets, we detected no change in entropy during ageing; however, both CD4+ and CD8+ T cells in GSE59065 displayed a marked increase in entropy with age (Supplementary Table 4). Finally, we took advantage of the unique design of dataset GSE184269, which contains both mixed (PBMC) and sorted blood cells from the same individuals (naïve B cells, naïve CD4+ T cells, naïve CD8+ T cells, NK cells, monocytes and granulocytes). We reasoned that if age-related changes in entropy were driven solely by changes in cell type proportions, PBMCs would show higher entropy than sorted cell types. Entropy was markedly higher in PBMCs than in NK, naïve CD4+ T, naïve CD8+ T and naïve B cells. However, the highest entropy levels were found in granulocytes, a heterogeneous class comprising basophils, eosinophils and neutrophils (Extended Data Fig. 9). These results suggest that changes in cell type composition during ageing partially account for age-related increases in entropy.