Selection of unbiased fCpG sites
To identify a set of unbiased fCpG sites in breast cancer, we used 450K methylation array data from 634 invasive breast cancers in The Cancer Genome Atlas16 (TCGA) and 79 normal breast tissue samples.17 Using a three-step selection process, we identified an ensemble of fCpG sites with balanced (de-)methylation rates as follows.
In the first step, we eliminated regulatory and genic loci due to their increased likelihood of being under selection during tumor growth. This choice contrasts with the majority of cancer epigenomic studies focused on functional CpG sites located in or around promoter regions.17–19 Next, we identified CpG sites with an average \(\:\beta\:\)-value close to 0.5 in both normal breast tissue and breast cancers, thus excluding sites with an inherent bias toward methylation or de-methylation, and sites that are subject to systematic selection during homeostasis and/or tumorigenesis (Fig. 2A). In the third step, we ordered the set of unbiased CpG sites by between-tumor variability and included only the 500 most fluctuating sites in the final clock set of unbiased fCpGs (Fig. 2B). Importantly, this final step excludes non-informative sites that either do not fluctuate at all (i.e., imprinted hemi-methylated state) or fluctuate too fast (i.e., steady-state methylation of \(\:\beta\:\approx\:0.5\) reached on time scales much shorter than the average mitotic tumor age at diagnosis).
Next, we sought to validate the unbiased and fluctuating nature of the clock set in two independent cohorts. In a cohort of 146 breast cancer patients (Lund cohort),20 we found significantly higher inter-tumor variability in \(\:\beta\:\)-values among the CpG sites in the clock set, as compared to the CpG sites not included in the clock set (Fig. 2C). Similarly, in a small cohort of 5 patients with multiple samples from their primary tumors,21 we found elevated intra-tumor variability in clock set vs non-clock set sites (Fig. 2D). Together, these patterns corroborate the unbiased and fluctuating nature of the clock set of CpG sites.
Interestingly, the fCpG sites in the clock set were more tightly concentrated around \(\:\beta\:=0.5\) in normal breast tissue, as compared to breast cancers (Fig. 2E), with a median standard deviation of \(\:\beta\:\)-values in normal samples of 0.09, compared to 0.21 in tumors (P = 2x10− 45, Wilcoxon rank sum test). Consistent with the underlying dynamic model of the clock (Fig. 1A), this suggests that over decades of breast development and maintenance, the fCpGs had converged to the stationary methylation state of \(\:\beta\:=0.5\) (Fig. 1C).
Epigenetic clock index
At the level of individual tumors, the 500 fCpG sites in the clock set exhibited primarily unimodal or bimodal distributions of \(\:\beta\:\)-values (Fig. 3A). We explored how these tumor-specific distributions of \(\:\beta\:\)-values could be used to estimate tumor mitotic age. In the founding tumor cell, each fCpG starts in either the unmethylated (\(\:\beta\:=0\)), hemi-methylated (\(\:\beta\:=0.5\)), or methylated (\(\:\beta\:=1)\) state (Fig. 1A). Although the trajectories of individual sites are subject to stochastic fluctuations (Fig. 1C), an ensemble of sites starting in the same initial configuration collectively drift toward the steady state of \(\:\beta\:=0.5\) (Fig. 3B).
By considering the histograms of \(\:\beta\:\)-value distributions at different mitotic ages, we can track the evolution of the three “peaks” corresponding to the subsets of initially unmethylated, hemi-methylated, and methylated clock sites (Fig. 3C). As the tumor’s mitotic age increases, the left peak of the histogram (consisting of originally unmethylated clock sites) starts moving to the right, whereas the right peak (originally methylated sites) moves to the left; the middle peak (originally hemi-methylated sites) remains stationary. By measuring the extent to which the three peaks have converged to the stationary value of \(\:\beta\:=0.5\), we can thus estimate the mitotic age of individual tumors.
Concretely, we used the standard deviation of the \(\:\beta\:\)-values, denoted by \(\:{s}_{\beta\:}\), to quantify the relationship between mitotic tumor age and the evolving clock set profile (Fig. 3C). Because \(\:{s}_{\beta\:}\) is highest at time 0, when the \(\:\beta\:\)-value distribution exhibits three sharp peaks, and then monotonically decreases over time (Fig. 3D), we introduced the epigenetic clock index \(\:{c}_{\beta\:}=1-{s}_{\beta\:}\) as a proxy measure of mitotic tumor age (Fig. 3E).
In the next two sections, we characterize the relationship between a tumor’s mitotic age, as quantified by the epigenetic clock index \(\:{c}_{\beta\:}\), and its evolutionary-ecological context as determined by its intrinsic growth potential and external pressures from the microenvironment. Because there is a pronounced difference in \(\:{c}_{\beta\:}\) between ductal carcinomas (median, 0.79) and lobular carcinomas (median, 0.82; P = 2x10− 16, Wilcoxon rank-sum test), we restricted subsequent analyses to the more common ductal carcinomas to avoid unnecessary confounding.
Younger tumors have more aggressive phenotypes
As a breast tumor grows, its likelihood of detection on the basis of imaging or symptoms increases. Because fast growing tumors are expected to reach a detectable size sooner than slow growing ones, we hypothesized that younger tumor mitotic age would correlate with established markers of tumor aggressiveness. To test this hypothesis, we correlated the epigenetic clock index with several established features of tumor aggressiveness, including molecular subtype,22,23 genomic instability,24,25 grade,26 and size.27
There was a clear relationship between mitotic age and molecular subtype: luminal A tumors, which have a more favorable prognosis, were older than luminal B and basal tumors (Figs. 4A-B; Suppl. Table S1). Similarly, there was a strong correlation between genomic instability and younger tumor age (Figs. 4C-D), and younger tumors were of higher histopathologic grade (Fig. 4E). In contrast, there was only a weak relationship between mitotic age and summary stage (Fig. 4F)
Another prognostic factor in breast cancer is tumor size, with larger lesions having worse outcomes. We found that smaller tumors were of older mitotic age compared to larger tumors (Figs. 4G-H), presumably because slow growing tumors spend more time at the smaller end of the detectable size range, and are, therefore, more likely to be detected at a smaller size.
Finally, the relationship between tumor mitotic age and patient age at diagnosis was inconclusive, with a weak negative correlation in TCGA (R=-0.18) and no correlation in the Lund cohort (R=-0.06; Suppl. Table S1). This is consistent with the notion that the fCpG clock measures the age of the tumor—starting with the most recent common ancestor cell—and not the age of the patient.
Identifying modulators of mitotic tumor age
The time it takes for a tumor to grow from a single cell to a detectable mass depends on its effective growth rate, that is the difference between cell proliferation and cell death (Fig. 5A). Cell proliferation primarily reflects the tumor’s intrinsic growth potential and aggressiveness, whereas cell death is often the result of extrinsic selective pressures applied by the tumor microenvironment, such as immune surveillance and resource constraints due to limited vascularization.28,29
To explore putative modulators of effective tumor growth and mitotic age at diagnosis, we performed genome-wide correlation analyses of the epigenetic clock index \(\:{c}_{\beta\:}\) against gene expression. As predicted, mitotically younger tumors exhibited increased expression of proliferation-related genes such as Ki67 and MCM2 (Fig. 5B, Suppl. Table S1). The signal was further augmented when considering the average expression across a set of genes involved in M-phase and mitotic checkpoint regulation (Fig. 5C-D) and the fraction of cells in S-phase (Fig. 5E).
Next, we examined the microenvironment's ability to decrease the effective growth rate of a tumor through increased cell death. As hypothesized, the expression of immune cell markers such as CD3, CD4, CD8 and FOX3 was elevated in mitotically older tumors (Fig. 5B; Suppl. Table S1). This suggests that tumors which are subject to immune surveillance—e.g., through neo-antigen directed immune control by CD8 + T-cells—have a lower effective growth rate and, thus, reach a detectable size at an older mitotic age, as compared to tumors that successfully evade immune control and thus reach a detectable size at a younger mitotic age.
To perform a systematic analysis of mitotic tumor age modulation, we performed a genome-wide gene set enrichment analysis (GSEA) (Fig. 6A). Consistent with the univariate gene expression analyses, mitotically younger tumors were enriched for pathways related to proliferation and cell cycle control. Conversely, mitotically older tumors were enriched for immune pathways and immune-related signaling pathways, again supporting the notion of effective immune control in older, slower growing lesions.
For a more in-depth analysis of the immune infiltrate, we used the CIBERSORTx algorithm30 to estimate the extent and composition of the immune compartment. As expected, the extent of the immune compartment increased with mitotic tumor age (Fig. 6B). When decomposing each tumor’s immune compartment into the major cell types, we found an increase in the fraction of T-cells in mitotically older tumors (Fig. 6C, Suppl. Table S2), again suggestive of T-cell mediated immune surveillance.
Analysis of paired tumor samples validates epigenetic clock
Multiple tumor samples from the same patient provide a unique opportunity to assess the internal validity of the epigenetic clock. Indeed, paired samples should be epigenetically more related—via their most recent common ancestor cell—than samples from different patients. In a cohort of 8 women with multi-focal breast cancer,31 we found that the within-patient correlations of the clock set fCpG sites were higher (median, 0.72) than the between-patient correlations (median, 0.10; P = 3x10− 6, Wilcoxon rank-sum test; Fig. 7A). The same held true for a cohort of 18 patients with paired primary tumors and lymph node metastases32 (median, 0.82 vs. 0.14, P = 5x10− 13; Fig. 7B) and a subset of 22 patients with paired primary tumors and metastases (including lymph node and distant metastases) from the AURORA US Metastasis Project33 (median, 0.73 vs. 0.08, P = 2x10− 26; Suppl. Figure S4).
The two cohorts of patients with paired primary and metastasis samples32,33 allowed us to test two additional properties of the fCpG clock. First, assuming that each metastasis is seeded by a single cell from the primary tumor, synchronous metastases should be younger than their matched primaries. Indeed, the epigenetic clock measures the age of the metastasis relative to the seeding event, which occurred after initiation of the primary tumor. Consistent with this prediction, in 36/40 patients, we found the metastases to have a lower epigenetic clock index compared to their matched primaries (Fig. 7C). This provides direct support for our interpretation of the epigenetic clock index as a proxy measure for mitotic age.
Second, the timing of metastatic dissemination relative to the primary tumor's age is expected to impact the epigenetic similarity of the two samples: if the metastasis is seeded early during primary tumor growth (i.e., similar \(\:{c}_{{\beta\:}}\) values), the \(\:\beta\:\)-values of the two samples are expected to be closely related (Figs. 7D-E) because the metastasis seeding cell came from a mostly homogenous population; conversely, if the metastasis is seeded late (i.e., different \(\:{c}_{\beta\:}\) values), the \(\:\beta\:\)-values are expected to differ more substantially (Fig. 7F) because the seeding cell came from a heterogenous population. Corroborating this hypothesis, and consistent with a corresponding simulation of metastatic seeding based on the oscillator model (Fig. 7G), we found a negative correlation between mitotic age difference and \(\:{\beta\:}\)-value similarity (Fig. 7H).
Quantifying mitotic tumor age
So far, we have used the epigenetic clock index \(\:{c}_{\beta\:}\) as a correlate of mitotic tumor age. To derive quantitative estimates of each tumor's mitotic and calendar ages, we proceeded as follows (see Methods for details). First, we invoked the mathematical oscillator model (Fig. 1A) to relate mitotic tumor age to the measured \(\:\beta\:\)-values of fCpG sites in the clock set. Next, we decomposed each tumor’s empirical fCpG \(\:\beta\:\)-value distribution into three groups (Fig. 8A): originally unmethylated fCpGs (left peak in the histogram), originally hemi-methylated fCpGs (middle peak), and originally methylated fCpGs (right peak). Finally, we combined the peak location in each group with the oscillator model to infer the estimated mitotic age of the tumor (Fig. 8B).
Finally, we combined tumor-specific estimates of mitotic age (Fig. 8B) and proliferation rate (Figs. 8C-D) to derive tumor-specific estimates of calendar age. Anchoring the median tumor age at a consensus estimate of 3 years (see Methods), the distribution of calendar ages across the TCGA and Lund cohorts ranged from 0.2 to 35.2 years, with an interquartile range of 1.5 to 5.6 years (Fig. 8E). There were notable differences in median tumor calendar ages by molecular subtype, ranging from 1.0 years in basal cancers to 6.5 years in Luminal A cancers (Suppl. Table S3).
Adjusting for tumor purity
Bulk samples contain a mixture of tumor and stroma. Because the epigenetic clock index exhibited correlations with tumor purity as measured by the consensus purity estimate34 (CPE; R=-0.67; Suppl. Figure S1A), we restricted our analyses to samples of high tumor purity (CPE ≥ 0.6). Nevertheless, we cannot rule out that the observed variability in \(\:\beta\:\)-value distributions among the selected fCpG sites—which are used to estimate mitotic tumor age—were at least partially driven by the methylation patterns of admixed non-epithelial cells. If this is the case, then, e.g., the immune pathway enrichment of older tumors (Fig. 6A) may be confounded by the presence of non-epithelial cells that alter the measured \(\:\beta\:\)-value distribution.
To adjust for possible confounding by tumor purity, we derived purity-adjusted \(\:\beta\:\)-values for the tumor cells by modeling the measured methylation as a mixture of tumor and stroma methylation, see Methods for details. The resulting purity-adjusted epigenetic clock index \(\:{c}_{\beta\:}^{a}\) exhibited a lower correlation with tumor purity (R=-0.22; Suppl. Figure S1B) and was lower than the unadjusted epigenetic clock index \(\:{c}_{\beta\:}\) (Suppl. Figure S1C).
When replacing the unadjusted epigenetic clock index with the purity-adjusted version, the strength of correlations between markers of tumor aggressiveness and younger mitotic age remained unaltered (Suppl. Table S1, Suppl. Figure S2, Suppl. Figure S3). Individual immune genes and the extent of immune infiltration remained associated with older mitotic age, although the correlations were attenuated (Suppl. Table S1, Suppl. Table S2). While the immune pathways were no longer enriched in older tumors (Suppl. Figure S3), there was still a positive correlation between the fraction of T cells and mitotic age (Suppl Table S2).
.