Data collection and study design. An overview of the workflow is shown in Figure 1. After selection of a set of blood variables15 and two metabolic indices (i.e., body weight [BW] and fasting blood glucose [FBG]), a deep neural network (DNN) was trained (80% of the data) and tested (20% of the data) to predict blood age, and then used to calculate aging acceleration and its impact on all-cause mortality. Initial work used the SLAM dataset alone, which consisted of 10,463 measurements from 1,997 mice of both sexes and two strains (i.e., C57BL/6J and HET3) housed at the National Institute on Aging vivarium (Table T1, cohorts C1–C10).
Training and testing the clock on the SLAM dataset yielded a low mean error and a high correlation between blood-predicted age and chronological age (MAE = 14.12, RMSE = 18.52, r = 0.82, p-value < 0.001; all results on the testing dataset).
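As a minimal sketch of how such a split-and-score evaluation can be set up (the network size, hyperparameters, and function names below are illustrative assumptions, not those of the study):

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

def clock_metrics(pred, true):
    """MAE, RMSE, and Pearson r between predicted blood age and chronological age."""
    pred, true = np.asarray(pred, dtype=float), np.asarray(true, dtype=float)
    mae = np.mean(np.abs(pred - true))
    rmse = np.sqrt(np.mean((pred - true) ** 2))
    r, p = pearsonr(pred, true)
    return mae, rmse, r, p

def train_and_evaluate(X, y, seed=0):
    """80:20 split, fit a small feed-forward network, score the 20% hold-out."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    net = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=seed)
    net.fit(X_tr, y_tr)
    return clock_metrics(net.predict(X_te), y_te)
```

The same three metrics (MAE, RMSE, Pearson r) are reported throughout the section, so separating the scoring step makes the different evaluations directly comparable.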
Longitudinal changes in blood variables. Given these promising results, we next assessed whether the longitudinal structure of the data could provide additional biologically relevant information and improve the prediction of blood age. To do so, the neural network was trained and tested with the longitudinal changes between timepoints for each blood variable added as features (up to ten samples collected over the lifespan of each animal, with an average of 4.8 blood tests per animal). Including this information consistently improved model performance, reducing the mean error by more than 16% and increasing the correlation by almost 7.5%, from 0.82 to 0.88 (MAE = 12.45, RMSE = 15.95, r = 0.88, p-value < 0.001). This result may be relevant when translating this blood-based clock to humans, as more than one blood draw would be required for the clock to perform optimally.
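The delta features described above amount to per-animal differences between consecutive draws; a sketch (column names are hypothetical):

```python
import pandas as pd

def add_longitudinal_deltas(df, id_col="mouse_id", time_col="age_weeks", features=()):
    """Append each animal's change between consecutive blood draws as extra
    features; the delta is NaN for an animal's first sample."""
    df = df.sort_values([id_col, time_col]).copy()
    for col in features:
        df[f"d_{col}"] = df.groupby(id_col)[col].diff()
    return df
```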
The hematological clock shows consistent age prediction in an independent sample of genetically diverse mice. One of the main issues when constructing prediction tools is overfitting. Several approaches can be implemented to protect models from this limitation and improve their generalizability. Here, the data were pre-processed separately after an 80:20 (training:testing) split; the training process included regularization techniques such as elastic net penalization, dropout, and early stopping; and all samples from the same animal were kept together, in either the training or the testing dataset, to prevent information from the hold-out set leaking into training of the DNN.
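Keeping all of an animal's repeated measures on one side of the split can be done with a group-aware splitter; a sketch using scikit-learn (variable names are illustrative):

```python
from sklearn.model_selection import GroupShuffleSplit

def animal_level_split(X, y, animal_ids, test_size=0.2, seed=0):
    """80:20 split that keeps every sample from the same animal on one side,
    so repeated measures cannot leak from training into the hold-out set."""
    gss = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(gss.split(X, y, groups=animal_ids))
    return train_idx, test_idx
```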
To directly assess the statistical generalizability of our approach, we trained and validated the clock, using this same set of features, on an independent longitudinal dataset encompassing 1,535 samples from 563 mice. This second dataset comprised Diversity Outbred (DO) mice of both sexes, with up to three blood samples per animal (Table T1, cohorts G7–G11). In this case, the DO mice were maintained on a different diet and housed at The Jackson Laboratory’s animal facility.
Although the initial version of the clock included all blood variables derived from the ADVIA® 2120i Hematology System15, only those shared between the SLAM project and the JAX study were retained as features in the clock. Additionally, highly correlated variables (pairwise correlation above 0.80) were removed from the feature set to avoid redundancy. After these steps, a group of 14 blood variables and two metabolic indices comprised the core set of features used to build the final version of the hematological clock (Supplementary Table S1). The training, testing, and assessment of clock performance were carried out in two ways: i) merging the SLAM and DO datasets, and ii) using only blood samples from the DO mice to train and test the model. In both cases the pipeline had the same structure, so the results were comparable. A low error and a high correlation between blood age and chronological age were achieved when merging the two datasets (MAE = 11.95, RMSE = 15.41, r = 0.87, p-value < 0.001). Using the DO dataset alone yielded a further improvement (MAE = 4.28, RMSE = 6.98, r = 0.95, p-value < 0.001). We hypothesize that part of this improvement could be explained by the simpler longitudinal structure of the JAX dataset, which has observations at only three timepoints (i.e., 12, 18, and 24 months) compared with the 10 timepoints in the SLAM study. Nevertheless, the good results achieved by the clock in the DO strain when the analysis was considered cross-sectionally (MAE = 10.93, RMSE = 14.45, r = 0.78, p-value < 0.001) suggest that factors beyond the specific longitudinal structure of the sample contribute to the improved prediction in the JAX dataset relative to SLAM.
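The redundancy filter can be sketched as a greedy pass that keeps a feature only if it is not too correlated with any feature already kept (a simplification; the study's exact selection order is not specified):

```python
import pandas as pd

def drop_correlated(df, threshold=0.80):
    """Greedily keep each column only if its absolute pairwise Pearson
    correlation with every previously kept column is at most the threshold."""
    corr = df.corr().abs()
    keep = []
    for col in corr.columns:
        if all(corr.loc[col, k] <= threshold for k in keep):
            keep.append(col)
    return df[keep]
```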
Given that the DO mice are genetically more heterogeneous than the B6/HET3 strains and show higher CBC variability, as reflected in their phenome database16, we performed an intraclass correlation coefficient (ICC) analysis to assess whether blood variability was associated with clock performance. To determine blood variability among HET3/B6 mice while keeping the size and longitudinal structure comparable to those of the DO dataset, we filtered the SLAM dataset to include only the three timepoints shared with the JAX study (blood samples collected at 12, 18, and 24 months of age). To avoid sample bias, the ICC was calculated over multiple iterations, each using 70% of SLAM samples (approximately the size of the JAX dataset), and the results were averaged. The calculations were done on harmonized datasets to remove batch effects. As anticipated, higher ICC was observed for most of the 16 features used in the clock when comparing the heterogeneous DO strain with HET3 or B6, and when comparing HET3 with B6 (Supplementary Table S2). In this study, the genetic heterogeneity of the DO strain was associated with higher CBC variability and could play a role in the superior performance of the hematologic clock. However, the HET3/B6 comparison showed higher blood variability but less accurate predictions in HET3 than in B6 mice, making it difficult to ascertain any correlation between blood variability and clock performance. We also found that the hematological clock had slightly better predictive performance when processing the B6 strain alone (MAE = 11.17, RMSE = 14.82, r = 0.89, p-value < 0.001) than when based on HET3 mice (MAE = 13.21, RMSE = 17.54, r = 0.85, p-value < 0.001). The clock also made slightly better predictions in males (MAE = 11.2, RMSE = 14.7, r = 0.89, p-value < 0.001) than in females (MAE = 12.6, RMSE = 16.5, r = 0.86, p-value < 0.001) (Figure 2).
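For reference, a one-way random-effects ICC of the kind used to gauge between-animal variability can be computed from the between- and within-subject mean squares; a sketch for a balanced (subjects × repeats) array (the study's exact ICC form is not specified, so this is an assumption):

```python
import numpy as np

def icc_oneway(measurements):
    """ICC(1,1) for a (subjects x repeated measures) array. Values near 1 mean
    most variance lies between animals rather than within repeated draws."""
    m = np.asarray(measurements, dtype=float)
    n, k = m.shape
    subj_means = m.mean(axis=1)
    msb = k * np.sum((subj_means - m.mean()) ** 2) / (n - 1)      # between-subject MS
    msw = np.sum((m - subj_means[:, None]) ** 2) / (n * (k - 1))  # within-subject MS
    return (msb - msw) / (msb + (k - 1) * msw)
```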
Processing the SLAM and JAX studies together led to a ~2% reduction in MAE when the validation-set performance was compared with the clock's average performance in 10-fold cross-validation (10FCV). Despite the difference in the longitudinal structure of the SLAM and JAX datasets, clock performance in the JAX dataset was strong, with less than half the error and a clearly improved correlation coefficient (more than a 7% increase) compared with predictions in the SLAM dataset.
The fact that similar results were obtained at two sites, in three strains, and in both sexes when all the information was processed together in training can be taken as a promising indicator of sound performance on other datasets. We surmise that this hematologic clock may be applicable to other strains and different settings, such as drug testing or aging interventions, and may pave the way for translation into age-related human studies.
Clock performance by age range. To compare clock performance at different ages and determine which blood samples carried the greatest marginal importance for predicting age, we performed further sensitivity analyses, training and testing the neural network with blood samples from animals in three age ranges: under 52 weeks (young), between 52 and 82 weeks (adult), and above 82 weeks (old). We modified the structure of the 10FCV (10 iterations per analysis to avoid sample bias) and averaged the results. Importantly, while the predictive power of the clock was lower when data from only one age range were provided, the overall performance was still very strong. The best performance in terms of correlation coefficient was obtained when the network processed samples from old animals (r = 0.83 [0.81–0.86], RMSE = 8.17, MAE = 6.00), followed by young (r = 0.81 [0.78–0.83], RMSE = 6.90, MAE = 4.81) and adult animals (r = 0.76 [0.73–0.79], RMSE = 6.96, MAE = 5.70), whereas the lowest error was achieved with blood samples from young individuals.
Longitudinal feature importance. To investigate the relevance of each feature within the core set of blood variables and assess whether the ranking of feature importance remained stable during the life of each mouse, we performed a feature importance analysis by age range, stratifying the dataset into three groups split at the 33rd and 66th percentiles (young [tercile 1], adult [tercile 2], and old [tercile 3]). A method based on the random forest algorithm17 was used to determine feature importance. This method not only ranks the variables, using depth in the tree as a metric, but also flags features whose apparent contribution to the model could have been achieved by chance. The results revealed that feature importance was not stable throughout life. Platelet count (adjusted for clumps), one of the most important features according to this method, appeared to be a strong predictor of age early in life, but its relevance dropped markedly during midlife and remained low through the last stage of life. Mean corpuscular hemoglobin and red blood cells displayed similar importance trajectories, with a slight increase late in life. FBG levels followed the opposite course, being a weak predictor of age in young animals and progressively gaining importance with age until reaching a maximum in the last third of life. Lymphocytes (%), eosinophils (adjusted for clumps), white blood cells, and neutrophils steadily lost importance as predictors of age throughout life (Figure 3 and Supplementary Figure S3).
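The chance-discrimination idea can be sketched in the Boruta style: permuted "shadow" copies of the features are added before fitting a random forest, and a real feature counts as informative only if it outranks the best shadow (scikit-learn's random forest is used here as a stand-in for the cited method17):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def shadow_importance(X, y, seed=0):
    """Fit a random forest on the real features plus permuted shadow copies;
    flag a real feature only if its importance beats the strongest shadow."""
    rng = np.random.default_rng(seed)
    shadows = rng.permuted(X, axis=0)   # shuffle each column: destroys link to y
    rf = RandomForestRegressor(n_estimators=300, random_state=seed)
    rf.fit(np.hstack([X, shadows]), y)
    n = X.shape[1]
    real, shadow = rf.feature_importances_[:n], rf.feature_importances_[n:]
    return real, real > shadow.max()
```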
One key aspect of the experimental design was that mice were bred, delivered, and tested in 15 batches or cohorts (i.e., 10 cohorts from SLAM and 5 from the JAX study), owing to the inherent difficulty of handling such a large number of animals. We tested for a batch effect (BE) by fitting a linear mixed-effects model (LMM) to each blood variable, adjusted for the available covariates. Up to 14% of the total variance was attributable to batch, implying a modest BE that was more pronounced for some features than others (Supplementary Table S4). To harmonize the dataset and remove the BE, each blood variable was adjusted by subtracting the random-effect estimates assigned by the LMM to each cohort, and those residuals were then used in all downstream analyses (Supplementary Table S4, rightmost column).
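A simplified stand-in for this harmonization step (batch-mean centering of age-detrended residuals in place of the LMM's random-intercept estimates; column names are hypothetical):

```python
import numpy as np
import pandas as pd

def harmonize_batch(df, var, batch_col="cohort", age_col="age_weeks"):
    """Remove an additive batch offset from one blood variable: fit a linear
    age trend, take each cohort's mean residual as its offset (a crude proxy
    for the LMM random intercept), and subtract it from the raw values."""
    slope, intercept = np.polyfit(df[age_col], df[var], deg=1)
    resid = df[var] - (slope * df[age_col] + intercept)
    offsets = resid.groupby(df[batch_col]).transform("mean")
    return df[var] - offsets
```

Unlike the LMM, this sketch does not shrink small cohorts' offsets toward zero, which is one reason the study's mixed-model approach is preferable in practice.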
This structure of data in subpopulations or batches may artificially inflate model performance, even after batch harmonization, if the batches are not well balanced in terms of the outcome. An essential premise in the development of tools for biological age prediction is the removal of any chronological feature that distorts the tool's true ability to predict age. The inclusion of the categorical variable “batch” as a feature in the clock improved the neural network's prediction by more than 5%. Initially included as an ordinal label whose only purpose was to indicate the order in which batches of animals were delivered by the supplier, the variable “batch” in fact carries age-related information relevant to the clock. SLAM animals arrived at the National Institute on Aging (NIA) vivarium in cohorts of approximately 200 animals every 3 months, balanced in number between strains and sexes (cohorts C1 to C10 in Table T1). Blood samples were typically collected every three months from baseline, but COVID-19 pandemic restrictions left some batches of SLAM mice with missing observations at specific timepoints, leading to unbalanced representation of the outcome “age” across batches. The JAX study (cohorts G07 to G11 in Table T1), on the other hand, had far fewer timepoints, with blood collected at 12, 18, and 24 months of age. Hence, depending on the specific batch number and its associated age frequency distribution, the supposedly inert categorical “batch” label was in fact providing an important chronological hint that the model could use to predict age and improve performance. Therefore, variables such as the blood test number (i.e., the sequential visit number in epidemiological studies), the study (i.e., SLAM vs. JAX), and the batch label were excluded from the set of features used to predict age.
This concept is applicable to any other categorical variable that is not well balanced between levels or classes in terms of the outcome representation.
We also tested the null hypothesis by permuting age as the dependent variable (50 iterations) while maintaining the values of all explanatory variables unaltered. Low correlations between predicted and chronological age were found from this analysis, indicating that the highly correlated predictions made above were unlikely to have been achieved by chance (Supplementary Table S5).
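A sketch of this permutation null (ridge regression serves as a lightweight stand-in for the DNN; sizes and names are illustrative):

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

def permutation_null(X, y, n_perm=50, seed=0):
    """Refit the model after shuffling age across samples; the resulting
    hold-out correlations approximate what could be achieved by chance."""
    rng = np.random.default_rng(seed)
    null_r = []
    for _ in range(n_perm):
        y_perm = rng.permutation(y)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y_perm, test_size=0.2,
                                                  random_state=seed)
        pred = Ridge().fit(X_tr, y_tr).predict(X_te)
        null_r.append(pearsonr(pred, y_te)[0])
    return np.array(null_r)
```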
Age acceleration. Age acceleration was calculated as the residuals after modeling blood age (predicted by the clock) on chronological age using a local polynomial regression, to account for the nonlinear relationship between the two (Figure 2B). To assess sample bias, we trained and tested the clock and calculated the age acceleration associated with each blood sample over 50 iterative random splits, altering the 10FCV each time. The results are visualized as a heatmap, with each row representing one of the 50 re-samplings and each column one of the 2,381 individual blood samples, sorted by age and labeled as young, adult, or old (Figure 4A, left panel). Age acceleration appeared to be relatively stable across the 50 re-samplings, yielding similar predictions in all of them, as evidenced by clear vertical stripes and the absence of horizontal bands. To test for possible associations between age acceleration and any available phenotypic trait or batch, we then performed unsupervised hierarchical clustering of the same matrix generated in the 50-split process and labeled the samples according to the animal's age group (i.e., young, adult, or old). The algorithm identified a series of clusters, none of which was associated with a particular sex, strain, cohort, or site (NIA vs. The Jackson Laboratory). Furthermore, the age-range labels of each mouse were strikingly unsorted across the band (Figure 4A, right panel, upper band), indicating adequate fitting of blood-predicted age vs. chronological age by the nonlinear regression.
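The residual-based definition of age acceleration can be sketched as follows (a global cubic fit stands in here for the paper's local polynomial regression; positive residuals mark faster-than-expected agers):

```python
import numpy as np

def age_acceleration(chrono_age, blood_age, degree=3):
    """Age acceleration = blood age minus the age expected from the fitted
    trend of blood age on chronological age. A cubic polynomial is used here
    as a simple stand-in for local polynomial (loess) regression."""
    coefs = np.polyfit(chrono_age, blood_age, deg=degree)
    return np.asarray(blood_age, dtype=float) - np.polyval(coefs, chrono_age)
```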
Taken together, these results show that age acceleration, calculated from the residuals, was a relatively stable variable, yielding similar predictions regardless of the sampling process, although no significant association was detected between age acceleration and any specific phenotypical or categorical group in the two studies.
Fast agers show reduced lifespan. Probably one of the most complicated aspects of developing an aging clock is establishing an association between aging acceleration and its clinical implications. Even model selection is challenging, as the assumptions required by the initially selected model are not always met (e.g., proportional hazards, linearity), or the sample size is not large enough to detect significant differences. The process is no simpler when longitudinal data are examined. To determine whether the discrepancies between predicted age and chronological age were biologically relevant and did not stem from a lack of model fit, the association between age acceleration and lifespan was examined using four different approaches.
We initially performed a cross-sectional examination of the relationship between age acceleration and lifespan and found very low correlations between the two variables (Figure 4B). Comparable lifespans were observed whether mice aged faster (positive acceleration) or slower (negative acceleration) (fast agers and lifespan: r = -0.01, p = 0.9035; slow agers and lifespan: r = -0.12, p < 0.001). Collecting longitudinal data allows age prediction at multiple timepoints. Age acceleration can vary within the same animal: rapid age acceleration, with its associated mortality risk, may be observed at a given timepoint and become slower thereafter (Figure 4C), or the converse. Thus, it is paramount to explore the association between trajectories of age acceleration and mortality from a longitudinal perspective. Initially, we analyzed mortality using Cox regressions, modeling time to death as the outcome and age acceleration as a time-dependent risk factor (i.e., a covariate that changes at each timepoint). The results showed that mice with higher age acceleration had a higher mortality risk than those with lower acceleration. Indeed, mice within the highest tercile for age acceleration had nearly 1.3 times the mortality rate of those within the lowest tercile (entire dataset: HR = 1.28, 95% confidence interval [CI] = 1.14–1.44, p ≤ 0.001, concordance index [C] = 0.57, standard error [SE] = 0.008, number of events [n] = 1,780; test dataset only: HR = 1.27, 95% CI = 0.98–1.63, p = 0.06, C = 0.56, SE = 0.14, n = 351), with the 33rd and 66th percentiles used as cut points, after adjustment for cohort, strain, sex, and age (Figure 5A [entire dataset]).
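Fitting age acceleration as a time-dependent covariate requires expanding the data into (start, stop] risk intervals, one per blood draw; a sketch of that reshaping step (column names are hypothetical, and a fitter such as lifelines' CoxTimeVaryingFitter would then consume this long format):

```python
import pandas as pd

def to_counting_process(df, id_col="mouse_id", time_col="age_weeks",
                        death_col="lifespan", accel_col="age_accel"):
    """One row per blood draw in, one (start, stop] interval per draw out;
    only the animal's final interval carries the death event (assumes death
    occurs after the last draw)."""
    rows = []
    for mouse, g in df.sort_values(time_col).groupby(id_col):
        times = list(g[time_col]) + [g[death_col].iloc[0]]
        for i, (_, r) in enumerate(g.iterrows()):
            rows.append({"id": mouse, "start": times[i], "stop": times[i + 1],
                         "age_accel": r[accel_col],
                         "event": int(i == len(g) - 1)})
    return pd.DataFrame(rows)
```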
Since the choice of distribution can be particularly consequential when fitting survival models in long follow-up studies like SLAM, we verified the results of the Cox survival analysis by fitting Gompertz regressions to the data and comparing subpopulation survivorship. This second approach produced results similar to those of the Cox regression (entire dataset: Gompertz coef. = 0.34, relative risk = 1.40, CI = 1.28–1.51, n = 1,780; test dataset only: Gompertz coef. = 0.25, relative risk = 1.28, CI = 1.02–1.53, n = 351). Next, the profiles of the acceleration curves were examined to evaluate whether the blood clock could predict mortality. Linear mixed regression was used to rank the slopes of the age acceleration trajectories and classify individuals as slow or fast agers (Figure 5B). The maximum lifespan in each group was then calculated and compared to detect whether steeper slopes were associated with shorter lifespan. The QT3 method of Wang and Allison18 clearly showed that animals with higher acceleration slopes (above the 90th percentile, i.e., fast agers or highly accelerated) had a significantly lower proportion of long-lived individuals, defined as living above the 90th percentile of maximum lifespan, than the group with lower slopes (below the 10th percentile, i.e., slow agers or highly decelerated) (Table T2; Figure 5B).
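The quantile comparison at the heart of the Wang-Allison test can be sketched with a 2x2 table and Fisher's exact test (a simplified reading of the QT3 method; the study's exact implementation may differ):

```python
import numpy as np
from scipy.stats import fisher_exact

def wang_allison(lifespans_fast, lifespans_slow, quantile=0.90):
    """Compare the proportion of animals surviving past the pooled 90th
    percentile lifespan between fast- and slow-aging groups."""
    pooled = np.concatenate([lifespans_fast, lifespans_slow])
    cutoff = np.quantile(pooled, quantile)
    table = [[int(np.sum(lifespans_fast > cutoff)), int(np.sum(lifespans_fast <= cutoff))],
             [int(np.sum(lifespans_slow > cutoff)), int(np.sum(lifespans_slow <= cutoff))]]
    odds_ratio, p = fisher_exact(table)
    return cutoff, table, p
```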
Different age acceleration trajectories in long- versus short-lived mice. The final analysis examined the age acceleration trajectories of two subpopulations: animals with long lifespans and those with shorter lives. Instead of classifying acceleration and then testing its association with lifespan, we first classified animals by lifespan and then analyzed whether they were predicted to be fast or slow agers using linear mixed regressions. To do so, the lifespan of the entire population was segregated into terciles (33rd and 66th percentiles as cut points) and the clock was run on the blood samples of animals in the two tails of the distribution (Figure 6A). As anticipated, the age acceleration trajectories of long-lived mice lay below those of short-lived animals throughout almost the entire lifespan (Figures 6B and 6C), indicating the clock's ability to link longitudinal trajectories of blood variables with aging acceleration. Together, these approaches provide a biological foundation for this computational tool and may help refine our understanding of aging by identifying biological age and quantifying a highly complex process.
Almost 12,000 observations on approximately 2,500 mice were used to build the hematological clock. It is a highly controlled study with good representation of mice at different ages, based on three different strains and both sexes. By considering not only blood cell counts at specific timepoints but also the longitudinal changes across time, this method appears to measure the aging process and be able to transform the set of blood variables into a metric of aging. Used as the clock’s output, age accelerations were computed for each individual blood collection and a single longitudinal trajectory per animal was generated (Example of trajectories in Figure 4C). Part of its ability to predict age is the fact that the hematological clock goes beyond measuring the abundance of specific biomolecules; it directly quantifies blood cell populations, and this might help to better gauge fluctuations in fundamental biological mechanisms associated with the rate of aging.
As a final consideration, several interesting hypotheses about this blood-based method of measuring aging come to mind when planning future analyses, beyond the need to translate the method to other species such as primates. These include changes in acceleration trajectories in response to interventions that either shorten lifespan (e.g., radiation, high-fat diets) or extend longevity, such as caloric restriction, rapamycin, or senolytic drugs. Examination of the blood clock's behavior in heterochronic and isochronic parabiotic pairings, and in mice with specific mutations that promote blood-related diseases such as leukemia or anemia, would also be of great interest. Another avenue for this work would be further data mining, for example using the whole raw ADVIA output of more than 500 variables (not all of them biologically relevant) to determine which features best predict age. Moreover, machine learning tools could help identify putative interactions among these and other less studied variables, as well as predict time-to-event outcomes other than mortality, such as tumor onset or the inception of cognition-associated morbidities. It would also be very interesting to assess whether other sets of variables outperform the one evaluated in this work.
This study has several strengths and limitations worth noting. As the first large longitudinal study of normative aging in mice, SLAM assesses many phenotypes, biological metrics, physical variables, and pathologies, resulting in a very rich dataset with a large sample size. Unlike the data used to generate other aging clocks, the blood features used in this analysis are routinely collected in clinical settings and could provide a unique opportunity to assess aging progression in human populations. Our initial analysis used data from two strains of mice of both sexes, including both inbred and genetically heterogeneous populations. We then expanded the analysis with data from a study conducted at a different facility in an even more genetically diverse mouse population, further underscoring the robustness of our findings. Beyond the inherently controlled environment of preclinical studies, one particular limitation is that the animals were virgin mice housed at sub-thermoneutral temperatures, conditions that differ from those experienced by human populations.
In conclusion, this model can serve as a viable and powerful alternative to epigenetic clocks and a valuable tool to help drive forward the aging field. We have demonstrated that a biological clock based on routinely collected blood variables can provide reliable predictions of biological age. The difference between blood age and chronological age appears to be associated with mortality risk, as mice with an older biological versus chronological age have nearly a 30% higher death rate. Translation of these findings to clinical application in humans will require further study to account for the myriad of uncontrolled variables and challenges that study participants face daily.