To our knowledge, this is the first study to construct brain age models derived from network-level descriptions of neuroanatomical organization across the cortex. These models, which used morphometric similarity as the basis for predicting chronological age, did not outperform non-network models built from ‘standard’ morphometric features.
Specifically, the MSN model was outperformed (in terms of lowest MAE and highest predicted R2) by the model that included all individual structural features, followed by the cortical thickness and volumetric models. However, the MSN edge-weight model, like these better-performing models, performed significantly better than null models on testing data, suggesting that these Brainage models capture ‘real’ patterns of variation indicative of age.
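As a concrete illustration of the null-model comparison described above, the sketch below simulates a trained model’s age predictions and compares its MAE against a permutation null in which predictions are shuffled relative to chronological ages. The sample size, age range and noise level are simulated assumptions for illustration, not values from this study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
age = rng.uniform(6, 18, n)            # simulated chronological ages (years)
pred = age + rng.normal(0, 1.5, n)     # simulated predictions from a trained model
mae = np.mean(np.abs(pred - age))      # observed mean absolute error

# Null distribution: MAE when predictions are shuffled relative to true ages,
# i.e. a model carrying no individual-level age information
null_maes = np.array([np.mean(np.abs(rng.permutation(pred) - age))
                      for _ in range(1000)])
p_value = np.mean(null_maes <= mae)    # proportion of null MAEs as good or better
```

A model capturing real age-related variation yields an observed MAE well below the bulk of the null distribution, as described for the better-performing feature sets here.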
The best performing (individual) structural feature for age prediction in this study was cortical thickness. Conversely, in a previous report of lifespan (8–96 yrs) brain-age prediction, in the 8–18 year old group, the greatest performance (i.e. lowest mean prediction error) across all approaches using cortical area, thickness or volume was actually seen with the brain volume model [40]. However, across the six prediction techniques investigated in [40], cortical thickness models outperformed cortical volume models in 3/6 methods. This similar performance is perhaps unsurprising given that volume measurements are typically derived from cortical thickness (and surface area) measurements. The findings of this analysis, alongside previous reports [19–21], highlight the sensitivity of cortical thickness as an index of brain maturation.
Models based on all other tested structural features (Surface Area, Curvature Index, Folding Index, Gaussian Curvature, Mean Curvature) did not significantly outperform null models, suggesting that these models may have been overfit during training. This is further evidenced by the confidence intervals for the performance of these models crossing zero.
We also found that combining the best performing models (cortical thickness, volume and MSN edge weights) resulted in a drop in performance compared to the cortical thickness model alone. Whilst not a direct statistical comparison, this suggests that these models do not capture independent variance in relation to age. This seems to disagree with previous work [19], which found that joint covariation across multiple structural features predicted variance in age independently of the variance in individual features.
As well as the feature set, the machine learning or prediction workflow is a key factor in brainage estimation. This study found GPR to outperform the RVR approach. These methods were selected as they have been shown to outperform other linear approaches [35], including in pediatrics [36]. On the surface, our finding seems to contradict other comparative analyses of machine learning models for predicting brainage from morphometric data, which found RVR to systematically outperform GPR [41]. However, the one scenario in which GPR did outperform RVR in [41] was the test case with the smallest number of participants, closest to the sample size used here. Therefore, the choice of machine learning model will be an important consideration for future use cases.
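For readers unfamiliar with the GPR workflow, a minimal sketch of Gaussian process regression for age prediction using scikit-learn is shown below. The features, sample size and effect structure are simulated stand-ins for morphometric data, and the kernel choice (RBF plus white noise) is an illustrative assumption; note that RVR is not included in scikit-learn and is available only via third-party implementations.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n, p = 300, 10
X = rng.normal(size=(n, p))                       # simulated regional features
age = 12.0 + 2.0 * X[:, 0] + rng.normal(0, 1.0, n)  # age driven by one feature

X_tr, X_te, y_tr, y_te = train_test_split(X, age, test_size=0.3, random_state=0)

# RBF kernel for smooth age-feature relationships + WhiteKernel for observation noise
gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                               normalize_y=True, random_state=0)
gpr.fit(X_tr, y_tr)
mae = np.mean(np.abs(gpr.predict(X_te) - y_te))   # held-out mean absolute error
```

Evaluating MAE on a held-out split mirrors the hold-out testing strategy used in this study.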
Overall, whilst network models of sMRI such as the MSN mature as a function of age in typical neurodevelopment [28], and capture meaningful variation indicative of chronological age in the Brainage framework, these networks are not the most sensitive to changes across childhood when compared with simpler features, for instance cortical thickness measures.
Currently, only two other studies have predicted brain age from sMRI in the ABIDE cohort [26, 42]. Using a complex network approach to T1w MRI in 7–20 year olds, [42] achieved an MAE of 1.53 years using deep learning models. The slightly larger age range means that the MAEs are not entirely comparable with the current study, although the present study outperformed this figure. It is important to note that the network approach to T1w MRI in that study modelled correlations between grey levels of the image rather than between structural metrics.
When BrainageΔ was calculated for the test cohort, there was great variability in an individual’s delta values across the feature sets; there appeared to be little consistency in these values between models. The varying individual profiles of brain age delta have two possible explanations. Firstly, brain age delta represents the combined measure of individual variance from the expected developmental trajectory plus the error in the normative age model. It may therefore be the case that the random error in each of the models is producing variance in BrainageΔ across feature sets at the individual level. This could have implications for the comparison of studies utilizing the Brainage measure if there is limited consistency in these measures within an individual participant. Alternatively, and potentially more interestingly, each brainage model may be indexing relevant divergences/individual differences in different aspects of cortical architecture, resulting in between-model variance in BrainageΔ. This could prove useful in neurological conditions that influence different aspects of brain development/organization in the paediatric brain; for instance, a brain age model based upon MRI measures of white matter may be more sensitive to differences from normative brain development in demyelinating disorders such as multiple sclerosis. In this scenario, multiple BrainageΔs from different features, or even imaging modalities, could be used as potential biomarkers of clinically relevant outcomes.
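The first explanation, model error driving between-model inconsistency, can be illustrated with a small simulation: if two feature-specific models share only the true age signal and carry independent errors, their within-individual deltas (predicted minus chronological age) are essentially uncorrelated. All values below are simulated for illustration; neither model nor noise level corresponds to a model from this study.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 150
age = rng.uniform(6, 18, n)

# Two feature-specific models whose predictions share only the true age
# signal, with independent model-specific error
pred_ct = age + rng.normal(0, 1.5, n)    # e.g. a cortical-thickness model
pred_msn = age + rng.normal(0, 2.0, n)   # e.g. an MSN edge-weight model

delta_ct = pred_ct - age                 # BrainageΔ per feature set
delta_msn = pred_msn - age
r = np.corrcoef(delta_ct, delta_msn)[0, 1]  # near zero when deltas are pure model error
```

Under the second explanation, in which each model indexes real but distinct individual differences, the deltas would instead show structured (though still imperfect) between-model correlation.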
However, it is difficult to statistically test either of these explanations (model error vs meaningfully different divergences) because of the limited number of models used in any one study. Future meta-analytic research could compare within-participant brain-age delta values across feature sets, whilst controlling for the MAE of the models themselves, in order to isolate ‘real’ within-individual variation in the brain age delta measure. Future studies could also build multiple (even multi-modal) brain-age models and use the feature-specific BrainageΔs as individual predictors in regression models, to assess the unique predictive variance offered by each feature.
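The regression analysis suggested above could be sketched as follows, with hypothetical feature-specific deltas entered jointly as predictors of an outcome to estimate each feature’s unique contribution. The deltas, outcome and effect sizes are all simulated assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n = 200
delta_ct = rng.normal(size=n)    # hypothetical cortical-thickness BrainageΔ
delta_msn = rng.normal(size=n)   # hypothetical MSN edge-weight BrainageΔ

# Simulated outcome: only the thickness delta carries unique predictive variance
outcome = 0.5 * delta_ct + rng.normal(0, 1.0, n)

X = np.column_stack([delta_ct, delta_msn])
fit = LinearRegression().fit(X, outcome)
coef_ct, coef_msn = fit.coef_    # partial coefficients: unique variance per feature
```

A near-zero partial coefficient for one delta, with the other delta in the model, would indicate that the first feature offers no unique predictive variance for that outcome.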
The results of the current study nonetheless represent meaningful contributions to the field. We set the bar for evaluating performance and reproducibility exceptionally high, given that:
- we tested all models on a relatively large, hold-out test set;
- we assessed robustness of performance in terms of sampling (assessing the 95% CI of performance) and against meaningful null models; and
- we investigated correlations between BrainageΔ and biases/cognition explicitly in the testing sample.
As noted by [42], ABIDE is also a particularly challenging dataset for the estimation of brainage, due to the number of different sites and acquisition protocols. For future brainage studies of development, this high bar should at least be maintained, with further improvement possible by validating on an entirely independent dataset (as seen, for example, in [19]).
The current study offers potential benefits for the use of the Brainage framework in clinical populations to investigate the effect of disease states on the brain. By providing estimates of error in these predictive models, brain age delta estimates can be interpreted with the appropriate caution if they do not exceed ‘healthy’ variability in brain age.
An outstanding question for future research is whether there is a need for models such as morphometric similarity as deep learning/machine learning approaches become more prevalent. [43] report the results of the Predictive Analytics Competition (2019) for predicting chronological age from structural neuroimaging, highlighting the high performance of deep neural networks within the Brainage framework. Morphometric similarity models the covariance structure of anatomical MRI features in a way that is constrained by anatomy (using ROIs or voxels, for instance), typically using a very specific, linear measure of covariance/similarity (Pearson’s correlation coefficient). The morphometric similarity model has been shown to capture biologically meaningful information [28]; however, imposing such a model as an anatomical prior may be redundant when analyzing larger sample sizes with machine learning approaches. The machine learning/deep learning approaches becoming more popular in the neuroimaging literature, when fed all the individual features used to construct the morphometric similarity network (as we have done here), should be able to recover any covariance between structural features (even beyond linear relationships) that is captured by the morphometric similarity network approach. This may be supported by the results reported here, with greater performance seen for the model using all features compared to the morphometric similarity models.
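For reference, the morphometric similarity construction described above (z-scoring each feature across regions, then correlating regional feature vectors with Pearson’s r) can be sketched as follows. The region and feature counts are illustrative, not those of this study, and the feature values are simulated.

```python
import numpy as np

rng = np.random.default_rng(7)
n_regions, n_features = 68, 7   # e.g. Desikan-Killiany-style ROIs x morphometric features
features = rng.normal(size=(n_regions, n_features))  # simulated per-region features

# z-score each feature across regions so features on different scales are comparable
z = (features - features.mean(axis=0)) / features.std(axis=0)

# Pearson correlation between every pair of regional feature vectors:
# the region-by-region morphometric similarity matrix
msn = np.corrcoef(z)
np.fill_diagonal(msn, 0.0)      # exclude self-similarity

# Vectorized edge weights (upper triangle), as used for model input
edges = msn[np.triu_indices(n_regions, k=1)]
```

This makes the linearity constraint concrete: any non-linear covariance between features is invisible to the Pearson-based MSN but remains available to a flexible learner given the raw features.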
We performed several analyses of correlates of BrainageΔ across meaningful outcome measures and nuisance covariates/biases. We found no relationship between BrainageΔ, as a measure of individual difference, and cognition in this typically developing cohort. This suggests that, when these models are generalized to ‘novel’ cases (here, the testing sample), the resultant age predictions and BrainageΔ measures do not hold information pertinent to individual differences in cognition. This replicates previous findings. [44] reported no significant relationship between individual-level brain-age Δ (derived from voxel-based cortical thickness, volume and surface area) and cognitive abilities (as measured by the NIH Toolbox Cognition Battery). They hypothesized that this may be due to methods that maximized the captured age-related variance in neuroanatomical measures, such that cognition-related (non-age-related) variance may be captured by a different, orthogonal pattern of neuroanatomical correlates. Other studies have also found no convincing relationship between brainage and cognition in typically developing children [21, 45]. Of those that did find a relationship in developing cohorts [46, 47], the associations were small to moderate in size and thus likely require large sample sizes to reliably detect [45].
In the case of morphometric similarity (in adults), outside of the brainage framework, we previously found no relationship between these measures and cognitive abilities [27], failing to replicate the findings of [28]. However, a recent study of adolescents has highlighted the predictive validity of the MSN for cognition/intelligence and psychiatric symptoms [48], so this remains very much an open area of research.
We failed to find a relationship between EFC (as a proxy measure of motion) and estimates of BrainageΔ. This is most likely due to the stringent quality control procedures applied to both the training and testing cohorts, rather than any inherent robustness to motion artefact.
One of the biggest limitations of the current study is that, in the current brainage framework, Brain Age Δ estimates are generated at the whole-brain level, with a single value representing the whole brain’s deviation from the typical trajectory of development/aging. This comes at the cost of the regional specificity that can be obtained by ROI- or voxel-cluster-driven neuroimaging analyses over and above whole-brain studies. Given that neurodevelopmental patterns are spatiotemporally dynamic (that is, they vary in location and over time [49]), and that many of the neurological diseases we are interested in studying with the Brainage framework show distinct spatial patterns of damage/cortical change (e.g. TBI [50], AD [51]), this constraint limits the method’s utility. A recent study [52] outlined and systematically validated a local Brainage approach using a patch-based machine learning algorithm to enable estimation of voxel-wise and regional deviations from typical developmental trajectories (albeit in an adult aging population). It is as yet unclear what contribution morphometric similarity may make, either within a deep-learning framework or in the context of regional-level predictions.