Data Preprocessing
Typically, in brain age prediction tasks based on neuroimaging data, the original images undergo preprocessing before being input into the network [30]. A common approach is to slice the original 3D volumes into 2D images for network input; however, this discards contextual information in the feature space, leading to lower prediction accuracy than directly inputting 3D volumes. Therefore, this study exclusively employs the fully automated "recon-all" pipeline from the medical image processing software FreeSurfer to preprocess the original images. The pipeline includes skull stripping, image correction, image registration, image segmentation, spatial normalization, and spatial smoothing; Fig. 4 contrasts a preprocessed image with the original.
(1) Skull Stripping:
The original medical imaging data contain non-brain tissues such as the skull, blood vessels, and muscles. To avoid affecting subsequent processing steps, the accuracy of brain tissue segmentation, and the final experimental results, non-brain structures are typically stripped from the image during preprocessing.
(2) Image Correction:
The image correction step primarily involves anterior commissure–posterior commissure (AC-PC) alignment; images are resampled to a standard 256×256×256 grid, and the N3 algorithm is applied to correct intensity non-uniformity across tissues.
(3) Image Registration:
Quantitative analysis of several different images requires that they be strictly aligned, a process known as image registration. Medical image registration seeks a spatial transformation (or a series of transformations) that brings corresponding points of one image into spatial agreement with those of another; that is, the same anatomical point has the same spatial position in both matched images.
(4) Image Segmentation:
In MRI data processing, often only specific regions are of interest, which requires extracting the target tissues according to the brain's anatomical structure. Once the brain regions are identified, the required areas are segmented out and used as input images for the network, followed by individual and joint analysis.
(5) Spatial Normalization:
Spatial normalization registers images to the standard Montreal Neurological Institute (MNI) template space, unifying the coordinate space of all images. The MNI template is a standardized brain image obtained by averaging MRI scans from a large number of healthy subjects and is commonly used as the reference for brain image standardization.
(6) Spatial Smoothing:
Spatial smoothing suppresses image noise, enhances the signal-to-noise ratio, and reduces anatomical or functional inconsistencies between images. Typically, a Gaussian kernel with a specified standard deviation (often expressed as a full width at half maximum, FWHM) is used for smoothing.
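As a rough illustration of the smoothing step, the snippet below applies an isotropic 3D Gaussian kernel to a synthetic volume with SciPy; the sigma value and volume size are illustrative assumptions, not the pipeline's actual settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Synthetic 3D "volume" standing in for a preprocessed MRI scan.
rng = np.random.default_rng(0)
volume = rng.normal(size=(64, 64, 64))

# Isotropic Gaussian smoothing; sigma is in voxels here.
# (Neuroimaging tools often parameterize by FWHM in mm instead:
#  sigma = FWHM / (2 * sqrt(2 * ln 2)) / voxel_size.)
smoothed = gaussian_filter(volume, sigma=1.5)

# Smoothing preserves the volume's shape but reduces voxel-wise variance.
assert smoothed.shape == volume.shape
print(volume.std(), smoothed.std())
```

Averaging neighboring voxels this way suppresses independent noise, which is why the smoothed volume's standard deviation drops well below the original's.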
Comparison with other models
First, cropped images of size 128×128×128 were used as input for Tri-UNet and two baseline models (U-Net and ResNet34). Tri-UNet achieved a minimum mean absolute error (MAE) of 7.46 years, an improvement of 2.04 years over the best MAE of the brain age prediction network proposed by Popescu et al. [48] (9.5 years). Next, cropped images of size 32×32×32 were used as input for Tri-UNet and the two baselines in ablation experiments. Finally, single-channel and multi-channel input networks were compared using ResNet34.
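The 128×128×128 inputs can be obtained by cropping the 256×256×256 preprocessed volumes; the sketch below center-crops a cube, though the paper does not specify the exact cropping scheme, so the centering is an assumption.

```python
import numpy as np

def center_crop(volume: np.ndarray, size: int) -> np.ndarray:
    """Crop a cubic region of side `size` from the center of a 3D volume."""
    sx, sy, sz = [(dim - size) // 2 for dim in volume.shape]
    return volume[sx:sx + size, sy:sy + size, sz:sz + size]

# A dummy preprocessed volume at the FreeSurfer-standard 256-cube size.
full = np.zeros((256, 256, 256), dtype=np.float32)
cropped = center_crop(full, 128)
assert cropped.shape == (128, 128, 128)
```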
The experimental results for 128×128×128 inputs are shown in Table 2, where Tri-UNet denotes the model in which ResNet34 is concatenated directly after the Tri-UNet network for brain age prediction. The results demonstrate the effectiveness of the proposed Tri-UNet method for brain age prediction.
Table 2
Comparison of Tri-UNet with Other Models
| Models | Dataset | MinMAE | MaxMAE | MeanMAE |
|---|---|---|---|---|
| U-Net | Whole Brain | 15.05 | 43.21 | 17.23 |
| ResNet34 | Whole Brain | 9.17 | 36.60 | 12.42 |
| Tri-UNet | Whole Brain | 7.46 | 16.90 | 10.05 |
When the input size is 128×128×128, because U-Net's predictions are far worse than those of ResNet34 and of the proposed Tri-UNet model concatenated with ResNet34, the remaining evaluation metrics compare only ResNet34 and the proposed model. The results are shown in Table 3.
Table 3
Comparison between Tri-UNet and ResNet34
| Models | Dataset | MAE | RMSE | MSE |
|---|---|---|---|---|
| ResNet34 | Whole Brain | 9.17 | 11.47 | 131.68 |
| Tri-UNet | Whole Brain | 7.46 | 9.32 | 86.96 |
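For reference, the three evaluation metrics reported above can be computed as follows; this is a plain NumPy sketch, and the ages used are illustrative, not data from the study.

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean absolute error: average of |predicted age - true age|.
    return float(np.mean(np.abs(y_true - y_pred)))

def mse(y_true, y_pred):
    # Mean squared error: average of squared prediction errors.
    return float(np.mean((y_true - y_pred) ** 2))

def rmse(y_true, y_pred):
    # Root mean squared error: square root of the MSE.
    return float(np.sqrt(mse(y_true, y_pred)))

# Illustrative chronological vs. predicted ages (years).
y_true = np.array([65.0, 70.0, 58.0, 80.0])
y_pred = np.array([68.0, 66.0, 60.0, 75.0])

print(mae(y_true, y_pred))   # (3 + 4 + 2 + 5) / 4 = 3.5
print(mse(y_true, y_pred))   # (9 + 16 + 4 + 25) / 4 = 13.5
print(rmse(y_true, y_pred))  # sqrt(13.5) ≈ 3.674
```

RMSE and MSE penalize large errors more heavily than MAE, which is why the tables report all three: a model with a good MAE but poor RMSE/MSE makes occasional large mistakes.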
Results of the single-channel input network
To validate the performance of the multi-channel network and the approach of using brain region segmentation maps obtained with medical prior knowledge as input data, a comparative experiment was first conducted on the 3D ResNet34 model with single-channel input. The input data consisted of uncropped images (size: 256×256×256); the results are shown in Table 4.
Table 4
Results of the single-channel input network (256×256×256)
| Dataset | MinMAE | MaxMAE | MeanMAE |
|---|---|---|---|
| Whole Brain | 4.98 | 115.42 | 22.40 |
| AM | 8.26 | 225.95 | 24.91 |
| HA | 6.95 | 154.43 | 18.78 |
| HBT | 9.07 | 160.46 | 18.02 |
Similarly, with an input size of 32×32×32, the performance of the single-channel network was evaluated on the 3D ResNet34 model using additional metrics. The results are shown in Table 5.
Table 5
Results of the single-channel input network (32×32×32)
| Dataset | MAE | RMSE | MSE |
|---|---|---|---|
| Whole Brain | 15.18 | 17.56 | 308.54 |
| HA | 16.70 | 19.18 | 367.99 |
To validate the multi-channel input network and the use of anatomically informed brain region segmentation maps as input, a comparative experiment was conducted on the 3D ResNet34 model. All inputs were uncropped images (size: 256×256×256); the results are presented in Table 6.
Table 6
Results of the multi-channel input network
| Dataset | MinMAE | MaxMAE | MeanMAE |
|---|---|---|---|
| HA L + R | 8.60 | 285.96 | 30.54 |
| HA + AM | 8.60 | 165.65 | 22.12 |
| HA + Whole Brain | 14.47 | 224.45 | 31.66 |
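A multi-channel input of the kind listed in Table 6 can be formed by stacking two region volumes along a channel axis before feeding the network. The sketch below uses dummy data; the channel-first layout is an assumption about the network's input convention, not a detail stated in the paper.

```python
import numpy as np

# Dummy single-channel region volumes (e.g. hippocampus and amygdala
# segmentation maps resampled to a common 256-cube grid).
ha = np.zeros((256, 256, 256), dtype=np.float32)
am = np.zeros((256, 256, 256), dtype=np.float32)

# Stack along a leading channel axis -> shape (2, 256, 256, 256),
# the usual channel-first layout for 3D CNN inputs.
multi_channel = np.stack([ha, am], axis=0)
assert multi_channel.shape == (2, 256, 256, 256)
```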
Results of different inputs
Figure 5 shows how the mean absolute error (MAE), root mean squared error (RMSE), and mean squared error (MSE) loss metrics change over training epochs when the whole brain is used as input. Figure 6 shows the same metrics with the amygdala (AM) as input, and Fig. 7 with the hippocampus (HA) as input.
Comparing the whole-brain, hippocampus (HA), and amygdala (AM) results in Table 4, the minimum MAE for the whole brain is 4.98 years, the lowest of all experimental outcomes; the minimum MAE is 6.95 years for HA and 8.26 years for AM. This difference may be attributed to the richer information the whole brain provides as input, which can yield more accurate results in certain training rounds. Because deep networks learn detailed features, a larger region of interest (ROI) strengthens the global feature representation: a larger ROI, or the whole brain, gives a larger receptive field and hence better brain age prediction in the best case. Table 4 also shows that the minimum MAE for HA is lower than that for AM, suggesting that HA is more strongly correlated with age than AM.

In terms of mean MAE, however, the whole brain averages 22.40 years, HA 18.78 years, and AM 24.91 years. This indicates that while larger regions may offer more information for accurate predictions, a smaller, well-chosen region introduces less noise. A balance must therefore be struck between providing more information and using finer brain region segmentation. Comparing the curves in Figs. 5, 6, and 7, both HA and AM provide less information than the whole brain and the corresponding models converge faster, consistent with the conclusions above.
Additionally, this study segmented the hippocampus at high resolution (HBT) and used it as input data; Fig. 8 shows the corresponding loss curves. The minimum MAE obtained with high-resolution hippocampal images is worse than that obtained with the low-resolution hippocampus (HA): 9.07 years for HBT, 2.12 years higher than the HA result. This may be because the richer information in high-resolution images also introduces more redundant information, a finding consistent with the whole-brain versus HA comparison above.
Hence, to confirm this hypothesis, the whole brain and hippocampus (HA) were used together as input for the multi-channel network to predict the brain age. The resulting loss change curve is shown in Fig. 9:
When the whole brain and hippocampus are used together as input, the minimum MAE is 14.47 years, worse than training with the whole brain alone (4.98 years) or with the hippocampus alone (6.95 years). This supports the study's hypothesis that in brain age prediction, finer voxel-level inputs do not necessarily yield better results, and including more features does not guarantee better outcomes; in the presence of redundant features, results may be worse than with individual features.
Therefore, the left and right hippocampus were separated (HA L + R) as input for the multi-channel network; the loss curves for the three evaluation metrics (MAE, RMSE, MSE) over training epochs are shown in Fig. 10. Combined with the results in Table 6, separating the left and right hippocampus as a dual-channel input performs worse than inputting the hippocampus alone. Both inputs carry the same total information, but separating the sides enlarges the regions with voxel intensity 0, introducing more noise; trimming off these zero-intensity regions might improve performance.
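The trimming suggested above, removing the zero-intensity background around a segmented region, can be sketched by cropping to the bounding box of the nonzero voxels; this is illustrative code, not the study's implementation.

```python
import numpy as np

def crop_to_nonzero(volume: np.ndarray) -> np.ndarray:
    """Crop a 3D volume to the bounding box of its nonzero voxels."""
    nonzero = np.argwhere(volume != 0)
    lo = nonzero.min(axis=0)
    hi = nonzero.max(axis=0) + 1  # exclusive upper bound
    return volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]

# A mostly-zero volume with a small nonzero block, mimicking a
# segmented region embedded in a large zero-intensity background.
vol = np.zeros((64, 64, 64), dtype=np.float32)
vol[10:20, 15:25, 30:42] = 1.0
cropped = crop_to_nonzero(vol)
assert cropped.shape == (10, 10, 12)
```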
Finally, using the hippocampus and amygdala (HA + AM) as input for the multi-channel network, the loss change curves for the three evaluation metrics (MAE, RMSE, MSE) with increasing training epochs are shown in Fig. 11.
Combining this with Table 6, the minimum MAE when the hippocampus and amygdala are input together to the multi-channel network is 8.60 years, higher than the minimum MAE with the whole brain as input (4.98 years). However, the mean MAE improves to 22.12 years from 22.40 years for the whole brain, demonstrating that a multi-channel input network can improve stability by combining different data. Nevertheless, because the input images are the same size, the dual-channel inputs contain many more regions with voxel grayscale value 0, as shown in Fig. 12. Cropping out only the informative regions as input may yield better results in future work.