In multi-site neuroimaging studies, it is important to examine the inter-scanner reproducibility of volumetry data acquired on different MRI scanners before pooling the data for further statistical analysis. To this end, MRI images of fifteen healthy subjects acquired multiple times on different MRI scanners were collected for scanner-related comparison, and three structural brain MRI analysis software packages (FreeSurfer, FSL-FIRST and AccuBrain®) were selected to test software-related differences in measurements of brain volumetry. The segmentation accuracy of the three packages has been evaluated and compared in the literature [13]. As the segmentation accuracy of different structures depends strongly on how each package defines those structures anatomically, a comprehensive comparison of region-specific segmentation accuracy among the packages is beyond the scope of this study. Our major objective is to investigate the inter-scanner reproducibility of brain volumetry and to test how the choice of quantification software influences that reproducibility.
In this study, AccuBrain® presented less inter-scanner variability than FreeSurfer and FSL-FIRST, as shown by the comparison of their CV values for brain volumetry. This finding might result from the large atlas pool of AccuBrain®, which consists of template images from a wide range of MRI scanners for knowledge transfer. Although FreeSurfer also employs atlas-based segmentation, it uses only one specific atlas (one MRI template with its labeled atlas) for knowledge transfer, which may limit its inter-scanner reproducibility. Furthermore, several brain substructures (e.g. hippocampus, amygdala, pallidum and accumbens) had relatively higher CVs than other structures in the tested ABS tools, while brain tissues with larger volume (e.g. WM and GM) presented much smaller CV values (Table 2). This finding may result from the relative volume of the tested brain structures or tissues: misclassified voxels from segmentation have a larger impact on the CV value when the volume of the structure is small. A second possible cause is the difference in boundary definition and tissue contrast. One of the most important features that drives brain MRI segmentation is brain tissue intensity [3, 15], and structures with fuzzy boundaries and lower contrast against the background are more likely to be misclassified.
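The volume effect described above can be illustrated with a minimal Python sketch. It assumes that the CV is computed as the sample standard deviation divided by the mean, expressed as a percentage; the exact definition behind Table 2 is not restated in this section. The function name, the volumes and the 100 mm³ error are hypothetical values chosen for illustration and are not data from this study.

```python
import statistics


def cv_percent(volumes):
    """Coefficient of variation in percent (assumed: sample SD / mean * 100)."""
    return statistics.stdev(volumes) / statistics.mean(volumes) * 100.0


# Hypothetical volumes (mm^3) of one subject measured on three scanners.
# The same absolute segmentation error is applied to a small and a large structure.
error = 100.0
small_structure = [4000.0 - error, 4000.0, 4000.0 + error]
large_structure = [500000.0 - error, 500000.0, 500000.0 + error]

cv_small = cv_percent(small_structure)
cv_large = cv_percent(large_structure)

print(f"small structure CV: {cv_small:.2f}%")
print(f"large structure CV: {cv_large:.2f}%")
```

Under these assumed numbers, an identical absolute error gives a CV of 2.5% for the small structure and 0.02% for the large one, which is the direction of the effect suggested for the small subcortical structures.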
In addition, we found that the variability of the quantified brain volumetry between each pair of scanners (GE vs. Philips, GE vs. Siemens, Philips vs. Siemens) differed considerably depending on the ABS tool used (Table 3). With AccuBrain® or FreeSurfer as the quantification tool, the inter-scanner variability between GE and Siemens scanners was lower than that of the other scanner pairs, whereas with FSL-FIRST, the inter-scanner variability between GE and Philips scanners was the lowest. In terms of segmentation algorithm, both AccuBrain® and FreeSurfer employ an atlas-based segmentation method, while FSL-FIRST uses a model-based segmentation method. The performance of atlas-based segmentation depends on how well the intensity in the template image matches that in the image to be segmented, while model-based segmentation relies more on fitting a prior model to the image to be segmented. The images acquired from GE and Siemens scanners are more similar in intensity level and image contrast than those of the other scanner pairs, which may explain the better reproducibility of the data from GE and Siemens scanners with AccuBrain® and FreeSurfer. In contrast, FSL-FIRST, which is less affected by intensity level, did not follow the same trend of pairwise inter-scanner variability in brain volumetry as AccuBrain® and FreeSurfer. Moreover, FSL-FIRST presented the highest CV values in all pairwise inter-scanner comparisons, indicating inferior inter-scanner reproducibility.
Regarding the applications of the three segmentation tools, each has its own strengths. For example, although FreeSurfer takes the longest time to process one dataset, it supports not only quantification of subcortical brain volumetry, but also cortical parcellation and quantification. FSL-FIRST also enables surface-based morphometry analysis of the subcortical structures in addition to quantification of brain volumetry. As this paper mainly addresses the reproducibility of brain volumetric quantification as affected by ABS tools, a comparison of the other functions of these tools is beyond the scope of this study.
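The pairwise scanner comparison discussed above (GE vs. Philips, GE vs. Siemens, Philips vs. Siemens) can be sketched as follows. This is only an illustration under stated assumptions: the pairwise CV is taken here as the within-subject CV of the two scanners' measurements, averaged over subjects, which may differ from the procedure behind Table 3. The function `pairwise_cv` and all volumes are hypothetical and are not data from this study.

```python
from itertools import combinations
from statistics import mean, stdev


def pairwise_cv(volumes_by_scanner):
    """Mean within-subject CV (%) for every pair of scanners.

    volumes_by_scanner maps a scanner name to a list of volumes,
    one per subject, in the same subject order for every scanner.
    """
    result = {}
    for a, b in combinations(sorted(volumes_by_scanner), 2):
        subject_cvs = [
            stdev([va, vb]) / mean([va, vb]) * 100.0
            for va, vb in zip(volumes_by_scanner[a], volumes_by_scanner[b])
        ]
        result[(a, b)] = mean(subject_cvs)
    return result


# Hypothetical volumes (mm^3) of one structure in three subjects.
volumes = {
    "GE": [4000.0, 4200.0, 3900.0],
    "Philips": [4200.0, 4400.0, 4100.0],
    "Siemens": [4040.0, 4240.0, 3940.0],
}

cvs = pairwise_cv(volumes)
lowest_pair = min(cvs, key=cvs.get)

for pair, value in cvs.items():
    print(pair, f"{value:.2f}%")
print("lowest variability:", lowest_pair)
```

Running the same function on the output of each ABS tool would give one set of pairwise CVs per tool, which is the kind of comparison summarized in Table 3.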
Of note, relatively high CVs (indicating inter-scanner variability in brain volumetric quantification) in comparisons involving a specific scanner do not necessarily imply that this scanner is inferior to the others, as contrast and intensity level can be changed by adjusting imaging parameters [15]. Although the segmentation algorithm is the primary factor that influences inter-scanner reproducibility, the effect of the pulse sequence selected for a specific scanner should not be underestimated, since it also has a large impact on the quantification results of brain volumetry. Misclassification rates can be reduced by a suitable choice of pulse sequences [17], and the CV values obtained in our study might be reduced by adjusting image acquisition parameters, which warrants further validation.
Segmentation and quantification of specific brain regions are common tasks in the study of neurological disorders such as movement disorders [18], Alzheimer’s disease [19] and epilepsy [20]. Disease progression is often reported as an annualized rate of tissue volume loss, which may be very small [2]. Therefore, highly reproducible measurements are important for detecting and monitoring brain volumetric changes across multiple time points. Routine use of brain morphology analysis in clinical care requires reliable and reproducible measurements, because radiologists often advise on treatment decisions according to brain volumetric changes [2]. High reproducibility is also necessary for detecting the subtle yet important changes of brain disease, especially in multi-site research. The change of interest cannot be studied if brain volume measurements differ substantially between scanners [21, 22]. Against this background, the proper selection of brain segmentation software is a critical step in computer-aided diagnosis and measurement [3]. In addition, choosing the same scanner manufacturer, field strength, head coil, magnetic gradient [23] and pulse sequence [9] helps to improve inter-scanner reproducibility.
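The relation between a small annualized rate of volume loss and measurement variability can be illustrated with a short Python sketch. It is a crude comparison, not a formal power analysis: it simply assumes that a change is hard to distinguish from noise when its magnitude does not exceed the measurement CV. The function names, volumes and CV values are hypothetical and are not taken from this study.

```python
def annualized_change_percent(v_baseline, v_followup, years):
    """Annualized volume change as a percentage of the baseline volume."""
    return (v_followup - v_baseline) / v_baseline / years * 100.0


def exceeds_variability(change_percent, cv_percent):
    """Crude check (assumption): is the absolute change larger than the CV?"""
    return abs(change_percent) > cv_percent


# Hypothetical structure shrinking from 4000 to 3880 mm^3 over two years.
rate = annualized_change_percent(4000.0, 3880.0, 2.0)

# The same change compared against two hypothetical measurement CVs.
detectable_low_cv = exceeds_variability(rate, 0.5)
detectable_high_cv = exceeds_variability(rate, 2.5)

print(f"annualized change: {rate:.2f}% per year")
print("exceeds a CV of 0.5%:", detectable_low_cv)
print("exceeds a CV of 2.5%:", detectable_high_cv)
```

With these assumed numbers, a loss of about 1.5% per year stands out against a CV of 0.5% but not against a CV of 2.5%, which is why low inter-scanner variability matters for longitudinal and multi-site measurements.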
This study has some limitations that need to be considered. First, the results of our study were based on examinations of young healthy volunteers. Therefore, the variability of brain volumetry in a cohort with severe brain atrophy and/or brain lesions remains unclear. The accuracy of ABS tools might decrease when brain anatomic segmentation is performed in patients with demyelinating lesions (e.g. multiple sclerosis), mass-like lesions (e.g. tumors) [24] or brain atrophy. In this respect, further studies on the reproducibility of ABS tools in brain volumetry should expand the tested cohort from healthy individuals to individuals with brain lesions and/or atrophy. Second, as the primary goal of this study was to test inter-scanner reproducibility under conditions resembling clinical practice, the imaging parameters applied in this study were those used in daily clinical routine without any additional adjustment, and the software parameters were left at their defaults without any specific preference in parameter selection [2]. However, it has been reported that appropriate adjustment of image acquisition parameters can help achieve better reproducibility of brain volumetry [25]. Therefore, future efforts should also aim to identify the optimal imaging parameters and protocols to further improve inter-scanner reproducibility in multicenter studies.