Volume estimates
Overall, the discovery and validation datasets gave similar volume estimates with the 95% HDIs of the trimmed means overlapping (Table 2).
Effects on TIV
The repeated measures T2 ANOVA on the discovery set revealed (i) a significant effect of the modality (F(1,257) = 1748; p < .001) such that adding the T2w image decreased the TIV by ~37 ml; (ii) a significant effect of the parameterisation (F(1,257) = 107; p < .001) such that going from 1 Gaussian to 2 Gaussians increased TIV by ~1 ml; and (iii) no interaction (F(1,257) = 0.57; p = .41). Although there was no main scanner effect (F(1,257) = 1.61; p = .2), the scanner interacted with segmentation parameters (scanner * modality * parameterisation, F(1,257) = 12.31; p < .001) such that the effect of adding the T2w image differed between the 1 Gaussian and 2 Gaussians models by -1.4 ml for the Prisma and 0.03 ml for the Prisma-fit. The main effects observed in the discovery dataset were replicated in the validation dataset, such that adding a T2w image decreased TIV by ~29 ml, whereas adding 1 Gaussian increased TIV by ~3 ml (p = 0.0017). Effect sizes were comparable between datasets except for a significantly larger effect of adding 1 Gaussian in the validation dataset compared to the discovery dataset (while similar in size to the discovery sub-group using the Prisma scanner only; see confidence intervals, Table 3 and Fig. 1).
Table 2
Trimmed mean total intracranial (TIV), grey matter (GM), white matter (WM), and CSF volumes with 95% highest density intervals (HDI) for the discovery (N = 259) and validation (N = 87) datasets.
| Discovery set | Validation set |
| TIV |
T1w input, 1 Gaussian per tissue class | 1442 ml [1401 1463] | 1399 ml [1311 1443] |
T1w input, 2 Gaussians per tissue class | 1443 ml [1405 1465] | 1400 ml [1327 1444] |
T1w and T2w inputs, 1 Gaussian per tissue class | 1404 ml [1363 1425] | 1366 ml [1253 1411] |
T1w and T2w inputs, 2 Gaussians per tissue class | 1405 ml [1362 1427] | 1373 ml [1287 1416] |
| GM |
T1w input, 1 Gaussian per tissue class | 741 ml [722 752] | 722 ml [681 744] |
T1w input, 2 Gaussians per tissue class | 725 ml [703 736] | 708 ml [670 727] |
T1w and T2w inputs, 1 Gaussian per tissue class | 721 ml [701 733] | 704 ml [654 726] |
T1w and T2w inputs, 2 Gaussians per tissue class | 731 ml [709 741] | 710 ml [673 730] |
| WM |
T1w input, 1 Gaussian per tissue class | 457 ml [435 466] | 451 ml [426 466] |
T1w input, 2 Gaussians per tissue class | 471 ml [450 479] | 464 ml [435 481] |
T1w and T2w inputs, 1 Gaussian per tissue class | 431 ml [415 440] | 437 ml [415 453] |
T1w and T2w inputs, 2 Gaussians per tissue class | 463 ml [444 471] | 462 ml [429 478] |
| CSF |
T1w input, 1 Gaussian per tissue class | 243 ml [233 251] | 218 ml [196 231] |
T1w input, 2 Gaussians per tissue class | 246 ml [234 253] | 221 ml [196 237] |
T1w and T2w inputs, 1 Gaussian per tissue class | 252 ml [240 258] | 218 ml [187 232] |
T1w and T2w inputs, 2 Gaussians per tissue class | 212 ml [201 217] | 193 ml [167 205] |
Table 3
TIV trimmed mean differences with 95% HDI for simple effects.
| Discovery set | Validation set |
1 Gaussian per tissue class Effect of adding a T2w image | -37.3 ml [-42 -35] | -32.2 ml [-40 -28] |
2 Gaussians per tissue class Effect of adding a T2w image | -36.9 ml [-40 -35] | -27.1 ml [-33 -24] |
T1w as model input Effect of adding a Gaussian | + 0.68 ml [0.3 0.89] | + 0.68 ml [0.14 1.06] |
T1w and T2w as model inputs Effect of adding a Gaussian | + 1.19 ml [0.62 1.48] | + 5.69 ml [3.91 6.82] |
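The entries of Tables 2 and 3 are trimmed means with 95% intervals. As a rough illustration of how such a paired effect could be summarised, the sketch below computes a 20% trimmed mean of per-subject differences and a percentile bootstrap interval. It rests on several assumptions: the data are simulated, the percentile bootstrap is only a stand-in for the HDI procedure (which is not described in this section), and taking the trimmed mean of the paired differences is one of several possible definitions of the effect.

```python
import random

def trimmed_mean(values, proportion=0.2):
    """Mean after dropping `proportion` of the sorted values from each tail."""
    ordered = sorted(values)
    cut = int(proportion * len(ordered))  # number of values dropped per tail
    kept = ordered[cut:len(ordered) - cut]
    return sum(kept) / len(kept)

def bootstrap_interval(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval of the trimmed mean (stand-in for an HDI)."""
    rng = random.Random(seed)
    n = len(values)
    estimates = sorted(
        trimmed_mean([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical per-subject TIV estimates (ml); NOT the study data.
rng = random.Random(1)
tiv_t1 = [rng.gauss(1440, 120) for _ in range(50)]
tiv_t1_t2 = [v - 37 + rng.gauss(0, 5) for v in tiv_t1]

# Paired differences: effect of adding the T2w image for each subject.
differences = [b - a for a, b in zip(tiv_t1, tiv_t1_t2)]
estimate = trimmed_mean(differences)
low, high = bootstrap_interval(differences)
print(f"{estimate:.1f} ml [{low:.1f} {high:.1f}]")
```

Trimming makes the estimate robust to the few subjects with extreme changes, which is relevant here given the outlier subgroup described later in the clustering analysis.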
Effects on tissue types
Overall, results indicate significant modulatory effects of adding a T2w image as input and of adding a Gaussian to the model with differential effect on tissue types. Strong effect sizes were also reproduced between datasets (overlap of 95% HDI), and all changes are summarised in Table 4.
Comparable effects were observed for grey and white matter tissues. For GM, the repeated measures ANOVA revealed a main effect of input (F(1,257) = 97; p < .001) and parameterisation (F(1,257) = 23; p < .001) and an interaction (F(1,257) = 403; p < .001). Adding a T2w image reduced GM by 21 ml when using a single Gaussian but increased volume estimates by 3.58 ml when using 2 Gaussians. This effect was not replicated in the validation dataset (p = 0.29), for which only a reduction in volume was observed using a single Gaussian but no effect when using 2 Gaussians. For WM, the repeated measures ANOVA revealed a main effect of input (F(1,257) = 1467; p < .001) and parameterisation (F(2,257) = 8970; p < .001) and an interaction (F(1,257) = 1986; p < .001). Adding a T2w image reduced WM by 25 ml when using a single Gaussian and by 7.9 ml when using 2 Gaussians. This effect was not replicated in the validation dataset (p = 0.35), for which a reduction in volume was observed using a single Gaussian but no effect when using 2 Gaussians. There were also significant scanner * modality * parameterisation interactions in the discovery dataset. For GM (F(1,257) = 21; p < .001), the effect of adding the T2w image differed between the 1 Gaussian and 2 Gaussians models by -35 ml for the Prisma and -21 ml for the Prisma-fit. For WM (F(1,257) = 75; p < .001), the effect of adding a T2w image differed between the 1 Gaussian and 2 Gaussians models by -10 ml for the Prisma and -19 ml for the Prisma-fit.
CSF showed the opposite effects to GM/WM. The repeated measures ANOVA revealed a main effect of input (F(1,257) = 102; p < .001) and parameterisation (F(2,156) = 846; p < .001) and an interaction (F(1,257) = 1687; p < .001). Adding a T2w image increased CSF by 9.2 ml when using 1 Gaussian but decreased volume estimates by 32.7 ml when using 2 Gaussians. This effect was not replicated in the validation dataset (p = 0.38), for which a reduction in volume was observed using 2 Gaussians but no effect when using 1 Gaussian only. In the discovery dataset, there were also significant scanner * modality (F(1,257) = 10; p < .001) and scanner * parameterisation effects (F(1,257) = 25; p < .001), such that the effect of adding a T2w image was -3.7 ml for the Prisma and -13.8 ml for the Prisma-fit, and the effect of adding a Gaussian was -24 ml for the Prisma and -15.8 ml for the Prisma-fit.
Table 4
20% trimmed mean GM, WM and CSF volume differences (in ml) with 95% HDI for simple effects.
| Discovery set | Validation set |
1 Gaussian per tissue class Effect of adding a T2w image | GM: -21.3 [-28.2 -18.1] WM: -25.6 [-29 -24.1] CSF: 9.2 [3.8 12.2] | GM: -19.4 [-27.9 -16.5] WM: -13.1 [-17.6 -11.4] CSF: -0.4 [-7.6 2.8] |
2 Gaussians per tissue class Effect of adding a T2w image | GM: 3.5 [-1.6 5.9] WM: -7.93 [-9.3 -6.9] CSF: -32.7 [-38.9 -29.9] | GM: 0.7 [-4.3 4] WM: 0.1 [-3.4 1.3] CSF: -28.6 [-38.9 -24.2] |
T1w as model input Effect of adding a Gaussian | GM: -15.6 [-17.8 -14.5] WM: 13.5 [12.2 14.1] CSF: 2.87 [0.32 4.1] | GM: -18.4 [-14.7 -12.9] WM: 12.5 [10.6 13.5] CSF: 2.8 [-0.9 4.5] |
T1w and T2w as model inputs Effect of adding a Gaussian | GM: 8.4 [2.9 11.77] WM: 31.5 [29.6 32.5] CSF: -38.8 [-45.5 -36] | GM: 5.4 [-4.6 9.79] WM: 25.3 [22.5 26.7] CSF: -24.8 [-32 -21.8] |
Importantly, the relative increases and decreases in GM/WM vs WM/CSF reported were proportional when changing parameterisation, i.e. voxels are attributed differently to the GM/WM/CSF tissue classes when the number of Gaussians in the model changes. In comparison, the 'missing' volumes observed after adding a T2w image are compensated by a re-attribution of intra-cerebral tissue to (mostly) the soft tissue class. This is illustrated in Fig. 2, where one can see that CSF segmented using T1w images alone as input has more voxels all around the edges of the brain.
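The bookkeeping behind this observation can be checked directly from the discovery-set values in Tables 3 and 4. The sketch below sums the GM, WM and CSF changes for each simple effect and compares the result with the reported TIV change. It assumes that TIV is the sum of the three tissue volumes (consistent with Table 2, though not stated explicitly in this section), and agreement can only be approximate because trimmed means are not additive.

```python
# Discovery-set trimmed mean differences (ml), copied from Tables 3 and 4.
effects = {
    "add T2w, 1 Gaussian": {"GM": -21.3, "WM": -25.6, "CSF": 9.2, "TIV": -37.3},
    "add T2w, 2 Gaussians": {"GM": 3.5, "WM": -7.93, "CSF": -32.7, "TIV": -36.9},
    "add Gaussian, T1w": {"GM": -15.6, "WM": 13.5, "CSF": 2.87, "TIV": 0.68},
    "add Gaussian, T1w+T2w": {"GM": 8.4, "WM": 31.5, "CSF": -38.8, "TIV": 1.19},
}

def tissue_sum(row):
    """Net change across the three intracranial tissue classes (ml)."""
    return row["GM"] + row["WM"] + row["CSF"]

for label, row in effects.items():
    total = tissue_sum(row)
    print(f"{label}: GM+WM+CSF = {total:+.2f} ml, "
          f"reported TIV change = {row['TIV']:+.2f} ml")
```

Adding a Gaussian leaves the net change close to zero (volume is redistributed among GM, WM and CSF), whereas adding the T2w image yields a net loss of roughly 37 ml, i.e. volume that leaves the three intracranial classes altogether.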
Tissue volume change
Focusing on change induced by adding the T2w image as input, subjects were classified by increasing/decreasing volumes, taking the three tissue types simultaneously. The classification table (Table 5) shows that the most common change is a decrease in WM volumes. When the generative model includes 1 Gaussian per tissue class, there is also a decrease in GM, while subjects are distributed among increase/decrease of CSF (~ 45% of all subjects). When the generative model includes 2 Gaussians per tissue class, WM changes are accompanied by a consistent CSF decrease. At the same time, subjects are distributed among the increase/decrease of GM (~ 33% of all subjects).
Clustering on volume differences (Fig. 3) revealed, for both the 1 Gaussian and the 2 Gaussians parameterisations, that changes in the discovery dataset can be summarised by 3 clusters with equal covariances (model 6), pulling out one small subgroup of 4 or 5 outlier subjects with substantial CSF change (-91 ml and -131 ml, while the CSF 95% confidence interval upper bound of the trimmed mean is +33 ml and -7 ml). The middle-sized cluster (N = 25 and 33, with 22 subjects in common between the 1 Gaussian and 2 Gaussians parameterisations), while lying in a different subspace than the main cluster, did not show specific univariate characteristics. Applying this model built from the discovery dataset to the validation dataset gave satisfactory results with equivalent mean model errors (the average of subjects' posterior probability), indicating that the model generalises. Computing new models on the validation dataset gave different results, with 8 clusters in the 1 Gaussian parameterisation and 2 clusters in the 2 Gaussians parameterisation. In this latter case, the clustering model (model 7) was similar to the discovery dataset clustering but without the 'outlier' cluster, showing that the segmentation did not just give similar volume estimates but also affected images across subjects in a reproducible manner. The similarity of model 6 in the discovery dataset and model 7 in the validation dataset is striking when looking at the cluster means and covariance matrices: main cluster discovery µ = [3.5 -8.8 -30], diag(𝚺) = [0.22 0.028 0.3]; main cluster validation µ = [5 -2.8 -27], diag(𝚺) = [0.22 0.03 0.27]. Post-hoc class attribution did not segregate sex, group (control/patient), scanner or age in the different clusters.
Table 5
Percentages of subjects showing concurrent GM, WM and CSF volume changes caused by adding a T2w image as input for discovery (Disc) and validation (Valid) datasets when using either 1 or 2 Gaussians in the generative model.
| GM increase | GM decrease |
WM increase | WM decrease | WM increase | WM decrease |
1 Gaussian | CSF increase | Disc: 0 Valid: 0 | Disc: 3.47 Valid: 2.29 | Disc: 0.38 Valid: 0 | Disc: 66.02 Valid: 45.97 |
CSF decrease | Disc: 0 Valid: 0 | Disc: 13.12 Valid: 3.44 | Disc: 0.38 Valid: 0 | Disc: 16.6 Valid: 48.27 |
2 Gaussians | CSF increase | Disc: 0 Valid: 0 | Disc: 0 Valid: 2.29 | Disc: 0 Valid: 0 | Disc: 3.47 Valid: 3.44 |
CSF decrease | Disc: 5.40 Valid: 29.88 | Disc: 56.75 Valid: 24.13 | Disc: 3.47 Valid: 25.28 | Disc: 30.88 Valid: 17.24 |
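The cross-classification summarised in Table 5 amounts to labelling each subject by the sign of their GM, WM and CSF change and tabulating the resulting patterns. A minimal sketch follows; the function names and the four example subjects are hypothetical, and treating a change of exactly zero as a decrease is an arbitrary choice made here for simplicity.

```python
from collections import Counter

def sign_label(delta):
    """Label a volume change; exactly zero is counted as a decrease here."""
    return "increase" if delta > 0 else "decrease"

def classify_changes(gm, wm, csf):
    """Percentage of subjects in each (GM, WM, CSF) increase/decrease pattern."""
    patterns = Counter(
        (sign_label(g), sign_label(w), sign_label(c))
        for g, w, c in zip(gm, wm, csf)
    )
    n = sum(patterns.values())
    return {pattern: 100 * count / n for pattern, count in patterns.items()}

# Hypothetical per-subject volume changes (ml) after adding the T2w image.
gm_change = [-20.0, -18.0, 3.0, -25.0]
wm_change = [-25.0, -22.0, -8.0, -30.0]
csf_change = [9.0, -4.0, -30.0, 12.0]

table = classify_changes(gm_change, wm_change, csf_change)
for pattern, percent in sorted(table.items()):
    print(pattern, f"{percent:.1f}%")
```

Patterns that no subject exhibits are simply absent from the returned dictionary, which corresponds to the zero cells of Table 5.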
Distribution shift function analyses
Looking at how voxels were classified across the whole brain, it is essential to note that 60% of voxels were classified as having some GM, 40% some WM and 50% some CSF (i.e. many deciles are at 0). When adding the T2w image, voxel probabilities changed across all datasets and model parameterisations, and effects were reproduced between datasets (see Fig. 4). Statistical analyses showed that with 1 Gaussian per tissue class, GM and WM deciles were lower on average, i.e. adding T2w images as input leads to voxels with smaller probabilities of GM and WM, consistent with the observed lower volume estimates. With 2 Gaussians per tissue class, the lowest deciles show a lower probability of being GM or WM, but the highest deciles show a higher probability of being GM or WM. This result shows that adding T2w images as input increased the skewness of the Gaussians. Because of the symmetric changes in skewness, volume estimates are virtually unchanged despite increased tissue differentiability. For CSF, adding T2w images as input consistently led to assigning lower probabilities, particularly for the last decile and when using 2 Gaussians. Again, results are consistent with volume changes. The high inter-subject variance can explain the lack of replicability of decreased CSF volumes when using a single Gaussian.
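The core of a shift function is a decile-by-decile comparison of two distributions. The sketch below illustrates that idea only: it uses plain sample quantiles from the Python standard library rather than whichever quantile estimator and inference procedure were used in the analysis, and the voxel probabilities are made up for illustration.

```python
import statistics

def deciles(values):
    """The nine decile cut points of a sample (plain sample quantiles)."""
    return statistics.quantiles(values, n=10, method="inclusive")

def shift_function(reference, comparison):
    """Decile-wise differences (comparison minus reference)."""
    return [c - r for r, c in zip(deciles(reference), deciles(comparison))]

# Hypothetical voxel tissue probabilities for one subject; NOT study data.
probs_t1 = [i / 100 for i in range(101)]   # T1w input only
probs_t1_t2 = [p ** 2 for p in probs_t1]   # T1w and T2w inputs (made up)

shifts = shift_function(probs_t1, probs_t1_t2)
for index, shift in enumerate(shifts, start=1):
    print(f"decile {index}: {shift:+.3f}")
```

Negative values at every decile correspond to the 1 Gaussian pattern described above (uniformly lower probabilities), whereas a change of sign between the low and high deciles corresponds to the 2 Gaussians pattern.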
Segmentation accuracy
In analysing voxels containing major arteries, more accurate tissue segmentation corresponds to a higher-than-one ratio (i.e., low probabilities of being of a given tissue outweighing high probabilities). As shown in Fig. 5, ratios were above 1, indicating that the segmentation performed well. For GM, the repeated measures ANOVA showed a main effect of input (F(1,257) = 112) and parameterisation (F(1,257) = 220) and an interaction (F(1,257) = 594; p < 0.001). Adding T2w images increased the GM ratio when using 1 Gaussian ([0.23 0.28] p = 0.001) but decreased it with 2 Gaussians ([-0.07 -0.04] p = 0.001). This interaction effect was replicated (percentile bootstrap of the difference, p = 0.001), although only the 1 Gaussian case showed an increase ([0.207 0.267] vs [-0.04 0.008]) in the validation set. For WM, similar results were observed with a main effect of input (F(1,257) = 726) and parameterisation (F(1,257) = 2041) and an interaction (F(1,257) = 1341; p = 0.001). Adding T2w images, however, always increased the WM ratio (1 Gaussian [10.3 11.7] p = 0.001; 2 Gaussians [0.74 1.33] p = 0.001). This interaction effect was replicated, although only the 1 Gaussian case showed an increase ([3.2 4.4] vs [-1.52 -0.65]) in the validation set. Finally, for CSF, a main effect of input (F(1,257) = 522) and parameterisation (F(1,257) = 500) and an interaction (F(1,257) = 1661; p = 0.001) were observed. In this case, adding T2w images always increased the CSF ratio (1 Gaussian [0.80 1.08] p = 0.001; 2 Gaussians [2.14 2.48] p = 0.001), and this effect was replicated (interaction p = 0.001; 1 Gaussian [1.34 2.038], 2 Gaussians [2.29 3.1]).
For completeness, it should also be reported that, in the discovery set, the scanner had no main effect on any of the tissue ratios. The scanner type nevertheless always interacted with parameters (GM: input F(1,257) = 6.29, p = .01; parameterisation F(1,257) = 5.82, p = .01; input * parameterisation F(1,257) = 1.2, p = .28; WM: input F(1,257) = 5.45, p = .02; parameterisation F(1,257) = 1.02, p = .29; input * parameterisation F(2,256) = 0.02, p = .95; CSF: input F(1,257) = 1.43, p = .23; parameterisation F(1,257) = 4.34, p = .043; input * parameterisation F(1,257) = 4.94, p = .034). These scanner interaction effects indicate that while the observed changes stand across scanners (no main effect and replication between datasets), the amount of change varies between tissues even when images are tightly matched, as in this dataset.
Next, we considered the classification of tissue in subcortical nuclei. First, only the nucleus accumbens (> 90% GM probability), the mammillary nucleus (~60%), the caudate nucleus (~60%), the hypothalamus (~58%) and the putamen (~54%) showed average GM probabilities above 50% for voxels labelled as such in the atlas. The habenular nuclei (~48%) and external amygdala (~45%) were just below this 50% mark and the ventral pallidum voxels were at ~25%, whilst other nuclei were simply not well captured (voxel GM probabilities below 10% and mostly seen as WM). Second, across all nuclei, confidence intervals of tissue ratios (GM/(WM + CSF)) mostly overlapped, allowing us to conclude that results did not diverge among segmentation approaches (Fig. 6).
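For reference, the tissue ratio GM/(WM + CSF) used for the nuclei can be written out as a short sketch. How probabilities were aggregated within each nucleus is not specified here, so summing voxel probabilities over the region is an assumption, as are the function name and the example values.

```python
def tissue_ratio(gm_probs, wm_probs, csf_probs):
    """GM / (WM + CSF) from voxel probabilities summed over a region."""
    gm_total = sum(gm_probs)
    other_total = sum(wm_probs) + sum(csf_probs)
    if other_total == 0:
        return float("inf")  # region classified entirely as GM
    return gm_total / other_total

# Hypothetical probabilities for three voxels labelled as one nucleus.
gm = [0.6, 0.7, 0.5]
wm = [0.3, 0.2, 0.4]
csf = [0.1, 0.1, 0.1]

ratio = tissue_ratio(gm, wm, csf)
print(f"GM/(WM + CSF) = {ratio:.2f}")
```

A ratio above 1 means the region is predominantly classified as GM; values below 1, as found for most nuclei here, mean that WM and CSF probabilities dominate.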