The Shape Up! Adults study population has been described previously [15] and is summarized in Table S1 of the Supplementary Material. Only this subset of the total data was used for training and testing the body composition regression models.
Geometric reconstruction error for 3DAE and PCA shape models of four increasing sizes is shown in Table 2. The dimensionality d is either the number of PCA coefficients used to parameterize shape (linear model) or the number of latent variables in the 3DAE bottleneck layer (nonlinear model) that connects the encoder module to the symmetric decoder. Reconstruction error was calculated as the mean absolute error (MAE) between the original and reconstructed 3D vertex positions. As expected, larger models reconstructed the test data with lower error. The linear and deep methods had comparable geometric reconstruction accuracy at the first three model sizes. At the largest size (d=4284), PCA achieved lower reconstruction error, whereas 3DAE reconstruction error leveled off just above 2 mm MAE.
Table 2. Per-vertex reconstruction error measured as mean absolute error (MAE) between input and decoded meshes for held-out test meshes from Shape Up! Adults (n=424). The dimensionality (d) represents the number of latent layer variables in the autoencoder bottleneck or the number of principal components used for a linear shape space reconstruction.
| | d=49 | d=301 | d=630 | d=4284 |
|---|---|---|---|---|
| 3DAE MAE (mm) | 5.16 | 2.57 | 2.18 | 2.02 |
| PCA MAE (mm) | 5.26 | 2.41 | 2.23 | 1.0 |
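As a rough illustration of how a linear shape space of dimensionality d reconstructs meshes, and of how a per-vertex error can be computed, the sketch below fits PCA to flattened vertex coordinates with NumPy. It assumes the meshes are already in dense vertex correspondence and that MAE is the mean Euclidean distance between corresponding vertices. The function names (`fit_pca`, `reconstruct`, `per_vertex_mae`) and the random toy data are hypothetical stand-ins, not the pipeline used to produce Table 2.

```python
import numpy as np

def fit_pca(train_meshes):
    """Fit a linear shape space to meshes of shape (n_meshes, n_vertices, 3)."""
    x = train_meshes.reshape(len(train_meshes), -1)   # flatten to (n, 3 * n_vertices)
    mean = x.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt

def reconstruct(meshes, mean, vt, d):
    """Encode meshes as d PCA coefficients, then decode back to vertex positions."""
    x = meshes.reshape(len(meshes), -1)
    coeffs = (x - mean) @ vt[:d].T                      # (n, d) shape parameters
    return (coeffs @ vt[:d] + mean).reshape(meshes.shape)

def per_vertex_mae(original, recon):
    """Mean Euclidean distance between corresponding vertices (assumed MAE definition)."""
    return float(np.linalg.norm(original - recon, axis=-1).mean())

# Hypothetical toy data standing in for registered meshes: 100 vertices each
rng = np.random.default_rng(0)
train = rng.normal(size=(50, 100, 3))
test = rng.normal(size=(10, 100, 3))
mean, vt = fit_pca(train)
test_errors = {d: per_vertex_mae(test, reconstruct(test, mean, vt, d)) for d in (5, 20, 49)}
```

Because the PCA subspaces are nested, adding components can only reduce the squared residual, which matches the steady decrease in PCA error with d.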
Figure 3 shows body composition prediction results for d=4284 models using different combinations of linear and nonlinear shape feature extraction and regression methods. Excluding the baseline column, each successive column changes exactly one model component while holding all others constant: OLS to GPR, PCA to 3DAE, and 4284 total parameters to 400x64 parameters. All model permutations were trained and tested on exactly the same mesh data.
The baseline GPR model accuracy is the RMSE of a regression using only the known features height, weight, and age, without conditioning on 3DO shape features, as was done previously in [15] and [29]. This RMSE was higher than that of every subsequent regression model using shape features as input. The reduction in prediction error when shape information is introduced demonstrates that 3DO shape is a useful signal for body composition prediction, even when nonlinear regression algorithms are employed.
PCA-OLS is equivalent to the linear methods of prior work: a linear regression model that predicts body composition from PCA features. The PCA model had 4284 parameters to match the dimensionality of the 3DAE exactly.
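A minimal sketch of a PCA-OLS pipeline, assuming flattened mesh coordinates as input and an intercept term in the regression. The helper names (`pca_features`, `ols_fit`, `ols_predict`, `rmse`) and the synthetic data are illustrative only and are not the prior-work implementation.

```python
import numpy as np

def pca_features(x_train, x_test, d):
    """Project flattened meshes onto the first d principal components of the training set."""
    mean = x_train.mean(axis=0)
    _, _, vt = np.linalg.svd(x_train - mean, full_matrices=False)
    return (x_train - mean) @ vt[:d].T, (x_test - mean) @ vt[:d].T

def ols_fit(features, target):
    """Ordinary least squares with an intercept column."""
    design = np.column_stack([np.ones(len(features)), features])
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return beta

def ols_predict(beta, features):
    return np.column_stack([np.ones(len(features)), features]) @ beta

def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

# Synthetic check: a target that is exactly linear in the flattened coordinates
rng = np.random.default_rng(1)
x_train, x_test = rng.normal(size=(40, 30)), rng.normal(size=(10, 30))
w = rng.normal(size=30)
y_train, y_test = x_train @ w + 5.0, x_test @ w + 5.0
f_train, f_test = pca_features(x_train, x_test, d=30)
beta = ols_fit(f_train, y_train)
test_rmse = rmse(ols_predict(beta, f_test), y_test)
```

When d spans the full coordinate space, PCA is just a rotation, so OLS on PCA coefficients recovers any exactly linear target; truncating d is what restricts the linear model.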
PCA-GPR is a hybrid pipeline that predicts body composition with nonlinear GPR from linear PCA features. Relative to linear regression, nonlinear regression from linear shape features achieved lower RMSE on every predicted metric except arm lean mass in females (equal) and leg lean mass in females (higher by 0.02 kg, or 5%). These results indicate that, with all other factors held constant, GPR was a more accurate regression method than OLS for most body composition targets.
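To make the OLS-versus-GPR contrast concrete, the following NumPy sketch computes the Gaussian process posterior mean with a squared-exponential (RBF) kernel. The kernel choice, the fixed length scale and noise values, and the `SimpleGPR` class are assumptions for illustration; this section does not specify the kernel or how hyperparameters were fit, and a practical model would typically optimize them by maximizing the marginal likelihood.

```python
import numpy as np

def rbf_kernel(a, b, length_scale):
    """Squared-exponential kernel between the rows of a and b."""
    sq_dist = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

class SimpleGPR:
    """GP regression posterior mean with fixed (not optimized) hyperparameters."""

    def __init__(self, length_scale=1.0, noise=1e-3):
        self.length_scale, self.noise = length_scale, noise

    def fit(self, x, y):
        self.x, self.y_mean = x, float(y.mean())
        k = rbf_kernel(x, x, self.length_scale) + self.noise * np.eye(len(x))
        chol = np.linalg.cholesky(k)  # K + noise*I = L L^T
        # Two triangular solves give alpha = (K + noise*I)^-1 (y - mean)
        self.alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y - self.y_mean))
        return self

    def predict(self, x_new):
        return rbf_kernel(x_new, self.x, self.length_scale) @ self.alpha + self.y_mean

def ols_predict(x_train, y_train, x_new):
    """Linear baseline with intercept, for comparison."""
    design = np.column_stack([np.ones(len(x_train)), x_train])
    beta, *_ = np.linalg.lstsq(design, y_train, rcond=None)
    return np.column_stack([np.ones(len(x_new)), x_new]) @ beta

# Toy nonlinear relationship between a 1-D feature and a target
x_train = np.linspace(0.0, 2.0 * np.pi, 40)[:, None]
y_train = np.sin(x_train[:, 0])
x_test = np.linspace(0.1, 6.0, 25)[:, None]
y_test = np.sin(x_test[:, 0])

gpr_rmse = float(np.sqrt(np.mean((SimpleGPR().fit(x_train, y_train).predict(x_test) - y_test) ** 2)))
ols_rmse = float(np.sqrt(np.mean((ols_predict(x_train, y_train, x_test) - y_test) ** 2)))
```

Swapping `ols_predict` for `SimpleGPR` while keeping the same input features mirrors the PCA-OLS to PCA-GPR step: only the regression model changes.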
3DAE-GPR is the fully nonlinear pipeline, in which body composition is predicted by GPR from 3DAE deep features. Results are shown for both the bottleneck layer (7x612) and the layer with the lowest RMSE. The third layer (400x64) was the most accurate feature layer for males, and the first layer (6890x16) was the most accurate for females. In males, 3DAE feature extraction lowered RMSE relative to the previous model permutations, but in females no 3DAE feature layer outperformed PCA-GPR. This result suggests that the features extracted from female meshes were less informative than those from male meshes, or were highly correlated regardless of the extraction method and model size. GPR improved accuracy relative to OLS for nearly all targets, but 3DAE feature extraction improved accuracy only in males.
OLS regression from 3DAE features to body composition produced very low correlations with DXA, which we attribute to the nonlinearity of the deep features; these results are not reported. We also tested concatenating all feature layers into a single multi-scale feature vector for GPR but did not achieve lower errors by doing so.
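The per-layer comparison in Figure 3 and the multi-scale concatenation described above can be sketched as follows. Each layer's activations (for example, 400x64) are flattened into one vector per subject and scored by held-out RMSE. To keep the block short, a linear kernel ridge regressor stands in for GPR; the layer names, shapes, and data are hypothetical.

```python
import numpy as np

def flatten_layer(activations):
    """(n_subjects, n_nodes, n_channels) -> (n_subjects, n_nodes * n_channels)."""
    return activations.reshape(len(activations), -1)

def linear_kernel_ridge(x_tr, y_tr, x_new, lam=1.0):
    """Stand-in regressor (linear kernel ridge); GPR would be swapped in here."""
    alpha = np.linalg.solve(x_tr @ x_tr.T + lam * np.eye(len(x_tr)), y_tr - y_tr.mean())
    return x_new @ x_tr.T @ alpha + y_tr.mean()

def select_layer(layers_tr, layers_va, y_tr, y_va, fit_predict=linear_kernel_ridge):
    """Score each feature layer by held-out RMSE and return the best layer name."""
    scores = {}
    for name in layers_tr:
        pred = fit_predict(flatten_layer(layers_tr[name]), y_tr, flatten_layer(layers_va[name]))
        scores[name] = float(np.sqrt(np.mean((pred - y_va) ** 2)))
    return min(scores, key=scores.get), scores

def multiscale_features(layers):
    """Concatenate every flattened layer into one feature vector per subject."""
    return np.hstack([flatten_layer(layers[name]) for name in sorted(layers)])

# Hypothetical layers: "layer_a" carries the signal, "layer_b" is pure noise
rng = np.random.default_rng(2)
layers_tr = {"layer_a": rng.normal(size=(60, 5, 2)), "layer_b": rng.normal(size=(60, 8, 4))}
layers_va = {"layer_a": rng.normal(size=(20, 5, 2)), "layer_b": rng.normal(size=(20, 8, 4))}
w = rng.normal(size=10)
y_tr, y_va = flatten_layer(layers_tr["layer_a"]) @ w, flatten_layer(layers_va["layer_a"]) @ w
best, scores = select_layer(layers_tr, layers_va, y_tr, y_va)
```

Concatenation mixes informative and uninformative layers into one kernel, which is one plausible reason a multi-scale vector need not beat the single best layer.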
For visual clarity, we rescaled the charts in Figure 3 by normalizing each column by the RMSE of the fully linear PCA-OLS model and plotted the values in Figure 4; a normalized RMSE of 100% therefore corresponds to PCA-OLS. In males, every subsequent model permutation had an RMSE below that of PCA-OLS. In females, only leg lean mass increased in RMSE when moving from PCA-OLS to PCA-GPR. However, as noted above, most female RMSEs increased when PCA was replaced with the 3DAE. Overall, nonlinear GPR was more accurate than linear OLS for most body composition metrics, but in females the 3DAE was not always a better feature extractor than linear PCA.
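The normalization used for Figure 4 amounts to dividing each RMSE by the PCA-OLS RMSE for the same target. A minimal sketch, with hypothetical RMSE values used only to show the scaling:

```python
def normalize_to_reference(rmse_table, reference="PCA-OLS"):
    """Express each model's RMSE as a percentage of the reference model's RMSE, per target."""
    return {
        target: {model: 100.0 * value / row[reference] for model, value in row.items()}
        for target, row in rmse_table.items()
    }

# Hypothetical RMSEs for one target, only to illustrate the scaling
example = {"percent fat": {"PCA-OLS": 4.0, "PCA-GPR": 3.0, "3DAE-GPR": 2.0}}
normalized = normalize_to_reference(example)
# normalized["percent fat"] == {"PCA-OLS": 100.0, "PCA-GPR": 75.0, "3DAE-GPR": 50.0}
```

Normalizing per target puts metrics with very different units (kg, %) on one axis, so values below 100% read directly as improvement over PCA-OLS.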
Test-retest precision of percent fat and visceral fat estimation is shown in Table 3. For visceral fat mass, precision was expressed as the coefficient of variation (%CV), calculated according to the definition of Glüer et al. [30] from two same-day scans of each individual in the test set. For percent fat, the RMSE between trial 1 and trial 2 is reported instead, because %CV is not conventionally used for measurements that are already percentages.
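A sketch of the two precision statistics, assuming exactly two scans per subject. The %CV follows our reading of the root-mean-square CV of Glüer et al. [30]: each subject's SD is |x1 - x2|/sqrt(2), it is divided by that subject's mean, and the per-subject CVs are combined as a root mean square. The function names and example inputs are hypothetical.

```python
import math

def rms_cv_percent(trial1, trial2):
    """Root-mean-square %CV over subjects with two repeat measurements each."""
    squared_cvs = []
    for a, b in zip(trial1, trial2):
        mean = (a + b) / 2.0
        sd = abs(a - b) / math.sqrt(2.0)  # sample SD of two values
        squared_cvs.append((sd / mean) ** 2)
    return 100.0 * math.sqrt(sum(squared_cvs) / len(squared_cvs))

def paired_rmse(trial1, trial2):
    """RMSE between trial-1 and trial-2 estimates (used for percent fat)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(trial1, trial2)) / len(trial1))

vfat_trial1 = [0.52, 1.31, 0.87]  # hypothetical visceral fat masses (kg), scan 1
vfat_trial2 = [0.55, 1.25, 0.90]  # same subjects, scan 2
print(f"%CV = {rms_cv_percent(vfat_trial1, vfat_trial2):.1f}%")
```

Dividing by each subject's mean makes %CV scale-free for a mass in kg, whereas for percent fat the paired RMSE stays in percentage points.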
GPR was more precise on retest than OLS, as shown by the PCA-OLS versus PCA-GPR comparison, in which GPR reduced precision error by roughly 30%. The 3DAE also reduced precision error relative to PCA in both sexes, as shown by the 3DAE-GPR versus PCA-GPR comparison. The 3DAE coupled with nonlinear GPR had the highest precision (lowest precision error). Compared with DXA, percent fat precision error for 3DAE-GPR was roughly two to two and a half times as high. However, visceral fat precision error for 3DAE-GPR was comparable to or lower than that of DXA (lower for the d=301 model in both sexes and for the d=4284 model in females) and substantially lower than that of prior work [15], indicating that the 3DAE paired with GPR is much more reliable on retest than prior approaches. The larger d=4284 3DAE model was comparable in precision to the smaller d=301 model.
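The precision comparisons above can be checked directly against Table 3. This sketch recomputes the relative reduction in precision error from PCA-OLS to PCA-GPR and the ratio of 3DAE-GPR (d=4284) precision error to that of DXA, using only values transcribed from the table.

```python
# Values transcribed from Table 3 (precision error; lower is better)
precision = {
    "visceral fat %CV, males": {"PCA-OLS 4284": 7.2, "PCA-GPR 4284": 5.2, "3DAE-GPR 4284": 5.0, "DXA": 4.8},
    "visceral fat %CV, females": {"PCA-OLS 4284": 8.6, "PCA-GPR 4284": 6.2, "3DAE-GPR 4284": 5.5, "DXA": 6.3},
    "percent fat RMSE, males": {"PCA-OLS 4284": 1.6, "PCA-GPR 4284": 1.1, "3DAE-GPR 4284": 0.9, "DXA": 0.5},
    "percent fat RMSE, females": {"PCA-OLS 4284": 2.0, "PCA-GPR 4284": 1.4, "3DAE-GPR 4284": 1.2, "DXA": 0.5},
}

# Relative reduction in precision error from swapping OLS for GPR on the same PCA features
gpr_gain = {m: 100.0 * (r["PCA-OLS 4284"] - r["PCA-GPR 4284"]) / r["PCA-OLS 4284"]
            for m, r in precision.items()}
# Precision error of the d=4284 3DAE-GPR model as a multiple of DXA's
ratio_to_dxa = {m: r["3DAE-GPR 4284"] / r["DXA"] for m, r in precision.items()}

for metric in precision:
    print(f"{metric}: OLS->GPR {gpr_gain[metric]:.1f}% lower, 3DAE-GPR/DXA = {ratio_to_dxa[metric]:.2f}")
```

The reductions fall between about 28% and 31%; the percent fat ratios to DXA are 1.8 (males) and 2.4 (females); and the d=4284 visceral fat %CV in males (5.0%) sits slightly above DXA (4.8%).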
Table 3. Test-retest precision between repeat 3DO scan pairs of each participant in the SUA test set on the same scan device. 3DAE models were benchmarked with GPR trained from their bottleneck layers to standardize the comparison between model sizes. 3DAE was more precise than PCA, and GPR was more precise than OLS.
| Visceral fat %CV | 3DAE-GPR 301 | 3DAE-GPR 4284 | PCA-GPR 4284 | PCA-OLS 4284 | Tian et al. 2022 [15] | DXA (Criterion) |
|---|---|---|---|---|---|---|
| Males (n=143) | 4.4% | 5.0% | 5.2% | 7.2% | 8.8% | 4.8% |
| Females (n=199) | 5.5% | 5.5% | 6.2% | 8.6% | 12.3% | 6.3% |
| Percent fat precision RMSE | 3DAE-GPR 301 | 3DAE-GPR 4284 | PCA-GPR 4284 | PCA-OLS 4284 | Tian et al. 2022 [15] | DXA (Criterion) |
|---|---|---|---|---|---|---|
| Males (n=143) | 1.0% | 0.9% | 1.1% | 1.6% | 1.9% | 0.5% |
| Females (n=199) | 1.2% | 1.2% | 1.4% | 2.0% | 2.9% | 0.5% |
Table 4 compares our 3DAE-GPR model against prior work, using the d=4284 latent size model and Gaussian process regression on the best performing feature layer (third for males, first for females). Percent fat (PFAT) and visceral fat mass (VFAT) were selected as the comparison targets because they had the lowest accuracy in previous work based on linear models.
GPR using exactly the same PCA features as Tian et al. (PCA-GPR M391/F457) lowered the RMSE relative to OLS for percent fat in both sexes and for visceral fat in females, but slightly increased the RMSE for visceral fat in males (by 8%, from 0.12 to 0.13 kg). The models presented in [15] were built on a dataset an order of magnitude smaller than the one used in this work. These results suggest that linear pipelines may be more competitive with nonlinear methods when training data are limited. We included RMSEs for our best performing, maximum size PCA model (PCA-GPR 4284) to show that the increased parameter count did not cause overfitting or degrade accuracy relative to a more compact model.
Our fully nonlinear model, 3DAE-GPR, produced the lowest percent fat and visceral fat estimation errors of all 3DO body composition studies to date (bolded in Table 4), tying PCA-GPR 4284 on visceral fat. Compared with Tian et al. (2022), our best 3DAE-GPR model also achieved lower RMSE on all 10 body composition metrics shown in Figure 4 (Supplementary Table 2). However, metrics other than percent fat and visceral fat already correlated highly with DXA in prior work using linear methods, so the comparison in Table 4 focuses on the metabolically significant and previously underperforming percent fat and visceral fat predictions. The test set used in this work was the same as in [15]; however, the training dataset was greatly expanded in size and scope relative to prior work.
Table 4. Root-mean-squared errors (RMSE) for predicted percent fat (PFAT) and visceral fat (VFAT) from all current 3D optical body composition prediction studies on Shape Up! Adults, compared with the 3DAE-GPR prediction of the d=4284 model using the most accurate feature layer identified in Figure 3. Best performing values are bolded.
| Paper | N test meshes (M / F) | PFAT RMSE (%), M | PFAT RMSE (%), F | VFAT RMSE (kg), M | VFAT RMSE (kg), F |
|---|---|---|---|---|---|
| Ng et al (2019) Anthro only | 177 / 230 | 4.03 | 3.99 | 0.15 | 0.14 |
| Ng et al (2019) [15] | 177 / 230 | 3.55 | 3.88 | 0.14 | 0.13 |
| Tian et al (2020) [28] | 31 / 39 | 3.90 | 3.29 | 0.15 | 0.17 |
| Wong et al (2021) [31] | 159 / 202 | 2.73 | 3.46 | 0.13 | 0.13 |
| Tian et al (2022) [15] | 182 / 248 | 3.24 | 4.22 | 0.12 | 0.14 |
| PCA-GPR M391/F457 [15] | 182 / 248 | 2.79 | 3.09 | 0.13 | 0.12 |
| PCA-GPR 4284 | 181 / 239 | 2.68 | 2.85 | **0.11** | **0.11** |
| 3DAE-GPR 4284 | 181 / 239 | **2.50** | **2.81** | **0.11** | **0.11** |
Ablation results
Figure 5 shows mesh reconstructions from a d=4284 model trained for 400 epochs on the fine-tuning ensemble training data, starting from either (1) a random initialization or (2) a pretrained initialization obtained from 200 epochs of training on DFAUST data only. The model trained from a random initialization achieved 22.7 mm MAE on test data reconstruction, more than 10 times the 2.02 mm error reported in Table 2. Pretraining on 40,000 DFAUST meshes was therefore essential for creating an accurate shape model with a nonlinear 3DAE.
Figure 6 shows geometric reconstruction error as heat maps for a male and a female subject before and after fine-tuning the d=4284 3DAE model with high resolution Shape Up! and CAESAR data. Without fine-tuning, the 3DAE model, trained exclusively on DFAUST data, was equivalent to the model presented by Zhou et al. [18] and achieved a 3D reconstruction error of 8.7 mm, compared with 2.0 mm for the fine-tuned model (Table 2). This model generalized very poorly to unseen scans of many unique individuals, such as those in Shape Up!, because DFAUST contains only 10 unique individuals captured in thousands of different poses. A 3DAE model for clinical machine learning applications must generalize well to any individual from the general population scanned in a neutral pose. Our fine-tuned model represented the non-rigid, identity-dependent deformations of unique individuals much more accurately.
Withholding non-SUA data from 3DAE training did not improve 3D reconstruction error on SUA test meshes. In all three ablation trials, reconstruction error on the same SUA test set (2.58 mm, 2.71 mm, and 2.76 mm when removing only SUK, only CAESAR, and both, respectively) was higher than that of the model trained on all data combined (2.57 mm, Table 2). This ablation result supports the inclusion of a diverse dataset spanning multiple collection protocols.
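The ablation differences can be expressed relative to the all-data model with simple arithmetic on the reported MAE values; the dictionary keys below are descriptive labels only.

```python
# Reconstruction MAE (mm) on the SUA test set, as reported in the text
full_data_mae = 2.57
ablations = {"without SUK": 2.58, "without CAESAR": 2.71, "without SUK and CAESAR": 2.76}

# Percentage increase in MAE relative to training on all data combined
relative_increase = {
    name: 100.0 * (mae - full_data_mae) / full_data_mae for name, mae in ablations.items()
}
for name, pct in relative_increase.items():
    print(f"{name}: +{pct:.1f}% MAE relative to training on all data")
```

Removing SUK alone changes reconstruction error by under 0.5%, while removing CAESAR, alone or together with SUK, raises it by roughly 5% to 7%.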
GPR models trained on features from the 3DAE model built without SUK data showed RMSE differences of about 1% in both directions; not all target variables had lower error when SUK was included. This variation is likely attributable to noise and does not justify excluding SUK. Excluding CAESAR data from the training scans had similarly negligible effects on male prediction accuracy but increased every female RMSE, by up to 4%. Excluding both SUK and CAESAR produced results similar to excluding only CAESAR, but with even higher errors in females. Overall, including all available 3D mesh data when training the shape and regression models produced the lowest errors for both 3D reconstruction and body composition prediction.