Body composition is an important subject in oncology because changes in it reflect how cancer affects body mass, with implications for patients’ nutrition, symptoms, and treatments. While weight loss has been considered a prognostic factor, the decrease in specific compartments could be more relevant. Losses in lean body mass (LBM) can result in a wide range of physiological impairments, since this metabolically active compartment plays a role in immune function, glucose metabolism, protein synthesis, and mobility [1–3]. Additionally, body composition has been linked to cancer patients’ outcomes [3, 4]. Sarcopenia has been associated with worse overall survival [5] and with post-surgical complications [6–9]. Sarcopenia and LBM could be a better basis for determining drug doses than body-surface area (BSA) or flat-fixed dosing [10, 11]. Both sarcopenia and decreased LBM are significantly associated with toxicity across different oncology treatments, tumor types, and stages, suggesting an effect of sarcopenia on pharmacokinetics [3, 4, 10–16].
Several methods are available to determine body composition [17], such as anthropometry, computed tomography and magnetic resonance [18–21], dual-energy X-ray absorptiometry (DXA) [22], and bioelectrical impedance analysis (BIA) [23]. Most of them are costly and are not used in clinical practice. Anthropometry, through predictive models, has been used in some clinical fields, but these models were developed only on samples of healthy people, and their implementation requires training, specific supplies, and time [24]. Additionally, these predictive models have methodological issues that should be considered. For instance, all of them use linear models [25–29]. Nowadays, there is a wide set of statistical and computational tools, such as machine learning, to reach a better understanding of data (see Supplementary Material: summary of machine learning and imputation concepts) [30]. A variety of modern statistical learning techniques can be used to predict quantitative variables, such as ridge regression, lasso regression, and generalized additive models (GAM) [30]. Machine learning methods, which share with statistical learning the concept of learning from the data, apply computational algorithms to solve their tasks [31]. One example is random forests (RF). All these techniques, from the classical to the most sophisticated, differ in one key respect: the way in which they can model the data. Some methods are more restrictive, such as linear regression, and others are more flexible, such as ridge, lasso, GAM with smoothing splines, and RF [30, 32]. For selecting the most important variables, techniques such as best subset selection (BSS) and lasso have been developed [30]. Cross-validation, a strategy to avoid overfitting, is another important aspect to consider when a predictive model is developed. Finally, missing data is a frequent problem that could introduce bias and weaken generalizability [33]. Thus, imputation methods are useful tools to handle it [34, 35].
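To make the role of cross-validation concrete, the following is a minimal sketch in Python and not the pipeline used in this study. It fits a one-predictor ridge regression for several penalty values and compares them by 5-fold cross-validated mean squared error. The data are synthetic, and the names `weight`, `muscle`, `fit_ridge_1d`, and `k_fold_mse` are hypothetical and chosen only for illustration.

```python
import random
import statistics


def fit_ridge_1d(x, y, lam):
    """Fit y ~ a + b*x with an L2 penalty lam on the slope b.

    The intercept is not penalized; lam = 0 gives ordinary least squares.
    """
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / (sxx + lam)
    a = my - b * mx
    return a, b


def k_fold_mse(x, y, lam, k=5, seed=0):
    """Average held-out mean squared error over k folds."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # disjoint folds covering all rows
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        a, b = fit_ridge_1d([x[i] for i in train], [y[i] for i in train], lam)
        squared = [(y[i] - (a + b * x[i])) ** 2 for i in held_out]
        fold_errors.append(statistics.fmean(squared))
    return statistics.fmean(fold_errors)


# Synthetic, purely illustrative data (assumption: not patient data).
rng = random.Random(42)
weight = [rng.uniform(50, 100) for _ in range(60)]
muscle = [5 + 0.3 * w + rng.gauss(0, 2) for w in weight]

# Compare candidate penalties by their cross-validated error.
cv_errors = {lam: k_fold_mse(weight, muscle, lam) for lam in (0.0, 1.0, 10.0, 100.0)}
best_lam = min(cv_errors, key=cv_errors.get)
```

Because every candidate penalty is scored only on observations that were held out from fitting, the comparison reflects out-of-sample error rather than fit to the training data, which is the sense in which cross-validation guards against overfitting.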
In summary, body composition has prognostic value and treatment implications, and it is related to patients’ symptoms and care. Specific devices, such as DXA or BIA, can measure it. However, these are used mainly in research and have not been implemented in daily clinical practice. Furthermore, neither equations nor predictive models based on clinical variables have been built in cancer patients to estimate body composition, especially with current machine learning methods.
We performed a study to develop two predictive models that estimate body fat mass and skeletal muscle mass from clinical variables by applying several modern statistical techniques, to analyze the performance of machine learning methods, and to develop a practical tool for everyday use.