The in vivo evaluation of body composition is essential in many clinical investigations, as it allows the status of a range of medical conditions to be accurately described and monitored in patients with cancer[12,13], osteoporosis[14], and many other diseases[15]. In most research and clinical practice, assessing body composition in CT images, such as skeletal muscle, visceral adipose tissue, and subcutaneous adipose tissue, requires manual labeling and measurement. This repetitive and tedious process significantly limits progress in related research and clinical practice. To overcome this challenge, we developed a web application that uses a well-performing model to automatically identify skeletal muscle and adipose tissue in L3 CT images and calculate their areas. This highly automated approach opens up the possibility of large-scale screening, making it easier for clinicians to estimate body composition and use it for research and diagnosis.
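As a concrete illustration of the area-calculation step, the sketch below converts a 2D label mask into per-tissue areas. The label values and the `tissue_area_cm2` helper are hypothetical, not the application's actual API; the sketch assumes a mask with one integer label per tissue class and known pixel spacing from the CT header.

```python
import numpy as np

# Hypothetical label convention: 0 = background, 1 = skeletal muscle,
# 2 = visceral adipose, 3 = subcutaneous adipose (the model's actual
# label map may differ).
LABELS = {"skeletal_muscle": 1, "visceral_adipose": 2, "subcutaneous_adipose": 3}

def tissue_area_cm2(mask: np.ndarray, pixel_spacing_mm: tuple) -> dict:
    """Convert per-class pixel counts in a 2D label mask to areas in cm^2."""
    pixel_area_mm2 = pixel_spacing_mm[0] * pixel_spacing_mm[1]
    return {
        name: float(np.count_nonzero(mask == label) * pixel_area_mm2 / 100.0)
        for name, label in LABELS.items()
    }

# Toy 2x3 mask with 1 mm x 1 mm spacing: two muscle pixels -> 0.02 cm^2.
mask = np.array([[1, 1, 0],
                 [2, 3, 0]])
print(tissue_area_cm2(mask, (1.0, 1.0)))
```

In practice the pixel spacing would be read from the DICOM metadata of the L3 slice rather than supplied by hand.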
We experimented with different attention algorithms to improve the segmentation performance of the UNet framework, incorporating attention gates, SEblock, and ASPP into every model layer. This made the L3BCSM model as effective as other segmentation frameworks such as UNETR[16] and AHNet[17]. The Attention UNet architecture uses attention gates to enhance feature representations during the upsampling and downsampling processes. The attention gate is a crucial component connecting the encoder and decoder paths of the network. It comprises two main branches: the Skip Connection and the Complementary Path[18]. The Skip Connection copies the high-resolution feature map from the encoder path and passes it to the decoder path. The Complementary Path takes a low-resolution feature map from the encoder path through a set of operations, typically convolutional and activation layers, to compute attention weights adaptively. These attention weights represent the importance of each spatial location in the low-resolution feature map with respect to the corresponding locations in the high-resolution feature map[19]. The attention weights are then applied to the Skip Connection, emphasizing or de-emphasizing spatial locations according to their importance. This adaptive fusion of features helps the network focus on relevant details during upsampling.

The SEblock structure enhances convolutional neural networks (CNNs) by capturing channel-wise dependencies. It involves two steps: squeeze and excitation. The squeeze step reduces the spatial dimensions via global average pooling, yielding a single value per channel. The excitation step uses fully connected layers to capture inter-channel dependencies and produce scaling factors. The final output is obtained by scaling the original feature maps with the computed factors, allowing the network to adaptively emphasize or suppress specific channels for improved representation[20].
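The two mechanisms described above can be sketched in plain NumPy. This is a minimal sketch under simplifying assumptions: the weight arrays stand in for learned parameters, the gating signal is assumed to be already upsampled to the skip-connection resolution, and a real implementation would use framework tensors and convolutions rather than these dense products.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(feature_maps, w1, w2):
    """Squeeze-and-excitation on a (C, H, W) feature tensor.
    w1: (C//r, C) reduction weights; w2: (C, C//r) expansion weights."""
    # Squeeze: global average pooling -> one scalar per channel.
    z = feature_maps.mean(axis=(1, 2))            # (C,)
    # Excitation: two FC layers yield per-channel scaling factors in (0, 1).
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))     # (C,)
    # Recalibration: scale each channel by its factor.
    return feature_maps * s[:, None, None]

def attention_gate(skip, gate, w_x, w_g, psi):
    """Additive attention gate: derive per-location weights from the
    gating signal and apply them to the (C, H, W) skip connection."""
    # Project skip and gate into a shared intermediate space of F features.
    q = np.maximum(np.tensordot(w_x, skip, axes=1)
                   + np.tensordot(w_g, gate, axes=1), 0.0)   # (F, H, W)
    # Collapse to a single attention map in (0, 1) per spatial location.
    alpha = sigmoid(np.tensordot(psi, q, axes=1))            # (1, H, W)
    return skip * alpha
```

With all-zero weights both functions reduce to scaling the input by sigmoid(0) = 0.5, which makes the mechanics easy to check by hand.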
The ASPP structure, introduced in DeepLab, consists of parallel atrous convolutional layers that capture multi-scale information and a global average pooling layer for global context [21]. These features are then combined and processed to generate refined predictions in tasks like semantic segmentation. This design enables the network to effectively understand local and global contextual information, leading to superior performance.
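The atrous-sampling idea behind ASPP can be illustrated with a one-dimensional NumPy sketch: parallel branches share the same kernel size but space their taps at different rates, and a global-average branch supplies image-level context. An actual ASPP head uses 2D convolutions with learned kernels and a 1x1 fusion convolution; the fixed averaging kernel and the branch rates here are purely illustrative.

```python
import numpy as np

def dilated_conv1d(x, kernel, rate):
    """'Same'-padded 1D convolution with kernel taps spaced by `rate`."""
    k = len(kernel)
    pad = rate * (k // 2)
    xp = np.pad(x, pad)
    return sum(kernel[i] * xp[i * rate: i * rate + len(x)] for i in range(k))

def aspp_1d(x, rates=(1, 2, 4)):
    """Parallel atrous branches plus a global-average branch, stacked."""
    kernel = np.array([1.0, 1.0, 1.0]) / 3.0      # illustrative fixed kernel
    branches = [dilated_conv1d(x, kernel, r) for r in rates]
    branches.append(np.full_like(x, x.mean()))    # global-context branch
    return np.stack(branches)                     # (len(rates) + 1, N)
```

Larger rates widen the receptive field without adding parameters, which is exactly the multi-scale property the text describes.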
Using attention gates, SEblock, and ASPP in the UNet architecture allows the model to selectively attend to informative regions as well as channel- and scale-level information, improving its ability to capture fine details and handle complex relationships between features in medical image segmentation and other tasks. L3BCSM is a body composition assessment model trained on CT images from multiple sources, including patients with liver, gastric, and colorectal cancer, among others. This makes it more robust in clinical gastroenterology practice than existing models trained on a single image source[7,12,22]. The external test results in cohort 2 confirm this. Additionally, the external test in cohort 3 suggests that the model is also suitable for L3 CT images from patients with intra-abdominal sepsis. Meanwhile, the training dataset contains two types of CT images, plain scans and contrast-enhanced scans, so the model is expected to adapt to both. Given these characteristics of the training data, the ideal input for L3BCSM is an L3 CT image, plain or contrast-enhanced, from a patient with a gastroenterological tumor or abdominal infection.
To assess the generalizability and robustness of L3BCSM, multiple external tests were conducted using CT images from different medical centers. These tests utilized datasets with diverse parameters, allowing a comprehensive evaluation of the model's performance. Cohort 2 consisted of a dataset with parameters similar to those of the L3BCSM training dataset, including CT image types (plain and contrast-enhanced) and patient demographics (patients with abdominal disease). As expected, L3BCSM performed well on this dataset, demonstrating its effectiveness for body composition segmentation within its intended domain. Cohort 3 presented a more challenging scenario, with CT image parameters deviating from the training data. Although the model segmented body compositions accurately, the quantitative metrics suggested poor performance. This apparent contradiction was investigated through manual result checking, which revealed a critical inconsistency: the human labels in cohort 3 were significantly less accurate than the model's segmentation. Consequently, these misleading labels led to the mischaracterization of L3BCSM's performance in cohort 3.
The L3BCSM model serves as the central component of this application, driving its functionality and enabling various usage scenarios. It interacts seamlessly with complementary elements, including data frame creation, plotting, and statistical analysis, to present a comprehensive and informative user experience. The application offers four distinct functionalities, each catering to specific needs. The “Population Analysis” module allows users to explore and understand the distribution of body compositions within specific populations, which is the most common study type[7,23]. L3BCSM's output is presented as data tables and visualizations, providing insight into the data characteristics. The “Time Series Analysis” module provides a platform for tracking an individual patient's changes in body composition over a defined period, as in a case study following a drug-free male bodybuilder for six months[24]. L3BCSM's output is displayed as data tables and charts, enabling users to monitor and analyze body composition trends within a patient. Before utilizing the L3BCSM model, it is necessary to ensure that the input CT image meets the model's requirements and that the calculated results are reliable. The “Consistency Test” module uses Bland-Altman plots and Student's t-tests to reflect the agreement between methods[25], supporting an assessment of the consistency and accuracy of the model's predictions. The “Manual Result Checking” module allows users to manually verify L3BCSM's segmentation results, providing additional confidence in the model's performance beyond the consistency test; this is particularly helpful when working with a new dataset or deciding which method performs better. Before using the “Population Analysis” module, it is recommended to run the “Consistency Test” and “Manual Result Checking” modules on the new dataset.
This ensures the application's compatibility with the new dataset and facilitates reliable analysis.
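For readers implementing a similar consistency check, the statistics behind a Bland-Altman plot and a paired Student's t-test can be computed with plain NumPy as below. The function names are hypothetical, and the inputs are assumed to be paired area measurements of the same patients from the two methods (e.g. manual vs. model-derived areas).

```python
import numpy as np

def bland_altman_stats(a, b):
    """Bland-Altman summary: bias (mean difference) and 95% limits of
    agreement between two paired measurement methods."""
    d = np.asarray(a, float) - np.asarray(b, float)
    bias = d.mean()
    sd = d.std(ddof=1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def paired_t_statistic(a, b):
    """t statistic for a paired Student's t-test; compare against a t
    distribution with n-1 degrees of freedom to obtain a p-value."""
    d = np.asarray(a, float) - np.asarray(b, float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))

# Toy paired measurements (e.g. muscle areas in cm^2 from two methods).
manual = np.array([151.0, 159.0, 172.0, 180.0])
model = np.array([150.0, 160.0, 170.0, 180.0])
bias, loa_low, loa_high = bland_altman_stats(manual, model)
t = paired_t_statistic(manual, model)
```

On a Bland-Altman plot, each point is (mean of the pair, difference of the pair); the bias and the two limits of agreement are drawn as horizontal lines, and good agreement means most differences fall within the limits with a bias near zero.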