As the fourth most deadly and the sixth most common cancer in the world, liver cancer poses a serious threat to human health [1]. The prevention and treatment of liver cancer require effective clinical diagnosis technologies [2]. Computed tomography (CT) and magnetic resonance imaging (MRI) have been widely used to detect abnormalities of the liver by analyzing its shape and texture as captured in these medical images [3, 4]. In most clinical applications, the delineation and detection of livers and liver tumors are still performed manually by human experts, such as radiologists and oncologists. However, manual image processing is time-consuming and relies heavily on experience. Therefore, it is necessary to develop an effective method for automatically identifying and measuring liver tumors in 3D space to facilitate diagnosis and treatment. So far, fully automatic segmentation of the liver and liver tumors from medical images is still impeded by several obstacles [5]. First, the intensity and resolution of liver images are sensitive to the acquisition schemes and modalities of different scanners. Second, the shape, size, number, and locations of liver tumors are diverse. Last but not least, it is difficult to distinguish a liver tumor from the surrounding liver tissue due to low contrast.
Many automated, semi-automated, and interactive segmentation methods have been proposed for liver and liver tumor segmentation [6–11]. Compared with manual segmentation, these methods indeed improve the efficiency of segmentation. However, they were typically designed for specific clinical applications, so it is hard to deploy them elsewhere without strong expertise in algorithm development and careful calibration. In the past two decades, deep learning, which employs convolutional neural networks with many hidden layers, has gained considerable attention. Numerous studies and machine learning competitions have demonstrated that such networks achieve impressive performance on a variety of image processing tasks [12]. As high-performance computing resources (such as GPUs) and open-source deep learning libraries have become available, implementing deep learning models has become more convenient, and the time required to train millions of model parameters on a large dataset has decreased significantly. Therefore, in this study, we tackled liver tumor segmentation using deep learning.
Despite the merits of deep learning, deep convolutional networks are susceptible to vanishing gradients in their shallow layers, because their weights are trained by backpropagating gradients from the deep layers during stochastic optimization [13]. As a result, deep networks may train very slowly or even fail to reach the required accuracy on their tasks. Therefore, we propose a Dynamic Context Encoder Network (DCE-Net) to perform automatic liver tumor segmentation from CT images. The DCE-Net follows the U-Net architecture but incorporates mechanisms such as dynamic convolution and context extraction to promote feature extraction and to feed more accurate context information from the downsampling layers to the upsampling layers. The LiTS2017 liver tumor CT dataset was used to train and test the DCE-Net, which segments the liver tumor region on each CT slice. The 3D model of a liver tumor is ultimately reconstructed by stacking the segmented tumor masks from all CT slices. As the LiTS2017 dataset includes only 194 CT volumes of liver tumor patients, learning from individual 2D CT slices greatly increases the number of training samples for the deep neural network, which needs sufficient data to avoid high bias and variance.
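The core idea of dynamic convolution is to aggregate several parallel kernels with input-dependent attention weights, so the effective kernel adapts to each input. The NumPy sketch below illustrates only that aggregation step, not the actual DCE-Net implementation; the attention head (a single linear layer on globally pooled features, with hypothetical parameters `att_w` and `att_b`) is a simplifying assumption of ours.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dynamic_kernel(x, kernels, att_w, att_b):
    # x: input feature map, shape (C, H, W)
    # kernels: K parallel conv kernels, shape (K, out_ch, C, kh, kw)
    # att_w, att_b: hypothetical parameters of a tiny linear attention head
    ctx = x.mean(axis=(1, 2))             # global average pooling -> (C,)
    alpha = softmax(att_w @ ctx + att_b)  # per-input kernel weights -> (K,)
    # weighted sum over the K kernels; the result is used as an ordinary
    # convolution kernel for this particular input
    return np.tensordot(alpha, kernels, axes=1)
```

Because the attention weights depend on the input, two different feature maps can be convolved with two different effective kernels, while the number of convolutions actually executed stays the same as in a static layer.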
With the advent of the fully convolutional neural network (CNN), many researchers have applied it to medical image processing and proved its utility for liver tumor segmentation [14–16]. Chung et al. introduced a CNN for liver segmentation on abdominal CT images and investigated the model's generalization and accuracy; they showed that an auto-context algorithm can improve the generalization of a CNN [17]. Hong et al. proposed an automatic segmentation framework based on a 3D U-Net with dense connections and global refinement; they learned a probability map of the liver boundary as a shape prior to reduce the influence of the surrounding tissues on the detection of the tumor region [18]. Luan et al. introduced an attention mechanism into their network (S-NET) for segmenting liver tumors from CT images; the attention mechanism was used to find the correspondence between layers in the contraction and expansion paths [19]. Hu et al. proposed a scale-attention deep learning network (SA-NET), which extracts features at different scales in its residual modules, with the scale-attention capability enforced by an attention module [20]. Zhang et al. proposed a dynamic scale attention mechanism that assigns adaptive weights to multi-scale convolutions and modeled spatial long-range dependencies with Axis Attention to improve model generalization [21]. Wu et al. proposed a hybrid framework for liver tumor segmentation in multi-phase images and developed a cascade region-based CNN to locate the tumors [22].
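An attention mechanism that relates contraction-path and expansion-path layers, as in [19], is often realized as an additive attention gate in the style of Attention U-Net: encoder features are reweighted by a spatial map computed jointly from encoder and decoder activations. The NumPy sketch below shows one plausible form of such a gate; the parameter names (`Wx`, `Wg`, `psi`) and shapes are our own illustrative choices, not taken from the cited works.

```python
import numpy as np

def attention_gate(skip, gate, Wx, Wg, psi):
    # skip: encoder (contraction-path) features, shape (C, H, W)
    # gate: decoder (expansion-path) features, shape (C, H, W)
    # Wx, Wg: (F, C) projections; psi: (F,) vector producing a scalar map
    q = np.tensordot(Wx, skip, axes=1) + np.tensordot(Wg, gate, axes=1)
    q = np.maximum(q, 0)                                  # ReLU
    a = 1.0 / (1.0 + np.exp(-np.tensordot(psi, q, axes=1)))  # sigmoid -> (H, W)
    return skip * a  # spatially reweighted skip connection
```

The attention map `a` lies in (0, 1) at every pixel, so the gate can only suppress, never amplify, the skip features that are passed on to the expansion path.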
Although impressive results have been achieved in previous work, several limitations remain to be addressed. First, CNN architectures have become more sophisticated through the incorporation of additional structures and mechanisms, yet they remain comparatively ineffective at fusing spatial feature information from the downsampling path into the upsampling path. In addition, when CNNs are used to segment 3D medical images (e.g., CT and MRI) directly, 3D convolution requires far more trainable weights than 2D convolution, which may lead to overfitting and poor generalization, especially when the amount of 3D medical images available for training is insufficient.
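The parameter gap between 2D and 3D convolution is easy to quantify: a layer with C_in input channels, C_out output channels, and kernel size k has C_out·(C_in·k² + 1) parameters in 2D but C_out·(C_in·k³ + 1) in 3D. The small calculation below, with illustrative channel counts of our choosing, makes the gap concrete.

```python
def conv_params(in_ch, out_ch, k, dims):
    """Trainable parameters of a conv layer: weights plus one bias per output channel."""
    return out_ch * (in_ch * k ** dims + 1)

# A typical 64 -> 64 channel layer with a 3x3(x3) kernel:
p2d = conv_params(64, 64, 3, dims=2)  # 64 * (64*9  + 1) = 36,928
p3d = conv_params(64, 64, 3, dims=3)  # 64 * (64*27 + 1) = 110,656
print(p2d, p3d)  # the 3D layer carries roughly 3x the parameters per layer
```

Compounded over dozens of layers, this factor of roughly k per layer is one reason slice-wise 2D segmentation can be the more data-efficient choice when only a few hundred annotated volumes are available.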