As the fourth most deadly and the sixth most common cancer in the world, liver cancer poses a serious threat to human health [1]. The prevention and treatment of liver cancer require effective clinical diagnosis technologies [2]. Computed tomography (CT) and magnetic resonance imaging (MRI) have been widely used to detect abnormalities of the liver by analyzing its shape and texture as captured in these medical images [3, 4]. In most clinical applications, the delineation and detection of livers and liver tumors are still performed manually by human experts, such as radiologists and oncologists. However, manual image processing is time-consuming and relies heavily on experience. Therefore, it is necessary to develop an effective method for automatically identifying and measuring liver tumors in 3D space to facilitate diagnosis and treatment. So far, fully automatic segmentation of the liver and liver tumors from medical images is still impeded by several obstacles [5]. First, the intensity and resolution of liver images are sensitive to the acquisition schemes and modalities of different scanners. Second, the shape, size, number, and locations of liver tumors are diverse. Last but not least, it is difficult to distinguish a liver tumor from the surrounding liver tissue due to low contrast.
Many automated, semi-automated, and interactive segmentation methods have been proposed for liver and liver tumor segmentation [6–11]. Compared with manual segmentation, these methods indeed improve the efficiency of segmentation. However, they were typically designed for specific clinical applications, so it is hard to deploy them elsewhere without strong expertise in algorithm development and careful calibration. In the past two decades, deep learning, which employs convolutional neural networks with many hidden layers, has gained considerable attention. Numerous studies and machine learning competitions have demonstrated that such networks achieve impressive performance on a variety of image processing tasks [12]. As high-performance computing resources (such as GPUs) and open-source deep learning libraries have become available, implementing deep learning models has become more convenient, and the time required to train millions of model parameters on a large dataset has decreased significantly. Therefore, in this study, we tackled liver tumor segmentation using deep learning.
Despite the merits of deep learning, deep convolutional networks are susceptible to vanishing gradients in their shallow layers, because their weights are trained by backpropagating gradients from the deep layers during stochastic optimization [13]. As a result, deep networks may train very slowly or even fail to reach the required accuracy on their tasks. Therefore, we propose a Dynamic Context Encoder Network (DCE-Net) to perform automatic liver tumor segmentation from CT images. The DCE-Net follows the U-Net architecture but incorporates mechanisms such as dynamic convolution and context extraction to promote feature extraction and to feed more accurate context information from the downsampling layers to the upsampling layers. The LiTS2017 liver tumor CT dataset was used to train and test the DCE-Net, which segments the liver tumor region on each CT slice. The 3D model of a liver tumor is ultimately reconstructed by stacking the segmented tumor masks from all CT slices. As the LiTS2017 dataset includes only 194 CT volumes of liver tumor patients, learning from individual 2D CT slices greatly increases the number of training samples for the deep neural network, which needs sufficient data to avoid high bias and variance.
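The core idea of dynamic convolution is to aggregate several parallel kernels with input-dependent attention weights, so the effective kernel adapts to each input. The NumPy sketch below illustrates only that aggregation step, not the actual DCE-Net implementation; the attention head (a single linear layer on globally pooled features, with hypothetical parameters `att_w` and `att_b`) is a simplifying assumption of ours.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dynamic_kernel(x, kernels, att_w, att_b):
    # x: input feature map, shape (C, H, W)
    # kernels: K parallel conv kernels, shape (K, out_ch, C, kh, kw)
    # att_w, att_b: hypothetical parameters of a tiny linear attention head
    ctx = x.mean(axis=(1, 2))             # global average pooling -> (C,)
    alpha = softmax(att_w @ ctx + att_b)  # per-input kernel weights -> (K,)
    # weighted sum over the K kernels; the result is used as an ordinary
    # convolution kernel for this particular input
    return np.tensordot(alpha, kernels, axes=1)
```

Because the attention weights depend on the input, two different feature maps can be convolved with two different effective kernels, while the number of convolutions actually executed stays the same as in a static layer.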
With the advent of the fully convolutional neural network (CNN), many researchers have applied it to medical image processing and proved its utility for liver tumor segmentation [14–16]. Chung et al. introduced a CNN for liver segmentation on abdominal CT images and investigated the model's generalization and accuracy; they showed that an auto-context algorithm can improve the generalization of a CNN [17]. Hong et al. proposed an automatic segmentation framework based on a 3D U-Net with dense connections and global refinement; they learned a probability map of the liver boundary as a shape prior to reduce the influence of the surrounding tissues on the detection of the tumor region [18]. Luan et al. introduced an attention mechanism into their network (S-NET) for segmenting liver tumors from CT images; the attention mechanism was used to find the correspondence between layers in the contraction and expansion paths [19]. Hu et al. proposed a scale-attention deep learning network (SA-NET), which extracts features at different scales in its residual modules, with the scale-attention capability enforced by an attention module [20]. Zhang et al. proposed a dynamic scale attention mechanism that assigns adaptive weights to multi-scale convolutions and modeled spatial long-range dependencies with Axis Attention to improve model generalization [21]. Wu et al. proposed a hybrid framework for liver tumor segmentation in multi-phase images and developed a cascade region-based CNN to locate the tumors [22].
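An attention mechanism that relates contraction-path and expansion-path layers, as in [19], is often realized as an additive attention gate in the style of Attention U-Net: encoder features are reweighted by a spatial map computed jointly from encoder and decoder activations. The NumPy sketch below shows one plausible form of such a gate; the parameter names (`Wx`, `Wg`, `psi`) and shapes are our own illustrative choices, not taken from the cited works.

```python
import numpy as np

def attention_gate(skip, gate, Wx, Wg, psi):
    # skip: encoder (contraction-path) features, shape (C, H, W)
    # gate: decoder (expansion-path) features, shape (C, H, W)
    # Wx, Wg: (F, C) projections; psi: (F,) vector producing a scalar map
    q = np.tensordot(Wx, skip, axes=1) + np.tensordot(Wg, gate, axes=1)
    q = np.maximum(q, 0)                                  # ReLU
    a = 1.0 / (1.0 + np.exp(-np.tensordot(psi, q, axes=1)))  # sigmoid -> (H, W)
    return skip * a  # spatially reweighted skip connection
```

The attention map `a` lies in (0, 1) at every pixel, so the gate can only suppress, never amplify, the skip features that are passed on to the expansion path.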
Although impressive results have been achieved in previous work, several limitations remain to be addressed. First, CNN architectures have become more sophisticated through the incorporation of additional structures and mechanisms, yet they remain comparatively ineffective at fusing spatial feature information from the downsampling path into the upsampling path. In addition, when CNNs are used to segment 3D medical images (e.g., CT and MRI) directly, 3D convolution requires far more trainable weights than 2D convolution, which may lead to overfitting and poor generalization, especially when the amount of 3D medical images available for training is insufficient.
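The parameter gap between 2D and 3D convolution is easy to quantify: a layer with C_in input channels, C_out output channels, and kernel size k has C_out·(C_in·k² + 1) parameters in 2D but C_out·(C_in·k³ + 1) in 3D. The small calculation below, with illustrative channel counts of our choosing, makes the gap concrete.

```python
def conv_params(in_ch, out_ch, k, dims):
    """Trainable parameters of a conv layer: weights plus one bias per output channel."""
    return out_ch * (in_ch * k ** dims + 1)

# A typical 64 -> 64 channel layer with a 3x3(x3) kernel:
p2d = conv_params(64, 64, 3, dims=2)  # 64 * (64*9  + 1) = 36,928
p3d = conv_params(64, 64, 3, dims=3)  # 64 * (64*27 + 1) = 110,656
print(p2d, p3d)  # the 3D layer carries roughly 3x the parameters per layer
```

Compounded over dozens of layers, this factor of roughly k per layer is one reason slice-wise 2D segmentation can be the more data-efficient choice when only a few hundred annotated volumes are available.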