3.1 Experimental Environment
This experiment was conducted on the Windows 10 platform and implemented in Python using TensorFlow, PyTorch, and Keras. CUDA 11.0 and the matching cuDNN library were used to accelerate computation. For ease of computation, the experiments used 256×256 remote sensing image patches obtained after data augmentation, with 10,000 training images and 5,000 test images. The batch size was set to 4, the shuffle buffer size to 100, and the number of iterations (epochs) to 20.
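For reference, the input pipeline described above can be expressed as a short tf.data sketch. This is a minimal illustration, not our exact training code: `image_paths` and `mask_paths` are hypothetical lists of patch and mask file paths, while the batch size, buffer size, and epoch count follow the settings listed above.

```python
import tensorflow as tf

BATCH_SIZE = 4     # batch size from Section 3.1
BUFFER_SIZE = 100  # shuffle buffer size from Section 3.1
EPOCHS = 20        # number of iterations (epochs); passed to model.fit later

def parse_pair(image_path, mask_path):
    # Load one 256x256 RGB patch and its single-channel label mask.
    image = tf.io.decode_png(tf.io.read_file(image_path), channels=3)
    mask = tf.io.decode_png(tf.io.read_file(mask_path), channels=1)
    return tf.cast(image, tf.float32) / 255.0, mask

# image_paths and mask_paths are hypothetical lists of file paths.
train_ds = (
    tf.data.Dataset.from_tensor_slices((image_paths, mask_paths))
    .map(parse_pair, num_parallel_calls=tf.data.AUTOTUNE)
    .shuffle(BUFFER_SIZE)
    .batch(BATCH_SIZE)
    .prefetch(tf.data.AUTOTUNE)
)
```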
3.2 Dataset
The dataset was provided by the CCF Big Data Competition and consists of high-resolution remote sensing images of a city in southern China captured in 2015. The dataset is relatively small, containing five large annotated RGB remote sensing images ranging in size from 3000×3000 to 6000×6000 pixels. The annotations distinguish four object classes plus a background class: vegetation (label 1), buildings (label 2), water bodies (label 3), roads (label 4), and others (label 0). To better visualize the labels, three of the training images are rendered with blue for water, yellow for buildings, green for vegetation, and brown for roads, as shown in Fig. 4.
Because the remote sensing images are very large and vary in size, they cannot be fed directly into the network for training under our memory constraints. We therefore randomly crop 256×256 patches from these images and apply data augmentation. The results are shown in Fig. 5.
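A minimal sketch of the cropping and augmentation step, assuming each large image and its label mask are loaded as aligned NumPy arrays; the flip and rotation transforms shown are illustrative choices, as the exact set of augmentations is not enumerated here.

```python
import numpy as np

def random_crop_pair(image, mask, size=256):
    # Randomly crop an aligned (image, mask) pair from a large remote sensing image.
    h, w = image.shape[:2]
    top = np.random.randint(0, h - size + 1)
    left = np.random.randint(0, w - size + 1)
    return (image[top:top + size, left:left + size],
            mask[top:top + size, left:left + size])

def augment_pair(image, mask):
    # Apply a random horizontal flip and a random 90-degree rotation
    # (illustrative augmentations; other transforms are possible).
    if np.random.rand() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]
    k = np.random.randint(4)  # rotate by 0, 90, 180, or 270 degrees
    return np.rot90(image, k), np.rot90(mask, k)
```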
3.3 Experimental Results and Analysis
In this experiment, Unet was used as the backbone and combined with CBAM, SAM, and the Dual Channel Attention Module (DCAM), and the different attention structures were compared and analyzed. Figure 6 shows the loss and accuracy curves of the Unet, Unet + CBAM, Unet + SAM, and Unet + DCAM models on the training data, as well as the IoU curve on the test data. From the training loss curve in Fig. 6(a), Unet converges to the highest loss value; CBAM and SAM converge similarly in both speed and final value; the DCAM curve converges faster and to a noticeably smaller value. Figure 6(b) shows the training accuracy, which mirrors the training loss: Unet has the lowest accuracy, CBAM and SAM remain close to each other throughout, and DCAM achieves the highest accuracy. Figure 6(c) shows the IoU on the test data: Unet has the lowest IoU; although the IoU of SAM oscillates noticeably during training, it is on the whole comparable to CBAM; DCAM attains the highest IoU. The loss, accuracy, and IoU statistics of the comparative experiments in Fig. 6 make it clear that the DCAM model is effective and markedly improves the semantic segmentation of remote sensing images.
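For orientation, the standard CBAM baseline used above can be sketched in PyTorch as follows: channel attention reweights feature channels, and spatial attention then reweights spatial positions. This is a generic CBAM sketch for context only; the structure of our DCAM is defined earlier in this paper and is not reproduced here.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        # Pool over the spatial dimensions, then gate each channel.
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        return torch.sigmoid(avg + mx).view(b, c, 1, 1) * x

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        # Pool over the channel dimension, then gate each spatial position.
        avg = x.mean(dim=1, keepdim=True)
        mx = x.amax(dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1))) * x

class CBAM(nn.Module):
    # CBAM applies channel attention followed by spatial attention.
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))
```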
To further validate the model, we computed objective metrics, namely loss, accuracy, and IoU, for the Unet, Unet + CBAM, Unet + SAM, and Unet + DCAM models on the training and test data. As shown in Table 1, Unet + DCAM achieves clearly better loss, accuracy, and IoU values than Unet and the single-channel Unet + CBAM and Unet + SAM models, on both the training and the test data.
Table 1
Objective indicators on the training and test data after 20 epochs for the Unet, Unet + CBAM, Unet + SAM, and Unet + DCAM models.
| Model | Train Loss | Train Accuracy (%) | Train IoU | Test Loss | Test Accuracy (%) | Test IoU |
|---|---|---|---|---|---|---|
| Unet | 0.1384 | 94.5 | 0.867 | 1.1824 | 69.5 | 0.508 |
| Unet + CBAM | 0.0792 | 95.9 | 0.875 | 1.1025 | 71.4 | 0.536 |
| Unet + SAM | 0.1124 | 95.6 | 0.889 | 1.1549 | 71.1 | 0.536 |
| Unet + DCAM | 0.0504 | 97.8 | 0.943 | 1.0480 | 73.9 | 0.567 |
The accuracy statistics in Fig. 6 and Table 1 are overall pixel accuracies. To further examine the segmentation accuracy for vegetation, buildings, water, roads, and others, we computed the per-class pixel accuracy on the training and test data, as shown in Fig. 7: Fig. 7(a) shows the per-class pixel accuracy on the training data, and Fig. 7(b) on the test data. From Figs. 7(a) and 7(b), the classification accuracy of Unet + DCAM is overall higher than that of Unet + CBAM, Unet + SAM, and Unet. Across individual classes, the segmentation accuracies for vegetation, buildings, and water do not differ greatly, while the accuracy for roads is comparatively low, which is related to the small size of road targets.
Figure 8 displays the semantic segmentation results of Unet, Unet + CBAM, Unet + SAM, and Unet + DCAM models on Test data. As shown in Fig. 8, each model has achieved commendable results in object segmentation. However, upon closer analysis, the segmentation results of Unet + DCAM are more precise and closer to the label image. Therefore, whether evaluated through objective indicators or intuitive perception of segmentation results, the Unet + DCAM model in this paper outperforms the Unet, Unet + CBAM, and Unet + SAM models.
The experiments above support the claim that the proposed DCAM model performs better in semantic segmentation of remote sensing images than the single-channel CBAM and SAM models. To further validate the model, we compared it with other mainstream semantic segmentation methods on the CCF dataset, including FCN, SegNet, DeepLabV3 [19], DANet, OCNet, and the original CCNet. The FCN, SegNet, and DeepLabV3 models were implemented with TensorFlow and Keras using the following settings: batch size 8, 20 epochs, learning rate 0.0001, and sparse_categorical_crossentropy as the loss. The DANet, OCNet, and CCNet models were implemented in PyTorch with ResNet50 as the backbone; the batch size, epoch count, and learning rate matched those of the previous models, with cross_entropy_loss as the loss function. Figure 9 shows the results on the test data after each model was trained. Judging from the segmentation outputs, FCN, SegNet, DeepLabV3, DANet, OCNet, and CCNet each have their strengths and can segment the targets to a certain extent. However, in overall segmentation quality and accuracy of detail, the Unet + DCAM algorithm proposed in this paper outperforms the other models and is closer to the ground-truth labels.
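As an illustration, the TensorFlow/Keras comparison models were trained with a configuration equivalent to the following sketch. Here `build_fcn` is a hypothetical constructor standing in for the FCN, SegNet, or DeepLabV3 network definitions, and `train_images`/`train_masks` are placeholder arrays; only the optimizer, loss, batch size, and epoch count reflect the settings stated above.

```python
import tensorflow as tf

# build_fcn is a hypothetical constructor standing in for the
# FCN, SegNet, or DeepLabV3 network definitions (not shown here).
model = build_fcn(num_classes=5)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # learning rate 0.0001
    loss="sparse_categorical_crossentropy",                  # integer-valued label masks
    metrics=["accuracy"],
)
# train_images: (N, 256, 256, 3) float32; train_masks: (N, 256, 256) labels 0-4
model.fit(train_images, train_masks, batch_size=8, epochs=20)
```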
To quantify the performance of the FCN, SegNet, DeepLabV3, DANet, OCNet, and CCNet algorithms against the Unet + DCAM semantic segmentation algorithm, we analyzed the experimental results using four objective indicators: PA, MPA, MIoU, and Kappa [20, 21], defined below; a computation sketch follows the list.
· Pixel Accuracy (PA) is the ratio of correctly labeled pixels to the total number of pixels.
· Mean Pixel Accuracy (MPA) refines PA by computing the proportion of correctly classified pixels within each class and then averaging over all classes.
· Mean Intersection over Union (MIoU) computes the ratio of the intersection to the union of two sets, the ground truth and the predicted segmentation, averaged over the per-class IoU values.
· Kappa measures the consistency of two variables; taking the classification results and the verification samples as the two variables, it can be used to evaluate classification accuracy.
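All four indicators can be derived from a single confusion matrix. The sketch below shows one standard NumPy implementation, as our own illustration rather than the exact evaluation code behind Table 2; classes absent from both the prediction and the ground truth would need special handling.

```python
import numpy as np

def confusion_matrix(gt, pred, num_classes=5):
    # Accumulate a num_classes x num_classes confusion matrix from label maps.
    valid = (gt >= 0) & (gt < num_classes)
    idx = num_classes * gt[valid].astype(int) + pred[valid].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def metrics_from_cm(cm):
    total = cm.sum()
    pa = np.diag(cm).sum() / total                      # PA: observed agreement
    per_class = np.diag(cm) / np.maximum(cm.sum(axis=1), 1)
    mpa = per_class.mean()                              # MPA: mean per-class accuracy
    union = cm.sum(axis=1) + cm.sum(axis=0) - np.diag(cm)
    miou = (np.diag(cm) / np.maximum(union, 1)).mean()  # MIoU: mean per-class IoU
    pe = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / total ** 2
    kappa = (pa - pe) / (1 - pe)                        # Cohen's Kappa
    return pa, mpa, miou, kappa
```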
Table 2
Comparison of objective indicators (PA, MPA, MIoU, and Kappa) for semantic segmentation of remote sensing images using the FCN, SegNet, DeepLabV3, DANet, OCNet, CCNet, and Unet + DCAM algorithms.
| Model | PA | MPA | MIoU | Kappa |
|---|---|---|---|---|
| FCN | 0.518 | 0.223 | 0.148 | 0.079 |
| SegNet | 0.539 | 0.383 | 0.183 | 0.253 |
| DeepLabV3 | 0.744 | 0.433 | 0.331 | 0.538 |
| DANet | 0.774 | 0.451 | 0.348 | 0.541 |
| CCNet | 0.803 | 0.524 | 0.482 | 0.582 |
| OCNet | 0.798 | 0.502 | 0.423 | 0.559 |
| Unet + DCAM | 0.878 | 0.547 | 0.503 | 0.671 |
Table 2 shows the objective indicator statistics for FCN, SegNet, DeepLabV3, DANet, OCNet, CCNet, and Unet + DCAM on the test data. The comparison shows that Unet + DCAM is significantly superior to the other algorithms in terms of PA, MPA, MIoU, and Kappa.