By capturing the thermal infrared radiation of objects, infrared sensors can effectively highlight salient targets even under extreme conditions, bad weather, and partial occlusion (Feng et al. 2020). However, infrared images cannot provide sufficient background information and lack texture details (Liu et al. 2019). In contrast, visible images, formed by visible-light sensors from reflected light, contain abundant texture details (Ma et al. 2020), but their imaging conditions are demanding and easily degraded by natural weather. The purpose of infrared and visible image fusion is therefore to combine the complementary information of the source images, so that the fused image contains both clear infrared targets and abundant texture details (Li et al. 2020). At present, infrared and visible image fusion is widely applied in military reconnaissance, target identification and tracking, security monitoring, agricultural production, remote sensing, etc. (Zhang et al. 2021).
Fusion methods for infrared and visible images fall into two main categories: conventional image fusion algorithms and deep-learning approaches. Traditional algorithms usually measure activity levels in the spatial or transform domain and design fusion rules manually. Classical conventional frameworks include those based on multi-scale transformation (Toet et al. 1989), sparse representation (Liu et al. 2015), subspace analysis (Kong et al. 2014), saliency (Ma et al. 2017), variational models (Ma et al. 2016), etc. Deep-learning-based image fusion can be subdivided into frameworks based on the autoencoder (AE) (Li et al. 2019), the convolutional neural network (CNN) (Liu et al. 2017), and the generative adversarial network (GAN) (Ma et al. 2019).
In conventional image fusion frameworks, the fusion rules are designed manually. Such hand-crafted rules cannot adapt to complex environments, and as they are refined to improve the fused image they grow increasingly intricate. This complexity has become a major obstacle for traditional image fusion algorithms. Deep learning, with its powerful ability to extract and represent features, offers new approaches to image fusion.
The AE-based image fusion framework first trains an autoencoder on a large dataset for feature extraction and image reconstruction. A fusion strategy is then designed manually to merge the features extracted by the encoder. To enhance the feature extraction capacity of autoencoders, Li et al. (2020) added nest-connection blocks to the AE-based framework. Liu et al. (2022) integrated an attention module into the AE-based framework so that the network pays more attention to key details of the source images during feature extraction.
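As a concrete illustration of such a manual strategy, a common choice in AE-based fusion is to weight the two encoder feature maps by their l1-norm activity level (a DenseFuse-style rule). The NumPy sketch below is a minimal version under that assumption; the feature shapes are chosen only for illustration:

```python
import numpy as np

def l1_activity_fusion(feat_ir, feat_vis, eps=1e-8):
    """Fuse two encoder feature maps of shape (C, H, W) with an
    l1-norm activity-level rule: per-pixel weights come from the
    channel-wise l1-norm of each feature map."""
    a_ir = np.abs(feat_ir).sum(axis=0)    # (H, W) activity of infrared features
    a_vis = np.abs(feat_vis).sum(axis=0)  # (H, W) activity of visible features
    w_ir = a_ir / (a_ir + a_vis + eps)    # soft weights in [0, 1]
    return w_ir[None] * feat_ir + (1.0 - w_ir[None]) * feat_vis

# Toy example: each fused value is a convex combination of the inputs.
ir = np.random.rand(8, 4, 4)
vis = np.random.rand(8, 4, 4)
fused = l1_activity_fusion(ir, vis)
print(fused.shape)  # (8, 4, 4)
```

Because the weights are in [0, 1], regions where the infrared features are more active dominate the fused map there, and vice versa for the visible features.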
The CNN-based image fusion framework achieves end-to-end feature extraction, aggregation, and image reconstruction by constructing a network architecture and a loss function, avoiding manually designed fusion rules. However, when only the last-layer features are used to reconstruct the image, many useful features extracted by the middle layers are lost. Li et al. (2019) used densely connected networks to minimize information loss during feature extraction. Long et al. (2021) proposed an infrared and visible image fusion approach using aggregated residual dense networks, which automatically evaluates how much source-image information is retained and extracts hierarchical features for effective fusion.
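To show how dense connections preserve intermediate features, here is a minimal NumPy sketch; the toy "layers" and channel counts are hypothetical stand-ins for convolutions, not any cited architecture:

```python
import numpy as np

def dense_block(x, layer_fns):
    """Sketch of a densely connected block: each layer receives the
    channel-wise concatenation of the input and all earlier layer
    outputs, so early features are never discarded."""
    feats = [x]
    for fn in layer_fns:
        feats.append(fn(np.concatenate(feats, axis=0)))
    return np.concatenate(feats, axis=0)

# Toy "layers": each maps (C, H, W) -> (4, H, W), standing in for a conv.
mk = lambda: (lambda z: np.tanh(z[:4]))
x = np.random.rand(4, 8, 8)
y = dense_block(x, [mk(), mk(), mk()])
print(y.shape)  # (16, 8, 8): the input plus three 4-channel layer outputs
```

The final concatenation carries the input and every intermediate output forward, which is exactly why dense connections mitigate the loss of middle-layer features.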
In 2019, Ma et al. (2019) applied GANs to infrared and visible image fusion for the first time, treating fusion as a game between a generator and a discriminator. However, a single discriminator easily causes modal imbalance during feature extraction, so the fused image may lose either the infrared target information or the texture details of the source images. Ma et al. (2020) proposed dual discriminators to maintain balance between the different modalities in the fused image. Li et al. (2021) integrated an attention module into the GAN fusion framework, making the generator and discriminators attend to the important information in the source images and producing fused images with more prominent targets. Liu et al. (2022) proposed the joint optimization of image fusion and object detection, achieving high detection accuracy and better visual quality of the fused image.
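The generator-discriminator game can be made concrete with a small sketch. Under a common formulation (an assumption here, not the exact losses of the cited works), each discriminator outputs a probability and the generator's adversarial loss rewards fooling both of them:

```python
import numpy as np

def bce(pred, label, eps=1e-8):
    """Binary cross-entropy for discriminator scores in (0, 1)."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(label * np.log(pred) + (1 - label) * np.log(1 - pred))

# Dual-discriminator game (sketch): D_ir judges whether an image looks
# infrared, D_vis whether it looks visible; the generator tries to fool
# both, so neither modality dominates the fused result.
d_ir_on_fused = np.array([0.4, 0.6])   # hypothetical discriminator outputs
d_vis_on_fused = np.array([0.5, 0.3])
g_adv_loss = bce(d_ir_on_fused, 1.0) + bce(d_vis_on_fused, 1.0)
print(g_adv_loss)
```

If the generator fooled both discriminators perfectly (outputs near 1), this loss would approach zero; with a single discriminator, only one of the two terms would constrain the generator, which is the modal-imbalance problem noted above.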
The above fusion methods are widely used in a variety of scenarios, and the resulting fused images achieve good visual quality. However, some problems remain, such as the unbalanced representation of infrared targets and texture features in fused images, and unclear texture details.
To address these problems, we propose a pseudo-color infrared and visible image fusion method based on an attention-dense network. First, the pseudo-color-processed infrared image and the color visible image are used for feature extraction, feature aggregation, and image reconstruction to train the model; three-channel inputs carry more information, so the fused image retains more texture details. Second, we design a generator composed of convolutional layers and densely connected blocks with attention modules. The attention modules focus on key information in the source images, such as infrared targets and texture details, while the dense connections reduce information loss during feature extraction and strengthen the network's ability to extract source-image information. Finally, a content loss function is introduced to balance the infrared target and texture detail information in the fused image.
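A content loss of this general kind can be sketched as follows. This minimal NumPy illustration pairs an intensity term (pulling the fused image toward the infrared image) with a gradient term (pulling its texture toward the visible image); the weights alpha and beta and the forward-difference gradient are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def grad(img):
    """Forward-difference gradient magnitude, a simple texture proxy."""
    gx = np.diff(img, axis=1, append=img[:, -1:])
    gy = np.diff(img, axis=0, append=img[-1:, :])
    return np.abs(gx) + np.abs(gy)

def content_loss(fused, ir, vis, alpha=1.0, beta=1.0):
    # Intensity term: keep the fused image close to the infrared image,
    # preserving salient thermal targets.
    intensity = np.mean((fused - ir) ** 2)
    # Texture term: match the fused image's gradients to those of the
    # visible image, preserving texture details.
    texture = np.mean((grad(fused) - grad(vis)) ** 2)
    return alpha * intensity + beta * texture
```

Tuning alpha against beta trades off thermal saliency for texture fidelity, which is the balance the content loss is meant to stabilize.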