Due to the absorption and scattering of light by airborne particles, foggy images acquired by imaging equipment suffer from reduced contrast, color distortion, and loss of detail, which degrades subsequent tasks such as object recognition and scene understanding. Fog also hinders image feature extraction and recognition and reduces the effectiveness of outdoor vision systems. Image dehazing therefore has important research significance in the field of computer vision.
According to the physical scattering model [1, 2, 3], the haze formation process is usually expressed as:
I(x) = J(x)t(x) + A[1-t(x)], (1)
t(x) = e^{-β(λ)d(x)}, (2)
where I(x) and J(x) are the observed hazy image and the haze-free scene radiance, A is the global atmospheric light representing the intensity of ambient light, and t(x) is the scene transmission describing the portion of light that is not scattered and reaches the camera sensor. d(x) and β(λ) denote the scene depth and the atmospheric scattering coefficient, respectively.
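As a concrete illustration, the scattering model above can be used to synthesize a hazy image from a clear one. The sketch below assumes images normalized to [0, 1]; the values of β and A are illustrative, not taken from the paper.

```python
import numpy as np

def synthesize_haze(J, depth, beta=1.0, A=0.9):
    """Render a hazy image via the atmospheric scattering model:
    I = J * t + A * (1 - t), with t = exp(-beta * d) per Eq. (2).
    J: clear image in [0, 1]; depth: per-pixel scene depth."""
    t = np.exp(-beta * depth)                 # transmission, Eq. (2)
    if J.ndim == 3:                           # broadcast over color channels
        t = t[..., None]
    return J * t + A * (1.0 - t)              # Eq. (1)

# Toy example: a flat gray scene whose depth grows from left to right.
J = np.full((4, 4), 0.5)
depth = np.tile(np.linspace(0.0, 3.0, 4), (4, 1))
I = synthesize_haze(J, depth)
# Near pixels (d = 0) keep the scene radiance; distant pixels
# (large d, so t -> 0) approach the airlight A.
```

This forward model is also what makes dehazing ill-posed: recovering J requires estimating both t(x) and A from I alone.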
However, it is difficult to estimate the transmission map from a hazy image alone. Early prior-based methods estimated the transmission map from statistical properties of clear images, such as the non-local prior [4] and the color attenuation prior [5]. However, these priors carry large errors, resulting in severe color distortion and reduced contrast in the restored images. At present, with the growth of computing power, dehazing methods based on convolutional neural networks have become the mainstream of research. These methods are effective, outperform prior-based algorithms, and bring significant performance improvements. However, most current methods dehaze the observed image directly, ignoring the damage to texture details during dehazing, which leads to amplified noise and color distortion in the dehazed image. In addition, they usually minimize the mean squared error between the restored image and the haze-free ground truth, which tends to discard high-frequency image detail. Under foggy conditions, this can over-smooth regions with rich texture boundaries and produce artifacts. Finally, the receptive field of a traditional CNN is relatively small, and enlarging it by deepening the network leads to high resource consumption.
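The loss of high-frequency detail under an MSE objective can be seen in a toy example: when several sharp outputs are equally plausible (as haze leaves edge positions ambiguous), the MSE-minimizing prediction is their pointwise mean, which smears the edge. This is a generic illustration, not a simulation of any specific dehazing network.

```python
import numpy as np

# Two equally plausible sharp step edges along one image row.
edge_a = np.array([0., 0., 0., 1., 1., 1.])
edge_b = np.array([0., 0., 1., 1., 1., 1.])

# The prediction minimizing expected MSE over {edge_a, edge_b}
# is their pointwise mean: a soft ramp, no longer a sharp edge.
mse_optimal = 0.5 * (edge_a + edge_b)

# The largest one-pixel jump shrinks from 1.0 to 0.5,
# i.e. the high-frequency transition is attenuated.
max_jump = np.max(np.abs(np.diff(mse_optimal)))
```

Edge-aware or adversarial losses are common remedies precisely because they penalize this averaging behavior.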
In this paper, we propose MIMS-UNet to address these problems. The network processes images at multiple scales and then fuses the multi-input information to compensate for lost high-frequency image detail. To address the relatively small receptive field of traditional CNNs, we introduce a context module, which enlarges the receptive field and captures multi-scale information without deepening the network. Compared with state-of-the-art methods, our method achieves good performance while maintaining relatively small computational overhead.
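The receptive-field benefit of such a context module can be quantified with the standard formula for stacked stride-1 convolutions, where each layer adds dilation × (kernel − 1) pixels. The dilation rates below (1, 2, 4) are illustrative assumptions, not necessarily those used in the paper's context module.

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field of stacked 1-D convolutions with stride 1.
    Each layer grows the field by dilation * (kernel_size - 1)."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += d * (k - 1)
    return rf

# Three 3x3 layers: plain vs. dilated with rates 1, 2, 4.
plain = receptive_field([3, 3, 3], [1, 1, 1])    # 1 + 2 + 2 + 2 = 7
dilated = receptive_field([3, 3, 3], [1, 2, 4])  # 1 + 2 + 4 + 8 = 15
```

With the same three layers and the same parameter count, dilation more than doubles the receptive field, which is why dilated context blocks are a cheap alternative to deepening the network.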
The contributions of this work are summarized as follows:
We propose a new MIMS-UNet network for effective dehazing. The network extracts relevant features from hazy image content and recovers details and textures from hazy images.
We propose an encoder-decoder structure with context blocks to capture multi-scale information and dehaze from coarse to fine.
Extensive experiments show that the proposed model outperforms other state-of-the-art algorithms.