This paper studies high-quality image compression. Compressed images enable fast transmission of image information and reduce memory usage, which places high demands on both compression speed and the preservation of image detail. Earlier work designed an image compression algorithm based on an autoencoder [30]; by allocating bits reasonably, it significantly improved compression performance. However, for some high-resolution images the compression results were still unsatisfactory, with blurring and distortion. Generative adversarial networks, by contrast, offer excellent compression performance on high-resolution images. Deep learning has driven the development of image compression technology, continuously improving compression performance and achieving better compression rates and compression metrics, so learning a better image compression framework is important. This paper therefore redesigns a content-weighted autoencoder as the basis of image compression and integrates it deeply with a generative adversarial network to form a high-quality image compression framework, aiming to preserve as much image information as possible at a faster compression speed and a better compression rate. The following first introduces the overall network structure of the proposed high-quality image compression algorithm, then describes the key modules and the loss function in detail, and finally explains the training and use of the algorithm.
3.1 Overall structure design of algorithm network
The network structure of the high-quality image compression algorithm designed in this paper includes the following main modules: a content-weighted autoencoder, whose decoding part serves as the generator G in the generative adversarial network, with the compressed data output by the autoencoder used as the generation condition; an importance map, where Q(x) denotes the importance map quantization process and M(x) denotes the importance mask calculation; a binary quantizer, which binarizes the Sigmoid activations output by the encoder and whose output is inverted during decoding to produce the decoding result; a multi-scale discriminator DM; and a composite loss function Lcom. Together these modules realize high-quality image compression, and the overall network structure of the algorithm is shown in Fig. 3.
3.2 Content-weighted autoencoder
The content-weighted autoencoder replaces the traditional fully connected encoding with convolution operations, which enables image compression at a lower bit rate and improves the rate-entropy trade-off. Its structure includes two parts: encoding and decoding. The encoding part is a cascade of convolutional layers and residual modules, consisting of 3 convolutional layers and 3 residual blocks; each residual block contains two convolutional layers and a ReLU function. The residual modules improve the noise robustness of the encoder. The encoder designed in this paper does not include normalization layers, to avoid visual artifacts in smooth areas.
In the encoding process, the input image is first convolved by Conv1, 64 convolution kernels of size 8×8 with stride 4, and then passes through a residual module Res1. It then passes through Conv2, a convolutional layer with 128 kernels of size 4×4 and stride 2, followed by two residual modules Res2 and Res3, after which the feature map is convolved by the 1×1 convolution kernel Conv3. Except for the last layer of the encoder, which uses the Sigmoid activation function, all convolutional layers use ReLU. The encoding process is shown in Fig. 4.
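To make the encoder structure concrete, the following PyTorch-style sketch assembles the layers described above. It is only an illustration: the padding values, the 3×3 kernels inside the residual blocks, and the number of output channels n of Conv3 are assumptions not fixed by the text.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block as described: two conv layers with a ReLU in between."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)  # 3x3 is an assumption
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return x + self.conv2(self.relu(self.conv1(x)))

class Encoder(nn.Module):
    """Conv1 (64, 8x8, stride 4) -> Res1 -> Conv2 (128, 4x4, stride 2)
    -> Res2 -> Res3 -> Conv3 (1x1, Sigmoid); n_out is an illustrative choice."""
    def __init__(self, n_out=64):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=8, stride=4, padding=2)
        self.res1 = ResidualBlock(64)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1)
        self.res2 = ResidualBlock(128)
        self.res3 = ResidualBlock(128)
        self.conv3 = nn.Conv2d(128, n_out, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.res1(x)
        x = self.relu(self.conv2(x))
        x = self.res3(self.res2(x))
        return torch.sigmoid(self.conv3(x))   # values in [0, 1] for the binary quantizer
```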
In the encoding process, the input signal \(x=[{x_1},{x_2}, \cdots ,{x_n}]\) is mapped through an activation function to a new data matrix y. The mathematical principle is shown in formula (1).
$$y=f(wx+b)$$
1
Wherein, f is the activation function, w is the mapping matrix, and b is the bias term of the encoding part.
The decoding part consists of up-sampling and deconvolution layers: the convolutional layers extract features and the deconvolution layers reconstruct the image. Through continuous iteration, the error between the output and the input is minimized to obtain the optimal autoencoder parameters. Feature extraction in this autoencoder is efficient, and the convolutional weights are shared across neurons, which keeps the network complexity low, makes the model easy to train, and improves the reconstruction quality of the compressed image.
The decoding process is to restore the extracted effective features so that the result is close to the input signal x. The mathematical principle is as shown in formula (2).
$$x^{\prime}=f^{\prime}(w^{\prime}y+b^{\prime})$$
2
Wherein, \(f^{\prime}\) is the mapping function, \(w^{\prime}\) is the mapping matrix, and \(b^{\prime}\) is the bias term of the decoding part.
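As a rough illustration of the decoding path, the sketch below mirrors the encoder with transposed-convolution up-sampling. It reuses the ResidualBlock class from the encoder sketch above; the exact layer widths and kernel sizes are assumptions, since the text only states that the decoder is symmetric to the encoder.

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Illustrative mirror of the encoder: undo the 1x1 projection, then
    invert the 2x and 4x down-sampling with transposed convolutions."""
    def __init__(self, n_in=64):
        super().__init__()
        self.proj = nn.Conv2d(n_in, 128, kernel_size=1)                          # undo Conv3
        self.res1 = ResidualBlock(128)
        self.res2 = ResidualBlock(128)
        self.up1 = nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1)  # x2
        self.res3 = ResidualBlock(64)
        self.up2 = nn.ConvTranspose2d(64, 3, kernel_size=8, stride=4, padding=2)    # x4
        self.relu = nn.ReLU(inplace=True)

    def forward(self, y):
        y = self.relu(self.proj(y))
        y = self.res2(self.res1(y))
        y = self.relu(self.up1(y))
        y = self.res3(y)
        return torch.sigmoid(self.up2(y))   # reconstruction x' in [0, 1]
```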
3.3 Importance map
In the process of image compression, different regions have different compression difficulties: smoother regions are easier to compress, while regions with rich texture carry the important information [31], so more bits should be allocated to parts with complex texture structures. In the process of extracting feature maps, different feature maps contain different information. Content-weighted importance maps achieve better bit allocation and allow the compression rate to be controlled and optimized.
The importance map is learned from the input image. An intermediate feature map is taken from a residual block of the encoder and then passed through convolutional layers to obtain the importance map \(F(x)\). The importance map extraction process is shown in Fig. 5.
In the network, let the input image be x and the encoder output be \(E(x) \in {R^{h \times w \times n}}\), where \(h \times w\) is the spatial size and n is the number of feature maps output by the encoder. \(F(x)\) denotes the importance map of size \(h \times w\), L denotes the number of importance levels, and \(\frac{n}{L}\) is the number of bits corresponding to each level. When \(\frac{{l - 1}}{L} \leqslant {F_{ij}} \leqslant \frac{l}{L}\), only the first \(\frac{{nl}}{L}\) bits of the output at that position are encoded and stored; in this way the importance map realizes the allocation of bits. The importance map is quantized into an integer smaller than L, and an importance feature mask m of size \(h \times w \times n\) corresponding to \(B(E(x))\) is generated. Denoting by \({f_{ij}}\) an element of \(F(x)\), the quantization of the importance map is defined in formula (3).
$$Q({f_{ij}})=l - 1,{\text{ }}if{\text{ }}\frac{{l - 1}}{L} \leqslant {f_{ij}} \leqslant \frac{l}{L},{\text{ }}l=1, \cdot \cdot \cdot ,L$$
3
After quantizing the importance map, the importance feature mask m is calculated by formula (4).
$${m_{kij}}=\begin{cases} 1, & \text{if } k \leqslant \frac{n}{L}Q({f_{ij}}) \\ 0, & \text{otherwise} \end{cases}$$
4
The final encoding result of the input image x can be represented as \(c=M \otimes B\), where \(\otimes\) denotes element-wise multiplication of the importance mask and the binary code. In this way the content-weighted importance map is obtained, which guides the generation of images with clearer textures.
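A minimal sketch of formulas (3) and (4) and of the final masking \(c=M \otimes B\) is given below; the function names and the toy values of n and L are purely illustrative.

```python
import torch

def quantize_importance(f, L):
    """Formula (3): map f_ij in [0, 1] to an integer level Q(f_ij) in {0, ..., L-1}."""
    return torch.clamp(torch.ceil(f * L) - 1, min=0, max=L - 1)

def importance_mask(f, n, L):
    """Formula (4): expand the h x w importance map into an n x h x w binary mask;
    channel k at position (i, j) is kept iff k <= (n / L) * Q(f_ij)."""
    q = quantize_importance(f, L)                                # (h, w)
    k = torch.arange(1, n + 1, dtype=f.dtype).view(n, 1, 1)      # channel indices 1..n
    return (k <= (n / L) * q.unsqueeze(0)).to(f.dtype)           # (n, h, w)

# toy usage: n = 8 feature maps, L = 4 importance levels
f = torch.tensor([[0.1, 0.9], [0.5, 0.3]])     # learned importance map F(x)
b = (torch.rand(8, 2, 2) > 0.5).float()        # binary code B(E(x)) from the quantizer
m = importance_mask(f, n=8, L=4)
c = m * b                                      # c = M (element-wise product) B
print(m.sum(dim=0))                            # channels kept per spatial position
```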
During back propagation the gradient still needs to be computed. The importance map is generated by convolving the feature map, but the importance feature mask is produced by the quantizer, which makes the gradient zero over most regions. The gradient of the mask with respect to an element \({p_{ij}}\) of the importance map is therefore approximated by formula (5).
$${m_{kij}}=\begin{cases} 1, & \text{if } \left\lceil {\frac{{kL}}{n}} \right\rceil \leqslant L{p_{ij}} \\ 0, & \text{otherwise} \end{cases}$$
5
3.4 Binary quantizer
After encoding the image, a binary quantizer is used to complete the quantization. The activation function is the Sigmoid function, whose values lie in [0,1], so after the nonlinear transformation the encoder outputs also lie in [0,1]. In forward propagation, activation values greater than 0.5 are defined as 1 and values no greater than 0.5 as 0, as shown in formula (6).
$$B({e_{ij}})=l,{\text{ if }}\frac{l}{2}<{e_{ij}} \leqslant \frac{{l+1}}{2},{\text{ }}l=0,1$$
6
In back propagation the gradient is computed by the chain rule, but the hard threshold makes the gradient almost zero everywhere. To solve this vanishing-gradient problem, this paper designs a surrogate function for gradient back propagation, shown in formula (7).
$$\widetilde {B}(x)=x,{\text{ 0}}<x<1$$
7
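The forward threshold of formula (6) and the identity surrogate of formula (7) can be combined in a custom autograd function. This is a minimal sketch under those formulas, not the exact implementation used in the paper.

```python
import torch

class BinaryQuantizer(torch.autograd.Function):
    """Forward (formula 6): threshold the Sigmoid outputs at 0.5.
    Backward (formula 7): B~(x) = x on (0, 1), so the gradient passes straight through."""

    @staticmethod
    def forward(ctx, e):
        return (e > 0.5).to(e.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        # identity surrogate: hand the incoming gradient back unchanged
        return grad_output

# usage on encoder outputs e in [0, 1]
e = torch.rand(1, 64, 16, 16, requires_grad=True)
b = BinaryQuantizer.apply(e)
b.sum().backward()            # gradients reach e despite the hard threshold
print(e.grad.abs().sum())
```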
3.5 Multi-scale discriminator
The discriminator is the core of the generative adversarial network; through adversarial training with the generator, its ability to distinguish real images from generated ones improves [32]. Obtaining good compression and visual quality requires a relatively large receptive field, which would normally call for large convolution kernels or a more complex network and thus risk overfitting, so a better convolutional network design is needed. The multi-scale discriminator collects feature data at each scale, obtaining both a broad global view and accurate detail information, and fuses the data from each level so that the generated compressed image is as close to the original image as possible.
When the image data generated by the content-weighted autoencoder is input into the multi-scale discriminator, pooling layers down-sample the input at different scales to obtain images at three resolutions, which are then processed by three discriminator networks. The low-resolution discriminator obtains a larger field of view during training, while the high-resolution discriminator minimizes image distortion, so that the texture of the generated compressed image is clearer. The network structure of the multi-scale discriminator is shown in Fig. 6.
The multi-scale discriminator obtains better discrimination ability by training on the generated compressed image and the original image. Its working principle is to down-sample both the image produced by the generator and the original image by factors of two and four, yielding images at three different scales, which are then convolved by their respective discriminator modules. The three discriminator modules share the same structure: two convolutional layers, three convolutional block layers, and a Sigmoid function, where each convolutional block consists of a conv layer, a BN layer, and a Leaky-ReLU. The number of channels in the convolutional blocks increases successively: n = 128 in the first block, n = 256 in the second, and n = 512 in the third. All convolution kernels in the discriminator are 4×4; the stride is 2 in the first convolutional layer and in the convolutional blocks, and 1 in the last convolutional layer. Finally, the multi-scale discriminator analyzes and judges the images, fuses the results from each scale, and outputs Real if the input is judged valid and Fake otherwise.
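The structure above can be sketched as follows. The channel count of the first convolutional layer (here 64) and the use of average pooling for the 2× and 4× down-sampling are assumptions, since the text does not specify them.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, stride=2):
    """conv + BN + Leaky-ReLU block used inside each single-scale discriminator."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=stride, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )

class ScaleDiscriminator(nn.Module):
    """One discriminator: first conv (stride 2), three conv blocks with
    n = 128, 256, 512, a final stride-1 conv, and a Sigmoid."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),   # 64 channels is an assumption
            nn.LeakyReLU(0.2, inplace=True),
            conv_block(64, 128),
            conv_block(128, 256),
            conv_block(256, 512),
            nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

class MultiScaleDiscriminator(nn.Module):
    """Run three identical discriminators on the input at full, 1/2 and 1/4 resolution."""
    def __init__(self):
        super().__init__()
        self.scales = nn.ModuleList([ScaleDiscriminator() for _ in range(3)])
        self.down = nn.AvgPool2d(kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        outputs = []
        for d in self.scales:
            outputs.append(d(x))
            x = self.down(x)       # halve the resolution for the next scale
        return outputs
```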
3.6 Composite loss function
The image compression design in this paper is based on unsupervised learning and introduces an adversarial network into the end-to-end generative image compression framework, so the loss function consists of the content-weighted autoencoder loss, the decoder (generator) loss, the feature matching loss, and the multi-scale discriminator loss.
When the content-weighted autoencoder compresses and reconstructs image data, errors arise between the input data and the reconstructed data, that is, some information is lost. To better learn and extract image features, the distortion and bit rate of the reconstruction must be balanced continuously, so an optimized rate-distortion function is used as its loss function, as shown in formula (8).
$${L_C}={L_D}+\alpha {L_R}$$
8
Where \({L_D}\) is the distortion loss, α is the weight used to adjust the bit rate, and \({L_R}\) is the rate loss.
The distortion loss is expressed as the squared L2 norm, as shown in formula (9).
$${L_D}=\left\| {{{x^{\prime}}_n} - {x_n}} \right\|_{2}^{2}$$
9
Where \({x^{\prime}_n}\) represents the reconstructed image and \({x_n}\) represents the input image.
In the autoencoder network, the rate loss is defined as the entropy of the intermediate feature map: the amount of data stored in the encoder's latent space depends on how concentrated the quantized data are, so the entropy of the intermediate data is used to define \({L_R}\); see formula (10).
$${L_R}= - E\left[ {{{\log }_2}{P_q}} \right]$$
10
In this formula, \({P_q}=\int_{{x - \frac{1}{2}}}^{{x+\frac{1}{2}}} {{P_d}}\) is the probability of a quantized value, where \({P_d}\) denotes the probability density function of the original data.
Therefore, the content-weighted autoencoder loss function can be expressed as formula (11).
$${L_C}=\left\| {{{x^{\prime}}_n} - {x_n}} \right\|_{2}^{2}+\alpha \left( { - E\left[ {{{\log }_2}{P_q}} \right]} \right)$$
11
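A minimal sketch of formulas (8) to (11) follows. How the symbol probabilities \(P_q\) are estimated (an entropy model) and the value of α are assumptions left open here.

```python
import torch

def distortion_loss(x_rec, x):
    """Formula (9): squared L2 norm between reconstruction and input, averaged over the batch."""
    return ((x_rec - x) ** 2).flatten(1).sum(dim=1).mean()

def rate_loss(code_probs, eps=1e-9):
    """Formula (10): L_R = -E[log2 P_q]; code_probs holds the estimated probability
    P_q of each quantized symbol (assumed to come from some entropy model)."""
    return -torch.log2(code_probs + eps).mean()

def content_weighted_loss(x_rec, x, code_probs, alpha=0.1):
    """Formula (11): L_C = L_D + alpha * L_R; alpha = 0.1 is an illustrative rate weight."""
    return distortion_loss(x_rec, x) + alpha * rate_loss(code_probs)
```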
The decoder acts as the generator in the generative adversarial network and must allocate bits during image compression. The rate-distortion function balances the reconstruction quality against the bit rate, as shown in formula (12).
$${L_d}+\beta R={L_d}+\beta H(\hat{w})$$
12
The loss function of the optimized generator is formula (13).
$${L_G}={E_{x\sim {p_x}}}\left[ {\lambda R+d(x,\hat{x}) - \beta {{\log }_2}D(\hat{x},y)} \right]$$
13
Wherein, \(d(x,\hat{x})\) is the distortion between the original image and the reconstruction, and \(\lambda\) and \(\beta\) are weight parameters.
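Translated directly into code, formula (13) might read as follows; d_fake stands for the discriminator score of the reconstruction (assumed to be a scalar in (0, 1], with the condition y already folded in), and the weights are illustrative values.

```python
import torch

def generator_loss(rate, distortion, d_fake, lam=0.01, beta=1.0, eps=1e-9):
    """Formula (13): L_G = E[ lambda * R + d(x, x^) - beta * log2 D(x^, y) ]."""
    return lam * rate + distortion - beta * torch.log2(d_fake + eps)
```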
The feature matching loss is represented by MAE (mean absolute error) here, which is less susceptible to outliers than MSE (mean square error), as shown in formula (14).
$${L_{FM}}=E\sum\limits_{{i=1}}^{{L_1}} {\frac{1}{{{N_i}}}} \left[ {{{\left\| {F_{D}^{i}(x) - F_{D}^{i}(G(z))} \right\|}_1}} \right]$$
14
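Assuming the multi-scale discriminator exposes its intermediate feature maps \(F_{D}^{i}\), the feature matching loss of formula (14) can be sketched as:

```python
import torch

def feature_matching_loss(feats_real, feats_fake):
    """Formula (14): MAE between discriminator features of the original and of the
    generated image, averaged over layers; feats_* are lists of feature maps F_D^i."""
    loss = 0.0
    for fr, ff in zip(feats_real, feats_fake):
        loss = loss + torch.mean(torch.abs(fr.detach() - ff))   # 1/N_i handled by mean()
    return loss / len(feats_real)
```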
The loss function of the multi-scale discriminator is defined in formula (15).
$${L_M}={E_{\hat{x} \sim {p_g}}}\left[ {D(\hat{x})} \right] - {E_{x\sim {p_r}}}\left[ {D(x)} \right]+\lambda {E_{\hat{x} \sim {p_{\hat{x}}}}}\left[ {{{\left( {{{\left\| {{\nabla _{\hat{x}}}D(\hat{x})} \right\|}_2} - 1} \right)}^2}} \right]$$
15
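Formula (15) has the form of a WGAN loss with a gradient penalty. The compact sketch below assumes D returns a single scalar score per image (for example the averaged outputs of the three scales) and uses λ = 10 as an illustrative penalty weight.

```python
import torch

def multiscale_discriminator_loss(D, x_real, x_fake, lam=10.0):
    """Formula (15): E[D(x^)] - E[D(x)] + lam * gradient penalty."""
    loss = D(x_fake).mean() - D(x_real).mean()

    # gradient penalty on random interpolations between real and generated images
    eps = torch.rand(x_real.size(0), 1, 1, 1, device=x_real.device)
    x_hat = (eps * x_real + (1 - eps) * x_fake).requires_grad_(True)
    grads = torch.autograd.grad(D(x_hat).sum(), x_hat, create_graph=True)[0]
    penalty = ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

    return loss + lam * penalty
```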
Therefore, the above loss functions together constitute a composite loss function, which can effectively improve the quality and effect of image compression generation from many aspects. The composite loss function is defined as formula (16).
$${L_{com}}=\rho {L_C}+\varphi {L_G}+\phi {L_{FM}}+\psi {L_M}$$
16
Wherein, \(\rho\), \(\varphi\), \(\phi\), and \(\psi\) are weight parameters. They are tuned experimentally on the experimental platform to achieve the best image compression effect; the weights selected in this paper are 0.5, 0.5, 5, and 3, respectively.
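Combining the four terms with the weights reported above gives the following one-line sketch (the individual losses are the illustrative helpers defined earlier):

```python
def composite_loss(L_C, L_G, L_FM, L_M, rho=0.5, varphi=0.5, phi=5.0, psi=3.0):
    """Formula (16): weighted sum of the four losses with the weights 0.5, 0.5, 5, 3."""
    return rho * L_C + varphi * L_G + phi * L_FM + psi * L_M
```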
3.7 Algorithm training process
The specific process of algorithm training is as follows:
Step 1: Use paired original and input images as training data; the input image passes through the encoder, binary quantizer, and importance map calculation and is then sent to the generator to produce a compressed image;
Step 2: Send the generated compressed image and the original image to the multi-scale discriminator DM, which discriminates between them and judges whether the compression effect meets the standard. If it does, the compressed image is output; otherwise the reconstructed image is returned until a usable compressed image is generated. Based on these results, the multi-scale discriminator loss, decoder loss, and content-weighted loss are calculated;
Step 3: Compare the generated compressed image with the original image, and calculate the feature matching loss;
Step 4: Back propagate, according to the losses calculated in steps 2 and 3 above, update the multi-scale discriminator DM and generator G parameters respectively;
Step 5: Execute steps 1 to 4. The input image of the encoder is denoted x, and the encoder output obtained by analyzing and transforming the input signal is denoted \(E(x) \in {R^{h \times w \times n}}\), where \(h \times w\) is the size and n is the number of feature maps. \(E(x)\) is quantized by the binary quantizer: output values greater than 0.5 are marked as 1 and the rest as 0. A feature map in the encoder is extracted and passed through a separate convolution to obtain the importance map, denoted \(F(x)\). \(F(x)\) is likewise quantized, producing an importance mask of the same size as the quantized \(E(x)\). The importance mask is combined with the binary code produced by the quantizer from the encoder output, so that the image better preserves important information, and finally an image compression code is obtained. The decoder is symmetric to the encoder; it performs the corresponding analysis and transformation to obtain the decoder output and generate the compressed image. When the parameters of DM are updated, the score for effectively generated compressed images approaches 1 and the score for invalid ones approaches 0, that is, the multi-scale discriminator DM is optimized by maximization. When the generative network is trained, the generator G is connected in series with the multi-scale discriminator DM and the resulting error is passed back to the generative network; at this point the generative network's loss is minimized, that is, the generator G is optimized by minimization. In this process the generator G and the multi-scale discriminator DM form a dynamic game, and the loop exits once a Nash equilibrium is reached, at which point the generated compressed image is judged to be virtually indistinguishable from the real original image.
Step 6: Finally, adjust the parameters of each step according to specific needs, and the algorithm network outputs a usable high-quality compressed image. A minimal code sketch of one training iteration is given below.
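Under the assumptions made in the sketches above (a scalar-output discriminator and the illustrative loss helpers), one iteration of Steps 1 to 4 could look roughly like this; the rate and feature matching terms are omitted here for brevity, so this is only a rough outline, not the authors' implementation.

```python
import torch

def train_step(encoder, quantize, importance_mask_fn, decoder, D, opt_G, opt_D, x):
    """One adversarial iteration: Step 1 generate, Step 2 update DM, Steps 3-4 update G."""
    # Step 1: encode, binarize, mask by the importance map, then decode into a compressed image
    e = encoder(x)
    code = quantize(e) * importance_mask_fn(x)
    x_rec = decoder(code)

    # Step 2: update the multi-scale discriminator DM on real vs. generated images
    opt_D.zero_grad()
    d_loss = multiscale_discriminator_loss(D, x, x_rec.detach())
    d_loss.backward()
    opt_D.step()

    # Steps 3-4: generator-side losses and parameter update
    # (rate and feature matching terms omitted in this sketch)
    opt_G.zero_grad()
    L_C = distortion_loss(x_rec, x)
    L_G = -torch.log2(D(x_rec).mean() + 1e-9)      # adversarial term of formula (13)
    g_loss = 0.5 * L_C + 0.5 * L_G
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()
```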