4.1 Datasets and metrics
Following previous work, we train our model on the DIV2K [38] dataset, which is widely used for image restoration tasks and contains 800 high-quality RGB training images. For testing, we use five widely used benchmark datasets: Set5 [39], Set14 [40], BSD100 [41], Urban100 [42], and Manga109 [43]. We evaluate the quality of super-resolved images with two metrics, peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) [44]. Following existing work, PSNR and SSIM are computed on the Y channel of the YCbCr representation converted from RGB.
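For clarity, the Y-channel evaluation can be sketched as follows. This is a minimal NumPy example assuming the standard ITU-R BT.601 conversion (as in MATLAB's rgb2ycbcr) and the common practice of cropping a border of `scale` pixels before measuring; neither detail is spelled out in the text above.

```python
import numpy as np

def rgb_to_y(img):
    """Y channel of YCbCr (ITU-R BT.601, as in MATLAB's rgb2ycbcr);
    expects an RGB array with values in [0, 255]."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0

def psnr_y(sr, hr, border):
    """PSNR on the Y channel; `border` pixels (commonly the scale
    factor) are cropped from each side before evaluation."""
    sr_y = rgb_to_y(sr.astype(np.float64))
    hr_y = rgb_to_y(hr.astype(np.float64))
    if border > 0:
        sr_y = sr_y[border:-border, border:-border]
        hr_y = hr_y[border:-border, border:-border]
    mse = np.mean((sr_y - hr_y) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```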
4.2 Implementation details
Our model is trained on RGB channels, and the LR images are generated by bicubic downsampling (\(\times 2\), \(\times 3\), and \(\times 4\)) of the HR images in MATLAB. During training, we randomly crop HR patches of size \(192\times 192\) as input to our model, with the mini-batch size set to 64, and augment the training data with random horizontal flips and 90° rotations. The model is trained using the ADAM optimizer [45] with momentum parameters \(\beta_1=0.9\), \(\beta_2=0.999\), and \(\epsilon=10^{-8}\). The initial learning rate is set to \(5\times 10^{-4}\) and halved every \(2\times 10^{5}\) iterations. When training the final models, the \(\times 2\) model is trained from scratch; after it converges, we use it as a pre-trained model for the other scales. In the IRN, we set the number of IRBs to 4. We implemented our network in the PyTorch framework and trained it on an NVIDIA RTX A5000 GPU.
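The training schedule above can be summarized in a short PyTorch sketch. Here `IRN` and `train_loader` are hypothetical stand-ins for the model and the DIV2K patch loader, and the L1 loss is an assumption (the loss function is not stated in this section); the optimizer and learning-rate settings follow the paper.

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import StepLR

# Hypothetical model and data loader; placeholders, not released code.
model = IRN(num_blocks=4, scale=2).cuda()
criterion = nn.L1Loss()  # assumption: L1 is typical for lightweight SR
optimizer = optim.Adam(model.parameters(), lr=5e-4,
                       betas=(0.9, 0.999), eps=1e-8)
# Halve the learning rate every 2e5 iterations, as stated in the paper.
scheduler = StepLR(optimizer, step_size=200_000, gamma=0.5)

for lr_patch, hr_patch in train_loader:  # 192x192 HR crops, batch size 64
    sr = model(lr_patch.cuda())
    loss = criterion(sr, hr_patch.cuda())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```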
4.3 Model analysis
In this subsection, we investigate the model parameters, the effectiveness of the ESA module, the effect of the activation function on the SR model, and the effectiveness of the IRN.
Model parameters. For a lightweight SR model, the number of network parameters is crucial. From Table 3, we can observe that our IRN achieves comparable or better performance than other state-of-the-art SR methods such as LAPAR-A (NeurIPS′21) and SRFBN-S (CVPR′19). We also visualize the trade-off between performance and Multi-Adds/parameters in Fig. 5; our IRN achieves a better trade-off between performance and computational cost.
Effectiveness of ESA. We conduct an ablation study to validate the effectiveness of the ESA module (a structural sketch of an ESA-style block follows Table 1). As shown in Table 1, removing ESA saves only about 10% of the parameters but causes a significant performance degradation: the complete IRN shows consistent improvements on the Set5, Set14, BSD100, Urban100, and Manga109 datasets. These results show that the ESA module effectively improves SR performance.
Table 1 Ablation study of ESA (all dataset cells are PSNR/SSIM)

| Method    | Params[K] | Multi-Adds[G] | Set5         | Set14        | B100         | Urban100     | Manga109     |
|-----------|-----------|---------------|--------------|--------------|--------------|--------------|--------------|
| IRN-woESA | 470       | 26            | 32.05/0.8932 | 28.45/0.7998 | 27.44/0.7350 | 25.86/0.7807 | 30.20/0.9050 |
| IRN       | 524       | 28            | 32.15/0.8942 | 28.53/0.7810 | 27.53/0.7361 | 25.98/0.7841 | 30.35/0.9069 |
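As a reference for the ablation, the following is a sketch of an ESA-style spatial attention block in the spirit of the RFANet/RFDN designs from which ESA originates; the exact configuration used in IRN may differ (the channel reduction factor, kernel sizes, and pooling window here are assumptions).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ESA(nn.Module):
    """Sketch of an Enhanced Spatial Attention block (RFANet style)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        f = channels // reduction
        self.conv1 = nn.Conv2d(channels, f, 1)                 # reduce channels
        self.conv_f = nn.Conv2d(f, f, 1)                       # skip branch
        self.conv2 = nn.Conv2d(f, f, 3, stride=2, padding=0)   # downsample
        self.conv3 = nn.Conv2d(f, f, 3, padding=1)             # process pooled map
        self.conv4 = nn.Conv2d(f, channels, 1)                 # restore channels

    def forward(self, x):
        c1 = self.conv1(x)
        # Enlarge the receptive field: strided conv + large-window pooling.
        v = F.max_pool2d(self.conv2(c1), kernel_size=7, stride=3)
        v = self.conv3(v)
        # Upsample back to the input resolution and fuse with the skip.
        v = F.interpolate(v, size=x.shape[2:], mode='bilinear',
                          align_corners=False)
        m = torch.sigmoid(self.conv4(v + self.conv_f(c1)))
        return x * m  # reweight features with the spatial attention mask
```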
A study of different activation functions. When introducing the ConvNeXt block, we retain GELU as its activation function. However, most previous SR networks use ReLU [46] or LeakyReLU [47] as the activation function. We therefore investigate the effect of these three activation functions on the SR model (a minimal snippet showing how the activation is swapped follows Table 2). The results in Table 2 show that GELU yields the best performance among the three. We therefore retain GELU as the activation function in our model.
Table 2 Quantitative comparison of different activation functions (all cells are PSNR/SSIM)

| Method    | Set5         | Set14        | B100         | Urban100     | Manga109     |
|-----------|--------------|--------------|--------------|--------------|--------------|
| ReLU      | 32.06/0.8933 | 28.49/0.7804 | 27.50/0.7352 | 25.91/0.7819 | 30.27/0.9055 |
| LeakyReLU | 32.12/0.8938 | 28.48/0.7806 | 27.52/0.7355 | 25.94/0.7833 | 30.32/0.9061 |
| GELU      | 32.15/0.8942 | 28.53/0.7810 | 27.53/0.7361 | 25.98/0.7841 | 30.35/0.9069 |
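A minimal sketch of how the Table 2 variants can be produced: the block body is illustrative (not the paper's exact IRB), only the activation differs between runs, and the LeakyReLU slope is an assumed value.

```python
import torch.nn as nn

def conv_block(channels, act="gelu"):
    """Hypothetical conv block body with a swappable activation."""
    acts = {"relu": nn.ReLU(inplace=True),
            "lrelu": nn.LeakyReLU(0.1, inplace=True),  # slope is an assumption
            "gelu": nn.GELU()}
    return nn.Sequential(
        nn.Conv2d(channels, channels, 3, padding=1),
        acts[act],
        nn.Conv2d(channels, channels, 3, padding=1),
    )
```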
Comparison with state-of-the-art methods. We compare the proposed IRN with state-of-the-art lightweight SR methods, and Table 3 shows the quantitative comparison results for different scale factors. We also report the number of parameters and the Multi-Adds computed on a 1280 × 720 output. We can observe that our IRN compares favorably with other state-of-the-art SR methods, including SRCNN [10], FSRCNN [11], VDSR [5], DRCN [12], MemNet [6], SRDenseNet [7], DRRN [48], LapSRN [20], SelNet [49], CARN-M [14], CARN [14], SRMDNF [50], SRFBN-S [51], and LAPAR-A [17]. For the \(\times 3\) and \(\times 4\) models, IRN outperforms the compared methods on most datasets; in particular, the \(\times 3\) model uses fewer parameters and Multi-Adds yet outperforms all compared methods on every benchmark dataset.
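For reference, the Multi-Adds of a single convolution at the 1280 × 720 output follow the standard counting convention sketched below; the channel width in the example is hypothetical, not taken from the paper.

```python
# Standard convention: Multi-Adds = K*K*C_in*C_out*H*W for a KxK conv
# evaluated at an HxW output (here the 1280x720 HR resolution).
def conv_multi_adds(k, c_in, c_out, h=720, w=1280):
    return k * k * c_in * c_out * h * w

# Example: one 3x3 conv with 48 input/output channels (hypothetical width)
print(conv_multi_adds(3, 48, 48) / 1e9, "G Multi-Adds")  # ~19.1 G
```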
Figure 6 shows a visual comparison on the Set14 and Urban100 datasets at \(\times 4\). For the Urban100 "img_62" image, we can see that the grid structure is better recovered, which further demonstrates the effectiveness of our IRN.
Table 3 Comparisons on multiple benchmark datasets for lightweight networks. Multi-Adds are calculated for a 1280 × 720 HR image; all dataset cells are PSNR/SSIM. Bold/red/blue: our/best/second-best results

| Scale | Method         | Params  | Multi-Adds | Set5         | Set14        | BSD100       | Urban100     | Manga109     |
|-------|----------------|---------|------------|--------------|--------------|--------------|--------------|--------------|
| ×2    | SRCNN [10]     | 57K     | 53G        | 36.66/0.9542 | 32.42/0.9063 | 31.36/0.8879 | 29.50/0.8946 | 35.74/0.9661 |
| ×2    | FSRCNN [11]    | 12K     | 6G         | 37.00/0.9558 | 32.63/0.9088 | 31.53/0.8920 | 29.88/0.9020 | 36.67/0.9694 |
| ×2    | VDSR [5]       | 665K    | 613G       | 37.53/0.9587 | 33.03/0.9124 | 31.90/0.8960 | 30.76/0.9140 | 37.22/0.9729 |
| ×2    | DRCN [12]      | 1,774K  | 17,974G    | 37.63/0.9588 | 33.04/0.9118 | 31.85/0.8942 | 30.75/0.9133 | 37.63/0.9723 |
| ×2    | MemNet [6]     | 677K    | 2,662G     | 37.78/0.9597 | 33.28/0.9142 | 32.08/0.8978 | 31.31/0.9195 | -            |
| ×2    | DRRN [48]      | 297K    | 6,797G     | 37.74/0.9591 | 33.23/0.9136 | 32.05/0.8973 | 31.23/0.9188 | 37.92/0.9760 |
| ×2    | LapSRN [20]    | 813K    | 30G        | 37.52/0.9590 | 33.08/0.9130 | 31.80/0.8950 | 30.41/0.9100 | 37.27/0.9740 |
| ×2    | SelNet [49]    | 974K    | 226G       | 37.89/0.9598 | 33.61/0.9160 | 32.08/0.8984 | -            | -            |
| ×2    | CARN-M [14]    | 412K    | 91G        | 37.53/0.9583 | 33.26/0.9141 | 31.92/0.8960 | 31.23/0.9193 | -            |
| ×2    | CARN [14]      | 1,592K  | 223G       | 37.76/0.9590 | 33.52/0.9166 | 32.09/0.8978 | 31.92/0.9256 | -            |
| ×2    | SRMDNF [50]    | 1,513K  | 348G       | 37.79/0.9600 | 33.32/0.9150 | 32.05/0.8980 | 31.33/0.9200 | -            |
| ×2    | SRFBN-S [51]   | 282K    | 680G       | 37.78/0.9597 | 33.35/0.9156 | 32.00/0.8970 | 31.41/0.9207 | 38.06/0.9757 |
| ×2    | LAPAR-A [17]   | 548K    | 171G       | 38.01/0.9605 | 33.62/0.9183 | 32.19/0.8999 | 32.10/0.9283 | 38.67/0.9772 |
| ×2    | IRN (Ours)     | 503K    | 106G       | 38.08/0.9607 | 33.64/0.9181 | 32.20/0.8999 | 32.11/0.9282 | 38.83/0.9773 |
| ×3    | SRCNN [10]     | 57K     | 53G        | 32.75/0.9090 | 29.28/0.8209 | 28.41/0.7863 | 26.24/0.7989 | 30.59/0.9107 |
| ×3    | FSRCNN [11]    | 12K     | 5G         | 33.16/0.9140 | 29.43/0.8242 | 28.53/0.7910 | 26.43/0.8080 | 30.98/0.9212 |
| ×3    | VDSR [5]       | 665K    | 613G       | 33.66/0.9213 | 29.77/0.8314 | 28.82/0.7976 | 27.14/0.8279 | 32.01/0.9310 |
| ×3    | DRCN [12]      | 1,774K  | 17,974G    | 33.82/0.9226 | 29.76/0.8311 | 28.80/0.7963 | 27.15/0.8276 | 32.31/0.9328 |
| ×3    | MemNet [6]     | 677K    | 2,662G     | 34.09/0.9248 | 30.00/0.8350 | 28.96/0.8001 | 27.56/0.8376 | -            |
| ×3    | DRRN [48]      | 297K    | 6,797G     | 34.03/0.9244 | 29.96/0.8349 | 28.95/0.8004 | 27.53/0.8378 | 32.74/0.9390 |
| ×3    | SelNet [49]    | 1,159K  | 120G       | 34.27/0.9257 | 30.30/0.8399 | 28.97/0.8025 | -            | -            |
| ×3    | CARN-M [14]    | 412K    | 46G        | 33.99/0.9236 | 30.08/0.8367 | 28.91/0.8000 | 27.55/0.8385 | -            |
| ×3    | CARN [14]      | 1,592K  | 119G       | 34.29/0.9255 | 30.29/0.8407 | 29.06/0.8034 | 28.06/0.8493 | -            |
| ×3    | SRMDNF [50]    | 1,530K  | 156G       | 34.12/0.9250 | 30.04/0.8370 | 28.97/0.8030 | 27.57/0.8400 | -            |
| ×3    | SRFBN-S [51]   | 376K    | 832G       | 34.20/0.9255 | 30.10/0.8372 | 28.96/0.8010 | 27.66/0.8415 | 33.02/0.9404 |
| ×3    | LAPAR-A [17]   | 594K    | 114G       | 34.36/0.9267 | 30.34/0.8421 | 29.11/0.8054 | 28.15/0.8523 | 33.51/0.9441 |
| ×3    | IRN (Ours)     | 512K    | 48G        | 34.46/0.9276 | 30.37/0.8430 | 29.11/0.8056 | 28.18/0.8529 | 33.70/0.9452 |
| ×4    | SRCNN [10]     | 57K     | 53G        | 30.48/0.8628 | 27.49/0.7503 | 26.90/0.7101 | 24.52/0.7221 | 27.66/0.8505 |
| ×4    | FSRCNN [11]    | 12K     | 5G         | 30.71/0.8657 | 27.59/0.7535 | 26.98/0.7150 | 24.62/0.7280 | 27.90/0.8517 |
| ×4    | VDSR [5]       | 665K    | 613G       | 31.35/0.8838 | 28.01/0.7674 | 27.29/0.7251 | 25.18/0.7524 | 28.83/0.8809 |
| ×4    | DRCN [12]      | 1,774K  | 17,974G    | 31.53/0.8854 | 28.02/0.7670 | 27.23/0.7233 | 25.14/0.7510 | 28.98/0.8816 |
| ×4    | MemNet [6]     | 677K    | 2,662G     | 31.74/0.8893 | 28.26/0.7723 | 27.40/0.7281 | 25.50/0.7630 | -            |
| ×4    | DRRN [48]      | 297K    | 6,797G     | 31.68/0.8888 | 28.21/0.7720 | 27.38/0.7284 | 25.44/0.7638 | 29.46/0.8960 |
| ×4    | LapSRN [20]    | 813K    | 149G       | 31.54/0.8850 | 28.19/0.7720 | 27.32/0.7280 | 25.21/0.7560 | 29.09/0.8845 |
| ×4    | SelNet [49]    | 1,417K  | 83G        | 32.00/0.8931 | 28.49/0.7783 | 27.44/0.7325 | -            | -            |
| ×4    | SRDenseNet [7] | 2,015K  | 390G       | 32.02/0.8934 | 28.50/0.7782 | 27.53/0.7337 | 26.05/0.7819 | -            |
| ×4    | CARN-M [14]    | 412K    | 33G        | 31.92/0.8903 | 28.42/0.7762 | 27.44/0.7304 | 25.62/0.7694 | -            |
| ×4    | CARN [14]      | 1,592K  | 91G        | 32.13/0.8937 | 28.60/0.7806 | 27.58/0.7349 | 26.07/0.7837 | -            |
| ×4    | SRMDNF [50]    | 1,555K  | 89G        | 31.96/0.8930 | 28.35/0.7770 | 27.49/0.7340 | 25.68/0.7730 | -            |
| ×4    | SRFBN-S [51]   | 483K    | 1,037G     | 31.98/0.8923 | 28.45/0.7779 | 27.44/0.7313 | 25.71/0.7719 | 29.91/0.9008 |
| ×4    | LAPAR-A [17]   | 659K    | 94G        | 32.15/0.8944 | 28.61/0.7818 | 27.61/0.7366 | 26.14/0.7871 | 30.42/0.9074 |
| ×4    | IRN (Ours)     | 524K    | 28G        | 32.21/0.8952 | 28.61/0.7822 | 27.59/0.7370 | 26.04/0.7852 | 30.49/0.9091 |
4.4 Running time
As shown in Table 4, our method has the fewest parameters and the lowest running time compared to LAPAR-A (NeurIPS′21) and IMDN (ACM′19). The average running time also depends on code-level optimization and on how specific operators are computed on the test hardware (our IRN uses more \(1\times 1\) convolutions than IMDN and LAPAR-A); consequently, our runtime does not differ much from that of IMDN (ACM′19).
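A sketch of a typical GPU timing protocol consistent with the description above (warm-up pass, CUDA synchronization, averaging over 10 runs); the authors' exact measurement script is not given.

```python
import torch

@torch.no_grad()
def average_runtime_ms(model, images, runs=10):
    """Average per-image GPU runtime in milliseconds over `runs` passes.
    Sketch only: the paper states 'average of 10 runs on Urban100'
    without further protocol details."""
    model.eval().cuda()
    images = [img.cuda() for img in images]  # exclude H2D copies from timing
    for img in images:                       # warm-up pass
        model(img)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    times = []
    for _ in range(runs):
        start.record()
        for img in images:
            model(img)
        end.record()
        torch.cuda.synchronize()             # wait for queued kernels
        times.append(start.elapsed_time(end) / len(images))
    return sum(times) / len(times)
```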
Table 4 Comparison of our IRN with IMDN and LAPAR-A for ×3 SR (all dataset cells are PSNR/SSIM). Runtimes are averaged over 10 runs on the Urban100 test set

| Method  | Params[K] | Runtime[ms] | Set5         | Set14        | BSD100       | Urban100     | Manga109     |
|---------|-----------|-------------|--------------|--------------|--------------|--------------|--------------|
| IMDN    | 703       | 92.6        | 34.36/0.9270 | 30.32/0.8417 | 29.09/0.8046 | 28.17/0.8519 | 33.61/0.9445 |
| LAPAR-A | 594       | 103.2       | 34.36/0.9267 | 30.34/0.8421 | 29.11/0.8054 | 28.15/0.8523 | 33.51/0.9441 |
| IRN     | 512       | 89.4        | 34.46/0.9276 | 30.37/0.8430 | 29.11/0.8056 | 28.18/0.8529 | 33.70/0.9452 |