Google Summer of Code 2019 with OpenCV

DNN based super-resolution module

Student: Fanny Monori
Mentor: Vladimir Tyan
Other student on the project: Xavier Weber
Other mentor on the project: Yida Wang

Link to accomplished work:

Goals

The goal of the project was to add a super-resolution (SR) module to opencv_contrib that can perform SR with deep learning models.
Super-resolution is a family of algorithms that up-sample a low-resolution (LR) image to a high-resolution (HR) one. Its goal is to create an up-sampled copy that is as detailed and visually pleasing as possible. It is used in a wide range of fields, such as medical image processing and surveillance camera stream processing.
The most basic form of SR uses interpolation algorithms, such as bicubic interpolation. There are also more sophisticated classic computer vision algorithms; A+ (Adjusted Anchored Neighborhood Regression for Fast Super-Resolution) is a well-known example. Deep learning models for super-resolution are a widely researched area as well, as they generally achieve better accuracy. Many types of models are used, ranging from supervised to unsupervised learning methods. Supervised methods learn from LR–HR image pairs and build on popular deep learning architectures, such as convolutional neural networks or residual networks.
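As a point of reference for the simplest approach, nearest-neighbor up-sampling can be written in a few lines of NumPy. This is only an illustrative sketch of the baseline, not the module's implementation:

```python
import numpy as np

def nearest_neighbor_upsample(img, scale):
    """Upsample an H x W (or H x W x C) image by repeating each pixel."""
    return np.repeat(np.repeat(img, scale, axis=0), scale, axis=1)

# Each LR pixel becomes a scale x scale block in the HR output.
lr = np.array([[1, 2],
               [3, 4]])
hr = nearest_neighbor_upsample(lr, 2)  # 4 x 4 image
```

Bicubic and Lanczos interpolation work on the same principle but weight a neighborhood of pixels instead of copying one, which is why they score higher in the benchmarks below.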
For the Google Summer of Code 2019 project I implemented two deep learning based models, and the other student, Xavier, implemented another two. One of mine is ESPCN [1], one of the first convolutional neural network based models. It is tiny and the fastest among the four implemented models. The other one I implemented is LapSRN [2], a model with a Laplacian pyramid structure that can output multiple scales in a single inference run. Xavier implemented EDSR [3] and FSRCNN [4]: one very accurate model and one fast one.
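ESPCN's distinguishing operation is the sub-pixel convolution layer: the network produces an r²-channel feature map at LR resolution, which is then rearranged into an image r times larger in each spatial dimension. A minimal NumPy sketch of that rearrangement (often called pixel shuffle; illustrative only, not the module's code):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) tensor into (C, H*r, W*r)."""
    c, h, w = x.shape
    out_c = c // (r * r)
    x = x.reshape(out_c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)       # -> (out_c, h, r, w, r)
    return x.reshape(out_c, h * r, w * r)

feat = np.arange(16).reshape(4, 2, 2)    # 4 channels = one 2x2 sub-pixel grid
out = pixel_shuffle(feat, 2)             # shape (1, 4, 4)
```

Because all convolutions run at LR resolution and only this cheap reshuffle produces the HR image, ESPCN stays very fast.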

Accomplished work

The resulting module is capable of the following functionalities:

  • It adds a module named dnn_superres to opencv_contrib. Its main functionality is reading in SR models and running inference on them. It supports single- and multi-output inference; multi-output currently works only for LapSRN, the one implemented model capable of multilevel SR in a single run.
  • It can also run inference on video files and save the result.
  • It has benchmarking functionality: the user can compare the DNN-based models with bicubic, nearest-neighbor, and Lanczos interpolation.
  • The module has fully working sample code for single- and multi-output inference, video inference, and benchmarking.
  • It has working unit tests, tutorials, and documentation.

The summer in detail

First phase (May 27 - June 28):

  • Implemented the ESPCN model in TensorFlow.
  • Integrated ESPCN support into the dnn_superres module (which was initialized by Xavier).
  • Wrote sample code and a basic unit test.

Second phase (June 28 - July 26):

  • Implemented the LapSRN model in TensorFlow.
  • Added LapSRN support to the dnn_superres module, along with the multi-output functionality.
  • Worked closely with Xavier to refine the dnn_superres module's structure.
  • Created a unit test for multi-output inference.
  • Added new sample code.

Final phase (July 26 - August 19):

  • Added video inference functionality.
  • Implemented benchmarking capability.
  • Added more unit tests.
  • Created the tutorials and added new sample code.
  • Wrapped up the work with Xavier.

Future work

The summer is over and we now offer a variety of SR algorithms to users, but there are still new functionalities that could be added to the module. One disadvantage of these models is that they produce images so smooth that they lack fine detail. Other, classical methods also suffer from this problem, but there are newer models that can overcome it. In the future, I intend to integrate the SRGAN model, which is a generative model and thus capable of "imagining" details into the upsampled image. Another nice addition would be a stronger classic computer vision algorithm, such as A+.

The work in numbers and pictures

Benchmark on the General100 dataset

2x scaling factor

| Method | Avg inference time in sec (CPU) | Avg PSNR | Avg SSIM |
|---|---|---|---|
| ESPCN | 0.008795 | 32.7059 | 0.9276 |
| EDSR | 5.923450 | 34.1300 | 0.9447 |
| FSRCNN | 0.021741 | 32.8886 | 0.9301 |
| LapSRN | 0.114812 | 32.2681 | 0.9248 |
| Bicubic | 0.000208 | 32.1638 | 0.9305 |
| Nearest neighbor | 0.000114 | 29.1665 | 0.9049 |
| Lanczos | 0.001094 | 32.4687 | 0.9327 |
4x scaling factor

| Method | Avg inference time in sec (CPU) | Avg PSNR | Avg SSIM |
|---|---|---|---|
| ESPCN | 0.004311 | 26.6870 | 0.7891 |
| EDSR | 1.607570 | 28.1552 | 0.8317 |
| FSRCNN | 0.005302 | 26.6088 | 0.7863 |
| LapSRN | 0.121229 | 26.7383 | 0.7896 |
| Bicubic | 0.000311 | 26.0635 | 0.8754 |
| Nearest neighbor | 0.000148 | 23.5628 | 0.8174 |
| Lanczos | 0.001012 | 25.9115 | 0.8706 |
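The PSNR figures above come from the mean squared error between the upscaled result and the ground-truth HR image. A minimal NumPy version of the metric, for illustration (not the module's exact implementation):

```python
import numpy as np

def psnr(ref, img, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    mse = np.mean((ref.astype(np.float64) - img.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

SSIM additionally compares local luminance, contrast, and structure, which is why bicubic can beat the DNN models on SSIM at 4x while losing on PSNR.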

2x scaling factor

Set5: butterfly.png, size: 256x256 (comparison images omitted)

| Method | PSNR / SSIM / Speed (CPU) |
|---|---|
| Bicubic | 26.6645 / 0.9048 / 0.000201 |
| Nearest neighbor | 23.6854 / 0.8698 / 0.000075 |
| Lanczos | 26.9476 / 0.9075 / 0.001039 |
| ESPCN | 29.0341 / 0.9354 / 0.004157 |
| FSRCNN | 29.0077 / 0.9345 / 0.006325 |
| LapSRN | 27.8212 / 0.9230 / 0.037937 |
| EDSR | 30.0347 / 0.9453 / 2.077280 |

4x scaling factor

Set14: comic.png, size: 250x361 (comparison images omitted)

| Method | PSNR / SSIM / Speed (CPU) |
|---|---|
| Bicubic | 19.6766 / 0.6413 / 0.000262 |
| Nearest neighbor | 18.5106 / 0.5879 / 0.000085 |
| Lanczos | 19.4948 / 0.6317 / 0.001098 |
| ESPCN | 20.0417 / 0.6302 / 0.001894 |
| FSRCNN | 20.0885 / 0.6384 / 0.002103 |
| LapSRN | 20.0676 / 0.6339 / 0.061640 |
| EDSR | 20.5233 / 0.6901 / 0.665876 |

8x scaling factor

Div2K: 0006.png, size: 1356x2040 (comparison images omitted)

| Method | PSNR / SSIM / Speed (CPU) |
|---|---|
| Bicubic | 26.3139 / 0.8033 / 0.001107 |
| Nearest neighbor | 23.8291 / 0.7340 / 0.000611 |
| Lanczos | 26.1565 / 0.7962 / 0.004782 |
| LapSRN | 26.7046 / 0.7987 / 2.274290 |

References

[1] Shi, W., Caballero, J., Huszár, F., Totz, J., Aitken, A., Bishop, R., Rueckert, D. and Wang, Z., "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network", Proceedings of the IEEE conference on computer vision and pattern recognition CVPR 2016. [PDF] [arxiv]
[2] Lai, W. S., Huang, J. B., Ahuja, N., and Yang, M. H., "Deep laplacian pyramid networks for fast and accurate super-resolution", In Proceedings of the IEEE conference on computer vision and pattern recognition CVPR 2017. [PDF] [arxiv] [Project Page]
[3] Lim, B., Son, S., Kim, H., Nah, S., and Lee, K. M., "Enhanced Deep Residual Networks for Single Image Super-Resolution", 2nd NTIRE: New Trends in Image Restoration and Enhancement workshop and challenge on image super-resolution in conjunction with CVPR 2017. [PDF]
[4] Dong, C., Loy, C. C., and Tang, X., "Accelerating the Super-Resolution Convolutional Neural Network", Proceedings of the European Conference on Computer Vision ECCV 2016. [PDF]
