Google Summer of Code 2019 with OpenCV

DNN based super-resolution module

Student: Fanny Monori
Mentor: Vladimir Tyan
Other student on the project: Xavier Weber
Other mentor on the project: Yida Wang

Link to accomplished work:

Goals

The goal of the project was to add a super-resolution (SR) module to opencv_contrib that can perform SR with deep learning models.
Super-resolution is a family of algorithms that up-sample a low-resolution (LR) image to a high-resolution (HR) one. Its goal is to create an up-sampled copy that is as detailed and visually pleasing as possible. It is used in a wide range of fields, such as medical image processing and surveillance camera stream processing.
The most basic form of SR uses interpolation algorithms, such as bicubic interpolation. There are also more sophisticated classic computer vision algorithms; A+ (Adjusted Anchored Neighborhood Regression for Fast Super-Resolution) is a well-known example. Deep learning models for super-resolution are a widely researched area as well, as they generally achieve better accuracy. Many types of models are used, ranging from supervised to unsupervised learning methods. Supervised methods learn from LR–HR image pairs and build on popular deep learning architectures, such as convolutional neural networks or residual networks.
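As a point of reference for the simplest approach, nearest-neighbor up-sampling can be written in a few lines of NumPy. This is only an illustrative sketch of the baseline, not the module's implementation:

```python
import numpy as np

def nearest_neighbor_upsample(img, scale):
    """Upsample an H x W (or H x W x C) image by repeating each pixel."""
    return np.repeat(np.repeat(img, scale, axis=0), scale, axis=1)

# Each LR pixel becomes a scale x scale block in the HR output.
lr = np.array([[1, 2],
               [3, 4]])
hr = nearest_neighbor_upsample(lr, 2)  # 4 x 4 image
```

Bicubic and Lanczos interpolation work on the same principle but weight a neighborhood of pixels instead of copying one, which is why they score higher in the benchmarks below.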
For the Google Summer of Code 2019 project I implemented two deep learning based models, and the other student, Xavier, implemented another two. One of mine is ESPCN [1], one of the first convolutional neural network based models. It is tiny and the fastest among the four implemented models. The other one I implemented is LapSRN [2], a model with a Laplacian pyramid structure that can output multiple scales in a single inference run. Xavier implemented EDSR [3] and FSRCNN [4]: one very accurate model and one fast one.
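ESPCN's distinguishing operation is the sub-pixel convolution layer: the network produces an r²-channel feature map at LR resolution, which is then rearranged into an image r times larger in each spatial dimension. A minimal NumPy sketch of that rearrangement (often called pixel shuffle; illustrative only, not the module's code):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) tensor into (C, H*r, W*r)."""
    c, h, w = x.shape
    out_c = c // (r * r)
    x = x.reshape(out_c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)       # -> (out_c, h, r, w, r)
    return x.reshape(out_c, h * r, w * r)

feat = np.arange(16).reshape(4, 2, 2)    # 4 channels = one 2x2 sub-pixel grid
out = pixel_shuffle(feat, 2)             # shape (1, 4, 4)
```

Because all convolutions run at LR resolution and only this cheap reshuffle produces the HR image, ESPCN stays very fast.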

Accomplished work

The resulting module is capable of the following functionalities:

  • It adds a module named dnn_superres to opencv_contrib. Its main functionality is reading in SR models and running inference on them. It supports single- and multi-output inference; multi-output currently works only for LapSRN, the one implemented model capable of multilevel SR in a single run.
  • It can also run inference on video files and save the result.
  • It has benchmarking functionality: the user can compare the DNN-based models with bicubic, nearest-neighbor, and Lanczos interpolation.
  • The module has fully working sample code for single- and multi-output inference, video inference, and benchmarking.
  • It has working unit tests, tutorials, and documentation.

The summer in detail

First phase (May 27 - June 28):

  • Implemented the ESPCN model in TensorFlow.
  • Integrated ESPCN support into the dnn_superres module (which was initialized by Xavier).
  • Wrote sample code and a basic unit test.

Second phase (June 28 - July 26):

  • Implemented the LapSRN model in TensorFlow.
  • Added LapSRN support to the dnn_superres module, along with the multi-output functionality.
  • Worked closely with Xavier to refine the dnn_superres module's structure.
  • Created a unit test for multi-output inference.
  • Added new sample code.

Final phase (July 26 - August 19):

  • Added video inference functionality.
  • Implemented benchmarking capability.
  • Added more unit tests.
  • Created the tutorials and added new sample code.
  • Wrapped up the work with Xavier.

Future work

The summer is over and we now offer a variety of SR algorithms to users, but there are still new functionalities that could be added to the module. One disadvantage of these models is that they produce images so smooth that they lack fine detail. Other, classical methods also suffer from this problem, but there are newer models that can overcome it. In the future, I intend to integrate the SRGAN model, which is a generative model and thus capable of "imagining" details into the upsampled image. Another nice addition would be a stronger classic computer vision algorithm, such as A+.

The work in numbers and pictures

Benchmark on the General100 dataset

2x scaling factor

| Method | Avg inference time in sec (CPU) | Avg PSNR | Avg SSIM |
|---|---|---|---|
| ESPCN | 0.008795 | 32.7059 | 0.9276 |
| EDSR | 5.923450 | 34.1300 | 0.9447 |
| FSRCNN | 0.021741 | 32.8886 | 0.9301 |
| LapSRN | 0.114812 | 32.2681 | 0.9248 |
| Bicubic | 0.000208 | 32.1638 | 0.9305 |
| Nearest neighbor | 0.000114 | 29.1665 | 0.9049 |
| Lanczos | 0.001094 | 32.4687 | 0.9327 |
4x scaling factor

| Method | Avg inference time in sec (CPU) | Avg PSNR | Avg SSIM |
|---|---|---|---|
| ESPCN | 0.004311 | 26.6870 | 0.7891 |
| EDSR | 1.607570 | 28.1552 | 0.8317 |
| FSRCNN | 0.005302 | 26.6088 | 0.7863 |
| LapSRN | 0.121229 | 26.7383 | 0.7896 |
| Bicubic | 0.000311 | 26.0635 | 0.8754 |
| Nearest neighbor | 0.000148 | 23.5628 | 0.8174 |
| Lanczos | 0.001012 | 25.9115 | 0.8706 |
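The PSNR figures above come from the mean squared error between the upscaled result and the ground-truth HR image. A minimal NumPy version of the metric, for illustration (not the module's exact implementation):

```python
import numpy as np

def psnr(ref, img, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    mse = np.mean((ref.astype(np.float64) - img.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

SSIM additionally compares local luminance, contrast, and structure, which is why bicubic can beat the DNN models on SSIM at 4x while losing on PSNR.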

2x scaling factor

Set5: butterfly.png, size: 256x256 (comparison images omitted)

| Method | PSNR / SSIM / Speed (CPU) |
|---|---|
| Bicubic | 26.6645 / 0.9048 / 0.000201 |
| Nearest neighbor | 23.6854 / 0.8698 / 0.000075 |
| Lanczos | 26.9476 / 0.9075 / 0.001039 |
| ESPCN | 29.0341 / 0.9354 / 0.004157 |
| FSRCNN | 29.0077 / 0.9345 / 0.006325 |
| LapSRN | 27.8212 / 0.9230 / 0.037937 |
| EDSR | 30.0347 / 0.9453 / 2.077280 |

4x scaling factor

Set14: comic.png, size: 250x361 (comparison images omitted)

| Method | PSNR / SSIM / Speed (CPU) |
|---|---|
| Bicubic | 19.6766 / 0.6413 / 0.000262 |
| Nearest neighbor | 18.5106 / 0.5879 / 0.000085 |
| Lanczos | 19.4948 / 0.6317 / 0.001098 |
| ESPCN | 20.0417 / 0.6302 / 0.001894 |
| FSRCNN | 20.0885 / 0.6384 / 0.002103 |
| LapSRN | 20.0676 / 0.6339 / 0.061640 |
| EDSR | 20.5233 / 0.6901 / 0.665876 |

8x scaling factor

Div2K: 0006.png, size: 1356x2040 (comparison images omitted)

| Method | PSNR / SSIM / Speed (CPU) |
|---|---|
| Bicubic | 26.3139 / 0.8033 / 0.001107 |
| Nearest neighbor | 23.8291 / 0.7340 / 0.000611 |
| Lanczos | 26.1565 / 0.7962 / 0.004782 |
| LapSRN | 26.7046 / 0.7987 / 2.274290 |

References

[1] Shi, W., Caballero, J., Huszár, F., Totz, J., Aitken, A., Bishop, R., Rueckert, D. and Wang, Z., "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network", Proceedings of the IEEE conference on computer vision and pattern recognition CVPR 2016. [PDF] [arxiv]
[2] Lai, W. S., Huang, J. B., Ahuja, N., and Yang, M. H., "Deep laplacian pyramid networks for fast and accurate super-resolution", In Proceedings of the IEEE conference on computer vision and pattern recognition CVPR 2017. [PDF] [arxiv] [Project Page]
[3] Lim, B., Son, S., Kim, H., Nah, S., and Lee, K. M., "Enhanced Deep Residual Networks for Single Image Super-Resolution", 2nd NTIRE: New Trends in Image Restoration and Enhancement workshop and challenge on image super-resolution in conjunction with CVPR 2017. [PDF]
[4] Dong, C., Loy, C. C., and Tang, X., "Accelerating the Super-Resolution Convolutional Neural Network", Proceedings of the European Conference on Computer Vision ECCV 2016. [PDF]
