Student: Fanny Monori
Mentor: Vladimir Tyan
Other student on the project: Xavier Weber
Other mentor on the project: Yida Wang
Links to accomplished work:
- PR in the opencv_contrib repository: opencv/opencv_contrib
- ESPCN implementation in TensorFlow: TF-ESPCN
- LapSRN implementation in TensorFlow: TF-LapSRN
The goal of the project was to add super resolution (SR) functionality to opencv_contrib, in the form of a module that
performs SR with deep learning models.
Super Resolution (SR) is a class of algorithms that up-sample a low-resolution (LR) image to a high-resolution (HR) one.
Its goal is to create an up-sampled copy that is as detailed and visually pleasing as possible. It is used in a wide range of fields,
such as medical image processing or surveillance camera stream processing.
The most basic form of SR uses interpolation algorithms, such as bicubic interpolation. There are more sophisticated
classic computer vision algorithms as well, for example the well-known A+ (Adjusted Anchored Neighborhood Regression for Fast Super-Resolution).
Deep learning models for super resolution are also widely researched, as they generally achieve better accuracy.
Many types of models are in use, ranging from supervised to unsupervised learning methods. Supervised methods learn from
LR and HR image pairs, and they make use of popular deep learning architectures, such as convolutional neural networks
or residual networks.
For the Google Summer of Code 2019 project I implemented two deep learning based models, and the other student, Xavier,
implemented another two. One of mine is ESPCN [1], which was one of the first convolutional neural network based SR models.
It is tiny, and the fastest of the four implemented models. The other one I implemented is LapSRN [2], which
has a Laplacian pyramid structure and can output multiple scales in one inference run. Xavier implemented EDSR [3] and FSRCNN [4]:
one very accurate model and one fast model.
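The "sub-pixel" part of ESPCN [1] refers to its final step: the network computes r*r feature maps at low resolution and then interleaves them into one image that is r times larger in each direction. The following is a minimal, dependency-free sketch of that rearrangement for a single output channel. The function name `pixel_shuffle` and the channel ordering used here are assumptions made for illustration, not code taken from the module.

```python
def pixel_shuffle(channels, r):
    """Rearrange r*r low-resolution channels into one high-resolution channel.

    `channels` is a list of r*r feature maps, each an H x W list of lists.
    Output pixel (y, x) is read from channel (y % r) * r + (x % r)
    at low-resolution position (y // r, x // r).
    """
    if len(channels) != r * r:
        raise ValueError("expected r*r input channels")
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0] * (w * r) for _ in range(h * r)]
    for y in range(h * r):
        for x in range(w * r):
            c = (y % r) * r + (x % r)  # which feature map supplies this pixel
            out[y][x] = channels[c][y // r][x // r]
    return out


# Four 1x2 feature maps become one 2x4 image for r = 2.
lr_channels = [[[1, 2]], [[3, 4]], [[5, 6]], [[7, 8]]]
hr = pixel_shuffle(lr_channels, 2)
```

Because all convolutions run on the small LR grid and only this cheap rearrangement touches the HR grid, the model stays fast.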
The resulting module offers the following functionality:
- It is a module named dnn_superres in opencv_contrib. Its main functionality is reading in SR models and running inference with them. It supports single and multi output inference. Multi output is currently available only for LapSRN, as that is the only model capable of multilevel SR in one run.
- It can also run inference on video files and save the result.
- It has benchmarking functionality. The user can compare the DNN-based models with bicubic, nearest neighbor, and Lanczos interpolation methods.
- The module has fully working sample code for running single and multi output inference, video inference, and benchmarking.
- It has working unit tests, tutorials, and documentation.
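Single output inference can be sketched as follows, assuming an OpenCV build that includes opencv_contrib and exposes Python bindings for dnn_superres. The helper `parse_model_name` and the `<NAME>_x<SCALE>.pb` file naming scheme are assumptions of this sketch; the module itself only needs the algorithm name and the scale passed to `setModel`.

```python
import os


def parse_model_name(path):
    """Split a file name such as 'ESPCN_x2.pb' into ('espcn', 2).

    The '<NAME>_x<SCALE>.pb' naming scheme is an assumption of this sketch.
    """
    stem = os.path.splitext(os.path.basename(path))[0]
    name, _, scale = stem.rpartition("_x")
    if not name or not scale.isdigit():
        raise ValueError("unexpected model file name: " + path)
    return name.lower(), int(scale)


def upsample_file(model_path, image_path, output_path):
    """Read a trained SR model and write an upscaled copy of one image."""
    import cv2  # imported lazily; requires an OpenCV build with opencv_contrib

    algo, scale = parse_model_name(model_path)
    sr = cv2.dnn_superres.DnnSuperResImpl_create()
    sr.readModel(model_path)
    sr.setModel(algo, scale)  # must match the model stored in the file
    image = cv2.imread(image_path)
    if image is None:
        raise IOError("could not read " + image_path)
    cv2.imwrite(output_path, sr.upsample(image))
```

A call such as `upsample_file("ESPCN_x2.pb", "input.png", "output.png")` (hypothetical file names) would then write a 2x upscaled image.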
First phase (May 27 - June 28):
- Implement the ESPCN model in TensorFlow.
- Integrate ESPCN support into the dnn_superres module (which was initialized by Xavier).
- Write sample code and a basic unit test.
Second phase (June 28 - July 26):
- Implement the LapSRN model in TensorFlow.
- Add LapSRN support to the dnn_superres module, along with the multi output functionality.
- Work closely with Xavier to refine the structure of the dnn_superres module.
- Create a unit test for multi output inference.
- Add new sample code.
Final phase (July 26 - August 19):
- Add video inference functionality.
- Implement benchmarking capability.
- Add more unit tests.
- Create the tutorials and add new sample code.
- Wrap up the work with Xavier.
The summer is over and we offer users a wide variety of SR algorithms, but there is still new functionality that could be added to the module. One disadvantage of these models is that they produce images so smooth that they lack detail. Classical methods also suffer from this problem, but there are newer models that can overcome it. In the future, I intend to integrate the SRGAN model, which is a generative model and is thus capable of "imagining" details into the upsampled image. Another nice addition would be a better classic computer vision algorithm, such as A+.
Method | Avg inference time in sec (CPU) | Avg PSNR | Avg SSIM |
---|---|---|---|
ESPCN | 0.008795 | 32.7059 | 0.9276 |
EDSR | 5.923450 | 34.1300 | 0.9447 |
FSRCNN | 0.021741 | 32.8886 | 0.9301 |
LapSRN | 0.114812 | 32.2681 | 0.9248 |
Bicubic | 0.000208 | 32.1638 | 0.9305 |
Nearest neighbor | 0.000114 | 29.1665 | 0.9049 |
Lanczos | 0.001094 | 32.4687 | 0.9327 |

Method | Avg inference time in sec (CPU) | Avg PSNR | Avg SSIM |
---|---|---|---|
ESPCN | 0.004311 | 26.6870 | 0.7891 |
EDSR | 1.607570 | 28.1552 | 0.8317 |
FSRCNN | 0.005302 | 26.6088 | 0.7863 |
LapSRN | 0.121229 | 26.7383 | 0.7896 |
Bicubic | 0.000311 | 26.0635 | 0.8754 |
Nearest neighbor | 0.000148 | 23.5628 | 0.8174 |
Lanczos | 0.001012 | 25.9115 | 0.8706 |
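
The PSNR column in the tables above is the peak signal-to-noise ratio between the up-sampled image and the original HR image, in dB (higher is better). The following is a minimal sketch of the general definition for 8-bit intensities. The function name `psnr` and the tiny sample images are made up for illustration, and the module's benchmark may differ in details such as which color channels are compared.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are given as lists of rows of pixel intensities.
    """
    ref = [p for row in reference for p in row]
    tst = [p for row in test for p in row]
    if len(ref) != len(tst) or not ref:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, tst)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Made-up 2x2 images, only to show the call.
hr_image = [[52, 55], [61, 59]]
upscaled = [[50, 55], [63, 60]]
score = psnr(hr_image, upscaled)
```

Since PSNR is logarithmic in the mean squared error, a gain of about 1 dB already corresponds to roughly a 20% reduction in that error.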
[1] Shi, W., Caballero, J., Huszár, F., Totz, J., Aitken, A., Bishop, R., Rueckert, D. and Wang, Z.,
"Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network",
Proceedings of the IEEE conference on computer vision and pattern recognition CVPR 2016.
[PDF]
[arxiv]
[2] Lai, W. S., Huang, J. B., Ahuja, N., and Yang, M. H., "Deep Laplacian Pyramid Networks for Fast and Accurate
Super-Resolution", Proceedings of the IEEE conference on computer vision and pattern recognition CVPR 2017.
[PDF]
[arxiv]
[Project Page]
[3] Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee, "Enhanced Deep Residual Networks for
Single Image Super-Resolution", 2nd NTIRE: New Trends in Image Restoration and Enhancement workshop and challenge
on image super-resolution in conjunction with CVPR 2017.
[PDF]
[4] Chao Dong, Chen Change Loy, Xiaoou Tang. "Accelerating the Super-Resolution Convolutional Neural Network",
in Proceedings of European Conference on Computer Vision ECCV 2016.
[PDF]