@the-star-sea
Last active December 29, 2022 22:52
Google Summer of Code 2022 with OpenCV Zoo: Support lightweight text detection models

Lightweight Text Detection Models

Student: Tong Zhang

Mentors: Zihao Mu, Shiqi Yu

Link to accomplished work:

Introduction

Hi, I'm Tong Zhang, a student developer for OpenCV in GSoC 2022. The goal of this project is to provide lightweight text detection models for OpenCV. My work has four parts: supporting ppocr-v2 detect; supporting fp16 versions of ppocr-v2 detect and DB; adding four text detection metrics (AP, recall, precision, hmean) to the benchmark; and adding front-page documentation and fixing bugs.

Timeline

First Period

  • Add the ppocr-v2 detect model and write its benchmark and documentation.
  • Add 4 metrics for text detection (AP, recall, precision, hmean).
  • Add the DB demo to the front page.
  • Fix demo.py of DB.
  • Add the PAN network (more accurate, but its post-processing needs a C++ file).
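The precision and recall metrics listed above can be illustrated with a minimal sketch. It assumes axis-aligned boxes given as (x1, y1, x2, y2) and a greedy one-to-one match at an IoU threshold of 0.5; the function names, the box format, and the threshold are assumptions made for illustration, and the actual benchmark may match polygons and use a different protocol.

```python
def box_area(box):
    """Area of an axis-aligned box (x1, y1, x2, y2)."""
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, ground_truth, iou_threshold=0.5):
    """Greedily match each detection to the best unmatched ground-truth box."""
    matched = set()
    true_positives = 0
    for det in detections:
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue
            score = iou(det, gt)
            if score > best_iou:
                best_iou, best_idx = score, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            true_positives += 1
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical boxes, not taken from any benchmark dataset.
detections = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
ground_truth = [(1, 1, 10, 10), (20, 20, 30, 31),
                (100, 100, 110, 110), (200, 200, 210, 210)]
precision, recall = precision_recall(detections, ground_truth)
```

In this toy case two of the three detections match a ground-truth box, and two of the four ground-truth boxes are found.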

Second Period

  • Add fp16 versions of ppocr-v2 detect and DB.
  • Implement the high-level API of ppocr-v2 detect in OpenCV.
  • Implement 4 metrics in the benchmark of OpenCV Zoo.
  • Write the summary and make videos.
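To show what an fp16 version trades off, here is a minimal sketch using only the Python standard library. It assumes model weights are plain float32 values and packs them as IEEE half-precision floats; the weight values and helper names are hypothetical, and this is not the conversion tool used in the project.

```python
import struct


def to_fp32_bytes(weights):
    """Pack values as little-endian 32-bit floats (4 bytes each)."""
    return struct.pack(f"<{len(weights)}f", *weights)


def to_fp16_bytes(weights):
    """Pack values as little-endian 16-bit floats (2 bytes each)."""
    return struct.pack(f"<{len(weights)}e", *weights)


def from_fp16_bytes(blob):
    """Unpack little-endian 16-bit floats back into Python floats."""
    count = len(blob) // 2
    return list(struct.unpack(f"<{count}e", blob))


# Hypothetical weights, chosen only for illustration.
weights = [0.1, -0.25, 3.14159, 1e-3]

fp32_blob = to_fp32_bytes(weights)
fp16_blob = to_fp16_bytes(weights)
restored = from_fp16_bytes(fp16_blob)

# Storage halves, at the cost of a small rounding error per weight.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(len(fp32_blob), len(fp16_blob), max_error)
```

The storage is halved, while each value picks up a rounding error that depends on its magnitude.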

Benchmarks for text detection models

The performance of the models on text recognition datasets is shown in the table below:

| Model name | Precision | Recall | Hmean | AP | Time per photo (s) | Size (MB) |
| --- | --- | --- | --- | --- | --- | --- |
| DB | 0.731 | 0.350 | 0.472 | 0.256 | 0.080 | 48.8 |
| ppocr-v2 detect | 0.681 | 0.350 | 0.462 | 0.285 | 0.0003 | 2.3 |

The performance of the text detection models was tested on OpenCV Zoo.
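As a quick consistency check, hmean is the harmonic mean of precision and recall, hmean = 2PR / (P + R). The sketch below recomputes it from the reported precision and recall. The results agree with the reported hmean values to within about 0.002; the remaining gap may come from rounding or from how the benchmark aggregates results, which this report does not specify.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall (also known as the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported hmean) copied from the benchmark table.
table = {
    "DB": (0.731, 0.350, 0.472),
    "ppocr-v2 detect": (0.681, 0.350, 0.462),
}

recomputed = {name: hmean(p, r) for name, (p, r, _) in table.items()}

for name, (_, _, reported) in table.items():
    print(f"{name}: recomputed {recomputed[name]:.3f}, reported {reported:.3f}")
```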

Results

DB: https://drive.google.com/file/d/1fH2B8yKpL_4K1XlF1OMrkEwggWxQz2kW/view?usp=sharing

ppocr-v2 detect: https://drive.google.com/file/d/1kZCTU3VEoHaO5Td-QCMMCKZnuedPHQ2e/view?usp=sharing

Added metrics: https://drive.google.com/file/d/1-T3KUXAx4JFlxJY8JXi2hfZhyrkkMI02/view?usp=sharing

In real-time detection, ppocr-v2 detect works much better than DB. I also observed that DB's high-level API may crash the program when there is a transparent bottle in front of the camera. I am still working on modifying the high-level API.
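To put the real-time comparison in numbers, the time per photo from the benchmark table can be converted into frames per second. This is a minimal sketch that assumes detection is the only per-frame cost, so it ignores camera capture and drawing; the helper name is an assumption.

```python
def frames_per_second(seconds_per_photo):
    """Convert a per-photo latency in seconds into a throughput in frames per second."""
    if seconds_per_photo <= 0:
        raise ValueError("seconds_per_photo must be positive")
    return 1.0 / seconds_per_photo


# Time per photo (s) copied from the benchmark table.
timings = {"DB": 0.080, "ppocr-v2 detect": 0.0003}

fps = {name: frames_per_second(t) for name, t in timings.items()}
speedup = timings["DB"] / timings["ppocr-v2 detect"]

for name, value in fps.items():
    print(f"{name}: {value:.1f} frames per second")
print(f"ppocr-v2 detect is about {speedup:.0f}x faster per photo")
```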

Computer Environment

Hardware:

  • CPU: i7-8750
  • RAM: 16 GB
  • GPU: GeForce GTX 1060 Mobile

Software:

  • Ubuntu 20.04
  • OpenCV 4.6

Text detection

[Images: image.png, cola.jpg]

References

[1] https://arxiv.org/abs/1911.08947

[2] https://github.com/PaddlePaddle/PaddleOCR

[3] https://github.com/BADBADBADBOY/DBnet-lite.pytorch

[4] https://rrc.cvc.uab.es/?ch=2&com=mymethods&task=1
