the-star-sea/gsoc2022_text_detection.md

## gsoc2022_text_detection.md

      
Lightweight Text Detection Models

Student: Tong Zhang
Mentors: Zihao Mu, Shiqi Yu
Links to accomplished work:

ppocr-v2 detect and ppocr-v2 detect fp16 support: opencv_zoo/pull/66
DB fp16 support: opencv_zoo/pull/92
Text detection metrics: opencv_zoo/pull/73
Fix DB bug: opencv_zoo/pull/58
Add DB demo to front page: opencv_zoo/pull/64
Support PAN: Google Drive
ppocr-v2 detect API: opencv/pull/22500

Introduction

Hi, I'm Tong Zhang! I am a student developer for OpenCV in GSoC 2022. The goal of this project is to provide lightweight text detection models for OpenCV. My work has four parts: supporting ppocr-v2 detect; adding fp16 support for ppocr-v2 detect and DB; adding text detection metrics (AP, recall, precision, hmean) to the benchmark; and adding front-page documentation and fixing bugs.
Timeline

First Period


Add the ppocr-v2 detect model and write its benchmark and documentation.
Add four text detection metrics (AP, recall, precision, hmean).
Add the DB demo to the front page.
Fix demo.py of DB.
Add PAN (more accurate, but it needs a C++ file to support post-processing).
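The precision, recall, and hmean metrics listed above can be sketched as follows. This is a minimal illustration and not the opencv_zoo implementation: it assumes axis-aligned boxes given as `(x1, y1, x2, y2)`, a greedy one-to-one matching, and an IoU threshold of 0.5. The helper names `iou` and `detection_metrics` and the sample boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_metrics(predicted, ground_truth, iou_threshold=0.5):
    """Return (precision, recall, hmean) for one image."""
    unmatched_gt = list(ground_truth)
    true_positives = 0
    for box in predicted:
        # Greedily match each prediction to the best remaining ground-truth box.
        best = max(unmatched_gt, key=lambda gt: iou(box, gt), default=None)
        if best is not None and iou(box, best) >= iou_threshold:
            true_positives += 1
            unmatched_gt.remove(best)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    total = precision + recall
    # hmean is the harmonic mean of precision and recall.
    hmean = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, hmean


# Hypothetical boxes: two of three predictions overlap a ground-truth box.
predicted = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
ground_truth = [(1, 1, 10, 10), (21, 20, 30, 30), (80, 80, 90, 90), (100, 100, 110, 110)]
precision, recall, hmean = detection_metrics(predicted, ground_truth)
print(precision, recall, hmean)
```

In this sample, two of the three predictions match, so precision is 2/3, recall is 2/4, and hmean is 4/7. Real text detection benchmarks usually work on polygons or rotated boxes, so the axis-aligned IoU here is a simplification.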

Second Period


Add fp16 versions of ppocr-v2 detect and DB.
Implement the high-level API of ppocr-v2 detect in OpenCV.
Implement the four metrics in the OpenCV Zoo benchmark.
Write the summary and make videos.
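To illustrate what an fp16 version of a model trades off, here is a small sketch using only the Python standard library. It is not the conversion procedure used in the pull requests. It only shows that storing values in IEEE 754 half precision halves the storage size compared with fp32 while introducing a small rounding error. The sample `weights` are made up for the example.

```python
import struct


def to_fp16_and_back(value):
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Hypothetical weight values, not taken from any real model.
weights = [0.5, 0.1, -1.25, 3.14159]

# "f" packs 4-byte single precision, "e" packs 2-byte half precision.
fp32_bytes = len(struct.pack(f"<{len(weights)}f", *weights))
fp16_bytes = len(struct.pack(f"<{len(weights)}e", *weights))

# Largest absolute rounding error introduced by the fp16 round trip.
max_error = max(abs(w - to_fp16_and_back(w)) for w in weights)

print(fp32_bytes, fp16_bytes)
print(max_error)
```

Values such as 0.5 and -1.25 are exactly representable in fp16, while 0.1 and 3.14159 are rounded. Whether that rounding changes detection quality has to be checked with the benchmark metrics.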

Benchmarks for text detection models

The performance of the models on text recognition datasets is shown in the table below:


| Model name | precision | recall | hmean | AP | time per photo (s) | size (MB) |
| --- | --- | --- | --- | --- | --- | --- |
| DB | 0.731 | 0.350 | 0.472 | 0.256 | 0.080 | 48.8 |
| ppocr-v2 detect | 0.681 | 0.350 | 0.462 | 0.285 | 0.0003 | 2.3 |


The text detection models were tested on OpenCV Zoo.
Results

DB: https://drive.google.com/file/d/1fH2B8yKpL_4K1XlF1OMrkEwggWxQz2kW/view?usp=sharing
ppocr-v2 detect: https://drive.google.com/file/d/1kZCTU3VEoHaO5Td-QCMMCKZnuedPHQ2e/view?usp=sharing
Add metrics: https://drive.google.com/file/d/1-T3KUXAx4JFlxJY8JXi2hfZhyrkkMI02/view?usp=sharing
In real-time detection, ppocr-v2 detect clearly works much better than DB. I also observed that DB's high-level API may crash the program when there is a transparent bottle in front of the camera. I am still working on modifying the high-level API.
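The real-time comparison can be made concrete with a little arithmetic on the benchmark table. The sketch below only reuses the "time per photo" and "size" values reported in this document. The dictionary layout and variable names are chosen for illustration.

```python
# Values copied from the benchmark table in this document.
models = {
    "DB": {"seconds_per_photo": 0.080, "size_mb": 48.8},
    "ppocr-v2 detect": {"seconds_per_photo": 0.0003, "size_mb": 2.3},
}

for name, stats in models.items():
    # Inference-only throughput; camera capture and drawing are not included.
    fps = 1.0 / stats["seconds_per_photo"]
    print(f"{name}: about {fps:.1f} photos per second, {stats['size_mb']} MB")

speedup = models["DB"]["seconds_per_photo"] / models["ppocr-v2 detect"]["seconds_per_photo"]
size_ratio = models["DB"]["size_mb"] / models["ppocr-v2 detect"]["size_mb"]
print(f"speedup: about {speedup:.0f}x, size ratio: about {size_ratio:.1f}x")
```

With the reported numbers, DB processes about 12.5 photos per second, while ppocr-v2 detect is roughly 267 times faster per photo and about 21 times smaller on disk.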
Computer Environment

Hardware:

CPU: i7-8750
RAM: 16GB
GPU: GeForce GTX 1060 Mobile
Software:

Ubuntu 20.04
OpenCV 4.6
Text detection


References

[1] https://arxiv.org/abs/1911.08947
[2] https://github.com/PaddlePaddle/PaddleOCR
[3] https://github.com/BADBADBADBOY/DBnet-lite.pytorch
[4] https://rrc.cvc.uab.es/?ch=2&com=mymethods&task=1