Student: Tong Zhang
Mentors: Zihao Mu, Shiqi Yu
Links to accomplished work:
- ppocr-v2 detect and ppocr-v2 detect fp16 support: opencv_zoo/pull/66
- DB fp16 support: opencv_zoo/pull/92
- Text detection metrics: opencv_zoo/pull/73
- Fix DB bug: opencv_zoo/pull/58
- Add DB demo to front page: opencv_zoo/pull/64
- Support PAN: Google Drive
- ppocr-v2 detect API: opencv/pull/22500
Hi, I'm Tong Zhang! I am a developer in OpenCV's GSoC 2022 program. The goal of this project is to provide a lightweight text detection model for OpenCV. My work has four parts: supporting ppocr-v2 detect, supporting fp16 for ppocr-v2 detect and DB, adding four text detection metrics to the benchmark, and adding front-page documentation and fixing bugs.
- Add the ppocr-v2 detect model, and write its benchmark and documentation.
- Add four text detection metrics (AP, recall, precision, hmean).
- Add the DB demo to the front page.
- Fix demo.py of DB.
- Add PAN (more accurate, but it needs a C++ file to support post-processing).
- Add fp16 versions of ppocr-v2 detect and DB.
- Implement the high-level API of ppocr-v2 detect in OpenCV.
- Implement the four metrics in the OpenCV Zoo benchmark.
- Write the summary and make videos.
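To illustrate what the fp16 variants in the list above trade off, here is a minimal sketch of half-precision storage using only the Python standard library. It is not the conversion script used in the project; the weight values are hypothetical and chosen only for illustration. It shows that each value takes 2 bytes instead of 4, at the cost of a small rounding error.

```python
import struct


def to_fp16_and_back(value: float) -> float:
    """Round a float to IEEE 754 half precision and return it as a Python float."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Hypothetical weights, not taken from DB or ppocr-v2 detect.
weights = [0.1, -0.25, 0.003, 1.5]

fp32_bytes = len(weights) * struct.calcsize("<f")  # 4 bytes per value
fp16_bytes = len(weights) * struct.calcsize("<e")  # 2 bytes per value

# Largest absolute difference introduced by rounding to half precision.
max_abs_error = max(abs(w - to_fp16_and_back(w)) for w in weights)

print(f"fp32 storage: {fp32_bytes} bytes, fp16 storage: {fp16_bytes} bytes")
print(f"largest rounding error: {max_abs_error:.2e}")
```

Values that are exactly representable in half precision (such as -0.25 and 1.5) survive the round trip unchanged; others are rounded to the nearest representable value.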
The performance of DB and ppocr-v2 detect on text recognition datasets is shown in the table below:
Model name | Precision | Recall | Hmean | AP | Time per image (s) | Size (MB) |
---|---|---|---|---|---|---|
DB | 0.731 | 0.350 | 0.472 | 0.256 | 0.080 | 48.8 |
ppocr-v2 detect | 0.681 | 0.350 | 0.462 | 0.285 | 0.0003 | 2.3 |
The performance of the text detection models was tested with OpenCV Zoo.
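The precision, recall and hmean columns are related in a simple way, which the following sketch illustrates. It is only an approximation of how such metrics can be computed, not the benchmark code from opencv_zoo/pull/73: it assumes axis-aligned boxes, an IoU threshold of 0.5 and greedy one-to-one matching, and all function names are hypothetical. AP is left out because it additionally requires ranking detections by confidence.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (also known as the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(detections: List[Box], ground_truth: List[Box],
             iou_threshold: float = 0.5) -> Tuple[float, float, float]:
    """Greedily match each detection to at most one unmatched ground-truth box."""
    matched = set()
    true_positives = 0
    for det in detections:
        for idx, gt in enumerate(ground_truth):
            if idx not in matched and iou(det, gt) >= iou_threshold:
                matched.add(idx)
                true_positives += 1
                break
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall, hmean(precision, recall)


# Toy example: three ground-truth text boxes, two detections, one of them correct.
gt_boxes = [(0, 0, 10, 10), (20, 0, 30, 10), (40, 0, 50, 10)]
det_boxes = [(0, 0, 10, 9), (100, 100, 110, 110)]
precision, recall, f_score = evaluate(det_boxes, gt_boxes)
print(f"precision={precision:.3f} recall={recall:.3f} hmean={f_score:.3f}")

# Recomputing hmean from the rounded table values; small differences in the
# last digit are expected because the table inputs are already rounded.
print(f"DB hmean from table values: {hmean(0.731, 0.350):.3f}")
print(f"ppocr-v2 detect hmean from table values: {hmean(0.681, 0.350):.3f}")
```

Because hmean is a harmonic mean, it is pulled towards the lower of the two inputs, which is why both models score below 0.5 despite precision around 0.7.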
DB: https://drive.google.com/file/d/1fH2B8yKpL_4K1XlF1OMrkEwggWxQz2kW/view?usp=sharing
ppocr-v2 detect: https://drive.google.com/file/d/1kZCTU3VEoHaO5Td-QCMMCKZnuedPHQ2e/view?usp=sharing
Add metrics: https://drive.google.com/file/d/1-T3KUXAx4JFlxJY8JXi2hfZhyrkkMI02/view?usp=sharing
In real-time detection, ppocr-v2 detect works much better than DB. I also observed that DB's high-level API may crash the program when there is a transparent bottle in front of the camera. I am still working on modifying the high-level API.
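The real-time comparison can be made concrete by converting the per-image times from the table above into throughput. This is a back-of-the-envelope sketch: it only covers the measured inference time per image and ignores camera capture, pre-processing and display overhead.

```python
# Seconds per image, copied from the table above.
seconds_per_image = {"DB": 0.080, "ppocr-v2 detect": 0.0003}


def images_per_second(seconds: float) -> float:
    """Throughput implied by a given per-image latency."""
    return 1.0 / seconds


for name, seconds in seconds_per_image.items():
    print(f"{name}: {images_per_second(seconds):.1f} images/s")

# Ratio of the two latencies.
speedup = seconds_per_image["DB"] / seconds_per_image["ppocr-v2 detect"]
print(f"ppocr-v2 detect is about {speedup:.0f}x faster per image than DB")
```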
CPU: i7-8750, RAM: 16 GB, GPU: GeForce GTX 1060 Mobile
OS: Ubuntu 20.04, OpenCV 4.6
[1] https://arxiv.org/abs/1911.08947
[2] https://github.com/PaddlePaddle/PaddleOCR