
@zihaomu
Last active August 31, 2020 01:19
Google Summer of Code 2020 with OpenCV Add digit and text recognition

Revise/improve Text & Digit Recognition Samples

Student: Zihao Mu

Mentor: Vladimir Tyan

Link to accomplished work:

Introduction

Hi, I'm Zihao Mu! I was a student developer for OpenCV in GSoC 2020. The goal of this project is to improve the text and digit recognition samples in OpenCV. In the deep learning era, more efficient text recognition methods are available. The project consists of two parts:

  1. First part: digit recognition through a live camera. Digit detector: Connected Component Analysis. Digit recognizer: LeNet-5 pre-trained on the MNIST dataset.

  2. Second part: text recognition through a live camera. Text detector: EAST. Text recognizer: multiple text recognition models based on deep learning.

My Journey

First Period

Implement opencv/sample/cpp/digits_lenet.cpp based on Connected Component Analysis and LeNet-5. Find stable preprocessing methods, and implement rotation prediction for digit ROIs.
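To illustrate the detection step, here is a minimal pure-Python sketch of connected component analysis on a tiny binary image. It is not the code from digits_lenet.cpp: the function name `connected_component_boxes`, the `min_area` filter, and the 8-connectivity choice are assumptions made for illustration. In practice OpenCV's own `connectedComponentsWithStats` would do this work on a thresholded camera frame.

```python
from collections import deque


def connected_component_boxes(binary, min_area=1):
    """Find 8-connected foreground regions and return their bounding boxes.

    binary: list of rows, each a list of 0/1 values (1 = foreground).
    Returns (x, y, width, height, area) tuples sorted left to right.
    """
    rows = len(binary)
    cols = len(binary[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            min_r = max_r = r
            min_c = max_c = c
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                min_r, max_r = min(min_r, y), max(max_r, y)
                min_c, max_c = min(min_c, x), max(max_c, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Small blobs are treated as noise (hypothetical filter).
            if area >= min_area:
                boxes.append((min_c, min_r,
                              max_c - min_c + 1, max_r - min_r + 1, area))
    boxes.sort(key=lambda box: box[0])
    return boxes


image = [
    [0, 1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 0],
]
boxes = connected_component_boxes(image, min_area=3)
print(boxes)
```

Each returned box is a candidate digit ROI that could then be cropped, resized, and passed to the recognizer. Sorting by the x coordinate keeps the digits in reading order.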

Second Period

Extend opencv/sample/dnn/text_detection.cpp and opencv/sample/dnn/text_detection.py so that they not only detect text but also recognize it. Based on this GitHub project, multiple text recognition models have been trained, and the OpenCV DNN module can load and run them correctly.
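Most of the recognition models in this project end in a CTC layer, so the network output has to be decoded into a string. The following is a minimal sketch of greedy (best-path) CTC decoding, not the code from the samples. The function name `ctc_greedy_decode`, the blank index of 0, and the toy character set are assumptions for illustration; a real model defines its own alphabet and blank position.

```python
def ctc_greedy_decode(scores, classes, blank=0):
    """Greedy CTC decoding.

    scores:  one list of class scores per time step.
    classes: string whose i-th character is the label for class index i;
             the character at position `blank` is only a placeholder.
    """
    decoded = []
    previous = blank
    for step in scores:
        # Pick the highest-scoring class at this time step.
        best = max(range(len(step)), key=step.__getitem__)
        # Emit a character only when it is not blank and not a repeat
        # of the class chosen at the previous time step.
        if best != blank and best != previous:
            decoded.append(classes[best])
        previous = best
    return "".join(decoded)


classes = "-cat"  # index 0 is the assumed CTC blank
scores = [
    [0.10, 0.70, 0.10, 0.10],  # c
    [0.10, 0.60, 0.20, 0.10],  # c again, merged with the previous step
    [0.80, 0.10, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.70, 0.10],  # a
    [0.10, 0.10, 0.10, 0.70],  # t
    [0.60, 0.10, 0.10, 0.20],  # blank separates a genuine double letter
    [0.10, 0.10, 0.10, 0.70],  # t
]
text = ctc_greedy_decode(scores, classes)
print(text)
```

The blank between the two "t" steps is what lets CTC output a real double letter, while the two consecutive "c" steps collapse into one character.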

Third Period

Provide a detailed tutorial covering how to train your own text recognition model and how to convert the model so that it can be called by OpenCV DNN.

Benchmarks for text recognition models

The models' accuracy on different text recognition datasets is shown in the table below:

| Model name | IIIT5k (%) | SVT (%) | ICDAR03 (%) | ICDAR13 (%) | ICDAR15 (%) | SVTP (%) | CUTE80 (%) | Average acc. (%) | Parameters (×10^6) |
|---|---|---|---|---|---|---|---|---|---|
| DenseNet-CTC | 72.267 | 67.39 | 82.81 | 80 | 48.38 | 49.45 | 42.50 | 63.26 | 0.24 |
| DenseNet-BiLSTM-CTC | 73.76 | 72.33 | 86.15 | 83.15 | 50.67 | 57.984 | 49.826 | 67.69 | 3.63 |
| VGG-CTC | 75.96 | 75.42 | 85.92 | 83.54 | 54.89 | 57.52 | 50.17 | 69.06 | 5.57 |
| CRNN_VGG-BiLSTM-CTC | 82.63 | 82.07 | 92.96 | 88.867 | 66.28 | 71.01 | 62.37 | 78.03 | 8.45 |
| ResNet-CTC | 84.00 | 84.08 | 92.39 | 88.96 | 67.74 | 74.73 | 67.60 | 79.93 | 44.28 |

The text recognition models were tested with OpenCV DNN; the results do not include the text detection model.
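The table does not say how the average accuracy column was computed. The short check below, with values copied from the table, suggests it is consistent with an unweighted mean over the seven datasets. This is an inference from the numbers, not something the report states, and the helper name `average_gaps` is introduced here for illustration only.

```python
from statistics import mean

# Per-dataset accuracies (%) copied from the table, in column order:
# IIIT5k, SVT, ICDAR03, ICDAR13, ICDAR15, SVTP, CUTE80.
# The second element of each pair is the reported average accuracy (%).
results = {
    "DenseNet-CTC": ([72.267, 67.39, 82.81, 80, 48.38, 49.45, 42.50], 63.26),
    "DenseNet-BiLSTM-CTC": ([73.76, 72.33, 86.15, 83.15, 50.67, 57.984, 49.826], 67.69),
    "VGG-CTC": ([75.96, 75.42, 85.92, 83.54, 54.89, 57.52, 50.17], 69.06),
    "CRNN_VGG-BiLSTM-CTC": ([82.63, 82.07, 92.96, 88.867, 66.28, 71.01, 62.37], 78.03),
    "ResNet-CTC": ([84.00, 84.08, 92.39, 88.96, 67.74, 74.73, 67.60], 79.93),
}


def average_gaps(table):
    """Absolute gap between the unweighted mean and the reported average."""
    return {
        name: abs(mean(accuracies) - reported)
        for name, (accuracies, reported) in table.items()
    }


gaps = average_gaps(results)
for name, gap in gaps.items():
    print(f"{name}: |mean - reported| = {gap:.4f}")
```

Every gap is below 0.01 percentage points, so the reported averages match a plain mean up to rounding in the second decimal place.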

Results

Computer Environment

HW:

CPU: i5-8300, RAM: 16GB, GPU: 1050 4GB

SW:

Ubuntu 18.01, OpenCV 4.4, CUDA 10.0

The demo video can be found here.

Digit Recognition

digit

Text Recognition

densenet

References

[1]Scene Text Detection and Recognition: The Deep Learning Era

[2]http://cs-chan.com/doc/ICDAR17.pdf

[3]https://github.com/hwalsuklee/awesome-deep-text-detection-recognition

[4]https://arxiv.org/abs/1704.03155v2 (EAST)

[5]https://github.com/chineseocr/darknet-ocr (CTPN no BiLSTM)

[6]https://github.com/senlinuc/caffe_ocr (DenseNet + BiLSTM and DenseNet no BiLSTM)

[7]https://github.com/huoyijie/AdvancedEAST (EAST Advanced)

[8]https://github.com/meijieru/crnn.pytorch (CRNN)
