zihaomu/gsoc20_dnn_digit_text_recognition.md

## gsoc20_dnn_digit_text_recognition.md

      
Revise/improve Text & Digit Recognition Samples

Student: Zihao Mu
Mentor: Vladimir Tyan
Links to accomplished work:

Merged PR: opencv/pull/17675
Multiple text recognition models for OpenCV DNN: Shared model link
Train your own text recognition model for OpenCV: deep-text-recognition-benchmark
Detailed Tutorial: OpenCV OCR Tutorial

Introduction

Hi, I'm Zihao Mu! I was a student developer for OpenCV in GSoC 2020. The goal of this project is to improve the text and digit recognition samples in OpenCV. In this deep learning era, we can implement more efficient text recognition methods. The project consists of two main parts:


First Part:
Digit recognition through a live camera:
Digit detector: Connected Component Analysis
Digit recognizer: LeNet-5 pre-trained on the MNIST dataset
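
As a rough illustration of the detection step, the sketch below labels 8-connected foreground blobs in a small binary image and returns their bounding boxes, which is the idea behind Connected Component Analysis. It is a pure-Python sketch, not the code of the OpenCV sample; in practice OpenCV provides `cv2.connectedComponentsWithStats` for this. The function name `find_components`, the `min_area` filter, and the toy image are assumptions made for the example.

```python
from collections import deque


def find_components(binary, min_area=1):
    """Return bounding boxes (x, y, w, h) of 8-connected foreground blobs.

    `binary` is a list of rows holding 0 (background) or 1 (foreground).
    `min_area` is a hypothetical noise filter, not a value from the sample.
    """
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            r0 = r1 = r
            c0 = c1 = c
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                r0, r1 = min(r0, y), max(r1, y)
                c0, c1 = min(c0, x), max(c1, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if area >= min_area:
                boxes.append((c0, r0, c1 - c0 + 1, r1 - r0 + 1))
    # Sort by x so candidate digits come out left to right.
    return sorted(boxes)


# Toy image: a 2x2 blob, a 1x3 vertical stroke, and one isolated noise pixel.
image = [
    [0, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 1, 0],
]
boxes = find_components(image, min_area=2)
print(boxes)
```

Each returned box would then be cropped, resized to the recognizer's input size, and passed to LeNet-5; those steps are omitted here.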


Second Part:
Text recognition through a live camera:
Text detector: EAST
Text recognizer: Multiple text recognition models based on deep learning
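
The recognition models listed in the benchmark all end in a CTC layer, so their raw output has to be decoded into a string. The sketch below shows greedy CTC decoding: take the best class at every time step, collapse repeats, and drop blanks. It is a minimal pure-Python sketch, not the code of the OpenCV sample. The blank index of 0, the alphabet `"abc"`, and the helper `one_hot` are assumptions for the example; the real values depend on how a given model was trained.

```python
def ctc_greedy_decode(scores, alphabet, blank=0):
    """Greedy CTC decoding.

    `scores` is a list with one entry per time step, each a list of class
    scores. Class `blank` is the CTC blank (assumed to be index 0 here);
    class i > 0 maps to alphabet[i - 1].
    """
    chars = []
    prev = blank
    for step in scores:
        best = max(range(len(step)), key=step.__getitem__)
        # Emit a character only when it is not blank and not a repeat
        # of the previous time step.
        if best != blank and best != prev:
            chars.append(alphabet[best - 1])
        prev = best
    return "".join(chars)


def one_hot(index, size):
    """Hypothetical helper that builds a fake score vector for the demo."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Best classes per step: a a - a b b - c  (where - is the blank)
scores = [one_hot(i, 4) for i in [1, 1, 0, 1, 2, 2, 0, 3]]
text = ctc_greedy_decode(scores, "abc")
print(text)
```

Note that the blank between the two runs of `a` is what allows a doubled letter to survive the repeat-collapsing step.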


My Journey

First Period

Implemented opencv/sample/cpp/digits_lenet.cpp, based on Connected Component Analysis and LeNet-5. This included finding stable preprocessing methods and predicting the rotation of digit ROIs.

Second Period

Extended opencv/sample/dnn/text_detection.cpp and opencv/sample/dnn/text_detection.py so that they not only detect text but also recognize it. Based on this GitHub project, multiple text recognition models were trained, and all of them can be loaded and run correctly by the OpenCV DNN module.

Third Period

Provided a detailed tutorial covering how to train your own text recognition model and how to convert the model so that it can be called by OpenCV DNN.

Benchmarks for text recognition models

The performance of the models on different text recognition datasets is shown in the table below:


| Model name | IIIT5k (%) | SVT (%) | ICDAR03 (%) | ICDAR13 (%) | ICDAR15 (%) | SVTP (%) | CUTE80 (%) | Average acc. (%) | Parameters (×10^6) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DenseNet-CTC | 72.267 | 67.39 | 82.81 | 80 | 48.38 | 49.45 | 42.50 | 63.26 | 0.24 |
| DenseNet-BiLSTM-CTC | 73.76 | 72.33 | 86.15 | 83.15 | 50.67 | 57.984 | 49.826 | 67.69 | 3.63 |
| VGG-CTC | 75.96 | 75.42 | 85.92 | 83.54 | 54.89 | 57.52 | 50.17 | 69.06 | 5.57 |
| CRNN_VGG-BiLSTM-CTC | 82.63 | 82.07 | 92.96 | 88.867 | 66.28 | 71.01 | 62.37 | 78.03 | 8.45 |
| ResNet-CTC | 84.00 | 84.08 | 92.39 | 88.96 | 67.74 | 74.73 | 67.60 | 79.93 | 44.28 |

The text recognition models were tested with OpenCV DNN; the figures do not include the text detection model.
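
The "average acc" column appears to be the unweighted mean of the seven per-dataset accuracies. The short check below recomputes that mean from the benchmark figures and compares it with the reported value. It is a sketch for the reader's convenience; the names `RESULTS` and `mean_accuracy` are made up for the example, and the 0.01 tolerance is an assumption to allow for rounding in the published numbers.

```python
# Per-dataset accuracies in the order:
# IIIT5k, SVT, ICDAR03, ICDAR13, ICDAR15, SVTP, CUTE80,
# followed by the reported average accuracy.
RESULTS = {
    "DenseNet-CTC": ([72.267, 67.39, 82.81, 80, 48.38, 49.45, 42.50], 63.26),
    "DenseNet-BiLSTM-CTC": ([73.76, 72.33, 86.15, 83.15, 50.67, 57.984, 49.826], 67.69),
    "VGG-CTC": ([75.96, 75.42, 85.92, 83.54, 54.89, 57.52, 50.17], 69.06),
    "CRNN_VGG-BiLSTM-CTC": ([82.63, 82.07, 92.96, 88.867, 66.28, 71.01, 62.37], 78.03),
    "ResNet-CTC": ([84.00, 84.08, 92.39, 88.96, 67.74, 74.73, 67.60], 79.93),
}


def mean_accuracy(accuracies):
    """Unweighted mean over datasets (each dataset counts equally)."""
    return sum(accuracies) / len(accuracies)


for name, (accuracies, reported) in RESULTS.items():
    computed = mean_accuracy(accuracies)
    # Flag any row whose recomputed mean differs from the reported one.
    status = "ok" if abs(computed - reported) < 0.01 else "mismatch"
    print(f"{name}: computed={computed:.3f} reported={reported} ({status})")
```

Because the mean is unweighted, small datasets such as CUTE80 influence the average as much as large ones such as IIIT5k.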
Results

Computer Environment

HW:

CPU: i5-8300
RAM: 16GB
GPU: 1050 4GB

SW:

Ubuntu 18.01
OpenCV 4.4
CUDA 10.0

The demo video can be found here.

Digit Recognition


Text Recognition


References

[1] Scene Text Detection and Recognition: The Deep Learning Era
[2] http://cs-chan.com/doc/ICDAR17.pdf
[3] https://github.com/hwalsuklee/awesome-deep-text-detection-recognition
[4] https://arxiv.org/abs/1704.03155v2 (EAST)
[5] https://github.com/chineseocr/darknet-ocr (CTPN without BiLSTM)
[6] https://github.com/senlinuc/caffe_ocr (DenseNet + BiLSTM and DenseNet without BiLSTM)
[7] https://github.com/huoyijie/AdvancedEAST (Advanced EAST)
[8] https://github.com/meijieru/crnn.pytorch (CRNN)