@sghoshcvc sghoshcvc/
Last active Feb 16, 2019

GSOC-2017-End to End text detection and recognition

The goal of the project was to implement an end-to-end text recognition pipeline. To achieve this, we extend the holistic text recognizer implemented last year and add a deep text detector that filters the image and passes only text bounding boxes to the recognizer. Together they form a complete deep pipeline for end-to-end text recognition. This also paves the way for a combined model that predicts bounding boxes over an image and provides a transcription for each bounding box. The proposed text detector implements the algorithm of the following article:

Liao, Minghui; Shi, Baoguang; Bai, Xiang; Wang, Xinggang; Liu, Wenyu. TextBoxes: A Fast Text Detector with a Single Deep Neural Network. arXiv.

Achieved target

  • The existing text recognition module uses caffe as the backend for the forward pass of the deep neural network. A second backend based on opencv-dnn has been added, which removes the dependency on caffe.

  • A deep text detection pipeline has been added, with support for two backends (caffe and opencv-dnn).

  • The opencv-dnn module has been modified to suit the requirements of the deep text detection model.

Incomplete tiny-dnn attempt

Apart from the caffe and opencv-dnn backends, we also attempted to implement the same network using tiny-dnn and the opencv-modern module.

  • However, there is a discrepancy between caffe and tiny-dnn in how max pooling is handled.

  • Because of this, the weight sizes do not match and the model could not be built using tiny-dnn.

  • A simple solution is to apply zero padding in the layer that produces an even-sized receptive field.
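
One way such a mismatch can arise is the rounding rule for the pooled output size. Caffe rounds the pooled size up (ceil). The sketch below assumes the other implementation rounds down (floor), which is an assumption about the cause and is not stated above. The function names and the example sizes (input 5, kernel 2, stride 2) are hypothetical and only illustrate how zero padding can bring the two rules back into agreement.

```shell
# Pooled output length with ceil rounding (the rule Caffe uses).
# Args: input_size kernel stride pad (assumes input_size + 2*pad >= kernel)
pool_out_ceil() {
  local n=$1 k=$2 s=$3 p=$4
  echo $(( (n + 2 * p - k + s - 1) / s + 1 ))
}

# Pooled output length with floor rounding (assumed for the other backend).
pool_out_floor() {
  local n=$1 k=$2 s=$3 p=$4
  echo $(( (n + 2 * p - k) / s + 1 ))
}

pool_out_ceil 5 2 2 0    # prints 3
pool_out_floor 5 2 2 0   # prints 2, so the following layer sees a smaller input
pool_out_floor 6 2 2 0   # prints 3 once one zero row/column has been added
```

With these numbers the two rules disagree on an odd-sized input, and padding the input by one zero row and column makes the floor rule produce the size that the ceil rule gives without padding.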

Links to contributions

pull requests

How to Build


The text module now provides text detection and recognition using deep CNNs. The text detector is a deep CNN that takes an image which may contain multiple words and outputs a list of Rects (bounding boxes), each with the probability that it contains text. The text recognizer provides a probability over a given vocabulary for each of these Rects.

Two backends are supported: 1) caffe and 2) opencv-dnn.

Installation of Caffe backend

  • Please note that a custom caffe based on the SSD branch is required; the link to the custom caffe is provided below. The caffe wrapping backend has the same requirements as caffe.
  • Caffe can be built against OpenCV, so enabling the caffe backend creates a circular dependency. The simplest way to avoid it is to build caffe without support for OpenCV.
  • All operating systems supported by Caffe are also supported by the backend. The scripts describing the module were developed on Ubuntu 16.04 and assume such a system. They should be easy to adapt to other UNIX systems, including OSX.
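
Before building, it can help to confirm that the Caffe configuration really disables OpenCV. The following is a minimal sketch, assuming the setting lives in Makefile.config as a `USE_OPENCV := 0` line and that the last uncommented assignment is the one that takes effect. The function name `caffe_without_opencv` and the temporary file are hypothetical.

```shell
# Succeeds when the last uncommented USE_OPENCV assignment in the given
# Makefile.config sets it to 0, i.e. Caffe is configured without OpenCV.
caffe_without_opencv() {
  grep -E '^[[:space:]]*USE_OPENCV[[:space:]]*:?=' "$1" \
    | tail -n 1 \
    | grep -Eq '=[[:space:]]*0[[:space:]]*$'
}

# Demo on a throwaway config: one commented-out line, one active line.
cfg=$(mktemp)
printf '%s\n' '# USE_OPENCV := 1' 'USE_OPENCV := 0' > "$cfg"
if caffe_without_opencv "$cfg"; then
  echo "Caffe will be built without OpenCV"
else
  echo "Caffe would link against OpenCV (circular dependency)"
fi
rm -f "$cfg"
```

In a real checkout the function would be called on the `Makefile.config` in the Caffe source directory.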

Sample script for building Caffe

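SRCROOT="${HOME}/caffe_inst/"  # assumed location; matches CAFFEROOT used further down
mkdir -p "$SRCROOT"
cd "$SRCROOT"
git clone  # append the URL of the custom SSD-based caffe (TextBoxes) repository
cd TextBoxes
# Start from the example config, disable OpenCV and add the HDF5 paths of ubuntu 16.04
cp Makefile.config.example Makefile.config
echo 'USE_OPENCV := 0' >> Makefile.config
echo 'INCLUDE_DIRS += /usr/include/hdf5/serial/' >> Makefile.config
echo 'LIBRARY_DIRS += /usr/lib/x86_64-linux-gnu/hdf5/serial/' >> Makefile.config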

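# Write the patch that adds a virtual destructor to the Callback class
cat > /tmp/cleanup_caffe.diff << 'EOF'
--- /tmp/caffe/include/caffe/net.hpp	2017-05-28 04:55:47.929623902 +0200
+++ caffe/distribute/include/caffe/net.hpp	2017-05-28 04:51:33.437090768 +0200
@@ -234,6 +234,7 @@
 
     template <typename T>
     friend class Net;
+    virtual ~Callback(){}
   };
   const vector<Callback*>& before_forward() const { return before_forward_; }
   void add_before_forward(Callback* value) {
EOF

# The target file is named explicitly because the paths in the diff header do not match this checkout
patch include/caffe/net.hpp < /tmp/cleanup_caffe.diff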

make -j 6

make pycaffe

make distribute

# Configure OpenCV against the Caffe distribution built above
cd "$OPENCV_BUILD_DIR"  # You must set this
CAFFEROOT="${HOME}/caffe_inst/"  # If you used the previous code to compile Caffe in ubuntu 16.04
# OPENCV_CONTRIB must point to your opencv_contrib checkout

cmake \
  -DCaffe_LIBS:FILEPATH="$CAFFEROOT/caffe/distribute/lib/" \
  -DCaffe_INCLUDE_DIR:PATH="$CAFFEROOT/caffe/distribute/include" \
  -DBUILD_opencv_ts:BOOL="0" \
  -DBUILD_opencv_dnn:BOOL="0" \
  -DBUILD_opencv_dnn_modern:BOOL="0" \
  -DWITH_MATLAB:BOOL="0" \
  -DWITH_QT:BOOL="1" \
  -DBUILD_opencv_cudabgsegm:BOOL="0" \
  -DBUILD_opencv_cudaoptflow:BOOL="0" \
  -DBUILD_opencv_cudastereo:BOOL="0" \
  -DBUILD_opencv_cudafilters:BOOL="0" \
  -DBUILD_opencv_cudev:BOOL="1" \
  -DOPENCV_EXTRA_MODULES_PATH:PATH="$OPENCV_CONTRIB/modules" \
  ./

Installation of opencv-dnn backend

Using opencv-dnn does not require any additional library.

The recent opencv-3.3.0 needs to be built with the extra modules in order to use the text module.
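
As a rough sketch of such a configuration, the helper below prints a cmake command that points OpenCV at the extra modules and keeps the dnn and text modules enabled. `OPENCV_EXTRA_MODULES_PATH` is the same option used in the caffe build above, and the two `BUILD_opencv_<module>` switches follow OpenCV's usual per-module option naming. The default paths and the function name `opencv_text_configure_cmd` are hypothetical, and the command is only printed so that it can be reviewed before being run from a build directory.

```shell
# Hypothetical checkout locations; override them in the environment as needed.
OPENCV_SRC="${OPENCV_SRC:-$HOME/opencv}"
OPENCV_CONTRIB="${OPENCV_CONTRIB:-$HOME/opencv_contrib}"

# Print (do not run) a configure command for the opencv-dnn backend.
opencv_text_configure_cmd() {
  printf 'cmake -DOPENCV_EXTRA_MODULES_PATH:PATH=%s/modules' "$OPENCV_CONTRIB"
  printf ' -DBUILD_opencv_dnn:BOOL=1 -DBUILD_opencv_text:BOOL=1'
  printf ' %s\n' "$OPENCV_SRC"
}

opencv_text_configure_cmd
```

No Caffe-related options are needed here, since this backend does not depend on an external library.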
