saurabhshah0410/Work_Product_Submission.md

## Work_Product_Submission.md

      
Google Summer of Code 2018 @ CCExtractor

Student Name: Saurabh Kumar M Shah

GSoC Project Proposal: Improve the OCR Subsystem

Email: saurabhshah.0410@gmail.com

Mentor: Abhinav Shukla

Project Synopsis:

The goal of my project was to make hard subtitle extraction more user-friendly by removing the subsystem's dependence on arbitrary user-supplied parameters such as sub_color, conf_thresh, luminance and whiteness. This also extends CCExtractor to extract burned-in subtitles from video files that contain multi-colored captions. The core idea was to implement Neumann and Matas' text detection algorithm, which meets these objectives while keeping time complexity and memory requirements reasonable.

Link to commits:

All my commits to the mainstream master branch can be seen here.

My patches and contributions:

All of my work related to the GSoC project can be viewed here.

Compilation:

The compilation instructions remain the same as before:
make ENABLE_HARDSUBX=yes ENABLE_OCR=yes
Run this command from the ccextractor/linux directory.

Usage:

The commands do not change much, except that the user now only has to specify the input video plus any of the optional parameters described below. The -hardsubx flag must be passed to the ccextractor executable to enable burned-in subtitle extraction.

-ocr_mode: Set the OCR mode to either word-wise or textline-wise, e.g. -ocr_mode word
-min_sub_duration: Specify the minimum duration (in seconds) that a subtitle line must remain on screen. Lower values give better-timed results but increase processing time. The default value is 0.5, e.g. -min_sub_duration 1.0 (for a duration of 1 second)

List of new files:


Mat.c, Mat.h: contain initializers and other basic operators for the basic struct Mat
math.h: handles the basic mathematical operations on Points, Rectangles, sequences etc.
erfilter.c, erfilter.h: contain the functions required to extract the extremal regions that contain text
color.c, color.h: convert images from RGB to HSV, LAB and GRAY formats
floodfill.c, contours.c: contain functions to identify the contours around the text regions
types.h, storage.c, MemStorage.c: general functions to optimize memory requirements
trained_classifierNM1.xml, trained_classifierNM2.xml: trained classifiers for identifying character regions in the image
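
As a rough illustration of the kind of per-pixel conversion that color.c is described as performing, the sketch below converts RGB to grayscale and to HSV. It is not the project's code: the type and function names (RgbPixel, HsvPixel, rgb_to_gray, rgb_to_hsv, rgb_image_to_gray) are hypothetical, the grayscale weights are the standard ITU-R BT.601 luma coefficients, and the hue range of 0 to 360 degrees is an assumption that may differ from the actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pixel types for illustration; not the structs used in color.c. */
typedef struct { uint8_t r, g, b; } RgbPixel;
typedef struct { float h, s, v; } HsvPixel; /* h in degrees [0, 360); s, v in [0, 1] */

/* Grayscale value using ITU-R BT.601 luma weights, rounded to nearest. */
static uint8_t rgb_to_gray(RgbPixel p)
{
    float y = 0.299f * p.r + 0.587f * p.g + 0.114f * p.b;
    return (uint8_t)(y + 0.5f);
}

/* Standard hexcone RGB -> HSV conversion for a single pixel. */
static HsvPixel rgb_to_hsv(RgbPixel p)
{
    float r = p.r / 255.0f, g = p.g / 255.0f, b = p.b / 255.0f;
    float max = r > g ? (r > b ? r : b) : (g > b ? g : b);
    float min = r < g ? (r < b ? r : b) : (g < b ? g : b);
    float delta = max - min;
    HsvPixel out = { 0.0f, 0.0f, max };

    if (max > 0.0f)
        out.s = delta / max;          /* saturation is 0 for black */
    if (delta > 0.0f) {               /* hue is left at 0 for gray pixels */
        if (max == r)
            out.h = 60.0f * ((g - b) / delta);
        else if (max == g)
            out.h = 60.0f * ((b - r) / delta) + 120.0f;
        else
            out.h = 60.0f * ((r - g) / delta) + 240.0f;
        if (out.h < 0.0f)
            out.h += 360.0f;
    }
    return out;
}

/* Convert an interleaved RGB buffer (3 bytes per pixel) to one gray byte per pixel. */
static void rgb_image_to_gray(const uint8_t *rgb, uint8_t *gray, size_t n_pixels)
{
    assert(rgb != NULL && gray != NULL);
    for (size_t i = 0; i < n_pixels; i++) {
        RgbPixel p = { rgb[3 * i], rgb[3 * i + 1], rgb[3 * i + 2] };
        gray[i] = rgb_to_gray(p);
    }
}
```

Detecting text on several such channels, rather than on a single thresholded color, is one plausible way to handle multi-colored captions without asking the user for a subtitle color.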

What have I learnt?

The last 3 to 4 months that I have worked on this project with CCExtractor have been a huge boost to my coding and communication skills. The project gave me a great opportunity to learn about traditional as well as state-of-the-art methods of text processing. I am also now very familiar with the source code, API and usage of OpenCV, because I had to read and understand the functions in its text module, which contains the same algorithm that I had put forward in my proposal. Over the course of the project I also became comfortable with C++, which helped me a lot in my campus interviews. Overall, this project was fun and a good learning experience for me.

Future Contributions:

I loved the working environment at CCExtractor and I will keep contributing in the future in my own time. There is a lot of scope for improvement in the code that I've implemented, and I'll keep improving and updating it. I'll try to use more robust trained models to boost the accuracy and quality of the extracted subtitles. I'll also try implementing a CNN-based approach and make it work on an average computer.