saurabhshah0410 / Work_Product_Submission.md
Last active August 31, 2018 23:51
This file contains the overview of the work I've done for CCExtractor in summer 2018.

Google Summer of Code 2018 @ CCExtractor

Student Name: Saurabh Kumar M Shah

GSoC Project Proposal: Improve the OCR Subsystem

Mentor: Abhinav Shukla

Project Synopsis:

The goal of my project was to make hard subtitle extraction user-friendly by removing the subsystem's dependence on arbitrary user-supplied parameters such as sub_color, conf_thresh, luminance, and whiteness. This would also extend CCExtractor to extracting burned-in subtitles from video files that contain multicolor captions. The idea was to implement Neumann and Matas's text detection algorithm, which would meet these objectives while keeping time complexity and memory requirements reasonable.

The Frobenius Number for small n

Formula for g(p,q):

Let p, q be relatively prime positive integers. Then g(p,q) = pq - p - q.
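
The two-generator formula can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names `frobenius_two` and `frobenius_brute_force` are hypothetical, and the sketch assumes p and q are coprime integers greater than 1 so that at least one positive integer is not representable.

```python
from math import gcd


def frobenius_two(p: int, q: int) -> int:
    """Closed form g(p, q) = pq - p - q (assumes coprime p, q > 1)."""
    if p < 2 or q < 2 or gcd(p, q) != 1:
        raise ValueError("p and q must be coprime integers greater than 1")
    return p * q - p - q


def frobenius_brute_force(p: int, q: int) -> int:
    """Largest integer that cannot be written as a*p + b*q with a, b >= 0."""
    if p < 2 or q < 2 or gcd(p, q) != 1:
        raise ValueError("p and q must be coprime integers greater than 1")
    # p*q is a safe upper bound: the closed form value is strictly smaller.
    limit = p * q
    representable = [False] * (limit + 1)
    representable[0] = True
    for n in range(1, limit + 1):
        # n is representable if removing one p or one q leaves a representable value.
        if (n >= p and representable[n - p]) or (n >= q and representable[n - q]):
            representable[n] = True
    return max(n for n in range(limit + 1) if not representable[n])


if __name__ == "__main__":
    for p, q in [(3, 5), (4, 7), (5, 9)]:
        print(p, q, frobenius_two(p, q), frobenius_brute_force(p, q))
```

The brute-force search marks every value reachable by adding p or q to an already reachable value, so agreement between the two functions on small pairs is a sanity check of the formula rather than a proof.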

A Formula for g(a1,a2,a3)

Theorem:

Let A = {(a1,a2,a3) ∈ N^3 | a1 < a2 < a3, a1 and a2 are prime, and a1, a2 do not divide a3}. Then there is no non-zero polynomial

saurabhshah0410 / ccextractor_OCR_gsoc2018.md
Created March 23, 2018 22:09
GSoC Proposal CCExtractor

Improve the OCR subsystem for CCExtractor

Aim

To improve the OCR subsystem of CCExtractor.

Summary

When hard subtitles are extracted from a video, the results are often very poor. For example:

39
00:04:59,418 --> 00:05:00,383
‘ In America. there was lhll guy