I've competed in Google Code-In and I spotted a neat task from CCExtractor.
Task's title was "How can we identify movies based on scenes in them?" and I'm going to answer that question.
The first thing that came to my mind was to split a video into frames using FFmpeg. After splitting we perhaps could test our input against those frames.
But that's a disastrous idea!
Splitting video on each frame is just a waste of our precious disk space. 24-minute long video when split on each frame ended... jamming up my entire drive.
And then we have to deal with comparing the frames. How in the world are you gonna do that?!
But maybe we can think of something better? Maybe we can split it once every n frames and try to increase tolerance of our algorithm?
Sure thing! There are algorithms called image hashes that work that way. I found a nicely done article about perceptual hashing. I read it a few times and then I found a Python library that does the work!
Now, how are we going to store those hashes? For likable performance, I presume we can have a table in a database that's holding a hash as a 64-bit primary key and a list of movie names.
I was able to build a small prototype. Its performance isn't the best, because it's just a prototype, BUT IT'S WORKING!
It properly identified that slightly oversaturated image is not very different than the base one.
Do they exist? Of course. Developers regularly find new ways to do something!
A somewhat similar to finding animes based on the scene is used in https://trace.moe. They're using a color layout descriptor algorithm for this.
And there is even an ML approach to this problem and it works. During my further investigation, I stumbled upon a research paper that used Deep Learning to extract the features from an image (link).
Apropos Machine Learning, possibly we can think of another way to accomplish our task? Maybe some more theoretical one?
I'm not an expert on the matter (to be fair, I've never tried ML in my life), but what if we could train our algorithm to just recognize movies. I'm probably totally overcomplicating the idea, but what if? What if we could teach our program to recognize movies and say from which movie it came from?
Yeah, what if...
proof of work?
base image:
out target:
index 21 has really small change so it's working as it should