A few days ago I read a post by Samuirai about cracking the Minteye CAPTCHA ("cracking minteye captcha without skill") with the Google speech-to-text API, and two days later it was covered on Hack a Day. To me, it was a really straightforward way to crack such a CAPTCHA, and something nobody had really noticed or pointed out before. It also shows how little thought the developers gave to the real usefulness and security of their idea.
Now, I had heard about Minteye before, and I honestly liked the idea behind it: it is basically a smart business model, the puzzle is easy to solve, and they even claimed it was more secure than typed text:
But Hadar claims Minteye’s puzzles are even more secure than typed Captchas. It’s easier to build a bot that can identify letters and numbers than one to recognize images “that can be deformed in practically endless ways,” he says.
But if you really think about it: aren't things that are really easy for a human to solve mostly even easier for a computer? And isn't this why so many CAPTCHAs are flawed? So how would a computer solve this one?
- People have pointed out that the incorrect images have a certain amount of blurriness (which is correct),
- you could just scan for straight lines (which is even easier),
- you could just brute-force it,
- as there is only a limited set of images, why not save the solutions in a big database?
There are way too many ways to crack this thing, and I think Samuirai chose the hardest path. Probably just for fun.
What really bugged me were all the people who said "aah, this thing is crackable, it's easy", while nobody released anything. Then again, neither did I, because I really hoped that Minteye was a good idea (and really didn't think as far as others did).
Anyway, let's have a look.
Let's start with the most obvious one: brute force. To crack the captcha you basically choose between 30 pictures, of which Minteye accepts the original one (green)... oh, and the four neighbours next to it (light green). Also, the correct one is, as far as I have seen, never one of the outer four (red).
It would then be silly to guess the third or fourth outer image (pink), because even if that were the correct image, you would also be correct by guessing the fifth.
Now we can guess among 22 different images, which leaves us with a probability of roughly 5/22. So even without some sort of brilliant script, we should be right more than 22% of the time. Even if they just asked for a single letter (or even a single digit), there would be more randomness (cracking aside). (Some people might get to an even higher probability by taking other things into consideration... oh the joys of maths... oh how I hate stochastics.)
But hey, we already have a script that does better than that, and there are more scripts that should be incredibly easy to code. Let's see about that.
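To put a number on what brute forcing means here, a minimal sketch that takes the 5/22 estimate at face value. It assumes every attempt is independent and that retries are not limited; both are assumptions made for illustration, and `success_within` is a made-up helper name.

```python
from fractions import Fraction

# Estimate from the text: 5 accepted images out of 22 sensible guesses.
P_SINGLE = Fraction(5, 22)


def success_within(attempts, p_single=P_SINGLE):
    """Chance of at least one accepted guess in `attempts` independent tries.

    Assumes every try has the same success chance and that the service
    allows retries; both are assumptions made for this sketch.
    """
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # 1 minus the chance that every single try fails
    return 1 - (1 - p_single) ** attempts


if __name__ == "__main__":
    for n in (1, 3, 5, 10):
        print(n, "attempts:", round(float(success_within(n)) * 100, 1), "%")
```

Under these assumptions, three blind guesses are already more likely than not to include an accepted one.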
The internet is awesome: once you have no idea what to do, it always has some answer somewhere.
If you take a close look at the different images, you can see that the distorted ones tend to be blurrier. And you can easily calculate the blurriness of a picture by using a Fourier transform or by simply applying a Laplace filter, which is explained in more detail on Stack Overflow (thus, the internet is awesome). The Python scipy library offers the simple function scipy.ndimage.filters.laplace (available as scipy.ndimage.laplace in current SciPy), with which we can easily get some sort of blurriness index.
And then we just look at the global min of our thirty pictures, here #15:
But honestly, I have not tried this solution yet, though I am sure it will work.
Edit: I just found a solution that uses the Fourier transform. Good read if you are interested.
Take line detection, for example: it took me less than half an hour to write the following script:
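The Laplace idea above could look roughly like the following. This is an untested-on-Minteye sketch, not the Stack Overflow answer or anyone's released cracker: the score (variance of the Laplace response) is one common choice among several, and `sharpness` and `least_blurry` are hypothetical helper names. Here a higher score means a sharper image, so the candidate is the maximum; a blurriness index defined the other way round would use the minimum instead.

```python
import numpy as np
from scipy import ndimage


def sharpness(gray):
    """Variance of the Laplace filter response of a 2D grayscale array.

    Blurry images have weak edges, so their response varies less and
    the score is lower. (Assumed scoring, chosen for this sketch.)
    """
    response = ndimage.laplace(np.asarray(gray, dtype=float))
    return float(response.var())


def least_blurry(images):
    """Return (index, scores) where index points at the sharpest image."""
    scores = [sharpness(img) for img in images]
    return int(np.argmax(scores)), scores
```

The input would be the thirty captcha frames as grayscale arrays, for example loaded with skimage's io.imread and color.rgb2gray.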
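```python
from skimage import io, color, morphology

image_pixels = []
# 30 candidate images, assumed to be numbered 0 to 29
for i in range(30):
    # load the image and convert it to grayscale
    img = color.rgb2gray(io.imread("captchas/image" + str(i) + ".ashx.jpg"))
    # keep only the dark pixels, then thin them to one-pixel-wide lines
    skel_im = morphology.skeletonize(img < 0.2)
    # count the pixels that survive skeletonization
    image_pixels.append(int(skel_im.sum()))

max_pixels = max(image_pixels)
correct_image = image_pixels.index(max_pixels)
print("Solution should be image" + str(correct_image) + " with exactly " + str(max_pixels) + " pixels")
```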
What it does is simple: take all the pictures, make them black and white, skeletonize them, and afterwards calculate the sum of the white pixels, i.e. measure the amount of connected lines in the picture. Simple. You don't have to be a genius. Not at all. It's enough to know a little bit of Python. You can see in the comparison below that a lot of the lines that belonged together before have disappeared on the right of the picture (img 28), i.e. information is lost.
You can see in the plot below how many pixels the individual images have in the end.
Note the peak for picture 4 (which is indeed the correct one).
This solution might not be perfect and may only work ~80% of the time, but honestly: if someone with a bit of image processing knowledge spent a few more days (if not minutes) on it, a success rate of 99% is more than likely.
In its current form, Minteye seems to be horribly broken, and the whole development team didn't even try to think about their flaws. And these are only the five flaws I could think of right now, all of them way too easy to exploit without getting too deep into image analysis. We shouldn't forget what the last years have shown: there is always some smart mind with even better ideas (which, quite frankly, always happens in IT security).
Other CAPTCHAs have been proven to be crackable too, but at least they required a certain amount of time and effort from a lot of teams. Minteye required neither. And I am happy that all those issues have been found (and made public) before Minteye became a success.
“Having said that, there is no Captcha in the world that cannot be cracked— it’s only the amount of resources invested in cracking.”
Quote by: Gadi Hadar - CEO of Minteye