dougmorato/captcha.md

## captcha.md

      
G-WAN is a new free web server.
They seem to be very proud of it, or at least they want to make a lot of money with it.
In almost every sentence they write, they claim to be 20% cooler than anything else, which feels a bit arrogant.
I have to admit that I don't know a lot about web servers, so I can't judge how good G-WAN is.
But then I saw their Captcha example. I don't know much about machine learning algorithms, OCR, and stuff like that either, but I do know how to read pixels. I also know how to compare values with Python :P

They say the following about their Captcha:
> [...] difficult or even completely impossible for robots.

Wait wat? If this is true, this is something really outstanding and maybe an alternative to reCaptcha...
But then I was like:

So I wrote this basic, stupid pixel-by-pixel reading and comparing code to decode the captcha.
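
The comparing part boils down to template matching on fixed 8x8 cells. Here is a minimal sketch of that idea, written for Python 3 (the original script targets Python 2). The names `parse`, `count_mismatches` and `recognise` are made up for this illustration, and only the templates for 1 and 4 are included:

```python
def parse(rows):
    """Turn rows of '.'/'#' characters into a 0/1 pixel matrix."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]

# Two of the ten digit templates, same shapes as in crack_captcha.py.
TEMPLATES = {
    1: parse(["........", "....#...", "...##...", "..#.#...",
              "....#...", "....#...", "....#...", "....#..."]),
    4: parse(["........", "....#...", "...##...", "..#.#...",
              ".#..#...", ".#####..", "....#...", "....#..."]),
}

def count_mismatches(cell, template):
    """Count pixels where exactly one of cell/template is set (color ignored)."""
    return sum(
        (c != 0) != (t != 0)
        for cell_row, tmpl_row in zip(cell, template)
        for c, t in zip(cell_row, tmpl_row)
    )

def recognise(cell):
    """Return the digit whose template matches every pixel, or None."""
    for digit, template in TEMPLATES.items():
        if count_mismatches(cell, template) == 0:
            return digit
    return None

# A "4" drawn in color index 2 is still recognised as 4.
cell = [[2 * v for v in row] for row in TEMPLATES[4]]
print(recognise(cell))
```

Any non-background color index counts as a set pixel, and a cell is only recognised when no pixel disagrees with the template.
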
smrrd$ python crack_captcha.py
GIF Image
---------
R0lGODlhGAAZAJEAAP///9//v4SkZAAAACH5BAEAAAAALAAAAAAYABkAAAJfhI+pGB0rmHuGAmtEPJj7E23VYlmbeDnMB2guu44J2lWqQi/6Drl0k7hlSKwSiHeBgV5BTK2FNOKIsmQVJekIkdzgTEOVIERY4ApDPoczTOvzCbVtq/G6kt4CK+BdRQEAOw==

Captcha Data Matrix
-------------------
                                                
  1 1 1 1               2         1 1 1 1 1     
  1       1           2 2         1             
  1       1         2   2         1             
  1       1       2     2         1 1 1 1       
  1       1       2 2 2 2 2       1             
  1       1             2         1             
  1 1 1 1               2         1             
                                                
  2 2 2 2 2         1 1 1           1 1 1       
  2               1       1       1       1     
  2                       1       1             
  2 2 2 2             1 1         1             
  2                       1       1             
  2               1       1       1       1     
  2                 1 1 1           1 1 1       
                                                
        1           2 2 2               1       
      1 1         2       2           1 1       
    1   1         2       2         1   1       
        1           2 2 2 2       1     1       
        1                 2       1 1 1 1 1     
        1         2       2             1       
        1           2 2 2               1       
                                            
color | pixel count
-------------------
    0 |        472
    1 |         81
    2 |         44

color   1 | color   2
---------------------
        3 |         4
        1 |         9
        4 |          
---------------------
        8 |        13
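
The 24x25 pixel grid in the matrix can be cross-checked against the GIF header itself. This is a small Python 3 sketch using only the standard library. `GIF_B64_PREFIX` is just the first 16 characters of the base64 string printed above, and `gif_dimensions` is a name invented for this example:

```python
import base64
import struct

# First 16 base64 characters of the image above (decodes to 12 bytes).
GIF_B64_PREFIX = "R0lGODlhGAAZAJEA"

def gif_dimensions(b64_prefix):
    """Return (signature, width, height) from the start of a GIF file."""
    raw = base64.b64decode(b64_prefix)
    signature = raw[:6].decode("ascii")
    # Width and height are stored as little-endian 16-bit integers.
    width, height = struct.unpack("<HH", raw[6:10])
    return signature, width, height

signature, width, height = gif_dimensions(GIF_B64_PREFIX)
print(signature, width, height)  # GIF89a 24 25
```

With 8-pixel-wide characters (`cw = 8` in the script), that leaves room for a 3x3 grid of characters.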

I also don't understand what they think this means and why they are so excited about it:
> The two sums are: 13 and 8... for the same Captcha image!
> By just changing the HTML background color [...]
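
For reference, the two sums simply come from adding up the digits of each color separately. This is a rough Python 3 sketch. The list of cells is a manual reading of the data matrix above, so treat it as an assumption, and `sums_by_color` is a hypothetical helper name:

```python
from collections import defaultdict

# (character, color index) for the nine cells, read off the data matrix.
# The letters are only listed to show that they are skipped.
cells = [("D", 1), ("4", 2), ("F", 1),
         ("F", 2), ("3", 1), ("C", 1),
         ("1", 1), ("9", 2), ("4", 1)]

def sums_by_color(cells):
    """Add up the digits per color index, ignoring non-digit characters."""
    totals = defaultdict(int)
    for char, color in cells:
        if char.isdigit():
            totals[color] += int(char)
    return dict(totals)

print(sums_by_color(cells))
```

Color 1 adds up to 8 and color 2 to 13, the same two numbers as in the output above.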

This was the first time I tried to solve a Captcha, and I think this one is the best example of how not to implement one.
YouTube Video Demo
kind regards,

samuirai
Also check out my website: http://www.smrrd.de
Edit: to see really cool stuff with reCaptcha, check out what they did: http://www.dc949.org/projects/stiltwalker/

  
## crack_captcha.py
import base64, sys, io, Image, urllib2, re

bg = 0 # background color
cw = 8 # character width

# get the new captcha
url = urllib2.urlopen("http://62.75.175.163:8080/?captcha.c")
html = url.read().replace('\n','').replace('\r','')
url.close()

# get the base64 gif image code
# <img src="data:image/gif;base64,R0lGODlhG...AAADs=" alt="A tree" width="48" height="50" />
#   => R0lGODlhG...AAADs=
regex = re.compile('.*base64,(.*)" alt')
gif_b64 = regex.match(html).group(1)

print "GIF Image"
print "---------"
print gif_b64

# load the string as image
f = io.BytesIO(base64.b64decode(gif_b64))
img = Image.open(f)
pix = img.load()


# print and analyse the pixels
print "Captcha Data Matrix"
print "-------------------"
pixels = {}
for y in xrange(0,25):
    for x in xrange(0,24):
        # collect data: count every pixel, including the first one of each color
        pixels[pix[x,y]] = pixels.get(pix[x,y], 0) + 1
        # print pixels
        if pix[x,y]!=bg: print pix[x,y],
        else: print ' ',
    print ''

# print the analysis - totally useless, but looks cool
print "color | pixel count"
print "-------------------"
for color in pixels:
    print "%5d | %10d" % (color,pixels[color])

# define all characters

charset = {
0:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 1, 1, 0, 0],
[0, 1, 0, 1, 0, 1, 0, 0],
[0, 1, 1, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0]],
1:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0]],
2:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 1, 0, 0]],
3:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0]],
4:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 1, 0, 0, 0],
[0, 1, 0, 0, 1, 0, 0, 0],
[0, 1, 1, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0]],
5:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 1, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0]],
6:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0]],
7:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0]],
8:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0]],
9:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0]],
}

def find_character(tmp_char):
    highest_char = ['X',0,0,]
    for char in xrange(0,10):
        match = 0
        fails = 0
        for x in range(len(tmp_char)):
            for y in range(len(tmp_char[x])):
                #print str(tmp_char[x][y])+str(charset[char][x][y]),
                if (charset[char][x][y] != 0 and tmp_char[x][y] != 0) or (charset[char][x][y] == tmp_char[x][y]):
                    match+=1
                else: fails+=1
            #print
        #print [char,match,fails,]
        if match>highest_char[1]:
            highest_char=[char,match,fails,]
    return highest_char

def get_color(tmp_char):
    color = {}
    for x in range(len(tmp_char)):
        for y in range(len(tmp_char[x])):
            if tmp_char[x][y]!=bg:
                if tmp_char[x][y] not in color:
                    color[tmp_char[x][y]] = 0
                else:
                    color[tmp_char[x][y]] += 1
    return color.keys()

# analyse each character
color = {}
for yl in xrange(0,3):
    for xl in xrange(0,3):
        tmp_char = []
        for y in xrange(yl*cw,yl*cw+cw):
            tmp_line = []
            for x in xrange(xl*cw,xl*cw+cw):
                tmp_line.append(pix[x,y])
                #print pix[x,y],
            #print
            tmp_char.append(tmp_line)
        match = find_character(tmp_char)
        col = get_color(tmp_char)
        if not match[2]:
            if col[0] not in color:
                color[col[0]] = [match[0]]
            else:
                color[col[0]].append(match[0])

# print the sums
print ""
if len(color.keys())>=2:
    print "color %3d | color %3d" % (color.keys()[0],color.keys()[1])
else:
    print "color %3d | " % (color.keys()[0])
print "---------------------"
left_list = color[color.keys()[0]]
if len(color.keys())>=2:
    right_list = color[color.keys()[1]]
else:
    right_list = []
longest_length = 0
if len(left_list)>len(right_list): longest_length = len(left_list)
else: longest_length = len(right_list)
for row in xrange(0,longest_length):
    left_val = ''
    right_val = ''
    if len(left_list)>row: left_val = str(left_list[row])
    if len(right_list)>row: right_val = str(right_list[row])
    print "%9s | %9s" % (left_val,right_val)
print "---------------------"
print "%9s | %9s" % (sum(left_list),sum(right_list))