Bypass Captcha using 10 lines of code with Python, OpenCV & Tesseract OCR engine
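# Written for Python 2 and OpenCV 2.x (the cv2.cv module was removed in OpenCV 3).
import cv2.cv as cv
import tesseract
# Load the captcha as grayscale, then binarize in place: pixels above 231 become 255, the rest 0.
gray = cv.LoadImage('captcha.jpeg', cv.CV_LOAD_IMAGE_GRAYSCALE)
cv.Threshold(gray, gray, 231, 255, cv.CV_THRESH_BINARY)
# Initialize Tesseract with "." as the data path and English as the language.
api = tesseract.TessBaseAPI()
api.Init(".","eng",tesseract.OEM_DEFAULT)
# Restrict output to digits and lowercase letters, and treat the image as a single word.
api.SetVariable("tessedit_char_whitelist", "0123456789abcdefghijklmnopqrstuvwxyz")
api.SetPageSegMode(tesseract.PSM_SINGLE_WORD)
# Hand the OpenCV image to Tesseract and print the recognized text.
tesseract.SetCvImage(gray,api)
print api.GetUTF8Text()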
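
The snippet above depends on the legacy `cv2.cv` bindings and on a `tesseract` module that several commenters could not import. A rough Python 3 equivalent might look like the sketch below. It assumes the `opencv-python` and `pytesseract` packages and a Tesseract binary on the PATH; `tesseract_config` and `solve_captcha` are hypothetical helper names, and the `--psm` spelling applies to Tesseract 4 and later. Depending on the Tesseract version and engine, the whitelist may be ignored.

```python
import string

# Characters the gist whitelists: digits plus lowercase letters.
WHITELIST = string.digits + string.ascii_lowercase


def tesseract_config(whitelist=WHITELIST, psm=8):
    """Build the Tesseract option string; PSM 8 means 'single word'."""
    return "--psm {} -c tessedit_char_whitelist={}".format(psm, whitelist)


def solve_captcha(path, threshold=231):
    """Threshold a grayscale image, then hand it to Tesseract."""
    # Imported lazily so the helper above works without OpenCV installed.
    import cv2
    import pytesseract

    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(path)
    # Same cut-off as the gist: pixels above 231 become 255, the rest 0.
    _, binary = cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY)
    text = pytesseract.image_to_string(binary, config=tesseract_config())
    return text.strip()
```

Calling `solve_captcha('captcha.jpeg')` would then return the recognized string instead of printing it.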
@chroman

Owner

chroman commented May 30, 2013

Example Captcha

@ivansabik

ivansabik commented Aug 5, 2014

Maestro! I'll be trying it out in https://github.com/mexicapis/repuve-api

@ivansabik

ivansabik commented Aug 5, 2014

Which wrapper did you use for Python? :S I get a "module tesseract not found" error

@Silur

Silur commented Jun 25, 2015

What Tesseract library do you use for Python?

@f-prime

f-prime commented Jul 29, 2015

Tesseract is used to convert images to text

@hussaintamboli

hussaintamboli commented Aug 25, 2015

Please paste your requirements.txt.
I have installed

sudo apt-get install libopencv-dev python-opencv
pip install cv2

But I still get an error saying

ImportError: No module named cv2
@tsphack

tsphack commented Jan 22, 2016

@hussaintamboli
Use OpenCV 2.x,
not 3.x.
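
For context, the `cv2` module is distributed on PyPI as `opencv-python`, and the `cv2.cv` submodule the gist imports only exists in the OpenCV 2.x series. A small check along these lines can tell the two failure modes apart; `opencv_status` is a hypothetical helper name, and this is only a diagnostic sketch:

```python
import importlib.util


def opencv_status():
    """Report whether cv2 is importable and whether it is a 2.x build."""
    if importlib.util.find_spec("cv2") is None:
        return "missing (cv2 is not installed for this interpreter)"
    try:
        import cv2
    except ImportError:
        return "broken (cv2 was found but failed to import)"
    major = int(cv2.__version__.split(".")[0])
    # cv2.cv is only present in the 2.x series.
    if major < 3:
        return "legacy (cv2.cv should be available)"
    return "modern (no cv2.cv, use the cv2.* functions instead)"


print(opencv_status())
```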

@JINDALG

JINDALG commented Mar 29, 2016

What about digits? I mean, what if the image contains both digits and letters?

@ytrezq

ytrezq commented Jul 11, 2016

@chroman: Of course, I suppose solving this kind of image is impossible, whatever OpenCV transformations are applied:
ackskin




@forum2k9

forum2k9 commented Dec 2, 2016

Hi,
How do I make it work with this kind of image?
re1
re2
re3
re4
re5
re6
re7
re8

@larose

larose commented Aug 11, 2017

@forum2k9 I solved the first one with convert and Tesseract:

$ convert ofdbmf.jpg -colorspace Gray -blur 0 -level 0,60% ofdbmf-1.jpg
$ tesseract -psm 8 ofdbmf-1.jpg -
OFDBMF

I posted the details at https://mathieularose.com/decoding-captchas
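
The two commands above can be approximated in Python as follows. This is a sketch, not a byte-for-byte port: it assumes Pillow and `pytesseract` are installed, treats `-level 0,60%` as a plain linear stretch that maps 60% gray and above to white, skips the `-blur 0` step, and uses the hypothetical helper names `level_lut` and `decode_captcha`.

```python
def level_lut(black=0.0, white=0.6):
    """256-entry table that linearly stretches [black, white] to [0, 255]."""
    lut = []
    for value in range(256):
        scaled = (value / 255 - black) / (white - black)
        # Clamp to [0, 1] so everything brighter than `white` saturates.
        lut.append(round(255 * min(1.0, max(0.0, scaled))))
    return lut


def decode_captcha(path):
    """Grayscale, level-adjust, then OCR the image as a single word."""
    # Imported lazily so level_lut can be used without these packages.
    from PIL import Image
    import pytesseract

    gray = Image.open(path).convert("L")
    leveled = gray.point(level_lut())
    return pytesseract.image_to_string(leveled, config="--psm 8").strip()
```

`--psm 8` is the Tesseract 4+ spelling of the `-psm 8` flag used in the command line above.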

@jtanori

jtanori commented Sep 10, 2017

I solved the first one using the following:

# import the necessary packages
from PIL import Image
from PIL import ImageEnhance
import PIL.ImageOps
import pytesseract
import argparse
import cv2
import os
import numpy

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
	help="path to input image to be OCR'd")
args = vars(ap.parse_args())

# load the example image and convert it to RGB, invert it and adjust brightness
image = Image.open(args["image"]).convert('RGB')
image = PIL.ImageOps.invert(image)
image = ImageEnhance.Brightness(image)
image = image.enhance(10)
imageArray = numpy.array(image)
imageArray = imageArray[:, :, ::-1].copy()

filename = "{}.png".format(os.getpid())
image.save(filename)

# load the image as a PIL/Pillow image, apply OCR, and then delete
# the temporary file
text = pytesseract.image_to_string(Image.open(filename))
os.remove(filename)
print(text)

# show the output images
cv2.imshow("Image", imageArray)
cv2.waitKey(0)

Then just run your script with

python ocr.py --image <imagepath> 

This was just a draft, so you can ignore cv2. I tried it with around 200 different images from the same generator and it had a 100% success rate, though I didn't test much beyond that.

@marti1125

marti1125 commented Sep 14, 2017

captchaimage
How can I get the text from this image?

@alerin00

alerin00 commented Dec 7, 2017

How can I get the text from this image?
00diomv8woshdm1kfasbhrqd0prvzw2a
I tried this:
image.Morphology(MorphologyMethod.Erode, Kernel.Octagon, 1)
image.Negate()
image.Threshold(New Percentage(70))
But dots appear inside the letters; I think I need to fill the dots before the morphology step.
2017-12-07 19_21_26-form1

@alerin00

alerin00 commented Dec 8, 2017

I solved it by calling ReduceNoise first:

    image.ReduceNoise()
    image.Morphology(MorphologyMethod.Erode, Kernel.Octagon, Channels.Default)
    image.Negate()
    image.Threshold(New Percentage(70))

But the following images are not solved correctly with the same solution. How can I solve them?

0aemmqezdnfjbt3ekakk4cm1xnosmw9s
0c4trn7houtsgiknmfrm9dkgldlwpvk9
0dkmvdmpntougs3me0ksn1ivdchbzqlx
0dpnatw7goffukd7fyqcsocnu58s4yvn
0e3zagng7thg1ajndc8pc3tmuiebbwiy
0er0qmw5vp1mzyu45eryg0fnzdayyqwj
0fxo96pviglb17ylkjg51phji1ltr35j
0hmf2egld3ordil8fdl0c0xzjsrvsv8r
0hsvndk2uxmgfkvhfzs1leewx7cxh7b4
0ioztnynfyixch5vyrqmmufgsv6labde
0jofqglqars965foft4regfxy98kglod
0jzywlzdyfppbwd90d6hzskkahu6dd6o
0kbx1gyiswzrv4wvljvmoxaik7pce5oh
0kfhkpuu7rgvi7lfculgiknpszwarmie
0m4ewkjpmq5647kndhsisy78buzrrflt
0p4lxrtwqhhqyvnjq0geylnieenmkztf
0r95gszjss1eghqcynnpznpdo2teukxr
0svm0w1nglqit0wzlelvpyig9cm9otws
0xyhvdumov6xd2smphpssnvzfzerualk
1bc6ydstouj6y1n02sdazvugon98biek
1cytslinwcqid56kfmuvprowm6vj8hh1
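
For readers working in Python rather than Magick.NET, the same sequence of steps (denoise, erode, negate, threshold at 70%) might be sketched with OpenCV as below. The mapping is an assumption: `cv2.medianBlur` stands in for `ReduceNoise` and a 3x3 elliptical kernel stands in for `Kernel.Octagon`, so results will not match exactly, and the kernel size and threshold will likely need tuning per captcha family. `percent_to_level` and `clean` are hypothetical helper names.

```python
def percent_to_level(percent):
    """Convert a 0-100 percentage to an 8-bit threshold value."""
    return 255 * percent // 100


def clean(path, percent=70):
    """Denoise, erode, negate, and binarize a captcha image."""
    # Imported lazily so percent_to_level works without OpenCV installed.
    import cv2

    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(path)
    # Remove isolated specks first so erosion does not enlarge them.
    denoised = cv2.medianBlur(gray, 3)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    eroded = cv2.erode(denoised, kernel)
    negated = cv2.bitwise_not(eroded)
    level = percent_to_level(percent)
    _, binary = cv2.threshold(negated, level, 255, cv2.THRESH_BINARY)
    return binary
```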

@omacDota

omacDota commented Dec 13, 2017

one
How will we solve this?

@gabrielsegalla

gabrielsegalla commented Dec 20, 2017

How will we solve this?
captcha 1

@citotob

citotob commented Dec 28, 2017

sccimbcaptcha

How do I solve this?

@citotob

citotob commented Dec 28, 2017

And how do I solve this? Thank you.

captcha

@pkostrzewa

pkostrzewa commented Dec 29, 2017

Is it possible to solve such CAPTCHAs?
download
download 1

@KyleBoyer

KyleBoyer commented Jan 9, 2018

I've got a challenge, cracking these captchas:

bot4
bot2
bot
bot3

bot5

@MegsterDerpington

MegsterDerpington commented Jan 31, 2018

rendercaptcha
What's this one?

@MiRHaDi

MiRHaDi commented Feb 4, 2018

Hi,
how can I do it?

image
image
image
image

@MabutasGroup

MabutasGroup commented Feb 16, 2018

@alerin00 @marti1125 @forum2k9 @MiRHaDi
Hi friends, you should use the pytesseract library.
I have solved captchas like these before.
Because this is old stuff, if you want help, contact me at antoniomabutas84@gmail.com

Best regards,
Antonio

@jor3l

jor3l commented Feb 16, 2018

The best way is to use TensorFlow and train a model to solve the captchas (you should be able to generate enough training data to make it reliable).
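
As a rough illustration of that suggestion, the sketch below generates random labels and defines a small convolutional network with one softmax head per character. Everything here is an assumption made for the example: the alphabet, the fixed label length of 6, the layer sizes, and the helper names. Rendering each label into an image that resembles the target captcha is left out, since it depends on the generator being imitated.

```python
import random
import string

# Assumed character set: digits plus lowercase letters.
ALPHABET = string.digits + string.ascii_lowercase


def random_label(length=6, rng=random):
    """Draw a random captcha string to render as a training sample."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def encode(label):
    """Map each character to its class index."""
    return [ALPHABET.index(ch) for ch in label]


def decode(indices):
    """Map class indices back to a string."""
    return "".join(ALPHABET[i] for i in indices)


def build_model(height, width, length=6):
    """Small CNN with one classification head per character position."""
    # Imported lazily so the label helpers work without TensorFlow.
    import tensorflow as tf

    layers = tf.keras.layers
    inputs = tf.keras.Input(shape=(height, width, 1))
    x = layers.Conv2D(32, 3, activation="relu")(inputs)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    outputs = [
        layers.Dense(len(ALPHABET), activation="softmax",
                     name="char_{}".format(i))(x)
        for i in range(length)
    ]
    model = tf.keras.Model(inputs, outputs)
    # Integer class targets per head, hence the sparse loss.
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model
```

Each training target would then be `encode(label)` split into one integer per head.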

@OCR-Fanboy

OCR-Fanboy commented Mar 2, 2018

How can I crack captchas like these?
1
2
3
4
5
6
7
8
9
10

I usually download the captcha with PHP, extract certain pixels (based on color), save the result as a JPG, and then run it through gOCR.
But I can't get these to work, and I'm new to Tesseract.

Any help is welcome
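
The color-based extraction step described above could be sketched in Python roughly as follows, producing a black-on-white mask that can be passed to an OCR engine. The target color and tolerance are placeholders to be chosen per captcha, Pillow is assumed for the image I/O, and the function names are hypothetical.

```python
def is_close(pixel, target, tolerance=40):
    """True if every RGB channel is within `tolerance` of the target."""
    return all(abs(p - t) <= tolerance for p, t in zip(pixel, target))


def isolate_color(pixels, target, tolerance=40):
    """Return 0 (black) for pixels near the target color, 255 otherwise."""
    return [0 if is_close(p, target, tolerance) else 255 for p in pixels]


def isolate_in_image(path, target, tolerance=40):
    """Build a grayscale mask keeping only the text-colored pixels."""
    # Imported lazily so the pure helpers work without Pillow installed.
    from PIL import Image

    rgb = Image.open(path).convert("RGB")
    mask = Image.new("L", rgb.size)
    mask.putdata(isolate_color(rgb.getdata(), target, tolerance))
    return mask
```

Saving the mask as PNG rather than JPG avoids compression artifacts around the character edges.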

@WHK102

WHK102 commented Mar 7, 2018

Cannot decode this captcha; the line is the same weight as the words.

captcha

@ppc52776

ppc52776 commented Mar 12, 2018

@WHK102, I have the same problem.
The line's color and weight are the same as the words'.
c01
c03
c12
c20

Commands I tried:

# convert in.jpg -scale 400% -threshold 90% -morphology dilate octagon:4 out.png
# tesseract out.png -psm 8 -c tessedit_char_whitelist=0123456789 -
@nfaycel

nfaycel commented Mar 16, 2018

01
02
03
04
Please, I need some help here.

@yoandysse

yoandysse commented Apr 17, 2018

0d4aa746bf647ddb6403f34f457dc1aea4d546b3
Anything for this type of captcha?

@lzzy12

lzzy12 commented Jul 22, 2018

TensorFlow is probably the best solution here.

@inventer88

inventer88 commented Aug 19, 2018

thresholdbin
Is it possible to recognize this image?
caption-work

@blanketyblank1

blanketyblank1 commented Sep 8, 2018

Hi, any suggestions for text intersected by multiple curved lines of the same weight?

download-3
download-4
download-6
download-7
download-2
download-5

@Klapkaak078

Klapkaak078 commented Oct 6, 2018

Can someone solve these two, guys?
captcha1
unnamed

@dyohan9

dyohan9 commented Nov 1, 2018

Hello, how do I decipher these captchas with this parser, or with ImageMagick or even a Python script? Could you guys help me?
1
2
3
4
5
6
Thanks!
