@scrapehero
Last active September 26, 2024 12:37
import argparse
from subprocess import check_output

import pytesseract

try:
    import Image
except ImportError:
    from PIL import Image


def resolve(path):
    # Resample the image in place with ImageMagick's `convert`,
    # then run Tesseract OCR on the result.
    print("Resampling the Image")
    check_output(['convert', path, '-resample', '600', path])
    return pytesseract.image_to_string(Image.open(path))


if __name__ == "__main__":
    argparser = argparse.ArgumentParser()
    argparser.add_argument('path', help='Captcha file path')
    args = argparser.parse_args()
    path = args.path
    print('Resolving Captcha')
    captcha_text = resolve(path)
    print('Extracted Text', captcha_text)

@shekaryenagandula

Hi,
I got the error below.
Please help.

Resolving Captcha
Resampling the Image
Traceback (most recent call last):
  File "C:\Users\yenag\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pytesseract\pytesseract.py", line 226, in run_tesseract
    proc = subprocess.Popen(cmd_args, **subprocess_args())
  File "C:\Users\yenag\AppData\Local\Programs\Python\Python38-32\lib\subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\yenag\AppData\Local\Programs\Python\Python38-32\lib\subprocess.py", line 1307, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "OCR_image_processing.py", line 22, in <module>
    captcha_text = resolve(path)
  File "OCR_image_processing.py", line 14, in resolve
    return pytesseract.image_to_string(Image.open(path))
  File "C:\Users\yenag\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pytesseract\pytesseract.py", line 344, in image_to_string
    return {
  File "C:\Users\yenag\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pytesseract\pytesseract.py", line 347, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "C:\Users\yenag\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pytesseract\pytesseract.py", line 258, in run_and_get_output
    run_tesseract(**kwargs)
  File "C:\Users\yenag\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pytesseract\pytesseract.py", line 230, in run_tesseract
    raise TesseractNotFoundError()
pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your path

@vamsi6696

sudo apt-get install tesseract-ocr
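
Note that `apt-get` only applies to Debian/Ubuntu systems, while the traceback above comes from Windows, where the Tesseract binary has to be installed separately and made discoverable. The following is a minimal sketch of one way to point pytesseract at the executable, not a definitive fix. `locate_tesseract` and `configure_pytesseract` are hypothetical helper names, and any Windows path passed in is an assumption that has to match wherever Tesseract was actually installed. `pytesseract.pytesseract.tesseract_cmd` is the attribute pytesseract reads to find the executable.

```python
import os
import shutil


def locate_tesseract(candidates=()):
    """Return a path to the tesseract executable, or None if it cannot be found.

    The PATH is searched first; `candidates` are extra locations to try
    afterwards (hypothetical, supplied by the caller).
    """
    found = shutil.which("tesseract")
    if found:
        return found
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


def configure_pytesseract(candidates=()):
    """Tell pytesseract which executable to run, or fail with a clear message."""
    import pytesseract  # imported lazily so locate_tesseract works without it

    path = locate_tesseract(candidates)
    if path is None:
        raise RuntimeError("tesseract not found; install it or add it to PATH")
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

On Windows this might be called once at the top of the script as `configure_pytesseract([r"C:\Program Files\Tesseract-OCR\tesseract.exe"])`, where the path shown is only an assumed install location. The script also shells out to ImageMagick's `convert`, which has to be installed and on the PATH as well.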

@Shonty10

Code works fine but doesn't extract the captcha.
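
When the script runs but returns empty or wrong text, one thing that may be worth trying is cleaning up the image before OCR. The sketch below assumes a captcha with dark text on a light background; `threshold_table`, `preprocess`, and `read_captcha` are hypothetical helpers, and the threshold of 140 is an arbitrary starting value to tune per image. The `--psm 7` option asks Tesseract to treat the image as a single line of text. Heavily distorted captchas may still not be readable this way.

```python
def threshold_table(threshold):
    """Build a 256-entry lookup table mapping grey levels to pure black or white."""
    return [255 if level > threshold else 0 for level in range(256)]


def preprocess(path, threshold=140):
    """Open an image, convert it to greyscale, and binarize it."""
    from PIL import Image  # imported lazily; requires Pillow

    grey = Image.open(path).convert("L")
    # Image.point applies the lookup table to every pixel.
    return grey.point(threshold_table(threshold))


def read_captcha(path, threshold=140):
    """Run Tesseract on the binarized image, treating it as one text line."""
    import pytesseract  # requires pytesseract and the tesseract binary

    text = pytesseract.image_to_string(preprocess(path, threshold), config="--psm 7")
    return text.strip()
```

Trying a few threshold values and inspecting the output of `preprocess` (for example with `.save("debug.png")`) makes it easier to see whether the characters survive binarization.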
