@JulianaGuama
Created May 15, 2019 18:22
Scrape an image-based PDF with Tesseract
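# Imports
import os
import cv2
import pytesseract as ptr

# Tesseract only sees TESSDATA_PREFIX as an environment variable, not as a Python variable.
# Depending on the Tesseract version, it may need to point at the 'tessdata' subfolder instead.
os.environ['TESSDATA_PREFIX'] = r'C:/Users/your-user/AppData/Local/Tesseract-OCR'
ptr.pytesseract.tesseract_cmd = r"C:\Users\your-user\AppData\Local\Tesseract-OCR\tesseract.exe"

# Load the page image in grayscale; cv2.imread returns None if the file cannot be read.
filename = r'C:/Users/your-user/fileLocal/file.jpg'
image = cv2.imread(filename, cv2.IMREAD_GRAYSCALE)
if image is None:
    raise FileNotFoundError(filename)

# Run OCR with the Portuguese language model (the 'por' traineddata must be installed).
text = ptr.image_to_string(image, lang='por')
print(text)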
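
The snippet above hard-codes the location of tesseract.exe, which breaks as soon as the script runs on another machine. As a minimal sketch (not part of the original gist), the executable could be looked up on the system PATH first, with a list of fallback locations checked afterwards. The helper name `find_tesseract` and its `name` parameter are hypothetical, introduced here only for illustration.

```python
import os
import shutil


def find_tesseract(candidates=(), name="tesseract"):
    """Return the path of a Tesseract executable, or None if none is found.

    Hypothetical helper: looks for `name` on the system PATH first,
    then falls back to the first path in `candidates` that is an existing file.
    """
    on_path = shutil.which(name)
    if on_path:
        return on_path
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Assumed fallback location, mirroring the path used in the gist.
    fallback = r"C:\Users\your-user\AppData\Local\Tesseract-OCR\tesseract.exe"
    print(find_tesseract([fallback]))
```

If a path is returned, it can be assigned to `ptr.pytesseract.tesseract_cmd` in place of the hard-coded string; if the result is None, the script can stop with a clear message instead of failing inside the OCR call.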