@clairesg
Forked from manucabral/converter.py
Created September 4, 2021 23:34
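
import os

import cv2
import fitz  # PyMuPDF
import pytesseract


def convert_to_img(pdf):
    """Render the first page of the PDF to a PNG with the same base name."""
    doc = fitz.open(pdf)
    # snake_case names; older PyMuPDF releases used loadPage/getPixmap/writePNG
    page = doc.load_page(0)  # first page only
    pix = page.get_pixmap()
    output = f"{os.path.splitext(pdf)[0]}.png"
    pix.save(output)


def convert_to_string(image):
    """Run Tesseract OCR on the image and print the recognized text."""
    img_cv = cv2.imread(image)
    # OpenCV loads images as BGR; convert to RGB before passing to pytesseract
    img_rgb = cv2.cvtColor(img_cv, cv2.COLOR_BGR2RGB)
    print(pytesseract.image_to_string(img_rgb))


if __name__ == '__main__':
    convert_to_img('text.pdf')
    convert_to_string('text.png')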