@manucabral
Last active September 21, 2021 22:24
pdf to string
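import os

import cv2
import fitz  # PyMuPDF; older releases spell these calls loadPage/getPixmap/writePNG
import pytesseract


def convert_to_img(pdf):
    # Render the first page of the PDF to a PNG with the same base name.
    doc = fitz.open(pdf)
    page = doc.load_page(0)
    pix = page.get_pixmap()
    output = f"{os.path.splitext(pdf)[0]}.png"
    pix.save(output)


def convert_to_string(image):
    # OpenCV loads images as BGR; convert to RGB before passing to Tesseract.
    img_cv = cv2.imread(image)
    img_rgb = cv2.cvtColor(img_cv, cv2.COLOR_BGR2RGB)
    print(pytesseract.image_to_string(img_rgb))


if __name__ == '__main__':
    convert_to_img('test.pdf')
    convert_to_string('test.png')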
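
The script above only handles the first page and prints the result. As a rough sketch of how the same approach might extend to every page and return the text instead, something like the following could work. The names `page_image_path` and `pdf_to_string` are hypothetical and not part of the gist. It assumes a recent PyMuPDF release (snake_case method names) and that the Tesseract binary is installed.

```python
import os


def page_image_path(pdf_path, page_index):
    """Build the PNG path for one page, e.g. ('test.pdf', 0) -> 'test_p0.png'."""
    stem, _ = os.path.splitext(pdf_path)
    return f"{stem}_p{page_index}.png"


def pdf_to_string(pdf_path):
    """OCR every page of the PDF and return the text joined by newlines."""
    # Third-party imports are local so the path helper works without them.
    import cv2
    import fitz  # PyMuPDF
    import pytesseract

    texts = []
    doc = fitz.open(pdf_path)
    for index in range(doc.page_count):
        # Render the page to a PNG, as the original script does for page 0.
        image_path = page_image_path(pdf_path, index)
        doc.load_page(index).get_pixmap().save(image_path)
        # OpenCV reads BGR; convert to RGB before running Tesseract.
        img_rgb = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
        texts.append(pytesseract.image_to_string(img_rgb))
    doc.close()
    return "\n".join(texts)
```

Calling `pdf_to_string('test.pdf')` would leave one PNG per page next to the source file. Whether to keep those intermediate images is a design choice this sketch does not settle.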