@DavidBruant
Created July 13, 2023 13:30
Tesseract OCR

OCR

Install Tesseract: https://tesseract-ocr.github.io/tessdoc/Installation.html

sudo apt-get install tesseract-ocr
sudo apt-get install tesseract-ocr-fra
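
To confirm that the French language data was picked up after the install, something like the sketch below may help. `has_lang` is a hypothetical helper name, not part of Tesseract; it only relies on the `--list-langs` option and assumes the language codes are printed one per line.

```shell
# Sketch: check whether a Tesseract language pack is installed.
# has_lang is a hypothetical helper name, not a Tesseract command.
has_lang() {
    # --list-langs prints a header line, then one language code per line.
    # 2>&1 is there because some older versions print the list to stderr.
    tesseract --list-langs 2>&1 | grep -qx "$1"
}
```

Usage: `has_lang fra && echo "French is available"`.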

Test 1 - retrieve the meal data from Scribouilli camp 2

tesseract docs/repas-scribouilli-camp-2.jpg - -l fra

Well, that works pretty well!

Test 2 - a document from the Préfecture de Gironde

pdf/images

pdfimages -png pdf/RAA-33-SPECIAL-N-2023-081.pdf pdf/images
find . -type f -name "*.png" -size -10k -delete
tesseract pdf/images-005.png - -l fra > prefecture-1.txt
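
The last command only processes one extracted image. A minimal sketch for running OCR over every remaining image could look like this, assuming the files follow the `pdf/images-NNN.png` naming that `pdfimages` produces from the `pdf/images` prefix. `ocr_all` is a hypothetical helper name introduced here for illustration.

```shell
# Sketch: OCR every PNG written by pdfimages under a given prefix.
# ocr_all is a hypothetical helper name, not part of Tesseract.
ocr_all() {
    prefix="$1"        # e.g. pdf/images, the image root given to pdfimages
    lang="${2:-fra}"   # Tesseract language code, French by default
    for img in "$prefix"-*.png; do
        # Skip the unexpanded pattern when no file matches
        [ -e "$img" ] || continue
        # Tesseract appends .txt to the output base on its own
        tesseract "$img" "${img%.png}" -l "$lang"
    done
}
```

Usage: `ocr_all pdf/images` should leave one `.txt` file next to each image (for example `pdf/images-005.txt`), which can then be concatenated with `cat pdf/images-*.txt` if a single file is wanted.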