@gtfierro
Last active January 3, 2016 22:49
Install and configure Tesseract OCR on an EC2 instance
#!/bin/bash
# for Ubuntu 12.04/12.10
sudo apt-get update
sudo apt-get -y install autoconf automake make build-essential
sudo apt-get -y install tesseract-ocr tesseract-ocr-eng imagemagick
# install GNU parallel (as a non-root user the installer puts it in ~/bin)
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
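# run one job per CPU core (NUMCORES was previously never set)
NUMCORES=$(nproc)
# rasterize the first 20 pages ([0-19], 0-based) of each PDF into a multi-page TIFF at 200 dpi
find . -name '*.pdf' | "$HOME/bin/parallel" --gnu -j "$NUMCORES" convert -background white +matte -depth 8 -density 200 {}[0-19] {}.tif
# OCR each TIFF; tesseract appends .txt to the output base (a.pdf.tif -> a.pdf.tif.txt)
find . -name '*.tif' | "$HOME/bin/parallel" --gnu -j "$NUMCORES" tesseract {} {} -l eng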
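The two `find | parallel` passes above finish every conversion before any OCR starts. If you would rather run both steps as a single job per PDF, they can be wrapped in a Bash function and handed to GNU parallel. This is a minimal sketch, not part of the original gist: the function name `ocr_pdf` is hypothetical, it reuses the `convert` and `tesseract` options from the script, and it assumes parallel runs its jobs with Bash so the exported function is visible to them.

```shell
#!/bin/bash
# Sketch only: ocr_pdf is a hypothetical helper, not part of the original gist.
# It rasterizes the first 20 pages of one PDF into a multi-page TIFF and then
# OCRs that TIFF, using the same convert/tesseract options as the script above.
ocr_pdf() {
    local pdf=$1
    local tif="${pdf}.tif"                 # ./a.pdf -> ./a.pdf.tif

    # Quoting "${pdf}[0-19]" keeps the shell from treating [0-19] as a glob;
    # ImageMagick reads it as pages 0..19 (the first 20 pages).
    convert -background white +matte -depth 8 -density 200 \
        "${pdf}[0-19]" "$tif" || return 1

    # tesseract appends .txt to the output base: ./a.pdf.tif -> ./a.pdf.tif.txt
    tesseract "$tif" "$tif" -l eng || return 1

    printf '%s\n' "${tif}.txt"             # report where the text ended up
}
export -f ocr_pdf                          # make the function visible to parallel's bash jobs

# Use nproc when available, otherwise ask getconf, otherwise run one job at a time.
NUMCORES=${NUMCORES:-$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)}
```

Usage, assuming parallel was installed to `~/bin` by the installer above: `find . -name '*.pdf' | "$HOME/bin/parallel" --gnu -j "$NUMCORES" ocr_pdf {}`. Each job prints the path of the `.txt` file it produced, so the combined output doubles as a list of finished files.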