Last active
January 3, 2016 22:49
Install and configure Tesseract on an EC2 instance
#!/bin/bash
# for Ubuntu 12.04/12.10
sudo apt-get update
sudo apt-get -y install autoconf automake make build-essential
sudo apt-get -y install tesseract-ocr tesseract-ocr-eng imagemagick
# install GNU parallel
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
# number of parallel jobs; defaults to the CPU count if NUMCORES is unset
NUMCORES=${NUMCORES:-$(nproc)}
# convert the first 20 pages of each PDF to a multi-page TIFF (foo.pdf -> foo.pdf.tif)
find . -name '*.pdf' | /home/ubuntu/bin/parallel --gnu -j "$NUMCORES" convert -background white +matte -depth 8 -density 200 {}[0-19] {}.tif
# OCR each TIFF; tesseract appends .txt to the output base (foo.pdf.tif -> foo.pdf.tif.txt)
find . -name '*.tif' | /home/ubuntu/bin/parallel --gnu -j "$NUMCORES" tesseract -l eng {} {}
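After a long batch run it can help to check which PDFs never produced OCR text. The following is a minimal sketch, not part of the original script: `missing_ocr` is a hypothetical helper name, and it assumes the output naming the pipeline above yields (`foo.pdf` becomes `foo.pdf.tif`, then `foo.pdf.tif.txt`). A PDF is reported when its text file is absent or empty.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not from the original gist):
# print every PDF under a directory whose OCR text output is missing or empty.
# Assumes the naming scheme foo.pdf -> foo.pdf.tif -> foo.pdf.tif.txt.
missing_ocr() {
    local dir="${1:-.}"   # directory to scan; defaults to the current one
    find "$dir" -name '*.pdf' | sort | while IFS= read -r pdf; do
        # -s is true only if the file exists and has a size greater than zero
        [ -s "${pdf}.tif.txt" ] || printf '%s\n' "$pdf"
    done
}
```

Calling `missing_ocr .` from the directory where the script ran prints one path per line, which could be piped back into the same `parallel` commands to retry only the failures.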