Skip to content

Instantly share code, notes, and snippets.

@compleatang
Created February 16, 2013 21:34
Show Gist options
  • Save compleatang/4968841 to your computer and use it in GitHub Desktop.
Save compleatang/4968841 to your computer and use it in GitHub Desktop.
Since Google Drive PDF viewer has a pretty functional OCR system, but since that system only looks at the first 10 pages, I built a simple shell script that will explode a pdf into 10 page blocks. I just call this script from my GDrive folder on my HD and then pull up the exploded pdfs online to copy and paste from the Google supplied OCRing! **…
#!/bin/zsh
dump_data=$(pdftk $1 dump_data | grep NumberOfPages:)
base_name=$(echo $1 | sed -E 's/(.*).pdf(.*)/\1/')
number_pages=$(echo $dump_data | sed -E 's/(.*): (.*)/\2/')
((splits = $number_pages / 10))
((do_end = $number_pages % 10))
start=1
stop=10
for ((i=0; i < $splits; i++)); do
pdftk $1 cat $start-$stop output $base_name-$i.pdf
start=$(($start + 10))
stop=$(($stop + 10))
done
((start_final = $splits * 10 + 1))
if [ $do_end != 0 ]; then
pdftk $1 cat $start_final-end output $base_name-$i.pdf
fi
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment