@garcon
Last active July 5, 2020 23:39
## OCR a PDF file
ocrmypdf -l ces input.pdf output.pdf
# -l => OCR language(s) as Tesseract codes, joined with "+": -l eng+deu, -l ces
# --sidecar [FILE] => Also write the text recognized by OCR to a plain-text file
# --title TITLE => Set document title (place multiple words in quotes)
# --author AUTHOR => Set document author
# --subject SUBJECT => Set document subject description
# --keywords KEYWORDS => Set document keywords
# -r => Automatically rotate pages based on detected text orientation
# -q => fewer messages
# --remove-background => Attempt to remove background from gray or color pages
# --unpaper-args '--layout double --no-noisefilter' => when two pages are scanned together (only takes effect together with --clean)
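The options above can be combined in a single call. The wrapper below is only a sketch: the function name `ocr_scan`, the default language `eng`, the fixed title, and the sidecar naming (`output.txt` next to `output.pdf`) are assumptions made for illustration, not part of ocrmypdf.

```shell
# usage: ocr_scan input.pdf output.pdf [languages]
# Hypothetical helper: runs ocrmypdf with auto-rotation and a sidecar text file.
ocr_scan() {
    # ${2%.pdf}.txt turns "output.pdf" into "output.txt" for the sidecar
    ocrmypdf -l "${3:-eng}" \
        --rotate-pages \
        --sidecar "${2%.pdf}.txt" \
        --title "Scanned document" \
        "$1" "$2"
}
```

For example, `ocr_scan input.pdf output.pdf ces` OCRs a Czech scan and leaves the recognized text in `output.txt`.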
## OCR all PDF files
# skips earlier *_ocr.pdf results; output is named <name>_ocr.pdf
find . -name '*.pdf' ! -name '*_ocr.pdf' | while IFS= read -r pdf; do ocrmypdf "$pdf" "${pdf%.pdf}_ocr.pdf"; done
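A more defensive variant of the batch loop can be written as two small functions. This is a sketch under assumptions: the names `ocr_output_name` and `ocr_all` are made up here, results are assumed to carry an `_ocr.pdf` suffix, and file names are assumed not to contain newlines.

```shell
# Derive "<name>_ocr.pdf" from "<name>.pdf" (illustrative helper).
ocr_output_name() {
    printf '%s_ocr.pdf\n' "${1%.pdf}"
}

# OCR every PDF below a directory (default: current directory).
ocr_all() {
    # Exclude files that a previous run of this loop produced.
    find "${1:-.}" -type f -name '*.pdf' ! -name '*_ocr.pdf' |
    while IFS= read -r pdf; do
        out=$(ocr_output_name "$pdf")
        [ -e "$out" ] && continue            # result already exists, skip
        ocrmypdf "$pdf" "$out" < /dev/null   # keep the tool off the loop's stdin
    done
}
```

Calling `ocr_all ~/scans` can be repeated safely: files that already have an `_ocr.pdf` companion are skipped.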
## Reduce size of PDF
gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -sOutputFile=output.pdf input.pdf
# -dNOPAUSE => no pause after page
# -dBATCH => Exit after last file
# -sDEVICE=pdfwrite => PDF writer
# -dCompatibilityLevel => PDF version of the output: 1.4, 1.5 (compatibility with Preview.app), 1.7 (compatibility with Acrobat)
# -dPDFSETTINGS => (small) /screen /ebook /printer /prepress (large)
# -g<width>x<height> => page size in pixels
# -r<res> => pixels/inch resolution
# -q => fewer messages
# -sPAPERSIZE => a4, legal…
# -dColorConversionStrategy => /Gray
# -dProcessColorModel => /DeviceGray
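The grayscale options listed above can be combined with a quality preset in one call. A minimal sketch, assuming a wrapper named `pdf_to_gray` (hypothetical) and `/ebook` as the default preset:

```shell
# usage: pdf_to_gray input.pdf output.pdf [screen|ebook|printer|prepress]
# Hypothetical helper: rewrites a PDF in grayscale at the chosen preset.
pdf_to_gray() {
    gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
       -dCompatibilityLevel=1.4 \
       -dPDFSETTINGS="/${3:-ebook}" \
       -dColorConversionStrategy=/Gray \
       -dProcessColorModel=/DeviceGray \
       -sOutputFile="$2" "$1"
}
```

For example, `pdf_to_gray input.pdf gray.pdf screen` aims for the smallest file, at the cost of image quality.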
## Reduce all PDF files
# skips earlier *_new.pdf results; output is named <name>_new.pdf
find . -name '*.pdf' ! -name '*_new.pdf' | while IFS= read -r pdf; do gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH -sOutputFile="${pdf%.pdf}_new.pdf" "$pdf"; done
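Re-encoding does not always shrink a file, so a batch run may want to discard results that came out no smaller than the original. A sketch of that idea, with the names `keep_if_smaller` and `reduce_all` and the `_new.pdf` suffix all assumed for illustration:

```shell
# Delete the new file unless it is strictly smaller than the original.
keep_if_smaller() {
    # $(( ... )) normalizes the whitespace some wc implementations print
    orig_size=$(( $(wc -c < "$1") ))
    new_size=$(( $(wc -c < "$2") ))
    if [ "$new_size" -ge "$orig_size" ]; then
        rm -f -- "$2"
    fi
}

# Reduce every PDF below a directory (default: current directory).
reduce_all() {
    find "${1:-.}" -type f -name '*.pdf' ! -name '*_new.pdf' |
    while IFS= read -r pdf; do
        out="${pdf%.pdf}_new.pdf"
        gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
           -dPDFSETTINGS=/ebook -sOutputFile="$out" "$pdf" < /dev/null &&
        keep_if_smaller "$pdf" "$out"
    done
}
```

After `reduce_all .`, only the `_new.pdf` files that actually save space remain next to their originals.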