@mayjs
Last active March 7, 2021 16:07
This script converts a series of document pictures into a PDF, automatically applying edge detection and dewarping to the images.
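#!/usr/bin/env bash
# Given an output PDF filename and a list of JPEG images, adjust the images,
# combine them into a PDF and OCR the result.
OUT_NAME=$1
shift
# Apply the EXIF orientation to the images in place
mogrify -auto-orient "$@"
# Scratch directory for intermediate files, removed when the script exits
WORKDIR=$(mktemp -d)
trap 'rm -rf "$WORKDIR"' EXIT
# Run scantailor filters 1 through 6 with automatic dewarping, writing to $WORKDIR
scantailor-cli -l=1 --dewarping=auto --start-filter=1 --end-filter=6 "$@" "$WORKDIR"
# Collect the processed TIFFs in the same order as the input images
PDFINPUTS=()
for IN in "$@"
do
    FILENAME=$(basename -s .jpg "$IN")
    PDFINPUTS+=("$WORKDIR/$FILENAME.tif")
done
convert "${PDFINPUTS[@]}" "$WORKDIR/output.pdf"
ocrmypdf "$WORKDIR/output.pdf" "$OUT_NAME"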
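The loop in this script relies on each input `NAME.jpg` producing a `NAME.tif` in the scantailor output directory. The sketch below isolates that mapping so it can be checked without running scantailor. The helper name `tif_path_for` is hypothetical and not part of the gist. Only a lowercase `.jpg` suffix is stripped, so a file such as `page.JPG` would map to `page.JPG.tif`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script): print the TIFF path
# the script expects for a given input image inside a given output directory.
tif_path_for() {
    local out_dir=$1 image=$2
    # basename -s strips the directory part and the trailing ".jpg"
    printf '%s/%s.tif\n' "$out_dir" "$(basename -s .jpg "$image")"
}
```

For example, `tif_path_for /tmp/work scans/page1.jpg` prints the path that would be passed on to `convert` for that page.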
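#!/usr/bin/env bash
# Create scans from images in subdirectories of the CWD
for DIR in */
do
    # Skip the literal pattern when there are no subdirectories
    [ -d "$DIR" ] || continue
    OUTFILE="$(basename "$DIR").pdf"
    # Only process directories that do not have a PDF yet
    if [ ! -f "$OUTFILE" ]; then
        echo "Scanning new document $DIR to $OUTFILE"
        ./autoscan.sh "$OUTFILE" "$DIR"*.jpg
    fi
done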
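To preview which directories the batch script would process, the same "no PDF yet" check can be run without invoking `autoscan.sh`. This is a minimal sketch; the function name `pending_scans` and its `root` argument are assumptions made for illustration and do not appear in the gist.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the name of every subdirectory of the
# given root that has no matching NAME.pdf next to it.
pending_scans() {
    local root=$1 dir name
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue          # glob matched nothing
        name=$(basename "$dir")            # trailing slash is dropped
        [ -f "$root/$name.pdf" ] || printf '%s\n' "$name"
    done
}
```

Running `pending_scans .` in the directory that holds the document folders lists the ones that would be scanned on the next run.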
with import <nixpkgs> {};
pkgs.mkShell {
  buildInputs = [
    imagemagick
    scantailor
    ocrmypdf
    bash
  ];
}