@mritzco
Created July 28, 2019 07:16
Convert all PDF files in a directory to text, plus some cleaning
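#!/bin/bash
# Convert every PDF in the current directory to text in librosTexto/,
# skipping files that were already converted.
OUTDIR="librosTexto"
mkdir -p "$OUTDIR"
shopt -s nullglob   # an unmatched *.pdf expands to nothing
for f in *.pdf
do
  SAVEAS="${f%.pdf}.txt"   # strip only the trailing .pdf extension
  if [ ! -f "$OUTDIR/$SAVEAS" ]; then
    echo "-Processing $f file as $SAVEAS..."
    # Extract as UTF-8 without page-break (form feed) characters
    if ! pdftotext -enc UTF-8 -nopgbrk "$f" "$OUTDIR/tmp.txt"; then
      echo "-Failed to convert $f, skipping" >&2
      continue
    fi
    # Cleaning: turn tabs into spaces and squeeze runs of spaces
    tr -s '\t' ' ' <"$OUTDIR/tmp.txt" > "$OUTDIR/$SAVEAS"
    rm "$OUTDIR/tmp.txt"
  else
    echo "-Not processing $f, already exists"
  fi
done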
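
The "cleaning" step in the script is the tr call. A minimal sketch of what it does to a line of extracted text, with no PDF needed; the clean_whitespace function name and the sample input are made up for illustration and are not part of the gist:

```shell
#!/bin/bash
# Hypothetical helper wrapping the same tr invocation the script uses.
clean_whitespace() {
  # tr first maps every tab to a space, then -s squeezes each run of
  # spaces into a single space. Newlines are left untouched.
  tr -s '\t' ' '
}

# Sample line with a double tab and a run of spaces.
printf 'Title\t\tAuthor    Year\n' | clean_whitespace
# prints: Title Author Year
```

Because -s squeezes the characters of the second set, runs of plain spaces are collapsed too, not only the ones that came from tabs.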