@phiresky
Created July 13, 2018 16:40
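#!/bin/bash
# Extract text from a PDF with pdftotext, caching the result keyed on file name and mtime.
fname="$1"
cachedir=/tmp/pdfextract
mkdir -p "$cachedir"
# modification time in seconds since the epoch (GNU stat syntax)
mtime="$(stat -c %Y "$fname")"
# cache key: SHA-256 of "<name>.<mtime>", so a modified file gets a fresh cache entry
hash=$(printf '%s\n' "$fname.$mtime" | sha256sum | cut -c1-64)
echo "$hash $fname $mtime"
cachefname="$cachedir/$hash.txt"
if [[ ! -f "$cachefname" ]]; then
    pdftotext -layout "$fname" - |
        # add "Page X: " prefix to each line
        awk 'BEGIN {page=1} /\f/{page+=1}; { sub(/\f/, ""); print "Page " page ":", $0}' > "$cachefname"
fi
exec cat "$cachefname"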
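
The page-prefix step can be tried without a PDF. The following is a minimal sketch, not part of the gist: `prefix_pages` is a hypothetical wrapper name around the same awk program, and the input is simulated with `printf`, using a form feed (`\f`) as the page separator that the script's awk program looks for.

```shell
#!/bin/bash
# Hypothetical helper (not in the original gist): prefix every line read
# from stdin with the number of the page it belongs to.
prefix_pages() {
    # A line containing a form feed starts a new page: bump the counter,
    # strip the form feed, then print the line with its page prefix.
    awk 'BEGIN {page=1} /\f/{page+=1}; { sub(/\f/, ""); print "Page " page ":", $0}'
}

# Simulated two-page input; the form feed precedes the first line of page 2.
printf 'first page\nstill first\n\fsecond page\n' | prefix_pages
```

Because the counter is bumped at most once per line, a line that contains several form feeds (for example after blank pages) would still advance the page number by only one.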

phiresky commented Dec 3, 2020

In case someone still finds this: I packaged a much better version of this in https://github.com/phiresky/ripgrep-all :)
