@1ec5
Created May 25, 2021 05:02
SDP PDFs by category
#!/bin/bash
# Create one extracts/<category> directory per category list.
for list in pdfs_by_category/*.txt; do
    mkdir -p "extracts/$(basename "$list" .txt)"
done
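# Download every URL listed for one category into extracts/<category>.
function download_category {
    # Run in a subshell so a failed cd cannot leave the caller in the wrong directory.
    (
        cd "extracts/$1" || exit 1
        xargs -L 1 wget < "../../pdfs_by_category/$1.txt"
    )
}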
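The gist defines `download_category` but never calls it. A minimal sketch of one way to run it for every category list, assuming each list sits at `pdfs_by_category/<category>.txt` as the rest of the script implies. `each_category` is a hypothetical helper, not part of the original gist.

```shell
#!/bin/bash
# Hypothetical helper (not in the original gist): run a command once per
# category list found in pdfs_by_category/, passing the category name,
# i.e. the list's file name without its .txt extension.
each_category() {
    local list
    for list in pdfs_by_category/*.txt; do
        [ -e "$list" ] || continue   # the glob matched nothing
        "$@" "$(basename "$list" .txt)"
    done
}
```

With `download_category` defined, `each_category download_category` would fetch every list in turn, and `each_category echo` previews the category names without downloading anything.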
find . -name '*.pdf' -exec textutil -convert txt {} \;
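# Print the business type stored in each "Other" PDF's response headers.
# atob is not a standard command; it is assumed to be a local helper that Base64-decodes stdin.
xargs -L 1 curl -sI < "pdfs_by_category/Other, please specify.txt" | grep -i 'x-ms-meta-typeofbusinessother' | sed 's/^.*: //' | tr -d '\r' | atob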
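# Record each URL with its decoded business type in a tab-separated file.
while IFS= read -r url; do
    type=$(curl -sI "$url" | grep -i 'x-ms-meta-typeofbusinessother' | sed 's/^.*: //' | tr -d '\r' | atob)
    printf '%s\t%s\n' "$url" "$type" >> othertypes.tsv
done < "pdfs_by_category/Other, please specify.txt"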
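The `atob` command used in the pipelines above is not a standard shell utility, and the gist does not define it. A minimal sketch, assuming it is a local helper that Base64-decodes each line of standard input (the name mirrors JavaScript's `atob`). This definition is an assumption, not the author's, and it would need to be defined before the pipelines run.

```shell
#!/bin/bash
# Hypothetical stand-in for the undefined `atob` helper (an assumption, not
# the author's code): Base64-decode stdin one line at a time.
# Relies on a `base64` binary that accepts --decode, as GNU coreutils and
# macOS both do.
atob() {
    local line
    while IFS= read -r line || [ -n "$line" ]; do
        line=${line%$'\r'}                      # drop the CR that HTTP headers end with
        printf '%s' "$line" | base64 --decode
        echo                                    # keep one decoded value per line
    done
}
```

Decoding line by line keeps several header values from running together when the input carries more than one, as in the `xargs` pipeline.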