@hamoid
Last active January 24, 2024 13:14
Downloads a just-the-docs website and converts it into a PDF file for offline reading
#!/bin/bash
mkdir -p /tmp/manual
cd /tmp/manual || exit 1
# curl downloads the index page of the website
# grep extracts the <nav> ... </nav> section
# the first sed injects a line break in front of every URL and prepends the full domain
# the second sed deletes from each line the " character and everything after it, leaving the clean URL
# tail deletes the first line, which contains a lonely <nav> tag
urlstr=$(curl -s "https://guide.openrndr.org" | grep -o -E '<nav .*</nav>' | sed "s/href=\"\//href=\"\nhttps:\/\/guide.openrndr.org\//g" | sed "s/\".*//g" | tail -n +2)
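# For illustration only (hypothetical nav markup, not copied from the live site):
# an index page whose nav contains
#   <nav class="site-nav"><a href="/tutorials/">Tutorials</a><a href="/drawing/">Drawing</a></nav>
# comes out of the pipeline above as
#   https://guide.openrndr.org/tutorials/
#   https://guide.openrndr.org/drawing/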
# split the whitespace-separated string into an array (one URL per element)
urls=($urlstr)
# count how many items are in the array
length=${#urls[@]}
echo "Found $length URLs"
# create one zero-padded NNNN.pdf file per URL, so the *.pdf glob below sorts them in page order
for (( i=0; i<length; i++ )); do
  padded=$(printf "%04d" "$i")
  wkhtmltopdf "${urls[$i]}" "$padded.pdf"
done
date=$(date +"%F")
# finally join all the PDF files into one
pdfunite *.pdf "/tmp/openrndr-guide-$date.pdf"
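
If pdfunite is not available, Ghostscript can perform the same merge; a substitute for the last line above, not part of the original script:

gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile="/tmp/openrndr-guide-$date.pdf" *.pdf
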
hamoid commented Jun 16, 2022

Currently tied to guide.openrndr.org but can be adapted for other websites
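
A minimal sketch of one such adaptation, assuming the target site also keeps its table of contents in a <nav> element (the BASE variable and argument handling are my additions, not part of the gist):

#!/bin/bash
# take the site as the first argument, defaulting to the OPENRNDR guide
BASE="${1:-https://guide.openrndr.org}"
mkdir -p /tmp/manual
cd /tmp/manual || exit 1
# using "|" as the sed delimiter avoids escaping the slashes inside $BASE
urlstr=$(curl -s "$BASE" | grep -o -E '<nav .*</nav>' | sed "s|href=\"/|href=\"\n$BASE/|g" | sed "s/\".*//g" | tail -n +2)
# ...the rest of the script stays the same (apart from the hardcoded output file name)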

hamoid commented Jan 24, 2024

Dependencies: wkhtmltopdf, curl, sed, pdfunite, grep
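
On Debian/Ubuntu-like systems these can typically be installed with something like the following (package names are an assumption; pdfunite ships in poppler-utils):

sudo apt install curl grep sed wkhtmltopdf poppler-utils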
