@moriwaka
Last active November 21, 2024 08:03
PDF Document Downloader from docs.redhat.com

Usage:

```
$ mkdir RHEL9Doc
$ cd RHEL9Doc
$ fetchdoc.sh https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/9
```

Prerequisites: bash, curl, GNU Parallel, and GNU grep (the script uses `grep -P`)
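Before a long download run, it can help to confirm that the required tools are on the `PATH`. The following is a minimal sketch, not part of the original script; the function name `missing_tools` is hypothetical, and the tool list (curl, parallel, grep, awk) is taken from the commands the script invokes.

```shell
#!/bin/bash
# Print the name of each given command that is not found on PATH.
# (missing_tools is a hypothetical helper, not part of fetchdoc.sh.)
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example use at the top of a script:
#   missing=$(missing_tools curl parallel grep awk)
#   [ -z "$missing" ] || { echo "Missing: $missing" >&2; exit 1; }
```

This only checks that a command with each name exists; it does not verify that `parallel` is the GNU implementation or that `grep` supports `-P`.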

2024-11-21: fix for docs.redhat.com update

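#!/bin/bash
# Download the PDF edition of every guide linked from a docs.redhat.com index page.
if [ -z "$1" ]; then
    echo "Usage: $0 <URL>" >&2
    exit 1
fi

URL="$1"
# Scheme and host of the index URL, e.g. https://docs.redhat.com
BASE_URL=$(echo "$URL" | awk -F/ '{print $1"//"$3}')

# Fetch the index page, extract the href of each /html/ guide link,
# drop duplicates, and download up to 10 PDFs in parallel.
curl -s "$URL" | grep -oP '(?<=href=")[^"]*' | grep '/html/' | sort -u | parallel -j 10 '
    RELATIVE_URL={}
    FULL_URL='"$BASE_URL"'"$RELATIVE_URL"
    FULL_URL="${FULL_URL%/}/index"
    # Swap the first /html/ path segment for /pdf/
    PDF_URL=${FULL_URL/\/html\///pdf/}
    # Name the file after the path segment that follows /pdf/
    FILENAME=${PDF_URL#*/pdf/}
    FILENAME=${FILENAME%%/*}.pdf
    # -f: do not save an HTTP error page as a PDF
    curl -sf -o "$FILENAME" "$PDF_URL"
'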
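To see what the script will request without downloading anything, the URL rewriting can be pulled out into two small functions. This is a sketch that mirrors the script's steps under stated assumptions: the function names `pdf_url_for` and `pdf_name_for` and the guide name `example_guide` are hypothetical, and the replacement is anchored on the `/html/` path segment so that another "html" substring in the URL cannot match.

```shell
#!/bin/bash
# pdf_url_for BASE_URL HREF
# Build the PDF URL for a guide link: join base and href, strip a trailing
# slash, append /index, then swap the /html/ segment for /pdf/.
pdf_url_for() {
    local full="$1$2"
    full="${full%/}/index"
    echo "${full/\/html\///pdf/}"
}

# pdf_name_for PDF_URL
# Derive the local file name from the path segment that follows /pdf/.
pdf_name_for() {
    local name="${1#*/pdf/}"
    echo "${name%%/*}.pdf"
}

# Hypothetical href, in the shape of the links the script filters for.
href=/en/documentation/red_hat_enterprise_linux/9/html/example_guide
url=$(pdf_url_for https://docs.redhat.com "$href")
echo "$url"                # https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/9/pdf/example_guide/index
pdf_name_for "$url"        # example_guide.pdf
```

Because the file name comes from the first path segment after `/pdf/`, links that point deeper into the same guide map to the same file name.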