Created January 20, 2023 00:11
This bash script downloads a whole subreddit's wiki. Thanks to u/ThePixelHunter (https://www.reddit.com/user/ThePixelHunter/) for the original script; it was adapted to work on Windows machines by adding line 10.
#!/bin/bash
set -x
# Requires: bash coreutils curl jq

USER_AGENT='superagent/1.0'
EXPORTDIR="exports"

# Read one subreddit name per line from subreddits.list
while read -r line; do
    SUBREDDIT="$line"
    # Strip carriage returns so CRLF-encoded lists (Windows) also work
    SUBREDDIT=$(echo "$SUBREDDIT" | sed 's/\r//g')
    # Fetch the subreddit's list of wiki pages, then download each one
    while read -r line; do
        PAGE="$line"
        # Flatten nested page names (foo/bar -> foo--bar) for use as file names
        PAGE=$(echo "$PAGE" | sed 's|/|--|g')
        mkdir -p "./$EXPORTDIR/$SUBREDDIT/wiki/$PAGE"
        curl -s --user-agent "$USER_AGENT" "https://www.reddit.com/r/$SUBREDDIT/wiki/$PAGE.json" > "./$EXPORTDIR/$SUBREDDIT/wiki/$PAGE.json"
        printf '%s ' "$SUBREDDIT/wiki/$PAGE" ; echo $?
        # Extract the original Markdown source from the JSON response
        jq -r '.data.content_md' "./$EXPORTDIR/$SUBREDDIT/wiki/$PAGE.json" > "./$EXPORTDIR/$SUBREDDIT/wiki/$PAGE.md"
        # Clean up any directories left empty
        find . -type d -exec rmdir '{}' \; > /dev/null 2>&1
    done < <(curl -s -S --user-agent "$USER_AGENT" "https://www.reddit.com/r/$SUBREDDIT/wiki/pages.json" | jq -r '.data | .[]')
done < subreddits.list
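Wiki page names can be nested (the sample page name below is illustrative), so the script flattens each / into -- before using the name as a file name; a quick illustration of that sed step:

```shell
# Replace every "/" in a wiki page name with "--", as the script does
echo "index/faq/hardware" | sed 's|/|--|g'
# -> index--faq--hardware
```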
Basic idea:
The Reddit API lets you fetch a subreddit's complete list of wiki pages: https://www.reddit.com/r/DataHoarder/wiki/pages.json
And you can fetch each page as a JSON object which includes both the rendered HTML and the original Markdown source: https://www.reddit.com/r/DataHoarder/wiki/zfs.json
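Extracting the Markdown source from such a page object is a one-line jq filter; here a small inline JSON sample (not a real API response) stands in for the download:

```shell
# Sample shaped like Reddit's wiki page JSON; .data.content_md holds the Markdown
echo '{"kind":"wikipage","data":{"content_md":"# ZFS notes"}}' | jq -r '.data.content_md'
# -> # ZFS notes
```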
Guide
Edit: works on Windows with git-bash or WSL (Windows Subsystem for Linux).
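The Windows adaptation comes down to stripping carriage returns from subreddits.list, which git-bash and WSL would otherwise pass through from CRLF line endings; a minimal sketch of that step (GNU sed assumed for the \r escape):

```shell
# A CRLF-encoded line, as produced by Windows editors
printf 'DataHoarder\r\n' | sed 's/\r//g'
# -> DataHoarder (without the trailing carriage return)
```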